* [PATCH 0/5] btrfs: qgroup fixes for btrfs_drop_snapshot V5
@ 2014-07-17 19:38 Mark Fasheh
2014-07-17 19:39 ` [PATCH 1/5] btrfs: add trace for qgroup accounting Mark Fasheh
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Mark Fasheh @ 2014-07-17 19:38 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Josef Bacik, Mark Fasheh
Hi, the following patches try to fix a long-standing issue with qgroups
and snapshot deletion. The core problem is that btrfs_drop_snapshot will
skip shared extents during its tree walk. This results in an inconsistent
qgroup state once the drop is processed. We also have a bug where qgroup
items are not deleted after drop_snapshot. The orphaned items will cause
btrfs to go read-only when a snapshot is created with the same id as the
deleted one.
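For illustration, a minimal sequence that can leave stale accounting behind
once the drop is processed (device, mount point and sizes below are arbitrary
examples, not my exact test setup):
  mkfs.btrfs -f /dev/sdb
  mount /dev/sdb /mnt
  btrfs quota enable /mnt
  btrfs subvolume create /mnt/vol
  dd if=/dev/zero of=/mnt/vol/data bs=1M count=64
  btrfs subvolume snapshot /mnt/vol /mnt/snap
  btrfs subvolume delete /mnt/vol
  btrfs filesystem sync /mnt
  # the cleaner thread then runs btrfs_drop_snapshot() for /mnt/vol; since the
  # shared subtree is skipped, the exclusive count for /mnt/snap never gets updated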
The first patch adds some tracing which I found very useful in debugging
qgroup operations. The second patch is the actual fix for the problem. A
third patch, from Josef, is also included. We need it because it fixes at
least one set of inconsistencies qgroups can get into via drop_snapshot. The
fourth patch adds code to delete qgroup items from disk once drop_snapshot
has completed.
With this version of the patch series, I can no longer reproduce
qgroup inconsistencies via drop_snapshot on my test disks.
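One way to spot the inconsistency is to compare the live counters with a
freshly rebuilt set, roughly (paths are examples):
  btrfs qgroup show /mnt     # note referenced/exclusive for each qgroup
  btrfs quota rescan /mnt    # rebuild the counters; wait for the rescan to finish
  btrfs qgroup show /mnt     # counters that changed here were inconsistent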
Changes from the last patch set:
- Added a small fix (patch #5). I can fold this back into the main patch if
requested.
Changes from V3-V4:
- Added patch 'btrfs: delete qgroup items in drop_snapshot'
Changes from V2-V3:
- search on bytenr and root, but not seq in btrfs_record_ref when
we're looking for existing qgroup operations.
Changes before that (V1-V2):
- remove extra extent_buffer_uptodate call from account_shared_subtree()
- catch return values for the accounting calls now and do the right thing
(log an error and tell the user to rescan)
- remove the loop on roots in qgroup_subtree_accounting and just use the
nnodes member to make our first decision.
- Don't queue up the subtree root for a change (the code in drop_snapshot
handles qgroup updates for this block).
- only walk subtrees if we're actually in DROP_REFERENCE stage and we're
going to call free_extent
- account leaf items for level zero blocks that we are dropping in
walk_up_proc
Please review, thanks. Diffstat follows,
--Mark
fs/btrfs/ctree.c | 20 +-
fs/btrfs/ctree.h | 4
fs/btrfs/extent-tree.c | 291 ++++++++++++++++++++++++++++++++++++++++--
fs/btrfs/qgroup.c | 295 +++++++++++++++++++++++++++++++++++++++++--
fs/btrfs/qgroup.h | 4
fs/btrfs/super.c | 1
include/trace/events/btrfs.h | 59 ++++++++
7 files changed, 641 insertions(+), 33 deletions(-)
* [PATCH 1/5] btrfs: add trace for qgroup accounting
2014-07-17 19:38 [PATCH 0/5] btrfs: qgroup fixes for btrfs_drop_snapshot V5 Mark Fasheh
@ 2014-07-17 19:39 ` Mark Fasheh
2014-07-17 19:39 ` [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete Mark Fasheh
` (3 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Mark Fasheh @ 2014-07-17 19:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Josef Bacik, Mark Fasheh
We want this to debug qgroup changes on live systems.
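For anyone who wants to watch them, the new events show up under the usual
btrfs trace event directory, something like this (the tracefs mount point may
differ on your system):
  echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_qgroup_record_ref/enable
  echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_qgroup_account/enable
  cat /sys/kernel/debug/tracing/trace_pipe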
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Josef Bacik <jbacik@fb.com>
---
fs/btrfs/qgroup.c | 3 +++
fs/btrfs/super.c | 1 +
include/trace/events/btrfs.h | 56 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 60 insertions(+)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 98cb6b2..6a6dc62 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1290,6 +1290,7 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
oper->seq = atomic_inc_return(&fs_info->qgroup_op_seq);
INIT_LIST_HEAD(&oper->elem.list);
oper->elem.seq = 0;
+ trace_btrfs_qgroup_record_ref(oper);
ret = insert_qgroup_oper(fs_info, oper);
if (ret) {
/* Shouldn't happen so have an assert for developers */
@@ -1911,6 +1912,8 @@ static int btrfs_qgroup_account(struct btrfs_trans_handle *trans,
ASSERT(is_fstree(oper->ref_root));
+ trace_btrfs_qgroup_account(oper);
+
switch (oper->type) {
case BTRFS_QGROUP_OPER_ADD_EXCL:
case BTRFS_QGROUP_OPER_SUB_EXCL:
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8e16bca..38b8bd8 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -60,6 +60,7 @@
#include "backref.h"
#include "tests/btrfs-tests.h"
+#include "qgroup.h"
#define CREATE_TRACE_POINTS
#include <trace/events/btrfs.h>
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 4ee4e30..b8774b3 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -23,6 +23,7 @@ struct map_lookup;
struct extent_buffer;
struct btrfs_work;
struct __btrfs_workqueue;
+struct btrfs_qgroup_operation;
#define show_ref_type(type) \
__print_symbolic(type, \
@@ -1119,6 +1120,61 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy,
TP_ARGS(wq)
);
+#define show_oper_type(type) \
+ __print_symbolic(type, \
+ { BTRFS_QGROUP_OPER_ADD_EXCL, "OPER_ADD_EXCL" }, \
+ { BTRFS_QGROUP_OPER_ADD_SHARED, "OPER_ADD_SHARED" }, \
+ { BTRFS_QGROUP_OPER_SUB_EXCL, "OPER_SUB_EXCL" }, \
+ { BTRFS_QGROUP_OPER_SUB_SHARED, "OPER_SUB_SHARED" })
+
+DECLARE_EVENT_CLASS(btrfs_qgroup_oper,
+
+ TP_PROTO(struct btrfs_qgroup_operation *oper),
+
+ TP_ARGS(oper),
+
+ TP_STRUCT__entry(
+ __field( u64, ref_root )
+ __field( u64, bytenr )
+ __field( u64, num_bytes )
+ __field( u64, seq )
+ __field( int, type )
+ __field( u64, elem_seq )
+ ),
+
+ TP_fast_assign(
+ __entry->ref_root = oper->ref_root;
+ __entry->bytenr = oper->bytenr,
+ __entry->num_bytes = oper->num_bytes;
+ __entry->seq = oper->seq;
+ __entry->type = oper->type;
+ __entry->elem_seq = oper->elem.seq;
+ ),
+
+ TP_printk("ref_root = %llu, bytenr = %llu, num_bytes = %llu, "
+ "seq = %llu, elem.seq = %llu, type = %s",
+ (unsigned long long)__entry->ref_root,
+ (unsigned long long)__entry->bytenr,
+ (unsigned long long)__entry->num_bytes,
+ (unsigned long long)__entry->seq,
+ (unsigned long long)__entry->elem_seq,
+ show_oper_type(__entry->type))
+);
+
+DEFINE_EVENT(btrfs_qgroup_oper, btrfs_qgroup_account,
+
+ TP_PROTO(struct btrfs_qgroup_operation *oper),
+
+ TP_ARGS(oper)
+);
+
+DEFINE_EVENT(btrfs_qgroup_oper, btrfs_qgroup_record_ref,
+
+ TP_PROTO(struct btrfs_qgroup_operation *oper),
+
+ TP_ARGS(oper)
+);
+
#endif /* _TRACE_BTRFS_H */
/* This part must be outside protection */
--
1.8.4.5
* [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete
2014-07-17 19:38 [PATCH 0/5] btrfs: qgroup fixes for btrfs_drop_snapshot V5 Mark Fasheh
2014-07-17 19:39 ` [PATCH 1/5] btrfs: add trace for qgroup accounting Mark Fasheh
@ 2014-07-17 19:39 ` Mark Fasheh
2014-08-12 18:22 ` Chris Mason
2014-07-17 19:39 ` [PATCH 3/5] Btrfs: __btrfs_mod_ref should always use no_quota Mark Fasheh
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Mark Fasheh @ 2014-07-17 19:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Josef Bacik, Mark Fasheh
During its tree walk, btrfs_drop_snapshot() will skip any shared
subtrees it encounters. This is incorrect when we have qgroups
turned on as those subtrees need to have their contents
accounted. In particular, the case we're concerned with is when
removing our snapshot root leaves the subtree with only one root
reference.
In those cases we need to find the last remaining root and add
each extent in the subtree to the corresponding qgroup exclusive
counts.
This patch implements the shared subtree walk and a new qgroup
operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
this type is encountered during qgroup accounting, we search for
any root references to that extent and in the case that we find
only one reference left, we go ahead and do the math on its
exclusive counts.
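As a rough example of the case this targets (paths are hypothetical): with
two snapshots sharing a subtree, deleting all but one of them should move the
shared blocks into the survivor's exclusive count:
  btrfs subvolume snapshot /mnt/vol /mnt/snap1
  btrfs subvolume snapshot /mnt/vol /mnt/snap2
  btrfs subvolume delete /mnt/vol
  btrfs subvolume delete /mnt/snap1
  # once the drops complete, snap2 is the last root referencing the shared
  # subtree, so its qgroup exclusive count should now match its referenced count
  btrfs qgroup show /mnt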
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Josef Bacik <jbacik@fb.com>
---
fs/btrfs/extent-tree.c | 261 +++++++++++++++++++++++++++++++++++++++++++
fs/btrfs/qgroup.c | 165 +++++++++++++++++++++++++++
fs/btrfs/qgroup.h | 1 +
include/trace/events/btrfs.h | 3 +-
4 files changed, 429 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 813537f..1aa4325 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7478,6 +7478,220 @@ reada:
wc->reada_slot = slot;
}
+static int account_leaf_items(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ struct extent_buffer *eb)
+{
+ int nr = btrfs_header_nritems(eb);
+ int i, extent_type, ret;
+ struct btrfs_key key;
+ struct btrfs_file_extent_item *fi;
+ u64 bytenr, num_bytes;
+
+ for (i = 0; i < nr; i++) {
+ btrfs_item_key_to_cpu(eb, &key, i);
+
+ if (key.type != BTRFS_EXTENT_DATA_KEY)
+ continue;
+
+ fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
+ /* filter out non qgroup-accountable extents */
+ extent_type = btrfs_file_extent_type(eb, fi);
+
+ if (extent_type == BTRFS_FILE_EXTENT_INLINE)
+ continue;
+
+ bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
+ if (!bytenr)
+ continue;
+
+ num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
+
+ ret = btrfs_qgroup_record_ref(trans, root->fs_info,
+ root->objectid,
+ bytenr, num_bytes,
+ BTRFS_QGROUP_OPER_SUB_SUBTREE, 0);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+/*
+ * Walk up the tree from the bottom, freeing leaves and any interior
+ * nodes which have had all slots visited. If a node (leaf or
+ * interior) is freed, the node above it will have it's slot
+ * incremented. The root node will never be freed.
+ *
+ * At the end of this function, we should have a path which has all
+ * slots incremented to the next position for a search. If we need to
+ * read a new node it will be NULL and the node above it will have the
+ * correct slot selected for a later read.
+ *
+ * If we increment the root nodes slot counter past the number of
+ * elements, 1 is returned to signal completion of the search.
+ */
+static int adjust_slots_upwards(struct btrfs_root *root,
+ struct btrfs_path *path, int root_level)
+{
+ int level = 0;
+ int nr, slot;
+ struct extent_buffer *eb;
+
+ if (root_level == 0)
+ return 1;
+
+ while (level <= root_level) {
+ eb = path->nodes[level];
+ nr = btrfs_header_nritems(eb);
+ path->slots[level]++;
+ slot = path->slots[level];
+ if (slot >= nr || level == 0) {
+ /*
+ * Don't free the root - we will detect this
+ * condition after our loop and return a
+ * positive value for caller to stop walking the tree.
+ */
+ if (level != root_level) {
+ btrfs_tree_unlock_rw(eb, path->locks[level]);
+ path->locks[level] = 0;
+
+ free_extent_buffer(eb);
+ path->nodes[level] = NULL;
+ path->slots[level] = 0;
+ }
+ } else {
+ /*
+ * We have a valid slot to walk back down
+ * from. Stop here so caller can process these
+ * new nodes.
+ */
+ break;
+ }
+
+ level++;
+ }
+
+ eb = path->nodes[root_level];
+ if (path->slots[root_level] >= btrfs_header_nritems(eb))
+ return 1;
+
+ return 0;
+}
+
+/*
+ * root_eb is the subtree root and is locked before this function is called.
+ */
+static int account_shared_subtree(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ struct extent_buffer *root_eb,
+ u64 root_gen,
+ int root_level)
+{
+ int ret = 0;
+ int level;
+ struct extent_buffer *eb = root_eb;
+ struct btrfs_path *path = NULL;
+
+ BUG_ON(root_level < 0 || root_level > BTRFS_MAX_LEVEL);
+ BUG_ON(root_eb == NULL);
+
+ if (!root->fs_info->quota_enabled)
+ return 0;
+
+ if (!extent_buffer_uptodate(root_eb)) {
+ ret = btrfs_read_buffer(root_eb, root_gen);
+ if (ret)
+ goto out;
+ }
+
+ if (root_level == 0) {
+ ret = account_leaf_items(trans, root, root_eb);
+ goto out;
+ }
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+ /*
+ * Walk down the tree. Missing extent blocks are filled in as
+ * we go. Metadata is accounted every time we read a new
+ * extent block.
+ *
+ * When we reach a leaf, we account for file extent items in it,
+ * walk back up the tree (adjusting slot pointers as we go)
+ * and restart the search process.
+ */
+ extent_buffer_get(root_eb); /* For path */
+ path->nodes[root_level] = root_eb;
+ path->slots[root_level] = 0;
+ path->locks[root_level] = 0; /* so release_path doesn't try to unlock */
+walk_down:
+ level = root_level;
+ while (level >= 0) {
+ if (path->nodes[level] == NULL) {
+ int child_bsize = btrfs_level_size(root, level);
+ int parent_slot;
+ u64 child_gen;
+ u64 child_bytenr;
+
+ /* We need to get child blockptr/gen from
+ * parent before we can read it. */
+ eb = path->nodes[level + 1];
+ parent_slot = path->slots[level + 1];
+ child_bytenr = btrfs_node_blockptr(eb, parent_slot);
+ child_gen = btrfs_node_ptr_generation(eb, parent_slot);
+
+ eb = read_tree_block(root, child_bytenr, child_bsize,
+ child_gen);
+ if (!eb || !extent_buffer_uptodate(eb)) {
+ ret = -EIO;
+ goto out;
+ }
+
+ path->nodes[level] = eb;
+ path->slots[level] = 0;
+
+ btrfs_tree_read_lock(eb);
+ btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+ path->locks[level] = BTRFS_READ_LOCK_BLOCKING;
+
+ ret = btrfs_qgroup_record_ref(trans, root->fs_info,
+ root->objectid,
+ child_bytenr,
+ child_bsize,
+ BTRFS_QGROUP_OPER_SUB_SUBTREE,
+ 0);
+ if (ret)
+ goto out;
+
+ }
+
+ if (level == 0) {
+ ret = account_leaf_items(trans, root, path->nodes[level]);
+ if (ret)
+ goto out;
+
+ /* Nonzero return here means we completed our search */
+ ret = adjust_slots_upwards(root, path, root_level);
+ if (ret)
+ break;
+
+ /* Restart search with new slots */
+ goto walk_down;
+ }
+
+ level--;
+ }
+
+ ret = 0;
+out:
+ btrfs_free_path(path);
+
+ return ret;
+}
+
/*
* helper to process tree block while walking down the tree.
*
@@ -7581,6 +7795,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
int level = wc->level;
int reada = 0;
int ret = 0;
+ bool need_account = false;
generation = btrfs_node_ptr_generation(path->nodes[level],
path->slots[level]);
@@ -7626,6 +7841,7 @@ static noinline int do_walk_down(struct btrfs_trans_handle *trans,
if (wc->stage == DROP_REFERENCE) {
if (wc->refs[level - 1] > 1) {
+ need_account = true;
if (level == 1 &&
(wc->flags[0] & BTRFS_BLOCK_FLAG_FULL_BACKREF))
goto skip;
@@ -7689,6 +7905,16 @@ skip:
parent = 0;
}
+ if (need_account) {
+ ret = account_shared_subtree(trans, root, next,
+ generation, level - 1);
+ if (ret) {
+ printk_ratelimited(KERN_ERR "BTRFS: %s Error "
+ "%d accounting shared subtree. Quota "
+ "is out of sync, rescan required.\n",
+ root->fs_info->sb->s_id, ret);
+ }
+ }
ret = btrfs_free_extent(trans, root, bytenr, blocksize, parent,
root->root_key.objectid, level - 1, 0, 0);
BUG_ON(ret); /* -ENOMEM */
@@ -7775,6 +8001,13 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
ret = btrfs_dec_ref(trans, root, eb, 0,
wc->for_reloc);
BUG_ON(ret); /* -ENOMEM */
+ ret = account_leaf_items(trans, root, eb);
+ if (ret) {
+ printk_ratelimited(KERN_ERR "BTRFS: %s Error "
+ "%d accounting leaf items. Quota "
+ "is out of sync, rescan required.\n",
+ root->fs_info->sb->s_id, ret);
+ }
}
/* make block locked assertion in clean_tree_block happy */
if (!path->locks[level] &&
@@ -7900,6 +8133,8 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
int level;
bool root_dropped = false;
+ btrfs_debug(root->fs_info, "Drop subvolume %llu", root->objectid);
+
path = btrfs_alloc_path();
if (!path) {
err = -ENOMEM;
@@ -8025,6 +8260,24 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_end_trans;
}
+ /*
+ * Qgroup update accounting is run from
+ * delayed ref handling. This usually works
+ * out because delayed refs are normally the
+ * only way qgroup updates are added. However,
+ * we may have added updates during our tree
+ * walk so run qgroups here to make sure we
+ * don't lose any updates.
+ */
+ ret = btrfs_delayed_qgroup_accounting(trans,
+ root->fs_info);
+ if (ret)
+ printk_ratelimited(KERN_ERR "BTRFS: Failure %d "
+ "running qgroup updates "
+ "during snapshot delete. "
+ "Quota is out of sync, "
+ "rescan required.\n", ret);
+
btrfs_end_transaction_throttle(trans, tree_root);
if (!for_reloc && btrfs_need_cleaner_sleep(root)) {
pr_debug("BTRFS: drop snapshot early exit\n");
@@ -8078,6 +8331,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
}
root_dropped = true;
out_end_trans:
+ ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
+ if (ret)
+ printk_ratelimited(KERN_ERR "BTRFS: Failure %d "
+ "running qgroup updates "
+ "during snapshot delete. "
+ "Quota is out of sync, "
+ "rescan required.\n", ret);
+
btrfs_end_transaction_throttle(trans, tree_root);
out_free:
kfree(wc);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 6a6dc62..1569338 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1201,6 +1201,50 @@ out:
mutex_unlock(&fs_info->qgroup_ioctl_lock);
return ret;
}
+
+static int comp_oper_exist(struct btrfs_qgroup_operation *oper1,
+ struct btrfs_qgroup_operation *oper2)
+{
+ /*
+ * Ignore seq and type here, we're looking for any operation
+ * at all related to this extent on that root.
+ */
+ if (oper1->bytenr < oper2->bytenr)
+ return -1;
+ if (oper1->bytenr > oper2->bytenr)
+ return 1;
+ if (oper1->ref_root < oper2->ref_root)
+ return -1;
+ if (oper1->ref_root > oper2->ref_root)
+ return 1;
+ return 0;
+}
+
+static int qgroup_oper_exists(struct btrfs_fs_info *fs_info,
+ struct btrfs_qgroup_operation *oper)
+{
+ struct rb_node *n;
+ struct btrfs_qgroup_operation *cur;
+ int cmp;
+
+ spin_lock(&fs_info->qgroup_op_lock);
+ n = fs_info->qgroup_op_tree.rb_node;
+ while (n) {
+ cur = rb_entry(n, struct btrfs_qgroup_operation, n);
+ cmp = comp_oper_exist(cur, oper);
+ if (cmp < 0) {
+ n = n->rb_right;
+ } else if (cmp) {
+ n = n->rb_left;
+ } else {
+ spin_unlock(&fs_info->qgroup_op_lock);
+ return -EEXIST;
+ }
+ }
+ spin_unlock(&fs_info->qgroup_op_lock);
+ return 0;
+}
+
static int comp_oper(struct btrfs_qgroup_operation *oper1,
struct btrfs_qgroup_operation *oper2)
{
@@ -1290,7 +1334,25 @@ int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
oper->seq = atomic_inc_return(&fs_info->qgroup_op_seq);
INIT_LIST_HEAD(&oper->elem.list);
oper->elem.seq = 0;
+
trace_btrfs_qgroup_record_ref(oper);
+
+ if (type == BTRFS_QGROUP_OPER_SUB_SUBTREE) {
+ /*
+ * If any operation for this bytenr/ref_root combo
+ * exists, then we know it's not exclusively owned and
+ * shouldn't be queued up.
+ *
+ * This also catches the case where we have a cloned
+ * extent that gets queued up multiple times during
+ * drop snapshot.
+ */
+ if (qgroup_oper_exists(fs_info, oper)) {
+ kfree(oper);
+ return 0;
+ }
+ }
+
ret = insert_qgroup_oper(fs_info, oper);
if (ret) {
/* Shouldn't happen so have an assert for developers */
@@ -1885,6 +1947,106 @@ out:
}
/*
+ * Process a reference to a shared subtree. This type of operation is
+ * queued during snapshot removal when we encounter extents which are
+ * shared between more than one root.
+ */
+static int qgroup_subtree_accounting(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info,
+ struct btrfs_qgroup_operation *oper)
+{
+ struct ulist *roots = NULL;
+ struct ulist_node *unode;
+ struct ulist_iterator uiter;
+ struct btrfs_qgroup_list *glist;
+ struct ulist *parents;
+ int ret = 0;
+ struct btrfs_qgroup *qg;
+ u64 root_obj = 0;
+ struct seq_list elem = {};
+
+ parents = ulist_alloc(GFP_NOFS);
+ if (!parents)
+ return -ENOMEM;
+
+ btrfs_get_tree_mod_seq(fs_info, &elem);
+ ret = btrfs_find_all_roots(trans, fs_info, oper->bytenr,
+ elem.seq, &roots);
+ btrfs_put_tree_mod_seq(fs_info, &elem);
+ if (ret < 0)
+ return ret;
+
+ if (roots->nnodes != 1)
+ goto out;
+
+ ULIST_ITER_INIT(&uiter);
+ unode = ulist_next(roots, &uiter); /* Only want 1 so no need to loop */
+ /*
+ * If we find our ref root then that means all refs
+ * this extent has to the root have not yet been
+ * deleted. In that case, we do nothing and let the
+ * last ref for this bytenr drive our update.
+ *
+ * This can happen for example if an extent is
+ * referenced multiple times in a snapshot (clone,
+ * etc). If we are in the middle of snapshot removal,
+ * queued updates for such an extent will find the
+ * root if we have not yet finished removing the
+ * snapshot.
+ */
+ if (unode->val == oper->ref_root)
+ goto out;
+
+ root_obj = unode->val;
+ BUG_ON(!root_obj);
+
+ spin_lock(&fs_info->qgroup_lock);
+ qg = find_qgroup_rb(fs_info, root_obj);
+ if (!qg)
+ goto out_unlock;
+
+ qg->excl += oper->num_bytes;
+ qg->excl_cmpr += oper->num_bytes;
+ qgroup_dirty(fs_info, qg);
+
+ /*
+ * Adjust counts for parent groups. First we find all
+ * parents, then in the 2nd loop we do the adjustment
+ * while adding parents of the parents to our ulist.
+ */
+ list_for_each_entry(glist, &qg->groups, next_group) {
+ ret = ulist_add(parents, glist->group->qgroupid,
+ ptr_to_u64(glist->group), GFP_ATOMIC);
+ if (ret < 0)
+ goto out_unlock;
+ }
+
+ ULIST_ITER_INIT(&uiter);
+ while ((unode = ulist_next(parents, &uiter))) {
+ qg = u64_to_ptr(unode->aux);
+ qg->excl += oper->num_bytes;
+ qg->excl_cmpr += oper->num_bytes;
+ qgroup_dirty(fs_info, qg);
+
+ /* Add any parents of the parents */
+ list_for_each_entry(glist, &qg->groups, next_group) {
+ ret = ulist_add(parents, glist->group->qgroupid,
+ ptr_to_u64(glist->group), GFP_ATOMIC);
+ if (ret < 0)
+ goto out_unlock;
+ }
+ }
+
+out_unlock:
+ spin_unlock(&fs_info->qgroup_lock);
+
+out:
+ ulist_free(roots);
+ ulist_free(parents);
+ return ret;
+}
+
+/*
* btrfs_qgroup_account_ref is called for every ref that is added to or deleted
* from the fs. First, all roots referencing the extent are searched, and
* then the space is accounted accordingly to the different roots. The
@@ -1923,6 +2085,9 @@ static int btrfs_qgroup_account(struct btrfs_trans_handle *trans,
case BTRFS_QGROUP_OPER_SUB_SHARED:
ret = qgroup_shared_accounting(trans, fs_info, oper);
break;
+ case BTRFS_QGROUP_OPER_SUB_SUBTREE:
+ ret = qgroup_subtree_accounting(trans, fs_info, oper);
+ break;
default:
ASSERT(0);
}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 5952ff1..18cc68c 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -44,6 +44,7 @@ enum btrfs_qgroup_operation_type {
BTRFS_QGROUP_OPER_ADD_SHARED,
BTRFS_QGROUP_OPER_SUB_EXCL,
BTRFS_QGROUP_OPER_SUB_SHARED,
+ BTRFS_QGROUP_OPER_SUB_SUBTREE,
};
struct btrfs_qgroup_operation {
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index b8774b3..96daac6 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1125,7 +1125,8 @@ DEFINE_EVENT(btrfs__workqueue_done, btrfs_workqueue_destroy,
{ BTRFS_QGROUP_OPER_ADD_EXCL, "OPER_ADD_EXCL" }, \
{ BTRFS_QGROUP_OPER_ADD_SHARED, "OPER_ADD_SHARED" }, \
{ BTRFS_QGROUP_OPER_SUB_EXCL, "OPER_SUB_EXCL" }, \
- { BTRFS_QGROUP_OPER_SUB_SHARED, "OPER_SUB_SHARED" })
+ { BTRFS_QGROUP_OPER_SUB_SHARED, "OPER_SUB_SHARED" }, \
+ { BTRFS_QGROUP_OPER_SUB_SUBTREE,"OPER_SUB_SUBTREE" })
DECLARE_EVENT_CLASS(btrfs_qgroup_oper,
--
1.8.4.5
* [PATCH 3/5] Btrfs: __btrfs_mod_ref should always use no_quota
2014-07-17 19:38 [PATCH 0/5] btrfs: qgroup fixes for btrfs_drop_snapshot V5 Mark Fasheh
2014-07-17 19:39 ` [PATCH 1/5] btrfs: add trace for qgroup accounting Mark Fasheh
2014-07-17 19:39 ` [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete Mark Fasheh
@ 2014-07-17 19:39 ` Mark Fasheh
2014-07-17 19:39 ` [PATCH 4/5] btrfs: delete qgroup items in drop_snapshot Mark Fasheh
2014-07-17 19:39 ` [PATCH 5/5] btrfs: correctly handle return from ulist_add Mark Fasheh
4 siblings, 0 replies; 11+ messages in thread
From: Mark Fasheh @ 2014-07-17 19:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Josef Bacik, Mark Fasheh
From: Josef Bacik <jbacik@fb.com>
Earlier I extended the no_quota arg to btrfs_dec/inc_ref because I didn't
understand how snapshot delete was using it and assumed that we needed the
quota operations there. With Mark's work this has turned out not to be the
case: we _always_ need to use no_quota for btrfs_dec/inc_ref, so just drop the
argument and make __btrfs_mod_ref call its process function with no_quota
always set. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
fs/btrfs/ctree.c | 20 ++++++++++----------
fs/btrfs/ctree.h | 4 ++--
fs/btrfs/extent-tree.c | 24 +++++++++++-------------
3 files changed, 23 insertions(+), 25 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index aeab453..44ee5d2 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -280,9 +280,9 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
WARN_ON(btrfs_header_generation(buf) > trans->transid);
if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
- ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+ ret = btrfs_inc_ref(trans, root, cow, 1);
else
- ret = btrfs_inc_ref(trans, root, cow, 0, 1);
+ ret = btrfs_inc_ref(trans, root, cow, 0);
if (ret)
return ret;
@@ -1035,14 +1035,14 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
if ((owner == root->root_key.objectid ||
root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) &&
!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) {
- ret = btrfs_inc_ref(trans, root, buf, 1, 1);
+ ret = btrfs_inc_ref(trans, root, buf, 1);
BUG_ON(ret); /* -ENOMEM */
if (root->root_key.objectid ==
BTRFS_TREE_RELOC_OBJECTID) {
- ret = btrfs_dec_ref(trans, root, buf, 0, 1);
+ ret = btrfs_dec_ref(trans, root, buf, 0);
BUG_ON(ret); /* -ENOMEM */
- ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+ ret = btrfs_inc_ref(trans, root, cow, 1);
BUG_ON(ret); /* -ENOMEM */
}
new_flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
@@ -1050,9 +1050,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
if (root->root_key.objectid ==
BTRFS_TREE_RELOC_OBJECTID)
- ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+ ret = btrfs_inc_ref(trans, root, cow, 1);
else
- ret = btrfs_inc_ref(trans, root, cow, 0, 1);
+ ret = btrfs_inc_ref(trans, root, cow, 0);
BUG_ON(ret); /* -ENOMEM */
}
if (new_flags != 0) {
@@ -1069,11 +1069,11 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
if (root->root_key.objectid ==
BTRFS_TREE_RELOC_OBJECTID)
- ret = btrfs_inc_ref(trans, root, cow, 1, 1);
+ ret = btrfs_inc_ref(trans, root, cow, 1);
else
- ret = btrfs_inc_ref(trans, root, cow, 0, 1);
+ ret = btrfs_inc_ref(trans, root, cow, 0);
BUG_ON(ret); /* -ENOMEM */
- ret = btrfs_dec_ref(trans, root, buf, 1, 1);
+ ret = btrfs_dec_ref(trans, root, buf, 1);
BUG_ON(ret); /* -ENOMEM */
}
clean_tree_block(trans, root, buf);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index be91397..8e29b61 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3326,9 +3326,9 @@ int btrfs_reserve_extent(struct btrfs_root *root, u64 num_bytes,
u64 min_alloc_size, u64 empty_size, u64 hint_byte,
struct btrfs_key *ins, int is_data, int delalloc);
int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- struct extent_buffer *buf, int full_backref, int no_quota);
+ struct extent_buffer *buf, int full_backref);
int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- struct extent_buffer *buf, int full_backref, int no_quota);
+ struct extent_buffer *buf, int full_backref);
int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
u64 bytenr, u64 num_bytes, u64 flags,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1aa4325..ed9e13c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3057,7 +3057,7 @@ out:
static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct extent_buffer *buf,
- int full_backref, int inc, int no_quota)
+ int full_backref, int inc)
{
u64 bytenr;
u64 num_bytes;
@@ -3111,7 +3111,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
key.offset -= btrfs_file_extent_offset(buf, fi);
ret = process_func(trans, root, bytenr, num_bytes,
parent, ref_root, key.objectid,
- key.offset, no_quota);
+ key.offset, 1);
if (ret)
goto fail;
} else {
@@ -3119,7 +3119,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
num_bytes = btrfs_level_size(root, level - 1);
ret = process_func(trans, root, bytenr, num_bytes,
parent, ref_root, level - 1, 0,
- no_quota);
+ 1);
if (ret)
goto fail;
}
@@ -3130,15 +3130,15 @@ fail:
}
int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- struct extent_buffer *buf, int full_backref, int no_quota)
+ struct extent_buffer *buf, int full_backref)
{
- return __btrfs_mod_ref(trans, root, buf, full_backref, 1, no_quota);
+ return __btrfs_mod_ref(trans, root, buf, full_backref, 1);
}
int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- struct extent_buffer *buf, int full_backref, int no_quota)
+ struct extent_buffer *buf, int full_backref)
{
- return __btrfs_mod_ref(trans, root, buf, full_backref, 0, no_quota);
+ return __btrfs_mod_ref(trans, root, buf, full_backref, 0);
}
static int write_one_cache_group(struct btrfs_trans_handle *trans,
@@ -7746,9 +7746,9 @@ static noinline int walk_down_proc(struct btrfs_trans_handle *trans,
/* wc->stage == UPDATE_BACKREF */
if (!(wc->flags[level] & flag)) {
BUG_ON(!path->locks[level]);
- ret = btrfs_inc_ref(trans, root, eb, 1, wc->for_reloc);
+ ret = btrfs_inc_ref(trans, root, eb, 1);
BUG_ON(ret); /* -ENOMEM */
- ret = btrfs_dec_ref(trans, root, eb, 0, wc->for_reloc);
+ ret = btrfs_dec_ref(trans, root, eb, 0);
BUG_ON(ret); /* -ENOMEM */
ret = btrfs_set_disk_extent_flags(trans, root, eb->start,
eb->len, flag,
@@ -7995,11 +7995,9 @@ static noinline int walk_up_proc(struct btrfs_trans_handle *trans,
if (wc->refs[level] == 1) {
if (level == 0) {
if (wc->flags[level] & BTRFS_BLOCK_FLAG_FULL_BACKREF)
- ret = btrfs_dec_ref(trans, root, eb, 1,
- wc->for_reloc);
+ ret = btrfs_dec_ref(trans, root, eb, 1);
else
- ret = btrfs_dec_ref(trans, root, eb, 0,
- wc->for_reloc);
+ ret = btrfs_dec_ref(trans, root, eb, 0);
BUG_ON(ret); /* -ENOMEM */
ret = account_leaf_items(trans, root, eb);
if (ret) {
--
1.8.4.5
* [PATCH 4/5] btrfs: delete qgroup items in drop_snapshot
2014-07-17 19:38 [PATCH 0/5] btrfs: qgroup fixes for btrfs_drop_snapshot V5 Mark Fasheh
` (2 preceding siblings ...)
2014-07-17 19:39 ` [PATCH 3/5] Btrfs: __btrfs_mod_ref should always use no_quota Mark Fasheh
@ 2014-07-17 19:39 ` Mark Fasheh
2014-07-17 19:39 ` [PATCH 5/5] btrfs: correctly handle return from ulist_add Mark Fasheh
4 siblings, 0 replies; 11+ messages in thread
From: Mark Fasheh @ 2014-07-17 19:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Josef Bacik, Mark Fasheh
btrfs_drop_snapshot() leaves subvolume qgroup items on disk after
completion. This wastes space and can also cause problems with snapshot
creation. If a new snapshot tries to claim the deleted subvolume's id,
btrfs will get -EEXIST from add_qgroup_item() and go read-only.
We can partially fix this by catching -EEXIST in add_qgroup_item() and
initializing the existing items. That would still leave orphaned relation
items (BTRFS_QGROUP_RELATION_KEY) around, however, which would be confusing
to the end user, and it does nothing to fix the wasted space taken up by the
orphaned qgroup items.
So the full fix is to delete all qgroup items related to the deleted
snapshot in btrfs_drop_snapshot. If an item persists (either due to a
previous drop_snapshot without this fix, or some error) we can still continue
with snapshot creation instead of forcing the whole filesystem read-only.
In the unlikely event that some relation items persist, they will not
affect the functioning of our level 0 subvolume qgroup.
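The stale items are visible from userspace; without this patch something like
the following (ids are examples, quotas enabled) keeps listing the dead
subvolume's qgroup:
  btrfs subvolume create /mnt/vol    # say it gets subvolume id 257
  btrfs subvolume delete /mnt/vol
  btrfs filesystem sync /mnt         # let the cleaner drop the tree
  btrfs qgroup show /mnt             # 0/257 is still listed, backed by the
                                     # orphaned items left on disk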
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
fs/btrfs/extent-tree.c | 6 +++
fs/btrfs/qgroup.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++--
fs/btrfs/qgroup.h | 3 ++
3 files changed, 120 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ed9e13c..2dad701 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8296,6 +8296,12 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
if (err)
goto out_end_trans;
+ ret = btrfs_del_qgroup_items(trans, root);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ goto out_end_trans;
+ }
+
ret = btrfs_del_root(trans, tree_root, &root->root_key);
if (ret) {
btrfs_abort_transaction(trans, tree_root, ret);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1569338..2ec2432 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -35,7 +35,6 @@
#include "qgroup.h"
/* TODO XXX FIXME
- * - subvol delete -> delete when ref goes to 0? delete limits also?
* - reorganize keys
* - compressed
* - sync
@@ -99,6 +98,16 @@ struct btrfs_qgroup_list {
struct btrfs_qgroup *member;
};
+/*
+ * used in remove_qgroup_relations() to track qgroup relations that
+ * need deleting
+ */
+struct relation_rec {
+ struct list_head list;
+ u64 src;
+ u64 dst;
+};
+
#define ptr_to_u64(x) ((u64)(uintptr_t)x)
#define u64_to_ptr(x) ((struct btrfs_qgroup *)(uintptr_t)x)
@@ -551,9 +560,15 @@ static int add_qgroup_item(struct btrfs_trans_handle *trans,
key.type = BTRFS_QGROUP_INFO_KEY;
key.offset = qgroupid;
+ /*
+ * Avoid a transaction abort by catching -EEXIST here. In that
+ * case, we proceed by re-initializing the existing structure
+ * on disk.
+ */
+
ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
sizeof(*qgroup_info));
- if (ret)
+ if (ret && ret != -EEXIST)
goto out;
leaf = path->nodes[0];
@@ -572,7 +587,7 @@ static int add_qgroup_item(struct btrfs_trans_handle *trans,
key.type = BTRFS_QGROUP_LIMIT_KEY;
ret = btrfs_insert_empty_item(trans, quota_root, path, &key,
sizeof(*qgroup_limit));
- if (ret)
+ if (ret && ret != -EEXIST)
goto out;
leaf = path->nodes[0];
@@ -2817,3 +2832,96 @@ btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
btrfs_queue_work(fs_info->qgroup_rescan_workers,
&fs_info->qgroup_rescan_work);
}
+
+static struct relation_rec *
+qlist_to_relation_rec(struct btrfs_qgroup_list *qlist, struct list_head *all)
+{
+ u64 group, member;
+ struct relation_rec *rec;
+
+ BUILD_BUG_ON(sizeof(struct btrfs_qgroup_list) < sizeof(struct relation_rec));
+
+ list_del(&qlist->next_group);
+ list_del(&qlist->next_member);
+ group = qlist->group->qgroupid;
+ member = qlist->member->qgroupid;
+ rec = (struct relation_rec *)qlist;
+ rec->src = group;
+ rec->dst = member;
+
+ list_add(&rec->list, all);
+ return rec;
+}
+
+static int remove_qgroup_relations(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 qgroupid)
+{
+ int ret, err;
+ struct btrfs_root *quota_root = fs_info->quota_root;
+ struct relation_rec *rec;
+ struct btrfs_qgroup_list *qlist;
+ struct btrfs_qgroup *qgroup;
+ LIST_HEAD(relations);
+
+ spin_lock(&fs_info->qgroup_lock);
+ qgroup = find_qgroup_rb(fs_info, qgroupid);
+
+ while (!list_empty(&qgroup->groups)) {
+ qlist = list_first_entry(&qgroup->groups,
+ struct btrfs_qgroup_list, next_group);
+ rec = qlist_to_relation_rec(qlist, &relations);
+ }
+
+ while (!list_empty(&qgroup->members)) {
+ qlist = list_first_entry(&qgroup->members,
+ struct btrfs_qgroup_list, next_member);
+ rec = qlist_to_relation_rec(qlist, &relations);
+ }
+
+ spin_unlock(&fs_info->qgroup_lock);
+
+ ret = 0;
+ list_for_each_entry(rec, &relations, list) {
+ ret = del_qgroup_relation_item(trans, quota_root, rec->src, rec->dst);
+ err = del_qgroup_relation_item(trans, quota_root, rec->dst, rec->src);
+ if (err && !ret)
+ ret = err;
+ if (ret && ret != -ENOENT)
+ break;
+ ret = 0;
+ }
+
+ return ret;
+}
+
+int btrfs_del_qgroup_items(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root)
+{
+ int ret;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+ struct btrfs_root *quota_root = fs_info->quota_root;
+ u64 qgroupid = root->root_key.objectid;
+
+ if (!fs_info->quota_enabled)
+ return 0;
+
+ mutex_lock(&fs_info->qgroup_ioctl_lock);
+
+ ret = remove_qgroup_relations(trans, fs_info, qgroupid);
+ if (ret)
+ goto out_unlock;
+
+ spin_lock(&fs_info->qgroup_lock);
+ del_qgroup_rb(quota_root->fs_info, qgroupid);
+ spin_unlock(&fs_info->qgroup_lock);
+
+ ret = del_qgroup_item(trans, quota_root, qgroupid);
+ if (ret && ret != -ENOENT)
+ goto out_unlock;
+
+ ret = 0;
+out_unlock:
+ mutex_unlock(&fs_info->qgroup_ioctl_lock);
+
+ return ret;
+}
diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h
index 18cc68c..b22a2ce 100644
--- a/fs/btrfs/qgroup.h
+++ b/fs/btrfs/qgroup.h
@@ -98,6 +98,9 @@ int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
+int btrfs_del_qgroup_items(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root);
+
void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
--
1.8.4.5
* [PATCH 5/5] btrfs: correctly handle return from ulist_add
2014-07-17 19:38 [PATCH 0/5] btrfs: qgroup fixes for btrfs_drop_snapshot V5 Mark Fasheh
` (3 preceding siblings ...)
2014-07-17 19:39 ` [PATCH 4/5] btrfs: delete qgroup items in drop_snapshot Mark Fasheh
@ 2014-07-17 19:39 ` Mark Fasheh
4 siblings, 0 replies; 11+ messages in thread
From: Mark Fasheh @ 2014-07-17 19:39 UTC (permalink / raw)
To: linux-btrfs; +Cc: Chris Mason, Josef Bacik, Mark Fasheh
ulist_add() can return '1' on success, which qgroup_subtree_accounting()
doesn't take into account. As a result, that value can be bubbled up to
callers, causing an error to be printed. Fix this by only returning the
value of ulist_add() when it indicates an error.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
fs/btrfs/qgroup.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 2ec2432..b55870c 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1976,6 +1976,7 @@ static int qgroup_subtree_accounting(struct btrfs_trans_handle *trans,
struct btrfs_qgroup_list *glist;
struct ulist *parents;
int ret = 0;
+ int err;
struct btrfs_qgroup *qg;
u64 root_obj = 0;
struct seq_list elem = {};
@@ -2030,10 +2031,12 @@ static int qgroup_subtree_accounting(struct btrfs_trans_handle *trans,
* while adding parents of the parents to our ulist.
*/
list_for_each_entry(glist, &qg->groups, next_group) {
- ret = ulist_add(parents, glist->group->qgroupid,
+ err = ulist_add(parents, glist->group->qgroupid,
ptr_to_u64(glist->group), GFP_ATOMIC);
- if (ret < 0)
+ if (err < 0) {
+ ret = err;
goto out_unlock;
+ }
}
ULIST_ITER_INIT(&uiter);
@@ -2045,10 +2048,12 @@ static int qgroup_subtree_accounting(struct btrfs_trans_handle *trans,
/* Add any parents of the parents */
list_for_each_entry(glist, &qg->groups, next_group) {
- ret = ulist_add(parents, glist->group->qgroupid,
+ err = ulist_add(parents, glist->group->qgroupid,
ptr_to_u64(glist->group), GFP_ATOMIC);
- if (ret < 0)
+ if (err < 0) {
+ ret = err;
goto out_unlock;
+ }
}
}
--
1.8.4.5
* Re: [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete
2014-07-17 19:39 ` [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete Mark Fasheh
@ 2014-08-12 18:22 ` Chris Mason
2014-08-12 18:32 ` Mark Fasheh
0 siblings, 1 reply; 11+ messages in thread
From: Chris Mason @ 2014-08-12 18:22 UTC (permalink / raw)
To: Mark Fasheh, linux-btrfs; +Cc: Josef Bacik
On 07/17/2014 03:39 PM, Mark Fasheh wrote:
> During its tree walk, btrfs_drop_snapshot() will skip any shared
> subtrees it encounters. This is incorrect when we have qgroups
> turned on as those subtrees need to have their contents
> accounted. In particular, the case we're concerned with is when
> removing our snapshot root leaves the subtree with only one root
> reference.
>
> In those cases we need to find the last remaining root and add
> each extent in the subtree to the corresponding qgroup exclusive
> counts.
>
> This patch implements the shared subtree walk and a new qgroup
> operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
> this type is encountered during qgroup accounting, we search for
> any root references to that extent and in the case that we find
> only one reference left, we go ahead and do the math on it's
> exclusive counts.
>
> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> Reviewed-by: Josef Bacik <jbacik@fb.com>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 813537f..1aa4325 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8078,6 +8331,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> }
> root_dropped = true;
> out_end_trans:
> + ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
^^^^^^^^^^^
CONFIG_DEBUG_PAGEALLOC noticed that root is already free at this point.
I switched it to tree_root instead ;)
-chris
* Re: [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete
2014-08-12 18:22 ` Chris Mason
@ 2014-08-12 18:32 ` Mark Fasheh
2014-08-12 18:36 ` Chris Mason
0 siblings, 1 reply; 11+ messages in thread
From: Mark Fasheh @ 2014-08-12 18:32 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs, Josef Bacik
On Tue, Aug 12, 2014 at 02:22:31PM -0400, Chris Mason wrote:
>
>
> On 07/17/2014 03:39 PM, Mark Fasheh wrote:
> > During its tree walk, btrfs_drop_snapshot() will skip any shared
> > subtrees it encounters. This is incorrect when we have qgroups
> > turned on as those subtrees need to have their contents
> > accounted. In particular, the case we're concerned with is when
> > removing our snapshot root leaves the subtree with only one root
> > reference.
> >
> > In those cases we need to find the last remaining root and add
> > each extent in the subtree to the corresponding qgroup exclusive
> > counts.
> >
> > This patch implements the shared subtree walk and a new qgroup
> > operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
> > this type is encountered during qgroup accounting, we search for
> > any root references to that extent and in the case that we find
> > only one reference left, we go ahead and do the math on it's
> > exclusive counts.
> >
> > Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> > Reviewed-by: Josef Bacik <jbacik@fb.com>
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index 813537f..1aa4325 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -8078,6 +8331,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> > }
> > root_dropped = true;
> > out_end_trans:
> > + ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
> ^^^^^^^^^^^
>
> CONFIG_DEBUG_PAGEALLOC noticed that root is already free at this point.
> I switched it to tree_root instead ;)
Oh nice catch, thanks for pointing it out.
Time to go update my suse patches.
--Mark
--
Mark Fasheh
* Re: [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete
2014-08-12 18:32 ` Mark Fasheh
@ 2014-08-12 18:36 ` Chris Mason
2014-08-12 19:01 ` Mark Fasheh
0 siblings, 1 reply; 11+ messages in thread
From: Chris Mason @ 2014-08-12 18:36 UTC (permalink / raw)
To: Mark Fasheh; +Cc: linux-btrfs, Josef Bacik
On 08/12/2014 02:32 PM, Mark Fasheh wrote:
> On Tue, Aug 12, 2014 at 02:22:31PM -0400, Chris Mason wrote:
>>
>>
>> On 07/17/2014 03:39 PM, Mark Fasheh wrote:
>>> During its tree walk, btrfs_drop_snapshot() will skip any shared
>>> subtrees it encounters. This is incorrect when we have qgroups
>>> turned on as those subtrees need to have their contents
>>> accounted. In particular, the case we're concerned with is when
>>> removing our snapshot root leaves the subtree with only one root
>>> reference.
>>>
>>> In those cases we need to find the last remaining root and add
>>> each extent in the subtree to the corresponding qgroup exclusive
>>> counts.
>>>
>>> This patch implements the shared subtree walk and a new qgroup
>>> operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
>>> this type is encountered during qgroup accounting, we search for
>>> any root references to that extent and in the case that we find
>>> only one reference left, we go ahead and do the math on it's
>>> exclusive counts.
>>>
>>> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
>>> Reviewed-by: Josef Bacik <jbacik@fb.com>
>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>> index 813537f..1aa4325 100644
>>> --- a/fs/btrfs/extent-tree.c
>>> +++ b/fs/btrfs/extent-tree.c
>>> @@ -8078,6 +8331,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>> }
>>> root_dropped = true;
>>> out_end_trans:
>>> + ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
>> ^^^^^^^^^^^
>>
>> CONFIG_DEBUG_PAGEALLOC noticed that root is already free at this point.
>> I switched it to tree_root instead ;)
>
> Oh nice catch, thanks for pointing it out.
>
> Time to go update my suse patches.
Grin, pretty sure it doesn't count as a catch if CONFIG_DEBUG_PAGEALLOC
finds it. But it did make it through the balance test we were crashing
in before. It's probably faster to hand edit the incremental in for the
suse patch, but here you go just in case:
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 763632d..ef0845d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8314,7 +8314,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
}
root_dropped = true;
out_end_trans:
- ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
+ ret = btrfs_delayed_qgroup_accounting(trans, tree_root->fs_info);
if (ret)
printk_ratelimited(KERN_ERR "BTRFS: Failure %d "
"running qgroup updates "
* Re: [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete
2014-08-12 18:36 ` Chris Mason
@ 2014-08-12 19:01 ` Mark Fasheh
2014-08-12 19:08 ` Chris Mason
0 siblings, 1 reply; 11+ messages in thread
From: Mark Fasheh @ 2014-08-12 19:01 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-btrfs, Josef Bacik
On Tue, Aug 12, 2014 at 02:36:17PM -0400, Chris Mason wrote:
>
>
> On 08/12/2014 02:32 PM, Mark Fasheh wrote:
> > On Tue, Aug 12, 2014 at 02:22:31PM -0400, Chris Mason wrote:
> >>
> >>
> >> On 07/17/2014 03:39 PM, Mark Fasheh wrote:
> >>> During its tree walk, btrfs_drop_snapshot() will skip any shared
> >>> subtrees it encounters. This is incorrect when we have qgroups
> >>> turned on as those subtrees need to have their contents
> >>> accounted. In particular, the case we're concerned with is when
> >>> removing our snapshot root leaves the subtree with only one root
> >>> reference.
> >>>
> >>> In those cases we need to find the last remaining root and add
> >>> each extent in the subtree to the corresponding qgroup exclusive
> >>> counts.
> >>>
> >>> This patch implements the shared subtree walk and a new qgroup
> >>> operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
> >>> this type is encountered during qgroup accounting, we search for
> >>> any root references to that extent and in the case that we find
> >>> only one reference left, we go ahead and do the math on it's
> >>> exclusive counts.
> >>>
> >>> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
> >>> Reviewed-by: Josef Bacik <jbacik@fb.com>
> >>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >>> index 813537f..1aa4325 100644
> >>> --- a/fs/btrfs/extent-tree.c
> >>> +++ b/fs/btrfs/extent-tree.c
> >>> @@ -8078,6 +8331,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> >>> }
> >>> root_dropped = true;
> >>> out_end_trans:
> >>> + ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
> >> ^^^^^^^^^^^
> >>
> >> CONFIG_DEBUG_PAGEALLOC noticed that root is already free at this point.
> >> I switched it to tree_root instead ;)
> >
> > Oh nice catch, thanks for pointing it out.
> >
> > Time to go update my suse patches.
>
> Grin, pretty sure it doesn't count as a catch if CONFIG_DEBUG_PAGEALLOC
> finds it.
Fair enough, on my end I get to add CONFIG_DEBUG_PAGEALLOC to my testing :)
> But it did make it through the balance test we were crashing
> in before. It's probably faster to hand edit the incremental in for the
> suse patch, but here you go just in case:
Yeah we haven't hit anything internally and like I said we've been running
with it for a while. It's probably hard to hit as I can promise that code
has been executed more than a couple times by now.
The delete-items path is definitely broken so maybe you hit something from
that?
Btw, I should be sending a two-liner fix that can be extracted from the
delete-items patch soon.
--Mark
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 763632d..ef0845d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8314,7 +8314,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> }
> root_dropped = true;
> out_end_trans:
> - ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
> + ret = btrfs_delayed_qgroup_accounting(trans, tree_root->fs_info);
> if (ret)
> printk_ratelimited(KERN_ERR "BTRFS: Failure %d "
> "running qgroup updates "
--
Mark Fasheh
* Re: [PATCH 2/5] btrfs: qgroup: account shared subtrees during snapshot delete
2014-08-12 19:01 ` Mark Fasheh
@ 2014-08-12 19:08 ` Chris Mason
0 siblings, 0 replies; 11+ messages in thread
From: Chris Mason @ 2014-08-12 19:08 UTC (permalink / raw)
To: Mark Fasheh; +Cc: linux-btrfs, Josef Bacik
On 08/12/2014 03:01 PM, Mark Fasheh wrote:
> On Tue, Aug 12, 2014 at 02:36:17PM -0400, Chris Mason wrote:
>>
>>
>> On 08/12/2014 02:32 PM, Mark Fasheh wrote:
>>> On Tue, Aug 12, 2014 at 02:22:31PM -0400, Chris Mason wrote:
>>>>
>>>>
>>>> On 07/17/2014 03:39 PM, Mark Fasheh wrote:
>>>>> During its tree walk, btrfs_drop_snapshot() will skip any shared
>>>>> subtrees it encounters. This is incorrect when we have qgroups
>>>>> turned on as those subtrees need to have their contents
>>>>> accounted. In particular, the case we're concerned with is when
>>>>> removing our snapshot root leaves the subtree with only one root
>>>>> reference.
>>>>>
>>>>> In those cases we need to find the last remaining root and add
>>>>> each extent in the subtree to the corresponding qgroup exclusive
>>>>> counts.
>>>>>
>>>>> This patch implements the shared subtree walk and a new qgroup
>>>>> operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
>>>>> this type is encountered during qgroup accounting, we search for
>>>>> any root references to that extent and in the case that we find
>>>>> only one reference left, we go ahead and do the math on it's
>>>>> exclusive counts.
>>>>>
>>>>> Signed-off-by: Mark Fasheh <mfasheh@suse.de>
>>>>> Reviewed-by: Josef Bacik <jbacik@fb.com>
>>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>>> index 813537f..1aa4325 100644
>>>>> --- a/fs/btrfs/extent-tree.c
>>>>> +++ b/fs/btrfs/extent-tree.c
>>>>> @@ -8078,6 +8331,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
>>>>> }
>>>>> root_dropped = true;
>>>>> out_end_trans:
>>>>> + ret = btrfs_delayed_qgroup_accounting(trans, root->fs_info);
>>>> ^^^^^^^^^^^
>>>>
>>>> CONFIG_DEBUG_PAGEALLOC noticed that root is already free at this point.
>>>> I switched it to tree_root instead ;)
>>>
>>> Oh nice catch, thanks for pointing it out.
>>>
>>> Time to go update my suse patches.
>>
>> Grin, pretty sure it doesn't count as a catch if CONFIG_DEBUG_PAGEALLOC
>> finds it.
>
> Fair enough, on my end I get to add CONFIG_DEBUG_PAGEALLOC to my testing :)
Best debugging feature ever!
>
>> But it did make it through the balance test we were crashing
>> in before. It's probably faster to hand edit the incremental in for the
>> suse patch, but here you go just in case:
>
> Yeah we haven't hit anything internally and like I said we've been running
> with it for a while. It's probably hard to hit as I can promise that code
> has been executed more than a couple times by now.
>
> The delete-items path is definitely broken so maybe you hit something from
> that?
>
I had that one removed, so it was just the s/root/tree_root/ I needed.
> Btw, I should be sending a two-liner fix that can be extracted from the
> delete-items patch soon.
Great, thanks.
-chris