linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4 0/12] random bug fixes
@ 2012-09-06 10:00 Miao Xie
  2012-09-06 10:00 ` [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot() Miao Xie
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:00 UTC (permalink / raw)
  To: Linux Btrfs

This patchset contains 12 bug-fix patches.
- 01-03, 05-09 fix some bugs of the snapshot creation.
- 04 just improves the memory allocation of the ordered data extent object.
- 10 fixes a bug of the tree log that we forgot to protect ->log_batch.
- 11 improve error path handle, makes it show more information to the users
- 12 fixes the problem that fallocate() reserved less space than it need.

And in this patchset, 06, 09-12 are new patches. The others are old ones,
but I updated them according to the comment of David. Thanks, David.

In order to avoid the confusion of the version number, I marked all the patches
as V4 (The version number of the old version is confused, the version number
of two patches is v3, the others is v1).

Beside that, we can pull this patchset from the URL

	git://github.com/miaoxie/linux-btrfs.git for-merge

Thanks
Miao
---
Miao Xie (12):
      Btrfs: fix error path in create_pending_snapshot()
      Btrfs: fix full backref problem when inserting shared block reference
      Btrfs: fix file extent discount problem in the snapshot
      Btrfs: use a slab for ordered extents allocation
      Btrfs: fix wrong orphan count of the fs/file tree
      Btrfs: add a new "type" field into the block reservation structure
      Btrfs: fix corrupted metadata in the snapshot
      Btrfs: fix the snapshot that should not exist
      Btrfs: fix wrong size for the reservation of the snapshot creation
      Btrfs: fix unprotected ->log_batch
      Btrfs: output more information when aborting a unused transaction handle
      Btrfs: fix wrong size for the reservation when doing file pre-allocation.

 fs/btrfs/ctree.h         |   18 ++++++--
 fs/btrfs/delayed-inode.c |    5 +-
 fs/btrfs/disk-io.c       |   17 ++++---
 fs/btrfs/extent-tree.c   |   14 +++---
 fs/btrfs/file.c          |    8 ++--
 fs/btrfs/inode.c         |   33 ++++++++------
 fs/btrfs/ioctl.c         |    5 +-
 fs/btrfs/ordered-data.c  |   85 ++++++++++++++++-------------------
 fs/btrfs/ordered-data.h  |    9 ++++
 fs/btrfs/relocation.c    |    3 +-
 fs/btrfs/super.c         |   17 ++++++-
 fs/btrfs/transaction.c   |  108 ++++++++++++++++++++++++++++++---------------
 fs/btrfs/tree-log.c      |   12 ++---
 13 files changed, 203 insertions(+), 131 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot()
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
@ 2012-09-06 10:00 ` Miao Xie
  2012-09-17 16:56   ` David Sterba
  2012-09-06 10:00 ` [PATCH V4 02/12] Btrfs: fix full backref problem when inserting shared block reference Miao Xie
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:00 UTC (permalink / raw)
  To: Linux Btrfs

This patch fixes the following problem:
- If we failed to deal with the delayed dir items, we should abort transaction,
  just as its comment said. Fix it.
- If root reference or root back reference insertion failed, we should
  abort transaction. Fix it.
- Fix the double free problem of pending->inherit.
- Do not restore the trans->rsv if we doesn't change it.
- make the error path more clearly.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v3 -> v4:
- No change.

Changelog v2 -> v3:
- rebase on the latest for-linus branch
- fix double free problem of pending->inherit

Changelog v1 -> v2:
- fix double dput() when aborting transaction. In the previous version of the
  patches, this problem was fixed in the second patch, it is not good because
  this problem is the bug of the patch in fact.
---
 fs/btrfs/transaction.c |   40 +++++++++++++++++-----------------------
 1 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3ee8d58..b259d22f2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -962,18 +962,16 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	u64 root_flags;
 	uuid_le new_uuid;
 
-	rsv = trans->block_rsv;
-
 	new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS);
 	if (!new_root_item) {
 		ret = pending->error = -ENOMEM;
-		goto fail;
+		goto root_item_alloc_fail;
 	}
 
 	ret = btrfs_find_free_objectid(tree_root, &objectid);
 	if (ret) {
 		pending->error = ret;
-		goto fail;
+		goto no_free_objectid;
 	}
 
 	btrfs_reloc_pre_snapshot(trans, pending, &to_reserve);
@@ -983,22 +981,22 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 						  to_reserve);
 		if (ret) {
 			pending->error = ret;
-			goto fail;
+			goto no_free_objectid;
 		}
 	}
 
 	ret = btrfs_qgroup_inherit(trans, fs_info, root->root_key.objectid,
 				   objectid, pending->inherit);
-	kfree(pending->inherit);
 	if (ret) {
 		pending->error = ret;
-		goto fail;
+		goto no_free_objectid;
 	}
 
 	key.objectid = objectid;
 	key.offset = (u64)-1;
 	key.type = BTRFS_ROOT_ITEM_KEY;
 
+	rsv = trans->block_rsv;
 	trans->block_rsv = &pending->block_rsv;
 
 	dentry = pending->dentry;
@@ -1018,10 +1016,9 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 				BTRFS_FT_DIR, index);
 	if (ret == -EEXIST) {
 		pending->error = -EEXIST;
-		dput(parent);
 		goto fail;
 	} else if (ret) {
-		goto abort_trans_dput;
+		goto abort_trans;
 	}
 
 	btrfs_i_size_write(parent_inode, parent_inode->i_size +
@@ -1029,7 +1026,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
 	ret = btrfs_update_inode(trans, parent_root, parent_inode);
 	if (ret)
-		goto abort_trans_dput;
+		goto abort_trans;
 
 	/*
 	 * pull in the delayed directory update
@@ -1038,10 +1035,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	 * snapshot
 	 */
 	ret = btrfs_run_delayed_items(trans, root);
-	if (ret) { /* Transaction aborted */
-		dput(parent);
-		goto fail;
-	}
+	if (ret)	/* Transaction aborted */
+		goto abort_trans;
 
 	record_root_in_trans(trans, root);
 	btrfs_set_root_last_snapshot(&root->root_item, trans->transid);
@@ -1074,7 +1069,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	if (ret) {
 		btrfs_tree_unlock(old);
 		free_extent_buffer(old);
-		goto abort_trans_dput;
+		goto abort_trans;
 	}
 
 	btrfs_set_lock_blocking(old);
@@ -1084,7 +1079,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	btrfs_tree_unlock(old);
 	free_extent_buffer(old);
 	if (ret)
-		goto abort_trans_dput;
+		goto abort_trans;
 
 	/* see comments in should_cow_block() */
 	root->force_cow = 1;
@@ -1097,7 +1092,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	btrfs_tree_unlock(tmp);
 	free_extent_buffer(tmp);
 	if (ret)
-		goto abort_trans_dput;
+		goto abort_trans;
 
 	/*
 	 * insert root back/forward references
@@ -1106,9 +1101,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 				 parent_root->root_key.objectid,
 				 btrfs_ino(parent_inode), index,
 				 dentry->d_name.name, dentry->d_name.len);
-	dput(parent);
 	if (ret)
-		goto fail;
+		goto abort_trans;
 
 	key.offset = (u64)-1;
 	pending->snap = btrfs_read_fs_root_no_name(root->fs_info, &key);
@@ -1120,15 +1114,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_reloc_post_snapshot(trans, pending);
 	if (ret)
 		goto abort_trans;
-	ret = 0;
 fail:
-	kfree(new_root_item);
+	dput(parent);
 	trans->block_rsv = rsv;
+no_free_objectid:
+	kfree(new_root_item);
+root_item_alloc_fail:
 	btrfs_block_rsv_release(root, &pending->block_rsv, (u64)-1);
 	return ret;
 
-abort_trans_dput:
-	dput(parent);
 abort_trans:
 	btrfs_abort_transaction(trans, root, ret);
 	goto fail;
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 02/12] Btrfs: fix full backref problem when inserting shared block reference
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
  2012-09-06 10:00 ` [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot() Miao Xie
@ 2012-09-06 10:00 ` Miao Xie
  2012-09-06 10:01 ` [PATCH V4 03/12] Btrfs: fix file extent discount problem in the, snapshot Miao Xie
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:00 UTC (permalink / raw)
  To: Linux Btrfs

If we create several snapshots at the same time, the following BUG_ON() will be
triggered.

	kernel BUG at fs/btrfs/extent-tree.c:6047!

Steps to reproduce:
 # mkfs.btrfs <partition>
 # mount <partition> <mnt>
 # cd <mnt>
 # for ((i=0;i<2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
 # for ((i=0; i<4; i++))
 > do
 > mkdir $i
 > for ((j=0; j<200; j++))
 > do
 > btrfs sub snap . $i/$j
 > done &
 > done

The reason is:
Before transaction commit, some operations changed the fs tree and new tree
blocks were allocated because of COW. We used the implicit non-shared back
reference for those newly allocated tree blocks because they were not shared by
two or more trees.

And then we created the first snapshot for the fs tree, according to the back
reference rules, we also used implicit back refs for the child tree blocks of
the root node of the fs tree, now those child nodes/leaves were shared by two
trees.

Then We didn't deal with the delayed references, and continued to change the fs
tree(created the second snapshot and inserted the dir item of the new snapshot
into the fs tree). According to the rules of the back reference, we added full
back refs for those tree blocks whose parents have be shared by two trees.
Now some newly allocated tree blocks had two types of the references.

As we know, the delayed reference system handles these delayed references from
back to front, and the full delayed reference is inserted after the implicit
ones. So when we dealt with the back references of those newly allocated tree
blocks, the full references was dealt with at first. And if the first reference
is a shared back reference and the tree block that the reference points to is
newly allocated, It would be considered as a tree block which is shared by two
or more trees when it is allocated and should be a full back reference not a
implicit one, the flag of its reference also should be set to FULL_BACKREF.
But in fact, it was a non-shared tree block with a implicit reference at
beginning, so it was not compulsory to set the flags to FULL_BACKREF. So BUG_ON
was triggered.

We have several methods to fix this bug:
1. deal with delayed references after the snapshot is created and before we
   change the source tree of the snapshot. This is the easiest and safest way.
2. modify the sort method of the delayed reference tree, make the full delayed
   references be inserted before the implicit ones. It is also very easy, but
   I don't know if it will introduce some problems or not.
3. modify select_delayed_ref() and make it select the implicit delayed reference
   at first. This way is not so good because it may wastes CPU time if we have
   lots of delayed references.
4. set the flags to FULL_BACKREF, this method is a little complex comparing with
   the 1st way.

I chose the 1st way to fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- No change.
---
 fs/btrfs/transaction.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b259d22f2..8bd2511 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1114,6 +1114,10 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_reloc_post_snapshot(trans, pending);
 	if (ret)
 		goto abort_trans;
+
+	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
+	if (ret)
+		goto abort_trans;
 fail:
 	dput(parent);
 	trans->block_rsv = rsv;
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 03/12] Btrfs: fix file extent discount problem in the, snapshot
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
  2012-09-06 10:00 ` [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot() Miao Xie
  2012-09-06 10:00 ` [PATCH V4 02/12] Btrfs: fix full backref problem when inserting shared block reference Miao Xie
@ 2012-09-06 10:01 ` Miao Xie
  2012-09-06 10:01 ` [PATCH V4 04/12] Btrfs: use a slab for ordered extents allocation Miao Xie
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:01 UTC (permalink / raw)
  To: Linux Btrfs

If a snapshot is created while we are writing some data into the file,
the i_size of the corresponding file in the snapshot will be wrong, it will
be beyond the end of the last file extent. And btrfsck will report:
  root 256 inode 257 errors 100

Steps to reproduce:
 # mkfs.btrfs <partition>
 # mount <partition> <mnt>
 # cd <mnt>
 # dd if=/dev/zero of=tmpfile bs=4M count=1024 &
 # for ((i=0; i<4; i++))
 > do
 > btrfs sub snap . $i
 > done

This because the algorithm of disk_i_size update is wrong. Though there are
some ordered extents behind the current one which we use to update disk_i_size,
it doesn't mean those extents will be dealt with in the same transaction. So
We shouldn't use the offset of those extents to update disk_i_size. Or we will
get the wrong i_size in the snapshot.

We fix this problem by recording the max real i_size. If we find there is a
ordered extent which is in front of the current one and doesn't complete, we
will record the end of the current one into that ordered extent. Surely, if
the current extent holds the end of other extent(it must be greater than
the current one because it is behind the current one), we will record the
number that the current extent holds. In this way, we can exclude the ordered
extents that may not be dealth with in the same transaction, and be easy to
know the real disk_i_size.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- No change.
---
 fs/btrfs/ordered-data.c |   62 +++++++++++++---------------------------------
 fs/btrfs/ordered-data.h |    7 +++++
 2 files changed, 25 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 643335a..2eb79cc 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -775,7 +775,6 @@ int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 	struct btrfs_ordered_inode_tree *tree = &BTRFS_I(inode)->ordered_tree;
 	u64 disk_i_size;
 	u64 new_i_size;
-	u64 i_size_test;
 	u64 i_size = i_size_read(inode);
 	struct rb_node *node;
 	struct rb_node *prev = NULL;
@@ -835,55 +834,30 @@ int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 			break;
 		if (test->file_offset >= i_size)
 			break;
-		if (test->file_offset >= disk_i_size)
+		if (test->file_offset >= disk_i_size) {
+			/*
+			 * we don't update disk_i_size now, so record this
+			 * undealt i_size. Or we will not know the real
+			 * i_size.
+			 */
+			if (test->outstanding_isize < offset)
+				test->outstanding_isize = offset;
+			if (ordered &&
+			    ordered->outstanding_isize >
+			    test->outstanding_isize)
+				test->outstanding_isize =
+						ordered->outstanding_isize;
 			goto out;
-	}
-	new_i_size = min_t(u64, offset, i_size);
-
-	/*
-	 * at this point, we know we can safely update i_size to at least
-	 * the offset from this ordered extent.  But, we need to
-	 * walk forward and see if ios from higher up in the file have
-	 * finished.
-	 */
-	if (ordered) {
-		node = rb_next(&ordered->rb_node);
-	} else {
-		if (prev)
-			node = rb_next(prev);
-		else
-			node = rb_first(&tree->tree);
-	}
-
-	/*
-	 * We are looking for an area between our current extent and the next
-	 * ordered extent to update the i_size to.  There are 3 cases here
-	 *
-	 * 1) We don't actually have anything and we can update to i_size.
-	 * 2) We have stuff but they already did their i_size update so again we
-	 * can just update to i_size.
-	 * 3) We have an outstanding ordered extent so the most we can update
-	 * our disk_i_size to is the start of the next offset.
-	 */
-	i_size_test = i_size;
-	for (; node; node = rb_next(node)) {
-		test = rb_entry(node, struct btrfs_ordered_extent, rb_node);
-
-		if (test_bit(BTRFS_ORDERED_UPDATED_ISIZE, &test->flags))
-			continue;
-		if (test->file_offset > offset) {
-			i_size_test = test->file_offset;
-			break;
 		}
 	}
+	new_i_size = min_t(u64, offset, i_size);
 
 	/*
-	 * i_size_test is the end of a region after this ordered
-	 * extent where there are no ordered extents, we can safely set
-	 * disk_i_size to this.
+	 * Some ordered extents may completed before the current one, and
+	 * we hold the real i_size in ->outstanding_isize.
 	 */
-	if (i_size_test > offset)
-		new_i_size = min_t(u64, i_size_test, i_size);
+	if (ordered && ordered->outstanding_isize > new_i_size)
+		new_i_size = min_t(u64, ordered->outstanding_isize, i_size);
 	BTRFS_I(inode)->disk_i_size = new_i_size;
 	ret = 0;
 out:
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index e03c560..c2443a4 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -96,6 +96,13 @@ struct btrfs_ordered_extent {
 	/* number of bytes that still need writing */
 	u64 bytes_left;
 
+	/*
+	 * the end of the ordered extent which is behind it but
+	 * didn't update disk_i_size. Please see the comment of
+	 * btrfs_ordered_update_i_size();
+	 */
+	u64 outstanding_isize;
+
 	/* flags (described above) */
 	unsigned long flags;
 
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 04/12] Btrfs: use a slab for ordered extents allocation
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (2 preceding siblings ...)
  2012-09-06 10:01 ` [PATCH V4 03/12] Btrfs: fix file extent discount problem in the, snapshot Miao Xie
@ 2012-09-06 10:01 ` Miao Xie
  2012-09-06 10:02 ` [PATCH V4 05/12] Btrfs: fix wrong orphan count of the fs/file tree Miao Xie
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:01 UTC (permalink / raw)
  To: Linux Btrfs

The ordered extent allocation is in the fast path of the IO, so use a slab
to improve the speed of the allocation.

 "Size of the struct is 280, so this will fall into the size-512 bucket,
  giving 8 objects per page, while own slab will pack 14 objects into a page.

  Another benefit I see is to check for leaked objects when the module is
  removed (and the cache destroy takes place)."
						-- David Sterba

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- just change the changelog of this patch.
---
 fs/btrfs/ordered-data.c |   23 +++++++++++++++++++++--
 fs/btrfs/ordered-data.h |    2 ++
 fs/btrfs/super.c        |    9 ++++++++-
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 2eb79cc..4ae1014 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -25,6 +25,8 @@
 #include "btrfs_inode.h"
 #include "extent_io.h"
 
+static struct kmem_cache *btrfs_ordered_extent_cache;
+
 static u64 entry_end(struct btrfs_ordered_extent *entry)
 {
 	if (entry->file_offset + entry->len < entry->file_offset)
@@ -187,7 +189,7 @@ static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 	struct btrfs_ordered_extent *entry;
 
 	tree = &BTRFS_I(inode)->ordered_tree;
-	entry = kzalloc(sizeof(*entry), GFP_NOFS);
+	entry = kmem_cache_zalloc(btrfs_ordered_extent_cache, GFP_NOFS);
 	if (!entry)
 		return -ENOMEM;
 
@@ -421,7 +423,7 @@ void btrfs_put_ordered_extent(struct btrfs_ordered_extent *entry)
 			list_del(&sum->list);
 			kfree(sum);
 		}
-		kfree(entry);
+		kmem_cache_free(btrfs_ordered_extent_cache, entry);
 	}
 }
 
@@ -958,3 +960,20 @@ void btrfs_add_ordered_operation(struct btrfs_trans_handle *trans,
 	}
 	spin_unlock(&root->fs_info->ordered_extent_lock);
 }
+
+int __init ordered_data_init(void)
+{
+	btrfs_ordered_extent_cache = kmem_cache_create("btrfs_ordered_extent",
+				     sizeof(struct btrfs_ordered_extent), 0,
+				     SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+				     NULL);
+	if (!btrfs_ordered_extent_cache)
+		return -ENOMEM;
+	return 0;
+}
+
+void ordered_data_exit(void)
+{
+	if (btrfs_ordered_extent_cache)
+		kmem_cache_destroy(btrfs_ordered_extent_cache);
+}
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index c2443a4..d1ddaef 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -192,4 +192,6 @@ void btrfs_add_ordered_operation(struct btrfs_trans_handle *trans,
 				 struct inode *inode);
 void btrfs_wait_ordered_extents(struct btrfs_root *root,
 				int nocow_only, int delay_iput);
+int __init ordered_data_init(void);
+void ordered_data_exit(void);
 #endif
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 073c236..48e53d6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1595,10 +1595,14 @@ static int __init init_btrfs_fs(void)
 	if (err)
 		goto free_extent_io;
 
-	err = btrfs_delayed_inode_init();
+	err = ordered_data_init();
 	if (err)
 		goto free_extent_map;
 
+	err = btrfs_delayed_inode_init();
+	if (err)
+		goto free_ordered_data;
+
 	err = btrfs_interface_init();
 	if (err)
 		goto free_delayed_inode;
@@ -1616,6 +1620,8 @@ unregister_ioctl:
 	btrfs_interface_exit();
 free_delayed_inode:
 	btrfs_delayed_inode_exit();
+free_ordered_data:
+	ordered_data_exit();
 free_extent_map:
 	extent_map_exit();
 free_extent_io:
@@ -1632,6 +1638,7 @@ static void __exit exit_btrfs_fs(void)
 {
 	btrfs_destroy_cachep();
 	btrfs_delayed_inode_exit();
+	ordered_data_exit();
 	extent_map_exit();
 	extent_io_exit();
 	btrfs_interface_exit();
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 05/12] Btrfs: fix wrong orphan count of the fs/file tree
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (3 preceding siblings ...)
  2012-09-06 10:01 ` [PATCH V4 04/12] Btrfs: use a slab for ordered extents allocation Miao Xie
@ 2012-09-06 10:02 ` Miao Xie
  2012-09-06 10:02 ` [PATCH V4 06/12] Btrfs: add a new "type" field into the block reservation structure Miao Xie
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:02 UTC (permalink / raw)
  To: Linux Btrfs

If we add a new orphan item, we should increase the atomic counter,
not decrease it. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- No change.
---
 fs/btrfs/inode.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6ba80b9..e954737 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2224,7 +2224,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 			insert = 1;
 #endif
 		insert = 1;
-		atomic_dec(&root->orphan_inodes);
+		atomic_inc(&root->orphan_inodes);
 	}
 
 	if (!test_and_set_bit(BTRFS_INODE_ORPHAN_META_RESERVED,
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 06/12] Btrfs: add a new "type" field into the block reservation structure
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (4 preceding siblings ...)
  2012-09-06 10:02 ` [PATCH V4 05/12] Btrfs: fix wrong orphan count of the fs/file tree Miao Xie
@ 2012-09-06 10:02 ` Miao Xie
  2012-09-06 10:03 ` [PATCH V4 07/12] Btrfs: fix corrupted metadata in the snapshot Miao Xie
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:02 UTC (permalink / raw)
  To: Linux Btrfs

Sometimes we need choose the method of the reservation according to the type
of the block reservation, such as the reservation for the delayed inode update.
Now we identify the type just by comparing the address of the reservation
variants, it is very ugly if it is a temporary one because we need compare it
with all the common reservation variants. So we add a new "type" field to keep
the type the reservation variants.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- new patch
---
 fs/btrfs/ctree.h         |   16 +++++++++++++---
 fs/btrfs/delayed-inode.c |    4 ++--
 fs/btrfs/disk-io.c       |   15 +++++++++------
 fs/btrfs/extent-tree.c   |    8 +++++---
 fs/btrfs/inode.c         |    8 ++++----
 fs/btrfs/ioctl.c         |    3 ++-
 fs/btrfs/relocation.c    |    3 ++-
 7 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c38734a..171458a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1028,12 +1028,21 @@ struct btrfs_space_info {
 	wait_queue_head_t wait;
 };
 
+#define	BTRFS_BLOCK_RSV_GLOBAL		1
+#define BTRFS_BLOCK_RSV_DELALLOC	2
+#define BTRFS_BLOCK_RSV_TRANS		3
+#define BTRFS_BLOCK_RSV_CHUNK		4
+#define BTRFS_BLOCK_RSV_DELOPS		5
+#define BTRFS_BLOCK_RSV_EMPTY		6
+#define BTRFS_BLOCK_RSV_TEMP		7
+
 struct btrfs_block_rsv {
 	u64 size;
 	u64 reserved;
 	struct btrfs_space_info *space_info;
 	spinlock_t lock;
-	unsigned int full;
+	unsigned short full;
+	unsigned short type;
 };
 
 /*
@@ -2874,8 +2883,9 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
 int btrfs_delalloc_reserve_space(struct inode *inode, u64 num_bytes);
 void btrfs_delalloc_release_space(struct inode *inode, u64 num_bytes);
-void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv);
-struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root);
+void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
+struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
+					      unsigned short type);
 void btrfs_free_block_rsv(struct btrfs_root *root,
 			  struct btrfs_block_rsv *rsv);
 int btrfs_block_rsv_add(struct btrfs_root *root,
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 07d5eeb..eb768c4 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -650,7 +650,7 @@ static int btrfs_delayed_inode_reserve_metadata(
 	 * we're accounted for.
 	 */
 	if (!src_rsv || (!trans->bytes_reserved &&
-	    src_rsv != &root->fs_info->delalloc_block_rsv)) {
+			 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
 		ret = btrfs_block_rsv_add_noflush(root, dst_rsv, num_bytes);
 		/*
 		 * Since we're under a transaction reserve_metadata_bytes could
@@ -668,7 +668,7 @@ static int btrfs_delayed_inode_reserve_metadata(
 						      num_bytes, 1);
 		}
 		return ret;
-	} else if (src_rsv == &root->fs_info->delalloc_block_rsv) {
+	} else if (src_rsv->type == BTRFS_BLOCK_RSV_DELALLOC) {
 		spin_lock(&BTRFS_I(inode)->lock);
 		if (test_and_clear_bit(BTRFS_INODE_DELALLOC_META_RESERVED,
 				       &BTRFS_I(inode)->runtime_flags)) {
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 29c69e6..0c0f8eb 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2017,12 +2017,15 @@ int open_ctree(struct super_block *sb,
 	INIT_LIST_HEAD(&fs_info->space_info);
 	INIT_LIST_HEAD(&fs_info->tree_mod_seq_list);
 	btrfs_mapping_init(&fs_info->mapping_tree);
-	btrfs_init_block_rsv(&fs_info->global_block_rsv);
-	btrfs_init_block_rsv(&fs_info->delalloc_block_rsv);
-	btrfs_init_block_rsv(&fs_info->trans_block_rsv);
-	btrfs_init_block_rsv(&fs_info->chunk_block_rsv);
-	btrfs_init_block_rsv(&fs_info->empty_block_rsv);
-	btrfs_init_block_rsv(&fs_info->delayed_block_rsv);
+	btrfs_init_block_rsv(&fs_info->global_block_rsv,
+			     BTRFS_BLOCK_RSV_GLOBAL);
+	btrfs_init_block_rsv(&fs_info->delalloc_block_rsv,
+			     BTRFS_BLOCK_RSV_DELALLOC);
+	btrfs_init_block_rsv(&fs_info->trans_block_rsv, BTRFS_BLOCK_RSV_TRANS);
+	btrfs_init_block_rsv(&fs_info->chunk_block_rsv, BTRFS_BLOCK_RSV_CHUNK);
+	btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
+	btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
+			     BTRFS_BLOCK_RSV_DELOPS);
 	atomic_set(&fs_info->nr_async_submits, 0);
 	atomic_set(&fs_info->async_delalloc_pages, 0);
 	atomic_set(&fs_info->async_submit_draining, 0);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ba58024..840bfee 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4114,13 +4114,15 @@ static int block_rsv_migrate_bytes(struct btrfs_block_rsv *src,
 	return 0;
 }
 
-void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv)
+void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type)
 {
 	memset(rsv, 0, sizeof(*rsv));
 	spin_lock_init(&rsv->lock);
+	rsv->type = type;
 }
 
-struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root)
+struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
+					      unsigned short type)
 {
 	struct btrfs_block_rsv *block_rsv;
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -4129,7 +4131,7 @@ struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root)
 	if (!block_rsv)
 		return NULL;
 
-	btrfs_init_block_rsv(block_rsv);
+	btrfs_init_block_rsv(block_rsv, type);
 	block_rsv->space_info = __find_space_info(fs_info,
 						  BTRFS_BLOCK_GROUP_METADATA);
 	return block_rsv;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e954737..d494c11 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2197,7 +2197,7 @@ int btrfs_orphan_add(struct btrfs_trans_handle *trans, struct inode *inode)
 	int ret;
 
 	if (!root->orphan_block_rsv) {
-		block_rsv = btrfs_alloc_block_rsv(root);
+		block_rsv = btrfs_alloc_block_rsv(root, BTRFS_BLOCK_RSV_TEMP);
 		if (!block_rsv)
 			return -ENOMEM;
 	}
@@ -3060,7 +3060,7 @@ out:
 static void __unlink_end_trans(struct btrfs_trans_handle *trans,
 			       struct btrfs_root *root)
 {
-	if (trans->block_rsv == &root->fs_info->global_block_rsv) {
+	if (trans->block_rsv->type == BTRFS_BLOCK_RSV_GLOBAL) {
 		btrfs_block_rsv_release(root, trans->block_rsv,
 					trans->bytes_reserved);
 		trans->block_rsv = &root->fs_info->trans_block_rsv;
@@ -3767,7 +3767,7 @@ void btrfs_evict_inode(struct inode *inode)
 		goto no_delete;
 	}
 
-	rsv = btrfs_alloc_block_rsv(root);
+	rsv = btrfs_alloc_block_rsv(root, BTRFS_BLOCK_RSV_TEMP);
 	if (!rsv) {
 		btrfs_orphan_del(NULL, inode);
 		goto no_delete;
@@ -6783,7 +6783,7 @@ static int btrfs_truncate(struct inode *inode)
 	 * 3) fs_info->trans_block_rsv - this will have 1 items worth left for
 	 * updating the inode.
 	 */
-	rsv = btrfs_alloc_block_rsv(root);
+	rsv = btrfs_alloc_block_rsv(root, BTRFS_BLOCK_RSV_TEMP);
 	if (!rsv)
 		return -ENOMEM;
 	rsv->size = min_size;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a1fbca0..3a16327 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -516,7 +516,8 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
 	if (!pending_snapshot)
 		return -ENOMEM;
 
-	btrfs_init_block_rsv(&pending_snapshot->block_rsv);
+	btrfs_init_block_rsv(&pending_snapshot->block_rsv,
+			     BTRFS_BLOCK_RSV_TEMP);
 	pending_snapshot->dentry = dentry;
 	pending_snapshot->root = root;
 	pending_snapshot->readonly = readonly;
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index c5dbd91..f193096 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3674,7 +3674,8 @@ int prepare_to_relocate(struct reloc_control *rc)
 	struct btrfs_trans_handle *trans;
 	int ret;
 
-	rc->block_rsv = btrfs_alloc_block_rsv(rc->extent_root);
+	rc->block_rsv = btrfs_alloc_block_rsv(rc->extent_root,
+					      BTRFS_BLOCK_RSV_TEMP);
 	if (!rc->block_rsv)
 		return -ENOMEM;
 
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 07/12] Btrfs: fix corrupted metadata in the snapshot
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (5 preceding siblings ...)
  2012-09-06 10:02 ` [PATCH V4 06/12] Btrfs: add a new "type" field into the block reservation structure Miao Xie
@ 2012-09-06 10:03 ` Miao Xie
  2012-09-06 13:09   ` Josef Bacik
  2012-09-06 10:03 ` [PATCH V4 08/12] Btrfs: fix the snapshot that should not exist Miao Xie
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:03 UTC (permalink / raw)
  To: Linux Btrfs

When we delete a inode, we will remove all the delayed items including delayed
inode update, and then truncate all the relative metadata. If there is lots of
metadata, we will end the current transaction, and start a new transaction to
truncate the left metadata. In this way, we will leave a inode item that its
link counter is > 0, and also may leave some directory index items in fs/file tree
after the current transaction ends. In other words, the metadata in this fs/file tree
is inconsistent. If we create a snapshot for this tree now, we will find a inode with
corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
because its link counter is not 0.

We fix this problem by updating the inode item before the current transaction ends.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- Update the comment of the truncation in the btrfs_evict_inode()
- Fix enospc problem of the inode update
---
 fs/btrfs/delayed-inode.c |    3 ++-
 fs/btrfs/inode.c         |   23 ++++++++++++++---------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index eb768c4..8f2d1bf 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -650,7 +650,8 @@ static int btrfs_delayed_inode_reserve_metadata(
 	 * we're accounted for.
 	 */
 	if (!src_rsv || (!trans->bytes_reserved &&
-			 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC)) {
+			 src_rsv->type != BTRFS_BLOCK_RSV_DELALLOC &&
+			 src_rsv->type != BTRFS_BLOCK_RSV_TEMP)) {
 		ret = btrfs_block_rsv_add_noflush(root, dst_rsv, num_bytes);
 		/*
 		 * Since we're under a transaction reserve_metadata_bytes could
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d494c11..709f5b9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3738,7 +3738,7 @@ void btrfs_evict_inode(struct inode *inode)
 	struct btrfs_trans_handle *trans;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_block_rsv *rsv, *global_rsv;
-	u64 min_size = btrfs_calc_trunc_metadata_size(root, 1);
+	u64 min_size;
 	unsigned long nr;
 	int ret;
 
@@ -3772,21 +3772,23 @@ void btrfs_evict_inode(struct inode *inode)
 		btrfs_orphan_del(NULL, inode);
 		goto no_delete;
 	}
+
+	min_size = btrfs_calc_trunc_metadata_size(root, 1);
+	min_size += btrfs_calc_trans_metadata_size(root, 1);
 	rsv->size = min_size;
 	global_rsv = &root->fs_info->global_block_rsv;
 
 	btrfs_i_size_write(inode, 0);
 
 	/*
-	 * This is a bit simpler than btrfs_truncate since
-	 *
-	 * 1) We've already reserved our space for our orphan item in the
-	 *    unlink.
-	 * 2) We're going to delete the inode item, so we don't need to update
-	 *    it at all.
+	 * This is a bit simpler than btrfs_truncate since we've already
+	 * reserved our space for our orphan item in the unlink, so we just
+	 * need to reserve some slack space in case we add bytes and update
+	 * inode item when doing the truncate.
 	 *
-	 * So we just need to reserve some slack space in case we add bytes when
-	 * doing the truncate.
+	 * The differentiation is we can not reserve the space for the inode
+	 * update when starting the transaction because it may cause
+	 * the deadlock.
 	 */
 	while (1) {
 		ret = btrfs_block_rsv_refill_noflush(root, rsv, min_size);
@@ -3820,6 +3822,9 @@ void btrfs_evict_inode(struct inode *inode)
 		if (ret != -EAGAIN)
 			break;
 
+		ret = btrfs_update_inode(trans, root, inode);
+		BUG_ON(ret);
+
 		nr = trans->blocks_used;
 		btrfs_end_transaction(trans, root);
 		trans = NULL;
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 08/12] Btrfs: fix the snapshot that should not exist
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (6 preceding siblings ...)
  2012-09-06 10:03 ` [PATCH V4 07/12] Btrfs: fix corrupted metadata in the snapshot Miao Xie
@ 2012-09-06 10:03 ` Miao Xie
  2012-09-06 10:03 ` [PATCH V4 09/12] Btrfs: fix wrong size for the reservation of the, snapshot creation Miao Xie
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:03 UTC (permalink / raw)
  To: Linux Btrfs

The snapshot should be the image of the fs tree before it was created,
so the metadata of the snapshot should not exist in the its tree. But now, we
found the directory item and directory name index is in both the snapshot tree
and the fs tree. It introduces some problems and makes the users feel strange:

 # mkfs.btrfs /dev/sda1
 # mount /dev/sda1 /mnt
 # mkdir /mnt/1
 # cd /mnt/1
 # btrfs subvolume snapshot /mnt snap0
 # ls -a /mnt/1/snap0/1
 .	..	[no other file/dir]

 # ll /mnt/1/snap0/
 total 0
 drwxr-xr-x 1 root root 10 Ju1 24 12:11 1
			^^^
			There is no file/dir in it, but it's size is 10

 # cd /mnt/1/snap0/1/snap0
 [Enter a unexisted directory successfully...]

There is nothing in the directory 1 in snap0, but btrfs told the length of
this directory is 10. Beside that, we can enter an unexisted directory, it is
very strange to the users.

 # btrfs subvolume snapshot /mnt/1/snap0 /mnt/snap1
 # ll /mnt/1/snap0/1/
 total 0
 [None]
 # ll /mnt/snap1/1/
 total 0
 drwxr-xr-x 1 root root 0 Ju1 24 12:14 snap0

And the source of snap1 did have any directory in Directory 1, but snap1 have
a snap0, it is different between the source and the snapshot.

So I think we should insert directory item and directory name index and update
the parent inode as the last step of snapshot creation, and do not leave the
useless metadata in the file tree.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v3 -> v4:
- No change.

Changelog v2 -> v3:
- rebase on the latest for-linus branch

Changelog v1 -> v2:
- add comment to explain why we need deal with the delayed items after
  snapshot creation and why this operation do not corrupt the metadata.
- move dput() to patch 01/13
---
 fs/btrfs/transaction.c |   68 +++++++++++++++++++++++++++++++++++++----------
 1 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8bd2511..67d425b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -950,6 +950,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	struct btrfs_root *parent_root;
 	struct btrfs_block_rsv *rsv;
 	struct inode *parent_inode;
+	struct btrfs_path *path;
+	struct btrfs_dir_item *dir_item;
 	struct dentry *parent;
 	struct dentry *dentry;
 	struct extent_buffer *tmp;
@@ -962,6 +964,12 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	u64 root_flags;
 	uuid_le new_uuid;
 
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = pending->error = -ENOMEM;
+		goto path_alloc_fail;
+	}
+
 	new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS);
 	if (!new_root_item) {
 		ret = pending->error = -ENOMEM;
@@ -1010,23 +1018,20 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	 */
 	ret = btrfs_set_inode_index(parent_inode, &index);
 	BUG_ON(ret); /* -ENOMEM */
-	ret = btrfs_insert_dir_item(trans, parent_root,
-				dentry->d_name.name, dentry->d_name.len,
-				parent_inode, &key,
-				BTRFS_FT_DIR, index);
-	if (ret == -EEXIST) {
+
+	/* check if there is a file/dir which has the same name. */
+	dir_item = btrfs_lookup_dir_item(NULL, parent_root, path,
+					 btrfs_ino(parent_inode),
+					 dentry->d_name.name,
+					 dentry->d_name.len, 0);
+	if (dir_item != NULL && !IS_ERR(dir_item)) {
 		pending->error = -EEXIST;
 		goto fail;
-	} else if (ret) {
+	} else if (IS_ERR(dir_item)) {
+		ret = PTR_ERR(dir_item);
 		goto abort_trans;
 	}
-
-	btrfs_i_size_write(parent_inode, parent_inode->i_size +
-					 dentry->d_name.len * 2);
-	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
-	ret = btrfs_update_inode(trans, parent_root, parent_inode);
-	if (ret)
-		goto abort_trans;
+	btrfs_release_path(path);
 
 	/*
 	 * pull in the delayed directory update
@@ -1118,12 +1123,30 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
 	if (ret)
 		goto abort_trans;
+
+	ret = btrfs_insert_dir_item(trans, parent_root,
+				    dentry->d_name.name, dentry->d_name.len,
+				    parent_inode, &key,
+				    BTRFS_FT_DIR, index);
+	/* We have check then name at the beginning, so it is impossible. */
+	BUG_ON(ret == -EEXIST);
+	if (ret)
+		goto abort_trans;
+
+	btrfs_i_size_write(parent_inode, parent_inode->i_size +
+					 dentry->d_name.len * 2);
+	parent_inode->i_mtime = parent_inode->i_ctime = CURRENT_TIME;
+	ret = btrfs_update_inode(trans, parent_root, parent_inode);
+	if (ret)
+		goto abort_trans;
 fail:
 	dput(parent);
 	trans->block_rsv = rsv;
 no_free_objectid:
 	kfree(new_root_item);
 root_item_alloc_fail:
+	btrfs_free_path(path);
+path_alloc_fail:
 	btrfs_block_rsv_release(root, &pending->block_rsv, (u64)-1);
 	return ret;
 
@@ -1449,13 +1472,28 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 	 */
 	mutex_lock(&root->fs_info->reloc_mutex);
 
-	ret = btrfs_run_delayed_items(trans, root);
+	/*
+	 * We needn't worry about the delayed items because we will
+	 * deal with them in create_pending_snapshot(), which is the
+	 * core function of the snapshot creation.
+	 */
+	ret = create_pending_snapshots(trans, root->fs_info);
 	if (ret) {
 		mutex_unlock(&root->fs_info->reloc_mutex);
 		goto cleanup_transaction;
 	}
 
-	ret = create_pending_snapshots(trans, root->fs_info);
+	/*
+	 * We insert the dir indexes of the snapshots and update the inode
+	 * of the snapshots' parents after the snapshot creation, so there
+	 * are some delayed items which are not dealt with. Now deal with
+	 * them.
+	 *
+	 * We needn't worry that this operation will corrupt the snapshots,
+	 * because all the tree which are snapshoted will be forced to COW
+	 * the nodes and leaves.
+	 */
+	ret = btrfs_run_delayed_items(trans, root);
 	if (ret) {
 		mutex_unlock(&root->fs_info->reloc_mutex);
 		goto cleanup_transaction;
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 09/12] Btrfs: fix wrong size for the reservation of the, snapshot creation
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (7 preceding siblings ...)
  2012-09-06 10:03 ` [PATCH V4 08/12] Btrfs: fix the snapshot that should not exist Miao Xie
@ 2012-09-06 10:03 ` Miao Xie
  2012-09-06 10:04 ` [PATCH V4 10/12] Btrfs: fix unprotected ->log_batch Miao Xie
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:03 UTC (permalink / raw)
  To: Linux Btrfs

We should insert/update 6 items(root ref, root backref, dir item, dir index,
root item and parent inode) when creating a snapshot, not 5 items, fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- new patch.
---
 fs/btrfs/extent-tree.c |    6 +++---
 fs/btrfs/ioctl.c       |    2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 840bfee..a010234 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4418,10 +4418,10 @@ int btrfs_snap_reserve_metadata(struct btrfs_trans_handle *trans,
 	struct btrfs_block_rsv *src_rsv = get_block_rsv(trans, root);
 	struct btrfs_block_rsv *dst_rsv = &pending->block_rsv;
 	/*
-	 * two for root back/forward refs, two for directory entries
-	 * and one for root of the snapshot.
+	 * two for root back/forward refs, two for directory entries,
+	 * one for root of the snapshot and one for parent inode.
 	 */
-	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5);
+	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 6);
 	dst_rsv->space_info = src_rsv->space_info;
 	return block_rsv_migrate_bytes(src_rsv, dst_rsv, num_bytes);
 }
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3a16327..9384a2a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -526,7 +526,7 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
 		*inherit = NULL;	/* take responsibility to free it */
 	}
 
-	trans = btrfs_start_transaction(root->fs_info->extent_root, 5);
+	trans = btrfs_start_transaction(root->fs_info->extent_root, 6);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		goto fail;
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 10/12] Btrfs: fix unprotected ->log_batch
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (8 preceding siblings ...)
  2012-09-06 10:03 ` [PATCH V4 09/12] Btrfs: fix wrong size for the reservation of the, snapshot creation Miao Xie
@ 2012-09-06 10:04 ` Miao Xie
  2012-09-06 10:04 ` [PATCH V4 11/12] Btrfs: output more information when aborting a unused transaction handle Miao Xie
  2012-09-06 10:04 ` [PATCH V4 12/12] Btrfs: fix wrong size for the reservation when doing, file pre-allocation Miao Xie
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:04 UTC (permalink / raw)
  To: Linux Btrfs

We forget to protect ->log_batch when syncing a file, this patch fix
this problem by atomic operation. And ->log_batch is used to check
if there are parallel sync operations or not, so it is unnecessary to
reset it to 0 after the sync operation of the current log tree complete.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- new patch.
---
 fs/btrfs/ctree.h    |    2 +-
 fs/btrfs/disk-io.c  |    2 +-
 fs/btrfs/file.c     |    4 ++--
 fs/btrfs/tree-log.c |   12 +++++-------
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 171458a..dbb461f 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1490,9 +1490,9 @@ struct btrfs_root {
 	wait_queue_head_t log_commit_wait[2];
 	atomic_t log_writers;
 	atomic_t log_commit[2];
+	atomic_t log_batch;
 	unsigned long log_transid;
 	unsigned long last_log_commit;
-	unsigned long log_batch;
 	pid_t log_start_pid;
 	bool log_multiple_pids;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 0c0f8eb..7fb7069 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1168,8 +1168,8 @@ static void __setup_root(u32 nodesize, u32 leafsize, u32 sectorsize,
 	atomic_set(&root->log_commit[0], 0);
 	atomic_set(&root->log_commit[1], 0);
 	atomic_set(&root->log_writers, 0);
+	atomic_set(&root->log_batch, 0);
 	atomic_set(&root->orphan_inodes, 0);
-	root->log_batch = 0;
 	root->log_transid = 0;
 	root->last_log_commit = 0;
 	extent_io_tree_init(&root->dirty_log_pages,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 9aa01ec..af888da 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1520,9 +1520,9 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * ordered range does a filemape_write_and_wait_range which is why we
 	 * don't do it above like other file systems.
 	 */
-	root->log_batch++;
+	atomic_inc(&root->log_batch);
 	btrfs_wait_ordered_range(inode, start, end);
-	root->log_batch++;
+	atomic_inc(&root->log_batch);
 
 	/*
 	 * check the transaction that last modified this inode
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c86670f..71a2d19 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -146,7 +146,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 			root->log_multiple_pids = true;
 		}
 
-		root->log_batch++;
+		atomic_inc(&root->log_batch);
 		atomic_inc(&root->log_writers);
 		mutex_unlock(&root->log_mutex);
 		return 0;
@@ -165,7 +165,7 @@ static int start_log_trans(struct btrfs_trans_handle *trans,
 			err = ret;
 	}
 	mutex_unlock(&root->fs_info->tree_log_mutex);
-	root->log_batch++;
+	atomic_inc(&root->log_batch);
 	atomic_inc(&root->log_writers);
 	mutex_unlock(&root->log_mutex);
 	return err;
@@ -2037,7 +2037,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	if (atomic_read(&root->log_commit[(index1 + 1) % 2]))
 		wait_log_commit(trans, root, root->log_transid - 1);
 	while (1) {
-		unsigned long batch = root->log_batch;
+		int batch = atomic_read(&root->log_batch);
 		/* when we're on an ssd, just kick the log commit out */
 		if (!btrfs_test_opt(root, SSD) && root->log_multiple_pids) {
 			mutex_unlock(&root->log_mutex);
@@ -2045,7 +2045,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 			mutex_lock(&root->log_mutex);
 		}
 		wait_for_writer(trans, root);
-		if (batch == root->log_batch)
+		if (batch == atomic_read(&root->log_batch))
 			break;
 	}
 
@@ -2074,7 +2074,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 
 	btrfs_set_root_node(&log->root_item, log->node);
 
-	root->log_batch = 0;
 	root->log_transid++;
 	log->log_transid = root->log_transid;
 	root->log_start_pid = 0;
@@ -2087,7 +2086,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	mutex_unlock(&root->log_mutex);
 
 	mutex_lock(&log_root_tree->log_mutex);
-	log_root_tree->log_batch++;
+	atomic_inc(&log_root_tree->log_batch);
 	atomic_inc(&log_root_tree->log_writers);
 	mutex_unlock(&log_root_tree->log_mutex);
 
@@ -2157,7 +2156,6 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 	btrfs_set_super_log_root_level(root->fs_info->super_for_commit,
 				btrfs_header_level(log_root_tree->node));
 
-	log_root_tree->log_batch = 0;
 	log_root_tree->log_transid++;
 	smp_mb();
 
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 11/12] Btrfs: output more information when aborting a unused transaction handle
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (9 preceding siblings ...)
  2012-09-06 10:04 ` [PATCH V4 10/12] Btrfs: fix unprotected ->log_batch Miao Xie
@ 2012-09-06 10:04 ` Miao Xie
  2012-09-06 10:04 ` [PATCH V4 12/12] Btrfs: fix wrong size for the reservation when doing, file pre-allocation Miao Xie
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:04 UTC (permalink / raw)
  To: Linux Btrfs

Though we dump the stack information when aborting a unused transaction
handle, we don't know the correct place where we decide to abort the
transaction handle if one function has several place where the transaction
abort function is invoked and jumps to the same place after this call.
And beside that we also don't know the reason why we jump to abort
the current handle. So I modify the transaction abort function and make
it output the function name, line and error information.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- new patch.
---
 fs/btrfs/super.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 48e53d6..b3d9f4b 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -223,7 +223,13 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle *trans,
 	/* Nothing used. The other threads that have joined this
 	 * transaction may be able to continue. */
 	if (!trans->blocks_used) {
-		btrfs_printk(root->fs_info, "Aborting unused transaction.\n");
+		char nbuf[16];
+		const char *errstr;
+
+		errstr = btrfs_decode_error(root->fs_info, errno, nbuf);
+		btrfs_printk(root->fs_info,
+			     "%s:%d: Aborting unused transaction(%s).\n",
+			     function, line, errstr);
 		return;
 	}
 	trans->transaction->aborted = errno;
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V4 12/12] Btrfs: fix wrong size for the reservation when doing, file pre-allocation.
  2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
                   ` (10 preceding siblings ...)
  2012-09-06 10:04 ` [PATCH V4 11/12] Btrfs: output more information when aborting a unused transaction handle Miao Xie
@ 2012-09-06 10:04 ` Miao Xie
  11 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-06 10:04 UTC (permalink / raw)
  To: Linux Btrfs

When we ran fsstress(a program in xfstests), the filesystem hung up when it
is full. It was because the space reserved in btrfs_fallocate() was wrong,
btrfs_fallocate() just used the size of the pre-allocation to reserve the
space, didn't took the block size aligning into account, so the size of
the reserved space was less than the allocated space, it caused the over
reserve problem and made the filesystem hung up when invoking cow_file_range().
Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1 -> v4:
- new patch.
---
 fs/btrfs/file.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index af888da..1a5f76b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1640,7 +1640,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 	 * Make sure we have enough space before we do the
 	 * allocation.
 	 */
-	ret = btrfs_check_data_free_space(inode, len);
+	ret = btrfs_check_data_free_space(inode, alloc_end - alloc_start + 1);
 	if (ret)
 		return ret;
 
@@ -1747,7 +1747,7 @@ static long btrfs_fallocate(struct file *file, int mode,
 out:
 	mutex_unlock(&inode->i_mutex);
 	/* Let go of our reservation. */
-	btrfs_free_reserved_data_space(inode, len);
+	btrfs_free_reserved_data_space(inode, alloc_end - alloc_start + 1);
 	return ret;
 }
 
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 07/12] Btrfs: fix corrupted metadata in the snapshot
  2012-09-06 10:03 ` [PATCH V4 07/12] Btrfs: fix corrupted metadata in the snapshot Miao Xie
@ 2012-09-06 13:09   ` Josef Bacik
  2012-09-07  3:10     ` Miao Xie
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2012-09-06 13:09 UTC (permalink / raw)
  To: Miao Xie; +Cc: Linux Btrfs

On Thu, Sep 06, 2012 at 04:03:04AM -0600, Miao Xie wrote:
> When we delete a inode, we will remove all the delayed items including delayed
> inode update, and then truncate all the relative metadata. If there is lots of
> metadata, we will end the current transaction, and start a new transaction to
> truncate the left metadata. In this way, we will leave a inode item that its
> link counter is > 0, and also may leave some directory index items in fs/file tree
> after the current transaction ends. In other words, the metadata in this fs/file tree
> is inconsistent. If we create a snapshot for this tree now, we will find a inode with
> corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
> because its link counter is not 0.
> 
> We fix this problem by updating the inode item before the current transaction ends.
> 
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
> Changelog v1 -> v4:
> - Update the comment of the truncation in the btrfs_evict_inode()
> - Fix enospc problem of the inode update

This isn't the right way to do the enospc fix, we need to do

btrfs_start_transaction(root, 1); and then change the trans->block_rsv to our
reserve for the truncate and then set it back to the trans rsv for the update
that way we don't run out of space because we used our reservation for the
truncate.  Just update this patch and send it along and I'll include it.
Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 07/12] Btrfs: fix corrupted metadata in the snapshot
  2012-09-06 13:09   ` Josef Bacik
@ 2012-09-07  3:10     ` Miao Xie
  2012-09-07  7:43       ` [PATCH V5 " Miao Xie
  0 siblings, 1 reply; 18+ messages in thread
From: Miao Xie @ 2012-09-07  3:10 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Linux Btrfs

On Thu, 6 Sep 2012 09:09:14 -0400, Josef Bacik wrote:
> On Thu, Sep 06, 2012 at 04:03:04AM -0600, Miao Xie wrote:
>> When we delete a inode, we will remove all the delayed items including delayed
>> inode update, and then truncate all the relative metadata. If there is lots of
>> metadata, we will end the current transaction, and start a new transaction to
>> truncate the left metadata. In this way, we will leave a inode item that its
>> link counter is > 0, and also may leave some directory index items in fs/file tree
>> after the current transaction ends. In other words, the metadata in this fs/file tree
>> is inconsistent. If we create a snapshot for this tree now, we will find a inode with
>> corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
>> because its link counter is not 0.
>>
>> We fix this problem by updating the inode item before the current transaction ends.
>>
>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>> ---
>> Changelog v1 -> v4:
>> - Update the comment of the truncation in the btrfs_evict_inode()
>> - Fix enospc problem of the inode update
> 
> This isn't the right way to do the enospc fix, we need to do
> 
> btrfs_start_transaction(root, 1); and then change the trans->block_rsv to our
> reserve for the truncate and then set it back to the trans rsv for the update
> that way we don't run out of space because we used our reservation for the
> truncate.  Just update this patch and send it along and I'll include it.
> Thanks,

btrfs_start_transaction() will cause the deadlock problem just as I said in comment,
the reason is:

	start transaction
		|
		v
	reserve meta-data space
		|
		v
	flush delay allocation -> iput inode -> evict inode
		^ 					|
		|					v
	wait for delay allocation flush <- reserve meta-data space

So we may introduce a special starting-transaction function which can reserve the space
without flush. I'll make a patch with this way.

Thanks
Miao

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V5 07/12] Btrfs: fix corrupted metadata in the snapshot
  2012-09-07  3:10     ` Miao Xie
@ 2012-09-07  7:43       ` Miao Xie
  0 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-07  7:43 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Linux Btrfs

When we delete a inode, we will remove all the delayed items including delayed
inode update, and then truncate all the relative metadata. If there is lots of
metadata, we will end the current transaction, and start a new transaction to
truncate the left metadata. In this way, we will leave a inode item that its
link counter is > 0, and also may leave some directory index items in fs/file tree
after the current transaction ends. In other words, the metadata in this fs/file tree
is inconsistent. If we create a snapshot for this tree now, we will find a inode with
corrupted metadata in the new snapshot, and we won't continue to drop the left metadata,
because its link counter is not 0.

We fix this problem by updating the inode item before the current transaction ends.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v4 -> v5:
- change the method which is used to fix enospc problem of the inode update

Changelog v1 -> v4:
- Update the comment of the truncation in the btrfs_evict_inode()
- Fix enospc problem of the inode update
---
 fs/btrfs/inode.c       |   20 ++++++++++----------
 fs/btrfs/transaction.c |   29 +++++++++++++++++++++--------
 fs/btrfs/transaction.h |    2 ++
 3 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d494c11..b69779d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3772,21 +3772,17 @@ void btrfs_evict_inode(struct inode *inode)
 		btrfs_orphan_del(NULL, inode);
 		goto no_delete;
 	}
+
 	rsv->size = min_size;
 	global_rsv = &root->fs_info->global_block_rsv;
 
 	btrfs_i_size_write(inode, 0);
 
 	/*
-	 * This is a bit simpler than btrfs_truncate since
-	 *
-	 * 1) We've already reserved our space for our orphan item in the
-	 *    unlink.
-	 * 2) We're going to delete the inode item, so we don't need to update
-	 *    it at all.
-	 *
-	 * So we just need to reserve some slack space in case we add bytes when
-	 * doing the truncate.
+	 * This is a bit simpler than btrfs_truncate since we've already
+	 * reserved our space for our orphan item in the unlink, so we just
+	 * need to reserve some slack space in case we add bytes and update
+	 * inode item when doing the truncate.
 	 */
 	while (1) {
 		ret = btrfs_block_rsv_refill_noflush(root, rsv, min_size);
@@ -3807,7 +3803,7 @@ void btrfs_evict_inode(struct inode *inode)
 			goto no_delete;
 		}
 
-		trans = btrfs_start_transaction(root, 0);
+		trans = btrfs_start_transaction_noflush(root, 1);
 		if (IS_ERR(trans)) {
 			btrfs_orphan_del(NULL, inode);
 			btrfs_free_block_rsv(root, rsv);
@@ -3820,6 +3816,10 @@ void btrfs_evict_inode(struct inode *inode)
 		if (ret != -EAGAIN)
 			break;
 
+		trans->block_rsv = &root->fs_info->trans_block_rsv;
+		ret = btrfs_update_inode(trans, root, inode);
+		BUG_ON(ret);
+
 		nr = trans->blocks_used;
 		btrfs_end_transaction(trans, root);
 		trans = NULL;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8bd2511..6ea5d2d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -290,7 +290,8 @@ static int may_wait_transaction(struct btrfs_root *root, int type)
 }
 
 static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
-						    u64 num_items, int type)
+						    u64 num_items, int type,
+						    int noflush)
 {
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
@@ -324,9 +325,14 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 		}
 
 		num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
-		ret = btrfs_block_rsv_add(root,
-					  &root->fs_info->trans_block_rsv,
-					  num_bytes);
+		if (noflush)
+			ret = btrfs_block_rsv_add_noflush(root,
+						&root->fs_info->trans_block_rsv,
+						num_bytes);
+		else
+			ret = btrfs_block_rsv_add(root,
+						&root->fs_info->trans_block_rsv,
+						num_bytes);
 		if (ret)
 			return ERR_PTR(ret);
 	}
@@ -390,21 +396,28 @@ got_it:
 struct btrfs_trans_handle *btrfs_start_transaction(struct btrfs_root *root,
 						   int num_items)
 {
-	return start_transaction(root, num_items, TRANS_START);
+	return start_transaction(root, num_items, TRANS_START, 0);
 }
+
+struct btrfs_trans_handle *btrfs_start_transaction_noflush(
+					struct btrfs_root *root, int num_items)
+{
+	return start_transaction(root, num_items, TRANS_START, 1);
+}
+
 struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root)
 {
-	return start_transaction(root, 0, TRANS_JOIN);
+	return start_transaction(root, 0, TRANS_JOIN, 0);
 }
 
 struct btrfs_trans_handle *btrfs_join_transaction_nolock(struct btrfs_root *root)
 {
-	return start_transaction(root, 0, TRANS_JOIN_NOLOCK);
+	return start_transaction(root, 0, TRANS_JOIN_NOLOCK, 0);
 }
 
 struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root)
 {
-	return start_transaction(root, 0, TRANS_USERSPACE);
+	return start_transaction(root, 0, TRANS_USERSPACE, 0);
 }
 
 /* wait for a transaction commit to be fully complete */
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index e8b8416..06c4929 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -96,6 +96,8 @@ int btrfs_end_transaction_nolock(struct btrfs_trans_handle *trans,
 				 struct btrfs_root *root);
 struct btrfs_trans_handle *btrfs_start_transaction(struct btrfs_root *root,
 						   int num_items);
+struct btrfs_trans_handle *btrfs_start_transaction_noflush(
+					struct btrfs_root *root, int num_items);
 struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root);
 struct btrfs_trans_handle *btrfs_join_transaction_nolock(struct btrfs_root *root);
 struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root *root);
-- 
1.7.6.5

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot()
  2012-09-06 10:00 ` [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot() Miao Xie
@ 2012-09-17 16:56   ` David Sterba
  2012-09-18  1:47     ` Miao Xie
  0 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2012-09-17 16:56 UTC (permalink / raw)
  To: Miao Xie; +Cc: Linux Btrfs

On Thu, Sep 06, 2012 at 06:00:32PM +0800, Miao Xie wrote:
> This patch fixes the following problem:
> - If we failed to deal with the delayed dir items, we should abort transaction,
>   just as its comment said. Fix it.
> - If root reference or root back reference insertion failed, we should
>   abort transaction. Fix it.
> - Fix the double free problem of pending->inherit.
> - Do not restore the trans->rsv if we doesn't change it.
> - make the error path more clearly.

I've noticed a pattern in the error + transaction abort paths, that is
touched in this patch and would like to ask you to update it:

> @@ -1018,10 +1016,9 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  				BTRFS_FT_DIR, index);
>  	if (ret == -EEXIST) {
>  		pending->error = -EEXIST;
> -		dput(parent);
>  		goto fail;

normal exit path: here we don't abort transaction, just go the exit
block and do the cleanup

>  	} else if (ret) {
> -		goto abort_trans_dput;
> +		goto abort_trans;

a transaction abort path: here we jump to a common block that calls
abort, but we lose the information where the abort occured

I went through the code and saw several uses of this pattern (and I
remember more than one bugreport that pointed to a abort_transaction
call without leaving any traces what condition failed).

(Search regex I used 'goto.*abort')

So the proposed pattern to use is

---
	if (condition) {
		btrfs_transaction_abort(...);
		goto fail;
	}


fail:
	<cleanup>
	return ...;
---

> @@ -1120,15 +1114,15 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	ret = btrfs_reloc_post_snapshot(trans, pending);
>  	if (ret)
>  		goto abort_trans;
> -	ret = 0;
>  fail:
> -	kfree(new_root_item);
> +	dput(parent);
>  	trans->block_rsv = rsv;
> +no_free_objectid:
> +	kfree(new_root_item);
> +root_item_alloc_fail:
>  	btrfs_block_rsv_release(root, &pending->block_rsv, (u64)-1);
>  	return ret;
>  
> -abort_trans_dput:
> -	dput(parent);
>  abort_trans:
>  	btrfs_abort_transaction(trans, root, ret);
>  	goto fail;
^^^^^^^^^^^^^^^^
(end of function here)

this will also remove all the instances where a function ends with a
'goto'. All instances are convertible to the pattern described above.

Atlernate approach that I originally considered for fixing was to
introduce a call like 'btrfs_mark_transaction_abort_callsite' which
would need to add a field to fs_info and print it later. But, if we're
going to touch all the code, it makes sense to utilize the
infrastructure we already have.

Please consider updating your patch, I'll send a separate patch that
deals with aborts outside of create_pending_snapshot.

TIA,
david

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot()
  2012-09-17 16:56   ` David Sterba
@ 2012-09-18  1:47     ` Miao Xie
  0 siblings, 0 replies; 18+ messages in thread
From: Miao Xie @ 2012-09-18  1:47 UTC (permalink / raw)
  To: Linux Btrfs

On mon, 17 Sep 2012 18:56:27 +0200, David Sterba wrote:
> On Thu, Sep 06, 2012 at 06:00:32PM +0800, Miao Xie wrote:
>> This patch fixes the following problem:
>> - If we failed to deal with the delayed dir items, we should abort transaction,
>>   just as its comment said. Fix it.
>> - If root reference or root back reference insertion failed, we should
>>   abort transaction. Fix it.
>> - Fix the double free problem of pending->inherit.
>> - Do not restore the trans->rsv if we doesn't change it.
>> - make the error path more clearly.
> 
> I've noticed a pattern in the error + transaction abort paths, that is
> touched in this patch and would like to ask you to update it:

OK, I will send a separate patch to fix this problem.

Thanks for your review.
Miao

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-09-18  1:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-06 10:00 [PATCH V4 0/12] random bug fixes Miao Xie
2012-09-06 10:00 ` [PATCH V4 01/12] Btrfs: fix error path in create_pending_snapshot() Miao Xie
2012-09-17 16:56   ` David Sterba
2012-09-18  1:47     ` Miao Xie
2012-09-06 10:00 ` [PATCH V4 02/12] Btrfs: fix full backref problem when inserting shared block reference Miao Xie
2012-09-06 10:01 ` [PATCH V4 03/12] Btrfs: fix file extent discount problem in the, snapshot Miao Xie
2012-09-06 10:01 ` [PATCH V4 04/12] Btrfs: use a slab for ordered extents allocation Miao Xie
2012-09-06 10:02 ` [PATCH V4 05/12] Btrfs: fix wrong orphan count of the fs/file tree Miao Xie
2012-09-06 10:02 ` [PATCH V4 06/12] Btrfs: add a new "type" field into the block reservation structure Miao Xie
2012-09-06 10:03 ` [PATCH V4 07/12] Btrfs: fix corrupted metadata in the snapshot Miao Xie
2012-09-06 13:09   ` Josef Bacik
2012-09-07  3:10     ` Miao Xie
2012-09-07  7:43       ` [PATCH V5 " Miao Xie
2012-09-06 10:03 ` [PATCH V4 08/12] Btrfs: fix the snapshot that should not exist Miao Xie
2012-09-06 10:03 ` [PATCH V4 09/12] Btrfs: fix wrong size for the reservation of the, snapshot creation Miao Xie
2012-09-06 10:04 ` [PATCH V4 10/12] Btrfs: fix unprotected ->log_batch Miao Xie
2012-09-06 10:04 ` [PATCH V4 11/12] Btrfs: output more information when aborting a unused transaction handle Miao Xie
2012-09-06 10:04 ` [PATCH V4 12/12] Btrfs: fix wrong size for the reservation when doing, file pre-allocation Miao Xie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).