linux-btrfs.vger.kernel.org archive mirror
From: Miao Xie <miaox@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Subject: Re: [PATCH V2] btrfs: implement delayed inode items operation
Date: Fri, 18 Feb 2011 22:13:33 +0800
Message-ID: <4D5E7E8D.3020705@cn.fujitsu.com> (raw)
In-Reply-To: <20110218133019.GC5615@twin.jikos.cz>

Hi, David

Thanks for your review. I'll update my patch to address your comments.

Thanks
Miao

On Fri, 18 Feb 2011 14:30:20 +0100, David Sterba wrote:
> Hi,
>
> a few code comments below.
>
> On Thu, Feb 17, 2011 at 01:48:40PM +0800, Miao Xie wrote:
>> Compared with Ext3/4, the performance of file creation and deletion on btrfs
>> is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
>> such as inode item, directory name item, directory name index and so on.
>>
>> If we can do some delayed b+ tree insertion or deletion, we can improve the
>> performance, so we made this patch which implemented delayed directory name
>> index insertion/deletion and delayed inode update.
>>
>> Implementation:
>> - introduce a delayed root object into the filesystem, that use a list to
>>    manage all the delayed node which is create for every file/directory.
>                             ^^^^^^^^^^^^^^^^^^^^
> (should this serve as a changelog entry):
>                             nodes which are created
>> - Every delayed node has two rb-trees: one is used to manage the directory name
>>    index which is going to be inserted into b+ tree, and the other is used to
>>    manage the directory name index which is going to be deleted from b+ tree.
>> - introduce a worker to deal with the delayed operation. This worker is used
>>    to deal with the works of the delayed directory name index items insertion
>>    and deletion and the delayed inode update.
>>    When the number of delayed items exceeds the lower limit, we create works for
>>    some delayed nodes and insert them into the work queue of the worker, and then
>>    go back.
>>    When the number of delayed items exceeds the upper bound, we create works for
>>    all the delayed nodes that haven't been dealt with, insert them into the work
>>    queue of the worker, and then wait until the number of untreated items drops
>>    below some threshold value.
>> - When we want to insert a directory name index into b+ tree, we just add the
>>    information into the delayed inserting rb-tree.
>>    And then we check the number of the delayed items and do delayed items
>>    balance. (The balance policy is above.)
>> - When we want to delete a directory name index from the b+ tree, we first
>>    search for it in the insertion rb-tree. If we find it there, we just drop it.
>>    If not, we add its key to the delayed deletion rb-tree.
>>    Similar to the delayed inserting rb-tree, we also check the number of the
>>    delayed items and do delayed items balance.
>>    (The same to inserting manipulation)
>> - When we want to update the metadata of some inode, we cache the data of the
>>    inode in the delayed node. The worker will flush it into the b+ tree after
>>    dealing with the delayed insertion and deletion.
>> - We will move the delayed node to the tail of the list after we access the
>>    delayed node. In this way, we can cache more delayed items and merge more
>>    inode updates.
>> - If we want to commit the transaction, we will deal with all the delayed nodes.
>> - The delayed node will be freed when we free the btrfs inode.
>
> The delayed node structure will live on even after the data has been
> transferred to the b+ tree? I'm concerned about unnecessary memory being
> allocated when in fact it's not needed.
>
>> - Before we log the inode items, we commit all the directory name index items
>>    and the delayed inode update.
>>
>> I did a quick test by the benchmark tool[1] and found we can improve the
>> performance of file creation by ~16%, and file deletion by ~19%.
>>
>> Before applying this patch:
>> Create files:
>> 	Total files: 50000
>> 	Total time: 1.113182
>> 	Average time: 0.000022
>> Delete files:
>> 	Total files: 50000
>> 	Total time: 1.544013
>> 	Average time: 0.000031
>>
>> After applying this patch:
>> Create files:
>> 	Total files: 50000
>> 	Total time: 0.935185
>> 	Average time: 0.000019
>> Delete files:
>> 	Total files: 50000
>> 	Total time: 1.252672
>> 	Average time: 0.000025
>>
>> [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
>>
>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>> ---
>> Changelog V1 ->  V2
>> - break up the global rb-tree, use a list to manage the delayed nodes,
>>    which is created for every directory and file, and used to manage the
>>    delayed directory name index items and the delayed inode item.
>> - introduce a worker to deal with the delayed nodes.
>>
>>   fs/btrfs/Makefile        |    2 +-
>>   fs/btrfs/btrfs_inode.h   |    5 +
>>   fs/btrfs/ctree.c         |   14 +-
>>   fs/btrfs/ctree.h         |   22 +-
>>   fs/btrfs/delayed-inode.c | 1546 ++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/btrfs/delayed-inode.h |  122 ++++
>>   fs/btrfs/dir-item.c      |   32 +-
>>   fs/btrfs/disk-io.c       |   26 +-
>>   fs/btrfs/extent-tree.c   |   18 +-
>>   fs/btrfs/inode.c         |  109 ++--
>>   fs/btrfs/ioctl.c         |    2 +-
>>   fs/btrfs/super.c         |    1 +
>>   fs/btrfs/transaction.c   |    7 +-
>>   fs/btrfs/tree-log.c      |    7 +
>>   14 files changed, 1809 insertions(+), 104 deletions(-)
>>   create mode 100644 fs/btrfs/delayed-inode.c
>>   create mode 100644 fs/btrfs/delayed-inode.h
>>
>> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
>> index 31610ea..a8411c2 100644
>> --- a/fs/btrfs/Makefile
>> +++ b/fs/btrfs/Makefile
>> @@ -7,4 +7,4 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
>>   	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
>>   	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
>>   	   export.o tree-log.o acl.o free-space-cache.o zlib.o lzo.o \
>> -	   compression.o delayed-ref.o relocation.o
>> +	   compression.o delayed-ref.o relocation.o delayed-inode.o
>> diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
>> index ccc991c..ef3baa8 100644
>> --- a/fs/btrfs/btrfs_inode.h
>> +++ b/fs/btrfs/btrfs_inode.h
>> @@ -22,6 +22,7 @@
>>   #include "extent_map.h"
>>   #include "extent_io.h"
>>   #include "ordered-data.h"
>> +#include "delayed-inode.h"
>>
>>   /* in memory btrfs inode */
>        in-memory
>
>>   struct btrfs_inode {
>> @@ -159,9 +160,13 @@ struct btrfs_inode {
>>   	 */
>>   	unsigned force_compress:4;
>>
>> +	struct btrfs_delayed_node *delayed_node;
>> +
>>   	struct inode vfs_inode;
>>   };
>>
>> +extern unsigned char btrfs_filetype_table[];
>> +
>>   static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
>>   {
>>   	return container_of(inode, struct btrfs_inode, vfs_inode);
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index b5baff0..6e3e4c3 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -38,11 +38,6 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
>>   			      struct extent_buffer *src_buf);
>>   static int del_ptr(struct btrfs_trans_handle *trans, struct btrfs_root *root,
>>   		   struct btrfs_path *path, int level, int slot);
>> -static int setup_items_for_insert(struct btrfs_trans_handle *trans,
>> -			struct btrfs_root *root, struct btrfs_path *path,
>> -			struct btrfs_key *cpu_key, u32 *data_size,
>> -			u32 total_data, u32 total_size, int nr);
>> -
>>
>>   struct btrfs_path *btrfs_alloc_path(void)
>>   {
>> @@ -3688,11 +3683,10 @@ out:
>>    * to save stack depth by doing the bulk of the work in a function
>>    * that doesn't call btrfs_search_slot
>>    */
>> -static noinline_for_stack int
>> -setup_items_for_insert(struct btrfs_trans_handle *trans,
>> -		      struct btrfs_root *root, struct btrfs_path *path,
>> -		      struct btrfs_key *cpu_key, u32 *data_size,
>> -		      u32 total_data, u32 total_size, int nr)
>> +int setup_items_for_insert(struct btrfs_trans_handle *trans,
>> +			   struct btrfs_root *root, struct btrfs_path *path,
>> +			   struct btrfs_key *cpu_key, u32 *data_size,
>> +			   u32 total_data, u32 total_size, int nr)
>>   {
>>   	struct btrfs_item *item;
>>   	int i;
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 7219537..7f6fa3c 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -859,6 +859,7 @@ struct btrfs_block_group_cache {
>>   struct reloc_control;
>>   struct btrfs_device;
>>   struct btrfs_fs_devices;
>> +struct btrfs_delayed_root;
>>   struct btrfs_fs_info {
>>   	u8 fsid[BTRFS_FSID_SIZE];
>>   	u8 chunk_tree_uuid[BTRFS_UUID_SIZE];
>> @@ -885,7 +886,10 @@ struct btrfs_fs_info {
>>   	/* logical->physical extent mapping */
>>   	struct btrfs_mapping_tree mapping_tree;
>>
>> -	/* block reservation for extent, checksum and root tree */
>> +	/*
>> +	 * block reservation for extent, checksum, root tree and
>> +	 * delayed dir index item
>> +	 */
>>   	struct btrfs_block_rsv global_block_rsv;
>>   	/* block reservation for delay allocation */
>>   	struct btrfs_block_rsv delalloc_block_rsv;
>> @@ -1012,6 +1016,7 @@ struct btrfs_fs_info {
>>   	 * for the sys_munmap function call path
>>   	 */
>>   	struct btrfs_workers fixup_workers;
>> +	struct btrfs_workers delayed_workers;
>>   	struct task_struct *transaction_kthread;
>>   	struct task_struct *cleaner_kthread;
>>   	int thread_pool_size;
>> @@ -1069,6 +1074,8 @@ struct btrfs_fs_info {
>>
>>   	/* filesystem state */
>>   	u64 fs_state;
>> +
>> +	struct btrfs_delayed_root *delayed_root;
>>   };
>>
>>   /*
>> @@ -2085,6 +2092,13 @@ static inline bool btrfs_mixed_space_info(struct btrfs_space_info *space_info)
>>   }
>>
>>   /* extent-tree.c */
>> +static inline u64 btrfs_calc_trans_metadata_size(struct btrfs_root *root,
>> +						 int num_items)
>> +{
>> +	return (root->leafsize + root->nodesize * (BTRFS_MAX_LEVEL - 1)) *
>> +		3 * num_items;
>
> Can this calculation be documented?
>
>> +}
>> +
>>   void btrfs_put_block_group(struct btrfs_block_group_cache *cache);
>>   int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
>>   			   struct btrfs_root *root, unsigned long count);
>> @@ -2285,6 +2299,10 @@ static inline int btrfs_del_item(struct btrfs_trans_handle *trans,
>>   	return btrfs_del_items(trans, root, path, path->slots[0], 1);
>>   }
>>
>> +int setup_items_for_insert(struct btrfs_trans_handle *trans,
>> +			   struct btrfs_root *root, struct btrfs_path *path,
>> +			   struct btrfs_key *cpu_key, u32 *data_size,
>> +			   u32 total_data, u32 total_size, int nr);
>>   int btrfs_insert_item(struct btrfs_trans_handle *trans, struct btrfs_root
>>   		      *root, struct btrfs_key *key, void *data, u32 data_size);
>>   int btrfs_insert_some_items(struct btrfs_trans_handle *trans,
>> @@ -2346,7 +2364,7 @@ int btrfs_set_root_node(struct btrfs_root_item *item,
>>   /* dir-item.c */
>>   int btrfs_insert_dir_item(struct btrfs_trans_handle *trans,
>>   			  struct btrfs_root *root, const char *name,
>> -			  int name_len, u64 dir,
>> +			  int name_len, struct inode *dir,
>>   			  struct btrfs_key *location, u8 type, u64 index);
>>   struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
>>   					     struct btrfs_root *root,
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> new file mode 100644
>> index 0000000..a377d3b
>> --- /dev/null
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -0,0 +1,1546 @@
>> +/*
>> + * Copyright (C) 2011 Fujitsu.  All rights reserved.
>> + * Written by Miao Xie <miaox@cn.fujitsu.com>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public
>> + * License v2 as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public
>> + * License along with this program; if not, write to the
>> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
>> + * Boston, MA 021110-1307, USA.
>> + */
>> +
>> +#include "delayed-inode.h"
>> +#include "disk-io.h"
>> +#include "transaction.h"
>> +
>> +#define BTRFS_DELAYED_LOWER		50
>> +#define BTRFS_DELAYED_WRITEBACK		200
>> +#define BTRFS_DELAYED_BACKGROUND	100
>> +
>> +static inline void btrfs_init_delayed_node(
>> +				struct btrfs_delayed_node *delayed_node,
>> +				u64 root_id, u64 inode_id)
>> +{
>> +	delayed_node->root_id = root_id;
>> +	delayed_node->inode_id = inode_id;
>> +	atomic_set(&delayed_node->refs, 0);
>> +	delayed_node->count = 0;
>> +	delayed_node->in_list = 0;
>> +	delayed_node->inode_dirty = 0;
>> +	delayed_node->delayed_doing = 0;
>> +	delayed_node->ins_root = RB_ROOT;
>> +	delayed_node->del_root = RB_ROOT;
>> +	mutex_init(&delayed_node->mutex);
>> +	delayed_node->index_cnt = 0;
>> +	INIT_LIST_HEAD(&delayed_node->list);
>> +	spin_lock_init(&delayed_node->lock);
>> +	delayed_node->bytes_reserved = 0;
>> +	delayed_node->block_rsv = NULL;
>> +}
>> +
>> +static inline int btrfs_is_continuous_delayed_item(
>> +					struct btrfs_delayed_item *item1,
>> +					struct btrfs_delayed_item *item2)
>> +{
>> +	if (item1->key.type == BTRFS_DIR_INDEX_KEY &&
>> +	    item1->key.objectid == item2->key.objectid &&
>> +	    item1->key.type == item2->key.type &&
>> +	    item1->key.offset + 1 == item2->key.offset)
>> +		return 1;
>> +	return 0;
>> +}
>> +
>> +static inline struct btrfs_delayed_root *btrfs_get_delayed_root(
>> +							struct btrfs_root *root)
>> +{
>> +	return root->fs_info->delayed_root;
>> +}
>> +
>> +static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
>> +							struct inode *inode)
>> +{
>> +	struct btrfs_delayed_node *node;
>> +	struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>> +	struct btrfs_root *root = btrfs_inode->root;
>> +
>> +	if (btrfs_inode->delayed_node) {
>> +		node = btrfs_inode->delayed_node;
>> +		atomic_inc(&node->refs);	/* can be accessed */
>> +		return node;
>> +	}
>> +
>> +	node = kmalloc(sizeof(*node), GFP_NOFS);
>
> What about putting btrfs_delayed_node in a kmem cache? This can speedup
> quick insert/delete sequence.
>
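A kmem cache conversion along the lines suggested would look roughly like the sketch below. This is kernel-side code that cannot run standalone; the init function and cache names are illustrative, not part of the patch:

```c
/* Hypothetical dedicated cache for delayed nodes (illustrative names). */
static struct kmem_cache *delayed_node_cache;

int __init btrfs_delayed_inode_init(void)
{
	delayed_node_cache = kmem_cache_create("btrfs_delayed_node",
					sizeof(struct btrfs_delayed_node),
					0, SLAB_MEM_SPREAD, NULL);
	if (!delayed_node_cache)
		return -ENOMEM;
	return 0;
}

/* in btrfs_get_or_create_delayed_node(), instead of kmalloc(): */
	node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS);

/* in btrfs_release_delayed_node(), instead of kfree(): */
	kmem_cache_free(delayed_node_cache, node);
```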
>> +	if (!node)
>> +		return ERR_PTR(-ENOMEM);
>> +	btrfs_init_delayed_node(node, root->objectid, inode->i_ino);
>> +
>> +	btrfs_inode->delayed_node = node;
>> +	node->delayed_root = btrfs_get_delayed_root(root);
>> +	atomic_inc(&node->refs);	/* cached in the btrfs inode */
>> +	atomic_inc(&node->refs);	/* can be accessed */
>> +
>> +	return node;
>> +}
>> +
>> +/* Call it when holding delayed_node->mutex */
>> +static void btrfs_queue_delayed_node(struct btrfs_delayed_root *root,
>> +				     struct btrfs_delayed_node *node)
>> +{
>> +	spin_lock(&root->lock);
>> +	if (node->in_list)
>> +		list_move_tail(&node->list, &root->node_list);
>> +	else {
>> +		list_add_tail(&node->list, &root->node_list);
>> +		atomic_inc(&node->refs);	/* inserted into rb-tree */
>> +		root->nodes++;
>> +		node->in_list = 1;
>> +	}
>> +	spin_unlock(&root->lock);
>> +}
>> +
>> +/* Call it when holding delayed_node->mutex */
>> +static void btrfs_dequeue_delayed_node(struct btrfs_delayed_root *root,
>> +				       struct btrfs_delayed_node *node)
>> +{
>> +	spin_lock(&root->lock);
>> +	if (node->in_list) {
>> +		root->nodes--;
>> +		atomic_dec(&node->refs);	/* not in the rb-tree */
>> +		list_del_init(&node->list);
>> +		node->in_list = 0;
>> +	}
>> +	spin_unlock(&root->lock);
>> +}
>> +
>> +static void btrfs_release_delayed_node(struct btrfs_delayed_node *delayed_node)
>> +{
>> +	struct btrfs_delayed_root *delayed_root;
>> +
>> +	if (!delayed_node)
>> +		return;
>> +
>> +	delayed_root = delayed_node->delayed_root;
>> +
>> +	mutex_lock(&delayed_node->mutex);
>> +	if (delayed_node->count)
>> +		btrfs_queue_delayed_node(delayed_root, delayed_node);
>> +	else
>> +		btrfs_dequeue_delayed_node(delayed_root, delayed_node);
>> +	mutex_unlock(&delayed_node->mutex);
>> +
>> +	if (atomic_dec_and_test(&delayed_node->refs))
>> +		kfree(delayed_node);
>> +}
>> +
>> +struct btrfs_delayed_node *btrfs_first_delayed_node(
>> +			struct btrfs_delayed_root *delayed_root)
>> +{
>> +	struct list_head *p;
>> +	struct btrfs_delayed_node *node = NULL;
>> +
>> +	spin_lock(&delayed_root->lock);
>> +	if (list_empty(&delayed_root->node_list))
>> +		goto out;
>> +
>> +	p = delayed_root->node_list.next;
>> +	node = list_entry(p, struct btrfs_delayed_node, list);
>> +	atomic_inc(&node->refs);
>> +out:
>> +	spin_unlock(&delayed_root->lock);
>> +
>> +	return node;
>> +}
>> +
>> +struct btrfs_delayed_node *btrfs_next_delayed_node(
>> +						struct btrfs_delayed_node *node)
>> +{
>> +	struct btrfs_delayed_root *delayed_root = node->delayed_root;
>> +	struct list_head *p;
>> +	struct btrfs_delayed_node *next = NULL;
>> +
>> +	spin_lock(&delayed_root->lock);
>> +	if (!node->in_list) {	/* not in the list */
>> +		if (list_empty(&delayed_root->node_list))
>> +			goto out;
>> +		p = delayed_root->node_list.next;
>> +	} else if (list_is_last(&node->list, &delayed_root->node_list))
>> +		goto out;
>> +	else
>> +		p = node->list.next;
>> +
>> +	next = list_entry(p, struct btrfs_delayed_node, list);
>> +	atomic_inc(&next->refs);
>> +out:
>> +	spin_unlock(&delayed_root->lock);
>> +
>> +	return next;
>> +}
>> +
>> +struct btrfs_delayed_item *btrfs_alloc_delayed_item(int data_len)
>
> 'unsigned int data_len' and you can remove (u32) typing below
>
>> +{
>> +	struct btrfs_delayed_item *item;
>> +	item = kmalloc(sizeof(*item) + data_len, GFP_NOFS);
>> +	if (item) {
>> +		item->data_len = (u32)data_len;
>> +		item->ins_or_del = 0;
>> +		item->bytes_reserved = 0;
>> +		item->block_rsv = NULL;
>> +		item->delayed_node = NULL;
>> +	}
>> +	return item;
>> +}
>> +
>> +/*
>> + * __btrfs_lookup_delayed_item - look up the delayed item by key
>> + * @delayed_node: pointer of the delayed node
>                               to
>
>> + * @key:	  the key to search
>                               look up
>
>> + * @prev:	  used to store the prev item if the right item isn't found
>> + * @next:	  used to store the next item if the right item isn't found
>> + *
>> + * Note: if we don't find the right item, we will return the prev item and
>> + * the next item.
>> + */
>> +static struct btrfs_delayed_item *__btrfs_lookup_delayed_item(
>> +				struct rb_root *root,
>> +				struct btrfs_key *key,
>> +				struct btrfs_delayed_item **prev,
>> +				struct btrfs_delayed_item **next)
>> +{
>> +	struct rb_node *node, *prev_node = NULL;
>> +	struct btrfs_delayed_item *delayed_item = NULL;
>> +	int ret = 0;
>> +
>> +	node = root->rb_node;
>> +
>> +	while (node) {
>> +		delayed_item = rb_entry(node, struct btrfs_delayed_item,
>> +					rb_node);
>> +		prev_node = node;
>> +		ret = btrfs_comp_cpu_keys(&delayed_item->key, key);
>> +		if (ret < 0)
>> +			node = node->rb_right;
>> +		else if (ret > 0)
>> +			node = node->rb_left;
>> +		else
>> +			return delayed_item;
>> +	}
>> +
>> +	if (prev) {
>> +		if (!prev_node)
>> +			*prev = NULL;
>> +		else if (ret < 0)
>> +			*prev = delayed_item;
>> +		else if ((node = rb_prev(prev_node)) != NULL) {
>> +			*prev = rb_entry(node, struct btrfs_delayed_item,
>> +					 rb_node);
>> +		} else
>> +			*prev = NULL;
>> +	}
>> +
>> +	if (next) {
>> +		if (!prev_node)
>> +			*next = NULL;
>> +		else if (ret > 0)
>> +			*next = delayed_item;
>> +		else if ((node = rb_next(prev_node)) != NULL) {
>> +			*next = rb_entry(node, struct btrfs_delayed_item,
>> +					 rb_node);
>> +		} else
>> +			*next = NULL;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +struct btrfs_delayed_item *__btrfs_lookup_delayed_insertion_item(
>> +					struct btrfs_delayed_node *delayed_node,
>> +					struct btrfs_key *key)
>> +{
>> +	struct btrfs_delayed_item *item;
>> +
>> +	item = __btrfs_lookup_delayed_item(&delayed_node->ins_root, key,
>> +					   NULL, NULL);
>> +	return item;
>> +}
>> +
>> +struct btrfs_delayed_item *__btrfs_lookup_delayed_deletion_item(
>> +					struct btrfs_delayed_node *delayed_node,
>> +					struct btrfs_key *key)
>> +{
>> +	struct btrfs_delayed_item *item;
>> +
>> +	item = __btrfs_lookup_delayed_item(&delayed_node->del_root, key,
>> +					   NULL, NULL);
>> +	return item;
>> +}
>> +
>> +struct btrfs_delayed_item *__btrfs_search_delayed_insertion_item(
>> +					struct btrfs_delayed_node *delayed_node,
>> +					struct btrfs_key *key)
>> +{
>> +	struct btrfs_delayed_item *item, *next;
>> +
>> +	item = __btrfs_lookup_delayed_item(&delayed_node->ins_root, key,
>> +					   NULL, &next);
>> +	if (!item)
>> +		item = next;
>> +
>> +	return item;
>> +}
>> +
>> +struct btrfs_delayed_item *__btrfs_search_delayed_deletion_item(
>> +					struct btrfs_delayed_node *delayed_node,
>> +					struct btrfs_key *key)
>> +{
>> +	struct btrfs_delayed_item *item, *next;
>> +
>> +	item = __btrfs_lookup_delayed_item(&delayed_node->del_root, key,
>> +					   NULL, &next);
>> +	if (!item)
>> +		item = next;
>> +
>> +	return item;
>> +}
>> +
>> +static int __btrfs_add_delayed_item(struct btrfs_delayed_node *delayed_node,
>> +				    struct btrfs_delayed_item *ins,
>> +				    int action)
>> +{
>> +	struct rb_node **p, *node;
>> +	struct rb_node *parent_node = NULL;
>> +	struct rb_root *root;
>> +	struct btrfs_delayed_item *item;
>> +	int cmp;
>> +
>> +	if (action == BTRFS_DELAYED_INSERTION_ITEM)
>> +		root = &delayed_node->ins_root;
>> +	else if (action == BTRFS_DELAYED_DELETION_ITEM)
>> +		root = &delayed_node->del_root;
>> +	else
>> +		BUG();
>> +	p = &root->rb_node;
>> +	node = &ins->rb_node;
>> +
>> +	while (*p) {
>> +		parent_node = *p;
>> +		item = rb_entry(parent_node, struct btrfs_delayed_item,
>> +				 rb_node);
>> +
>> +		cmp = btrfs_comp_cpu_keys(&item->key, &ins->key);
>> +		if (cmp < 0)
>> +			p = &(*p)->rb_right;
>> +		else if (cmp > 0)
>> +			p = &(*p)->rb_left;
>> +		else
>> +			return -EEXIST;
>> +	}
>> +
>> +	rb_link_node(node, parent_node, p);
>> +	rb_insert_color(node, root);
>> +	ins->delayed_node = delayed_node;
>> +	ins->ins_or_del = action;
>> +
>> +	if (ins->key.type == BTRFS_DIR_INDEX_KEY &&
>> +	    action == BTRFS_DELAYED_INSERTION_ITEM &&
>> +	    ins->key.offset >= delayed_node->index_cnt)
>> +			delayed_node->index_cnt = ins->key.offset + 1;
>> +
>> +	delayed_node->count++;
>> +	atomic_inc(&delayed_node->delayed_root->items);
>> +	return 0;
>> +}
>> +
>> +static int __btrfs_add_delayed_insertion_item(struct btrfs_delayed_node *node,
>> +					      struct btrfs_delayed_item *item)
>> +{
>> +	return __btrfs_add_delayed_item(node, item,
>> +					BTRFS_DELAYED_INSERTION_ITEM);
>> +}
>> +
>> +static int __btrfs_add_delayed_deletion_item(struct btrfs_delayed_node *node,
>> +					     struct btrfs_delayed_item *item)
>> +{
>> +	return __btrfs_add_delayed_item(node, item,
>> +					BTRFS_DELAYED_DELETION_ITEM);
>> +}
>> +
>> +static void __btrfs_remove_delayed_item(struct btrfs_delayed_item *delayed_item)
>> +{
>> +	struct rb_root *root;
>> +	struct btrfs_delayed_root *delayed_root;
>> +
>> +	delayed_root = delayed_item->delayed_node->delayed_root;
>> +
>> +	BUG_ON(!delayed_root);
>> +	BUG_ON(delayed_item->ins_or_del != BTRFS_DELAYED_DELETION_ITEM &&
>> +	       delayed_item->ins_or_del != BTRFS_DELAYED_INSERTION_ITEM);
>> +
>> +	if (delayed_item->ins_or_del == BTRFS_DELAYED_INSERTION_ITEM)
>> +		root = &delayed_item->delayed_node->ins_root;
>> +	else
>> +		root = &delayed_item->delayed_node->del_root;
>> +
>> +	rb_erase(&delayed_item->rb_node, root);
>> +	delayed_item->delayed_node->count--;
>> +	atomic_dec(&delayed_root->items);
>> +	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_LOWER &&
>> +	    waitqueue_active(&delayed_root->wait))
>> +		wake_up(&delayed_root->wait);
>> +}
>> +
>> +void btrfs_free_delayed_item(struct btrfs_delayed_item *item) {
>> +	kfree(item);
>> +}
>
> Is this used outside of delayed-inode.c ? The only usage is in the next
> function. So I suggest at least 'static inline' or completely removing it.
>
>> +
>> +void btrfs_release_delayed_item(struct btrfs_delayed_item *item)
>> +{
>> +	if (item) {
>> +		__btrfs_remove_delayed_item(item);
>> +		btrfs_free_delayed_item(item);
>> +	}
>> +}
>> +
>> +struct btrfs_delayed_item *__btrfs_first_delayed_insertion_item(
>> +					struct btrfs_delayed_node *delayed_node)
>> +{
>> +	struct rb_node *p;
>> +	struct btrfs_delayed_item *item = NULL;
>> +
>> +	p = rb_first(&delayed_node->ins_root);
>> +	if (p)
>> +		item = rb_entry(p, struct btrfs_delayed_item, rb_node);
>> +
>> +	return item;
>> +}
>> +
>> +struct btrfs_delayed_item *__btrfs_first_delayed_deletion_item(
>> +					struct btrfs_delayed_node *delayed_node)
>> +{
>> +	struct rb_node *p;
>> +	struct btrfs_delayed_item *item = NULL;
>> +
>> +	p = rb_first(&delayed_node->del_root);
>> +	if (p)
>> +		item = rb_entry(p, struct btrfs_delayed_item, rb_node);
>> +
>> +	return item;
>> +}
>> +
>> +struct btrfs_delayed_item *__btrfs_next_delayed_item(
>> +						struct btrfs_delayed_item *item)
>> +{
>> +	struct rb_node *p;
>> +	struct btrfs_delayed_item *next = NULL;
>> +
>> +	p = rb_next(&item->rb_node);
>> +	if (p)
>> +		next = rb_entry(p, struct btrfs_delayed_item, rb_node);
>> +
>> +	return next;
>> +}
>> +
>> +static inline struct btrfs_delayed_node *btrfs_get_delayed_node(
>> +							struct inode *inode)
>> +{
>> +	struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
>> +	struct btrfs_delayed_node *delayed_node;
>> +
>> +	delayed_node = btrfs_inode->delayed_node;
>> +	if (delayed_node)
>> +		atomic_inc(&delayed_node->refs);
>> +
>> +	return delayed_node;
>> +}
>> +
>> +static inline struct btrfs_root *btrfs_get_fs_root(struct btrfs_root *root,
>> +						   u64 root_id)
>> +{
>> +	struct btrfs_key root_key;
>> +
>> +	if (root->objectid == root_id)
>> +		return root;
>> +
>> +	root_key.objectid = root_id;
>> +	root_key.type = BTRFS_ROOT_ITEM_KEY;
>> +	root_key.offset = (u64)-1;
>> +	return btrfs_read_fs_root_no_name(root->fs_info, &root_key);
>> +}
>> +
>> +int btrfs_delayed_item_reserve_metadata(struct btrfs_trans_handle *trans,
>> +					struct btrfs_root *root,
>> +					struct btrfs_delayed_item *item)
>> +{
>> +	struct btrfs_block_rsv *src_rsv;
>> +	struct btrfs_block_rsv *dst_rsv;
>> +	u64 num_bytes;
>> +	int ret;
>> +
>> +	if (!trans->bytes_reserved)
>> +		return 0;
>> +
>> +	src_rsv = trans->block_rsv;
>> +	dst_rsv = &root->fs_info->global_block_rsv;
>> +
>> +	num_bytes = btrfs_calc_trans_metadata_size(root, 1);
>> +	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
>> +	if (!ret) {
>> +		item->bytes_reserved = num_bytes;
>> +		item->block_rsv = dst_rsv;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +void btrfs_delayed_item_release_metadata(struct btrfs_root *root,
>> +					 struct btrfs_delayed_item *item)
>
> make it static
>
>> +{
>> +	if (!item->bytes_reserved)
>> +		return;
>> +
>> +	btrfs_block_rsv_release(root, item->block_rsv,
>> +				item->bytes_reserved);
>> +}
>> +
>> +int btrfs_delayed_inode_reserve_metadata(struct btrfs_trans_handle *trans,
>> +					 struct btrfs_root *root,
>> +					 struct btrfs_delayed_node *node)
>> +{
>> +	struct btrfs_block_rsv *src_rsv;
>> +	struct btrfs_block_rsv *dst_rsv;
>> +	u64 num_bytes;
>> +	int ret;
>> +
>> +	if (!trans->bytes_reserved)
>> +		return 0;
>> +
>> +	src_rsv = trans->block_rsv;
>> +	dst_rsv = &root->fs_info->global_block_rsv;
>> +
>> +	num_bytes = btrfs_calc_trans_metadata_size(root, 1);
>> +	ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
>> +	if (!ret) {
>> +		node->bytes_reserved = num_bytes;
>> +		node->block_rsv = dst_rsv;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +void btrfs_delayed_inode_release_metadata(struct btrfs_root *root,
>> +					  struct btrfs_delayed_node *node)
>> +{
>> +	if (!node->bytes_reserved)
>> +		return;
>> +
>> +	btrfs_block_rsv_release(root, node->block_rsv,
>> +				node->bytes_reserved);
>> +	node->bytes_reserved = 0;
>> +}
>> +
>> +/*
>> + * btrfs_batch_insert_dir_index_items - batch insert some continuous dir index
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> does not match the function name below, and does not seem to be useful
> for a static function (ie. not an API)
>
>> + *					items into the same leaf
>> + *
>> + * This function will insert some dir index items into the same leaf according
>> + * to the free space of the leaf.
>> + */
>
> I think it's not needed to repeat the same information three times
>
>> +static int btrfs_batch_insert_items(struct btrfs_trans_handle *trans,
>> +				struct btrfs_root *root,
>> +				struct btrfs_path *path,
>> +				struct btrfs_delayed_item *item)
>> +{
>> +	struct btrfs_delayed_item *curr, *next;
>> +	int free_space;
>> +	int total_data_size = 0, total_size = 0;
>> +	struct extent_buffer *leaf;
>> +	char *data_ptr;
>> +	struct btrfs_key *keys;
>> +	u32 *data_size;
>> +	struct list_head head;
>> +	int slot;
>> +	int nitems = 0;
>> +	int i;
>> +	int ret = 0;
>> +
>> +	BUG_ON(!path->nodes[0]);
>> +
>> +	leaf = path->nodes[0];
>> +	free_space = btrfs_leaf_free_space(root, leaf);
>> +	INIT_LIST_HEAD(&head);
>> +
>> +	next = item;
>> +
>> +	/*
>> +	 * count the number of the dir index items that we can insert them in
>> +	 * batch
>
> "count the number of dir index items that we can insert in batch"
>
>> +	 */
>> +	while (total_size + next->data_len + sizeof(struct btrfs_item) <=
>> +	       free_space) {
>> +		total_data_size += next->data_len;
>> +		total_size += next->data_len + sizeof(struct btrfs_item);
>> +		list_add_tail(&next->tree_list, &head);
>> +		nitems++;
>> +
>> +		curr = next;
>> +		next = __btrfs_next_delayed_item(curr);
>> +		if (!next)
>> +			break;
>> +
>> +		if (!btrfs_is_continuous_delayed_item(curr, next))
>> +			break;
>> +	}
>> +
>> +	if (!nitems) {
>> +		ret = 0;
>> +		goto out;
>> +	}
>> +
>> +	keys = kmalloc(sizeof(struct btrfs_key) * nitems, GFP_NOFS);
>> +	if (!keys) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	data_size = kmalloc(sizeof(u32) * nitems, GFP_NOFS);
>> +	if (!data_size) {
>> +		ret = -ENOMEM;
>> +		goto error;
>> +	}
>> +
>> +	/* get keys of all the delayed items */
>> +	i = 0;
>> +	list_for_each_entry(next, &head, tree_list) {
>> +		keys[i] = next->key;
>> +		data_size[i] = next->data_len;
>> +		i++;
>> +	}
>> +
>> +	/* insert the keys of the items */
>> +	ret = setup_items_for_insert(trans, root, path, keys, data_size,
>> +				     total_data_size, total_size, nitems);
>> +	if (ret)
>> +		goto error;
>> +
>> +	/* insert the dir index items */
>> +	slot = path->slots[0];
>> +	list_for_each_entry_safe(curr, next, &head, tree_list) {
>> +		data_ptr = btrfs_item_ptr(leaf, slot, char);
>> +		write_extent_buffer(leaf, &curr->data,
>> +				    (unsigned long)data_ptr,
>> +				    curr->data_len);
>> +		slot++;
>> +
>> +		btrfs_delayed_item_release_metadata(root, curr);
>> +
>> +		list_del(&curr->tree_list);
>> +		btrfs_release_delayed_item(curr);
>> +	}
>> +
>> +error:
>> +	kfree(data_size);
>> +	kfree(keys);
>> +out:
>> +	return ret;
>> +}
>> +
>> +/*
>> + * This helper can just do simple insertion that needn't extend item for new
>> + * data, such as directory name index insertion, inode insertion.
>> + */
>> +int btrfs_insert_delayed_item(struct btrfs_trans_handle *trans,
>
> make it 'static'
>
>> +			      struct btrfs_root *root,
>> +			      struct btrfs_path *path,
>> +			      struct btrfs_delayed_item *delayed_item)
>> +{
>> +	struct extent_buffer *leaf;
>> +	struct btrfs_item *item;
>> +	char *ptr;
>> +	int ret;
>> +
>> +	ret = btrfs_insert_empty_item(trans, root, path, &delayed_item->key,
>> +				      delayed_item->data_len);
>> +	if (ret < 0 && ret != -EEXIST)
>> +		return ret;
>> +
>> +	leaf = path->nodes[0];
>> +
>> +	item = btrfs_item_nr(leaf, path->slots[0]);
>> +	ptr = btrfs_item_ptr(leaf, path->slots[0], char);
>> +
>> +	write_extent_buffer(leaf, delayed_item->data, (unsigned long)ptr,
>> +			    delayed_item->data_len);
>> +	btrfs_mark_buffer_dirty(leaf);
>> +
>> +	btrfs_delayed_item_release_metadata(root, delayed_item);
>> +	return 0;
>> +}
>> +
>> +/*
>> + * we insert a item first, then if there are some continuous items, we try
>                  an
>
>> + * to insert those items into the same leaf.
>> + */
>> +static int btrfs_insert_delayed_items(struct btrfs_trans_handle *trans,
>> +				      struct btrfs_path *path,
>> +				      struct btrfs_root *root,
>> +				      struct btrfs_delayed_node *node)
>> +{
>> +	struct btrfs_delayed_item *curr, *prev;
>> +	int ret = 0;
>> +
>> +do_again:
>> +	mutex_lock(&node->mutex);
>> +	curr = __btrfs_first_delayed_insertion_item(node);
>> +	if (!curr)
>> +		goto insert_end;
>> +
>> +	ret = btrfs_insert_delayed_item(trans, root, path, curr);
>> +	if (ret < 0) {
>> +		btrfs_release_path(root, path);
>> +		goto insert_end;
>> +	}
>> +
>> +	prev = curr;
>> +	curr = __btrfs_next_delayed_item(prev);
>> +	if (curr && btrfs_is_continuous_delayed_item(prev, curr)) {
>> +		/* insert the coninuous items into the same leaf */
>                                ^^^^^^^^^
>
>> +		path->slots[0]++;
>> +		btrfs_batch_insert_items(trans, root, path, curr);
>> +	}
>> +	btrfs_release_delayed_item(prev);
>> +	btrfs_mark_buffer_dirty(path->nodes[0]);
>> +
>> +	btrfs_release_path(root, path);
>> +	mutex_unlock(&node->mutex);
>> +	goto do_again;
>> +
>> +insert_end:
>> +	mutex_unlock(&node->mutex);
>> +	return ret;
>> +}
>> +
>> +static int btrfs_batch_delete_items(struct btrfs_trans_handle *trans,
>> +				    struct btrfs_root *root,
>> +				    struct btrfs_path *path,
>> +				    struct btrfs_delayed_item *item)
>> +{
>> +	struct btrfs_delayed_item *curr, *next;
>> +	struct extent_buffer *leaf;
>> +	struct btrfs_key key;
>> +	struct list_head head;
>> +	int nitems, i, last_item;
>> +	int ret = 0;
>> +
>> +	BUG_ON(!path->nodes[0]);
>> +
>> +	leaf = path->nodes[0];
>> +
>> +	i = path->slots[0];
>> +	last_item = btrfs_header_nritems(leaf) - 1;
>> +	if (i > last_item)
>> +		return -ENOENT;		/* Is errno suitable? */
>
> I don't know, but I'd put a FIXME there :)
>
>> +
>> +	next = item;
>> +	INIT_LIST_HEAD(&head);
>> +	btrfs_item_key_to_cpu(leaf, &key, i);
>> +	nitems = 0;
>> +	/*
>> +	 * count the number of the dir index items that we can delete them in
>> +	 * batch
>> +	 */
>
> use same wording as above
>
>> +	while (btrfs_comp_cpu_keys(&next->key, &key) == 0) {
>> +		list_add_tail(&next->tree_list, &head);
>> +		nitems++;
>> +
>> +		curr = next;
>> +		next = __btrfs_next_delayed_item(curr);
>> +		if (!next)
>> +			break;
>> +
>> +		if (!btrfs_is_continuous_delayed_item(curr, next))
>> +			break;
>> +
>> +		i++;
>> +		if (i > last_item)
>> +			break;
>> +		btrfs_item_key_to_cpu(leaf, &key, i);
>> +	}
>> +
>> +	if (!nitems)
>> +		return 0;
>> +
>> +	ret = btrfs_del_items(trans, root, path, path->slots[0], nitems);
>> +	if (ret)
>> +		goto out;
>> +
>> +	list_for_each_entry_safe(curr, next, &head, tree_list) {
>> +		btrfs_delayed_item_release_metadata(root, curr);
>> +		list_del(&curr->tree_list);
>> +		btrfs_release_delayed_item(curr);
>> +	}
>> +
>> +out:
>> +	return ret;
>> +}
>> +
>> +static int btrfs_delete_delayed_items(struct btrfs_trans_handle *trans,
>> +				      struct btrfs_path *path,
>> +				      struct btrfs_root *root,
>> +				      struct btrfs_delayed_node *node)
>> +{
>> +	struct btrfs_delayed_item *curr, *prev;
>> +	int ret = 0;
>> +
>> +do_again:
>> +	mutex_lock(&node->mutex);
>> +	curr = __btrfs_first_delayed_deletion_item(node);
>> +	if (!curr)
>> +		goto delete_fail;
>> +
>> +	ret = btrfs_search_slot(trans, root, &curr->key, path, -1, 1);
>> +	if (ret < 0)
>> +		goto delete_fail;
>> +	else if (ret > 0) {
>> +		/*
>> +		 * can't find the item which the node points to, so this node
>> +		 * is invalid, just drop it.
>> +		 */
>> +		prev = curr;
>> +		curr = __btrfs_next_delayed_item(prev);
>> +		btrfs_release_delayed_item(prev);
>> +		ret = 0;
>> +		btrfs_release_path(root, path);
>> +		if (curr)
>> +			goto do_again;
>> +		else
>> +			goto delete_fail;
>> +	}
>> +
>> +	btrfs_batch_delete_items(trans, root, path, curr);
>> +	btrfs_release_path(root, path);
>> +	mutex_unlock(&node->mutex);
>> +	goto do_again;
>> +
>> +delete_fail:
>> +	btrfs_release_path(root, path);
>> +	mutex_unlock(&node->mutex);
>> +	return ret;
>> +}
>> +
>> +static void btrfs_release_delayed_inode(struct btrfs_delayed_node *delayed_node)
>> +{
>> +	if (delayed_node && delayed_node->inode_dirty) {
>> +		BUG_ON(!delayed_node->delayed_root);
>> +		delayed_node->inode_dirty = 0;
>> +		delayed_node->count--;
>> +		atomic_dec(&delayed_node->delayed_root->items);
>> +		if (atomic_read(&delayed_node->delayed_root->items) <
>> +		    BTRFS_DELAYED_LOWER &&
>> +		    waitqueue_active(&delayed_node->delayed_root->wait))
>> +			wake_up(&delayed_node->delayed_root->wait);
>> +	}
>> +}
>> +
>> +static int btrfs_update_delayed_inode(struct btrfs_trans_handle *trans,
>> +				      struct btrfs_root *root,
>> +				      struct btrfs_path *path,
>> +				      struct btrfs_delayed_node *node)
>> +{
>> +	struct btrfs_key key;
>> +	struct btrfs_inode_item *inode_item;
>> +	struct extent_buffer *leaf;
>> +	int ret;
>> +
>> +	mutex_lock(&node->mutex);
>> +	if (!node->inode_dirty) {
>> +		mutex_unlock(&node->mutex);
>> +		return 0;
>> +	}
>> +
>> +	key.objectid = node->inode_id;
>> +	btrfs_set_key_type(&key, BTRFS_INODE_ITEM_KEY);
>> +	key.offset = 0;
>> +	ret = btrfs_lookup_inode(trans, root, path, &key, 1);
>> +	if (ret > 0) {
>> +		btrfs_release_path(root, path);
>> +		mutex_unlock(&node->mutex);
>> +		return -ENOENT;
>> +	} else if (ret < 0) {
>> +		mutex_unlock(&node->mutex);
>> +		return ret;
>> +	}
>> +
>> +	btrfs_unlock_up_safe(path, 1);
>> +	leaf = path->nodes[0];
>> +	inode_item = btrfs_item_ptr(leaf, path->slots[0],
>> +				    struct btrfs_inode_item);
>> +	write_extent_buffer(leaf, &node->inode_item, (unsigned long)inode_item,
>> +			    sizeof(struct btrfs_inode_item));
>> +	btrfs_mark_buffer_dirty(leaf);
>> +	btrfs_release_path(root, path);
>> +
>> +	btrfs_delayed_inode_release_metadata(root, node);
>> +	btrfs_release_delayed_inode(node);
>> +	mutex_unlock(&node->mutex);
>> +
>> +	return 0;
>> +}
>> +
>> +static int btrfs_prepare_run_delayed_node(struct btrfs_delayed_node *node)
>> +{
>> +	spin_lock(&node->lock);
>> +	if (node->delayed_doing) {
>> +		spin_unlock(&node->lock);
>> +		return 1;
>> +	}
>> +	node->delayed_doing = 1;
>> +	spin_unlock(&node->lock);
>> +
>> +	return 0;
>> +}
>> +
>> +static inline void btrfs_end_run_delayed_node(struct btrfs_delayed_node *node)
>> +{
>> +	spin_lock(&node->lock);
>> +	node->delayed_doing = 0;
>> +	spin_unlock(&node->lock);
>> +}
>> +
>> +/* Called when committing the transaction. */
>> +int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
>> +			    struct btrfs_root *root)
>> +{
>> +	struct btrfs_delayed_root *delayed_root;
>> +	struct btrfs_delayed_node *curr_node, *prev_node;
>> +	struct btrfs_path *path;
>> +	int ret = 0;
>> +
>> +	path = btrfs_alloc_path();
>> +	if (!path)
>> +		return -ENOMEM;
>> +	path->leave_spinning = 1;
>> +
>> +	delayed_root = btrfs_get_delayed_root(root);
>> +
>> +	curr_node = btrfs_first_delayed_node(delayed_root);
>> +	while (curr_node) {
>> +		/*
>> +		 * We needn't check whether this delayed node is being
>> +		 * committed, because when committing the transaction only
>> +		 * the current task can touch the delayed nodes; other tasks
>> +		 * cannot get the transaction handle, so they cannot reach
>> +		 * the delayed nodes either.
>> +		 */
>> +
>> +		/*
>> +		 * Check the root: if it is not the one the delayed item
>> +		 * should be inserted into, look up the right root.
>> +		 */
>> +		if (root->objectid != curr_node->root_id) {
>> +			root = btrfs_get_fs_root(root, curr_node->root_id);
>> +			BUG_ON(IS_ERR_OR_NULL(root));
>> +		}
>> +
>> +		ret = btrfs_insert_delayed_items(trans, path, root,
>> +						 curr_node);
>> +		if (!ret)
>> +			ret = btrfs_delete_delayed_items(trans, path, root,
>> +							 curr_node);
>> +		if (!ret)
>> +			ret = btrfs_update_delayed_inode(trans, root, path,
>> +							 curr_node);
>> +		if (ret) {
>> +			btrfs_release_delayed_node(curr_node);
>> +			break;
>> +		}
>> +
>> +		prev_node = curr_node;
>> +		curr_node = btrfs_next_delayed_node(curr_node);
>> +		btrfs_release_delayed_node(prev_node);
>> +	}
>> +
>> +	btrfs_free_path(path);
>> +	return ret;
>> +}
>> +
>> +int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
>> +				     struct btrfs_root *root,
>> +				     struct inode *inode)
>> +{
>> +	struct btrfs_delayed_node *delayed_node = btrfs_get_delayed_node(inode);
>> +	struct btrfs_path *path;
>> +	int ret;
>> +
>> +	if (!delayed_node)
>> +		return 0;
>> +
>> +	mutex_lock(&delayed_node->mutex);
>> +	if (!delayed_node->count) {
>> +		mutex_unlock(&delayed_node->mutex);
>> +		btrfs_release_delayed_node(delayed_node);
>> +		return 0;
>> +	}
>> +	mutex_unlock(&delayed_node->mutex);
>> +
>> +	path = btrfs_alloc_path();
>> +	if (!path) {
>> +		btrfs_release_delayed_node(delayed_node);
>> +		return -ENOMEM;
>> +	}
>> +	path->leave_spinning = 1;
>> +
>> +	ret = btrfs_insert_delayed_items(trans, path, root, delayed_node);
>> +	if (!ret)
>> +		ret = btrfs_delete_delayed_items(trans, path, root,
>> +						 delayed_node);
>> +	if (!ret)
>> +		ret = btrfs_update_delayed_inode(trans, root, path,
>> +						 delayed_node);
>> +	btrfs_free_path(path);
>> +	btrfs_release_delayed_node(delayed_node);
>> +	return ret;
>> +}
>> +
>> +int btrfs_remove_delayed_node(struct inode *inode)
>> +{
>> +	struct btrfs_trans_handle *trans;
>> +	struct btrfs_root *root = BTRFS_I(inode)->root;
>> +	struct btrfs_delayed_node *delayed_node = BTRFS_I(inode)->delayed_node;
>> +	int ret;
>> +
>> +	if (!delayed_node)
>> +		return 0;
>> +
>> +	trans = btrfs_join_transaction(root, 0);
>> +	if (IS_ERR(trans))
>> +		return PTR_ERR(trans);
>> +
>> +	ret = btrfs_commit_inode_delayed_items(trans, root, inode);
>> +	if (ret)
>> +		goto out;
>> +
>> +	BUG_ON(delayed_node->in_list);
>> +	BUG_ON(delayed_node->count);
>> +
>> +	BTRFS_I(inode)->delayed_node = NULL;
>> +	btrfs_release_delayed_node(delayed_node);
>> +out:
>> +	btrfs_end_transaction(trans, root);
>> +
>> +	return ret;
>> +}
>> +
>> +struct btrfs_async_delayed_node {
>> +	struct btrfs_root *root;
>> +	struct btrfs_delayed_node *delayed_node;
>> +	struct btrfs_work work;
>> +};
>> +
>> +static void btrfs_async_run_delayed_node_done(struct btrfs_work *work)
>> +{
>> +	struct btrfs_async_delayed_node *async_node;
>> +	struct btrfs_trans_handle *trans;
>> +	struct btrfs_path *path;
>> +	struct btrfs_delayed_node *delayed_node;
>> +	struct btrfs_root *root;
>> +	int need_requeue = 1;
>> +	int ret;
>> +
>> +	async_node = container_of(work, struct btrfs_async_delayed_node, work);
>> +
>> +	path = btrfs_alloc_path();
>> +	if (!path)
>> +		goto out;
>> +	path->leave_spinning = 1;
>> +
>> +	root = async_node->root;
>> +	BUG_ON(!root);
>> +
>> +	trans = btrfs_join_transaction(root, 0);
>> +	if (IS_ERR(trans)) {
>> +		btrfs_free_path(path);
> remove
>
>> +		goto out;
> goto out_free_path;
>
>> +	}
>> +
>> +	delayed_node = async_node->delayed_node;
>> +
>> +	root = btrfs_get_fs_root(async_node->root, delayed_node->root_id);
>> +	BUG_ON(IS_ERR_OR_NULL(root));
>> +
>> +	ret = btrfs_insert_delayed_items(trans, path, root, delayed_node);
>> +	if (!ret)
>> +		ret = btrfs_delete_delayed_items(trans, path, root,
>> +						 delayed_node);
>> +
>> +	if (!ret)
>> +		btrfs_update_delayed_inode(trans, root, path, delayed_node);
>> +
>> +	/* Maybe new delayed items have been inserted */
>> +	mutex_lock(&delayed_node->mutex);
>> +	if (!delayed_node->count)
>> +		need_requeue = 0;
>> +	mutex_unlock(&delayed_node->mutex);
>> +
>> +	btrfs_end_transaction(trans, root);
>
> out_free_path:
>
>> +	btrfs_free_path(path);
>> +out:
>> +	if (need_requeue)
>> +		btrfs_requeue_work(&async_node->work);
>> +	else {
>> +		btrfs_end_run_delayed_node(delayed_node);
>> +		btrfs_release_delayed_node(delayed_node);
>> +		kfree(async_node);
>> +	}
>> +}
>> +
>> +static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
>> +				     struct btrfs_root *root, int all)
>> +{
>> +	struct btrfs_async_delayed_node *async_node;
>> +	struct btrfs_delayed_node *curr, *next;
>> +	int count = 0;
>> +
>> +	curr = btrfs_first_delayed_node(delayed_root);
>> +again:
>> +	if (!curr)
>> +		return 0;
>> +
>> +	if (btrfs_prepare_run_delayed_node(curr)) {
>> +		next = btrfs_next_delayed_node(curr);
>> +		btrfs_release_delayed_node(curr);
>> +		curr = next;
>> +		goto skip;
>> +	}
>> +
>> +	async_node = kmalloc(sizeof(*async_node), GFP_NOFS);
>> +	if (!async_node) {
>> +		btrfs_end_run_delayed_node(curr);
>> +		btrfs_release_delayed_node(curr);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	async_node->root = root;
>> +	async_node->delayed_node = curr;
>> +
>> +	async_node->work.func = btrfs_async_run_delayed_node_done;
>> +	async_node->work.flags = 0;
>> +
>> +	btrfs_queue_worker(&root->fs_info->delayed_workers, &async_node->work);
>> +	count++;
>> +
>> +	curr = btrfs_next_delayed_node(curr);
>> +skip:
>> +	if (all || count < 4)
>> +		goto again;
>> +	else if (curr)
>> +		btrfs_release_delayed_node(curr);
>> +	return 0;
>> +}
>> +
>> +static void btrfs_balance_delayed_items(struct btrfs_trans_handle *trans,
>> +					struct btrfs_root *root)
>> +{
>> +	struct btrfs_delayed_root *delayed_root;
>> +
>> +	delayed_root = btrfs_get_delayed_root(root);
>> +	if (atomic_read(&delayed_root->items) < BTRFS_DELAYED_BACKGROUND)
>> +		return;
>> +
>> +	if (atomic_read(&delayed_root->items) >= BTRFS_DELAYED_WRITEBACK) {
>> +		int ret;
>> +		ret = btrfs_wq_run_delayed_node(delayed_root, root, 1);
>> +		if (ret)
>> +			return;
>> +
>> +		wait_event(delayed_root->wait,
>> +			   (atomic_read(&delayed_root->items) <
>> +			    BTRFS_DELAYED_LOWER));
>> +		return;
>> +	}
>> +
>> +	btrfs_wq_run_delayed_node(delayed_root, root, 0);
>> +}
>> +
>> +int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
>> +				   struct btrfs_root *root, const char *name,
>> +				   int name_len, struct inode *dir,
>> +				   struct btrfs_disk_key *disk_key, u8 type,
>> +				   u64 index)
>> +{
>> +	struct btrfs_delayed_node *delayed_node;
>> +	struct btrfs_delayed_item *delayed_item;
>> +	struct btrfs_dir_item *dir_item;
>> +	int ret;
>> +
>> +	delayed_node = btrfs_get_or_create_delayed_node(dir);
>> +	if (IS_ERR(delayed_node))
>> +		return PTR_ERR(delayed_node);
>> +
>> +	delayed_item = btrfs_alloc_delayed_item(sizeof(*dir_item) + name_len);
>> +	if (!delayed_item) {
>> +		ret = -ENOMEM;
>> +		goto release_node;
>> +	}
>> +
>> +	ret = btrfs_delayed_item_reserve_metadata(trans, root, delayed_item);
>> +	/*
>> +	 * we have reserved the enough space when we start a new
>                              ^^^ remove
>
>> +	 * transaction, so reserving metadata fails is impossible
>                                                failure
>
>> +	 */
>> +	BUG_ON(ret);
>> +
>> +	delayed_item->key.objectid = dir->i_ino;
>> +	btrfs_set_key_type(&delayed_item->key, BTRFS_DIR_INDEX_KEY);
>> +	delayed_item->key.offset = index;
>> +
>> +	dir_item = (struct btrfs_dir_item *)delayed_item->data;
>> +	dir_item->location = *disk_key;
>> +	dir_item->transid = cpu_to_le64(trans->transid);
>> +	dir_item->data_len = 0;
>> +	dir_item->name_len = cpu_to_le16(name_len);
>> +	dir_item->type = type;
>> +	memcpy((char *)(dir_item + 1), name, name_len);
>> +
>> +	mutex_lock(&delayed_node->mutex);
>> +	ret = __btrfs_add_delayed_insertion_item(delayed_node, delayed_item);
>
> ret can only become 0 or -EEXIST; a printk here would help understand
> what happened.
>
>> +	BUG_ON(ret);
>> +	mutex_unlock(&delayed_node->mutex);
>> +
>> +release_node:
>> +	btrfs_release_delayed_node(delayed_node);
>> +
>> +	btrfs_balance_delayed_items(trans, root);
>> +	return ret;
>> +}
>> +
>> +static int btrfs_delete_delayed_insertion_item(struct btrfs_root *root,
>> +					       struct btrfs_delayed_node *node,
>> +					       struct btrfs_key *key)
>> +{
>> +	struct btrfs_delayed_item *item;
>> +
>> +	mutex_lock(&node->mutex);
>> +	item = __btrfs_lookup_delayed_insertion_item(node, key);
>> +	if (!item) {
>> +		mutex_unlock(&node->mutex);
>> +		return 1;
>> +	}
>> +
>> +	btrfs_delayed_item_release_metadata(root, item);
>> +	btrfs_release_delayed_item(item);
>> +	mutex_unlock(&node->mutex);
>> +	return 0;
>> +}
>> +
>> +int btrfs_delete_delayed_dir_index(struct btrfs_trans_handle *trans,
>> +				   struct btrfs_root *root, struct inode *dir,
>> +				   u64 index)
>> +{
>> +	struct btrfs_delayed_node *node;
>> +	struct btrfs_delayed_item *item;
>> +	struct btrfs_key item_key;
>> +	int ret;
>> +
>> +	node = btrfs_get_or_create_delayed_node(dir);
>> +	if (IS_ERR(node))
>> +		return PTR_ERR(node);
>> +
>> +	item_key.objectid = dir->i_ino;
>> +	btrfs_set_key_type(&item_key, BTRFS_DIR_INDEX_KEY);
>> +	item_key.offset = index;
>> +
>> +	ret = btrfs_delete_delayed_insertion_item(root, node, &item_key);
>> +	if (!ret)
>> +		goto end;
>> +
>> +	item = btrfs_alloc_delayed_item(0);
>> +	if (!item) {
>> +		ret = -ENOMEM;
>> +		goto end;
>> +	}
>> +
>> +	item->key = item_key;
>> +
>> +	ret = btrfs_delayed_item_reserve_metadata(trans, root, item);
>> +	/*
>> +	 * we have reserved the enough space when we start a new
>> +	 * transaction, so reserving metadata fails is impossible
>> +	 */
>> +	BUG_ON(ret);
>> +
>> +	mutex_lock(&node->mutex);
>> +	ret = __btrfs_add_delayed_deletion_item(node, item);
>> +	BUG_ON(ret);
>> +	mutex_unlock(&node->mutex);
>> +end:
>> +	btrfs_release_delayed_node(node);
>> +	btrfs_balance_delayed_items(trans, root);
>> +	return ret;
>> +}
>> +
>> +int btrfs_inode_delayed_dir_index_count(struct inode *inode)
>> +{
>> +	struct btrfs_delayed_node *delayed_node = BTRFS_I(inode)->delayed_node;
>> +	int ret = 0;
>> +
>> +	if (!delayed_node)
>> +		return -ENOENT;
>> +
>> +	/*
>> +	 * Since we have held i_mutex of this directory, it is impossible that
>> +	 * a new directory index is added into the delayed node and index_cnt
>> +	 * is updated now. So we needn't lock the delayed node.
>> +	 */
>> +	if (!delayed_node->index_cnt)
>> +		return -EINVAL;
>> +
>> +	BTRFS_I(inode)->index_cnt = delayed_node->index_cnt;
>> +	return ret;
>> +}
>> +
>> +struct btrfs_delayed_node *btrfs_get_locked_delayed_node(struct inode *inode)
>> +{
>> +	struct btrfs_delayed_node *delayed_node;
>> +
>> +	delayed_node = btrfs_get_delayed_node(inode);
>> +	if (delayed_node)
>> +		mutex_lock(&delayed_node->mutex);
>> +
>> +	return delayed_node;
>> +}
>> +
>> +void btrfs_put_locked_delayed_node(struct btrfs_delayed_node *node)
>> +{
>> +	if (!node)
>> +		return;
>> +
>> +	mutex_unlock(&node->mutex);
>> +	/*
>> +	 * This delayed node is still cached in the btrfs inode, so refs
>> +	 * must be > 1 now, and we needn't check it is going to be freed
>> +	 * or not.
>> +	 *
>> +	 * Besiede that, this function is used to read dir, we do not
>             Besides
>
>> +	 * insert/delete delayed items in this period. so we also needn't
>                                                         So
>
>> +	 * requeue or dequeue this delayed node.
>> +	 */
>> +	atomic_dec(&node->refs);
>> +}
>> +
>> +int btrfs_should_delete_dir_index(struct btrfs_delayed_node *delayed_node,
>> +				  struct btrfs_delayed_item **item,
>> +				  u64 index)
>> +{
>> +	if (!delayed_node)
>> +		return 0;
>> +
>> +	if (IS_ERR(*item))
>> +		return 0;
>> +
>> +	if (!(*item))
>> +		*item = __btrfs_first_delayed_deletion_item(delayed_node);
>> +
>> +	while (*item) {
>> +		if ((*item)->key.offset < index) {
>> +			*item = __btrfs_next_delayed_item(*item);
>> +			continue;
>> +		} else if ((*item)->key.offset > index)
>> +			break;
>> +
>> +		*item = __btrfs_next_delayed_item(*item);
>> +		if (!(*item))
>> +			*item = ERR_PTR(-ERANGE);
>> +		return 1;
>> +	}
>> +
>> +	if (!(*item))
>> +		*item = ERR_PTR(-ERANGE);
>> +	return 0;
>> +}
>> +
>> +/*
>> + * btrfs_readdir_delayed_dir_index - read dir info stored in the delayed tree
>> + *
>> + */
>> +int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
>> +				    filldir_t filldir,
>> +				    struct btrfs_delayed_node *delayed_node)
>> +{
>> +	struct btrfs_dir_item *di;
>> +	struct btrfs_delayed_item *item;
>> +	struct btrfs_key location;
>> +	char *name;
>> +	int name_len;
>> +	int over = 0;
>> +	unsigned char d_type;
>> +
>> +	if (!delayed_node)
>> +		return 0;
>> +
>> +	/*
>> +	 * Changing the data of the delayed item is impossible. So
>> +	 * we needn't lock them. And we have held i_mutex of the directory,
>> +	 * nobody can delete any directory indexs now.
>                                             indexes
>
>> +	 */
>> +	item = __btrfs_first_delayed_insertion_item(delayed_node);
>> +	while (item) {
>> +		if (item->key.offset < filp->f_pos) {
>> +			item = __btrfs_next_delayed_item(item);
>> +			continue;
>> +		}
>> +
>> +		filp->f_pos = item->key.offset;
>> +
>> +		di = (struct btrfs_dir_item *)item->data;
>> +		name = (char *)(di + 1);
>> +		name_len = le16_to_cpu(di->name_len);
>> +
>> +		d_type = btrfs_filetype_table[di->type];
>> +		btrfs_disk_key_to_cpu(&location, &di->location);
>> +
>> +		over = filldir(dirent, name, name_len, item->key.offset,
>> +			       location.objectid, d_type);
>> +
>> +		if (over)
>> +			return 1;
>> +		item = __btrfs_next_delayed_item(item);
>> +	}
>> +	return 0;
>> +}
>> +
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_generation, struct btrfs_inode_item,
>> +			 generation, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_sequence, struct btrfs_inode_item,
>> +			 sequence, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_transid, struct btrfs_inode_item,
>> +			 transid, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_size, struct btrfs_inode_item, size, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_nbytes, struct btrfs_inode_item,
>> +			 nbytes, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_block_group, struct btrfs_inode_item,
>> +			 block_group, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_nlink, struct btrfs_inode_item, nlink, 32);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_uid, struct btrfs_inode_item, uid, 32);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_gid, struct btrfs_inode_item, gid, 32);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_mode, struct btrfs_inode_item, mode, 32);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_rdev, struct btrfs_inode_item, rdev, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_inode_flags, struct btrfs_inode_item, flags, 64);
>> +
>> +BTRFS_SETGET_STACK_FUNCS(stack_timespec_sec, struct btrfs_timespec, sec, 64);
>> +BTRFS_SETGET_STACK_FUNCS(stack_timespec_nsec, struct btrfs_timespec, nsec, 32);
>> +
>> +static void fill_stack_inode_item(struct btrfs_trans_handle *trans,
>> +				  struct btrfs_inode_item *inode_item,
>> +				  struct inode *inode)
>> +{
>> +	btrfs_set_stack_inode_uid(inode_item, inode->i_uid);
>> +	btrfs_set_stack_inode_gid(inode_item, inode->i_gid);
>> +	btrfs_set_stack_inode_size(inode_item, BTRFS_I(inode)->disk_i_size);
>> +	btrfs_set_stack_inode_mode(inode_item, inode->i_mode);
>> +	btrfs_set_stack_inode_nlink(inode_item, inode->i_nlink);
>> +	btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(inode));
>> +	btrfs_set_stack_inode_generation(inode_item,
>> +					 BTRFS_I(inode)->generation);
>> +	btrfs_set_stack_inode_sequence(inode_item, BTRFS_I(inode)->sequence);
>> +	btrfs_set_stack_inode_transid(inode_item, trans->transid);
>> +	btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev);
>> +	btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags);
>> +	btrfs_set_stack_inode_block_group(inode_item,
>> +					  BTRFS_I(inode)->block_group);
>> +
>> +	btrfs_set_stack_timespec_sec(btrfs_inode_atime(inode_item),
>> +				     inode->i_atime.tv_sec);
>> +	btrfs_set_stack_timespec_nsec(btrfs_inode_atime(inode_item),
>> +				      inode->i_atime.tv_nsec);
>> +
>> +	btrfs_set_stack_timespec_sec(btrfs_inode_mtime(inode_item),
>> +				     inode->i_mtime.tv_sec);
>> +	btrfs_set_stack_timespec_nsec(btrfs_inode_mtime(inode_item),
>> +				      inode->i_mtime.tv_nsec);
>> +
>> +	btrfs_set_stack_timespec_sec(btrfs_inode_ctime(inode_item),
>> +				     inode->i_ctime.tv_sec);
>> +	btrfs_set_stack_timespec_nsec(btrfs_inode_ctime(inode_item),
>> +				      inode->i_ctime.tv_nsec);
>> +}
>> +
>> +int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans,
>> +			       struct btrfs_root *root, struct inode *inode)
>> +{
>> +	struct btrfs_delayed_node *delayed_node;
>> +	int ret;
>> +
>> +	delayed_node = btrfs_get_or_create_delayed_node(inode);
>> +	if (IS_ERR(delayed_node))
>> +		return PTR_ERR(delayed_node);
>> +
>> +	mutex_lock(&delayed_node->mutex);
>> +	if (delayed_node->inode_dirty) {
>> +		fill_stack_inode_item(trans, &delayed_node->inode_item, inode);
>> +		goto release_node;
>> +	}
>> +
> (Extra empty line)
>
>> +
>> +	ret = btrfs_delayed_inode_reserve_metadata(trans, root, delayed_node);
>> +	/*
>> +	 * we must reserve the enough space when we start a new
>                             ^^^ remove
>> +	 * transaction, so reserving metadata fails is impossible
>                                                failure
>
>> +	 */
>> +	BUG_ON(ret);
>> +
>> +	fill_stack_inode_item(trans, &delayed_node->inode_item, inode);
>> +	delayed_node->inode_dirty = 1;
>> +	delayed_node->count++;
>> +	atomic_inc(&delayed_node->delayed_root->items);
>> +release_node:
>> +	mutex_unlock(&delayed_node->mutex);
>> +	btrfs_release_delayed_node(delayed_node);
>> +
>> +	btrfs_balance_delayed_items(trans, root);
>> +	return ret;
>> +}
>> +
>> +int btrfs_kill_delayed_inode_items(struct btrfs_trans_handle *trans,
>> +				   struct inode *inode)
>> +{
>> +	struct btrfs_root *root = BTRFS_I(inode)->root;
>> +	struct btrfs_delayed_node *delayed_node;
>> +	struct btrfs_delayed_item *curr_item, *prev_item;
>> +
>> +	delayed_node = btrfs_get_delayed_node(inode);
>> +	if (!delayed_node)
>> +		return 0;
>> +
>> +	mutex_lock(&delayed_node->mutex);
>> +	curr_item = __btrfs_first_delayed_insertion_item(delayed_node);
>> +	while (curr_item) {
>> +		btrfs_delayed_item_release_metadata(root, curr_item);
>> +		prev_item = curr_item;
>> +		curr_item = __btrfs_next_delayed_item(prev_item);
>> +		btrfs_release_delayed_item(prev_item);
>> +	}
>> +
>> +	curr_item = __btrfs_first_delayed_deletion_item(delayed_node);
>> +	while (curr_item) {
>> +		btrfs_delayed_item_release_metadata(root, curr_item);
>> +		prev_item = curr_item;
>> +		curr_item = __btrfs_next_delayed_item(prev_item);
>> +		btrfs_release_delayed_item(prev_item);
>> +	}
>> +
>> +	if (delayed_node->inode_dirty) {
>> +		btrfs_delayed_inode_release_metadata(root, delayed_node);
>> +		btrfs_release_delayed_inode(delayed_node);
>> +	}
>> +	mutex_unlock(&delayed_node->mutex);
>> +
>> +	btrfs_release_delayed_node(delayed_node);
>> +
>> +	return 0;
>> +}
>> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
>> new file mode 100644
>> index 0000000..c1f8c5f
>> --- /dev/null
>> +++ b/fs/btrfs/delayed-inode.h
>> @@ -0,0 +1,122 @@
>> +/*
>> + * Copyright (C) 2011 Fujitsu.  All rights reserved.
>> + * Written by Miao Xie <miaox@cn.fujitsu.com>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public
>> + * License v2 as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public
>> + * License along with this program; if not, write to the
>> + * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
>> + * Boston, MA 021110-1307, USA.
>> + */
>> +
>> +#ifndef __DELAYED_TREE_OPERATION_H
>> +#define __DELAYED_TREE_OPERATION_H
>> +
>> +#include <linux/rbtree.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/mutex.h>
>> +#include <linux/list.h>
>> +#include <linux/wait.h>
>> +#include <asm/atomic.h>
>> +
>> +#include "ctree.h"
>> +
>> +/* types of the delayed item */
>> +#define BTRFS_DELAYED_INSERTION_ITEM	1
>> +#define BTRFS_DELAYED_DELETION_ITEM	2
>> +
>> +struct btrfs_delayed_root {
>> +	spinlock_t lock;
>> +	struct list_head node_list;
>> +	atomic_t items;		/* for delayed items */
>> +	int nodes;		/* for delayed nodes */
>> +	wait_queue_head_t wait;
>> +};
>> +
>> +struct btrfs_delayed_node {
>> +	struct list_head list;
>> +	u64 root_id;
>> +	u64 inode_id;
>> +	struct btrfs_delayed_root *delayed_root;
>> +	struct rb_root ins_root;
>> +	struct rb_root del_root;
>> +	struct mutex mutex;
>> +	spinlock_t lock;	/* protect delayed_doing */
>> +	struct btrfs_inode_item inode_item;
>> +	u64 bytes_reserved;
>> +	struct btrfs_block_rsv *block_rsv;
>> +	atomic_t refs;
>> +	u64 index_cnt;
>
>> +	int count;
>> +	int in_list;
>> +	int inode_dirty;
>> +	int delayed_doing;
>
> These could go to a bitfield, saves some bytes.
>
> I haven't looked in detail, but the fields may be reordered, namely the
> u64 ones, so there is no unnecessary alignment.
>
>> +};
>> +
>> +struct btrfs_delayed_item {
>> +	struct rb_node rb_node;
>> +	struct btrfs_key key;
>> +	struct list_head tree_list;	/* used for batch insert/delete items */
>> +	u64 bytes_reserved;
>> +	struct btrfs_block_rsv *block_rsv;
>> +	struct btrfs_delayed_node *delayed_node;
>> +	int ins_or_del;
>> +	u32 data_len;
>> +	char data[0];
>> +};
>> +
>> +static inline void btrfs_init_delayed_root(
>> +				struct btrfs_delayed_root *delayed_root)
>> +{
>> +	atomic_set(&delayed_root->items, 0);
>> +	delayed_root->nodes = 0;
>> +	spin_lock_init(&delayed_root->lock);
>> +	init_waitqueue_head(&delayed_root->wait);
>> +	INIT_LIST_HEAD(&delayed_root->node_list);
>> +}
>> +
>> +int btrfs_insert_delayed_dir_index(struct btrfs_trans_handle *trans,
>> +				   struct btrfs_root *root, const char *name,
>> +				   int name_len, struct inode *dir,
>> +				   struct btrfs_disk_key *disk_key, u8 type,
>> +				   u64 index);
>> +
>> +int btrfs_delete_delayed_dir_index(struct btrfs_trans_handle *trans,
>> +				   struct btrfs_root *root, struct inode *dir,
>> +				   u64 index);
>> +
>> +int btrfs_inode_delayed_dir_index_count(struct inode *inode);
>> +
>> +int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
>> +			    struct btrfs_root *root);
>> +
>> +int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
>> +				     struct btrfs_root *root,
>> +				     struct inode *inode);
>> +/* Used for evicting the inode. */
>> +int btrfs_remove_delayed_node(struct inode *inode);
>> +int btrfs_kill_delayed_inode_items(struct btrfs_trans_handle *trans,
>> +				   struct inode *inode);
>> +
>> +
>> +int btrfs_delayed_update_inode(struct btrfs_trans_handle *trans,
>> +			       struct btrfs_root *root, struct inode *inode);
>> +
>> +/* Used for readdir() */
>> +struct btrfs_delayed_node *btrfs_get_locked_delayed_node(struct inode *inode);
>> +void btrfs_put_locked_delayed_node(struct btrfs_delayed_node *node);
>> +int btrfs_should_delete_dir_index(struct btrfs_delayed_node *delayed_node,
>> +				  struct btrfs_delayed_item **item,
>> +				  u64 index);
>> +int btrfs_readdir_delayed_dir_index(struct file *filp, void *dirent,
>> +				    filldir_t filldir,
>> +				    struct btrfs_delayed_node *delayed_node);
>> +#endif
>> diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
>> index f0cad5a..696c793 100644
>> --- a/fs/btrfs/dir-item.c
>> +++ b/fs/btrfs/dir-item.c
>> @@ -124,8 +124,9 @@ int btrfs_insert_xattr_item(struct btrfs_trans_handle *trans,
>>    * to use for the second index (if one is created).
>>    */
>>   int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root
>> -			  *root, const char *name, int name_len, u64 dir,
>> -			  struct btrfs_key *location, u8 type, u64 index)
>> +			  *root, const char *name, int name_len,
>> +			  struct inode *dir, struct btrfs_key *location,
>> +			  u8 type, u64 index)
>
> Can 'type' and 'index' be swapped? alignment of u64 after u8 will waste
> some stack space. Probably as a separate patch, hmm.
>
>>   {
>>   	int ret = 0;
>>   	int ret2 = 0;
>> @@ -137,13 +138,17 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root
>>   	struct btrfs_disk_key disk_key;
>>   	u32 data_size;
>>
>> -	key.objectid = dir;
>> +	key.objectid = dir->i_ino;
>>   	btrfs_set_key_type(&key, BTRFS_DIR_ITEM_KEY);
>>   	key.offset = btrfs_name_hash(name, name_len);
>>
>>   	path = btrfs_alloc_path();
>> +	if (!path)
>> +		return -ENOMEM;
>>   	path->leave_spinning = 1;
>>
>> +	btrfs_cpu_key_to_disk(&disk_key, location);
>> +
>>   	data_size = sizeof(*dir_item) + name_len;
>>   	dir_item = insert_with_overflow(trans, root, path, &key, data_size,
>>   					name, name_len);
>> @@ -155,7 +160,6 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans, struct btrfs_root
>>   	}
>>
>>   	leaf = path->nodes[0];
>> -	btrfs_cpu_key_to_disk(&disk_key, location);
>>   	btrfs_set_dir_item_key(leaf, dir_item, &disk_key);
>>   	btrfs_set_dir_type(leaf, dir_item, type);
>>   	btrfs_set_dir_data_len(leaf, dir_item, 0);
>> @@ -174,24 +178,8 @@ second_insert:
>>   	}
>>   	btrfs_release_path(root, path);
>>
>> -	btrfs_set_key_type(&key, BTRFS_DIR_INDEX_KEY);
>> -	key.offset = index;
>> -	dir_item = insert_with_overflow(trans, root, path, &key, data_size,
>> -					name, name_len);
>> -	if (IS_ERR(dir_item)) {
>> -		ret2 = PTR_ERR(dir_item);
>> -		goto out;
>> -	}
>> -	leaf = path->nodes[0];
>> -	btrfs_cpu_key_to_disk(&disk_key, location);
>> -	btrfs_set_dir_item_key(leaf, dir_item, &disk_key);
>> -	btrfs_set_dir_type(leaf, dir_item, type);
>> -	btrfs_set_dir_data_len(leaf, dir_item, 0);
>> -	btrfs_set_dir_name_len(leaf, dir_item, name_len);
>> -	btrfs_set_dir_transid(leaf, dir_item, trans->transid);
>> -	name_ptr = (unsigned long)(dir_item + 1);
>> -	write_extent_buffer(leaf, name, name_ptr, name_len);
>> -	btrfs_mark_buffer_dirty(leaf);
>> +	ret2 = btrfs_insert_delayed_dir_index(trans, root, name, name_len, dir,
>> +					&disk_key, type, index);
>>   out:
>>   	btrfs_free_path(path);
>>   	if (ret)
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 3e1ea3e..ea3c145 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -1676,6 +1676,13 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>
>>   	INIT_LIST_HEAD(&fs_info->ordered_extents);
>>   	spin_lock_init(&fs_info->ordered_extent_lock);
>> +	fs_info->delayed_root = kmalloc(sizeof(struct btrfs_delayed_root),
>> +					GFP_NOFS);
>> +	if (!fs_info->delayed_root) {
>> +		err = -ENOMEM;
>> +		goto fail_iput;
>> +	}
>> +	btrfs_init_delayed_root(fs_info->delayed_root);
>>
>>   	sb->s_blocksize = 4096;
>>   	sb->s_blocksize_bits = blksize_bits(4096);
>> @@ -1743,7 +1750,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>   	bh = btrfs_read_dev_super(fs_devices->latest_bdev);
>>   	if (!bh) {
>>   		err = -EINVAL;
>> -		goto fail_iput;
>> +		goto fail_alloc;
>>   	}
>>
>>   	memcpy(&fs_info->super_copy, bh->b_data, sizeof(fs_info->super_copy));
>> @@ -1755,7 +1762,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>
>>   	disk_super = &fs_info->super_copy;
>>   	if (!btrfs_super_root(disk_super))
>> -		goto fail_iput;
>> +		goto fail_alloc;
>>
>>   	/* check FS state, whether FS is broken. */
>>   	fs_info->fs_state |= btrfs_super_flags(disk_super);
>> @@ -1765,7 +1772,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>   	ret = btrfs_parse_options(tree_root, options);
>>   	if (ret) {
>>   		err = ret;
>> -		goto fail_iput;
>> +		goto fail_alloc;
>>   	}
>>
>>   	features = btrfs_super_incompat_flags(disk_super) &
>> @@ -1775,7 +1782,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>   		       "unsupported optional features (%Lx).\n",
>>   		       (unsigned long long)features);
>>   		err = -EINVAL;
>> -		goto fail_iput;
>> +		goto fail_alloc;
>>   	}
>>
>>   	features = btrfs_super_incompat_flags(disk_super);
>> @@ -1791,7 +1798,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>   		       "unsupported option features (%Lx).\n",
>>   		       (unsigned long long)features);
>>   		err = -EINVAL;
>> -		goto fail_iput;
>> +		goto fail_alloc;
>>   	}
>>
>>   	btrfs_init_workers(&fs_info->generic_worker,
>> @@ -1838,6 +1845,9 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>   			&fs_info->generic_worker);
>>   	btrfs_init_workers(&fs_info->endio_freespace_worker, "freespace-write",
>>   			   1, &fs_info->generic_worker);
>> +	btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta",
>> +			   fs_info->thread_pool_size,
>> +			&fs_info->generic_worker);
>>
>>   	/*
>>   	 * endios are largely parallel and should have a very
>> @@ -1859,6 +1869,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
>>   	btrfs_start_workers(&fs_info->endio_meta_write_workers, 1);
>>   	btrfs_start_workers(&fs_info->endio_write_workers, 1);
>>   	btrfs_start_workers(&fs_info->endio_freespace_worker, 1);
>> +	btrfs_start_workers(&fs_info->delayed_workers, 1);
>>
>>   	fs_info->bdi.ra_pages *= btrfs_super_num_devices(disk_super);
>>   	fs_info->bdi.ra_pages = max(fs_info->bdi.ra_pages,
>> @@ -2104,6 +2115,9 @@ fail_sb_buffer:
>>   	btrfs_stop_workers(&fs_info->endio_write_workers);
>>   	btrfs_stop_workers(&fs_info->endio_freespace_worker);
>>   	btrfs_stop_workers(&fs_info->submit_workers);
>> +	btrfs_stop_workers(&fs_info->delayed_workers);
>> +fail_alloc:
>> +	kfree(fs_info->delayed_root);
>>   fail_iput:
>>   	invalidate_inode_pages2(fs_info->btree_inode->i_mapping);
>>   	iput(fs_info->btree_inode);
>> @@ -2551,6 +2565,7 @@ int close_ctree(struct btrfs_root *root)
>>   	del_fs_roots(fs_info);
>>
>>   	iput(fs_info->btree_inode);
>> +	kfree(fs_info->delayed_root);
>>
>>   	btrfs_stop_workers(&fs_info->generic_worker);
>>   	btrfs_stop_workers(&fs_info->fixup_workers);
>> @@ -2562,6 +2577,7 @@ int close_ctree(struct btrfs_root *root)
>>   	btrfs_stop_workers(&fs_info->endio_write_workers);
>>   	btrfs_stop_workers(&fs_info->endio_freespace_worker);
>>   	btrfs_stop_workers(&fs_info->submit_workers);
>> +	btrfs_stop_workers(&fs_info->delayed_workers);
>>
>>   	btrfs_close_devices(fs_info->fs_devices);
>>   	btrfs_mapping_tree_free(&fs_info->mapping_tree);
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a7aaa10..0512b04 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -3897,12 +3897,6 @@ static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
>>   	WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
>>   }
>>
>> -static u64 calc_trans_metadata_size(struct btrfs_root *root, int num_items)
>> -{
>> -	return (root->leafsize + root->nodesize * (BTRFS_MAX_LEVEL - 1)) *
>> -		3 * num_items;
>> -}
>> -
>>   int btrfs_trans_reserve_metadata(struct btrfs_trans_handle *trans,
>>   				 struct btrfs_root *root,
>>   				 int num_items)
>> @@ -3913,7 +3907,7 @@ int btrfs_trans_reserve_metadata(struct btrfs_trans_handle *trans,
>>   	if (num_items == 0 || root->fs_info->chunk_root == root)
>>   		return 0;
>>
>> -	num_bytes = calc_trans_metadata_size(root, num_items);
>> +	num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
>>   	ret = btrfs_block_rsv_add(trans, root, &root->fs_info->trans_block_rsv,
>>   				  num_bytes);
>>   	if (!ret) {
>> @@ -3952,14 +3946,14 @@ int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
>>   	 * If all of the metadata space is used, we can commit
>>   	 * transaction and use space it freed.
>>   	 */
>> -	u64 num_bytes = calc_trans_metadata_size(root, 4);
>> +	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 4);
>>   	return block_rsv_migrate_bytes(src_rsv, dst_rsv, num_bytes);
>>   }
>>
>>   void btrfs_orphan_release_metadata(struct inode *inode)
>>   {
>>   	struct btrfs_root *root = BTRFS_I(inode)->root;
> (newline after declarations)
>
>> -	u64 num_bytes = calc_trans_metadata_size(root, 4);
>> +	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 4);
>>   	btrfs_block_rsv_release(root, root->orphan_block_rsv, num_bytes);
>>   }
>>
>> @@ -3973,7 +3967,7 @@ int btrfs_snap_reserve_metadata(struct btrfs_trans_handle *trans,
>>   	 * two for root back/forward refs, two for directory entries
>>   	 * and one for root of the snapshot.
>>   	 */
>> -	u64 num_bytes = calc_trans_metadata_size(root, 5);
>> +	u64 num_bytes = btrfs_calc_trans_metadata_size(root, 5);
> (newline after declarations)
>
>>   	dst_rsv->space_info = src_rsv->space_info;
>>   	return block_rsv_migrate_bytes(src_rsv, dst_rsv, num_bytes);
>>   }
>> @@ -4000,7 +3994,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
>>   	nr_extents = atomic_read(&BTRFS_I(inode)->outstanding_extents) + 1;
>>   	if (nr_extents > BTRFS_I(inode)->reserved_extents) {
>>   		nr_extents -= BTRFS_I(inode)->reserved_extents;
>> -		to_reserve = calc_trans_metadata_size(root, nr_extents);
>> +		to_reserve = btrfs_calc_trans_metadata_size(root, nr_extents);
>>   	} else {
>>   		nr_extents = 0;
>>   		to_reserve = 0;
>> @@ -4047,7 +4041,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
>>
>>   	to_free = calc_csum_metadata_size(inode, num_bytes);
>>   	if (nr_extents > 0)
>> -		to_free += calc_trans_metadata_size(root, nr_extents);
>> +		to_free += btrfs_calc_trans_metadata_size(root, nr_extents);
>>
>>   	btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv,
>>   				to_free);
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 8d392ed..09152fe 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -2598,33 +2598,12 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
>>   noinline int btrfs_update_inode(struct btrfs_trans_handle *trans,
>>   				struct btrfs_root *root, struct inode *inode)
>>   {
>> -	struct btrfs_inode_item *inode_item;
>> -	struct btrfs_path *path;
>> -	struct extent_buffer *leaf;
>>   	int ret;
>>
>> -	path = btrfs_alloc_path();
>> -	BUG_ON(!path);
>> -	path->leave_spinning = 1;
>> -	ret = btrfs_lookup_inode(trans, root, path,
>> -				&BTRFS_I(inode)->location, 1);
>> -	if (ret) {
>> -		if (ret > 0)
>> -			ret = -ENOENT;
>> -		goto failed;
>> -	}
>> -
>> -	btrfs_unlock_up_safe(path, 1);
>> -	leaf = path->nodes[0];
>> -	inode_item = btrfs_item_ptr(leaf, path->slots[0],
>> -				  struct btrfs_inode_item);
>> +	ret = btrfs_delayed_update_inode(trans, root, inode);
>> +	if (!ret)
>> +		btrfs_set_inode_last_trans(trans, inode);
>>
>> -	fill_inode_item(trans, leaf, inode_item, inode);
>> -	btrfs_mark_buffer_dirty(leaf);
>> -	btrfs_set_inode_last_trans(trans, inode);
>> -	ret = 0;
>> -failed:
>> -	btrfs_free_path(path);
>>   	return ret;
>>   }
>>
>> @@ -2680,18 +2659,9 @@ int btrfs_unlink_inode(struct btrfs_trans_handle *trans,
>>   		goto err;
>>   	}
>>
>> -	di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino,
>> -					 index, name, name_len, -1);
>> -	if (IS_ERR(di)) {
>> -		ret = PTR_ERR(di);
>> -		goto err;
>> -	}
>> -	if (!di) {
>> -		ret = -ENOENT;
>> +	ret = btrfs_delete_delayed_dir_index(trans, root, dir, index);
>> +	if (ret)
>>   		goto err;
>> -	}
>> -	ret = btrfs_delete_one_dir_name(trans, root, path, di);
>> -	btrfs_release_path(root, path);
>>
>>   	ret = btrfs_del_inode_ref_in_log(trans, root, name, name_len,
>>   					 inode, dir->i_ino);
>> @@ -2867,6 +2837,14 @@ static struct btrfs_trans_handle *__unlink_start_trans(struct inode *dir,
>>   	index = btrfs_inode_ref_index(path->nodes[0], ref);
>>   	btrfs_release_path(root, path);
>>
>> +	/*
>> +	 * This is a commit root search, so if we can look up the inode item
>> +	 * and other related items in the commit root, the transaction that
>> +	 * created the dir/file has been committed, and the dir index item
>> +	 * whose insertion we delayed has also been inserted into the commit
>> +	 * root. So we needn't worry about the delayed insertion of the dir
>> +	 * index item here.
>> +	 */
>>   	di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino, index,
>>   				dentry->d_name.name, dentry->d_name.len, 0);
>>   	if (IS_ERR(di)) {
>> @@ -2972,24 +2950,16 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
>>   		btrfs_release_path(root, path);
>>   		index = key.offset;
>>   	}
>> +	btrfs_release_path(root, path);
>>
>> -	di = btrfs_lookup_dir_index_item(trans, root, path, dir->i_ino,
>> -					 index, name, name_len, -1);
>> -	BUG_ON(!di || IS_ERR(di));
>> -
>> -	leaf = path->nodes[0];
>> -	btrfs_dir_item_key_to_cpu(leaf, di, &key);
>> -	WARN_ON(key.type != BTRFS_ROOT_ITEM_KEY || key.objectid != objectid);
>> -	ret = btrfs_delete_one_dir_name(trans, root, path, di);
>> +	ret = btrfs_delete_delayed_dir_index(trans, root, dir, index);
>>   	BUG_ON(ret);
>> -	btrfs_release_path(root, path);
>>
>>   	btrfs_i_size_write(dir, dir->i_size - name_len * 2);
>>   	dir->i_mtime = dir->i_ctime = CURRENT_TIME;
>>   	ret = btrfs_update_inode(trans, root, dir);
>>   	BUG_ON(ret);
>>
>> -	btrfs_free_path(path);
>>   	return 0;
>>   }
>>
>> @@ -3249,6 +3219,15 @@ int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
>>   	if (root->ref_cows || root == root->fs_info->tree_root)
>>   		btrfs_drop_extent_cache(inode, new_size & (~mask), (u64)-1, 0);
>>
>> +	/*
>> +	 * This function is also used to drop the items in the log tree
>> +	 * before we relog the inode, so if root != BTRFS_I(inode)->root,
>> +	 * it is being used to drop the logged items and we shouldn't kill
>> +	 * the delayed items.
>> +	 */
>> +	if (min_type == 0 && root == BTRFS_I(inode)->root)
>> +		btrfs_kill_delayed_inode_items(trans, inode);
>> +
>>   	path = btrfs_alloc_path();
>>   	BUG_ON(!path);
>>   	path->reada = -1;
>> @@ -4182,7 +4161,7 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>>   	return d_splice_alias(inode, dentry);
>>   }
>>
>> -static unsigned char btrfs_filetype_table[] = {
>> +unsigned char btrfs_filetype_table[] = {
>>   	DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
>>   };
>>
>> @@ -4196,6 +4175,8 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>>   	struct btrfs_key key;
>>   	struct btrfs_key found_key;
>>   	struct btrfs_path *path;
>> +	struct btrfs_delayed_node *delayed_node = NULL;
>> +	struct btrfs_delayed_item *delayed_item = NULL;
>>   	int ret;
>>   	u32 nritems;
>>   	struct extent_buffer *leaf;
>> @@ -4210,6 +4191,7 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>>   	char tmp_name[32];
>>   	char *name_ptr;
>>   	int name_len;
>> +	int is_curr = 0;	/* filp->f_pos points to the current index? */
>>
>>   	/* FIXME, use a real flag for deciding about the key type */
>>   	if (root->fs_info->tree_root == root)
>> @@ -4234,8 +4216,13 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>>   		filp->f_pos = 2;
>>   	}
>>   	path = btrfs_alloc_path();
>> +	if (!path)
>> +		return -ENOMEM;
>>   	path->reada = 2;
>>
>> +	if (key_type == BTRFS_DIR_INDEX_KEY)
>> +		delayed_node = btrfs_get_locked_delayed_node(inode);
>> +
>>   	btrfs_set_key_type(&key, key_type);
>>   	key.offset = filp->f_pos;
>>   	key.objectid = inode->i_ino;
>> @@ -4273,8 +4260,13 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>>   			break;
>>   		if (found_key.offset<  filp->f_pos)
>>   			continue;
>> +		if (key_type == BTRFS_DIR_INDEX_KEY &&
>> +		    btrfs_should_delete_dir_index(delayed_node, &delayed_item,
>> +						  found_key.offset))
>> +			continue;
>>
>>   		filp->f_pos = found_key.offset;
>> +		is_curr = 1;
>>
>>   		di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
>>   		di_cur = 0;
>> @@ -4324,6 +4316,15 @@ skip:
>>   		}
>>   	}
>>
>> +	if (key_type == BTRFS_DIR_INDEX_KEY) {
>> +		if (is_curr)
>> +			filp->f_pos++;
>> +		ret = btrfs_readdir_delayed_dir_index(filp, dirent, filldir,
>> +						      delayed_node);
>> +		if (ret)
>> +			goto nopos;
>> +	}
>> +
>>   	/* Reached end of directory/root. Bump pos past the last item. */
>>   	if (key_type == BTRFS_DIR_INDEX_KEY)
>>   		/*
>> @@ -4336,6 +4337,8 @@ skip:
>>   nopos:
>>   	ret = 0;
>>   err:
>> +	if (key_type == BTRFS_DIR_INDEX_KEY)
>> +		btrfs_put_locked_delayed_node(delayed_node);
>>   	btrfs_free_path(path);
>>   	return ret;
>>   }
>> @@ -4481,9 +4484,12 @@ int btrfs_set_inode_index(struct inode *dir, u64 *index)
>>   	int ret = 0;
>>
>>   	if (BTRFS_I(dir)->index_cnt == (u64)-1) {
>> -		ret = btrfs_set_inode_index_count(dir);
>> -		if (ret)
>> -			return ret;
>> +		ret = btrfs_inode_delayed_dir_index_count(dir);
>> +		if (ret) {
>> +			ret = btrfs_set_inode_index_count(dir);
>> +			if (ret)
>> +				return ret;
>> +		}
>>   	}
>>
>>   	*index = BTRFS_I(dir)->index_cnt;
>> @@ -4641,7 +4647,7 @@ int btrfs_add_link(struct btrfs_trans_handle *trans,
>>
>>   	if (ret == 0) {
>>   		ret = btrfs_insert_dir_item(trans, root, name, name_len,
>> -					    parent_inode->i_ino, &key,
>> +					    parent_inode, &key,
>>   					    btrfs_inode_type(inode), index);
>>   		BUG_ON(ret);
>>
>> @@ -6519,6 +6525,8 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
>>   	ei->dummy_inode = 0;
>>   	ei->force_compress = BTRFS_COMPRESS_NONE;
>>
>> +	ei->delayed_node = NULL;
>> +
>>   	inode = &ei->vfs_inode;
>>   	extent_map_tree_init(&ei->extent_tree, GFP_NOFS);
>>   	extent_io_tree_init(&ei->io_tree, &inode->i_data, GFP_NOFS);
>> @@ -6602,6 +6610,7 @@ void btrfs_destroy_inode(struct inode *inode)
>>   	inode_tree_del(inode);
>>   	btrfs_drop_extent_cache(inode, 0, (u64)-1, 0);
>>   free:
>> +	btrfs_remove_delayed_node(inode);
>>   	kmem_cache_free(btrfs_inode_cachep, BTRFS_I(inode));
>>   }
>>
>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>> index be2d4f6..c7bb1c3 100644
>> --- a/fs/btrfs/ioctl.c
>> +++ b/fs/btrfs/ioctl.c
>> @@ -333,7 +333,7 @@ static noinline int create_subvol(struct btrfs_root *root,
>>   	BUG_ON(ret);
>>
>>   	ret = btrfs_insert_dir_item(trans, root,
>> -				    name, namelen, dir->i_ino, &key,
>> +				    name, namelen, dir, &key,
>>   				    BTRFS_FT_DIR, index);
>>   	if (ret)
>>   		goto fail;
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index 0209b5f..6fb8199 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -40,6 +40,7 @@
>>   #include <linux/magic.h>
>>   #include <linux/slab.h>
>>   #include "compat.h"
>> +#include "delayed-inode.h"
>>   #include "ctree.h"
>>   #include "disk-io.h"
>>   #include "transaction.h"
>> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
>> index 3d73c8d..f3e250f 100644
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@ -958,7 +958,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>>   	BUG_ON(ret);
>>   	ret = btrfs_insert_dir_item(trans, parent_root,
>>   				dentry->d_name.name, dentry->d_name.len,
>> -				parent_inode->i_ino, &key,
>> +				parent_inode, &key,
>>   				BTRFS_FT_DIR, index);
>>   	BUG_ON(ret);
>>
>> @@ -1302,9 +1302,14 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
>>   	} while (cur_trans->num_writers > 1 ||
>>   		 (should_grow && cur_trans->num_joined != joined));
>>
>> +	btrfs_run_delayed_items(trans, root);
>> +
>>   	ret = create_pending_snapshots(trans, root->fs_info);
>>   	BUG_ON(ret);
>>
>> +	ret = btrfs_run_delayed_items(trans, root);
>> +	BUG_ON(ret);
>> +
>>   	ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
>>   	BUG_ON(ret);
>>
>> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
>> index a4bbb85..e8a2e95 100644
>> --- a/fs/btrfs/tree-log.c
>> +++ b/fs/btrfs/tree-log.c
>> @@ -2775,6 +2775,13 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
>>   		max_key.type = (u8)-1;
>>   	max_key.offset = (u64)-1;
>>
>> +	ret = btrfs_commit_inode_delayed_items(trans, root, inode);
>> +	if (ret) {
>> +		btrfs_free_path(path);
>> +		btrfs_free_path(dst_path);
>> +		return ret;
>> +	}
>> +
>>   	mutex_lock(&BTRFS_I(inode)->log_mutex);
>>
>>   	/*
>
> That's all for now. I have probably missed a few 'make it static'
> cases, and the code style suggestions are not exhaustive.
>
> The patch improvement looks nice!
>
> dave
>



Thread overview: 13+ messages
2011-02-17  5:48 [PATCH V2] btrfs: implement delayed inode items operation Miao Xie
2011-02-18  1:09 ` Chris Mason
2011-02-18  9:08   ` Miao Xie
2011-02-18 12:19     ` Chris Mason
2011-02-18 13:09       ` Miao Xie
2011-02-18 14:35         ` Chris Mason
2011-02-22  9:58           ` Miao Xie
2011-03-05  3:44   ` David Nicol
2011-02-18 13:30 ` David Sterba
2011-02-18 14:13   ` Miao Xie [this message]
  -- strict thread matches above, loose matches on Subject: below --
2011-02-24 14:02 Itaru Kitayama
2011-03-03  6:15 ` Miao Xie
2011-03-05  2:35   ` Itaru Kitayama
