linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: liubo <liubo2009@cn.fujitsu.com>
To: Josef Bacik <josef@redhat.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: create a unique inode number for all subvol entries
Date: Tue, 07 Dec 2010 10:02:20 +0800	[thread overview]
Message-ID: <4CFD95AC.4030401@cn.fujitsu.com> (raw)
In-Reply-To: <1291668522-7759-1-git-send-email-josef@redhat.com>

On 12/07/2010 04:48 AM, Josef Bacik wrote:
> Currently BTRFS has a problem where any subvol's will have the same inode
> numbers as other files in the parent subvol.  This can cause problems with
> userspace apps that depend on inode numbers being unique across a volume.  So in
> order to solve this problem we need to do the following
> 
> 1) Create an empty key with the fake inode number for the subvol.  This is a
> place holder, since we determine which inode number to start with by searching
> for the largest objectid in the subvolume, we need to make sure our fake inode
> number isn't reused by somebody else.
> 
> 2) Save our fake inode number in our dir item.  We can already store data in dir
> items, so just store the inode number.  This is future proof since I explicitly
> check for data_len == sizeof(u64), that way if we change what data gets put in
> the dir item in the future, older kernels will be able to deal with it properly.
> Also if an older kernel mounts with this change it will be ok.
> 
> Since subvols have their own st_dev it is ok for them to continue to have an
> inode number of 256, but the inode returned by readdir needs to be unique to the
> subvolume, so our fake inode number will be used for d_ino with readdir.  I
> tested this with a program that Bruce Fields wrote to spit out the actual inode
> numbers and the inode number returned by readdir
> 
> root@test1244 ~]# touch /mnt/btrfs-test/foo
> [root@test1244 ~]# touch /mnt/btrfs-test/bar
> [root@test1244 ~]# touch /mnt/btrfs-test/baz
> [root@test1244 ~]# ./btrfs-progs-unstable/btrfs subvol create
> /mnt/btrfs-test/subvol
> Create subvolume '/mnt/btrfs-test/subvol'
> [root@test1244 ~]# ./readdir-test /mnt/btrfs-test/
> . 256 256
> .. 256 139265
> foo 257 257
> bar 258 258
> baz 259 259
> subvol 260 256
> 
> Thanks,

Hi, Josef,

The patch looks nice.

since insert dir code is mainly same, what about to change btrfs_insert_dir_item ABI
to use such phrase:

int btrfs_insert_subvol_dir_item(...)
{
	return btrfs_insert_dir_item(...);
}

does it make code simple?

Thanks,
Liu Bo

> 
> Signed-off-by: Josef Bacik <josef@redhat.com>
> ---
>  fs/btrfs/ctree.h       |    6 +++
>  fs/btrfs/dir-item.c    |  113 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/inode.c       |   58 ++++++++++++++++++++++++-
>  fs/btrfs/ioctl.c       |   13 ++++-
>  fs/btrfs/transaction.c |   22 +++++++--
>  5 files changed, 202 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 54e4252..ea0662e 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1161,6 +1161,8 @@ struct btrfs_root {
>  #define BTRFS_DIR_LOG_INDEX_KEY 72
>  #define BTRFS_DIR_ITEM_KEY	84
>  #define BTRFS_DIR_INDEX_KEY	96
> +#define BTRFS_DIR_SUBVOL_KEY	97
> +
>  /*
>   * extent data is for file data
>   */
> @@ -2320,6 +2322,10 @@ int btrfs_insert_dir_item(struct btrfs_trans_handle *trans,
>  			  struct btrfs_root *root, const char *name,
>  			  int name_len, u64 dir,
>  			  struct btrfs_key *location, u8 type, u64 index);
> +int btrfs_insert_subvol_dir_item(struct btrfs_trans_handle *trans,
> +				 struct btrfs_root *root, const char *name,
> +				 int name_len, u64 dir, u64 ino,
> +				 struct btrfs_key *location, u64 index);
>  struct btrfs_dir_item *btrfs_lookup_dir_item(struct btrfs_trans_handle *trans,
>  					     struct btrfs_root *root,
>  					     struct btrfs_path *path, u64 dir,
> diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
> index f0cad5a..95d498f 100644
> --- a/fs/btrfs/dir-item.c
> +++ b/fs/btrfs/dir-item.c
> @@ -116,6 +116,119 @@ int btrfs_insert_xattr_item(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> +/**
> + * btrfs_insert_subvol_dir_item - setup the dir items for a subvol
> + *
> + * @trans: transaction handle
> + * @root: the root of the parent subvol
> + * @name: name of the subvol
> + * @name_len: the length of the name
> + * @dir: the objectid of the parent directory
> + * @ino: the unique inode number for the parent directory
> + * @key: the key that the items will point to
> + * @index: the dir index for readdir purposes
> + *
> + * Creates the dir item/dir index pair for the directory containing the subvol.
> + * This also creates a blank key to hold the made up inode number for the subvol
> + * in order to give us a unique to the parent subvol inode number.
> + */
> +int btrfs_insert_subvol_dir_item(struct btrfs_trans_handle *trans,
> +				 struct btrfs_root *root, const char *name,
> +				 int name_len, u64 dir, u64 ino,
> +				 struct btrfs_key *location, u64 index)
> +{
> +	int ret = 0;
> +	int ret2 = 0;
> +	struct btrfs_path *path;
> +	struct btrfs_dir_item *dir_item;
> +	struct extent_buffer *leaf;
> +	unsigned long name_ptr;
> +	unsigned long ino_ptr;
> +	struct btrfs_key key;
> +	struct btrfs_disk_key disk_key;
> +	u32 data_size;
> +	u8 type = BTRFS_FT_DIR;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +	path->leave_spinning = 1;
> +
> +	key.objectid = ino;
> +	key.type = BTRFS_DIR_SUBVOL_KEY;
> +	key.offset = location->objectid;
> +
> +	ret = btrfs_insert_empty_item(trans, root, path, &key, 0);
> +	if (ret)
> +		goto out;
> +
> +	btrfs_release_path(root, path);
> +
> +	key.objectid = dir;
> +	btrfs_set_key_type(&key, BTRFS_DIR_ITEM_KEY);
> +	key.offset = btrfs_name_hash(name, name_len);
> +
> +	data_size = sizeof(*dir_item) + name_len + sizeof(u64);
> +	dir_item = insert_with_overflow(trans, root, path, &key, data_size,
> +					name, name_len);
> +	if (IS_ERR(dir_item)) {
> +		ret = PTR_ERR(dir_item);
> +		if (ret == -EEXIST)
> +			goto second_insert;
> +		goto out;
> +	}
> +
> +	leaf = path->nodes[0];
> +	btrfs_cpu_key_to_disk(&disk_key, location);
> +	btrfs_set_dir_item_key(leaf, dir_item, &disk_key);
> +	btrfs_set_dir_type(leaf, dir_item, type);
> +	btrfs_set_dir_data_len(leaf, dir_item, sizeof(u64));
> +	btrfs_set_dir_name_len(leaf, dir_item, name_len);
> +	btrfs_set_dir_transid(leaf, dir_item, trans->transid);
> +	name_ptr = (unsigned long)(dir_item + 1);
> +	ino_ptr = name_ptr + name_len;
> +
> +	write_extent_buffer(leaf, name, name_ptr, name_len);
> +	write_extent_buffer(leaf, &ino, ino_ptr, sizeof(u64));
> +	btrfs_mark_buffer_dirty(leaf);
> +
> +second_insert:
> +	/* FIXME, use some real flag for selecting the extra index */
> +	if (root == root->fs_info->tree_root) {
> +		ret = 0;
> +		goto out;
> +	}
> +	btrfs_release_path(root, path);
> +
> +	btrfs_set_key_type(&key, BTRFS_DIR_INDEX_KEY);
> +	key.offset = index;
> +	dir_item = insert_with_overflow(trans, root, path, &key, data_size,
> +					name, name_len);
> +	if (IS_ERR(dir_item)) {
> +		ret2 = PTR_ERR(dir_item);
> +		goto out;
> +	}
> +	leaf = path->nodes[0];
> +	btrfs_cpu_key_to_disk(&disk_key, location);
> +	btrfs_set_dir_item_key(leaf, dir_item, &disk_key);
> +	btrfs_set_dir_type(leaf, dir_item, type);
> +	btrfs_set_dir_data_len(leaf, dir_item, sizeof(u64));
> +	btrfs_set_dir_name_len(leaf, dir_item, name_len);
> +	btrfs_set_dir_transid(leaf, dir_item, trans->transid);
> +	name_ptr = (unsigned long)(dir_item + 1);
> +	ino_ptr = name_ptr + name_len;
> +	write_extent_buffer(leaf, name, name_ptr, name_len);
> +	write_extent_buffer(leaf, &ino, ino_ptr, sizeof(u64));
> +	btrfs_mark_buffer_dirty(leaf);
> +out:
> +	btrfs_free_path(path);
> +	if (ret)
> +		return ret;
> +	if (ret2)
> +		return ret2;
> +	return 0;
> +}
> +
>  /*
>   * insert a directory item in the tree, doing all the magic for
>   * both indexes. 'dir' indicates which objectid to insert it into,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a6a3e2f..463f816 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2917,6 +2917,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
>  	struct btrfs_dir_item *di;
>  	struct btrfs_key key;
>  	u64 index;
> +	u64 parent_root_ino = 0;
>  	int ret;
>  
>  	path = btrfs_alloc_path();
> @@ -2928,12 +2929,48 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
>  	BUG_ON(!di || IS_ERR(di));
>  
>  	leaf = path->nodes[0];
> +
> +	/*
> +	 * This subvol has a fake inode number associated with it, make sure we
> +	 * get it and delete it.
> +	 */
> +	if (btrfs_dir_data_len(leaf, di) == sizeof(u64)) {
> +		unsigned long data_ptr;
> +
> +		data_ptr = (unsigned long)(di + 1) + name_len;
> +		read_extent_buffer(leaf, &parent_root_ino, data_ptr,
> +				   sizeof(u64));
> +	}
> +
>  	btrfs_dir_item_key_to_cpu(leaf, di, &key);
>  	WARN_ON(key.type != BTRFS_ROOT_ITEM_KEY || key.objectid != objectid);
>  	ret = btrfs_delete_one_dir_name(trans, root, path, di);
>  	BUG_ON(ret);
>  	btrfs_release_path(root, path);
>  
> +	/* Delete the fake ino place holder */
> +	if (parent_root_ino != 0) {
> +		key.objectid = parent_root_ino;
> +		key.type = BTRFS_DIR_SUBVOL_KEY;
> +		key.offset = objectid;
> +
> +		ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
> +		if (ret < 0) {
> +			btrfs_free_path(path);
> +			return ret;
> +		} else if (ret > 0) {
> +			WARN_ON(1);
> +			btrfs_free_path(path);
> +			return -ENOENT;
> +		}
> +		ret = btrfs_del_item(trans, root, path);
> +		if (ret) {
> +			btrfs_free_path(path);
> +			return ret;
> +		}
> +		btrfs_release_path(root, path);
> +	}
> +
>  	ret = btrfs_del_root_ref(trans, root->fs_info->tree_root,
>  				 objectid, root->root_key.objectid,
>  				 dir->i_ino, &index, name, name_len);
> @@ -4254,6 +4291,7 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>  
>  		while (di_cur < di_total) {
>  			struct btrfs_key location;
> +			u64 ino;
>  
>  			name_len = btrfs_dir_name_len(leaf, di);
>  			if (name_len <= sizeof(tmp_name)) {
> @@ -4279,9 +4317,25 @@ static int btrfs_real_readdir(struct file *filp, void *dirent,
>  				over = 0;
>  				goto skip;
>  			}
> +
> +			ino = location.objectid;
> +
> +			/*
> +			 * If this is a subvol, check to see if the data len is
> +			 * the size of u64, if so it means we have a unique
> +			 * inode number for this subvol and we need to use that
> +			 * instead.
> +			 */
> +			if (location.type == BTRFS_ROOT_ITEM_KEY &&
> +			    btrfs_dir_data_len(leaf, di) == sizeof(u64)) {
> +				unsigned long data_ptr;
> +
> +				data_ptr = (unsigned long)(di + 1) + name_len;
> +				read_extent_buffer(leaf, &ino, data_ptr,
> +						   sizeof(u64));
> +			}
>  			over = filldir(dirent, name_ptr, name_len,
> -				       found_key.offset, location.objectid,
> -				       d_type);
> +				       found_key.offset, ino, d_type);
>  
>  skip:
>  			if (name_ptr != tmp_name)
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index f1c9bb4..460ac1e 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -238,6 +238,7 @@ static noinline int create_subvol(struct btrfs_root *root,
>  	int ret;
>  	int err;
>  	u64 objectid;
> +	u64 parent_subvol_ino;
>  	u64 new_dirid = BTRFS_FIRST_FREE_OBJECTID;
>  	u64 index = 0;
>  
> @@ -248,6 +249,12 @@ static noinline int create_subvol(struct btrfs_root *root,
>  		return ret;
>  	}
>  
> +	ret = btrfs_find_free_objectid(NULL, root, 0, &parent_subvol_ino);
> +	if (ret) {
> +		dput(parent);
> +		return ret;
> +	}
> +
>  	dir = parent->d_inode;
>  
>  	/*
> @@ -329,9 +336,9 @@ static noinline int create_subvol(struct btrfs_root *root,
>  	ret = btrfs_set_inode_index(dir, &index);
>  	BUG_ON(ret);
>  
> -	ret = btrfs_insert_dir_item(trans, root,
> -				    name, namelen, dir->i_ino, &key,
> -				    BTRFS_FT_DIR, index);
> +	ret = btrfs_insert_subvol_dir_item(trans, root, name, namelen,
> +					   dir->i_ino, parent_subvol_ino,
> +					   &key, index);
>  	if (ret)
>  		goto fail;
>  
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index f50e931..50b1335 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -910,6 +910,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	u64 to_reserve = 0;
>  	u64 index = 0;
>  	u64 objectid;
> +	u64 parent_subvol_ino;
>  
>  	new_root_item = kmalloc(sizeof(*new_root_item), GFP_NOFS);
>  	if (!new_root_item) {
> @@ -947,16 +948,27 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
>  	parent_root = BTRFS_I(parent_inode)->root;
>  	record_root_in_trans(trans, parent_root);
>  
> +	ret = btrfs_find_free_objectid(trans, parent_root, 0,
> +				       &parent_subvol_ino);
> +	if (ret) {
> +		pending->error = ret;
> +		goto fail;
> +	}
> +
>  	/*
>  	 * insert the directory item
>  	 */
>  	ret = btrfs_set_inode_index(parent_inode, &index);
>  	BUG_ON(ret);
> -	ret = btrfs_insert_dir_item(trans, parent_root,
> -				dentry->d_name.name, dentry->d_name.len,
> -				parent_inode->i_ino, &key,
> -				BTRFS_FT_DIR, index);
> -	BUG_ON(ret);
> +	ret = btrfs_insert_subvol_dir_item(trans, parent_root,
> +					   dentry->d_name.name,
> +					   dentry->d_name.len,
> +					   parent_inode->i_ino,
> +					   parent_subvol_ino, &key, index);
> +	if (ret) {
> +		pending->error = ret;
> +		goto fail;
> +	}
>  
>  	btrfs_i_size_write(parent_inode, parent_inode->i_size +
>  					 dentry->d_name.len * 2);


      reply	other threads:[~2010-12-07  2:02 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-06 20:48 [PATCH] Btrfs: create a unique inode number for all subvol entries Josef Bacik
2010-12-07  2:02 ` liubo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CFD95AC.4030401@cn.fujitsu.com \
    --to=liubo2009@cn.fujitsu.com \
    --cc=josef@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).