All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Becker <Joel.Becker@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH 2/3] ocfs2/xattr: Merge xattr set	transaction.
Date: Thu, 23 Oct 2008 14:40:36 -0700	[thread overview]
Message-ID: <20081023214036.GC12751@mail.oracle.com> (raw)
In-Reply-To: <1224218678-32196-2-git-send-email-tao.ma@oracle.com>

On Fri, Oct 17, 2008 at 12:44:37PM +0800, Tao Ma wrote:
> In current ocfs2/xattr, the whole xattr set is divided into
> many steps are many transaction are used, this make the
> xattr set process isn't like a real transaction, so this
> patch try to merge all the transaction into one. Another
> benefit is that acl can use it easily now.

	I love the concept of this change.  I didn't like the old code,
where random hunks of the operation would open their own handle and
close it right away.
 
> I don't merge the transaction of deleting xattr when we
> remove an inode. The reason is that if we have a large number
> of xattrs and every xattrs has large values(large enough
> for outside storage), the whole transaction will be very
> huge and it looks like jbd can't handle it(I meet with a
> jbd complain once). And the old inode removal is also divided
> into many steps, so I'd like to leave as it is.

	That's just fine.  Piecemeal is the right way to go there.  As
long as we have a sane inode after every step, it is easily recovered
from the orphan dir after a crash.

> Signed-off-by: Tao Ma <tao.ma@oracle.com>
> ---
>  fs/ocfs2/xattr.c |  889 +++++++++++++++++++++++++++++++-----------------------
>  1 files changed, 514 insertions(+), 375 deletions(-)

	I'm trying to think of a way you could break this patch up.
It's huge.

> @@ -128,11 +134,15 @@ static int ocfs2_xattr_tree_list_index_block(struct inode *inode,
>  					size_t buffer_size);
>  
>  static int ocfs2_xattr_create_index_block(struct inode *inode,
> -					  struct ocfs2_xattr_search *xs);
> +					  handle_t *handle,
> +					  struct ocfs2_xattr_search *xs,
> +					  struct ocfs2_xattr_set_ctxt *ctxt);
>  
>  static int ocfs2_xattr_set_entry_index_block(struct inode *inode,
> +					     handle_t *handle,
>  					     struct ocfs2_xattr_info *xi,
> -					     struct ocfs2_xattr_search *xs);
> +					     struct ocfs2_xattr_search *xs,
> +					     struct ocfs2_xattr_set_ctxt *ctxt);

	You pass 'handle' and 'ctxt' to all the same functions.  Why not
put the handle on the ctxt?

> @@ -203,21 +211,10 @@ static int ocfs2_xattr_extend_allocation(struct inode *inode,
>  
>  	ocfs2_init_xattr_value_extent_tree(&et, inode, xattr_bh, xv);
>  
> -restart_all:
> -
> -	status = ocfs2_lock_allocators(inode, &et, clusters_to_add, 0,
> -				       &data_ac, &meta_ac);
> -	if (status) {
> -		mlog_errno(status);
> -		goto leave;
> -	}
> -
>  	credits = ocfs2_calc_extend_credits(osb->sb, et.et_root_el,
> -					    clusters_to_add);
> -	handle = ocfs2_start_trans(osb, credits);
> -	if (IS_ERR(handle)) {
> -		status = PTR_ERR(handle);
> -		handle = NULL;
> +			clusters_to_add) + handle->h_buffer_credits;
> +	status = ocfs2_extend_trans(handle, credits);

	Um, you just made 'credits' be double the existing transaction +
the result of ocfs2_calc_extend_credits().  Did you mean that?
ocfs2_extend_trans() takes the additional credits, not the total.  Later
you do an eocfs2_extend_trans() with merely the new credits, so I wonder
if you meant this here?

> @@ -280,62 +278,30 @@ restarted_transaction:
>  	}
>  
>  leave:
> -	if (handle) {
> -		ocfs2_commit_trans(osb, handle);
> -		handle = NULL;
> -	}
> -	if (data_ac) {
> -		ocfs2_free_alloc_context(data_ac);
> -		data_ac = NULL;
> -	}
> -	if (meta_ac) {
> -		ocfs2_free_alloc_context(meta_ac);
> -		meta_ac = NULL;
> -	}
> -	if ((!status) && restart_func) {
> -		restart_func = 0;
> -		goto restart_all;
> -	}

	I love watching repeated code like this disappear.

> @@ -896,14 +858,12 @@ static int __ocfs2_xattr_set_value_outside(struct inode *inode,
>  	u32 clusters = ocfs2_clusters_for_bytes(inode->i_sb, value_len);
>  	u64 blkno;
>  	struct buffer_head *bh = NULL;
> -	handle_t *handle;
>  
>  	BUG_ON(clusters > le32_to_cpu(xv->xr_clusters));
>  
>  	credits = clusters * bpc;
> -	handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb), credits);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> +	ret = ocfs2_extend_trans(handle, credits);
> +	if (ret) {

	I also worry about all the places you are extending the
transaction - we would love to see this all as one transaction so we
don't leak things on crash.  That said, sometimes it is hard to do, and
this is a complex operation.

> +static int ocfs2_calc_xattr_set_credits(struct inode *inode,
> +					struct ocfs2_xattr_info *xi,
> +					struct ocfs2_xattr_search *xis,
> +					struct ocfs2_xattr_search *xbs)
> +{
> +	int credits = OCFS2_INODE_UPDATE_CREDITS;
> +
> +	/* calculate xattr metadata update credits. */
> +	if (xbs->xattr_bh) {
> +		struct ocfs2_xattr_block *xb =
> +			(struct ocfs2_xattr_block *)xbs->xattr_bh->b_data;
> +
> +		if (!(le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED))
> +			credits += OCFS2_XATTR_BLOCK_UPDATE_CREDITS;
> +		else
> +			credits += ocfs2_blocks_per_xattr_bucket(inode->i_sb);
> +	}
> +
> +	return credits;
> +}
> +
> +static int ocfs2_init_xattr_set_ctxt(struct inode *inode,
> +				     struct ocfs2_dinode *di,
> +				     struct ocfs2_xattr_info *xi,
> +				     struct ocfs2_xattr_search *xis,
> +				     struct ocfs2_xattr_search *xbs,
> +				     struct ocfs2_xattr_set_ctxt *ctxt)
> +{
> +	int ret = 0, clusters_add = 0, meta_add = 0;
> +	struct buffer_head *bh = NULL;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	struct ocfs2_xattr_block *xb = NULL;
> +	struct ocfs2_xattr_entry *xe;
> +
> +	memset(ctxt, 0, sizeof(struct ocfs2_xattr_set_ctxt));
> +
> +	ocfs2_init_dealloc_ctxt(&ctxt->dealloc);
> +
> +	if (di->i_xattr_loc) {
> +		if (!xbs->xattr_bh) {
> +			ret = ocfs2_read_block(inode,
> +					       le64_to_cpu(di->i_xattr_loc),
> +					       &bh);
> +			if (ret) {
> +				mlog_errno(ret);
> +				goto out;
> +			}
> +
> +			xb = (struct ocfs2_xattr_block *)bh->b_data;
> +		} else
> +			xb = (struct ocfs2_xattr_block *)xbs->xattr_bh->b_data;
> +
> +		if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
> +			struct ocfs2_extent_list *el =
> +				 &xb->xb_attrs.xb_root.xt_list;
> +			meta_add += ocfs2_extend_meta_needed(el);
> +		}
> +		/*
> +		 * This cluster will be used either for new bucket or for
> +		 * new xattr block.
> +		 * If the cluster size is the same as the bucket size, one
> +		 * more is needed since we may need to extend the bucket
> +		 * also.
> +		 */
> +		clusters_add += 1;
> +		if (OCFS2_XATTR_BUCKET_SIZE == osb->s_clustersize)
> +			clusters_add += 1;
> +	} else
> +		meta_add += 1;
> +
> +	/* calculate xattr value update. */
> +	if (xi->value && xi->value_len > OCFS2_XATTR_INLINE_SIZE) {
> +		char *base;
> +		int name_offset, name_len, block_off, i;
> +		u32 new_clusters = ocfs2_clusters_for_bytes(inode->i_sb,
> +							    xi->value_len);
> +
> +		xe = NULL;
> +		if (!xis->not_found) {
> +			xe = xis->here;
> +			name_offset = le16_to_cpu(xe->xe_name_offset);
> +			name_len = OCFS2_XATTR_SIZE(xe->xe_name_len);
> +			base = xis->base;
> +		} else if (!xbs->not_found) {
> +			xe = xbs->here;
> +			name_offset = le16_to_cpu(xe->xe_name_offset);
> +			name_len = OCFS2_XATTR_SIZE(xe->xe_name_len);
> +			i = xbs->here - xbs->header->xh_entries;
> +
> +			if (le16_to_cpu(xb->xb_flags) & OCFS2_XATTR_INDEXED) {
> +				ret = ocfs2_xattr_bucket_get_name_value(inode,
> +								xbs->bucket.xh,
> +								i,
> +								&block_off,
> +								&name_offset);
> +				base = xbs->bucket.bhs[block_off]->b_data;
> +			} else
> +				base = xbs->base;
> +		}
> +
> +		/* calc value clusters and value tree metadata change. */
> +		if (!xe ||
> +		    (le64_to_cpu(xe->xe_value_size) <= OCFS2_XATTR_INLINE_SIZE))
> +			clusters_add += new_clusters;
> +		else {
> +			u32 old_clusters = ocfs2_clusters_for_bytes(inode->i_sb,
> +						le64_to_cpu(xe->xe_value_size));
> +
> +			if (old_clusters < new_clusters) {
> +				struct ocfs2_xattr_value_root *xv =
> +					(struct ocfs2_xattr_value_root *)
> +					(base + name_offset + name_len);
> +				meta_add +=
> +					ocfs2_extend_meta_needed(&xv->xr_list);
> +				clusters_add += new_clusters - old_clusters;
> +
> +			}
> +		}
> +	}
> +
> +	mlog(0, "Set xattr %s, reserve meta blocks = %d, clusters = %d\n",
> +	     xi->name, meta_add, clusters_add);
> +	if (meta_add) {
> +		ret = ocfs2_reserve_new_metadata_blocks(osb, meta_add,
> +							&ctxt->meta_ac);
> +		if (ret)
> +			mlog_errno(ret);
> +	}
> +
> +	if (clusters_add) {
> +		ret = ocfs2_reserve_clusters(osb, clusters_add, &ctxt->data_ac);
> +		if (ret) {
> +			mlog_errno(ret);
> +			goto out;
> +		}
> +	}
> +
> +out:
> +	return ret;
> +}

	Here you have almost all the information to calculate the
credits you need for your transaction?  Any chance you can pass that to
ocfs2_start_trans() up front and not have to extend later?

> @@ -1940,8 +2098,12 @@ int ocfs2_xattr_set(struct inode *inode,
>  {
>  	struct buffer_head *di_bh = NULL;
>  	struct ocfs2_dinode *di;
> -	int ret;
> +	int ret, credits;
>  	u16 i, blk_per_bucket = ocfs2_blocks_per_xattr_bucket(inode->i_sb);
> +	struct ocfs2_xattr_set_ctxt ctxt;
> +	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
> +	struct inode *tl_inode = osb->osb_tl_inode;
> +	handle_t *handle = NULL;
>  
>  	struct ocfs2_xattr_info xi = {
>  		.name_index = name_index,
> @@ -1996,48 +2158,45 @@ int ocfs2_xattr_set(struct inode *inode,
>  			goto cleanup;
>  	}
>  
> -	if (!value) {
> -		/* Remove existing extended attribute */
> -		if (!xis.not_found)
> -			ret = ocfs2_xattr_ibody_set(inode, &xi, &xis);
> -		else if (!xbs.not_found)
> -			ret = ocfs2_xattr_block_set(inode, &xi, &xbs);
> -	} else {
> -		/* We always try to set extended attribute into inode first*/
> -		ret = ocfs2_xattr_ibody_set(inode, &xi, &xis);
> -		if (!ret && !xbs.not_found) {
> -			/*
> -			 * If succeed and that extended attribute existing in
> -			 * external block, then we will remove it.
> -			 */
> -			xi.value = NULL;
> -			xi.value_len = 0;
> -			ret = ocfs2_xattr_block_set(inode, &xi, &xbs);
> -		} else if (ret == -ENOSPC) {
> -			if (di->i_xattr_loc && !xbs.xattr_bh) {
> -				ret = ocfs2_xattr_block_find(inode, name_index,
> -							     name, &xbs);
> -				if (ret)
> -					goto cleanup;
> -			}
> -			/*
> -			 * If no space in inode, we will set extended attribute
> -			 * into external block.
> -			 */
> -			ret = ocfs2_xattr_block_set(inode, &xi, &xbs);
> -			if (ret)
> -				goto cleanup;
> -			if (!xis.not_found) {
> -				/*
> -				 * If succeed and that extended attribute
> -				 * existing in inode, we will remove it.
> -				 */
> -				xi.value = NULL;
> -				xi.value_len = 0;
> -				ret = ocfs2_xattr_ibody_set(inode, &xi, &xis);
> -			}
> +
> +	mutex_lock(&tl_inode->i_mutex);
> +
> +	if (ocfs2_truncate_log_needs_flush(osb)) {
> +		ret = __ocfs2_flush_truncate_log(osb);
> +		if (ret < 0) {
> +			mutex_unlock(&tl_inode->i_mutex);
> +			mlog_errno(ret);
> +			goto cleanup;
>  		}
>  	}
> +	mutex_unlock(&tl_inode->i_mutex);
> +
> +	ret = ocfs2_init_xattr_set_ctxt(inode, di, &xi, &xis, &xbs, &ctxt);
> +	if (ret) {
> +		mlog_errno(ret);
> +		goto cleanup;
> +	}
> +
> +	credits = ocfs2_calc_xattr_set_credits(inode, &xi, &xis, &xbs);

	You recalculate this inside __ocfs2_xattr_set_handle().  Why do
you need to calculate it twice?

> +	handle = ocfs2_start_trans(osb, credits);
> +	if (IS_ERR(handle)) {
> +		ret = PTR_ERR(handle);
> +		mlog_errno(ret);
> +		goto cleanup;
> +	}
> +
> +	ret = __ocfs2_xattr_set_handle(inode, handle, di,
> +				       &xi, &xis, &xbs, &ctxt);
> +
> +	ocfs2_commit_trans(osb, handle);
> +
> +	if (ctxt.data_ac)
> +		ocfs2_free_alloc_context(ctxt.data_ac);
> +	if (ctxt.meta_ac)
> +		ocfs2_free_alloc_context(ctxt.meta_ac);
> +	ocfs2_schedule_truncate_log_flush(osb, 1);
> +	ocfs2_run_deallocs(osb, &ctxt.dealloc);
> +
>  cleanup:
>  	up_write(&OCFS2_I(inode)->ip_xattr_sem);
>  	ocfs2_inode_unlock(inode, 1);

<snip>

> @@ -2707,24 +2860,24 @@ static int ocfs2_xattr_create_index_block(struct inode *inode,
>  	 * of the new xattr bucket and one for the value/data.
>  	 */
>  	credits += 3;
> -	handle = ocfs2_start_trans(osb, credits);
> -	if (IS_ERR(handle)) {
> -		ret = PTR_ERR(handle);
> +	ret = ocfs2_extend_trans(handle, credits + handle->h_buffer_credits);

	Again, you extend by double + credits.  How come?  I'm going to
stop mentioning these, though they appear other places in the code.
	Overall, I like where this is going.

Joel

-- 

"Nothing is wrong with California that a rise in the ocean level
 wouldn't cure."
        - Ross MacDonald

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

  reply	other threads:[~2008-10-23 21:40 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-17  4:40 [Ocfs2-devel] [PATCH 0/3] ocfs2/xattr: xattr improvement Tao Ma
2008-10-17  4:44 ` [Ocfs2-devel] [PATCH 1/3] ocfs2/xattr: Remove unused restore_extent_block Tao Ma
2008-10-17  4:44 ` [Ocfs2-devel] [PATCH 2/3] ocfs2/xattr: Merge xattr set transaction Tao Ma
2008-10-23 21:40   ` Joel Becker [this message]
2008-10-24  0:44     ` Tao Ma
2008-10-24  1:09       ` Joel Becker
2008-10-24  1:17         ` Tao Ma
2008-10-24  5:59           ` Joel Becker
2008-10-17  4:44 ` [Ocfs2-devel] [PATCH 3/3] ocfs2/xattr: Proper hash collision handle in bucket division Tao Ma
2008-10-17  0:54   ` [Ocfs2-devel] [PATCH 3/3] ocfs2/xattr: Proper hash collision handle in bucket division.v2 Tao Ma
2008-10-23 23:20     ` Joel Becker
2008-10-24  0:52       ` Tao Ma
2008-10-24  1:14         ` Joel Becker
2008-10-24  1:28           ` Tao Ma
2008-10-24  6:02             ` Joel Becker
2008-10-24  6:15               ` Tao Ma
2008-10-24  7:43                 ` Joel Becker
2008-10-24  8:47                   ` Joel Becker
2008-10-24  9:02                     ` Tao Ma
2008-10-24  9:15                       ` Joel Becker
2008-10-24  9:29                         ` Tao Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081023214036.GC12751@mail.oracle.com \
    --to=joel.becker@oracle.com \
    --cc=ocfs2-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.