public inbox for linux-xfs@vger.kernel.org
From: Alex Elder <aelder@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 12/16] xfs: implement batched inode lookups for AG walking
Date: Thu, 23 Sep 2010 12:17:05 -0500
Message-ID: <1285262225.1973.60.camel@doink>
In-Reply-To: <1285137869-10310-13-git-send-email-david@fromorbit.com>

On Wed, 2010-09-22 at 16:44 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> With the reclaim code separated from the generic walking code, it is
> simple to implement batched lookups for the generic walk code.
> Separate out the inode validation from the execute operations and
> modify the tree lookups to get a batch of inodes at a time.

Two comments below.  I noticed your discussion with Christoph
so I'll look for the new version before I stamp it "reviewed."

> Reclaim operations will be optimised separately.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_sync.c    |  104 +++++++++++++++++++++++-----------------
>  fs/xfs/linux-2.6/xfs_sync.h    |    3 +-
>  fs/xfs/quota/xfs_qm_syscalls.c |   26 +++++-----
>  3 files changed, 75 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
> index 7737a13..227ecde 100644
> --- a/fs/xfs/linux-2.6/xfs_sync.c
> +++ b/fs/xfs/linux-2.6/xfs_sync.c
> @@ -39,11 +39,19 @@
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
>  
> +/*
> + * The inode lookup is done in batches to keep the amount of lock traffic and
> + * radix tree lookups to a minimum. The batch size is a trade off between
> + * lookup reduction and stack usage. This is in the reclaim path, so we can't
> + * be too greedy.
> + */
> +#define XFS_LOOKUP_BATCH	32

Did you come up with 32 empirically?  As the OS evolves might another
value be better?  And if a larger value would improve things, how
would allocating the arrays rather than making them automatic (stack)
affect things?  (Just a discussion point, I think it's fine as-is.)
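
To make the trade-off concrete, here is a minimal user-space sketch, not
kernel code.  It assumes a hypothetical toy_gang_lookup() standing in for
radix_tree_gang_lookup(), and it uses malloc() where kernel code would need
its own allocator; every name in it is invented for illustration.  It counts
the lookups (each of which would be a lock round trip in the real walk) for
a given batch size when the batch array is allocated rather than automatic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for radix_tree_gang_lookup(): copy up to "max"
 * keys that are >= "first" from a sorted array into "out" and return
 * how many were copied.
 */
size_t toy_gang_lookup(const unsigned long *keys, size_t nkeys,
		       unsigned long first, unsigned long *out, size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < nkeys && found < max; i++)
		if (keys[i] >= first)
			out[found++] = keys[i];
	return found;
}

/*
 * Walk every key using a heap-allocated batch of "batch_size" entries.
 * Returns the number of lookups issued (including the final empty one),
 * or 0 if batch_size is 0 or the allocation fails.
 */
size_t walk_with_heap_batch(const unsigned long *keys, size_t nkeys,
			    size_t batch_size, size_t *visited)
{
	unsigned long *batch;
	unsigned long first = 0;
	size_t lookups = 0;

	*visited = 0;
	if (batch_size == 0)
		return 0;
	batch = malloc(batch_size * sizeof(*batch));
	if (!batch)
		return 0;

	for (;;) {
		size_t nr = toy_gang_lookup(keys, nkeys, first,
					    batch, batch_size);
		lookups++;
		if (!nr)
			break;
		*visited += nr;
		/* The next lookup starts just past the last key returned. */
		first = batch[nr - 1] + 1;
	}
	free(batch);
	return lookups;
}
```

With 100 keys, a batch of 32 needs 5 lookups (four that return entries plus
the final empty one), while a batch of 1 needs 101.  For scale, an automatic
array of 32 pointers costs 256 bytes of stack when pointers are 8 bytes.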

>  STATIC int
>  xfs_inode_ag_walk(
>  	struct xfs_mount	*mp,
>  	struct xfs_perag	*pag,
> +	int			(*grab)(struct xfs_inode *ip),
>  	int			(*execute)(struct xfs_inode *ip,
>  					   struct xfs_perag *pag, int flags),
>  	int			flags)
> @@ -52,48 +60,68 @@ xfs_inode_ag_walk(
>  	int			last_error = 0;
>  	int			skipped;
>  	int			done;
> +	int			nr_found;
>  
>  restart:
>  	done = 0;
>  	skipped = 0;
>  	first_index = 0;
> +	nr_found = 0;
>  	do {
>  		int		error = 0;
> -		int		nr_found;
> -		xfs_inode_t	*ip;
> +		int		i;
> +		struct xfs_inode *batch[XFS_LOOKUP_BATCH];
>  
>  		read_lock(&pag->pag_ici_lock);
>  		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
> -				(void **)&ip, first_index, 1);
> +					(void **)batch, first_index,
> +					XFS_LOOKUP_BATCH);
>  		if (!nr_found) {
>  			read_unlock(&pag->pag_ici_lock);
>  			break;
>  		}
>  
>  		/*
> -		 * Update the index for the next lookup. Catch overflows
> -		 * into the next AG range which can occur if we have inodes
> -		 * in the last block of the AG and we are currently
> -		 * pointing to the last inode.
> +		 * Grab the inodes before we drop the lock. if we found
> +		 * nothing, nr == 0 and the loop will be skipped.
>  		 */
> -		first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
> -		if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino))
> -			done = 1;
> -
> -		/* execute releases pag->pag_ici_lock */
> -		error = execute(ip, pag, flags);
> -		if (error == EAGAIN) {
> -			skipped++;
> -			continue;
> +		for (i = 0; i < nr_found; i++) {
> +			struct xfs_inode *ip = batch[i];
> +
> +			if (done || grab(ip))
> +				batch[i] = NULL;
> +
> +			/*
> +			 * Update the index for the next lookup. Catch overflows
> +			 * into the next AG range which can occur if we have inodes
> +			 * in the last block of the AG and we are currently
> +			 * pointing to the last inode.
> +			 */
> +			first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
> +			if (first_index < XFS_INO_TO_AGINO(mp, ip->i_ino))
> +				done = 1;

It sounds like you're going to re-work this, but
I'll mention this for you to consider anyway.  I
don't know that the "done" flag here should be
needed.  The gang lookup should never return
anything beyond the end of the AG.  It seems
like you ought to be able to detect when you've
covered the whole AG elsewhere, *not*
on every entry found in this inner loop and
also *not* while holding the lock.
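
One possible reading of that suggestion, as a user-space sketch only: note
the inode number of the last entry in the batch, then do the wrap check once
per batch after the lock is dropped.  The names and the 8-bit AG-relative
field below are hypothetical stand-ins for XFS_INO_TO_AGINO() and the real
geometry, and whether the inode number can be sampled safely at that point
in the kernel depends on inode lifetime rules this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the low 8 bits of an inode number are its AG-relative part. */
#define TOY_AGINO_BITS	8
#define TOY_AGINO_MASK	((1u << TOY_AGINO_BITS) - 1)

/* Toy analogue of XFS_INO_TO_AGINO(): strip the AG number. */
uint32_t toy_ino_to_agino(uint64_t ino)
{
	return (uint32_t)(ino & TOY_AGINO_MASK);
}

/*
 * Compute the index for the next lookup from the last inode number seen
 * in a batch.  Returns true if the index wrapped, meaning the walk has
 * passed the last possible inode in this AG and should stop.
 */
bool toy_next_index(uint64_t last_ino, uint32_t *first_index)
{
	*first_index = toy_ino_to_agino(last_ino + 1);
	return *first_index < toy_ino_to_agino(last_ino);
}
```

This is the same comparison as in the patch, just evaluated once per batch
instead of once per entry.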


> +		}
> +
> +		/* unlock now we've grabbed the inodes. */
> +		read_unlock(&pag->pag_ici_lock);
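
For reference, the two-phase shape described in the patch (validate each
inode with grab() while the lock is held, then run execute() on the
survivors after the lock is dropped) can be sketched in user space as
follows.  The toy_* names and the struct are hypothetical, and the locking
is indicated only by comments; this is an illustration of the pattern, not
the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode stand-in. */
struct toy_inode {
	unsigned long	ino;
	int		stale;		/* nonzero: grab should reject it */
	int		visited;	/* set by execute */
};

/* Hypothetical grab callback: nonzero means "skip this inode". */
int toy_grab(struct toy_inode *ip)
{
	return ip->stale;
}

/* Hypothetical execute callback: mark the inode and report success. */
int toy_execute(struct toy_inode *ip)
{
	ip->visited = 1;
	return 0;
}

/* Process one batch; returns how many inodes execute() succeeded on. */
size_t toy_process_batch(struct toy_inode **batch, size_t nr_found,
			 int (*grab)(struct toy_inode *ip),
			 int (*execute)(struct toy_inode *ip))
{
	size_t executed = 0;

	/* Phase 1: in the real walk this runs with the tree lock held. */
	for (size_t i = 0; i < nr_found; i++)
		if (grab(batch[i]))
			batch[i] = NULL;

	/* The lock would be dropped here. */

	/* Phase 2: operate only on the inodes that were grabbed. */
	for (size_t i = 0; i < nr_found; i++) {
		if (!batch[i])
			continue;
		if (execute(batch[i]) == 0)
			executed++;
	}
	return executed;
}
```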

