All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Elder <aelder@sgi.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 12/16] xfs: implement batched inode lookups for AG walking
Date: Mon, 27 Sep 2010 12:43:29 -0500	[thread overview]
Message-ID: <1285609409.2380.67.camel@doink> (raw)
In-Reply-To: <20100924091522.GT2614@dastard>

On Fri, 2010-09-24 at 19:15 +1000, Dave Chinner wrote:
> On Thu, Sep 23, 2010 at 12:17:05PM -0500, Alex Elder wrote:
> > On Wed, 2010-09-22 at 16:44 +1000, Dave Chinner wrote:

. . .

> > It sounds like you're going to re-work this, but
> > I'll mention this for you to consider anyway.  I
> > don't know that the "done" flag here should be
> > needed.
> 
> This check was added because if we don't detect the special case of
> the last valid inode _number_ in the AG, first_index will loop back
> to 0 and we'll start searching the AG again.  IOWs, we're not
> looking for the last inode in the cache, we're looking for the last
> valid inode number.
> 
> Hence the done flag is ensuring that:
> 	a) we terminate the walk at the last valid inode
> 	b) if there are inodes at indexes above the last valid inode
> 	number, we do not grab them or continue walking them.
> 
> Yes, b) should never happen, but I've had bugs in development code
> that have put inodes in stange places before...
> 
> > The gang lookup should never return
> > anything beyond the end of the AG.  It seems
> > like you ought to be able to detect when you've
> > covered all the whole AG elsewhere,
> 
> AFAICT, there are only two ways - the gang lookup returns nothing,
> or we see the last valid inode number in the AG. If you can come up
> with something that doesn't invlove a tree or inode number lookup,
> I'm all ears....
> 
> > *not*
> > on every entry found in this inner loop and
> > also *not* while holding the lock.
> 
> It has to be done while holding the lock because if we cannot grab
> the inode then the only way we can safely derefence the inode is
> by still holding the inode cache lock. Once we drop the lock, the
> inodes we failed to grab can be removed from the cache and we cannot
> safely dereference them to get the inode number from them.

OK, I have a proposal below--it's not a diff, it's just a
modified version of xfs_inode_ag_walk() for you to consider.
It's not hugely better but it reduces the amount
of computation done inside the inner loop and while the
lock is held.  I haven't done any testing with it.

How this differs from what you have, probably in order
of importance:
- Update first_index only on the last inode returned by
  a gang lookup (not on every inode returned)
- Don't compare new value of first_index against the
  old one when it's updated.
- Use first_index == 0 (rather than done != 0) as an
  indication that we've exhausted the inodes in the AG.
- Tracks only grabbed inodes (rather than filling the
  array with null pointers in slots for those not grabbed)
- Don't bother looking at "error" again if it's zero
  (the normal case)

Anyway, do what you like with this (or do nothing at all).
I was just following up on my suggestion.

					-Alex

STATIC int
xfs_inode_ag_walk(
        struct xfs_mount        *mp,
        struct xfs_perag        *pag,
        int                     (*execute)(struct xfs_inode *ip,
                                           struct xfs_perag *pag, int
flags),
        int                     flags)
{
        uint32_t                first_index;
        int                     last_error = 0;
        int                     skipped;
        int                     nr_found;
        
restart:
        skipped = 0;
        first_index = 0;
        do {
                int             error = 0;
                int             nr_grabbed = 0;
                int             i;
                struct xfs_inode *batch[XFS_LOOKUP_BATCH];
                struct xfs_inode *ip; /* = NULL if compiler complains */
        
                read_lock(&pag->pag_ici_lock);
                nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
                                        (void **) batch, first_index,
                                        XFS_LOOKUP_BATCH);
                if (!nr_found) {
                        read_unlock(&pag->pag_ici_lock);
                        break;
                }
        
                /* Grab the inodes while we hold the lock. */
                for (i = 0; i < nr_found; i++) {
                        ip = batch[i];
                        if (!xfs_inode_ag_walk_grab(ip)) {
                                if (i > nr_grabbed)
                                            batch[nr_grabbed] = ip;
                                nr_grabbed++;
                        }   
                }

                /*
                 * Update the index so the next lookup starts after
                 * the last inode found (whether or not we were able
                 * to grab it).  If that inode was the highest one
                 * in the AG, this will evaluate to 0, which will
                 * cause the loop to terminate (below).
                 */
                first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);

		/* Done looking at (ungrabbed) inodes; drop the lock */
                read_unlock(&pag->pag_ici_lock);
                 
                for (i = 0; i < nr_grabbed; i++) {
                        ip = batch[i];
                        error = execute(ip, pag, flags);
                        IRELE(ip);
                        if (error) {
                                if (error == EAGAIN) {
                                        skipped++;
                                else if (last_error != EFSCORRUPTED)
                                        last_error = error;
                        }
                }

                /* bail out if the filesystem is corrupted. */
                if (error == EFSCORRUPTED)
                        break;
        } while (nr_found && first_index);

        if (skipped) {
                delay(1);
                goto restart;
        }

        return last_error;
}



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2010-09-27 17:43 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-22  6:44 [PATCH 0/16] xfs: metadata scalability V2 Dave Chinner
2010-09-22  6:44 ` [PATCH 01/16] xfs: reduce the number of CIL lock round trips during commit Dave Chinner
2010-09-22 16:51   ` Christoph Hellwig
2010-09-22 19:57   ` Alex Elder
2010-09-22  6:44 ` [PATCH 02/16] xfs: remove debug assert for per-ag reference counting Dave Chinner
2010-09-22  6:44 ` [PATCH 03/16] xfs: lockless per-ag lookups Dave Chinner
2010-09-22  6:44 ` [PATCH 04/16] xfs: don't use vfs writeback for pure metadata modifications Dave Chinner
2010-09-22 17:24   ` Christoph Hellwig
2010-09-23  0:36     ` Dave Chinner
2010-09-23 16:19   ` Alex Elder
2010-09-22  6:44 ` [PATCH 05/16] xfs: rename xfs_buf_get_nodaddr to be more appropriate Dave Chinner
2010-09-22 17:25   ` Christoph Hellwig
2010-09-23  0:37     ` Dave Chinner
2010-09-23 16:22   ` Alex Elder
2010-09-22  6:44 ` [PATCH 06/16] xfs: introduced uncached buffer read primitve Dave Chinner
2010-09-22  6:44 ` [PATCH 07/16] xfs: store xfs_mount in the buftarg instead of in the xfs_buf Dave Chinner
2010-09-22  6:44 ` [PATCH 08/16] xfs: kill XBF_FS_MANAGED buffers Dave Chinner
2010-09-22  6:44 ` [PATCH 09/16] xfs: use unhashed buffers for size checks Dave Chinner
2010-09-22  6:44 ` [PATCH 10/16] xfs: remove buftarg hash for external devices Dave Chinner
2010-09-22  6:44 ` [PATCH 11/16] xfs: split inode AG walking into separate code for reclaim Dave Chinner
2010-09-22 17:28   ` Christoph Hellwig
2010-09-23 16:45   ` Alex Elder
2010-09-22  6:44 ` [PATCH 12/16] xfs: implement batched inode lookups for AG walking Dave Chinner
2010-09-22 17:33   ` Christoph Hellwig
2010-09-23  0:40     ` Dave Chinner
2010-09-23 17:17   ` Alex Elder
2010-09-24  9:15     ` Dave Chinner
2010-09-27 16:05       ` Alex Elder
2010-09-27 17:43       ` Alex Elder [this message]
2010-09-22  6:44 ` [PATCH 13/16] xfs: batch inode reclaim lookup Dave Chinner
2010-09-22 17:34   ` Christoph Hellwig
2010-09-23  0:43     ` Dave Chinner
2010-09-23 17:39   ` Alex Elder
2010-09-22  6:44 ` [PATCH 14/16] xfs: serialise inode reclaim within an AG Dave Chinner
2010-09-23 17:50   ` Alex Elder
2010-09-22  6:44 ` [PATCH 16/16] xfs; pack xfs_buf structure more tightly Dave Chinner
2010-09-22 14:53 ` [PATCH 0/16] xfs: metadata scalability V2 Christoph Hellwig
2010-09-22 20:55 ` Alex Elder
2010-09-23  0:46   ` [PATCH 15/16] xfs: convert buffer cache hash to rbtree Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1285609409.2380.67.camel@doink \
    --to=aelder@sgi.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.