linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Valerie Aurora <vaurora@redhat.com>
To: Nick Piggin <npiggin@kernel.dk>, Jan Blunck <blunck@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org
Subject: Re: [patch 05/10] fs: remove extra lookup in __lookup_hash
Date: Wed, 18 Aug 2010 15:34:18 -0400	[thread overview]
Message-ID: <20100818193418.GE10850@shell> (raw)
In-Reply-To: <20100817184121.115628228@kernel.dk>

On Wed, Aug 18, 2010 at 04:37:34AM +1000, Nick Piggin wrote:
> fs: remove extra lookup in __lookup_hash
> 
> Optimize lookup for create operations, where no dentry should often be
> common-case. In cases where it is not, such as unlink, the added overhead
> is much smaller than the removed.
> 
> Also, move comments about __d_lookup racyness to the __d_lookup call site.
> d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
> vein, add kerneldoc comments to __d_lookup and clean up some of the comments:
> 
> - We are interested in how the RCU lookup works here, particularly with
>   renames. Make that explicit, and point to the document where it is explained
>   in more detail.
> - RCU is pretty standard now, and macros make implementations pretty mindless.
>   If we want to know about RCU barrier details, we look in RCU code.
> - Delete some boring legacy comments because we don't care much about how the
>   code used to work, more about the interesting parts of how it works now. So
>   comments about lazy LRU may be interesting, but would better be done in the
>   LRU or refcount management code.
> 
> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
> 
> ---
>  fs/dcache.c |   60 +++++++++++++++++++++++++++++++++++-------------------------
>  fs/namei.c  |   32 ++++++++++++++++----------------
>  2 files changed, 51 insertions(+), 41 deletions(-)
> 
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c	2010-08-18 04:04:29.000000000 +1000
> +++ linux-2.6/fs/namei.c	2010-08-18 04:05:07.000000000 +1000
> @@ -735,6 +735,11 @@ static int do_lookup(struct nameidata *n
>  			return err;
>  	}
>  
> +	/*
> +	 * Rename seqlock is not required here because in the off chance
> +	 * of a false negative due to a concurrent rename, we're going to
> +	 * do the non-racy lookup, below.
> +	 */
>  	dentry = __d_lookup(nd->path.dentry, name);
>  	if (!dentry)
>  		goto need_lookup;
> @@ -754,17 +759,13 @@ need_lookup:
>  	mutex_lock(&dir->i_mutex);
>  	/*
>  	 * First re-do the cached lookup just in case it was created
> -	 * while we waited for the directory semaphore..
> -	 *
> -	 * FIXME! This could use version numbering or similar to
> -	 * avoid unnecessary cache lookups.
> -	 *
> -	 * The "dcache_lock" is purely to protect the RCU list walker
> -	 * from concurrent renames at this point (we mustn't get false
> -	 * negatives from the RCU list walk here, unlike the optimistic
> -	 * fast walk).
> +	 * while we waited for the directory semaphore, or the first
> +	 * lookup failed due to an unrelated rename.
>  	 *
> -	 * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
> +	 * This could use version numbering or similar to avoid unnecessary
> +	 * cache lookups, but then we'd have to do the first lookup in the
> +	 * non-racy way. However in the common case here, everything should
> +	 * be hot in cache, so would it be a big win?
>  	 */
>  	dentry = d_lookup(parent, name);
>  	if (likely(!dentry)) {
> @@ -1136,13 +1137,12 @@ static struct dentry *__lookup_hash(stru
>  			goto out;
>  	}
>  
> -	dentry = __d_lookup(base, name);
> -
> -	/* lockess __d_lookup may fail due to concurrent d_move()
> -	 * in some unrelated directory, so try with d_lookup
> +	/*
> +	 * Don't bother with __d_lookup: callers are for creat as
> +	 * well as unlink, so a lot of the time it would cost
> +	 * a double lookup.
>  	 */
> -	if (!dentry)
> -		dentry = d_lookup(base, name);
> +	dentry = d_lookup(base, name);
>  
>  	if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
>  		dentry = do_revalidate(dentry, nd);
> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c	2010-08-18 04:04:01.000000000 +1000
> +++ linux-2.6/fs/dcache.c	2010-08-18 04:05:07.000000000 +1000
> @@ -1332,31 +1332,13 @@ EXPORT_SYMBOL(d_add_ci);
>   * d_lookup - search for a dentry
>   * @parent: parent dentry
>   * @name: qstr of name we wish to find
> + * Returns: dentry, or NULL
>   *
> - * Searches the children of the parent dentry for the name in question. If
> - * the dentry is found its reference count is incremented and the dentry
> - * is returned. The caller must use dput to free the entry when it has
> - * finished using it. %NULL is returned on failure.
> - *
> - * __d_lookup is dcache_lock free. The hash list is protected using RCU.
> - * Memory barriers are used while updating and doing lockless traversal. 
> - * To avoid races with d_move while rename is happening, d_lock is used.
> - *
> - * Overflows in memcmp(), while d_move, are avoided by keeping the length
> - * and name pointer in one structure pointed by d_qstr.
> - *
> - * rcu_read_lock() and rcu_read_unlock() are used to disable preemption while
> - * lookup is going on.
> - *
> - * The dentry unused LRU is not updated even if lookup finds the required dentry
> - * in there. It is updated in places such as prune_dcache, shrink_dcache_sb,
> - * select_parent and __dget_locked. This laziness saves lookup from dcache_lock
> - * acquisition.
> - *
> - * d_lookup() is protected against the concurrent renames in some unrelated
> - * directory using the seqlockt_t rename_lock.
> + * d_lookup searches the children of the parent dentry for the name in
> + * question. If the dentry is found its reference count is incremented and the
> + * dentry is returned. The caller must use dput to free the entry when it has
> + * finished using it. %NULL is returned if the dentry does not exist.
>   */
> -
>  struct dentry * d_lookup(struct dentry * parent, struct qstr * name)
>  {
>  	struct dentry * dentry = NULL;
> @@ -1372,6 +1354,21 @@ struct dentry * d_lookup(struct dentry *
>  }
>  EXPORT_SYMBOL(d_lookup);
>  
> +/*
> + * __d_lookup - search for a dentry (racy)
> + * @parent: parent dentry
> + * @name: qstr of name we wish to find
> + * Returns: dentry, or NULL
> + *
> + * __d_lookup is like d_lookup, however it may (rarely) return a
> + * false-negative result due to unrelated rename activity.
> + *
> + * __d_lookup is slightly faster by avoiding rename_lock read seqlock,
> + * however it must be used carefully, eg. with a following d_lookup in
> + * the case of failure.
> + *
> + * __d_lookup callers must be commented.
> + */
>  struct dentry * __d_lookup(struct dentry * parent, struct qstr * name)
>  {
>  	unsigned int len = name->len;
> @@ -1382,6 +1379,19 @@ struct dentry * __d_lookup(struct dentry
>  	struct hlist_node *node;
>  	struct dentry *dentry;
>  
> +	/*
> +	 * The hash list is protected using RCU.
> +	 *
> +	 * Take d_lock when comparing a candidate dentry, to avoid races
> +	 * with d_move().
> +	 *
> +	 * It is possible that concurrent renames can mess up our list
> +	 * walk here and result in missing our dentry, resulting in the
> +	 * false-negative result. d_lookup() protects against concurrent
> +	 * renames using rename_lock seqlock.
> +	 *
> +	 * See Documentation/vfs/dcache-locking.txt for more details.
> +	 */
>  	rcu_read_lock();
>  	
>  	hlist_for_each_entry_rcu(dentry, node, head, d_hash) {
> @@ -1396,8 +1406,8 @@ struct dentry * __d_lookup(struct dentry
>  
>  		/*
>  		 * Recheck the dentry after taking the lock - d_move may have
> -		 * changed things.  Don't bother checking the hash because we're
> -		 * about to compare the whole name anyway.
> +		 * changed things. Don't bother checking the hash because
> +		 * we're about to compare the whole name anyway.
>  		 */
>  		if (dentry->d_parent != parent)
>  			goto next;

Jan Blunck (cc'd) wrote a similar patch that I ended up dropping from
the union mount queue just to make my life easier.

This makes sense; it's far more likely that the dentry doesn't exist
than that we collided with a d_move(), in which case the second locked
lookup is totally wasted effort.  And the seqlock is low cost anyway.

Reviewed-by: Valerie Aurora <vaurora@redhat.com>

-VAL


  parent reply	other threads:[~2010-08-18 19:34 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-17 18:37 [patch 00/10] first set of vfs scale patches Nick Piggin
2010-08-17 18:37 ` [patch 01/10] fs: fix do_lookup false negative Nick Piggin
2010-08-17 22:45   ` Valerie Aurora
2010-08-17 23:04   ` Sage Weil
2010-08-18 13:41   ` Andi Kleen
2010-08-17 18:37 ` [patch 02/10] fs: dentry allocation consolidation Nick Piggin
2010-08-17 22:45   ` Valerie Aurora
2010-08-17 18:37 ` [patch 03/10] apparmor: use task path helpers Nick Piggin
2010-08-17 22:59   ` Valerie Aurora
2010-08-17 18:37 ` [patch 04/10] fs: fs_struct rwlock to spinlock Nick Piggin
2010-08-17 23:14   ` Valerie Aurora
2010-08-20 10:05     ` Nick Piggin
2010-08-17 18:37 ` [patch 05/10] fs: remove extra lookup in __lookup_hash Nick Piggin
2010-08-18 13:57   ` Andi Kleen
2010-08-18 21:13     ` Andi Kleen
2010-08-18 19:34   ` Valerie Aurora [this message]
2010-08-17 18:37 ` [patch 06/10] fs: cleanup files_lock locking Nick Piggin
2010-08-18 19:46   ` Valerie Aurora
2010-08-17 18:37 ` [patch 07/10] tty: fix fu_list abuse Nick Piggin
2010-08-17 18:37 ` [patch 08/10] lglock: introduce special lglock and brlock spin locks Nick Piggin
2010-08-17 18:37 ` [patch 09/10] fs: scale files_lock Nick Piggin
2010-08-17 18:37 ` [patch 10/10] fs: brlock vfsmount_lock Nick Piggin
2010-08-18 14:05   ` Andi Kleen
2010-08-20 10:09     ` Nick Piggin
2010-08-17 21:14 ` [patch 00/10] first set of vfs scale patches Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100818193418.GE10850@shell \
    --to=vaurora@redhat.com \
    --cc=blunck@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=npiggin@kernel.dk \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).