From: Valerie Aurora <vaurora@redhat.com>
To: Nick Piggin <npiggin@kernel.dk>, Jan Blunck <blunck@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>, linux-fsdevel@vger.kernel.org
Subject: Re: [patch 05/10] fs: remove extra lookup in __lookup_hash
Date: Wed, 18 Aug 2010 15:34:18 -0400 [thread overview]
Message-ID: <20100818193418.GE10850@shell> (raw)
In-Reply-To: <20100817184121.115628228@kernel.dk>
On Wed, Aug 18, 2010 at 04:37:34AM +1000, Nick Piggin wrote:
> fs: remove extra lookup in __lookup_hash
>
> Optimize lookup for create operations, where no dentry should often be
> common-case. In cases where it is not, such as unlink, the added overhead
> is much smaller than the removed.
>
> Also, move comments about __d_lookup racyness to the __d_lookup call site.
> d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
> vein, add kerneldoc comments to __d_lookup and clean up some of the comments:
>
> - We are interested in how the RCU lookup works here, particularly with
> renames. Make that explicit, and point to the document where it is explained
> in more detail.
> - RCU is pretty standard now, and macros make implementations pretty mindless.
> If we want to know about RCU barrier details, we look in RCU code.
> - Delete some boring legacy comments because we don't care much about how the
> code used to work, more about the interesting parts of how it works now. So
> comments about lazy LRU may be interesting, but would better be done in the
> LRU or refcount management code.
>
> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
>
> ---
> fs/dcache.c | 60 +++++++++++++++++++++++++++++++++++-------------------------
> fs/namei.c | 32 ++++++++++++++++----------------
> 2 files changed, 51 insertions(+), 41 deletions(-)
>
> Index: linux-2.6/fs/namei.c
> ===================================================================
> --- linux-2.6.orig/fs/namei.c 2010-08-18 04:04:29.000000000 +1000
> +++ linux-2.6/fs/namei.c 2010-08-18 04:05:07.000000000 +1000
> @@ -735,6 +735,11 @@ static int do_lookup(struct nameidata *n
> return err;
> }
>
> + /*
> + * Rename seqlock is not required here because in the off chance
> + * of a false negative due to a concurrent rename, we're going to
> + * do the non-racy lookup, below.
> + */
> dentry = __d_lookup(nd->path.dentry, name);
> if (!dentry)
> goto need_lookup;
> @@ -754,17 +759,13 @@ need_lookup:
> mutex_lock(&dir->i_mutex);
> /*
> * First re-do the cached lookup just in case it was created
> - * while we waited for the directory semaphore..
> - *
> - * FIXME! This could use version numbering or similar to
> - * avoid unnecessary cache lookups.
> - *
> - * The "dcache_lock" is purely to protect the RCU list walker
> - * from concurrent renames at this point (we mustn't get false
> - * negatives from the RCU list walk here, unlike the optimistic
> - * fast walk).
> + * while we waited for the directory semaphore, or the first
> + * lookup failed due to an unrelated rename.
> *
> - * so doing d_lookup() (with seqlock), instead of lockfree __d_lookup
> + * This could use version numbering or similar to avoid unnecessary
> + * cache lookups, but then we'd have to do the first lookup in the
> + * non-racy way. However in the common case here, everything should
> + * be hot in cache, so would it be a big win?
> */
> dentry = d_lookup(parent, name);
> if (likely(!dentry)) {
> @@ -1136,13 +1137,12 @@ static struct dentry *__lookup_hash(stru
> goto out;
> }
>
> - dentry = __d_lookup(base, name);
> -
> - /* lockess __d_lookup may fail due to concurrent d_move()
> - * in some unrelated directory, so try with d_lookup
> + /*
> + * Don't bother with __d_lookup: callers are for creat as
> + * well as unlink, so a lot of the time it would cost
> + * a double lookup.
> */
> - if (!dentry)
> - dentry = d_lookup(base, name);
> + dentry = d_lookup(base, name);
>
> if (dentry && dentry->d_op && dentry->d_op->d_revalidate)
> dentry = do_revalidate(dentry, nd);
> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c 2010-08-18 04:04:01.000000000 +1000
> +++ linux-2.6/fs/dcache.c 2010-08-18 04:05:07.000000000 +1000
> @@ -1332,31 +1332,13 @@ EXPORT_SYMBOL(d_add_ci);
> * d_lookup - search for a dentry
> * @parent: parent dentry
> * @name: qstr of name we wish to find
> + * Returns: dentry, or NULL
> *
> - * Searches the children of the parent dentry for the name in question. If
> - * the dentry is found its reference count is incremented and the dentry
> - * is returned. The caller must use dput to free the entry when it has
> - * finished using it. %NULL is returned on failure.
> - *
> - * __d_lookup is dcache_lock free. The hash list is protected using RCU.
> - * Memory barriers are used while updating and doing lockless traversal.
> - * To avoid races with d_move while rename is happening, d_lock is used.
> - *
> - * Overflows in memcmp(), while d_move, are avoided by keeping the length
> - * and name pointer in one structure pointed by d_qstr.
> - *
> - * rcu_read_lock() and rcu_read_unlock() are used to disable preemption while
> - * lookup is going on.
> - *
> - * The dentry unused LRU is not updated even if lookup finds the required dentry
> - * in there. It is updated in places such as prune_dcache, shrink_dcache_sb,
> - * select_parent and __dget_locked. This laziness saves lookup from dcache_lock
> - * acquisition.
> - *
> - * d_lookup() is protected against the concurrent renames in some unrelated
> - * directory using the seqlockt_t rename_lock.
> + * d_lookup searches the children of the parent dentry for the name in
> + * question. If the dentry is found its reference count is incremented and the
> + * dentry is returned. The caller must use dput to free the entry when it has
> + * finished using it. %NULL is returned if the dentry does not exist.
> */
> -
> struct dentry * d_lookup(struct dentry * parent, struct qstr * name)
> {
> struct dentry * dentry = NULL;
> @@ -1372,6 +1354,21 @@ struct dentry * d_lookup(struct dentry *
> }
> EXPORT_SYMBOL(d_lookup);
>
> +/*
> + * __d_lookup - search for a dentry (racy)
> + * @parent: parent dentry
> + * @name: qstr of name we wish to find
> + * Returns: dentry, or NULL
> + *
> + * __d_lookup is like d_lookup, however it may (rarely) return a
> + * false-negative result due to unrelated rename activity.
> + *
> + * __d_lookup is slightly faster by avoiding rename_lock read seqlock,
> + * however it must be used carefully, eg. with a following d_lookup in
> + * the case of failure.
> + *
> + * __d_lookup callers must be commented.
> + */
> struct dentry * __d_lookup(struct dentry * parent, struct qstr * name)
> {
> unsigned int len = name->len;
> @@ -1382,6 +1379,19 @@ struct dentry * __d_lookup(struct dentry
> struct hlist_node *node;
> struct dentry *dentry;
>
> + /*
> + * The hash list is protected using RCU.
> + *
> + * Take d_lock when comparing a candidate dentry, to avoid races
> + * with d_move().
> + *
> + * It is possible that concurrent renames can mess up our list
> + * walk here and result in missing our dentry, resulting in the
> + * false-negative result. d_lookup() protects against concurrent
> + * renames using rename_lock seqlock.
> + *
> + * See Documentation/vfs/dcache-locking.txt for more details.
> + */
> rcu_read_lock();
>
> hlist_for_each_entry_rcu(dentry, node, head, d_hash) {
> @@ -1396,8 +1406,8 @@ struct dentry * __d_lookup(struct dentry
>
> /*
> * Recheck the dentry after taking the lock - d_move may have
> - * changed things. Don't bother checking the hash because we're
> - * about to compare the whole name anyway.
> + * changed things. Don't bother checking the hash because
> + * we're about to compare the whole name anyway.
> */
> if (dentry->d_parent != parent)
> goto next;
Jan Blunck (cc'd) wrote a similar patch that I ended up dropping from
the union mount queue just to make my life easier.
This makes sense; it's far more likely that the dentry doesn't exist
than that we collided with a d_move(), in which case the second locked
lookup is totally wasted effort. And the seqlock is low cost anyway.
Reviewed-by: Valerie Aurora <vaurora@redhat.com>
-VAL
next prev parent reply other threads:[~2010-08-18 19:34 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-17 18:37 [patch 00/10] first set of vfs scale patches Nick Piggin
2010-08-17 18:37 ` [patch 01/10] fs: fix do_lookup false negative Nick Piggin
2010-08-17 22:45 ` Valerie Aurora
2010-08-17 23:04 ` Sage Weil
2010-08-18 13:41 ` Andi Kleen
2010-08-17 18:37 ` [patch 02/10] fs: dentry allocation consolidation Nick Piggin
2010-08-17 22:45 ` Valerie Aurora
2010-08-17 18:37 ` [patch 03/10] apparmor: use task path helpers Nick Piggin
2010-08-17 22:59 ` Valerie Aurora
2010-08-17 18:37 ` [patch 04/10] fs: fs_struct rwlock to spinlock Nick Piggin
2010-08-17 23:14 ` Valerie Aurora
2010-08-20 10:05 ` Nick Piggin
2010-08-17 18:37 ` [patch 05/10] fs: remove extra lookup in __lookup_hash Nick Piggin
2010-08-18 13:57 ` Andi Kleen
2010-08-18 21:13 ` Andi Kleen
2010-08-18 19:34 ` Valerie Aurora [this message]
2010-08-17 18:37 ` [patch 06/10] fs: cleanup files_lock locking Nick Piggin
2010-08-18 19:46 ` Valerie Aurora
2010-08-17 18:37 ` [patch 07/10] tty: fix fu_list abuse Nick Piggin
2010-08-17 18:37 ` [patch 08/10] lglock: introduce special lglock and brlock spin locks Nick Piggin
2010-08-17 18:37 ` [patch 09/10] fs: scale files_lock Nick Piggin
2010-08-17 18:37 ` [patch 10/10] fs: brlock vfsmount_lock Nick Piggin
2010-08-18 14:05 ` Andi Kleen
2010-08-20 10:09 ` Nick Piggin
2010-08-17 21:14 ` [patch 00/10] first set of vfs scale patches Al Viro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100818193418.GE10850@shell \
--to=vaurora@redhat.com \
--cc=blunck@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=npiggin@kernel.dk \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).