* [rfc] scale dcache locking
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
This is my sketch for improving dcache locking scalability. So far I've
only really been looking at the core code to get an idea of how it might
look, so most configurable functionality is broken (and unfortunately
there may well be something in there that poses a fundamental problem
for the approach). But there is a *lot* of stuff to go through, so I'd
like to get some early opinions on it.
I have tried to split it up as nicely as possible. The patch splitting
still needs a bit more work, but hopefully it is not too bad to review.
It seems to break naturally into 3 phases:
- The first phase adds new locks and rules to protect specific data
structures and dentry fields. It attempts to be as simple and dumb a
replacement as possible, to make review easier.
- The second phase removes dcache_lock, once it no longer protects
anything itself.
- The last phase improves the existing locking schemes, and improves
the scalability of the newly added locks.
It's nowhere near complete, but it is running and relatively stable on
the configs where it compiles. As far as core locking changes go for
this project, I still need to make nr_dentry a per-cpu counter, and
break up the LRU lock (which is the last remaining global lock broken
out of dcache_lock). The LRU lock currently splits naturally into
per-sb locking, but that's not satisfying because we also want really
good scalability within a single sb. I think per-memory-zone LRU lists
and locking might be a good idea, because that scales with socket count
and also makes better NUMA-targeted dcache reclaim possible. Making the
LRU more lazy might also be a good idea, to reduce locking.
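For reference, here is a rough sketch of the lock hierarchy the series
builds up. This only summarises the ordering comments the individual
patches add; the helper at the end is a hypothetical illustration of an
ordered acquisition, not code from the series.

/*
 * Lock ordering after the series, outermost first:
 *
 *   dcache_lock                    (goes away in the second phase)
 *     dcache_inode_lock            (patch 08: i_dentry / d_alias lists)
 *       dentry->d_lock             (per-dentry fields)
 *         dcache_lru_lock          (patch 03: LRU lists and counters)
 *         dcache_hash_lock         (patch 02: hash table)
 */
static void example_unhash(struct dentry *dentry)
{
        spin_lock(&dcache_lock);
        spin_lock(&dentry->d_lock);
        if (!(dentry->d_flags & DCACHE_UNHASHED)) {
                dentry->d_flags |= DCACHE_UNHASHED;
                spin_lock(&dcache_hash_lock);
                hlist_del_rcu(&dentry->d_hash);
                spin_unlock(&dcache_hash_lock);
        }
        spin_unlock(&dentry->d_lock);
        spin_unlock(&dcache_lock);
}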
* [patch 01/14] fs: dcache fix LRU ordering
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-order-lru.patch --]
[-- Type: text/plain, Size: 771 bytes --]
Fix the ordering of the LRU when moving referenced dentries to the head
of the list: they should go to the head in the same order as they were
found from the tail, rather than in reverse order.
---
fs/dcache.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -481,7 +481,7 @@ restart:
if ((flags & DCACHE_REFERENCED)
&& (dentry->d_flags & DCACHE_REFERENCED)) {
dentry->d_flags &= ~DCACHE_REFERENCED;
- list_move_tail(&dentry->d_lru, &referenced);
+ list_move(&dentry->d_lru, &referenced);
spin_unlock(&dentry->d_lock);
} else {
list_move_tail(&dentry->d_lru, &tmp);
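To make the ordering argument concrete, here is a small self-contained
userspace sketch. The list helpers are simplified stand-ins for the
kernel ones, and the scan mimics how __shrink_dcache_sb() walks the LRU
from the tail; it is an illustration only.

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static void __list_add(struct list_head *n, struct list_head *prev,
                       struct list_head *next)
{
        next->prev = n; n->next = next; n->prev = prev; prev->next = n;
}
static void __list_del(struct list_head *e)
{
        e->prev->next = e->next; e->next->prev = e->prev;
}
/* delete, then re-add at the head (what the patch switches to) */
static void list_move(struct list_head *e, struct list_head *head)
{
        __list_del(e); __list_add(e, head, head->next);
}
/* delete, then re-add at the tail (the old, order-reversing call) */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
        __list_del(e); __list_add(e, head->prev, head);
}

struct item { int id; struct list_head lru; };

int main(void)
{
        struct item items[3] = { { .id = 1 }, { .id = 2 }, { .id = 3 } };
        struct list_head lru, referenced, *p;
        int i;

        INIT_LIST_HEAD(&lru);
        INIT_LIST_HEAD(&referenced);
        for (i = 0; i < 3; i++) {       /* build LRU: 1,2,3 head to tail */
                INIT_LIST_HEAD(&items[i].lru);
                list_move_tail(&items[i].lru, &lru);
        }

        /* Scan from the tail (3, then 2, then 1) and treat everything
         * as referenced.  list_move() preserves the original relative
         * order when the collected entries go back to the head. */
        while (lru.prev != &lru)
                list_move(lru.prev, &referenced);

        for (p = referenced.next; p != &referenced; p = p->next)
                printf("%d ", ((struct item *)((char *)p -
                                offsetof(struct item, lru)))->id);
        printf("\n");                   /* prints: 1 2 3 */
        return 0;
}

Using list_move_tail() in the scan loop instead prints 3 2 1: the
reversal this patch fixes.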
* [patch 02/14] fs: dcache scale hash
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-scale-d_hash.patch --]
[-- Type: text/plain, Size: 3801 bytes --]
Add a new lock, dcache_hash_lock, to protect the dcache hash table from
concurrent modification. d_hash is also protected by d_lock.
---
fs/dcache.c | 35 ++++++++++++++++++++++++-----------
include/linux/dcache.h | 3 +++
2 files changed, 27 insertions(+), 11 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -34,12 +34,23 @@
#include <linux/bootmem.h>
#include "internal.h"
+/*
+ * Usage:
+ * dcache_hash_lock protects dcache hash table
+ *
+ * Ordering:
+ * dcache_lock
+ * dentry->d_lock
+ * dcache_hash_lock
+ */
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
- __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
+EXPORT_SYMBOL(dcache_hash_lock);
EXPORT_SYMBOL(dcache_lock);
static struct kmem_cache *dentry_cache __read_mostly;
@@ -1478,17 +1489,20 @@ int d_validate(struct dentry *dentry, st
goto out;
spin_lock(&dcache_lock);
+ spin_lock(&dcache_hash_lock);
base = d_hash(dparent, dentry->d_name.hash);
hlist_for_each(lhp,base) {
/* hlist_for_each_entry_rcu() not required for d_hash list
* as it is parsed under dcache_lock
*/
if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
+ spin_unlock(&dcache_hash_lock);
__dget_locked(dentry);
spin_unlock(&dcache_lock);
return 1;
}
}
+ spin_unlock(&dcache_hash_lock);
spin_unlock(&dcache_lock);
out:
return 0;
@@ -1562,7 +1576,9 @@ void d_rehash(struct dentry * entry)
{
spin_lock(&dcache_lock);
spin_lock(&entry->d_lock);
+ spin_lock(&dcache_hash_lock);
_d_rehash(entry);
+ spin_unlock(&dcache_hash_lock);
spin_unlock(&entry->d_lock);
spin_unlock(&dcache_lock);
}
@@ -1641,8 +1657,6 @@ static void switch_names(struct dentry *
*/
static void d_move_locked(struct dentry * dentry, struct dentry * target)
{
- struct hlist_head *list;
-
if (!dentry->d_inode)
printk(KERN_WARNING "VFS: moving negative dcache entry\n");
@@ -1659,14 +1673,11 @@ static void d_move_locked(struct dentry
}
/* Move the dentry to the target hash queue, if on different bucket */
- if (d_unhashed(dentry))
- goto already_unhashed;
-
- hlist_del_rcu(&dentry->d_hash);
-
-already_unhashed:
- list = d_hash(target->d_parent, target->d_name.hash);
- __d_rehash(dentry, list);
+ spin_lock(&dcache_hash_lock);
+ if (!d_unhashed(dentry))
+ hlist_del_rcu(&dentry->d_hash);
+ __d_rehash(dentry, d_hash(target->d_parent, target->d_name.hash));
+ spin_unlock(&dcache_hash_lock);
/* Unhash the target: dput() will then get rid of it */
__d_drop(target);
@@ -1862,7 +1873,9 @@ struct dentry *d_materialise_unique(stru
found_lock:
spin_lock(&actual->d_lock);
found:
+ spin_lock(&dcache_hash_lock);
_d_rehash(actual);
+ spin_unlock(&dcache_hash_lock);
spin_unlock(&actual->d_lock);
spin_unlock(&dcache_lock);
out_nolock:
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -184,6 +184,7 @@ d_iput: no no no yes
#define DCACHE_COOKIE 0x0040 /* For use by dcookie subsystem */
+extern spinlock_t dcache_hash_lock;
extern spinlock_t dcache_lock;
extern seqlock_t rename_lock;
@@ -207,7 +208,9 @@ static inline void __d_drop(struct dentr
{
if (!(dentry->d_flags & DCACHE_UNHASHED)) {
dentry->d_flags |= DCACHE_UNHASHED;
+ spin_lock(&dcache_hash_lock);
hlist_del_rcu(&dentry->d_hash);
+ spin_unlock(&dcache_hash_lock);
}
}
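The note that d_hash is also protected by d_lock reflects a common
dual-lock pattern: writers hold both locks (as __d_drop, which runs
under d_lock, now does above), so a reader holding either lock alone
sees a stable field. A minimal sketch of the pattern, with hypothetical
names:

static DEFINE_SPINLOCK(table_lock);     /* stand-in for dcache_hash_lock */

struct object {
        spinlock_t lock;                /* stand-in for dentry->d_lock */
        struct hlist_node hash;
};

static void object_unhash(struct object *obj)
{
        spin_lock(&obj->lock);          /* writers take both locks... */
        spin_lock(&table_lock);
        if (!hlist_unhashed(&obj->hash))
                hlist_del_init(&obj->hash);
        spin_unlock(&table_lock);
        spin_unlock(&obj->lock);
}

static int object_hashed(struct object *obj)
{
        int hashed;

        spin_lock(&obj->lock);          /* ...so either one is enough
                                         * to read obj->hash stably */
        hashed = !hlist_unhashed(&obj->hash);
        spin_unlock(&obj->lock);
        return hashed;
}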
* [patch 03/14] fs: dcache scale lru
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-scale-d_lru.patch --]
[-- Type: text/plain, Size: 7877 bytes --]
Add a new lock, dcache_lru_lock, to protect the dcache LRU lists and
counters from concurrent modification. d_lru is also protected by d_lock.
Move LRU scanning out from underneath dcache_lock.
---
fs/dcache.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 85 insertions(+), 20 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -36,17 +36,26 @@
/*
* Usage:
- * dcache_hash_lock protects dcache hash table
+ * dcache_hash_lock protects:
+ * - the dcache hash table
+ * dcache_lru_lock protects:
+ * - the dcache lru lists and counters
+ * d_lock protects:
+ * - d_flags
+ * - d_name
+ * - d_lru
*
* Ordering:
* dcache_lock
* dentry->d_lock
+ * dcache_lru_lock
* dcache_hash_lock
*/
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
@@ -133,37 +142,56 @@ static void dentry_iput(struct dentry *
}
/*
- * dentry_lru_(add|add_tail|del|del_init) must be called with dcache_lock held.
+ * dentry_lru_(add|add_tail|del|del_init) must be called with d_lock held
+ * to protect list_empty(d_lru) condition.
*/
static void dentry_lru_add(struct dentry *dentry)
{
+ spin_lock(&dcache_lru_lock);
list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
dentry->d_sb->s_nr_dentry_unused++;
dentry_stat.nr_unused++;
+ spin_unlock(&dcache_lru_lock);
}
static void dentry_lru_add_tail(struct dentry *dentry)
{
+ spin_lock(&dcache_lru_lock);
list_add_tail(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
dentry->d_sb->s_nr_dentry_unused++;
dentry_stat.nr_unused++;
+ spin_unlock(&dcache_lru_lock);
+}
+
+static void __dentry_lru_del(struct dentry *dentry)
+{
+ list_del(&dentry->d_lru);
+ dentry->d_sb->s_nr_dentry_unused--;
+ dentry_stat.nr_unused--;
+}
+
+static void __dentry_lru_del_init(struct dentry *dentry)
+{
+ list_del_init(&dentry->d_lru);
+ dentry->d_sb->s_nr_dentry_unused--;
+ dentry_stat.nr_unused--;
}
static void dentry_lru_del(struct dentry *dentry)
{
if (!list_empty(&dentry->d_lru)) {
- list_del(&dentry->d_lru);
- dentry->d_sb->s_nr_dentry_unused--;
- dentry_stat.nr_unused--;
+ spin_lock(&dcache_lru_lock);
+ __dentry_lru_del(dentry);
+ spin_unlock(&dcache_lru_lock);
}
}
static void dentry_lru_del_init(struct dentry *dentry)
{
if (likely(!list_empty(&dentry->d_lru))) {
- list_del_init(&dentry->d_lru);
- dentry->d_sb->s_nr_dentry_unused--;
- dentry_stat.nr_unused--;
+ spin_lock(&dcache_lru_lock);
+ __dentry_lru_del_init(dentry);
+ spin_unlock(&dcache_lru_lock);
}
}
@@ -174,6 +202,8 @@ static void dentry_lru_del_init(struct d
* The dentry must already be unhashed and removed from the LRU.
*
* If this is the root of the dentry tree, return NULL.
+ *
+ * dcache_lock and d_lock must be held by caller, are dropped by d_kill.
*/
static struct dentry *d_kill(struct dentry *dentry)
__releases(dentry->d_lock)
@@ -326,11 +356,19 @@ int d_invalidate(struct dentry * dentry)
}
/* This should be called _only_ with dcache_lock held */
+static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
+{
+ atomic_inc(&dentry->d_count);
+ dentry_lru_del_init(dentry);
+ return dentry;
+}
static inline struct dentry * __dget_locked(struct dentry *dentry)
{
atomic_inc(&dentry->d_count);
+ spin_lock(&dentry->d_lock);
dentry_lru_del_init(dentry);
+ spin_unlock(&dentry->d_lock);
return dentry;
}
@@ -407,7 +445,7 @@ restart:
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
if (!atomic_read(&dentry->d_count)) {
- __dget_locked(dentry);
+ __dget_locked_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
@@ -439,17 +477,18 @@ static void prune_one_dentry(struct dent
* Prune ancestors. Locking is simpler than in dput(),
* because dcache_lock needs to be taken anyway.
*/
- spin_lock(&dcache_lock);
while (dentry) {
- if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock))
+ spin_lock(&dcache_lock);
+ if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock)) {
+ spin_unlock(&dcache_lock);
return;
+ }
if (dentry->d_op && dentry->d_op->d_delete)
dentry->d_op->d_delete(dentry);
dentry_lru_del_init(dentry);
__d_drop(dentry);
dentry = d_kill(dentry);
- spin_lock(&dcache_lock);
}
}
@@ -470,10 +509,11 @@ static void __shrink_dcache_sb(struct su
BUG_ON(!sb);
BUG_ON((flags & DCACHE_REFERENCED) && count == NULL);
- spin_lock(&dcache_lock);
if (count != NULL)
/* called from prune_dcache() and shrink_dcache_parent() */
cnt = *count;
+relock:
+ spin_lock(&dcache_lru_lock);
restart:
if (count == NULL)
list_splice_init(&sb->s_dentry_lru, &tmp);
@@ -483,7 +523,10 @@ restart:
struct dentry, d_lru);
BUG_ON(dentry->d_sb != sb);
- spin_lock(&dentry->d_lock);
+ if (!spin_trylock(&dentry->d_lock)) {
+ spin_unlock(&dcache_lru_lock);
+ goto relock;
+ }
/*
* If we are honouring the DCACHE_REFERENCED flag and
* the dentry has this flag set, don't free it. Clear
@@ -501,13 +544,22 @@ restart:
if (!cnt)
break;
}
- cond_resched_lock(&dcache_lock);
+ cond_resched_lock(&dcache_lru_lock);
}
}
+ spin_unlock(&dcache_lru_lock);
+
+ spin_lock(&dcache_lock);
+again:
+ spin_lock(&dcache_lru_lock); /* lru_lock also protects tmp list */
while (!list_empty(&tmp)) {
dentry = list_entry(tmp.prev, struct dentry, d_lru);
- dentry_lru_del_init(dentry);
- spin_lock(&dentry->d_lock);
+
+ if (!spin_trylock(&dentry->d_lock)) {
+ spin_unlock(&dcache_lru_lock);
+ goto again;
+ }
+ __dentry_lru_del_init(dentry);
/*
* We found an inuse dentry which was not removed from
* the LRU because of laziness during lookup. Do not free
@@ -517,17 +569,22 @@ restart:
spin_unlock(&dentry->d_lock);
continue;
}
+
+ spin_unlock(&dcache_lru_lock);
prune_one_dentry(dentry);
- /* dentry->d_lock was dropped in prune_one_dentry() */
- cond_resched_lock(&dcache_lock);
+ /* dcache_lock and dentry->d_lock dropped */
+ spin_lock(&dcache_lock);
+ spin_lock(&dcache_lru_lock);
}
+ spin_unlock(&dcache_lock);
+
if (count == NULL && !list_empty(&sb->s_dentry_lru))
goto restart;
if (count != NULL)
*count = cnt;
if (!list_empty(&referenced))
list_splice(&referenced, &sb->s_dentry_lru);
- spin_unlock(&dcache_lock);
+ spin_unlock(&dcache_lru_lock);
}
/**
@@ -635,7 +692,9 @@ static void shrink_dcache_for_umount_sub
/* detach this root from the system */
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
dentry_lru_del_init(dentry);
+ spin_unlock(&dentry->d_lock);
__d_drop(dentry);
spin_unlock(&dcache_lock);
@@ -649,7 +708,9 @@ static void shrink_dcache_for_umount_sub
spin_lock(&dcache_lock);
list_for_each_entry(loop, &dentry->d_subdirs,
d_u.d_child) {
+ spin_lock(&loop->d_lock);
dentry_lru_del_init(loop);
+ spin_unlock(&loop->d_lock);
__d_drop(loop);
cond_resched_lock(&dcache_lock);
}
@@ -832,13 +893,17 @@ resume:
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
+ spin_lock(&dentry->d_lock);
dentry_lru_del_init(dentry);
+ spin_unlock(&dentry->d_lock);
/*
* move only zero ref count dentries to the end
* of the unused list for prune_dcache
*/
if (!atomic_read(&dentry->d_count)) {
+ spin_lock(&dentry->d_lock);
dentry_lru_add_tail(dentry);
+ spin_unlock(&dentry->d_lock);
found++;
}
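One pattern in the hunks above deserves a note: the shrinker wants
d_lock while it already holds dcache_lru_lock, which inverts the
documented d_lock -> dcache_lru_lock order, so it uses spin_trylock()
and backs off completely on failure. A minimal sketch of that pattern
(hypothetical names):

/*
 * Documented order is outer before inner, but the scan already holds
 * the inner lock, so it must never block waiting for the outer one.
 */
static void scan_out_of_order(spinlock_t *outer, spinlock_t *inner)
{
relock:
        spin_lock(inner);
        if (!spin_trylock(outer)) {
                /* Contended: drop everything and retry rather than
                 * blocking in the wrong order and risking deadlock. */
                spin_unlock(inner);
                goto relock;
        }
        /* ... both locks held; do the work ... */
        spin_unlock(outer);
        spin_unlock(inner);
}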
* [patch 04/14] fs: dcache scale nr_dentry
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-scale-nr_dentry.patch --]
[-- Type: text/plain, Size: 3093 bytes --]
Make dentry_stat_t.nr_dentry an atomic_t, and move its accounting out
from under dcache_lock.
---
fs/dcache.c | 20 +++++++++-----------
include/linux/dcache.h | 4 ++--
kernel/sysctl.c | 6 ++++++
3 files changed, 17 insertions(+), 13 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -83,6 +83,7 @@ static struct hlist_head *dentry_hashtab
/* Statistics gathering. */
struct dentry_stat_t dentry_stat = {
+ .nr_dentry = ATOMIC_INIT(0),
.age_limit = 45,
};
@@ -101,11 +102,11 @@ static void d_callback(struct rcu_head *
}
/*
- * no dcache_lock, please. The caller must decrement dentry_stat.nr_dentry
- * inside dcache_lock.
+ * no dcache_lock, please.
*/
static void d_free(struct dentry *dentry)
{
+ atomic_dec(&dentry_stat.nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
/* if dentry was never inserted into hash, immediate free is OK */
@@ -212,7 +213,6 @@ static struct dentry *d_kill(struct dent
struct dentry *parent;
list_del(&dentry->d_u.d_child);
- dentry_stat.nr_dentry--; /* For d_free, below */
/*drops the locks, at that point nobody can reach this dentry */
dentry_iput(dentry);
if (IS_ROOT(dentry))
@@ -777,10 +777,7 @@ static void shrink_dcache_for_umount_sub
struct dentry, d_u.d_child);
}
out:
- /* several dentries were freed, need to correct nr_dentry */
- spin_lock(&dcache_lock);
- dentry_stat.nr_dentry -= detached;
- spin_unlock(&dcache_lock);
+ return;
}
/*
@@ -1035,11 +1032,12 @@ struct dentry *d_alloc(struct dentry * p
INIT_LIST_HEAD(&dentry->d_u.d_child);
}
- spin_lock(&dcache_lock);
- if (parent)
+ if (parent) {
+ spin_lock(&dcache_lock);
list_add(&dentry->d_u.d_child, &parent->d_subdirs);
- dentry_stat.nr_dentry++;
- spin_unlock(&dcache_lock);
+ spin_unlock(&dcache_lock);
+ }
+ atomic_inc(&dentry_stat.nr_dentry);
return dentry;
}
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -37,8 +37,8 @@ struct qstr {
};
struct dentry_stat_t {
- int nr_dentry;
- int nr_unused;
+ atomic_t nr_dentry;
+ int nr_unused; /* protected by dcache_lru_lock */
int age_limit; /* age in seconds */
int want_pages; /* pages requested by system */
int dummy[2];
Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -1316,6 +1316,12 @@ static struct ctl_table fs_table[] = {
.extra2 = &sysctl_nr_open_max,
},
{
+ /*
+ * dentry_stat has an atomic_t member, so this is a bit of
+ * a hack, but it works for the moment, and I won't bother
+ * changing it now because we'll probably want to change to
+ * a more scalable counter anyway.
+ */
.ctl_name = FS_DENTRY,
.procname = "dentry-state",
.data = &dentry_stat,
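As the cover letter says, the longer-term plan is a per-cpu counter
rather than a shared atomic_t. One way that could look, sketched with
the existing percpu_counter API (an illustration of the direction, not
part of this patch):

#include <linux/percpu_counter.h>

static struct percpu_counter nr_dentry_counter;

static int __init nr_dentry_counter_setup(void)
{
        /* one-time setup, e.g. from dcache_init() */
        return percpu_counter_init(&nr_dentry_counter, 0);
}

/* cheap, mostly cpu-local updates on the alloc/free fast paths */
static inline void nr_dentry_inc(void)
{
        percpu_counter_inc(&nr_dentry_counter);
}

static inline void nr_dentry_dec(void)
{
        percpu_counter_dec(&nr_dentry_counter);
}

/* slower, exact read, e.g. for /proc/sys/fs/dentry-state */
static inline s64 nr_dentry_read(void)
{
        return percpu_counter_sum_positive(&nr_dentry_counter);
}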
* [patch 05/14] fs: dcache scale dentry refcount
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-scale-d_count.patch --]
[-- Type: text/plain, Size: 8366 bytes --]
Make d_count non-atomic and protect it with d_lock. This allows us to
ensure that a dentry whose refcount has dropped to zero stays at zero
without holding dcache_lock. It is also fairly natural once we start
protecting many other dentry fields with d_lock.
---
fs/configfs/dir.c | 3 --
fs/dcache.c | 68 ++++++++++++++++++++++++++++++-------------------
fs/locks.c | 2 -
fs/namei.c | 2 -
include/linux/dcache.h | 8 +++--
5 files changed, 51 insertions(+), 32 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -258,13 +258,23 @@ void dput(struct dentry *dentry)
return;
repeat:
- if (atomic_read(&dentry->d_count) == 1)
+ if (dentry->d_count == 1)
might_sleep();
- if (!atomic_dec_and_lock(&dentry->d_count, &dcache_lock))
- return;
-
spin_lock(&dentry->d_lock);
- if (atomic_read(&dentry->d_count)) {
+ if (dentry->d_count == 1) {
+ if (!spin_trylock(&dcache_lock)) {
+ /*
+ * Something of a livelock possibility we could avoid
+ * by taking dcache_lock and trying again, but we
+ * want to reduce dcache_lock anyway so this will
+ * get improved.
+ */
+ spin_unlock(&dentry->d_lock);
+ goto repeat;
+ }
+ }
+ dentry->d_count--;
+ if (dentry->d_count) {
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
return;
@@ -341,7 +351,7 @@ int d_invalidate(struct dentry * dentry)
* working directory or similar).
*/
spin_lock(&dentry->d_lock);
- if (atomic_read(&dentry->d_count) > 1) {
+ if (dentry->d_count > 1) {
if (dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode)) {
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
@@ -358,16 +368,15 @@ int d_invalidate(struct dentry * dentry)
/* This should be called _only_ with dcache_lock held */
static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
{
- atomic_inc(&dentry->d_count);
+ dentry->d_count++;
dentry_lru_del_init(dentry);
return dentry;
}
static inline struct dentry * __dget_locked(struct dentry *dentry)
{
- atomic_inc(&dentry->d_count);
spin_lock(&dentry->d_lock);
- dentry_lru_del_init(dentry);
+ __dget_locked_dlock(dentry);
spin_unlock(&dentry->d_lock);
return dentry;
}
@@ -444,7 +453,7 @@ restart:
spin_lock(&dcache_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
- if (!atomic_read(&dentry->d_count)) {
+ if (!dentry->d_count) {
__dget_locked_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
@@ -479,7 +488,10 @@ static void prune_one_dentry(struct dent
*/
while (dentry) {
spin_lock(&dcache_lock);
- if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_lock)) {
+ spin_lock(&dentry->d_lock);
+ dentry->d_count--;
+ if (dentry->d_count) {
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
return;
}
@@ -565,7 +577,7 @@ again:
* the LRU because of laziness during lookup. Do not free
* it - just keep it off the LRU list.
*/
- if (atomic_read(&dentry->d_count)) {
+ if (dentry->d_count) {
spin_unlock(&dentry->d_lock);
continue;
}
@@ -726,7 +738,7 @@ static void shrink_dcache_for_umount_sub
do {
struct inode *inode;
- if (atomic_read(&dentry->d_count) != 0) {
+ if (dentry->d_count != 0) {
printk(KERN_ERR
"BUG: Dentry %p{i=%lx,n=%s}"
" still in use (%d)"
@@ -735,7 +747,7 @@ static void shrink_dcache_for_umount_sub
dentry->d_inode ?
dentry->d_inode->i_ino : 0UL,
dentry->d_name.name,
- atomic_read(&dentry->d_count),
+ dentry->d_count,
dentry->d_sb->s_type->name,
dentry->d_sb->s_id);
BUG();
@@ -745,7 +757,9 @@ static void shrink_dcache_for_umount_sub
parent = NULL;
else {
parent = dentry->d_parent;
- atomic_dec(&parent->d_count);
+ spin_lock(&parent->d_lock);
+ parent->d_count--;
+ spin_unlock(&parent->d_lock);
}
list_del(&dentry->d_u.d_child);
@@ -800,7 +814,9 @@ void shrink_dcache_for_umount(struct sup
dentry = sb->s_root;
sb->s_root = NULL;
- atomic_dec(&dentry->d_count);
+ spin_lock(&dentry->d_lock);
+ dentry->d_count--;
+ spin_unlock(&dentry->d_lock);
shrink_dcache_for_umount_subtree(dentry);
while (!hlist_empty(&sb->s_anon)) {
@@ -892,17 +908,15 @@ resume:
spin_lock(&dentry->d_lock);
dentry_lru_del_init(dentry);
- spin_unlock(&dentry->d_lock);
/*
* move only zero ref count dentries to the end
* of the unused list for prune_dcache
*/
- if (!atomic_read(&dentry->d_count)) {
- spin_lock(&dentry->d_lock);
+ if (!dentry->d_count) {
dentry_lru_add_tail(dentry);
- spin_unlock(&dentry->d_lock);
found++;
}
+ spin_unlock(&dentry->d_lock);
/*
* We can return to the caller if we have found some (this
@@ -1011,7 +1025,7 @@ struct dentry *d_alloc(struct dentry * p
memcpy(dname, name->name, name->len);
dname[name->len] = 0;
- atomic_set(&dentry->d_count, 1);
+ dentry->d_count = 1;
dentry->d_flags = DCACHE_UNHASHED;
spin_lock_init(&dentry->d_lock);
dentry->d_inode = NULL;
@@ -1491,7 +1505,7 @@ struct dentry * __d_lookup(struct dentry
goto next;
}
- atomic_inc(&dentry->d_count);
+ dentry->d_count++;
found = dentry;
spin_unlock(&dentry->d_lock);
break;
@@ -1601,7 +1615,7 @@ void d_delete(struct dentry * dentry)
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
isdir = S_ISDIR(dentry->d_inode->i_mode);
- if (atomic_read(&dentry->d_count) == 1) {
+ if (dentry->d_count == 1) {
dentry_iput(dentry);
fsnotify_nameremove(dentry, isdir);
return;
@@ -2278,11 +2292,15 @@ resume:
this_parent = dentry;
goto repeat;
}
- atomic_dec(&dentry->d_count);
+ spin_lock(&dentry->d_lock);
+ dentry->d_count--;
+ spin_unlock(&dentry->d_lock);
}
if (this_parent != root) {
next = this_parent->d_u.d_child.next;
- atomic_dec(&this_parent->d_count);
+ spin_lock(&this_parent->d_lock);
+ this_parent->d_count--;
+ spin_unlock(&this_parent->d_lock);
this_parent = this_parent->d_parent;
goto resume;
}
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -87,7 +87,7 @@ full_name_hash(const unsigned char *name
#endif
struct dentry {
- atomic_t d_count;
+ unsigned int d_count; /* protected by d_lock */
unsigned int d_flags; /* protected by d_lock */
spinlock_t d_lock; /* per dentry lock */
int d_mounted;
@@ -334,8 +334,10 @@ extern char *dentry_path(struct dentry *
static inline struct dentry *dget(struct dentry *dentry)
{
if (dentry) {
- BUG_ON(!atomic_read(&dentry->d_count));
- atomic_inc(&dentry->d_count);
+ BUG_ON(!dentry->d_count);
+ spin_lock(&dentry->d_lock);
+ dentry->d_count++;
+ spin_unlock(&dentry->d_lock);
}
return dentry;
}
Index: linux-2.6/fs/configfs/dir.c
===================================================================
--- linux-2.6.orig/fs/configfs/dir.c
+++ linux-2.6/fs/configfs/dir.c
@@ -311,8 +311,7 @@ static void remove_dir(struct dentry * d
if (d->d_inode)
simple_rmdir(parent->d_inode,d);
- pr_debug(" o %s removing done (%d)\n",d->d_name.name,
- atomic_read(&d->d_count));
+ pr_debug(" o %s removing done (%d)\n",d->d_name.name, d->d_count);
dput(parent);
}
Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c
+++ linux-2.6/fs/locks.c
@@ -1373,7 +1373,7 @@ int generic_setlease(struct file *filp,
if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
goto out;
if ((arg == F_WRLCK)
- && ((atomic_read(&dentry->d_count) > 1)
+ && (dentry->d_count > 1
|| (atomic_read(&inode->i_count) > 1)))
goto out;
}
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -2107,7 +2107,7 @@ void dentry_unhash(struct dentry *dentry
shrink_dcache_parent(dentry);
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
- if (atomic_read(&dentry->d_count) == 2)
+ if (dentry->d_count == 2)
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
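The payoff of moving d_count under d_lock is that a refcount transition
can be checked and acted on atomically with the rest of the
d_lock-protected state. A minimal sketch of the put-side pattern (a
hypothetical object, not the dput() code itself):

struct object {
        spinlock_t lock;
        unsigned int count;     /* protected by lock, like d_count */
};

/* Returns nonzero if the caller dropped the last reference; the
 * object's state was stable while we decided its fate. */
static int object_put(struct object *obj)
{
        int last;

        spin_lock(&obj->lock);
        obj->count--;
        last = (obj->count == 0);
        if (last) {
                /* Still under the lock: a concurrent lookup that takes
                 * a reference must also take obj->lock, so "zero"
                 * cannot be resurrected behind our back the way it can
                 * after a bare atomic_dec_and_test(). */
                /* ... unhash / unlink here ... */
        }
        spin_unlock(&obj->lock);
        return last;
}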
* [patch 06/14] fs: dcache scale d_unhashed
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-scale-d_unhashed.patch --]
[-- Type: text/plain, Size: 9363 bytes --]
Protect d_unhashed(dentry) condition with d_lock.
---
fs/dcache.c | 46 +++++++++++++++++++++++++++++++++++-----------
fs/libfs.c | 29 ++++++++++++++++++++---------
fs/seq_file.c | 3 +++
fs/sysfs/dir.c | 8 +++++---
include/linux/dcache.h | 3 +--
5 files changed, 64 insertions(+), 25 deletions(-)
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -343,6 +341,7 @@ static inline struct dentry *dget(struct
}
extern struct dentry * dget_locked(struct dentry *);
+extern struct dentry * dget_locked_dlock(struct dentry *);
/**
* d_unhashed - is dentry hashed
Index: linux-2.6/fs/sysfs/dir.c
===================================================================
--- linux-2.6.orig/fs/sysfs/dir.c
+++ linux-2.6/fs/sysfs/dir.c
@@ -521,10 +521,12 @@ static void sysfs_drop_dentry(struct sys
repeat:
spin_lock(&dcache_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
- if (d_unhashed(dentry))
- continue;
- dget_locked(dentry);
spin_lock(&dentry->d_lock);
+ if (d_unhashed(dentry)) {
+ spin_unlock(&dentry->d_lock);
+ continue;
+ }
+ dget_locked_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -12,6 +12,11 @@
#include <asm/uaccess.h>
+static inline int simple_positive(struct dentry *dentry)
+{
+ return dentry->d_inode && !d_unhashed(dentry);
+}
+
int simple_getattr(struct vfsmount *mnt, struct dentry *dentry,
struct kstat *stat)
{
@@ -101,8 +106,10 @@ loff_t dcache_dir_lseek(struct file *fil
while (n && p != &file->f_path.dentry->d_subdirs) {
struct dentry *next;
next = list_entry(p, struct dentry, d_u.d_child);
- if (!d_unhashed(next) && next->d_inode)
+ spin_lock(&next->d_lock);
+ if (simple_positive(next))
n--;
+ spin_unlock(&next->d_lock);
p = p->next;
}
list_add_tail(&cursor->d_u.d_child, p);
@@ -156,9 +163,13 @@ int dcache_readdir(struct file * filp, v
for (p=q->next; p != &dentry->d_subdirs; p=p->next) {
struct dentry *next;
next = list_entry(p, struct dentry, d_u.d_child);
- if (d_unhashed(next) || !next->d_inode)
+ spin_lock_nested(&next->d_lock, DENTRY_D_LOCK_NESTED);
+ if (!simple_positive(next)) {
+ spin_unlock(&next->d_lock);
continue;
+ }
+ spin_unlock(&next->d_lock);
spin_unlock(&dcache_lock);
if (filldir(dirent, next->d_name.name,
next->d_name.len, filp->f_pos,
@@ -262,20 +273,20 @@ int simple_link(struct dentry *old_dentr
return 0;
}
-static inline int simple_positive(struct dentry *dentry)
-{
- return dentry->d_inode && !d_unhashed(dentry);
-}
-
int simple_empty(struct dentry *dentry)
{
struct dentry *child;
int ret = 0;
spin_lock(&dcache_lock);
- list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child)
- if (simple_positive(child))
+ list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
+ spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
+ if (simple_positive(child)) {
+ spin_unlock(&child->d_lock);
goto out;
+ }
+ spin_unlock(&child->d_lock);
+ }
ret = 1;
out:
spin_unlock(&dcache_lock);
Index: linux-2.6/fs/seq_file.c
===================================================================
--- linux-2.6.orig/fs/seq_file.c
+++ linux-2.6/fs/seq_file.c
@@ -6,6 +6,7 @@
*/
#include <linux/fs.h>
+#include <linux/mount.h>
#include <linux/module.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
@@ -460,7 +461,9 @@ int seq_path_root(struct seq_file *m, st
char *p;
spin_lock(&dcache_lock);
+ spin_lock(&vfsmount_lock);
p = __d_path(path, root, s, m->size - m->count);
+ spin_unlock(&vfsmount_lock);
spin_unlock(&dcache_lock);
err = PTR_ERR(p);
if (!IS_ERR(p)) {
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -326,7 +326,9 @@ int d_invalidate(struct dentry * dentry)
* If it's already been dropped, return OK.
*/
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
if (d_unhashed(dentry)) {
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
return 0;
}
@@ -335,6 +337,7 @@ int d_invalidate(struct dentry * dentry)
* to get rid of unused child entries.
*/
if (!list_empty(&dentry->d_subdirs)) {
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
shrink_dcache_parent(dentry);
spin_lock(&dcache_lock);
@@ -386,6 +389,11 @@ struct dentry * dget_locked(struct dentr
return __dget_locked(dentry);
}
+struct dentry * dget_locked_dlock(struct dentry *dentry)
+{
+ return __dget_locked_dlock(dentry);
+}
+
/**
* d_find_alias - grab a hashed alias of inode
* @inode: inode in question
@@ -415,15 +423,18 @@ static struct dentry * __d_find_alias(st
next = tmp->next;
prefetch(next);
alias = list_entry(tmp, struct dentry, d_alias);
+ spin_lock(&alias->d_lock);
if (S_ISDIR(inode->i_mode) || !d_unhashed(alias)) {
if (IS_ROOT(alias) &&
(alias->d_flags & DCACHE_DISCONNECTED))
discon_alias = alias;
else if (!want_discon) {
- __dget_locked(alias);
+ __dget_locked_dlock(alias);
+ spin_unlock(&alias->d_lock);
return alias;
}
}
+ spin_unlock(&alias->d_lock);
}
if (discon_alias)
__dget_locked(discon_alias);
@@ -706,8 +717,8 @@ static void shrink_dcache_for_umount_sub
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
dentry_lru_del_init(dentry);
- spin_unlock(&dentry->d_lock);
__d_drop(dentry);
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
for (;;) {
@@ -722,8 +733,8 @@ static void shrink_dcache_for_umount_sub
d_u.d_child) {
spin_lock(&loop->d_lock);
dentry_lru_del_init(loop);
- spin_unlock(&loop->d_lock);
__d_drop(loop);
+ spin_unlock(&loop->d_lock);
cond_resched_lock(&dcache_lock);
}
spin_unlock(&dcache_lock);
@@ -1997,7 +2006,8 @@ static int prepend_name(char **buffer, i
* Returns a pointer into the buffer or an error code if the
* path was too long.
*
- * "buflen" should be positive. Caller holds the dcache_lock.
+ * "buflen" should be positive. Caller holds the dcache_lock and
+ * path->dentry->d_lock.
*
* If path is not reachable from the supplied root, then the value of
* root is changed (without modifying refcounts).
@@ -2010,7 +2020,6 @@ char *__d_path(const struct path *path,
char *end = buffer + buflen;
char *retval;
- spin_lock(&vfsmount_lock);
prepend(&end, &buflen, "\0", 1);
if (!IS_ROOT(dentry) && d_unhashed(dentry) &&
(prepend(&end, &buflen, " (deleted)", 10) != 0))
@@ -2046,7 +2055,6 @@ char *__d_path(const struct path *path,
}
out:
- spin_unlock(&vfsmount_lock);
return retval;
global_root:
@@ -2099,8 +2107,12 @@ char *d_path(const struct path *path, ch
path_get(&root);
read_unlock(¤t->fs->lock);
spin_lock(&dcache_lock);
+ spin_lock(&vfsmount_lock);
+ spin_lock(&path->dentry->d_lock);
tmp = root;
res = __d_path(path, &tmp, buf, buflen);
+ spin_unlock(&path->dentry->d_lock);
+ spin_unlock(&vfsmount_lock);
spin_unlock(&dcache_lock);
path_put(&root);
return res;
@@ -2136,6 +2148,7 @@ char *dentry_path(struct dentry *dentry,
char *retval;
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
prepend(&end, &buflen, "\0", 1);
if (!IS_ROOT(dentry) && d_unhashed(dentry) &&
(prepend(&end, &buflen, "//deleted", 9) != 0))
@@ -2157,9 +2170,11 @@ char *dentry_path(struct dentry *dentry,
retval = end;
dentry = parent;
}
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
return retval;
Elong:
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
return ERR_PTR(-ENAMETOOLONG);
}
@@ -2201,12 +2216,16 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
error = -ENOENT;
/* Has the current directory has been unlinked? */
spin_lock(&dcache_lock);
+ spin_lock(&vfsmount_lock);
+ spin_lock(&pwd.dentry->d_lock);
if (IS_ROOT(pwd.dentry) || !d_unhashed(pwd.dentry)) {
unsigned long len;
struct path tmp = root;
char * cwd;
cwd = __d_path(&pwd, &tmp, page, PAGE_SIZE);
+ spin_unlock(&pwd.dentry->d_lock);
+ spin_unlock(&vfsmount_lock);
spin_unlock(&dcache_lock);
error = PTR_ERR(cwd);
@@ -2220,8 +2239,10 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
if (copy_to_user(buf, cwd, len))
error = -EFAULT;
}
- } else
+ } else {
+ spin_unlock(&pwd.dentry->d_lock);
spin_unlock(&dcache_lock);
+ }
out:
path_put(&pwd);
@@ -2286,13 +2307,16 @@ resume:
struct list_head *tmp = next;
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
- if (d_unhashed(dentry)||!dentry->d_inode)
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ if (d_unhashed(dentry) || !dentry->d_inode) {
+ spin_unlock(&dentry->d_lock);
continue;
+ }
if (!list_empty(&dentry->d_subdirs)) {
+ spin_unlock(&dentry->d_lock);
this_parent = dentry;
goto repeat;
}
- spin_lock(&dentry->d_lock);
dentry->d_count--;
spin_unlock(&dentry->d_lock);
}
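The underlying problem these hunks fix: d_flags is written under
d_lock, so a lockless d_unhashed() test can race with __d_drop() and
act on a stale answer. A sketch of the race and of the locked check
(hypothetical interleaving, for illustration):

/*
 * CPU0 (check without d_lock)      CPU1 (__d_drop, under d_lock)
 * ----------------------------     ----------------------------------
 * if (d_unhashed(dentry)) ...      dentry->d_flags |= DCACHE_UNHASHED;
 *   -> still reads "hashed"        hlist_del_rcu(&dentry->d_hash);
 * acts on a now-stale answer
 */
static int example_check(struct dentry *dentry)
{
        spin_lock(&dentry->d_lock);
        if (d_unhashed(dentry)) {
                spin_unlock(&dentry->d_lock);
                return 0;       /* already dropped, nothing to do */
        }
        /* ... dentry stays hashed until d_lock is released ... */
        spin_unlock(&dentry->d_lock);
        return 1;
}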
* [patch 07/14] fs: dcache scale subdirs
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-scale-d_subdirs.patch --]
[-- Type: text/plain, Size: 16629 bytes --]
Protect d_subdirs and d_child with d_lock.
XXX: we probably don't need the parent lock in inotify (the child lock
should stabilize the parent).
---
fs/dcache.c | 155 +++++++++++++++++++++++++++++++++++---------
fs/libfs.c | 19 +++--
fs/notify/inotify/inotify.c | 16 +++-
include/linux/dcache.h | 10 ++
4 files changed, 159 insertions(+), 41 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -44,6 +44,8 @@
* - d_flags
* - d_name
* - d_lru
+ * - d_unhashed
+ * - d_subdirs and children's d_child
*
* Ordering:
* dcache_lock
@@ -204,7 +206,8 @@ static void dentry_lru_del_init(struct d
*
* If this is the root of the dentry tree, return NULL.
*
- * dcache_lock and d_lock must be held by caller, are dropped by d_kill.
+ * dcache_lock and d_lock and d_parent->d_lock must be held by caller, and
+ * are dropped by d_kill.
*/
static struct dentry *d_kill(struct dentry *dentry)
__releases(dentry->d_lock)
@@ -213,12 +216,14 @@ static struct dentry *d_kill(struct dent
struct dentry *parent;
list_del(&dentry->d_u.d_child);
- /*drops the locks, at that point nobody can reach this dentry */
- dentry_iput(dentry);
+ if (dentry->d_parent && dentry != dentry->d_parent)
+ spin_unlock(&dentry->d_parent->d_lock);
if (IS_ROOT(dentry))
parent = NULL;
else
parent = dentry->d_parent;
+ /*drops the locks, at that point nobody can reach this dentry */
+ dentry_iput(dentry);
d_free(dentry);
return parent;
}
@@ -254,6 +259,7 @@ static struct dentry *d_kill(struct dent
void dput(struct dentry *dentry)
{
+ struct dentry *parent;
if (!dentry)
return;
@@ -272,6 +278,15 @@ repeat:
spin_unlock(&dentry->d_lock);
goto repeat;
}
+ parent = dentry->d_parent;
+ if (parent) {
+ BUG_ON(parent == dentry);
+ if (!spin_trylock(&parent->d_lock)) {
+ spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_lock);
+ goto repeat;
+ }
+ }
}
dentry->d_count--;
if (dentry->d_count) {
@@ -295,6 +310,8 @@ repeat:
dentry_lru_add(dentry);
}
spin_unlock(&dentry->d_lock);
+ if (parent)
+ spin_unlock(&parent->d_lock);
spin_unlock(&dcache_lock);
return;
@@ -498,10 +515,22 @@ static void prune_one_dentry(struct dent
* because dcache_lock needs to be taken anyway.
*/
while (dentry) {
+ struct dentry *parent = NULL;
+
spin_lock(&dcache_lock);
+again:
spin_lock(&dentry->d_lock);
+ if (dentry->d_parent && dentry != dentry->d_parent) {
+ if (!spin_trylock(&dentry->d_parent->d_lock)) {
+ spin_unlock(&dentry->d_lock);
+ goto again;
+ }
+ parent = dentry->d_parent;
+ }
dentry->d_count--;
if (dentry->d_count) {
+ if (parent)
+ spin_unlock(&parent->d_lock);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
return;
@@ -579,20 +608,28 @@ again:
dentry = list_entry(tmp.prev, struct dentry, d_lru);
if (!spin_trylock(&dentry->d_lock)) {
+again1:
spin_unlock(&dcache_lru_lock);
goto again;
}
- __dentry_lru_del_init(dentry);
/*
* We found an inuse dentry which was not removed from
* the LRU because of laziness during lookup. Do not free
* it - just keep it off the LRU list.
*/
if (dentry->d_count) {
+ __dentry_lru_del_init(dentry);
spin_unlock(&dentry->d_lock);
continue;
}
-
+ if (dentry->d_parent) {
+ BUG_ON(dentry == dentry->d_parent);
+ if (!spin_trylock(&dentry->d_parent->d_lock)) {
+ spin_unlock(&dentry->d_lock);
+ goto again1;
+ }
+ }
+ __dentry_lru_del_init(dentry);
spin_unlock(&dcache_lru_lock);
prune_one_dentry(dentry);
/* dcache_lock and dentry->d_lock dropped */
@@ -729,14 +766,15 @@ static void shrink_dcache_for_umount_sub
/* this is a branch with children - detach all of them
* from the system in one go */
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
list_for_each_entry(loop, &dentry->d_subdirs,
d_u.d_child) {
- spin_lock(&loop->d_lock);
+ spin_lock_nested(&loop->d_lock, DENTRY_D_LOCK_NESTED);
dentry_lru_del_init(loop);
__d_drop(loop);
spin_unlock(&loop->d_lock);
- cond_resched_lock(&dcache_lock);
}
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
/* move to the first child */
@@ -764,16 +802,17 @@ static void shrink_dcache_for_umount_sub
BUG();
}
- if (IS_ROOT(dentry))
+ if (IS_ROOT(dentry)) {
parent = NULL;
- else {
+ list_del(&dentry->d_u.d_child);
+ } else {
parent = dentry->d_parent;
spin_lock(&parent->d_lock);
parent->d_count--;
+ list_del(&dentry->d_u.d_child);
spin_unlock(&parent->d_lock);
}
- list_del(&dentry->d_u.d_child);
detached++;
inode = dentry->d_inode;
@@ -858,6 +897,7 @@ int have_submounts(struct dentry *parent
spin_lock(&dcache_lock);
if (d_mountpoint(parent))
goto positive;
+ spin_lock(&this_parent->d_lock);
repeat:
next = this_parent->d_subdirs.next;
resume:
@@ -865,22 +905,32 @@ resume:
struct list_head *tmp = next;
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
+
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
/* Have we found a mount point ? */
- if (d_mountpoint(dentry))
+ if (d_mountpoint(dentry)) {
+ spin_unlock(&dentry->d_lock);
+ spin_unlock(&this_parent->d_lock);
goto positive;
+ }
if (!list_empty(&dentry->d_subdirs)) {
+ spin_unlock(&this_parent->d_lock);
this_parent = dentry;
goto repeat;
}
+ spin_unlock(&dentry->d_lock);
}
/*
* All done at this level ... ascend and resume the search.
*/
if (this_parent != parent) {
next = this_parent->d_u.d_child.next;
+ spin_unlock(&this_parent->d_lock);
this_parent = this_parent->d_parent;
+ spin_lock(&this_parent->d_lock);
goto resume;
}
+ spin_unlock(&this_parent->d_lock);
spin_unlock(&dcache_lock);
return 0; /* No mount points found in tree */
positive:
@@ -909,6 +959,7 @@ static int select_parent(struct dentry *
int found = 0;
spin_lock(&dcache_lock);
+ spin_lock(&this_parent->d_lock);
repeat:
next = this_parent->d_subdirs.next;
resume:
@@ -916,8 +967,9 @@ resume:
struct list_head *tmp = next;
struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
next = tmp->next;
+ BUG_ON(this_parent == dentry);
- spin_lock(&dentry->d_lock);
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
dentry_lru_del_init(dentry);
/*
* move only zero ref count dentries to the end
@@ -927,33 +979,45 @@ resume:
dentry_lru_add_tail(dentry);
found++;
}
- spin_unlock(&dentry->d_lock);
/*
* We can return to the caller if we have found some (this
* ensures forward progress). We'll be coming back to find
* the rest.
*/
- if (found && need_resched())
+ if (found && need_resched()) {
+ spin_unlock(&dentry->d_lock);
goto out;
+ }
/*
* Descend a level if the d_subdirs list is non-empty.
*/
if (!list_empty(&dentry->d_subdirs)) {
+ spin_unlock(&this_parent->d_lock);
+ spin_release(&dentry->d_lock.dep_map, 1, _RET_IP_);
this_parent = dentry;
+ spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
goto repeat;
}
+
+ spin_unlock(&dentry->d_lock);
}
/*
* All done at this level ... ascend and resume the search.
*/
if (this_parent != parent) {
+ struct dentry *tmp;
next = this_parent->d_u.d_child.next;
- this_parent = this_parent->d_parent;
+ tmp = this_parent->d_parent;
+ spin_unlock(&this_parent->d_lock);
+ BUG_ON(tmp == this_parent);
+ this_parent = tmp;
+ spin_lock(&this_parent->d_lock);
goto resume;
}
out:
+ spin_unlock(&this_parent->d_lock);
spin_unlock(&dcache_lock);
return found;
}
@@ -1049,19 +1113,20 @@ struct dentry *d_alloc(struct dentry * p
INIT_LIST_HEAD(&dentry->d_lru);
INIT_LIST_HEAD(&dentry->d_subdirs);
INIT_LIST_HEAD(&dentry->d_alias);
-
- if (parent) {
- dentry->d_parent = dget(parent);
- dentry->d_sb = parent->d_sb;
- } else {
- INIT_LIST_HEAD(&dentry->d_u.d_child);
- }
+ INIT_LIST_HEAD(&dentry->d_u.d_child);
if (parent) {
spin_lock(&dcache_lock);
+ spin_lock(&parent->d_lock);
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ dentry->d_parent = dget_dlock(parent);
+ dentry->d_sb = parent->d_sb;
list_add(&dentry->d_u.d_child, &parent->d_subdirs);
+ spin_unlock(&dentry->d_lock);
+ spin_unlock(&parent->d_lock);
spin_unlock(&dcache_lock);
}
+
atomic_inc(&dentry_stat.nr_dentry);
return dentry;
@@ -1749,15 +1814,27 @@ static void d_move_locked(struct dentry
printk(KERN_WARNING "VFS: moving negative dcache entry\n");
write_seqlock(&rename_lock);
- /*
- * XXXX: do we really need to take target->d_lock?
- */
+
+ if (target->d_parent != dentry->d_parent) {
+ if (target->d_parent < dentry->d_parent) {
+ spin_lock(&target->d_parent->d_lock);
+ spin_lock_nested(&dentry->d_parent->d_lock,
+ DENTRY_D_LOCK_NESTED);
+ } else {
+ spin_lock(&dentry->d_parent->d_lock);
+ spin_lock_nested(&target->d_parent->d_lock,
+ DENTRY_D_LOCK_NESTED);
+ }
+ } else {
+ spin_lock(&target->d_parent->d_lock);
+ }
+
if (target < dentry) {
- spin_lock(&target->d_lock);
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ spin_lock_nested(&target->d_lock, 2);
+ spin_lock_nested(&dentry->d_lock, 3);
} else {
- spin_lock(&dentry->d_lock);
- spin_lock_nested(&target->d_lock, DENTRY_D_LOCK_NESTED);
+ spin_lock_nested(&dentry->d_lock, 2);
+ spin_lock_nested(&target->d_lock, 3);
}
/* Move the dentry to the target hash queue, if on different bucket */
@@ -1790,7 +1867,10 @@ static void d_move_locked(struct dentry
}
list_add(&dentry->d_u.d_child, &dentry->d_parent->d_subdirs);
+ if (target->d_parent != dentry->d_parent)
+ spin_unlock(&dentry->d_parent->d_lock);
spin_unlock(&target->d_lock);
+ spin_unlock(&target->d_parent->d_lock);
fsnotify_d_move(dentry);
spin_unlock(&dentry->d_lock);
write_sequnlock(&rename_lock);
@@ -1889,6 +1969,12 @@ static void __d_materialise_dentry(struc
dparent = dentry->d_parent;
aparent = anon->d_parent;
+ /* XXX: hack */
+ spin_lock(&aparent->d_lock);
+ spin_lock(&dparent->d_lock);
+ spin_lock(&dentry->d_lock);
+ spin_lock(&anon->d_lock);
+
dentry->d_parent = (aparent == anon) ? dentry : aparent;
list_del(&dentry->d_u.d_child);
if (!IS_ROOT(dentry))
@@ -1903,6 +1989,11 @@ static void __d_materialise_dentry(struc
else
INIT_LIST_HEAD(&anon->d_u.d_child);
+ spin_unlock(&anon->d_lock);
+ spin_unlock(&dentry->d_lock);
+ spin_unlock(&dparent->d_lock);
+ spin_unlock(&aparent->d_lock);
+
anon->d_flags &= ~DCACHE_DISCONNECTED;
}
@@ -2302,6 +2393,7 @@ void d_genocide(struct dentry *root)
struct list_head *next;
spin_lock(&dcache_lock);
+ spin_lock(&this_parent->d_lock);
repeat:
next = this_parent->d_subdirs.next;
resume:
@@ -2315,7 +2407,7 @@ resume:
continue;
}
if (!list_empty(&dentry->d_subdirs)) {
- spin_unlock(&dentry->d_lock);
+ spin_unlock(&this_parent->d_lock);
this_parent = dentry;
goto repeat;
}
@@ -2324,12 +2416,13 @@ resume:
}
if (this_parent != root) {
next = this_parent->d_u.d_child.next;
- spin_lock(&this_parent->d_lock);
this_parent->d_count--;
spin_unlock(&this_parent->d_lock);
this_parent = this_parent->d_parent;
+ spin_lock(&this_parent->d_lock);
goto resume;
}
+ spin_unlock(&this_parent->d_lock);
spin_unlock(&dcache_lock);
}
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -82,7 +82,8 @@ int dcache_dir_close(struct inode *inode
loff_t dcache_dir_lseek(struct file *file, loff_t offset, int origin)
{
- mutex_lock(&file->f_path.dentry->d_inode->i_mutex);
+ struct dentry *dentry = file->f_path.dentry;
+ mutex_lock(&dentry->d_inode->i_mutex);
switch (origin) {
case 1:
offset += file->f_pos;
@@ -90,7 +91,7 @@ loff_t dcache_dir_lseek(struct file *fil
if (offset >= 0)
break;
default:
- mutex_unlock(&file->f_path.dentry->d_inode->i_mutex);
+ mutex_unlock(&dentry->d_inode->i_mutex);
return -EINVAL;
}
if (offset != file->f_pos) {
@@ -101,9 +102,10 @@ loff_t dcache_dir_lseek(struct file *fil
loff_t n = file->f_pos - 2;
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
list_del(&cursor->d_u.d_child);
- p = file->f_path.dentry->d_subdirs.next;
- while (n && p != &file->f_path.dentry->d_subdirs) {
+ p = dentry->d_subdirs.next;
+ while (n && p != &dentry->d_subdirs) {
struct dentry *next;
next = list_entry(p, struct dentry, d_u.d_child);
spin_lock(&next->d_lock);
@@ -113,10 +115,11 @@ loff_t dcache_dir_lseek(struct file *fil
p = p->next;
}
list_add_tail(&cursor->d_u.d_child, p);
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
}
}
- mutex_unlock(&file->f_path.dentry->d_inode->i_mutex);
+ mutex_unlock(&dentry->d_inode->i_mutex);
return offset;
}
@@ -157,6 +160,7 @@ int dcache_readdir(struct file * filp, v
/* fallthrough */
default:
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
if (filp->f_pos == 2)
list_move(q, &dentry->d_subdirs);
@@ -170,6 +174,7 @@ int dcache_readdir(struct file * filp, v
}
spin_unlock(&next->d_lock);
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
if (filldir(dirent, next->d_name.name,
next->d_name.len, filp->f_pos,
@@ -177,11 +182,13 @@ int dcache_readdir(struct file * filp, v
dt_type(next->d_inode)) < 0)
return 0;
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
/* next is still alive */
list_move(q, p);
p = q;
filp->f_pos++;
}
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
}
return 0;
@@ -279,6 +286,7 @@ int simple_empty(struct dentry *dentry)
int ret = 0;
spin_lock(&dcache_lock);
+ spin_lock(&dentry->d_lock);
list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
if (simple_positive(child)) {
@@ -289,6 +297,7 @@ int simple_empty(struct dentry *dentry)
}
ret = 1;
out:
+ spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
return ret;
}
Index: linux-2.6/fs/notify/inotify/inotify.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify.c
+++ linux-2.6/fs/notify/inotify/inotify.c
@@ -188,17 +188,19 @@ static void set_dentry_child_flags(struc
list_for_each_entry(alias, &inode->i_dentry, d_alias) {
struct dentry *child;
+ spin_lock(&alias->d_lock);
list_for_each_entry(child, &alias->d_subdirs, d_u.d_child) {
if (!child->d_inode)
continue;
- spin_lock(&child->d_lock);
+ spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
if (watched)
child->d_flags |= DCACHE_INOTIFY_PARENT_WATCHED;
else
child->d_flags &=~DCACHE_INOTIFY_PARENT_WATCHED;
spin_unlock(&child->d_lock);
}
+ spin_unlock(&alias->d_lock);
}
spin_unlock(&dcache_lock);
}
@@ -338,18 +340,26 @@ void inotify_dentry_parent_queue_event(s
if (!(dentry->d_flags & DCACHE_INOTIFY_PARENT_WATCHED))
return;
+retry:
spin_lock(&dentry->d_lock);
parent = dentry->d_parent;
+ if (!spin_trylock(&parent->d_lock)) {
+ spin_unlock(&dentry->d_lock);
+ goto retry;
+ }
inode = parent->d_inode;
if (inotify_inode_watched(inode)) {
- dget(parent);
+ dget_dlock(parent);
spin_unlock(&dentry->d_lock);
+ spin_unlock(&parent->d_lock);
inotify_inode_queue_event(inode, mask, cookie, name,
dentry->d_inode);
dput(parent);
- } else
+ } else {
spin_unlock(&dentry->d_lock);
+ spin_unlock(&parent->d_lock);
+ }
}
EXPORT_SYMBOL_GPL(inotify_dentry_parent_queue_event);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -331,12 +331,18 @@ extern char *dentry_path(struct dentry *
* and call dget_locked() instead of dget().
*/
+static inline struct dentry *dget_dlock(struct dentry *dentry)
+{
+ BUG_ON(!dentry->d_count);
+ dentry->d_count++;
+ return dentry;
+}
+
static inline struct dentry *dget(struct dentry *dentry)
{
if (dentry) {
- BUG_ON(!dentry->d_count);
spin_lock(&dentry->d_lock);
- dentry->d_count++;
+ dget_dlock(dentry);
spin_unlock(&dentry->d_lock);
}
return dentry;
* [patch 08/14] fs: scale inode alias list
From: npiggin @ 2009-03-29 15:55 UTC
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-scale-i_dentry.patch --]
[-- Type: text/plain, Size: 10749 bytes --]
Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
from concurrent modification. d_alias is also protected by d_lock.
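dcache_inode_lock slots in between dcache_lock and d_lock in the lock
order (see the updated ordering comment in the diff below). A sketch of
an instantiate-style operation taking the three in order; this is a
hypothetical illustration of the ordering, not a copy of the patched
__d_instantiate():

static void example_instantiate(struct dentry *dentry, struct inode *inode)
{
        spin_lock(&dcache_lock);
        spin_lock(&dcache_inode_lock);
        spin_lock(&dentry->d_lock);
        list_add(&dentry->d_alias, &inode->i_dentry);
        dentry->d_inode = inode;
        spin_unlock(&dentry->d_lock);
        spin_unlock(&dcache_inode_lock);
        spin_unlock(&dcache_lock);
}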
---
fs/dcache.c | 56 +++++++++++++++++++++++++++++++++++++++-----
fs/notify/inotify/inotify.c | 2 +
fs/sysfs/dir.c | 3 ++
include/linux/dcache.h | 1
4 files changed, 56 insertions(+), 6 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -36,6 +36,8 @@
/*
* Usage:
+ * dcache_inode_lock protects:
+ * - the inode alias lists, d_inode
* dcache_hash_lock protects:
* - the dcache hash table
* dcache_lru_lock protects:
@@ -49,18 +51,21 @@
*
* Ordering:
* dcache_lock
- * dentry->d_lock
- * dcache_lru_lock
- * dcache_hash_lock
+ * dcache_inode_lock
+ * dentry->d_lock
+ * dcache_lru_lock
+ * dcache_hash_lock
*/
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_inode_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
+EXPORT_SYMBOL(dcache_inode_lock);
EXPORT_SYMBOL(dcache_hash_lock);
EXPORT_SYMBOL(dcache_lock);
@@ -124,6 +129,7 @@ static void d_free(struct dentry *dentry
*/
static void dentry_iput(struct dentry * dentry)
__releases(dentry->d_lock)
+ __releases(dcache_inode_lock)
__releases(dcache_lock)
{
struct inode *inode = dentry->d_inode;
@@ -131,6 +137,7 @@ static void dentry_iput(struct dentry *
dentry->d_inode = NULL;
list_del_init(&dentry->d_alias);
spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
if (!inode->i_nlink)
fsnotify_inoderemove(inode);
@@ -140,6 +147,7 @@ static void dentry_iput(struct dentry *
iput(inode);
} else {
spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
}
}
@@ -211,6 +219,7 @@ static void dentry_lru_del_init(struct d
*/
static struct dentry *d_kill(struct dentry *dentry)
__releases(dentry->d_lock)
+ __releases(dcache_inode_lock)
__releases(dcache_lock)
{
struct dentry *parent;
@@ -275,16 +284,21 @@ repeat:
* want to reduce dcache_lock anyway so this will
* get improved.
*/
+drop1:
spin_unlock(&dentry->d_lock);
goto repeat;
}
+ if (!spin_trylock(&dcache_inode_lock)) {
+drop2:
+ spin_unlock(&dcache_lock);
+ goto drop1;
+ }
parent = dentry->d_parent;
if (parent) {
BUG_ON(parent == dentry);
if (!spin_trylock(&parent->d_lock)) {
- spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
- goto repeat;
+ spin_unlock(&dcache_inode_lock);
+ goto drop2;
}
}
}
@@ -311,6 +325,7 @@ repeat:
spin_unlock(&dentry->d_lock);
if (parent)
spin_unlock(&parent->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
return;
@@ -463,7 +478,9 @@ struct dentry * d_find_alias(struct inod
if (!list_empty(&inode->i_dentry)) {
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
de = __d_find_alias(inode, 0);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
}
return de;
@@ -478,18 +495,21 @@ void d_prune_aliases(struct inode *inode
struct dentry *dentry;
restart:
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
if (!dentry->d_count) {
__dget_locked_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
dput(dentry);
goto restart;
}
spin_unlock(&dentry->d_lock);
}
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
}
@@ -517,6 +537,7 @@ static void prune_one_dentry(struct dent
struct dentry *parent = NULL;
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
again:
spin_lock(&dentry->d_lock);
if (dentry->d_parent && dentry != dentry->d_parent) {
@@ -531,6 +552,7 @@ again:
if (parent)
spin_unlock(&parent->d_lock);
spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
return;
}
@@ -601,6 +623,7 @@ restart:
spin_unlock(&dcache_lru_lock);
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
again:
spin_lock(&dcache_lru_lock); /* lru_lock also protects tmp list */
while (!list_empty(&tmp)) {
@@ -633,8 +656,10 @@ again1:
prune_one_dentry(dentry);
/* dcache_lock and dentry->d_lock dropped */
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
spin_lock(&dcache_lru_lock);
}
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
if (count == NULL && !list_empty(&sb->s_dentry_lru))
@@ -1169,7 +1194,9 @@ void d_instantiate(struct dentry *entry,
{
BUG_ON(!list_empty(&entry->d_alias));
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
__d_instantiate(entry, inode);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
}
@@ -1229,7 +1256,9 @@ struct dentry *d_instantiate_unique(stru
BUG_ON(!list_empty(&entry->d_alias));
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
result = __d_instantiate_unique(entry, inode);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
if (!result) {
@@ -1319,8 +1348,10 @@ struct dentry *d_obtain_alias(struct ino
tmp->d_parent = tmp; /* make sure dput doesn't croak */
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
res = __d_find_alias(inode, 0);
if (res) {
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
dput(tmp);
goto out_iput;
@@ -1335,6 +1366,7 @@ struct dentry *d_obtain_alias(struct ino
list_add(&tmp->d_alias, &inode->i_dentry);
hlist_add_head(&tmp->d_hash, &inode->i_sb->s_anon);
spin_unlock(&tmp->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
return tmp;
@@ -1367,9 +1399,11 @@ struct dentry *d_splice_alias(struct ino
if (inode && S_ISDIR(inode->i_mode)) {
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
new = __d_find_alias(inode, 1);
if (new) {
BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
@@ -1378,6 +1412,7 @@ struct dentry *d_splice_alias(struct ino
} else {
/* already taking dcache_lock, so d_add() by hand */
__d_instantiate(dentry, inode);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
security_d_instantiate(dentry, inode);
d_rehash(dentry);
@@ -1454,6 +1489,7 @@ struct dentry *d_add_ci(struct dentry *d
return found;
}
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
if (list_empty(&inode->i_dentry)) {
/*
* Directory without a 'disconnected' dentry; we need to do
@@ -1461,6 +1497,7 @@ struct dentry *d_add_ci(struct dentry *d
* we already hold.
*/
__d_instantiate(found, inode);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
security_d_instantiate(found, inode);
return found;
@@ -1471,6 +1508,7 @@ struct dentry *d_add_ci(struct dentry *d
*/
new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
dget_locked(new);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
/* Do security vodoo. */
security_d_instantiate(found, inode);
@@ -1688,6 +1726,7 @@ void d_delete(struct dentry * dentry)
* Are we the only user?
*/
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
spin_lock(&dentry->d_lock);
isdir = S_ISDIR(dentry->d_inode->i_mode);
if (dentry->d_count == 1) {
@@ -1700,6 +1739,7 @@ void d_delete(struct dentry * dentry)
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
fsnotify_nameremove(dentry, isdir);
@@ -1944,6 +1984,7 @@ out_unalias:
d_move_locked(alias, dentry);
ret = alias;
out_err:
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
if (m2)
mutex_unlock(m2);
@@ -2009,6 +2050,7 @@ struct dentry *d_materialise_unique(stru
BUG_ON(!d_unhashed(dentry));
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
if (!inode) {
actual = dentry;
@@ -2053,6 +2095,7 @@ found:
_d_rehash(actual);
spin_unlock(&dcache_hash_lock);
spin_unlock(&actual->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
out_nolock:
if (actual == dentry) {
@@ -2064,6 +2107,7 @@ out_nolock:
return actual;
shouldnt_be_hashed:
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
BUG();
}
Index: linux-2.6/fs/sysfs/dir.c
===================================================================
--- linux-2.6.orig/fs/sysfs/dir.c
+++ linux-2.6/fs/sysfs/dir.c
@@ -520,6 +520,7 @@ static void sysfs_drop_dentry(struct sys
*/
repeat:
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
if (d_unhashed(dentry)) {
@@ -529,10 +530,12 @@ repeat:
dget_locked_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
dput(dentry);
goto repeat;
}
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
/* adjust nlink and update timestamp */
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -184,6 +184,7 @@ d_iput: no no no yes
#define DCACHE_COOKIE 0x0040 /* For use by dcookie subsystem */
+extern spinlock_t dcache_inode_lock;
extern spinlock_t dcache_hash_lock;
extern spinlock_t dcache_lock;
extern seqlock_t rename_lock;
Index: linux-2.6/fs/notify/inotify/inotify.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify.c
+++ linux-2.6/fs/notify/inotify/inotify.c
@@ -185,6 +185,7 @@ static void set_dentry_child_flags(struc
struct dentry *alias;
spin_lock(&dcache_lock);
+ spin_lock(&dcache_inode_lock);
list_for_each_entry(alias, &inode->i_dentry, d_alias) {
struct dentry *child;
@@ -202,6 +203,7 @@ static void set_dentry_child_flags(struc
}
spin_unlock(&alias->d_lock);
}
+ spin_unlock(&dcache_inode_lock);
spin_unlock(&dcache_lock);
}
* [patch 09/14] fs: use RCU / seqlock logic for reverse and multi-step operations
2009-03-29 15:55 [rfc] scale dcache locking npiggin
` (7 preceding siblings ...)
2009-03-29 15:55 ` [patch 08/14] fs: scale inode alias list npiggin
@ 2009-03-29 15:55 ` npiggin
2009-03-30 12:16 ` Andi Kleen
2009-03-29 15:55 ` [patch 10/14] fs: dcache remove dcache_lock npiggin
` (5 subsequent siblings)
14 siblings, 1 reply; 28+ messages in thread
From: npiggin @ 2009-03-29 15:55 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache_lock-multi-step.patch --]
[-- Type: text/plain, Size: 7589 bytes --]
The remaining usage for dcache_lock is to allow atomic, multi-step read-side
operations over the directory tree, by excluding modifications to the tree.
It is also used to walk the tree in the leaf->root direction, where we don't
have a natural d_lock ordering. This is the hardest bit.
These operations could be accomplished by taking every d_lock along the way,
but that would mean a huge number of locks and actually gets very tricky.
Solve this instead by using the rename seqlock for multi-step read-side
operations. Insert operations are not serialised against it. Delete operations
are tricky: when walking up the directory tree, our parent might have been
deleted while we had the locks dropped, so we also need to check for that and
retry. (XXX: I'm not sure if I've quite got this correct...)
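All the multi-step walkers below share one read-side pattern; stripped to its bones it looks like this (a sketch, with the d_lock and RCU juggling omitted):

rename_retry:
        seq = read_seqbegin(&rename_lock);
        /* multi-step traversal of the tree, taking only d_lock as needed */
        ...
        if (read_seqretry(&rename_lock, seq))
                goto rename_retry;      /* a rename raced with us, start over */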
---
fs/dcache.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
fs/seq_file.c | 6 +++
2 files changed, 93 insertions(+), 11 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -912,11 +912,15 @@ void shrink_dcache_for_umount(struct sup
* Return true if the parent or its subdirectories contain
* a mount point
*/
-
int have_submounts(struct dentry *parent)
{
- struct dentry *this_parent = parent;
+ struct dentry *this_parent;
struct list_head *next;
+ unsigned seq;
+
+rename_retry:
+ this_parent = parent;
+ seq = read_seqbegin(&rename_lock);
spin_lock(&dcache_lock);
if (d_mountpoint(parent))
@@ -948,17 +952,34 @@ resume:
* All done at this level ... ascend and resume the search.
*/
if (this_parent != parent) {
+ struct dentry *tmp;
+
next = this_parent->d_u.d_child.next;
+ tmp = this_parent->d_parent;
+ rcu_read_lock();
spin_unlock(&this_parent->d_lock);
- this_parent = this_parent->d_parent;
+ this_parent = tmp;
spin_lock(&this_parent->d_lock);
+ /* might go back up the wrong parent if we have had a rename
+ * or deletion */
+ if (d_unhashed(this_parent) || read_seqretry(&rename_lock, seq)) {
+ spin_unlock(&this_parent->d_lock);
+ spin_unlock(&dcache_lock);
+ rcu_read_unlock();
+ goto rename_retry;
+ }
+ rcu_read_unlock();
goto resume;
}
spin_unlock(&this_parent->d_lock);
spin_unlock(&dcache_lock);
+ if (read_seqretry(&rename_lock, seq))
+ goto rename_retry;
return 0; /* No mount points found in tree */
positive:
spin_unlock(&dcache_lock);
+ if (read_seqretry(&rename_lock, seq))
+ goto rename_retry;
return 1;
}
@@ -978,10 +999,15 @@ positive:
*/
static int select_parent(struct dentry * parent)
{
- struct dentry *this_parent = parent;
+ struct dentry *this_parent;
struct list_head *next;
+ unsigned seq;
int found = 0;
+rename_retry:
+ this_parent = parent;
+ seq = read_seqbegin(&rename_lock);
+
spin_lock(&dcache_lock);
spin_lock(&this_parent->d_lock);
repeat:
@@ -1032,17 +1058,31 @@ resume:
*/
if (this_parent != parent) {
struct dentry *tmp;
+
next = this_parent->d_u.d_child.next;
tmp = this_parent->d_parent;
+ rcu_read_lock();
spin_unlock(&this_parent->d_lock);
- BUG_ON(tmp == this_parent);
this_parent = tmp;
spin_lock(&this_parent->d_lock);
+ /* might go back up the wrong parent if we have had a rename
+ * or deletion */
+ if (this_parent != parent &&
+ (/* d_unhashed(this_parent) XXX: hmm... */ 0 ||
+ read_seqretry(&rename_lock, seq))) {
+ spin_unlock(&this_parent->d_lock);
+ spin_unlock(&dcache_lock);
+ rcu_read_unlock();
+ goto rename_retry;
+ }
+ rcu_read_unlock();
goto resume;
}
out:
spin_unlock(&this_parent->d_lock);
spin_unlock(&dcache_lock);
+ if (read_seqretry(&rename_lock, seq))
+ goto rename_retry;
return found;
}
@@ -2154,6 +2194,7 @@ char *__d_path(const struct path *path,
char *end = buffer + buflen;
char *retval;
+ rcu_read_lock();
prepend(&end, &buflen, "\0", 1);
if (!IS_ROOT(dentry) && d_unhashed(dentry) &&
(prepend(&end, &buflen, " (deleted)", 10) != 0))
@@ -2189,6 +2230,7 @@ char *__d_path(const struct path *path,
}
out:
+ rcu_read_unlock();
return retval;
global_root:
@@ -2225,6 +2267,7 @@ char *d_path(const struct path *path, ch
char *res;
struct path root;
struct path tmp;
+ unsigned seq;
/*
* We have various synthetic filesystems that never get mounted. On
@@ -2240,6 +2283,9 @@ char *d_path(const struct path *path, ch
root = current->fs->root;
path_get(&root);
read_unlock(¤t->fs->lock);
+
+rename_retry:
+ seq = read_seqbegin(&rename_lock);
spin_lock(&dcache_lock);
spin_lock(&vfsmount_lock);
spin_lock(&path->dentry->d_lock);
@@ -2248,6 +2294,9 @@ char *d_path(const struct path *path, ch
spin_unlock(&path->dentry->d_lock);
spin_unlock(&vfsmount_lock);
spin_unlock(&dcache_lock);
+ if (read_seqretry(&rename_lock, seq))
+ goto rename_retry;
+
path_put(&root);
return res;
}
@@ -2278,9 +2327,14 @@ char *dynamic_dname(struct dentry *dentr
*/
char *dentry_path(struct dentry *dentry, char *buf, int buflen)
{
- char *end = buf + buflen;
+ char *end;
char *retval;
+ unsigned seq;
+rename_retry:
+ end = buf + buflen;
+ seq = read_seqbegin(&rename_lock);
+ rcu_read_lock(); /* protect parent */
spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
prepend(&end, &buflen, "\0", 1);
@@ -2304,13 +2358,16 @@ char *dentry_path(struct dentry *dentry,
retval = end;
dentry = parent;
}
+out:
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_lock);
+ rcu_read_unlock();
+ if (read_seqretry(&rename_lock, seq))
+ goto rename_retry;
return retval;
Elong:
- spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
- return ERR_PTR(-ENAMETOOLONG);
+ retval = ERR_PTR(-ENAMETOOLONG);
+ goto out;
}
/*
@@ -2430,9 +2487,13 @@ int is_subdir(struct dentry *new_dentry,
void d_genocide(struct dentry *root)
{
- struct dentry *this_parent = root;
+ struct dentry *this_parent;
struct list_head *next;
+ unsigned seq;
+rename_retry:
+ this_parent = root;
+ seq = read_seqbegin(&rename_lock);
spin_lock(&dcache_lock);
spin_lock(&this_parent->d_lock);
repeat:
@@ -2456,15 +2517,30 @@ resume:
spin_unlock(&dentry->d_lock);
}
if (this_parent != root) {
+ struct dentry *tmp;
+
next = this_parent->d_u.d_child.next;
+ tmp = this_parent->d_parent;
this_parent->d_count--;
+ rcu_read_lock();
spin_unlock(&this_parent->d_lock);
- this_parent = this_parent->d_parent;
+ this_parent = tmp;
spin_lock(&this_parent->d_lock);
+ /* might go back up the wrong parent if we have had a rename
+ * or deletion */
+ if (d_unhashed(this_parent) || read_seqretry(&rename_lock, seq)) {
+ spin_unlock(&this_parent->d_lock);
+ spin_unlock(&dcache_lock);
+ rcu_read_unlock();
+ goto rename_retry;
+ }
+ rcu_read_unlock();
goto resume;
}
spin_unlock(&this_parent->d_lock);
spin_unlock(&dcache_lock);
+ if (read_seqretry(&rename_lock, seq))
+ goto rename_retry;
}
/**
Index: linux-2.6/fs/seq_file.c
===================================================================
--- linux-2.6.orig/fs/seq_file.c
+++ linux-2.6/fs/seq_file.c
@@ -459,12 +459,18 @@ int seq_path_root(struct seq_file *m, st
if (m->count < m->size) {
char *s = m->buf + m->count;
char *p;
+ unsigned seq;
+rename_retry:
+ seq = read_seqbegin(&rename_lock);
spin_lock(&dcache_lock);
spin_lock(&vfsmount_lock);
p = __d_path(path, root, s, m->size - m->count);
spin_unlock(&vfsmount_lock);
spin_unlock(&dcache_lock);
+ if (read_seqretry(&rename_lock, seq))
+ goto rename_retry;
+
err = PTR_ERR(p);
if (!IS_ERR(p)) {
s = mangle_path(s, p, esc);
* [patch 10/14] fs: dcache remove dcache_lock
2009-03-29 15:55 [rfc] scale dcache locking npiggin
` (8 preceding siblings ...)
2009-03-29 15:55 ` [patch 09/14] fs: use RCU / seqlock logic for reverse and multi-step operations npiggin
@ 2009-03-29 15:55 ` npiggin
2009-03-29 15:55 ` [patch 11/14] fs: dcache reduce dput locking npiggin
` (4 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: npiggin @ 2009-03-29 15:55 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache_lock-remove.patch --]
[-- Type: text/plain, Size: 27055 bytes --]
dcache_lock no longer protects anything (I hope). Remove it.
This breaks a lot of the tree where I haven't thought about the problem,
but it simplifies the dcache.c code quite a bit (and it's also probably
a good thing to break unconverted code). So I include this here before
making further changes to the locking.
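For most callers the conversion is mechanical: a region that nested d_lock inside dcache_lock now takes d_lock alone. For example, d_drop() (from the dcache.h hunk below) goes from:

        spin_lock(&dcache_lock);
        spin_lock(&dentry->d_lock);
        __d_drop(dentry);
        spin_unlock(&dentry->d_lock);
        spin_unlock(&dcache_lock);

to just:

        spin_lock(&dentry->d_lock);
        __d_drop(dentry);
        spin_unlock(&dentry->d_lock);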
---
fs/dcache.c | 134 ++++++--------------------------------------
fs/libfs.c | 8 --
fs/namei.c | 5 -
fs/notify/inotify/inotify.c | 2
fs/seq_file.c | 2
fs/sysfs/dir.c | 3
include/linux/dcache.h | 17 ++---
7 files changed, 26 insertions(+), 145 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -50,11 +50,10 @@
* - d_subdirs and children's d_child
*
* Ordering:
- * dcache_lock
- * dcache_inode_lock
- * dentry->d_lock
- * dcache_lru_lock
- * dcache_hash_lock
+ * dcache_inode_lock
+ * dentry->d_lock
+ * dcache_lru_lock
+ * dcache_hash_lock
*/
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
@@ -62,12 +61,10 @@ EXPORT_SYMBOL_GPL(sysctl_vfs_cache_press
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_inode_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
-__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
EXPORT_SYMBOL(dcache_inode_lock);
EXPORT_SYMBOL(dcache_hash_lock);
-EXPORT_SYMBOL(dcache_lock);
static struct kmem_cache *dentry_cache __read_mostly;
@@ -109,7 +106,7 @@ static void d_callback(struct rcu_head *
}
/*
- * no dcache_lock, please.
+ * no locks, please.
*/
static void d_free(struct dentry *dentry)
{
@@ -130,7 +127,6 @@ static void d_free(struct dentry *dentry
static void dentry_iput(struct dentry * dentry)
__releases(dentry->d_lock)
__releases(dcache_inode_lock)
- __releases(dcache_lock)
{
struct inode *inode = dentry->d_inode;
if (inode) {
@@ -138,7 +134,6 @@ static void dentry_iput(struct dentry *
list_del_init(&dentry->d_alias);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
if (!inode->i_nlink)
fsnotify_inoderemove(inode);
if (dentry->d_op && dentry->d_op->d_iput)
@@ -148,7 +143,6 @@ static void dentry_iput(struct dentry *
} else {
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
}
}
@@ -214,13 +208,12 @@ static void dentry_lru_del_init(struct d
*
* If this is the root of the dentry tree, return NULL.
*
- * dcache_lock and d_lock and d_parent->d_lock must be held by caller, and
+ * d_lock and d_parent->d_lock must be held by caller, and
* are dropped by d_kill.
*/
static struct dentry *d_kill(struct dentry *dentry)
__releases(dentry->d_lock)
__releases(dcache_inode_lock)
- __releases(dcache_lock)
{
struct dentry *parent;
@@ -277,21 +270,10 @@ repeat:
might_sleep();
spin_lock(&dentry->d_lock);
if (dentry->d_count == 1) {
- if (!spin_trylock(&dcache_lock)) {
- /*
- * Something of a livelock possibility we could avoid
- * by taking dcache_lock and trying again, but we
- * want to reduce dcache_lock anyway so this will
- * get improved.
- */
-drop1:
- spin_unlock(&dentry->d_lock);
- goto repeat;
- }
if (!spin_trylock(&dcache_inode_lock)) {
drop2:
- spin_unlock(&dcache_lock);
- goto drop1;
+ spin_unlock(&dentry->d_lock);
+ goto repeat;
}
parent = dentry->d_parent;
if (parent) {
@@ -305,7 +287,6 @@ drop2:
dentry->d_count--;
if (dentry->d_count) {
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
return;
}
@@ -327,7 +308,6 @@ drop2:
if (parent)
spin_unlock(&parent->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
return;
unhash_it:
@@ -357,11 +337,9 @@ int d_invalidate(struct dentry * dentry)
/*
* If it's already been dropped, return OK.
*/
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
if (d_unhashed(dentry)) {
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
return 0;
}
/*
@@ -370,9 +348,7 @@ int d_invalidate(struct dentry * dentry)
*/
if (!list_empty(&dentry->d_subdirs)) {
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
shrink_dcache_parent(dentry);
- spin_lock(&dcache_lock);
}
/*
@@ -389,18 +365,15 @@ int d_invalidate(struct dentry * dentry)
if (dentry->d_count > 1) {
if (dentry->d_inode && S_ISDIR(dentry->d_inode->i_mode)) {
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
return -EBUSY;
}
}
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
return 0;
}
-/* This should be called _only_ with dcache_lock held */
static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
{
dentry->d_count++;
@@ -478,11 +451,9 @@ struct dentry * d_find_alias(struct inod
struct dentry *de = NULL;
if (!list_empty(&inode->i_dentry)) {
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
de = __d_find_alias(inode, 0);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
}
return de;
}
@@ -495,7 +466,6 @@ void d_prune_aliases(struct inode *inode
{
struct dentry *dentry;
restart:
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
@@ -504,14 +474,12 @@ restart:
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
dput(dentry);
goto restart;
}
spin_unlock(&dentry->d_lock);
}
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
}
/*
@@ -524,20 +492,16 @@ restart:
*/
static void prune_one_dentry(struct dentry * dentry)
__releases(dentry->d_lock)
- __releases(dcache_lock)
- __acquires(dcache_lock)
{
__d_drop(dentry);
dentry = d_kill(dentry);
/*
- * Prune ancestors. Locking is simpler than in dput(),
- * because dcache_lock needs to be taken anyway.
+ * Prune ancestors.
*/
while (dentry) {
struct dentry *parent = NULL;
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
again:
spin_lock(&dentry->d_lock);
@@ -554,7 +518,6 @@ again:
spin_unlock(&parent->d_lock);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
return;
}
@@ -623,7 +586,6 @@ restart:
}
spin_unlock(&dcache_lru_lock);
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
again:
spin_lock(&dcache_lru_lock); /* lru_lock also protects tmp list */
@@ -654,14 +616,13 @@ again1:
}
__dentry_lru_del_init(dentry);
spin_unlock(&dcache_lru_lock);
+
prune_one_dentry(dentry);
- /* dcache_lock and dentry->d_lock dropped */
- spin_lock(&dcache_lock);
+ /* dentry->d_lock dropped */
spin_lock(&dcache_inode_lock);
spin_lock(&dcache_lru_lock);
}
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
if (count == NULL && !list_empty(&sb->s_dentry_lru))
goto restart;
@@ -691,7 +652,6 @@ static void prune_dcache(int count)
if (unused == 0 || count == 0)
return;
- spin_lock(&dcache_lock);
restart:
if (count >= unused)
prune_ratio = 1;
@@ -727,11 +687,9 @@ restart:
if (down_read_trylock(&sb->s_umount)) {
if ((sb->s_root != NULL) &&
(!list_empty(&sb->s_dentry_lru))) {
- spin_unlock(&dcache_lock);
__shrink_dcache_sb(sb, &w_count,
DCACHE_REFERENCED);
pruned -= w_count;
- spin_lock(&dcache_lock);
}
up_read(&sb->s_umount);
}
@@ -747,7 +705,6 @@ restart:
}
}
spin_unlock(&sb_lock);
- spin_unlock(&dcache_lock);
}
/**
@@ -776,12 +733,10 @@ static void shrink_dcache_for_umount_sub
BUG_ON(!IS_ROOT(dentry));
/* detach this root from the system */
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
dentry_lru_del_init(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
for (;;) {
/* descend to the first leaf in the current subtree */
@@ -790,7 +745,6 @@ static void shrink_dcache_for_umount_sub
/* this is a branch with children - detach all of them
* from the system in one go */
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
list_for_each_entry(loop, &dentry->d_subdirs,
d_u.d_child) {
@@ -800,7 +754,6 @@ static void shrink_dcache_for_umount_sub
spin_unlock(&loop->d_lock);
}
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
/* move to the first child */
dentry = list_entry(dentry->d_subdirs.next,
@@ -871,8 +824,7 @@ out:
/*
* destroy the dentries attached to a superblock on unmounting
- * - we don't need to use dentry->d_lock, and only need dcache_lock when
- * removing the dentry from the system lists and hashes because:
+ * - we don't need to use dentry->d_lock because:
* - the superblock is detached from all mountings and open files, so the
* dentry trees will not be rearranged by the VFS
* - s_umount is write-locked, so the memory pressure shrinker will ignore
@@ -923,7 +875,6 @@ rename_retry:
this_parent = parent;
seq = read_seqbegin(&rename_lock);
- spin_lock(&dcache_lock);
if (d_mountpoint(parent))
goto positive;
spin_lock(&this_parent->d_lock);
@@ -965,7 +916,6 @@ resume:
* or deletion */
if (d_unhashed(this_parent) || read_seqretry(&rename_lock, seq)) {
spin_unlock(&this_parent->d_lock);
- spin_unlock(&dcache_lock);
rcu_read_unlock();
goto rename_retry;
}
@@ -973,12 +923,10 @@ resume:
goto resume;
}
spin_unlock(&this_parent->d_lock);
- spin_unlock(&dcache_lock);
if (read_seqretry(&rename_lock, seq))
goto rename_retry;
return 0; /* No mount points found in tree */
positive:
- spin_unlock(&dcache_lock);
if (read_seqretry(&rename_lock, seq))
goto rename_retry;
return 1;
@@ -1009,7 +957,6 @@ rename_retry:
this_parent = parent;
seq = read_seqbegin(&rename_lock);
- spin_lock(&dcache_lock);
spin_lock(&this_parent->d_lock);
repeat:
next = this_parent->d_subdirs.next;
@@ -1072,7 +1019,6 @@ resume:
(/* d_unhashed(this_parent) XXX: hmm... */ 0 ||
read_seqretry(&rename_lock, seq))) {
spin_unlock(&this_parent->d_lock);
- spin_unlock(&dcache_lock);
rcu_read_unlock();
goto rename_retry;
}
@@ -1081,7 +1027,6 @@ resume:
}
out:
spin_unlock(&this_parent->d_lock);
- spin_unlock(&dcache_lock);
if (read_seqretry(&rename_lock, seq))
goto rename_retry;
return found;
@@ -1181,7 +1126,6 @@ struct dentry *d_alloc(struct dentry * p
INIT_LIST_HEAD(&dentry->d_u.d_child);
if (parent) {
- spin_lock(&dcache_lock);
spin_lock(&parent->d_lock);
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
dentry->d_parent = dget_dlock(parent);
@@ -1189,7 +1133,6 @@ struct dentry *d_alloc(struct dentry * p
list_add(&dentry->d_u.d_child, &parent->d_subdirs);
spin_unlock(&dentry->d_lock);
spin_unlock(&parent->d_lock);
- spin_unlock(&dcache_lock);
}
atomic_inc(&dentry_stat.nr_dentry);
@@ -1207,7 +1150,6 @@ struct dentry *d_alloc_name(struct dentr
return d_alloc(parent, &q);
}
-/* the caller must hold dcache_lock */
static void __d_instantiate(struct dentry *dentry, struct inode *inode)
{
if (inode)
@@ -1234,11 +1176,9 @@ static void __d_instantiate(struct dentr
void d_instantiate(struct dentry *entry, struct inode * inode)
{
BUG_ON(!list_empty(&entry->d_alias));
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
__d_instantiate(entry, inode);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
security_d_instantiate(entry, inode);
}
@@ -1296,11 +1236,9 @@ struct dentry *d_instantiate_unique(stru
BUG_ON(!list_empty(&entry->d_alias));
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
result = __d_instantiate_unique(entry, inode);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
if (!result) {
security_d_instantiate(entry, inode);
@@ -1388,12 +1326,10 @@ struct dentry *d_obtain_alias(struct ino
}
tmp->d_parent = tmp; /* make sure dput doesn't croak */
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
res = __d_find_alias(inode, 0);
if (res) {
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
dput(tmp);
goto out_iput;
}
@@ -1409,7 +1345,6 @@ struct dentry *d_obtain_alias(struct ino
spin_unlock(&tmp->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
return tmp;
out_iput:
@@ -1439,22 +1374,19 @@ struct dentry *d_splice_alias(struct ino
struct dentry *new = NULL;
if (inode && S_ISDIR(inode->i_mode)) {
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
new = __d_find_alias(inode, 1);
if (new) {
BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
d_move(new, dentry);
iput(inode);
} else {
- /* already taking dcache_lock, so d_add() by hand */
+ /* already holding dcache_inode_lock, so d_add() by hand */
__d_instantiate(dentry, inode);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
security_d_instantiate(dentry, inode);
d_rehash(dentry);
}
@@ -1529,17 +1461,15 @@ struct dentry *d_add_ci(struct dentry *d
d_instantiate(found, inode);
return found;
}
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
if (list_empty(&inode->i_dentry)) {
/*
* Directory without a 'disconnected' dentry; we need to do
- * d_instantiate() by hand because it takes dcache_lock which
+ * d_instantiate() by hand because it takes dcache_inode_lock which
* we already hold.
*/
__d_instantiate(found, inode);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
security_d_instantiate(found, inode);
return found;
}
@@ -1550,7 +1480,6 @@ struct dentry *d_add_ci(struct dentry *d
new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
dget_locked(new);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
/* Do security vodoo. */
security_d_instantiate(found, inode);
/* Move new in place of found. */
@@ -1577,7 +1506,7 @@ err_out:
* is returned. The caller must use dput to free the entry when it has
* finished using it. %NULL is returned on failure.
*
- * __d_lookup is dcache_lock free. The hash list is protected using RCU.
+ * __d_lookup is global lock free. The hash list is protected using RCU.
* Memory barriers are used while updating and doing lockless traversal.
* To avoid races with d_move while rename is happening, d_lock is used.
*
@@ -1589,7 +1518,7 @@ err_out:
*
* The dentry unused LRU is not updated even if lookup finds the required dentry
* in there. It is updated in places such as prune_dcache, shrink_dcache_sb,
- * select_parent and __dget_locked. This laziness saves lookup from dcache_lock
+ * select_parent and __dget_locked. This laziness saves lookup from LRU lock
* acquisition.
*
* d_lookup() is protected against the concurrent renames in some unrelated
@@ -1719,22 +1648,19 @@ int d_validate(struct dentry *dentry, st
if (dentry->d_parent != dparent)
goto out;
- spin_lock(&dcache_lock);
spin_lock(&dcache_hash_lock);
base = d_hash(dparent, dentry->d_name.hash);
hlist_for_each(lhp,base) {
/* hlist_for_each_entry_rcu() not required for d_hash list
- * as it is parsed under dcache_lock
+ * as it is parsed under dcache_hash_lock
*/
if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
spin_unlock(&dcache_hash_lock);
__dget_locked(dentry);
- spin_unlock(&dcache_lock);
return 1;
}
}
spin_unlock(&dcache_hash_lock);
- spin_unlock(&dcache_lock);
out:
return 0;
}
@@ -1766,7 +1692,6 @@ void d_delete(struct dentry * dentry)
/*
* Are we the only user?
*/
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
spin_lock(&dentry->d_lock);
isdir = S_ISDIR(dentry->d_inode->i_mode);
@@ -1781,7 +1706,6 @@ void d_delete(struct dentry * dentry)
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
fsnotify_nameremove(dentry, isdir);
}
@@ -1807,13 +1731,11 @@ static void _d_rehash(struct dentry * en
void d_rehash(struct dentry * entry)
{
- spin_lock(&dcache_lock);
spin_lock(&entry->d_lock);
spin_lock(&dcache_hash_lock);
_d_rehash(entry);
spin_unlock(&dcache_hash_lock);
spin_unlock(&entry->d_lock);
- spin_unlock(&dcache_lock);
}
/*
@@ -1967,9 +1889,7 @@ static void d_move_locked(struct dentry
void d_move(struct dentry * dentry, struct dentry * target)
{
- spin_lock(&dcache_lock);
d_move_locked(dentry, target);
- spin_unlock(&dcache_lock);
}
/**
@@ -1995,13 +1915,12 @@ struct dentry *d_ancestor(struct dentry
* This helper attempts to cope with remotely renamed directories
*
* It assumes that the caller is already holding
- * dentry->d_parent->d_inode->i_mutex and the dcache_lock
+ * dentry->d_parent->d_inode->i_mutex
*
* Note: If ever the locking in lock_rename() changes, then please
* remember to update this too...
*/
static struct dentry *__d_unalias(struct dentry *dentry, struct dentry *alias)
- __releases(dcache_lock)
{
struct mutex *m1 = NULL, *m2 = NULL;
struct dentry *ret;
@@ -2028,7 +1947,6 @@ out_unalias:
ret = alias;
out_err:
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
if (m2)
mutex_unlock(m2);
if (m1)
@@ -2092,7 +2010,6 @@ struct dentry *d_materialise_unique(stru
BUG_ON(!d_unhashed(dentry));
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
if (!inode) {
@@ -2139,7 +2056,6 @@ found:
spin_unlock(&dcache_hash_lock);
spin_unlock(&actual->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
out_nolock:
if (actual == dentry) {
security_d_instantiate(dentry, inode);
@@ -2151,7 +2067,6 @@ out_nolock:
shouldnt_be_hashed:
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
BUG();
}
@@ -2183,8 +2098,7 @@ static int prepend_name(char **buffer, i
* Returns a pointer into the buffer or an error code if the
* path was too long.
*
- * "buflen" should be positive. Caller holds the dcache_lock and
- * path->dentry->d_lock.
+ * "buflen" should be positive. Caller holds the path->dentry->d_lock.
*
* If path is not reachable from the supplied root, then the value of
* root is changed (without modifying refcounts).
@@ -2289,14 +2203,12 @@ char *d_path(const struct path *path, ch
rename_retry:
seq = read_seqbegin(&rename_lock);
- spin_lock(&dcache_lock);
spin_lock(&vfsmount_lock);
spin_lock(&path->dentry->d_lock);
tmp = root;
res = __d_path(path, &tmp, buf, buflen);
spin_unlock(&path->dentry->d_lock);
spin_unlock(&vfsmount_lock);
- spin_unlock(&dcache_lock);
if (read_seqretry(&rename_lock, seq))
goto rename_retry;
@@ -2338,7 +2250,6 @@ rename_retry:
end = buf + buflen;
seq = read_seqbegin(&rename_lock);
rcu_read_lock(); /* protect parent */
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
prepend(&end, &buflen, "\0", 1);
if (!IS_ROOT(dentry) && d_unhashed(dentry) &&
@@ -2363,7 +2274,6 @@ rename_retry:
}
out:
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
rcu_read_unlock();
if (read_seqretry(&rename_lock, seq))
goto rename_retry;
@@ -2409,7 +2319,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
error = -ENOENT;
/* Has the current directory has been unlinked? */
- spin_lock(&dcache_lock);
spin_lock(&vfsmount_lock);
spin_lock(&pwd.dentry->d_lock);
if (IS_ROOT(pwd.dentry) || !d_unhashed(pwd.dentry)) {
@@ -2420,7 +2329,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
cwd = __d_path(&pwd, &tmp, page, PAGE_SIZE);
spin_unlock(&pwd.dentry->d_lock);
spin_unlock(&vfsmount_lock);
- spin_unlock(&dcache_lock);
error = PTR_ERR(cwd);
if (IS_ERR(cwd))
@@ -2435,7 +2343,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
}
} else {
spin_unlock(&pwd.dentry->d_lock);
- spin_unlock(&dcache_lock);
}
out:
@@ -2497,7 +2404,6 @@ void d_genocide(struct dentry *root)
rename_retry:
this_parent = root;
seq = read_seqbegin(&rename_lock);
- spin_lock(&dcache_lock);
spin_lock(&this_parent->d_lock);
repeat:
next = this_parent->d_subdirs.next;
@@ -2533,7 +2439,6 @@ resume:
* or deletion */
if (d_unhashed(this_parent) || read_seqretry(&rename_lock, seq)) {
spin_unlock(&this_parent->d_lock);
- spin_unlock(&dcache_lock);
rcu_read_unlock();
goto rename_retry;
}
@@ -2541,7 +2446,6 @@ resume:
goto resume;
}
spin_unlock(&this_parent->d_lock);
- spin_unlock(&dcache_lock);
if (read_seqretry(&rename_lock, seq))
goto rename_retry;
}
Index: linux-2.6/fs/libfs.c
===================================================================
--- linux-2.6.orig/fs/libfs.c
+++ linux-2.6/fs/libfs.c
@@ -101,7 +101,6 @@ loff_t dcache_dir_lseek(struct file *fil
struct dentry *cursor = file->private_data;
loff_t n = file->f_pos - 2;
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
list_del(&cursor->d_u.d_child);
p = dentry->d_subdirs.next;
@@ -116,7 +115,6 @@ loff_t dcache_dir_lseek(struct file *fil
}
list_add_tail(&cursor->d_u.d_child, p);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
}
}
mutex_unlock(&dentry->d_inode->i_mutex);
@@ -159,7 +157,6 @@ int dcache_readdir(struct file * filp, v
i++;
/* fallthrough */
default:
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
if (filp->f_pos == 2)
list_move(q, &dentry->d_subdirs);
@@ -175,13 +172,11 @@ int dcache_readdir(struct file * filp, v
spin_unlock(&next->d_lock);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
if (filldir(dirent, next->d_name.name,
next->d_name.len, filp->f_pos,
next->d_inode->i_ino,
dt_type(next->d_inode)) < 0)
return 0;
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
/* next is still alive */
list_move(q, p);
@@ -189,7 +184,6 @@ int dcache_readdir(struct file * filp, v
filp->f_pos++;
}
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
}
return 0;
}
@@ -285,7 +279,6 @@ int simple_empty(struct dentry *dentry)
struct dentry *child;
int ret = 0;
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
list_for_each_entry(child, &dentry->d_subdirs, d_u.d_child) {
spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
@@ -298,7 +291,6 @@ int simple_empty(struct dentry *dentry)
ret = 1;
out:
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
return ret;
}
Index: linux-2.6/fs/namei.c
===================================================================
--- linux-2.6.orig/fs/namei.c
+++ linux-2.6/fs/namei.c
@@ -752,14 +752,11 @@ static __always_inline void follow_dotdo
break;
}
read_unlock(&fs->lock);
- spin_lock(&dcache_lock);
if (nd->path.dentry != nd->path.mnt->mnt_root) {
nd->path.dentry = dget(nd->path.dentry->d_parent);
- spin_unlock(&dcache_lock);
dput(old);
break;
}
- spin_unlock(&dcache_lock);
spin_lock(&vfsmount_lock);
parent = nd->path.mnt->mnt_parent;
if (parent == nd->path.mnt) {
@@ -2105,12 +2102,10 @@ void dentry_unhash(struct dentry *dentry
{
dget(dentry);
shrink_dcache_parent(dentry);
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
if (dentry->d_count == 2)
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
}
int vfs_rmdir(struct inode *dir, struct dentry *dentry)
Index: linux-2.6/fs/seq_file.c
===================================================================
--- linux-2.6.orig/fs/seq_file.c
+++ linux-2.6/fs/seq_file.c
@@ -463,11 +463,9 @@ int seq_path_root(struct seq_file *m, st
rename_retry:
seq = read_seqbegin(&rename_lock);
- spin_lock(&dcache_lock);
spin_lock(&vfsmount_lock);
p = __d_path(path, root, s, m->size - m->count);
spin_unlock(&vfsmount_lock);
- spin_unlock(&dcache_lock);
if (read_seqretry(&rename_lock, seq))
goto rename_retry;
Index: linux-2.6/fs/sysfs/dir.c
===================================================================
--- linux-2.6.orig/fs/sysfs/dir.c
+++ linux-2.6/fs/sysfs/dir.c
@@ -519,7 +519,6 @@ static void sysfs_drop_dentry(struct sys
* dput to immediately free the dentry if it is not in use.
*/
repeat:
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
@@ -531,12 +530,10 @@ repeat:
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
dput(dentry);
goto repeat;
}
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
/* adjust nlink and update timestamp */
mutex_lock(&inode->i_mutex);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -150,13 +150,13 @@ struct dentry_operations {
/*
locking rules:
- big lock dcache_lock d_lock may block
-d_revalidate: no no no yes
-d_hash no no no yes
-d_compare: no yes yes no
-d_delete: no yes no no
-d_release: no no no yes
-d_iput: no no no yes
+ big lock d_lock may block
+d_revalidate: no no yes
+d_hash no no yes
+d_compare: no yes no
+d_delete: no no no
+d_release: no no yes
+d_iput: no no yes
*/
/* d_flags entries */
@@ -186,7 +186,6 @@ d_iput: no no no yes
extern spinlock_t dcache_inode_lock;
extern spinlock_t dcache_hash_lock;
-extern spinlock_t dcache_lock;
extern seqlock_t rename_lock;
/**
@@ -217,11 +216,9 @@ static inline void __d_drop(struct dentr
static inline void d_drop(struct dentry *dentry)
{
- spin_lock(&dcache_lock);
spin_lock(&dentry->d_lock);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_lock);
}
static inline int dname_external(struct dentry *dentry)
Index: linux-2.6/fs/notify/inotify/inotify.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify.c
+++ linux-2.6/fs/notify/inotify/inotify.c
@@ -184,7 +184,6 @@ static void set_dentry_child_flags(struc
{
struct dentry *alias;
- spin_lock(&dcache_lock);
spin_lock(&dcache_inode_lock);
list_for_each_entry(alias, &inode->i_dentry, d_alias) {
struct dentry *child;
@@ -204,7 +203,6 @@ static void set_dentry_child_flags(struc
spin_unlock(&alias->d_lock);
}
spin_unlock(&dcache_inode_lock);
- spin_unlock(&dcache_lock);
}
/*
* [patch 11/14] fs: dcache reduce dput locking
2009-03-29 15:55 [rfc] scale dcache locking npiggin
` (9 preceding siblings ...)
2009-03-29 15:55 ` [patch 10/14] fs: dcache remove dcache_lock npiggin
@ 2009-03-29 15:55 ` npiggin
2009-03-29 15:55 ` [patch 12/14] fs: dcache per-bucket dcache hash locking npiggin
` (3 subsequent siblings)
14 siblings, 0 replies; 28+ messages in thread
From: npiggin @ 2009-03-29 15:55 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
[-- Attachment #1: dcache-dput-less-dcache_lock.patch --]
[-- Type: text/plain, Size: 2539 bytes --]
It is possible to run dput() without taking the heavier locks up-front. In
many cases we don't end up killing the dentry anyway, so those locks are not
required (I think... need to think about it more). This also further changes
->d_delete locking, which has not been fully audited.
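The shape of the new dput() is: try the cheap case first under d_lock only, and fall back to the full lock ordering only when the dentry might actually be killed. Simplified sketch of the hunk below:

        spin_lock(&dentry->d_lock);
        if (dentry->d_count > 1) {
                dentry->d_count--;              /* common case: d_lock only */
                spin_unlock(&dentry->d_lock);
                return;
        }
        ...
kill_it:
        /* slow path: drop d_lock, take dcache_inode_lock, then re-take
         * d_lock and parent->d_lock, and re-check d_count before killing */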
---
fs/dcache.c | 56 ++++++++++++++++++++++++++++++--------------------------
1 file changed, 30 insertions(+), 26 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -262,6 +262,7 @@ static struct dentry *d_kill(struct dent
void dput(struct dentry *dentry)
{
struct dentry *parent;
+
if (!dentry)
return;
@@ -269,23 +270,8 @@ repeat:
if (dentry->d_count == 1)
might_sleep();
spin_lock(&dentry->d_lock);
- if (dentry->d_count == 1) {
- if (!spin_trylock(&dcache_inode_lock)) {
-drop2:
- spin_unlock(&dentry->d_lock);
- goto repeat;
- }
- parent = dentry->d_parent;
- if (parent) {
- BUG_ON(parent == dentry);
- if (!spin_trylock(&parent->d_lock)) {
- spin_unlock(&dcache_inode_lock);
- goto drop2;
- }
- }
- }
- dentry->d_count--;
- if (dentry->d_count) {
+ if (dentry->d_count > 1) {
+ dentry->d_count--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -294,8 +280,10 @@ drop2:
* AV: ->d_delete() is _NOT_ allowed to block now.
*/
if (dentry->d_op && dentry->d_op->d_delete) {
- if (dentry->d_op->d_delete(dentry))
- goto unhash_it;
+ if (dentry->d_op->d_delete(dentry)) {
+ __d_drop(dentry);
+ goto kill_it;
+ }
}
/* Unreachable? Get rid of it */
if (d_unhashed(dentry))
@@ -304,15 +292,31 @@ drop2:
dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);
}
- spin_unlock(&dentry->d_lock);
- if (parent)
- spin_unlock(&parent->d_lock);
- spin_unlock(&dcache_inode_lock);
- return;
+ dentry->d_count--;
+ spin_unlock(&dentry->d_lock);
+ return;
-unhash_it:
- __d_drop(dentry);
kill_it:
+ spin_unlock(&dentry->d_lock);
+ spin_lock(&dcache_inode_lock);
+relock:
+ spin_lock(&dentry->d_lock);
+ parent = dentry->d_parent;
+ if (parent) {
+ BUG_ON(parent == dentry);
+ if (!spin_trylock(&parent->d_lock)) {
+ spin_unlock(&dentry->d_lock);
+ goto relock;
+ }
+ }
+ dentry->d_count--;
+ if (dentry->d_count) {
+ spin_unlock(&dentry->d_lock);
+ spin_unlock(&parent->d_lock);
+ spin_unlock(&dcache_inode_lock);
+ printk("elevated d_count\n");
+ return;
+ }
/* if dentry was on the d_lru list delete it from there */
dentry_lru_del(dentry);
dentry = d_kill(dentry);
* [patch 12/14] fs: dcache per-bucket dcache hash locking
2009-03-29 15:55 [rfc] scale dcache locking npiggin
` (10 preceding siblings ...)
2009-03-29 15:55 ` [patch 11/14] fs: dcache reduce dput locking npiggin
@ 2009-03-29 15:55 ` npiggin
2009-03-30 12:14 ` Andi Kleen
2009-03-29 15:55 ` [patch 13/14] fs: dcache reduce dcache_inode_lock npiggin
` (2 subsequent siblings)
14 siblings, 1 reply; 28+ messages in thread
From: npiggin @ 2009-03-29 15:55 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
[-- Attachment #1: dcache-chain-hashlock.patch --]
[-- Type: text/plain, Size: 10010 bytes --]
We can turn the dcache hash locking from a global dcache_hash_lock into
per-bucket locking.
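Hash updates then serialise only against the chain they touch. The pattern used by __d_drop()/__d_rehash() below is:

        struct dcache_hash_bucket *b;

        b = d_hash(dentry->d_parent, dentry->d_name.hash);
        spin_lock(&b->lock);
        hlist_del_rcu(&dentry->d_hash); /* or hlist_add_head_rcu() */
        spin_unlock(&b->lock);

Lookups stay lockless, walking b->head under RCU as before.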
---
fs/dcache.c | 150 +++++++++++++++++++++++++++++--------------------
include/linux/dcache.h | 20 ------
2 files changed, 92 insertions(+), 78 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -38,7 +38,7 @@
* Usage:
* dcache_inode_lock protects:
* - the inode alias lists, d_inode
- * dcache_hash_lock protects:
+ * dcache_hash_bucket->lock protects:
* - the dcache hash table
* dcache_lru_lock protects:
* - the dcache lru lists and counters
@@ -53,18 +53,16 @@
* dcache_inode_lock
* dentry->d_lock
* dcache_lru_lock
- * dcache_hash_lock
+ * dcache_hash_bucket->lock
*/
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_inode_lock);
-__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_hash_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
EXPORT_SYMBOL(dcache_inode_lock);
-EXPORT_SYMBOL(dcache_hash_lock);
static struct kmem_cache *dentry_cache __read_mostly;
@@ -83,7 +81,12 @@ static struct kmem_cache *dentry_cache _
static unsigned int d_hash_mask __read_mostly;
static unsigned int d_hash_shift __read_mostly;
-static struct hlist_head *dentry_hashtable __read_mostly;
+
+struct dcache_hash_bucket {
+ spinlock_t lock;
+ struct hlist_head head;
+};
+static struct dcache_hash_bucket *dentry_hashtable __read_mostly;
/* Statistics gathering. */
struct dentry_stat_t dentry_stat = {
@@ -91,6 +94,14 @@ struct dentry_stat_t dentry_stat = {
.age_limit = 45,
};
+static inline struct dcache_hash_bucket *d_hash(struct dentry *parent,
+ unsigned long hash)
+{
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
+ return dentry_hashtable + (hash & D_HASHMASK);
+}
+
static void __d_free(struct dentry *dentry)
{
WARN_ON(!list_empty(&dentry->d_alias));
@@ -230,6 +241,51 @@ static struct dentry *d_kill(struct dent
return parent;
}
+void __d_drop(struct dentry *dentry)
+{
+ if (!(dentry->d_flags & DCACHE_UNHASHED)) {
+ struct dcache_hash_bucket *b;
+ b = d_hash(dentry->d_parent, dentry->d_name.hash);
+ dentry->d_flags |= DCACHE_UNHASHED;
+ spin_lock(&b->lock);
+ hlist_del_rcu(&dentry->d_hash);
+ spin_unlock(&b->lock);
+ }
+}
+
+void d_drop(struct dentry *dentry)
+{
+ spin_lock(&dentry->d_lock);
+ __d_drop(dentry);
+ spin_unlock(&dentry->d_lock);
+}
+
+/* This should be called _only_ with dentry->d_lock held */
+static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
+{
+ dentry->d_count++;
+ dentry_lru_del_init(dentry);
+ return dentry;
+}
+
+static inline struct dentry * __dget_locked(struct dentry *dentry)
+{
+ spin_lock(&dentry->d_lock);
+ __dget_locked_dlock(dentry);
+ spin_unlock(&dentry->d_lock);
+ return dentry;
+}
+
+struct dentry * dget_locked(struct dentry *dentry)
+{
+ return __dget_locked(dentry);
+}
+
+struct dentry * dget_locked_dlock(struct dentry *dentry)
+{
+ return __dget_locked_dlock(dentry);
+}
+
/*
* This is dput
*
@@ -378,31 +434,6 @@ int d_invalidate(struct dentry * dentry)
return 0;
}
-static inline struct dentry * __dget_locked_dlock(struct dentry *dentry)
-{
- dentry->d_count++;
- dentry_lru_del_init(dentry);
- return dentry;
-}
-
-static inline struct dentry * __dget_locked(struct dentry *dentry)
-{
- spin_lock(&dentry->d_lock);
- __dget_locked_dlock(dentry);
- spin_lock(&dentry->d_lock);
- return dentry;
-}
-
-struct dentry * dget_locked(struct dentry *dentry)
-{
- return __dget_locked(dentry);
-}
-
-struct dentry * dget_locked_dlock(struct dentry *dentry)
-{
- return __dget_locked_dlock(dentry);
-}
-
/**
* d_find_alias - grab a hashed alias of inode
* @inode: inode in question
@@ -1282,14 +1313,6 @@ struct dentry * d_alloc_root(struct inod
return res;
}
-static inline struct hlist_head *d_hash(struct dentry *parent,
- unsigned long hash)
-{
- hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
- hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
- return dentry_hashtable + (hash & D_HASHMASK);
-}
-
/**
* d_obtain_alias - find or allocate a dentry for a given inode
* @inode: inode to allocate the dentry for
@@ -1548,7 +1571,8 @@ struct dentry * __d_lookup(struct dentry
unsigned int len = name->len;
unsigned int hash = name->hash;
const unsigned char *str = name->name;
- struct hlist_head *head = d_hash(parent,hash);
+ struct dcache_hash_bucket *b = d_hash(parent, hash);
+ struct hlist_head *head = &b->head;
struct dentry *found = NULL;
struct hlist_node *node;
struct dentry *dentry;
@@ -1642,6 +1666,7 @@ out:
int d_validate(struct dentry *dentry, struct dentry *dparent)
{
+ struct dcache_hash_bucket *b;
struct hlist_head *base;
struct hlist_node *lhp;
@@ -1652,19 +1677,20 @@ int d_validate(struct dentry *dentry, st
if (dentry->d_parent != dparent)
goto out;
- spin_lock(&dcache_hash_lock);
- base = d_hash(dparent, dentry->d_name.hash);
+ b = d_hash(dparent, dentry->d_name.hash);
+ base = &b->head;
+ spin_lock(&b->lock);
hlist_for_each(lhp,base) {
/* hlist_for_each_entry_rcu() not required for d_hash list
- * as it is parsed under dcache_hash_lock
+ * as it is parsed under dcache_hash_bucket->lock
*/
if (dentry == hlist_entry(lhp, struct dentry, d_hash)) {
- spin_unlock(&dcache_hash_lock);
+ spin_unlock(&b->lock);
__dget_locked(dentry);
return 1;
}
}
- spin_unlock(&dcache_hash_lock);
+ spin_unlock(&b->lock);
out:
return 0;
}
@@ -1714,11 +1740,12 @@ void d_delete(struct dentry * dentry)
fsnotify_nameremove(dentry, isdir);
}
-static void __d_rehash(struct dentry * entry, struct hlist_head *list)
+static void __d_rehash(struct dentry * entry, struct dcache_hash_bucket *b)
{
-
entry->d_flags &= ~DCACHE_UNHASHED;
- hlist_add_head_rcu(&entry->d_hash, list);
+ spin_lock(&b->lock);
+ hlist_add_head_rcu(&entry->d_hash, &b->head);
+ spin_unlock(&b->lock);
}
static void _d_rehash(struct dentry * entry)
@@ -1736,9 +1763,7 @@ static void _d_rehash(struct dentry * en
void d_rehash(struct dentry * entry)
{
spin_lock(&entry->d_lock);
- spin_lock(&dcache_hash_lock);
_d_rehash(entry);
- spin_unlock(&dcache_hash_lock);
spin_unlock(&entry->d_lock);
}
@@ -1816,6 +1841,7 @@ static void switch_names(struct dentry *
*/
static void d_move_locked(struct dentry * dentry, struct dentry * target)
{
+ struct dcache_hash_bucket *b;
if (!dentry->d_inode)
printk(KERN_WARNING "VFS: moving negative dcache entry\n");
@@ -1844,11 +1870,13 @@ static void d_move_locked(struct dentry
}
/* Move the dentry to the target hash queue, if on different bucket */
- spin_lock(&dcache_hash_lock);
- if (!d_unhashed(dentry))
+ if (!d_unhashed(dentry)) {
+ b = d_hash(dentry->d_parent, dentry->d_name.hash);
+ spin_lock(&b->lock);
hlist_del_rcu(&dentry->d_hash);
+ spin_unlock(&b->lock);
+ }
__d_rehash(dentry, d_hash(target->d_parent, target->d_name.hash));
- spin_unlock(&dcache_hash_lock);
/* Unhash the target: dput() will then get rid of it */
__d_drop(target);
@@ -2055,9 +2083,7 @@ struct dentry *d_materialise_unique(stru
found_lock:
spin_lock(&actual->d_lock);
found:
- spin_lock(&dcache_hash_lock);
_d_rehash(actual);
- spin_unlock(&dcache_hash_lock);
spin_unlock(&actual->d_lock);
spin_unlock(&dcache_inode_lock);
out_nolock:
@@ -2504,7 +2530,7 @@ static void __init dcache_init_early(voi
dentry_hashtable =
alloc_large_system_hash("Dentry cache",
- sizeof(struct hlist_head),
+ sizeof(struct dcache_hash_bucket),
dhash_entries,
13,
HASH_EARLY,
@@ -2512,8 +2538,10 @@ static void __init dcache_init_early(voi
&d_hash_mask,
0);
- for (loop = 0; loop < (1 << d_hash_shift); loop++)
- INIT_HLIST_HEAD(&dentry_hashtable[loop]);
+ for (loop = 0; loop < (1 << d_hash_shift); loop++) {
+ spin_lock_init(&dentry_hashtable[loop].lock);
+ INIT_HLIST_HEAD(&dentry_hashtable[loop].head);
+ }
}
static void __init dcache_init(void)
@@ -2536,7 +2564,7 @@ static void __init dcache_init(void)
dentry_hashtable =
alloc_large_system_hash("Dentry cache",
- sizeof(struct hlist_head),
+ sizeof(struct dcache_hash_bucket),
dhash_entries,
13,
0,
@@ -2544,8 +2572,10 @@ static void __init dcache_init(void)
&d_hash_mask,
0);
- for (loop = 0; loop < (1 << d_hash_shift); loop++)
- INIT_HLIST_HEAD(&dentry_hashtable[loop]);
+ for (loop = 0; loop < (1 << d_hash_shift); loop++) {
+ spin_lock_init(&dentry_hashtable[loop].lock);
+ INIT_HLIST_HEAD(&dentry_hashtable[loop].head);
+ }
}
/* SLAB cache for __getname() consumers */
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -185,7 +185,6 @@ d_iput: no no yes
#define DCACHE_COOKIE 0x0040 /* For use by dcookie subsystem */
extern spinlock_t dcache_inode_lock;
-extern spinlock_t dcache_hash_lock;
extern seqlock_t rename_lock;
/**
@@ -203,23 +202,8 @@ extern seqlock_t rename_lock;
*
* __d_drop requires dentry->d_lock.
*/
-
-static inline void __d_drop(struct dentry *dentry)
-{
- if (!(dentry->d_flags & DCACHE_UNHASHED)) {
- dentry->d_flags |= DCACHE_UNHASHED;
- spin_lock(&dcache_hash_lock);
- hlist_del_rcu(&dentry->d_hash);
- spin_unlock(&dcache_hash_lock);
- }
-}
-
-static inline void d_drop(struct dentry *dentry)
-{
- spin_lock(&dentry->d_lock);
- __d_drop(dentry);
- spin_unlock(&dentry->d_lock);
-}
+void d_drop(struct dentry *dentry);
+void __d_drop(struct dentry *dentry);
static inline int dname_external(struct dentry *dentry)
{
* [patch 13/14] fs: dcache reduce dcache_inode_lock
2009-03-29 15:55 [rfc] scale dcache locking npiggin
` (11 preceding siblings ...)
2009-03-29 15:55 ` [patch 12/14] fs: dcache per-bucket dcache hash locking npiggin
@ 2009-03-29 15:55 ` npiggin
2009-03-29 15:55 ` [patch 14/14] fs: dcache per-inode inode alias locking npiggin
2009-04-01 14:23 ` [rfc] scale dcache locking Al Viro
14 siblings, 0 replies; 28+ messages in thread
From: npiggin @ 2009-03-29 15:55 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
[-- Attachment #1: fs-dcache-d_delete-less-lock.patch --]
[-- Type: text/plain, Size: 1893 bytes --]
d_delete() and d_materialise_unique() can avoid taking dcache_inode_lock in
the cases where it is not actually required.
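Because dcache_inode_lock nests outside d_lock, it can only be taken opportunistically once d_lock is already held; on failure we back off and retry from the top. The pattern, as in the d_delete() hunk below:

again:
        spin_lock(&dentry->d_lock);
        if (dentry->d_count == 1) {
                if (!spin_trylock(&dcache_inode_lock)) {
                        /* wrong order to block here, so back off and retry */
                        spin_unlock(&dentry->d_lock);
                        goto again;
                }
                ...
        }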
---
fs/dcache.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -1722,10 +1722,14 @@ void d_delete(struct dentry * dentry)
/*
* Are we the only user?
*/
- spin_lock(&dcache_inode_lock);
+again:
spin_lock(&dentry->d_lock);
isdir = S_ISDIR(dentry->d_inode->i_mode);
if (dentry->d_count == 1) {
+ if (!spin_trylock(&dcache_inode_lock)) {
+ spin_unlock(&dentry->d_lock);
+ goto again;
+ }
dentry_iput(dentry);
fsnotify_nameremove(dentry, isdir);
return;
@@ -1735,7 +1739,6 @@ void d_delete(struct dentry * dentry)
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_inode_lock);
fsnotify_nameremove(dentry, isdir);
}
@@ -2042,14 +2045,15 @@ struct dentry *d_materialise_unique(stru
BUG_ON(!d_unhashed(dentry));
- spin_lock(&dcache_inode_lock);
-
if (!inode) {
actual = dentry;
__d_instantiate(dentry, NULL);
- goto found_lock;
+ d_rehash(actual);
+ goto out_nolock;
}
+ spin_lock(&dcache_inode_lock);
+
if (S_ISDIR(inode->i_mode)) {
struct dentry *alias;
@@ -2077,10 +2081,9 @@ struct dentry *d_materialise_unique(stru
actual = __d_instantiate_unique(dentry, inode);
if (!actual)
actual = dentry;
- else if (unlikely(!d_unhashed(actual)))
- goto shouldnt_be_hashed;
+ else
+ BUG_ON(!d_unhashed(actual));
-found_lock:
spin_lock(&actual->d_lock);
found:
_d_rehash(actual);
@@ -2094,10 +2097,6 @@ out_nolock:
iput(inode);
return actual;
-
-shouldnt_be_hashed:
- spin_unlock(&dcache_inode_lock);
- BUG();
}
static int prepend(char **buffer, int *buflen, const char *str, int namelen)
* [patch 14/14] fs: dcache per-inode inode alias locking
2009-03-29 15:55 [rfc] scale dcache locking npiggin
` (12 preceding siblings ...)
2009-03-29 15:55 ` [patch 13/14] fs: dcache reduce dcache_inode_lock npiggin
@ 2009-03-29 15:55 ` npiggin
2009-04-01 14:23 ` [rfc] scale dcache locking Al Viro
14 siblings, 0 replies; 28+ messages in thread
From: npiggin @ 2009-03-29 15:55 UTC (permalink / raw)
To: linux-fsdevel, linux-kernel
[-- Attachment #1: dcache-split-inode_lock.patch --]
[-- Type: text/plain, Size: 12445 bytes --]
dcache_inode_lock can be replaced with per-inode locking, using the existing
inode->i_lock. This is slightly non-trivial because we sometimes
need to find the inode from the dentry, which requires d_inode to be
stabilised (either with a refcount or with d_lock).
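Because i_lock nests outside d_lock, the kill path has to read d_inode first (stable at that point, since we still hold a reference), take i_lock, and only then re-take d_lock. Roughly, from the dput() hunk below:

        inode = dentry->d_inode;
        if (inode)
                spin_lock(&inode->i_lock);
relock:
        spin_lock(&dentry->d_lock);
        /* ... re-check d_count and proceed to d_kill() ... */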
---
fs/dcache.c | 105 +++++++++++++++++++++++++-------------------
fs/notify/inotify/inotify.c | 4 -
fs/sysfs/dir.c | 6 +-
include/linux/dcache.h | 1
4 files changed, 65 insertions(+), 51 deletions(-)
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -36,7 +36,7 @@
/*
* Usage:
- * dcache_inode_lock protects:
+ * dcache->d_inode->i_lock protects:
* - the inode alias lists, d_inode
* dcache_hash_bucket->lock protects:
* - the dcache hash table
@@ -50,7 +50,7 @@
* - d_subdirs and children's d_child
*
* Ordering:
- * dcache_inode_lock
+ * dcache->d_inode->i_lock
* dentry->d_lock
* dcache_lru_lock
* dcache_hash_bucket->lock
@@ -58,12 +58,9 @@
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
-__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_inode_lock);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
-EXPORT_SYMBOL(dcache_inode_lock);
-
static struct kmem_cache *dentry_cache __read_mostly;
#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
@@ -137,14 +134,13 @@ static void d_free(struct dentry *dentry
*/
static void dentry_iput(struct dentry * dentry)
__releases(dentry->d_lock)
- __releases(dcache_inode_lock)
{
struct inode *inode = dentry->d_inode;
if (inode) {
dentry->d_inode = NULL;
list_del_init(&dentry->d_alias);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
if (!inode->i_nlink)
fsnotify_inoderemove(inode);
if (dentry->d_op && dentry->d_op->d_iput)
@@ -153,7 +149,6 @@ static void dentry_iput(struct dentry *
iput(inode);
} else {
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_inode_lock);
}
}
@@ -224,7 +219,6 @@ static void dentry_lru_del_init(struct d
*/
static struct dentry *d_kill(struct dentry *dentry)
__releases(dentry->d_lock)
- __releases(dcache_inode_lock)
{
struct dentry *parent;
@@ -318,6 +312,7 @@ struct dentry * dget_locked_dlock(struct
void dput(struct dentry *dentry)
{
struct dentry *parent;
+ struct inode *inode;
if (!dentry)
return;
@@ -354,7 +349,9 @@ repeat:
kill_it:
spin_unlock(&dentry->d_lock);
- spin_lock(&dcache_inode_lock);
+ inode = dentry->d_inode;
+ if (inode)
+ spin_lock(&inode->i_lock);
relock:
spin_lock(&dentry->d_lock);
parent = dentry->d_parent;
@@ -369,7 +366,8 @@ relock:
if (dentry->d_count) {
spin_unlock(&dentry->d_lock);
spin_unlock(&parent->d_lock);
- spin_unlock(&dcache_inode_lock);
+ if (inode)
+ spin_unlock(&inode->i_lock);
printk("elevated d_count\n");
return;
}
@@ -486,9 +484,9 @@ struct dentry * d_find_alias(struct inod
struct dentry *de = NULL;
if (!list_empty(&inode->i_dentry)) {
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
de = __d_find_alias(inode, 0);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
}
return de;
}
@@ -501,20 +499,20 @@ void d_prune_aliases(struct inode *inode
{
struct dentry *dentry;
restart:
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
if (!dentry->d_count) {
__dget_locked_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
dput(dentry);
goto restart;
}
spin_unlock(&dentry->d_lock);
}
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
}
/*
@@ -536,8 +534,10 @@ static void prune_one_dentry(struct dent
*/
while (dentry) {
struct dentry *parent = NULL;
+ struct inode *inode = dentry->d_inode;
- spin_lock(&dcache_inode_lock);
+ if (inode)
+ spin_lock(&inode->i_lock);
again:
spin_lock(&dentry->d_lock);
if (dentry->d_parent && dentry != dentry->d_parent) {
@@ -552,7 +552,8 @@ again:
if (parent)
spin_unlock(&parent->d_lock);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_inode_lock);
+ if (inode)
+ spin_unlock(&inode->i_lock);
return;
}
@@ -621,10 +622,11 @@ restart:
}
spin_unlock(&dcache_lru_lock);
- spin_lock(&dcache_inode_lock);
again:
spin_lock(&dcache_lru_lock); /* lru_lock also protects tmp list */
while (!list_empty(&tmp)) {
+ struct inode *inode;
+
dentry = list_entry(tmp.prev, struct dentry, d_lru);
if (!spin_trylock(&dentry->d_lock)) {
@@ -642,11 +644,18 @@ again1:
spin_unlock(&dentry->d_lock);
continue;
}
+ inode = dentry->d_inode;
+ if (inode && !spin_trylock(&inode->i_lock)) {
+again2:
+ spin_unlock(&dentry->d_lock);
+ goto again1;
+ }
if (dentry->d_parent) {
BUG_ON(dentry == dentry->d_parent);
if (!spin_trylock(&dentry->d_parent->d_lock)) {
- spin_unlock(&dentry->d_lock);
- goto again1;
+ if (inode)
+ spin_unlock(&inode->i_lock);
+ goto again2;
}
}
__dentry_lru_del_init(dentry);
@@ -654,10 +663,8 @@ again1:
prune_one_dentry(dentry);
/* dentry->d_lock dropped */
- spin_lock(&dcache_inode_lock);
spin_lock(&dcache_lru_lock);
}
- spin_unlock(&dcache_inode_lock);
if (count == NULL && !list_empty(&sb->s_dentry_lru))
goto restart;
@@ -1211,9 +1218,11 @@ static void __d_instantiate(struct dentr
void d_instantiate(struct dentry *entry, struct inode * inode)
{
BUG_ON(!list_empty(&entry->d_alias));
- spin_lock(&dcache_inode_lock);
+ if (inode)
+ spin_lock(&inode->i_lock);
__d_instantiate(entry, inode);
- spin_unlock(&dcache_inode_lock);
+ if (inode)
+ spin_unlock(&inode->i_lock);
security_d_instantiate(entry, inode);
}
@@ -1271,9 +1280,11 @@ struct dentry *d_instantiate_unique(stru
BUG_ON(!list_empty(&entry->d_alias));
- spin_lock(&dcache_inode_lock);
+ if (inode)
+ spin_lock(&inode->i_lock);
result = __d_instantiate_unique(entry, inode);
- spin_unlock(&dcache_inode_lock);
+ if (inode)
+ spin_unlock(&inode->i_lock);
if (!result) {
security_d_instantiate(entry, inode);
@@ -1353,10 +1364,10 @@ struct dentry *d_obtain_alias(struct ino
}
tmp->d_parent = tmp; /* make sure dput doesn't croak */
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
res = __d_find_alias(inode, 0);
if (res) {
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
dput(tmp);
goto out_iput;
}
@@ -1370,7 +1381,7 @@ struct dentry *d_obtain_alias(struct ino
list_add(&tmp->d_alias, &inode->i_dentry);
hlist_add_head(&tmp->d_hash, &inode->i_sb->s_anon);
spin_unlock(&tmp->d_lock);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
return tmp;
@@ -1401,19 +1412,19 @@ struct dentry *d_splice_alias(struct ino
struct dentry *new = NULL;
if (inode && S_ISDIR(inode->i_mode)) {
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
new = __d_find_alias(inode, 1);
if (new) {
BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
security_d_instantiate(new, inode);
d_rehash(dentry);
d_move(new, dentry);
iput(inode);
} else {
- /* already taken dcache_inode_lock, d_add() by hand */
+ /* already taken inode->i_lock, d_add() by hand */
__d_instantiate(dentry, inode);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
security_d_instantiate(dentry, inode);
d_rehash(dentry);
}
@@ -1488,15 +1499,15 @@ struct dentry *d_add_ci(struct dentry *d
d_instantiate(found, inode);
return found;
}
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
if (list_empty(&inode->i_dentry)) {
/*
* Directory without a 'disconnected' dentry; we need to do
- * d_instantiate() by hand because it takes dcache_inode_lock which
+ * d_instantiate() by hand because it takes inode->i_lock which
* we already hold.
*/
__d_instantiate(found, inode);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
security_d_instantiate(found, inode);
return found;
}
@@ -1506,7 +1517,7 @@ struct dentry *d_add_ci(struct dentry *d
*/
new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
dget_locked(new);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
/* Do security vodoo. */
security_d_instantiate(found, inode);
/* Move new in place of found. */
@@ -1718,15 +1729,17 @@ out:
void d_delete(struct dentry * dentry)
{
+ struct inode *inode;
int isdir = 0;
/*
* Are we the only user?
*/
again:
spin_lock(&dentry->d_lock);
- isdir = S_ISDIR(dentry->d_inode->i_mode);
+ inode = dentry->d_inode;
+ isdir = S_ISDIR(inode->i_mode);
if (dentry->d_count == 1) {
- if (!spin_trylock(&dcache_inode_lock)) {
+ if (inode && !spin_trylock(&inode->i_lock)) {
spin_unlock(&dentry->d_lock);
goto again;
}
@@ -1959,6 +1972,7 @@ static struct dentry *__d_unalias(struct
{
struct mutex *m1 = NULL, *m2 = NULL;
struct dentry *ret;
+ struct inode *inode;
/* If alias and dentry share a parent, then no extra locks required */
if (alias->d_parent == dentry->d_parent)
@@ -1974,14 +1988,15 @@ static struct dentry *__d_unalias(struct
if (!mutex_trylock(&dentry->d_sb->s_vfs_rename_mutex))
goto out_err;
m1 = &dentry->d_sb->s_vfs_rename_mutex;
- if (!mutex_trylock(&alias->d_parent->d_inode->i_mutex))
+ inode = alias->d_parent->d_inode;
+ if (!mutex_trylock(&inode->i_mutex))
goto out_err;
- m2 = &alias->d_parent->d_inode->i_mutex;
+ m2 = &inode->i_mutex;
out_unalias:
d_move_locked(alias, dentry);
ret = alias;
out_err:
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
if (m2)
mutex_unlock(m2);
if (m1)
@@ -2052,7 +2067,7 @@ struct dentry *d_materialise_unique(stru
goto out_nolock;
}
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
if (S_ISDIR(inode->i_mode)) {
struct dentry *alias;
@@ -2088,7 +2103,7 @@ struct dentry *d_materialise_unique(stru
found:
_d_rehash(actual);
spin_unlock(&actual->d_lock);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
out_nolock:
if (actual == dentry) {
security_d_instantiate(dentry, inode);
Index: linux-2.6/fs/sysfs/dir.c
===================================================================
--- linux-2.6.orig/fs/sysfs/dir.c
+++ linux-2.6/fs/sysfs/dir.c
@@ -519,7 +519,7 @@ static void sysfs_drop_dentry(struct sys
* dput to immediately free the dentry if it is not in use.
*/
repeat:
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
if (d_unhashed(dentry)) {
@@ -529,11 +529,11 @@ repeat:
dget_locked_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
dput(dentry);
goto repeat;
}
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
/* adjust nlink and update timestamp */
mutex_lock(&inode->i_mutex);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -184,7 +184,6 @@ d_iput: no no yes
#define DCACHE_COOKIE 0x0040 /* For use by dcookie subsystem */
-extern spinlock_t dcache_inode_lock;
extern seqlock_t rename_lock;
/**
Index: linux-2.6/fs/notify/inotify/inotify.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify.c
+++ linux-2.6/fs/notify/inotify/inotify.c
@@ -184,7 +184,7 @@ static void set_dentry_child_flags(struc
{
struct dentry *alias;
- spin_lock(&dcache_inode_lock);
+ spin_lock(&inode->i_lock);
list_for_each_entry(alias, &inode->i_dentry, d_alias) {
struct dentry *child;
@@ -202,7 +202,7 @@ static void set_dentry_child_flags(struc
}
spin_unlock(&alias->d_lock);
}
- spin_unlock(&dcache_inode_lock);
+ spin_unlock(&inode->i_lock);
}
/*
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 12/14] fs: dcache per-bucket dcache hash locking
2009-03-29 15:55 ` [patch 12/14] fs: dcache per-bucket dcache hash locking npiggin
@ 2009-03-30 12:14 ` Andi Kleen
2009-03-30 12:27 ` Nick Piggin
2009-03-30 18:00 ` Christoph Hellwig
0 siblings, 2 replies; 28+ messages in thread
From: Andi Kleen @ 2009-03-30 12:14 UTC (permalink / raw)
To: npiggin; +Cc: linux-fsdevel, linux-kernel
npiggin@suse.de writes:
> We can turn the dcache hash locking from a global dcache_hash_lock into
> per-bucket locking.
Per bucket locking is typically a bad idea because you get far too
many locks and you increase cache footprint with all of them. It's
typically better to use a second much smaller hash table that only has
locks (by just shifting the hash value down some more)
Just need to be careful to avoid too much false sharing.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
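A minimal sketch of the separate lock table suggested above; the sizes,
array names and helper are all hypothetical:
	#define D_HASH_BITS	20	/* hypothetical: 1M bucket heads  */
	#define D_LOCK_BITS	8	/* hypothetical: 256 shared locks */

	static struct hlist_head dentry_hashtable[1 << D_HASH_BITS];
	static spinlock_t hash_locks[1 << D_LOCK_BITS]; /* spin_lock_init() each at boot */

	static inline struct hlist_head *d_hash_bucket(unsigned int hash)
	{
		return &dentry_hashtable[hash & ((1 << D_HASH_BITS) - 1)];
	}

	static inline spinlock_t *d_hash_lock(unsigned int hash)
	{
		/* shift the same hash down: 4096 buckets share each lock */
		return &hash_locks[(hash >> (D_HASH_BITS - D_LOCK_BITS)) &
				   ((1 << D_LOCK_BITS) - 1)];
	}
The lock table then stays small enough to be cache-resident, at the cost
of touching a second cacheline per lookup.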
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 09/14] fs: use RCU / seqlock logic for reverse and multi-step operations
2009-03-29 15:55 ` [patch 09/14] fs: use RCU / seqlock logic for reverse and multi-step operations npiggin
@ 2009-03-30 12:16 ` Andi Kleen
2009-03-30 12:29 ` Nick Piggin
0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2009-03-30 12:16 UTC (permalink / raw)
To: npiggin; +Cc: linux-fsdevel, linux-kernel
npiggin@suse.de writes:
> The remaining usages for dcache_lock is to allow atomic, multi-step read-side
> operations over the directory tree by excluding modifications to the tree.
> Also, to walk in the leaf->root direction in the tree where we don't have
> a natural d_lock ordering. This is the hardest bit.
General thoughts: is there a way to add a self-testing infrastructure
to this? E.g. by having more sequence counts per object (only enabled
in the debug case, so the cache line bouncing doesn't matter) and lots of
checks?
I suppose that would significantly lower the work needed to actually
get this working.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
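A minimal sketch of what such self-testing could look like;
CONFIG_DCACHE_DEBUG, the d_debug_seq field and all helper names are
invented for illustration:
	#ifdef CONFIG_DCACHE_DEBUG
	/* writers bump the count under d_lock on every modification */
	static inline void d_debug_bump(struct dentry *dentry)
	{
		write_seqcount_begin(&dentry->d_debug_seq);
		write_seqcount_end(&dentry->d_debug_seq);
	}

	static inline unsigned d_debug_begin(struct dentry *dentry)
	{
		return read_seqcount_begin(&dentry->d_debug_seq);
	}

	/* lockless multi-step readers assert nothing moved under them */
	static inline void d_debug_check(struct dentry *dentry, unsigned seq)
	{
		WARN_ON(read_seqcount_retry(&dentry->d_debug_seq, seq));
	}
	#endif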
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 08/14] fs: scale inode alias list
2009-03-29 15:55 ` [patch 08/14] fs: scale inode alias list npiggin
@ 2009-03-30 12:18 ` Andi Kleen
2009-03-30 12:31 ` Nick Piggin
0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2009-03-30 12:18 UTC (permalink / raw)
To: npiggin; +Cc: linux-fsdevel, linux-kernel
npiggin@suse.de writes:
> Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
> from concurrent modification. d_alias is also protected by d_lock.
This would seem to call for a per-object lock? Why not put it into the inode?
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 12/14] fs: dcache per-bucket dcache hash locking
2009-03-30 12:14 ` Andi Kleen
@ 2009-03-30 12:27 ` Nick Piggin
2009-03-30 12:47 ` Andi Kleen
2009-03-30 18:00 ` Christoph Hellwig
1 sibling, 1 reply; 28+ messages in thread
From: Nick Piggin @ 2009-03-30 12:27 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:14:08PM +0200, Andi Kleen wrote:
> npiggin@suse.de writes:
>
> > We can turn the dcache hash locking from a global dcache_hash_lock into
> > per-bucket locking.
>
> Per bucket locking is typically a bad idea because you get far too
> many locks and you increase cache footprint with all of them. It's
> typically better to use a second much smaller hash table that only has
> locks (by just shifting the hash value down some more)
> Just need to be careful to avoid too much false sharing.
It's interesting. I suspect that with the size of the dcache hash,
if we assume a pretty random distribution of access patterns, then
it might be unlikely to get many common cache lines (ok, birthday
paradox says we'll get a few common cachelines but how many?). So
if we have to go to a 2nd lock hash table, that might
actually increase our cacheline footprint.
Of course RAM footprint will be more.
Anyway, I did think of this and it is something to discuss in
future, but for now at least it is a demonstration of how it becomes
quite easy to change locking after we have broken the locking into
these components.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 09/14] fs: use RCU / seqlock logic for reverse and multi-step operations
2009-03-30 12:16 ` Andi Kleen
@ 2009-03-30 12:29 ` Nick Piggin
2009-03-30 12:43 ` Andi Kleen
0 siblings, 1 reply; 28+ messages in thread
From: Nick Piggin @ 2009-03-30 12:29 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:16:49PM +0200, Andi Kleen wrote:
> npiggin@suse.de writes:
>
> > The remaining usages for dcache_lock is to allow atomic, multi-step read-side
> > operations over the directory tree by excluding modifications to the tree.
> > Also, to walk in the leaf->root direction in the tree where we don't have
> > a natural d_lock ordering. This is the hardest bit.
>
> General thoughts: is there a way to add a self-testing infrastructure
> to this? E.g. by having more sequence counts per object (only enabled
> in the debug case, so the cache line bouncing doesn't matter) and lots of
> checks?
>
> I suppose that would significantly lower the work needed to actually
> get this working.
Might be a good idea. I'll think about whether it can be done.
Note that I *think* the idea is pretty sound, but I'm just not
quite sure about checking for parent being deleted when we're
walking back up the tree -- d_unhashed() doesn't seem to work
because we can encounter unhashed parents by design. We might
just need another d_flag...
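A minimal sketch of the retry structure under discussion for leaf->root
walks, assuming a hypothetical do_one_step() visitor: walk locklessly
under RCU and use rename_lock's sequence count to restart the whole walk
if a concurrent rename moved anything:
	static int walk_up_rcu(struct dentry *dentry)
	{
		struct dentry *d;
		unsigned seq;
		int err;

		do {
			seq = read_seqbegin(&rename_lock);
			rcu_read_lock();
			err = 0;
			for (d = dentry; !IS_ROOT(d) && !err; d = d->d_parent)
				err = do_one_step(d);	/* hypothetical */
			rcu_read_unlock();
		} while (read_seqretry(&rename_lock, seq));

		return err;
	}
Detecting that a parent was *deleted* (rather than moved) mid-walk is the
open problem mentioned above.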
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 08/14] fs: scale inode alias list
2009-03-30 12:18 ` Andi Kleen
@ 2009-03-30 12:31 ` Nick Piggin
0 siblings, 0 replies; 28+ messages in thread
From: Nick Piggin @ 2009-03-30 12:31 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:18:07PM +0200, Andi Kleen wrote:
> npiggin@suse.de writes:
>
> > Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
> > from concurrent modification. d_alias is also protected by d_lock.
>
> This would seem to ask for per object lock? Why not put it into the inode?
Yes, this comes later in the last patch, because the first round is just
meant to be as simple as possible, and making it per-inode is non-trivial
(not too difficult, but just not totally trivial) for reasons given in
that patch changelog.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 09/14] fs: use RCU / seqlock logic for reverse and multi-step operations
2009-03-30 12:29 ` Nick Piggin
@ 2009-03-30 12:43 ` Andi Kleen
0 siblings, 0 replies; 28+ messages in thread
From: Andi Kleen @ 2009-03-30 12:43 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andi Kleen, linux-fsdevel, linux-kernel
> Note that I *think* the idea is pretty sound, but I'm just not
Sorry didn't want to sound negative. Just more testing is always
good.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 12/14] fs: dcache per-bucket dcache hash locking
2009-03-30 12:27 ` Nick Piggin
@ 2009-03-30 12:47 ` Andi Kleen
2009-03-30 12:59 ` Nick Piggin
0 siblings, 1 reply; 28+ messages in thread
From: Andi Kleen @ 2009-03-30 12:47 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andi Kleen, linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:27:12PM +0200, Nick Piggin wrote:
> It's interesting. I suspect that with the size of the dcache hash,
> if we assume pretty random distribution of access patterns, then
> it might be unlikely to get much common cache lines (ok, birthday
The problem is that you increase the cache footprint overall
because these hash tables are gigantic. And because it's random
there will not be much locality. That is, your hash table
might still fit when you're lucky, but if the rest
of your workload needs a lot of cache too you might
end up with a cache miss on every access.
False sharing is typically not the issue with the big lock hash; that was
more an issue for a potential separate hash table design
(I guess my original sentence was a bit confusing).
BTW the alternative would be to switch the hash table to some
large fan out tree indexed by the string hash value and then use
the standard lockless algorithms on that.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 12/14] fs: dcache per-bucket dcache hash locking
2009-03-30 12:47 ` Andi Kleen
@ 2009-03-30 12:59 ` Nick Piggin
0 siblings, 0 replies; 28+ messages in thread
From: Nick Piggin @ 2009-03-30 12:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:47:35PM +0200, Andi Kleen wrote:
> On Mon, Mar 30, 2009 at 02:27:12PM +0200, Nick Piggin wrote:
> > It's interesting. I suspect that with the size of the dcache hash,
> > if we assume pretty random distribution of access patterns, then
> > it might be unlikely to get much common cache lines (ok, birthday
>
> The problem is that you increase the cache footprint overall
> because these hash tables are gigantic. And because it's random
> there will not be much locality. That is, your hash table
> might still fit when you're lucky, but if the rest
> of your workload needs a lot of cache too you might
> end up with a cache miss on every access.
Hmm, I disagree in general: because the hash table is so big, it
is very unlikely to get much sharing whether or not we double
its size. Even if we only use a few dentries in the workload,
they will be scattered all over the table and each lookup will use
one cacheline regardless of the bucket head size.
Whereas if we have to go to another lock table each time, we
have to touch 2 cachelines per lookup.
Actually I have patches floating around to be able to dynamically
resize the dcache hash table, and in that case we could actually
make it very small so it fits in cache for workloads that
don't have too many dentries.
But anyway let's not worry too much about this yet. I agree it
has downsides whatever direction we go, so we can discuss or
measure after the basics of the patchset are more mature.
> False sharing is typically not the issue with the big lock hash; that was
> more an issue for a potential separate hash table design
> (I guess my original sentence was a bit confusing).
>
> BTW the alternative would be to switch the hash table to some
> large fan out tree indexed by the string hash value and then use
> the standard lockless algorithms on that.
Well yes that's the other thing we could try.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 12/14] fs: dcache per-bucket dcache hash locking
2009-03-30 12:14 ` Andi Kleen
2009-03-30 12:27 ` Nick Piggin
@ 2009-03-30 18:00 ` Christoph Hellwig
2009-03-31 1:57 ` Nick Piggin
1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2009-03-30 18:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: npiggin, linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:14:08PM +0200, Andi Kleen wrote:
> npiggin@suse.de writes:
>
> > We can turn the dcache hash locking from a global dcache_hash_lock into
> > per-bucket locking.
>
> Per bucket locking is typically a bad idea because you get far too
> many locks and you increase cache footprint with all of them. It's
> typically better to use a second much smaller hash table that only has
> locks (by just shifting the hash value down some more)
> Just need to be careful to avoid too much false sharing.
Yeah, I'm also not too happy about it. I think we're better off
replacing the global hash with more fine-grained structures. That might
even be some sort of simplistic tree hanging off directory dentries.
The hard part in getting anything like this done was always the amount
of different things dcache_lock protects besides the hash, but Nick's
earlier patches when actually fully implemented should sort that out.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [patch 12/14] fs: dcache per-bucket dcache hash locking
2009-03-30 18:00 ` Christoph Hellwig
@ 2009-03-31 1:57 ` Nick Piggin
0 siblings, 0 replies; 28+ messages in thread
From: Nick Piggin @ 2009-03-31 1:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Andi Kleen, linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:00:31PM -0400, Christoph Hellwig wrote:
> On Mon, Mar 30, 2009 at 02:14:08PM +0200, Andi Kleen wrote:
> > npiggin@suse.de writes:
> >
> > > We can turn the dcache hash locking from a global dcache_hash_lock into
> > > per-bucket locking.
> >
> > Per bucket locking is typically a bad idea because you get far too
> > many locks and you increase cache footprint with all of them. It's
> > typically better to use a second much smaller hash table that only has
> > locks (by just shifting the hash value down some more)
> > Just need to be careful to avoid too much false sharing.
>
> Yeah, I'm also not too happy about it. I think we're better off
> replacing the global hash with more fine-grained structures. That might
> even be some sort of simplistic tree hanging off directory dentries.
>
> The hard part in getting anything like this done was always the amount
> of different things dcache_lock protects besides the hash, but Nick's
> earlier patches when actually fully implemented should sort that out.
Well yes, that's the thing: dcache_hash_lock becomes innermost,
very simply protecting the dentry hash table and d_hash, so it should now
be quite trivial (locking-wise) to experiment with other data
structures.
Of course any proposed new data structure will have to compete with
a scaled hash table... whether we scale it the dumb way like I did
or something like Andi suggests, I don't really know. But just to
be clear, the point of that patch is more to show how it is possible
to easily change locking. The really important parts of my series for
the moment are the first half of actually breaking up dcache_lock.
Anyway, no real objections yet, so I'll continue down this path and
see what I come up with.
If anybody knows of workloads or benchmarks to try, also let me know.
The patchset posted improved dbench-in-ram performance by about 1.6x
on my 2s8c system here so it is encouraging but not the most realistic
test :)
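For reference, the per-bucket arrangement under discussion amounts to
something like the following sketch; the lock name comes from the locking
comments in the series, the exact layout is assumed:
	struct dcache_hash_bucket {
		spinlock_t		lock;	/* innermost, protects head */
		struct hlist_head	head;	/* chain of hashed dentries */
	};
so that swapping the hash for another structure later only means
replacing this innermost lock and its chains.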
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [rfc] scale dcache locking
2009-03-29 15:55 [rfc] scale dcache locking npiggin
` (13 preceding siblings ...)
2009-03-29 15:55 ` [patch 14/14] fs: dcache per-inode inode alias locking npiggin
@ 2009-04-01 14:23 ` Al Viro
2009-04-02 9:43 ` Nick Piggin
14 siblings, 1 reply; 28+ messages in thread
From: Al Viro @ 2009-04-01 14:23 UTC (permalink / raw)
To: npiggin; +Cc: linux-fsdevel, linux-kernel
On Mon, Mar 30, 2009 at 02:55:39AM +1100, npiggin@suse.de wrote:
> This is my sketch for improving dcache locking scalability. So far I've
> only really been looking at core code to get an idea of how it might look,
> so most configurable functionality is broken (and unfortunately it might
> well be something in there which will cause a fundamental problem for me).
Umm... Some of that makes obvious sense per se, some... In particular,
all of a sudden we get contention between multiple dput() on the same
dentry, which is dirt-common for directory ones.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [rfc] scale dcache locking
2009-04-01 14:23 ` [rfc] scale dcache locking Al Viro
@ 2009-04-02 9:43 ` Nick Piggin
0 siblings, 0 replies; 28+ messages in thread
From: Nick Piggin @ 2009-04-02 9:43 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, linux-kernel
On Wed, Apr 01, 2009 at 03:23:12PM +0100, Al Viro wrote:
> On Mon, Mar 30, 2009 at 02:55:39AM +1100, npiggin@suse.de wrote:
> > This is my sketch for improving dcache locking scalability. So far I've
> > only really been looking at core code to get an idea of how it might look,
> > so most configurable functionality is broken (and unfortunately it might
> > well be something in there which will cause a fundamental problem for me).
>
> Umm... Some of that makes obvious sense per se, some... In particular,
> all of a sudden we get contention between multiple dput() on the same
> dentry, which is dirt-common for directory ones.
Yes that's true but I'm hoping lock hold times on d_lock aren't
too long, in which case the major cost should remain just the
cacheline contention.
Hmm, I wanted to avoid the atomic because it tends to be covered
by d_lock a lot of the time anyway, so we avoid the extra locked op;
it also makes concurrency a bit easier to think about.
In the worst case, I guess we need to reintroduce the atomic refcount or
have another lock for it...
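A sketch of the contrast being weighed here (the function name is
hypothetical): with the series, d_count is a plain int covered by d_lock,
so concurrent refcounting on one dentry serialises on the spinlock where
the old code used a single lock-free op:
	static inline void dget_dlock_style(struct dentry *dentry)
	{
		spin_lock(&dentry->d_lock);
		dentry->d_count++;	/* plain int, covered by d_lock */
		spin_unlock(&dentry->d_lock);
	}
	/* pre-series fast path: atomic_inc(&dentry->d_count); */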
^ permalink raw reply [flat|nested] 28+ messages in thread
Thread overview: 28+ messages
2009-03-29 15:55 [rfc] scale dcache locking npiggin
2009-03-29 15:55 ` [patch 01/14] fs: dcache fix LRU ordering npiggin
2009-03-29 15:55 ` [patch 02/14] fs: dcache scale hash npiggin
2009-03-29 15:55 ` [patch 03/14] fs: dcache scale lru npiggin
2009-03-29 15:55 ` [patch 04/14] fs: dcache scale nr_dentry npiggin
2009-03-29 15:55 ` [patch 05/14] fs: dcache scale dentry refcount npiggin
2009-03-29 15:55 ` [patch 06/14] fs: dcache scale d_unhashed npiggin
2009-03-29 15:55 ` [patch 07/14] fs: dcache scale subdirs npiggin
2009-03-29 15:55 ` [patch 08/14] fs: scale inode alias list npiggin
2009-03-30 12:18 ` Andi Kleen
2009-03-30 12:31 ` Nick Piggin
2009-03-29 15:55 ` [patch 09/14] fs: use RCU / seqlock logic for reverse and multi-step operations npiggin
2009-03-30 12:16 ` Andi Kleen
2009-03-30 12:29 ` Nick Piggin
2009-03-30 12:43 ` Andi Kleen
2009-03-29 15:55 ` [patch 10/14] fs: dcache remove dcache_lock npiggin
2009-03-29 15:55 ` [patch 11/14] fs: dcache reduce dput locking npiggin
2009-03-29 15:55 ` [patch 12/14] fs: dcache per-bucket dcache hash locking npiggin
2009-03-30 12:14 ` Andi Kleen
2009-03-30 12:27 ` Nick Piggin
2009-03-30 12:47 ` Andi Kleen
2009-03-30 12:59 ` Nick Piggin
2009-03-30 18:00 ` Christoph Hellwig
2009-03-31 1:57 ` Nick Piggin
2009-03-29 15:55 ` [patch 13/14] fs: dcache reduce dcache_inode_lock npiggin
2009-03-29 15:55 ` [patch 14/14] fs: dcache per-inode inode alias locking npiggin
2009-04-01 14:23 ` [rfc] scale dcache locking Al Viro
2009-04-02 9:43 ` Nick Piggin