* [patch 0/3] A few patches for dcache
@ 2011-07-28 13:12 Cyrill Gorcunov
2011-07-28 13:12 ` [patch 1/3] vfs, dcache: Introduce lighten r/o rename_lock lockers Cyrill Gorcunov
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-07-28 13:12 UTC (permalink / raw)
To: Al Viro, LINUXFS-ML
Cc: Andrew Morton, Christoph Hellwig, James Bottomley, xemul
There were a thread at http://lwn.net/Articles/441164/ and while the rest of
dcache rework is still under review and discussion, the three patches might be
considered independantly.
Please review, comments are *highly* appreciated!
I've tested them locally with dbench (maybe something
else I should try?) to be sure all works as expected
but better to have them reviewed since tests do not
always reveal potential problems.
The idea of series is to remove global rename_lock and
prepare the ground for future factoring out of the
shrinker code.
Thanks,
Cyrill
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 1/3] vfs, dcache: Introduce lighten r/o rename_lock lockers
2011-07-28 13:12 [patch 0/3] A few patches for dcache Cyrill Gorcunov
@ 2011-07-28 13:12 ` Cyrill Gorcunov
2011-07-28 13:12 ` [patch 2/3] vfs, dcache: Factor out rename_lock locking Cyrill Gorcunov
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-07-28 13:12 UTC (permalink / raw)
To: Al Viro, LINUXFS-ML
Cc: Andrew Morton, Christoph Hellwig, James Bottomley, xemul,
Cyrill Gorcunov
[-- Attachment #1: dcache-1-13.patch --]
[-- Type: text/plain, Size: 5536 bytes --]
From: Pavel Emelyanov <xemul@openvz.org>
There are some places, that need to _lock_ the rename_lock, but they
do it for reading the dentry tree. When doing the write_seqlock they
force the existing "willing to restart" readers do the restart, which
is not actually required.
Introduce two more seqlock helpers, that lock the lock, but do not
update the seq count.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
fs/dcache.c | 40 ++++++++++++++++++++--------------------
include/linux/seqlock.h | 15 +++++++++++++++
2 files changed, 35 insertions(+), 20 deletions(-)
Index: linux-2.6.git/fs/dcache.c
===================================================================
--- linux-2.6.git.orig/fs/dcache.c
+++ linux-2.6.git/fs/dcache.c
@@ -1089,18 +1089,18 @@ resume:
if (!locked && read_seqretry(&rename_lock, seq))
goto rename_retry;
if (locked)
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
return 0; /* No mount points found in tree */
positive:
if (!locked && read_seqretry(&rename_lock, seq))
goto rename_retry;
if (locked)
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
return 1;
rename_retry:
locked = 1;
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
goto again;
}
EXPORT_SYMBOL(have_submounts);
@@ -1191,14 +1191,14 @@ out:
if (!locked && read_seqretry(&rename_lock, seq))
goto rename_retry;
if (locked)
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
return found;
rename_retry:
if (found)
return found;
locked = 1;
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
goto again;
}
@@ -2577,9 +2577,9 @@ char *__d_path(const struct path *path,
int error;
prepend(&res, &buflen, "\0", 1);
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
error = prepend_path(path, root, &res, &buflen);
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
if (error)
return ERR_PTR(error);
@@ -2641,12 +2641,12 @@ char *d_path(const struct path *path, ch
return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
get_fs_root(current->fs, &root);
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
tmp = root;
error = path_with_deleted(path, &tmp, &res, &buflen);
if (error)
res = ERR_PTR(error);
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
path_put(&root);
return res;
}
@@ -2672,12 +2672,12 @@ char *d_path_with_unreachable(const stru
return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
get_fs_root(current->fs, &root);
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
tmp = root;
error = path_with_deleted(path, &tmp, &res, &buflen);
if (!error && !path_equal(&tmp, &root))
error = prepend_unreachable(&res, &buflen);
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
path_put(&root);
if (error)
res = ERR_PTR(error);
@@ -2744,9 +2744,9 @@ char *dentry_path_raw(struct dentry *den
{
char *retval;
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
retval = __dentry_path(dentry, buf, buflen);
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
return retval;
}
@@ -2757,7 +2757,7 @@ char *dentry_path(struct dentry *dentry,
char *p = NULL;
char *retval;
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
if (d_unlinked(dentry)) {
p = buf + buflen;
if (prepend(&p, &buflen, "//deleted", 10) != 0)
@@ -2765,7 +2765,7 @@ char *dentry_path(struct dentry *dentry,
buflen++;
}
retval = __dentry_path(dentry, buf, buflen);
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
if (!IS_ERR(retval) && p)
*p = '/'; /* restore '/' overriden with '\0' */
return retval;
@@ -2803,7 +2803,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
get_fs_root_and_pwd(current->fs, &root, &pwd);
error = -ENOENT;
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
if (!d_unlinked(pwd.dentry)) {
unsigned long len;
struct path tmp = root;
@@ -2812,7 +2812,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
prepend(&cwd, &buflen, "\0", 1);
error = prepend_path(&pwd, &tmp, &cwd, &buflen);
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
if (error)
goto out;
@@ -2832,7 +2832,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
error = -EFAULT;
}
} else {
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
}
out:
@@ -2962,12 +2962,12 @@ resume:
if (!locked && read_seqretry(&rename_lock, seq))
goto rename_retry;
if (locked)
- write_sequnlock(&rename_lock);
+ read_sequnlock(&rename_lock);
return;
rename_retry:
locked = 1;
- write_seqlock(&rename_lock);
+ read_seqlock(&rename_lock);
goto again;
}
Index: linux-2.6.git/include/linux/seqlock.h
===================================================================
--- linux-2.6.git.orig/include/linux/seqlock.h
+++ linux-2.6.git/include/linux/seqlock.h
@@ -108,6 +108,21 @@ static __always_inline int read_seqretry
return unlikely(sl->sequence != start);
}
+/*
+ * Lock/unlock the critical section against the updates but
+ * without changing the sequence counter, so other lockless
+ * readers will not have to restart.
+ */
+
+static __always_inline void read_seqlock(seqlock_t *sl)
+{
+ spin_lock(&sl->lock);
+}
+
+static __always_inline void read_sequnlock(seqlock_t *sl)
+{
+ spin_unlock(&sl->lock);
+}
/*
* Version using sequence counter only.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 2/3] vfs, dcache: Factor out rename_lock locking
2011-07-28 13:12 [patch 0/3] A few patches for dcache Cyrill Gorcunov
2011-07-28 13:12 ` [patch 1/3] vfs, dcache: Introduce lighten r/o rename_lock lockers Cyrill Gorcunov
@ 2011-07-28 13:12 ` Cyrill Gorcunov
2011-07-28 13:12 ` [patch 3/3] vfs: Make the rename_lock per-sb Cyrill Gorcunov
2011-07-29 3:25 ` [patch 0/3] A few patches for dcache Dave Chinner
3 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-07-28 13:12 UTC (permalink / raw)
To: Al Viro, LINUXFS-ML
Cc: Andrew Morton, Christoph Hellwig, James Bottomley, xemul,
Cyrill Gorcunov
[-- Attachment #1: dcache-2-13.patch --]
[-- Type: text/plain, Size: 4118 bytes --]
From: Pavel Emelyanov <xemul@openvz.org>
The set of routnes that try to get the dentry path with various
flavours end up in two "core" routines -- the prepend_path and
the __dentry_path.
Since the rename_lock only protects the path integrity, it's OK
to move the rename_lock locking inside these two.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
fs/dcache.c | 27 ++++++---------------------
1 file changed, 6 insertions(+), 21 deletions(-)
Index: linux-2.6.git/fs/dcache.c
===================================================================
--- linux-2.6.git.orig/fs/dcache.c
+++ linux-2.6.git/fs/dcache.c
@@ -2504,6 +2504,7 @@ static int prepend_path(const struct pat
bool slash = false;
int error = 0;
+ read_seqlock(&rename_lock);
br_read_lock(vfsmount_lock);
while (dentry != root->dentry || vfsmnt != root->mnt) {
struct dentry * parent;
@@ -2536,6 +2537,7 @@ out:
error = prepend(buffer, buflen, "/", 1);
br_read_unlock(vfsmount_lock);
+ read_sequnlock(&rename_lock);
return error;
global_root:
@@ -2577,10 +2579,7 @@ char *__d_path(const struct path *path,
int error;
prepend(&res, &buflen, "\0", 1);
- read_seqlock(&rename_lock);
error = prepend_path(path, root, &res, &buflen);
- read_sequnlock(&rename_lock);
-
if (error)
return ERR_PTR(error);
return res;
@@ -2641,12 +2640,10 @@ char *d_path(const struct path *path, ch
return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
get_fs_root(current->fs, &root);
- read_seqlock(&rename_lock);
tmp = root;
error = path_with_deleted(path, &tmp, &res, &buflen);
if (error)
res = ERR_PTR(error);
- read_sequnlock(&rename_lock);
path_put(&root);
return res;
}
@@ -2672,12 +2669,10 @@ char *d_path_with_unreachable(const stru
return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
get_fs_root(current->fs, &root);
- read_seqlock(&rename_lock);
tmp = root;
error = path_with_deleted(path, &tmp, &res, &buflen);
if (!error && !path_equal(&tmp, &root))
error = prepend_unreachable(&res, &buflen);
- read_sequnlock(&rename_lock);
path_put(&root);
if (error)
res = ERR_PTR(error);
@@ -2721,6 +2716,7 @@ static char *__dentry_path(struct dentry
retval = end-1;
*retval = '/';
+ read_seqlock(&rename_lock);
while (!IS_ROOT(dentry)) {
struct dentry *parent = dentry->d_parent;
int error;
@@ -2735,20 +2731,16 @@ static char *__dentry_path(struct dentry
retval = end;
dentry = parent;
}
+ read_sequnlock(&rename_lock);
return retval;
Elong:
+ read_sequnlock(&rename_lock);
return ERR_PTR(-ENAMETOOLONG);
}
char *dentry_path_raw(struct dentry *dentry, char *buf, int buflen)
{
- char *retval;
-
- read_seqlock(&rename_lock);
- retval = __dentry_path(dentry, buf, buflen);
- read_sequnlock(&rename_lock);
-
- return retval;
+ return __dentry_path(dentry, buf, buflen);
}
EXPORT_SYMBOL(dentry_path_raw);
@@ -2757,7 +2749,6 @@ char *dentry_path(struct dentry *dentry,
char *p = NULL;
char *retval;
- read_seqlock(&rename_lock);
if (d_unlinked(dentry)) {
p = buf + buflen;
if (prepend(&p, &buflen, "//deleted", 10) != 0)
@@ -2765,7 +2756,6 @@ char *dentry_path(struct dentry *dentry,
buflen++;
}
retval = __dentry_path(dentry, buf, buflen);
- read_sequnlock(&rename_lock);
if (!IS_ERR(retval) && p)
*p = '/'; /* restore '/' overriden with '\0' */
return retval;
@@ -2803,7 +2793,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
get_fs_root_and_pwd(current->fs, &root, &pwd);
error = -ENOENT;
- read_seqlock(&rename_lock);
if (!d_unlinked(pwd.dentry)) {
unsigned long len;
struct path tmp = root;
@@ -2812,7 +2801,6 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
prepend(&cwd, &buflen, "\0", 1);
error = prepend_path(&pwd, &tmp, &cwd, &buflen);
- read_sequnlock(&rename_lock);
if (error)
goto out;
@@ -2831,10 +2819,7 @@ SYSCALL_DEFINE2(getcwd, char __user *, b
if (copy_to_user(buf, cwd, len))
error = -EFAULT;
}
- } else {
- read_sequnlock(&rename_lock);
}
-
out:
path_put(&pwd);
path_put(&root);
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 3/3] vfs: Make the rename_lock per-sb
2011-07-28 13:12 [patch 0/3] A few patches for dcache Cyrill Gorcunov
2011-07-28 13:12 ` [patch 1/3] vfs, dcache: Introduce lighten r/o rename_lock lockers Cyrill Gorcunov
2011-07-28 13:12 ` [patch 2/3] vfs, dcache: Factor out rename_lock locking Cyrill Gorcunov
@ 2011-07-28 13:12 ` Cyrill Gorcunov
2011-07-29 3:25 ` [patch 0/3] A few patches for dcache Dave Chinner
3 siblings, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-07-28 13:12 UTC (permalink / raw)
To: Al Viro, LINUXFS-ML
Cc: Andrew Morton, Christoph Hellwig, James Bottomley, xemul,
Cyrill Gorcunov
[-- Attachment #1: dcache-3-13.patch --]
[-- Type: text/plain, Size: 13762 bytes --]
From: Pavel Emelyanov <xemul@openvz.org>
There are two things I try to get with this patch.
The first one -- renames on different sb-s will not depend on each other.
The second one -- I plan to factor out the dcache shrinking code (in
further patches). In order to make the srink_dcache_for_umount work OK
on a dying SB it's better to have this lock non-global.
There's some trickery in the prepend_path -- this one has to relock
the rename lock as it ascends the vfsmount tree. This relock seems OK,
as the dentry path of each mount renames solid, and even if we "race"
with the rename on some of the super blocks, the overall path we get
will be either the one we had before rename or them one we had after
one, i.e. -- some existing one.
v2: Use a pointer s_rename_lock in nfs_path() instead of
dereferences in cycle.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
drivers/staging/pohmelfs/path_entry.c | 4 +-
fs/autofs4/waitq.c | 6 +--
fs/dcache.c | 66 +++++++++++++++++-----------------
fs/nfs/namespace.c | 7 ++-
fs/super.c | 1
include/linux/dcache.h | 2 -
include/linux/fs.h | 2 +
kernel/auditsc.c | 4 +-
8 files changed, 47 insertions(+), 45 deletions(-)
Index: linux-2.6.git/drivers/staging/pohmelfs/path_entry.c
===================================================================
--- linux-2.6.git.orig/drivers/staging/pohmelfs/path_entry.c
+++ linux-2.6.git/drivers/staging/pohmelfs/path_entry.c
@@ -99,7 +99,7 @@ int pohmelfs_path_length(struct pohmelfs
rename_retry:
len = 1; /* Root slash */
d = first;
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&root->d_sb->s_rename_lock);
rcu_read_lock();
if (!IS_ROOT(d) && d_unhashed(d))
@@ -110,7 +110,7 @@ rename_retry:
d = d->d_parent;
}
rcu_read_unlock();
- if (read_seqretry(&rename_lock, seq))
+ if (read_seqretry(&root->d_sb->s_rename_lock, seq))
goto rename_retry;
dput(root);
Index: linux-2.6.git/fs/autofs4/waitq.c
===================================================================
--- linux-2.6.git.orig/fs/autofs4/waitq.c
+++ linux-2.6.git/fs/autofs4/waitq.c
@@ -195,7 +195,7 @@ rename_retry:
buf = *name;
len = 0;
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&sbi->sb->s_rename_lock);
rcu_read_lock();
spin_lock(&sbi->fs_lock);
for (tmp = dentry ; tmp != root ; tmp = tmp->d_parent)
@@ -204,7 +204,7 @@ rename_retry:
if (!len || --len > NAME_MAX) {
spin_unlock(&sbi->fs_lock);
rcu_read_unlock();
- if (read_seqretry(&rename_lock, seq))
+ if (read_seqretry(&sbi->sb->s_rename_lock, seq))
goto rename_retry;
return 0;
}
@@ -220,7 +220,7 @@ rename_retry:
}
spin_unlock(&sbi->fs_lock);
rcu_read_unlock();
- if (read_seqretry(&rename_lock, seq))
+ if (read_seqretry(&sbi->sb->s_rename_lock, seq))
goto rename_retry;
return len;
Index: linux-2.6.git/fs/dcache.c
===================================================================
--- linux-2.6.git.orig/fs/dcache.c
+++ linux-2.6.git/fs/dcache.c
@@ -80,10 +80,6 @@ int sysctl_vfs_cache_pressure __read_mos
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
static __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lru_lock);
-__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
-
-EXPORT_SYMBOL(rename_lock);
-
static struct kmem_cache *dentry_cache __read_mostly;
/*
@@ -1014,7 +1010,7 @@ static struct dentry *try_to_ascend(stru
*/
if (new != old->d_parent ||
(old->d_flags & DCACHE_DISCONNECTED) ||
- (!locked && read_seqretry(&rename_lock, seq))) {
+ (!locked && read_seqretry(&old->d_sb->s_rename_lock, seq))) {
spin_unlock(&new->d_lock);
new = NULL;
}
@@ -1043,7 +1039,7 @@ int have_submounts(struct dentry *parent
unsigned seq;
int locked = 0;
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&parent->d_sb->s_rename_lock);
again:
this_parent = parent;
@@ -1086,21 +1082,21 @@ resume:
goto resume;
}
spin_unlock(&this_parent->d_lock);
- if (!locked && read_seqretry(&rename_lock, seq))
+ if (!locked && read_seqretry(&parent->d_sb->s_rename_lock, seq))
goto rename_retry;
if (locked)
- read_sequnlock(&rename_lock);
+ read_sequnlock(&parent->d_sb->s_rename_lock);
return 0; /* No mount points found in tree */
positive:
- if (!locked && read_seqretry(&rename_lock, seq))
+ if (!locked && read_seqretry(&parent->d_sb->s_rename_lock, seq))
goto rename_retry;
if (locked)
- read_sequnlock(&rename_lock);
+ read_sequnlock(&parent->d_sb->s_rename_lock);
return 1;
rename_retry:
locked = 1;
- read_seqlock(&rename_lock);
+ read_seqlock(&parent->d_sb->s_rename_lock);
goto again;
}
EXPORT_SYMBOL(have_submounts);
@@ -1127,7 +1123,7 @@ static int select_parent(struct dentry *
int found = 0;
int locked = 0;
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&parent->d_sb->s_rename_lock);
again:
this_parent = parent;
spin_lock(&this_parent->d_lock);
@@ -1188,17 +1184,17 @@ resume:
}
out:
spin_unlock(&this_parent->d_lock);
- if (!locked && read_seqretry(&rename_lock, seq))
+ if (!locked && read_seqretry(&parent->d_sb->s_rename_lock, seq))
goto rename_retry;
if (locked)
- read_sequnlock(&rename_lock);
+ read_sequnlock(&parent->d_sb->s_rename_lock);
return found;
rename_retry:
if (found)
return found;
locked = 1;
- read_seqlock(&rename_lock);
+ read_seqlock(&parent->d_sb->s_rename_lock);
goto again;
}
@@ -1859,11 +1855,11 @@ struct dentry *d_lookup(struct dentry *p
unsigned seq;
do {
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&parent->d_sb->s_rename_lock);
dentry = __d_lookup(parent, name);
if (dentry)
break;
- } while (read_seqretry(&rename_lock, seq));
+ } while (read_seqretry(&parent->d_sb->s_rename_lock, seq));
return dentry;
}
EXPORT_SYMBOL(d_lookup);
@@ -2226,6 +2222,7 @@ static void __d_move(struct dentry * den
BUG_ON(d_ancestor(dentry, target));
BUG_ON(d_ancestor(target, dentry));
+ BUG_ON(dentry->d_sb != target->d_sb);
dentry_lock_for_move(dentry, target);
@@ -2284,9 +2281,9 @@ static void __d_move(struct dentry * den
*/
void d_move(struct dentry *dentry, struct dentry *target)
{
- write_seqlock(&rename_lock);
+ write_seqlock(&dentry->d_sb->s_rename_lock);
__d_move(dentry, target);
- write_sequnlock(&rename_lock);
+ write_sequnlock(&dentry->d_sb->s_rename_lock);
}
EXPORT_SYMBOL(d_move);
@@ -2422,7 +2419,7 @@ struct dentry *d_materialise_unique(stru
alias = __d_find_alias(inode, 0);
if (alias) {
actual = alias;
- write_seqlock(&rename_lock);
+ write_seqlock(&dentry->d_sb->s_rename_lock);
if (d_ancestor(alias, dentry)) {
/* Check for loops */
@@ -2431,7 +2428,7 @@ struct dentry *d_materialise_unique(stru
/* Is this an anonymous mountpoint that we
* could splice into our tree? */
__d_materialise_dentry(dentry, alias);
- write_sequnlock(&rename_lock);
+ write_sequnlock(&dentry->d_sb->s_rename_lock);
__d_drop(alias);
goto found;
} else {
@@ -2439,7 +2436,7 @@ struct dentry *d_materialise_unique(stru
* aliasing */
actual = __d_unalias(inode, dentry, alias);
}
- write_sequnlock(&rename_lock);
+ write_sequnlock(&dentry->d_sb->s_rename_lock);
if (IS_ERR(actual))
dput(alias);
goto out_nolock;
@@ -2504,8 +2501,8 @@ static int prepend_path(const struct pat
bool slash = false;
int error = 0;
- read_seqlock(&rename_lock);
br_read_lock(vfsmount_lock);
+ read_seqlock(&vfsmnt->mnt_sb->s_rename_lock);
while (dentry != root->dentry || vfsmnt != root->mnt) {
struct dentry * parent;
@@ -2515,7 +2512,9 @@ static int prepend_path(const struct pat
goto global_root;
}
dentry = vfsmnt->mnt_mountpoint;
+ read_sequnlock(&vfsmnt->mnt_sb->s_rename_lock);
vfsmnt = vfsmnt->mnt_parent;
+ read_seqlock(&vfsmnt->mnt_sb->s_rename_lock);
continue;
}
parent = dentry->d_parent;
@@ -2536,8 +2535,8 @@ out:
if (!error && !slash)
error = prepend(buffer, buflen, "/", 1);
+ read_sequnlock(&vfsmnt->mnt_sb->s_rename_lock);
br_read_unlock(vfsmount_lock);
- read_sequnlock(&rename_lock);
return error;
global_root:
@@ -2708,6 +2707,7 @@ static char *__dentry_path(struct dentry
{
char *end = buf + buflen;
char *retval;
+ struct super_block *sb = dentry->d_sb;
prepend(&end, &buflen, "\0", 1);
if (buflen < 1)
@@ -2716,7 +2716,7 @@ static char *__dentry_path(struct dentry
retval = end-1;
*retval = '/';
- read_seqlock(&rename_lock);
+ read_seqlock(&sb->s_rename_lock);
while (!IS_ROOT(dentry)) {
struct dentry *parent = dentry->d_parent;
int error;
@@ -2731,10 +2731,10 @@ static char *__dentry_path(struct dentry
retval = end;
dentry = parent;
}
- read_sequnlock(&rename_lock);
+ read_sequnlock(&sb->s_rename_lock);
return retval;
Elong:
- read_sequnlock(&rename_lock);
+ read_sequnlock(&sb->s_rename_lock);
return ERR_PTR(-ENAMETOOLONG);
}
@@ -2853,7 +2853,7 @@ int is_subdir(struct dentry *new_dentry,
do {
/* for restarting inner loop in case of seq retry */
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&old_dentry->d_sb->s_rename_lock);
/*
* Need rcu_readlock to protect against the d_parent trashing
* due to d_move
@@ -2864,7 +2864,7 @@ int is_subdir(struct dentry *new_dentry,
else
result = 0;
rcu_read_unlock();
- } while (read_seqretry(&rename_lock, seq));
+ } while (read_seqretry(&old_dentry->d_sb->s_rename_lock, seq));
return result;
}
@@ -2901,7 +2901,7 @@ void d_genocide(struct dentry *root)
unsigned seq;
int locked = 0;
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&root->d_sb->s_rename_lock);
again:
this_parent = root;
spin_lock(&this_parent->d_lock);
@@ -2944,15 +2944,15 @@ resume:
goto resume;
}
spin_unlock(&this_parent->d_lock);
- if (!locked && read_seqretry(&rename_lock, seq))
+ if (!locked && read_seqretry(&root->d_sb->s_rename_lock, seq))
goto rename_retry;
if (locked)
- read_sequnlock(&rename_lock);
+ read_sequnlock(&root->d_sb->s_rename_lock);
return;
rename_retry:
locked = 1;
- read_seqlock(&rename_lock);
+ read_seqlock(&root->d_sb->s_rename_lock);
goto again;
}
Index: linux-2.6.git/fs/nfs/namespace.c
===================================================================
--- linux-2.6.git.orig/fs/nfs/namespace.c
+++ linux-2.6.git/fs/nfs/namespace.c
@@ -47,6 +47,7 @@ static struct vfsmount *nfs_do_submount(
*/
char *nfs_path(char **p, struct dentry *dentry, char *buffer, ssize_t buflen)
{
+ seqlock_t *p_rename_lock = &dentry->d_sb->s_rename_lock;
char *end;
int namelen;
unsigned seq;
@@ -57,7 +58,7 @@ rename_retry:
*--end = '\0';
buflen--;
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(p_rename_lock);
rcu_read_lock();
while (1) {
spin_lock(&dentry->d_lock);
@@ -73,7 +74,7 @@ rename_retry:
spin_unlock(&dentry->d_lock);
dentry = dentry->d_parent;
}
- if (read_seqretry(&rename_lock, seq)) {
+ if (read_seqretry(p_rename_lock, seq)) {
spin_unlock(&dentry->d_lock);
rcu_read_unlock();
goto rename_retry;
@@ -112,7 +113,7 @@ rename_retry:
Elong_unlock:
spin_unlock(&dentry->d_lock);
rcu_read_unlock();
- if (read_seqretry(&rename_lock, seq))
+ if (read_seqretry(p_rename_lock, seq))
goto rename_retry;
Elong:
return ERR_PTR(-ENAMETOOLONG);
Index: linux-2.6.git/fs/super.c
===================================================================
--- linux-2.6.git.orig/fs/super.c
+++ linux-2.6.git/fs/super.c
@@ -105,6 +105,7 @@ static struct super_block *alloc_super(s
s->s_count = 1;
atomic_set(&s->s_active, 1);
mutex_init(&s->s_vfs_rename_mutex);
+ seqlock_init(&s->s_rename_lock);
lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
mutex_init(&s->s_dquot.dqio_mutex);
mutex_init(&s->s_dquot.dqonoff_mutex);
Index: linux-2.6.git/include/linux/dcache.h
===================================================================
--- linux-2.6.git.orig/include/linux/dcache.h
+++ linux-2.6.git/include/linux/dcache.h
@@ -219,8 +219,6 @@ struct dentry_operations {
#define DCACHE_MANAGED_DENTRY \
(DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT)
-extern seqlock_t rename_lock;
-
static inline int dname_external(struct dentry *dentry)
{
return dentry->d_name.name != dentry->d_iname;
Index: linux-2.6.git/include/linux/fs.h
===================================================================
--- linux-2.6.git.orig/include/linux/fs.h
+++ linux-2.6.git/include/linux/fs.h
@@ -1421,6 +1421,8 @@ struct super_block {
*/
struct mutex s_vfs_rename_mutex; /* Kludge */
+ seqlock_t s_rename_lock;
+
/*
* Filesystem subtype. If non-empty the filesystem type field
* in /proc/mounts will be "type.subtype"
Index: linux-2.6.git/kernel/auditsc.c
===================================================================
--- linux-2.6.git.orig/kernel/auditsc.c
+++ linux-2.6.git/kernel/auditsc.c
@@ -1779,7 +1779,7 @@ retry:
drop = NULL;
d = dentry;
rcu_read_lock();
- seq = read_seqbegin(&rename_lock);
+ seq = read_seqbegin(&dentry->d_sb->s_rename_lock);
for(;;) {
struct inode *inode = d->d_inode;
if (inode && unlikely(!hlist_empty(&inode->i_fsnotify_marks))) {
@@ -1797,7 +1797,7 @@ retry:
break;
d = parent;
}
- if (unlikely(read_seqretry(&rename_lock, seq) || drop)) { /* in this order */
+ if (unlikely(read_seqretry(&dentry->d_sb->s_rename_lock, seq) || drop)) { /* in this order */
rcu_read_unlock();
if (!drop) {
/* just a race with rename */
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-07-28 13:12 [patch 0/3] A few patches for dcache Cyrill Gorcunov
` (2 preceding siblings ...)
2011-07-28 13:12 ` [patch 3/3] vfs: Make the rename_lock per-sb Cyrill Gorcunov
@ 2011-07-29 3:25 ` Dave Chinner
2011-07-29 5:59 ` Cyrill Gorcunov
3 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2011-07-29 3:25 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Al Viro, LINUXFS-ML, Andrew Morton, Christoph Hellwig,
James Bottomley, xemul
On Thu, Jul 28, 2011 at 05:12:19PM +0400, Cyrill Gorcunov wrote:
> There were a thread at http://lwn.net/Articles/441164/ and while the rest of
> dcache rework is still under review and discussion, the three patches might be
> considered independantly.
>
> Please review, comments are *highly* appreciated!
>
> I've tested them locally with dbench (maybe something
> else I should try?) to be sure all works as expected
> but better to have them reviewed since tests do not
> always reveal potential problems.
>
> The idea of series is to remove global rename_lock and
> prepare the ground for future factoring out of the
> shrinker code.
The VFS shrinker code is now already called on a per-sb basis. Each
sb has it's own shrinker context that deals with dentries, inodes
and anything a filesystem wants to have shrunk in the call. That
solves the original issue I had with your "limit the dentry cache
size" patch series in that it didn't shrink or limit the other VFS
caches that were the ones that were really consuming all your
memory...
If you want to limit the size of the VFS caches on a specific
superblock, then I'm pretty sure all we need to do now is add a
method to call the per-sb shrinker when over a threshold rather than
only calling them when there is memory pressure (e.g. via a
workqueue).
Right now the only thing that will serialise this is the fact that
the dcache_lru_lock is still global. I dropped the patch to make it
per-sb because it was causing memory corruptions and crashes. I'm
starting to test that patch again now that the dcache gurus seems to
have fixed a bunch of nasty problems that could have been causing
this...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-07-29 3:25 ` [patch 0/3] A few patches for dcache Dave Chinner
@ 2011-07-29 5:59 ` Cyrill Gorcunov
2011-07-29 6:59 ` Dave Chinner
0 siblings, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-07-29 5:59 UTC (permalink / raw)
To: Dave Chinner
Cc: Al Viro, LINUXFS-ML, Andrew Morton, Christoph Hellwig,
James Bottomley, xemul
On Fri, Jul 29, 2011 at 01:25:03PM +1000, Dave Chinner wrote:
...
>
> The VFS shrinker code is now already called on a per-sb basis. Each
> sb has it's own shrinker context that deals with dentries, inodes
> and anything a filesystem wants to have shrunk in the call. That
> solves the original issue I had with your "limit the dentry cache
> size" patch series in that it didn't shrink or limit the other VFS
> caches that were the ones that were really consuming all your
> memory...
Thanks for comments, Dave! Still the read only lock without
increasing sequence number might be useful, no? (patch 1)
...
>
> Dave.
Cyrill
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-07-29 5:59 ` Cyrill Gorcunov
@ 2011-07-29 6:59 ` Dave Chinner
2011-07-29 7:01 ` Cyrill Gorcunov
2011-07-29 7:24 ` Al Viro
0 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2011-07-29 6:59 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Al Viro, LINUXFS-ML, Andrew Morton, Christoph Hellwig,
James Bottomley, xemul
On Fri, Jul 29, 2011 at 09:59:18AM +0400, Cyrill Gorcunov wrote:
> On Fri, Jul 29, 2011 at 01:25:03PM +1000, Dave Chinner wrote:
> ...
> >
> > The VFS shrinker code is now already called on a per-sb basis. Each
> > sb has it's own shrinker context that deals with dentries, inodes
> > and anything a filesystem wants to have shrunk in the call. That
> > solves the original issue I had with your "limit the dentry cache
> > size" patch series in that it didn't shrink or limit the other VFS
> > caches that were the ones that were really consuming all your
> > memory...
>
> Thanks for comments, Dave! Still the read only lock without
> increasing sequence number might be useful, no? (patch 1)
I'll defer to Al on that one - the intricacies of the rename locking
are way over my head.
FWIW, the problems with the per-sb dcache LRU lock seem to be gone -
it's not causing my test machines to fall over with the current
Linus tree like it was during 2.6.39-rc and 3.0-rc kernels...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-07-29 6:59 ` Dave Chinner
@ 2011-07-29 7:01 ` Cyrill Gorcunov
2011-07-29 7:24 ` Al Viro
1 sibling, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-07-29 7:01 UTC (permalink / raw)
To: Dave Chinner
Cc: Al Viro, LINUXFS-ML, Andrew Morton, Christoph Hellwig,
James Bottomley, xemul
On Fri, Jul 29, 2011 at 04:59:51PM +1000, Dave Chinner wrote:
...
> >
> > Thanks for comments, Dave! Still the read only lock without
> > increasing sequence number might be useful, no? (patch 1)
>
> I'll defer to Al on that one - the intricacies of the rename locking
> are way over my head.
ok
>
> FWIW, the problems with the per-sb dcache LRU lock seem to be gone -
> it's not causing my test machines to fall over with the current
> Linus tree like it was during 2.6.39-rc and 3.0-rc kernels...
>
> Cheers,
>
> Dave.
ok, thanks for info!
Cyrill
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-07-29 6:59 ` Dave Chinner
2011-07-29 7:01 ` Cyrill Gorcunov
@ 2011-07-29 7:24 ` Al Viro
2011-07-29 7:50 ` Cyrill Gorcunov
2011-08-15 7:42 ` Cyrill Gorcunov
1 sibling, 2 replies; 12+ messages in thread
From: Al Viro @ 2011-07-29 7:24 UTC (permalink / raw)
To: Dave Chinner
Cc: Cyrill Gorcunov, LINUXFS-ML, Andrew Morton, Christoph Hellwig,
James Bottomley, xemul
On Fri, Jul 29, 2011 at 04:59:51PM +1000, Dave Chinner wrote:
> On Fri, Jul 29, 2011 at 09:59:18AM +0400, Cyrill Gorcunov wrote:
> > On Fri, Jul 29, 2011 at 01:25:03PM +1000, Dave Chinner wrote:
> > ...
> > >
> > > The VFS shrinker code is now already called on a per-sb basis. Each
> > > sb has it's own shrinker context that deals with dentries, inodes
> > > and anything a filesystem wants to have shrunk in the call. That
> > > solves the original issue I had with your "limit the dentry cache
> > > size" patch series in that it didn't shrink or limit the other VFS
> > > caches that were the ones that were really consuming all your
> > > memory...
> >
> > Thanks for comments, Dave! Still the read only lock without
> > increasing sequence number might be useful, no? (patch 1)
>
> I'll defer to Al on that one - the intricacies of the rename locking
> are way over my head.
I'm not sure that's safe. Note that one use of rename_lock is that
we allow hash lookup to race with d_move(). Which can move object
from one hash chain to another, so hash lookup may end up jumping
from one chain to another and getting a false negative. That's
why __d_lookup() is not safe without read_seqretry loop (or seq_writelock,
of course).
But look what happens if we do per-sb locks - d_move() derailing the
hash lookup might happen on *any* filesystem. They all share the
same hash table. So just checking that we hadn't done renames on
our filesystem is not enough to make sure we hadn't hit a false
negative.
Unless we go for making the hashtable itself per-superblock (and I really
doubt that it's a good idea), I don't see any obvious ways to avoid that
kind of race. IOW, how would you implement safe d_lookup()?
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-07-29 7:24 ` Al Viro
@ 2011-07-29 7:50 ` Cyrill Gorcunov
2011-08-15 7:42 ` Cyrill Gorcunov
1 sibling, 0 replies; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-07-29 7:50 UTC (permalink / raw)
To: Al Viro
Cc: Dave Chinner, LINUXFS-ML, Andrew Morton, Christoph Hellwig,
James Bottomley, xemul
On Fri, Jul 29, 2011 at 08:24:41AM +0100, Al Viro wrote:
...
> > >
> > > Thanks for comments, Dave! Still the read only lock without
> > > increasing sequence number might be useful, no? (patch 1)
> >
> > I'll defer to Al on that one - the intricacies of the rename locking
> > are way over my head.
>
> I'm not sure that's safe. Note that one use of rename_lock is that
> we allow hash lookup to race with d_move(). Which can move object
> from one hash chain to another, so hash lookup may end up jumping
> from one chain to another and getting a false negative. That's
> why __d_lookup() is not safe without read_seqretry loop (or seq_writelock,
> of course).
>
...
Thanks a lot for explanations, Al!
Cyrill
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-07-29 7:24 ` Al Viro
2011-07-29 7:50 ` Cyrill Gorcunov
@ 2011-08-15 7:42 ` Cyrill Gorcunov
2011-08-24 6:31 ` Pavel Emelyanov
1 sibling, 1 reply; 12+ messages in thread
From: Cyrill Gorcunov @ 2011-08-15 7:42 UTC (permalink / raw)
To: Al Viro
Cc: Dave Chinner, LINUXFS-ML, Andrew Morton, Christoph Hellwig,
James Bottomley, xemul
On Fri, Jul 29, 2011 at 08:24:41AM +0100, Al Viro wrote:
...
> >
> > I'll defer to Al on that one - the intricacies of the rename locking
> > are way over my head.
>
> I'm not sure that's safe. Note that one use of rename_lock is that
> we allow hash lookup to race with d_move(). Which can move object
> from one hash chain to another, so hash lookup may end up jumping
> from one chain to another and getting a false negative. That's
> why __d_lookup() is not safe without read_seqretry loop (or seq_writelock,
> of course).
>
> But look what happens if we do per-sb locks - d_move() derailing the
> hash lookup might happen on *any* filesystem. They all share the
> same hash table. So just checking that we hadn't done renames on
> our filesystem is not enough to make sure we hadn't hit a false
> negative.
>
> Unless we go for making the hashtable itself per-superblock (and I really
> doubt that it's a good idea), I don't see any obvious ways to avoid that
> kind of race. IOW, how would you implement safe d_lookup()?
Hi Al,
(a bit late reply actually ;)
but what about first two patches (without per-sb locks)?
They simply introduce read_seqlock/unlock without modification of sl->sequence.
The functions which were calling for read_seqretry -- still remains and do continue
calling for read_seqretry.
The following functions were converted to read_seqlock/unlock
- have_submounts (still have the read_seqretry on unlocked path)
- select_parent (still have the read_seqretry on unlocked path)
- __d_path
- d_path
- d_path_with_unreachable
- dentry_path_raw
- dentry_path
- getcwd
- d_genocide (still have the read_seqretry on unlocked path)
While both d_move() and d_materialise_unique() remains using write_seqlock.
So it seems I'm a bit confused/screwed why can't we do so...
Cyrill
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 0/3] A few patches for dcache
2011-08-15 7:42 ` Cyrill Gorcunov
@ 2011-08-24 6:31 ` Pavel Emelyanov
0 siblings, 0 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2011-08-24 6:31 UTC (permalink / raw)
To: Al Viro
Cc: Cyrill Gorcunov, Dave Chinner, LINUXFS-ML, Andrew Morton,
Christoph Hellwig, James Bottomley
> Hi Al,
>
> (a bit late reply actually ;)
>
> but what about first two patches (without per-sb locks)?
Al, taking into account the amount of changes in the dcache should we
rebase patches 1 and 2 and send you again?
Thanks,
Pavel
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2011-08-24 6:32 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-07-28 13:12 [patch 0/3] A few patches for dcache Cyrill Gorcunov
2011-07-28 13:12 ` [patch 1/3] vfs, dcache: Introduce lighten r/o rename_lock lockers Cyrill Gorcunov
2011-07-28 13:12 ` [patch 2/3] vfs, dcache: Factor out rename_lock locking Cyrill Gorcunov
2011-07-28 13:12 ` [patch 3/3] vfs: Make the rename_lock per-sb Cyrill Gorcunov
2011-07-29 3:25 ` [patch 0/3] A few patches for dcache Dave Chinner
2011-07-29 5:59 ` Cyrill Gorcunov
2011-07-29 6:59 ` Dave Chinner
2011-07-29 7:01 ` Cyrill Gorcunov
2011-07-29 7:24 ` Al Viro
2011-07-29 7:50 ` Cyrill Gorcunov
2011-08-15 7:42 ` Cyrill Gorcunov
2011-08-24 6:31 ` Pavel Emelyanov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).