* Re: dentry bloat.
[not found] ` <409B1511.6010500@colorfullife.com>
@ 2004-05-08 8:23 ` Andrew Morton
2004-05-08 9:23 ` Andrew Morton
0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 8:23 UTC (permalink / raw)
To: Manfred Spraul; +Cc: davej, torvalds, wli, linux-kernel
(added lkml)
Manfred Spraul <manfred@colorfullife.com> wrote:
>
> Andrew Morton wrote:
>
> >It does seem a little excessive. Maybe we should round the size up by "the
> >next multiple of sizeof(void*) which is greater than 32".
> >
> >
> Sounds good.
> But the SLAB_HWCACHE_ALIGN parameter from the kmem_cache_create call
> must be removed as well - otherwise slab will round the size up to 256
> again.
>
There are a couple of things I don't understand in the dentry name handling:
a) How come switch_names does a memcpy of d_iname even if the name might
not be in there?
b) Why does d_alloc() resize the storage for the out-of-line name to the
next multiple of 16?
c) Why is it that when we generate an out-of-line name we allocate a
complete new qstr rather than just storage for the name? The dentry has
a whole qstr just sitting there doing nothing, and from a quick squizz
it seems that we could simply remove dentry.d_qstr, make
dentry.d_name.name point at the kmalloced name and not allocate all the
other parts of the qstr?
Anyway, here be a patch.
Save a bit of dcache space.
Currently we use all the space from the start of dentry.d_name to the next
cacheline for the inline name. That can be 128 bytes or more nowadays, which
is excessive. In fact it's exactly 128 bytes of inline name length for an x86
P4 build (both UP and SMP).
Rework things so that the inline name length is between 31 and 48 bytes.
On SMP P4-compiled x86 each dentry goes from 256 bytes (15 per page) to 160
bytes (24 per page). The downside is that we'll have more out-of-line names.
Here's the histogram of name lengths on all 1.5M files on my workstation:
1: 0%
2: 0%
3: 1%
4: 5%
5: 8%
6: 13%
7: 19%
8: 26%
9: 33%
10: 42%
11: 49%
12: 55%
13: 60%
14: 64%
15: 67%
16: 69%
17: 71%
18: 73%
19: 75%
20: 76%
21: 78%
22: 79%
23: 80%
24: 81%
25: 82%
26: 83%
27: 85%
28: 86%
29: 87%
30: 88%
31: 89%
32: 90%
33: 91%
34: 92%
35: 93%
36: 94%
37: 95%
38: 96%
39: 96%
40: 96%
41: 96%
42: 96%
43: 96%
44: 97%
45: 97%
46: 97%
47: 97%
48: 97%
49: 98%
50: 98%
51: 98%
52: 98%
53: 98%
54: 98%
55: 98%
56: 98%
57: 98%
58: 98%
59: 98%
60: 99%
61: 99%
62: 99%
63: 99%
64: 99%
So on x86 we'll fit 89% of filenames into the inline name.
The patch also removes the NAME_ALLOC_LEN() rounding-up of the storage for the
out-of-line names. That seems unnecessary.
---
25-akpm/fs/dcache.c | 16 ++++++++++------
25-akpm/include/linux/dcache.h | 8 ++------
2 files changed, 12 insertions(+), 12 deletions(-)
diff -puN fs/dcache.c~dentry-shrinkage fs/dcache.c
--- 25/fs/dcache.c~dentry-shrinkage 2004-05-08 00:55:58.762312080 -0700
+++ 25-akpm/fs/dcache.c 2004-05-08 00:56:22.590689616 -0700
@@ -42,6 +42,13 @@ EXPORT_SYMBOL(dcache_lock);
static kmem_cache_t *dentry_cache;
/*
+ * The allocation size for each dentry. It is a multiple of 16 bytes. We
+ * leave the final 32-47 bytes for the inline name.
+ */
+#define DENTRY_STORAGE (((sizeof(struct dentry)+32) + 15) & ~15)
+#define DNAME_INLINE_LEN (DENTRY_STORAGE - sizeof(struct dentry))
+
+/*
* This is the single most critical data structure when it comes
* to the dcache: the hashtable for lookups. Somebody should try
* to make this good - I've just made it work.
@@ -665,8 +672,6 @@ static int shrink_dcache_memory(int nr,
return dentry_stat.nr_unused;
}
-#define NAME_ALLOC_LEN(len) ((len+16) & ~15)
-
/**
* d_alloc - allocate a dcache entry
* @parent: parent of entry to allocate
@@ -688,8 +693,7 @@ struct dentry * d_alloc(struct dentry *
return NULL;
if (name->len > DNAME_INLINE_LEN-1) {
- qstr = kmalloc(sizeof(*qstr) + NAME_ALLOC_LEN(name->len),
- GFP_KERNEL);
+ qstr = kmalloc(sizeof(*qstr) + name->len + 1, GFP_KERNEL);
if (!qstr) {
kmem_cache_free(dentry_cache, dentry);
return NULL;
@@ -1563,8 +1567,8 @@ static void __init dcache_init(unsigned
* flag could be removed here, to hint to the allocator that
* it should not try to get multiple page regions.
*/
- dentry_cache = kmem_cache_create("dentry_cache", sizeof(struct dentry),
- 0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
+ dentry_cache = kmem_cache_create("dentry_cache", DENTRY_STORAGE,
+ 0, SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
NULL, NULL);
set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
diff -puN include/linux/dcache.h~dentry-shrinkage include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-shrinkage 2004-05-08 00:55:58.774310256 -0700
+++ 25-akpm/include/linux/dcache.h 2004-05-08 00:55:58.779309496 -0700
@@ -74,8 +74,6 @@ full_name_hash(const unsigned char *name
return end_name_hash(hash);
}
-#define DNAME_INLINE_LEN_MIN 16
-
struct dcookie_struct;
struct dentry {
@@ -101,11 +99,9 @@ struct dentry {
struct qstr d_name;
struct hlist_node d_hash; /* lookup hash list */
struct hlist_head * d_bucket; /* lookup hash bucket */
- unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
-} ____cacheline_aligned;
+ unsigned char d_iname[0]; /* small names */
+};
-#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
-
struct dentry_operations {
int (*d_revalidate)(struct dentry *, struct nameidata *);
int (*d_hash) (struct dentry *, struct qstr *);
_
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat.
2004-05-08 8:23 ` dentry bloat Andrew Morton
@ 2004-05-08 9:23 ` Andrew Morton
2004-05-08 10:11 ` Andrew Morton
0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 9:23 UTC (permalink / raw)
To: manfred, davej, torvalds, wli, linux-kernel
Andrew Morton <akpm@osdl.org> wrote:
>
> There are a couple of things I don't understand in the dentry name handling:
>
> a) How come switch_names does a memcpy of d_iname even if the name might
> not be in there?
OK, just in case the target dentry was internal.
> b) Why does d_alloc() resize the storage for the out-of-line name to the
> next multiple of 16?
No reason I can see.
> c) Why is it that when we generate an out-of-line name we allocate a
> complete new qstr rather than just storage for the name? The dentry has
> a whole qstr just sitting there doing nothing, and from a quick squizz
> it seems that we could simply remove dentry.d_qstr, make
> dentry.d_name.name point at the kmalloced name and not allocate all the
> other parts of the qstr?
Great idea! Here be another patch.
When dentries are given an external name we currently allocate an entire qstr
for the external name.
This isn't needed. We can use the internal qstr and kmalloc only the string
itself. This saves 12 bytes from externally-allocated names and 4 bytes from
the dentry itself.
The saving of 4 bytes from the dentry doesn't actually decrease the dentry's
storage requirements, but it makes four more bytes available for internal
names, taking the internal/external ratio from 89% up to 93% on my 1.5M files.
---
25-akpm/fs/dcache.c | 94 +++++++++++++++++++++--------------------
25-akpm/include/linux/dcache.h | 8 +--
2 files changed, 53 insertions(+), 49 deletions(-)
diff -puN include/linux/dcache.h~dentry-qstr-consolidation include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-qstr-consolidation 2004-05-08 02:16:19.700418088 -0700
+++ 25-akpm/include/linux/dcache.h 2004-05-08 02:16:19.706417176 -0700
@@ -29,10 +29,9 @@ struct vfsmount;
* saves "metadata" about the string (ie length and the hash).
*/
struct qstr {
- const unsigned char * name;
+ const unsigned char *name;
unsigned int len;
unsigned int hash;
- char name_str[0];
};
struct dentry_stat_t {
@@ -94,7 +93,6 @@ struct dentry {
struct rcu_head d_rcu;
struct dcookie_struct * d_cookie; /* cookie, if any */
unsigned long d_move_count; /* to indicated moved dentry while lockless lookup */
- struct qstr * d_qstr; /* quick str ptr used in lockless lookup and concurrent d_move */
struct dentry * d_parent; /* parent directory */
struct qstr d_name;
struct hlist_node d_hash; /* lookup hash list */
@@ -184,9 +182,9 @@ static inline void d_drop(struct dentry
spin_unlock(&dcache_lock);
}
-static inline int dname_external(struct dentry *d)
+static inline int dname_external(struct dentry *dentry)
{
- return d->d_name.name != d->d_iname;
+ return dentry->d_name.name != dentry->d_iname;
}
/*
diff -puN fs/dcache.c~dentry-qstr-consolidation fs/dcache.c
--- 25/fs/dcache.c~dentry-qstr-consolidation 2004-05-08 02:16:19.702417784 -0700
+++ 25-akpm/fs/dcache.c 2004-05-08 02:16:19.709416720 -0700
@@ -73,9 +73,8 @@ static void d_callback(void *arg)
{
struct dentry * dentry = (struct dentry *)arg;
- if (dname_external(dentry)) {
- kfree(dentry->d_qstr);
- }
+ if (dname_external(dentry))
+ kfree(dentry->d_name.name);
kmem_cache_free(dentry_cache, dentry);
}
@@ -682,34 +681,30 @@ static int shrink_dcache_memory(int nr,
* copied and the copy passed in may be reused after this call.
*/
-struct dentry * d_alloc(struct dentry * parent, const struct qstr *name)
+struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
{
- char * str;
struct dentry *dentry;
- struct qstr * qstr;
+ char *dname;
dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
if (!dentry)
return NULL;
if (name->len > DNAME_INLINE_LEN-1) {
- qstr = kmalloc(sizeof(*qstr) + name->len + 1, GFP_KERNEL);
- if (!qstr) {
+ dname = kmalloc(name->len + 1, GFP_KERNEL);
+ if (!dname) {
kmem_cache_free(dentry_cache, dentry);
return NULL;
}
- qstr->name = qstr->name_str;
- qstr->len = name->len;
- qstr->hash = name->hash;
- dentry->d_qstr = qstr;
- str = qstr->name_str;
} else {
- dentry->d_qstr = &dentry->d_name;
- str = dentry->d_iname;
+ dname = dentry->d_iname;
}
+ dentry->d_name.name = dname;
- memcpy(str, name->name, name->len);
- str[name->len] = 0;
+ dentry->d_name.len = name->len;
+ dentry->d_name.hash = name->hash;
+ memcpy(dname, name->name, name->len);
+ dname[name->len] = 0;
atomic_set(&dentry->d_count, 1);
dentry->d_vfs_flags = DCACHE_UNHASHED;
@@ -719,9 +714,6 @@ struct dentry * d_alloc(struct dentry *
dentry->d_parent = NULL;
dentry->d_move_count = 0;
dentry->d_sb = NULL;
- dentry->d_name.name = str;
- dentry->d_name.len = name->len;
- dentry->d_name.hash = name->hash;
dentry->d_op = NULL;
dentry->d_fsdata = NULL;
dentry->d_mounted = 0;
@@ -788,7 +780,8 @@ struct dentry * d_alloc_root(struct inod
struct dentry *res = NULL;
if (root_inode) {
- static const struct qstr name = { .name = "/", .len = 1, .hash = 0 };
+ static const struct qstr name = { .name = "/", .len = 1 };
+
res = d_alloc(NULL, &name);
if (res) {
res->d_sb = root_inode->i_sb;
@@ -829,7 +822,7 @@ static inline struct hlist_head *d_hash(
struct dentry * d_alloc_anon(struct inode *inode)
{
- static const struct qstr anonstring = { "", 0, 0};
+ static const struct qstr anonstring = { .name = "" };
struct dentry *tmp;
struct dentry *res;
@@ -1003,7 +996,7 @@ struct dentry * __d_lookup(struct dentry
if (dentry->d_parent != parent)
continue;
- qstr = dentry->d_qstr;
+ qstr = &dentry->d_name;
smp_read_barrier_depends();
if (parent->d_op && parent->d_op->d_compare) {
if (parent->d_op->d_compare(parent, qstr, name))
@@ -1145,28 +1138,40 @@ void d_rehash(struct dentry * entry)
* then no longer matches the actual (corrupted) string of the target.
* The hash value has to match the hash queue that the dentry is on..
*/
-static inline void switch_names(struct dentry * dentry, struct dentry * target)
+static inline void switch_names(struct dentry *dentry, struct dentry *target)
{
- const unsigned char *old_name, *new_name;
- struct qstr *old_qstr, *new_qstr;
-
- memcpy(dentry->d_iname, target->d_iname, DNAME_INLINE_LEN);
- old_qstr = target->d_qstr;
- old_name = target->d_name.name;
- new_qstr = dentry->d_qstr;
- new_name = dentry->d_name.name;
- if (old_name == target->d_iname) {
- old_name = dentry->d_iname;
- old_qstr = &dentry->d_name;
- }
- if (new_name == dentry->d_iname) {
- new_name = target->d_iname;
- new_qstr = &target->d_name;
- }
- target->d_name.name = new_name;
- dentry->d_name.name = old_name;
- target->d_qstr = new_qstr;
- dentry->d_qstr = old_qstr;
+ if (dname_external(target)) {
+ if (dname_external(dentry)) {
+ /*
+ * Both external: swap the pointers
+ */
+ do_switch(target->d_name.name, dentry->d_name.name);
+ } else {
+ /*
+ * dentry:internal, target:external. Steal target's
+ * storage and make target internal.
+ */
+ dentry->d_name.name = target->d_name.name;
+ target->d_name.name = target->d_iname;
+ }
+ } else {
+ if (dname_external(dentry)) {
+ /*
+ * dentry:external, target:internal. Give dentry's
+ * storage to target and make dentry internal
+ */
+ memcpy(dentry->d_iname, target->d_name.name,
+ target->d_name.len + 1);
+ target->d_name.name = dentry->d_name.name;
+ dentry->d_name.name = dentry->d_iname;
+ } else {
+ /*
+ * Both are internal. Just copy target to dentry
+ */
+ memcpy(dentry->d_iname, target->d_name.name,
+ target->d_name.len + 1);
+ }
+ }
}
/*
@@ -1342,6 +1347,7 @@ char * d_path(struct dentry *dentry, str
char *res;
struct vfsmount *rootmnt;
struct dentry *root;
+
read_lock(&current->fs->lock);
rootmnt = mntget(current->fs->rootmnt);
root = dget(current->fs->root);
_
* Re: dentry bloat.
2004-05-08 9:23 ` Andrew Morton
@ 2004-05-08 10:11 ` Andrew Morton
2004-05-08 10:12 ` Andrew Morton
` (3 more replies)
0 siblings, 4 replies; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 10:11 UTC (permalink / raw)
To: manfred, davej, torvalds, wli, linux-kernel
Andrew Morton <akpm@osdl.org> wrote:
>
> Here be another patch.
Can't help myself!
- d_vfs_flags can be removed - just use d_flags. It looks like someone
added d_vfs_flags with the intent of doing bitops on it, but all
modifications are under dcache_lock. 4 bytes saved.
- make d_flags a ushort.
- Assume that nobody wants filenames longer than 64k and make qstr.len
unsigned short too.
- Pack things so that dentry.d_name.len and dentry.d_flags occupy the same
word. 4 bytes saved.
- NFS and AFS are modifying d_flags without dcache_lock.
On x86 this takes the internal string size up to 44 bytes. The
internal/external ratio on my 1.5M files hits 96%.
---
25-akpm/fs/dcache.c | 17 ++++++++---------
25-akpm/include/linux/dcache.h | 33 +++++++++++++++++----------------
2 files changed, 25 insertions(+), 25 deletions(-)
diff -puN include/linux/dcache.h~dentry-more-shrinkage include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-more-shrinkage 2004-05-08 02:46:53.134693712 -0700
+++ 25-akpm/include/linux/dcache.h 2004-05-08 02:56:16.687020736 -0700
@@ -30,9 +30,9 @@ struct vfsmount;
*/
struct qstr {
const unsigned char *name;
- unsigned int len;
unsigned int hash;
-};
+ unsigned short len;
+} __attribute__((packed));
struct dentry_stat_t {
int nr_dentry;
@@ -77,28 +77,29 @@ struct dcookie_struct;
struct dentry {
atomic_t d_count;
- unsigned long d_vfs_flags; /* moved here to be on same cacheline */
spinlock_t d_lock; /* per dentry lock */
- struct inode * d_inode; /* Where the name belongs to - NULL is negative */
+ struct inode *d_inode; /* Where the name belongs to - NULL is
+ * negative */
struct list_head d_lru; /* LRU list */
struct list_head d_child; /* child of parent list */
struct list_head d_subdirs; /* our children */
struct list_head d_alias; /* inode alias list */
unsigned long d_time; /* used by d_revalidate */
- struct dentry_operations *d_op;
- struct super_block * d_sb; /* The root of the dentry tree */
- unsigned int d_flags;
+ struct dentry_operations *d_op;
+ struct super_block *d_sb; /* The root of the dentry tree */
int d_mounted;
- void * d_fsdata; /* fs-specific data */
+ void *d_fsdata; /* fs-specific data */
struct rcu_head d_rcu;
- struct dcookie_struct * d_cookie; /* cookie, if any */
- unsigned long d_move_count; /* to indicated moved dentry while lockless lookup */
- struct dentry * d_parent; /* parent directory */
+ struct dcookie_struct *d_cookie; /* cookie, if any */
+ unsigned long d_move_count; /* to indicated moved dentry while
+ * lockless lookup */
+ struct dentry *d_parent; /* parent directory */
struct qstr d_name;
+ unsigned short d_flags;
struct hlist_node d_hash; /* lookup hash list */
- struct hlist_head * d_bucket; /* lookup hash bucket */
+ struct hlist_head *d_bucket; /* lookup hash bucket */
unsigned char d_iname[0]; /* small names */
-};
+} __attribute__((packed));
struct dentry_operations {
int (*d_revalidate)(struct dentry *, struct nameidata *);
@@ -169,8 +170,8 @@ extern spinlock_t dcache_lock;
static inline void __d_drop(struct dentry *dentry)
{
- if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) {
- dentry->d_vfs_flags |= DCACHE_UNHASHED;
+ if (!(dentry->d_flags & DCACHE_UNHASHED)) {
+ dentry->d_flags |= DCACHE_UNHASHED;
hlist_del_rcu(&dentry->d_hash);
}
}
@@ -281,7 +282,7 @@ extern struct dentry * dget_locked(struc
static inline int d_unhashed(struct dentry *dentry)
{
- return (dentry->d_vfs_flags & DCACHE_UNHASHED);
+ return (dentry->d_flags & DCACHE_UNHASHED);
}
static inline struct dentry *dget_parent(struct dentry *dentry)
diff -puN fs/dcache.c~dentry-more-shrinkage fs/dcache.c
--- 25/fs/dcache.c~dentry-more-shrinkage 2004-05-08 02:55:55.284274448 -0700
+++ 25-akpm/fs/dcache.c 2004-05-08 02:57:10.581827480 -0700
@@ -168,7 +168,7 @@ repeat:
if (d_unhashed(dentry))
goto kill_it;
if (list_empty(&dentry->d_lru)) {
- dentry->d_vfs_flags |= DCACHE_REFERENCED;
+ dentry->d_flags |= DCACHE_REFERENCED;
list_add(&dentry->d_lru, &dentry_unused);
dentry_stat.nr_unused++;
}
@@ -401,8 +401,8 @@ static void prune_dcache(int count)
continue;
}
/* If the dentry was recently referenced, don't free it. */
- if (dentry->d_vfs_flags & DCACHE_REFERENCED) {
- dentry->d_vfs_flags &= ~DCACHE_REFERENCED;
+ if (dentry->d_flags & DCACHE_REFERENCED) {
+ dentry->d_flags &= ~DCACHE_REFERENCED;
list_add(&dentry->d_lru, &dentry_unused);
dentry_stat.nr_unused++;
spin_unlock(&dentry->d_lock);
@@ -707,9 +707,8 @@ struct dentry *d_alloc(struct dentry * p
dname[name->len] = 0;
atomic_set(&dentry->d_count, 1);
- dentry->d_vfs_flags = DCACHE_UNHASHED;
+ dentry->d_flags = DCACHE_UNHASHED;
dentry->d_lock = SPIN_LOCK_UNLOCKED;
- dentry->d_flags = 0;
dentry->d_inode = NULL;
dentry->d_parent = NULL;
dentry->d_move_count = 0;
@@ -855,7 +854,7 @@ struct dentry * d_alloc_anon(struct inod
res->d_inode = inode;
res->d_bucket = d_hash(res, res->d_name.hash);
res->d_flags |= DCACHE_DISCONNECTED;
- res->d_vfs_flags &= ~DCACHE_UNHASHED;
+ res->d_flags &= ~DCACHE_UNHASHED;
list_add(&res->d_alias, &inode->i_dentry);
hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
spin_unlock(&res->d_lock);
@@ -1117,7 +1116,7 @@ void d_rehash(struct dentry * entry)
{
struct hlist_head *list = d_hash(entry->d_parent, entry->d_name.hash);
spin_lock(&dcache_lock);
- entry->d_vfs_flags &= ~DCACHE_UNHASHED;
+ entry->d_flags &= ~DCACHE_UNHASHED;
entry->d_bucket = list;
hlist_add_head_rcu(&entry->d_hash, list);
spin_unlock(&dcache_lock);
@@ -1214,14 +1213,14 @@ void d_move(struct dentry * dentry, stru
}
/* Move the dentry to the target hash queue, if on different bucket */
- if (dentry->d_vfs_flags & DCACHE_UNHASHED)
+ if (dentry->d_flags & DCACHE_UNHASHED)
goto already_unhashed;
if (dentry->d_bucket != target->d_bucket) {
hlist_del_rcu(&dentry->d_hash);
already_unhashed:
dentry->d_bucket = target->d_bucket;
hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
- dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
+ dentry->d_flags &= ~DCACHE_UNHASHED;
}
/* Unhash the target: dput() will then get rid of it */
_
* Re: dentry bloat.
2004-05-08 10:11 ` Andrew Morton
@ 2004-05-08 10:12 ` Andrew Morton
2004-05-08 10:28 ` viro
` (2 subsequent siblings)
3 siblings, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 10:12 UTC (permalink / raw)
To: manfred, davej, torvalds, wli, linux-kernel
Andrew Morton <akpm@osdl.org> wrote:
>
> - NFS and AFS are modifying d_flags without dcache_lock.
>
25-akpm/fs/afs/dir.c | 2 ++
25-akpm/fs/nfs/unlink.c | 4 ++++
2 files changed, 6 insertions(+)
diff -puN fs/nfs/unlink.c~d_flags-locking-fix fs/nfs/unlink.c
--- 25/fs/nfs/unlink.c~d_flags-locking-fix 2004-05-08 03:03:14.075568032 -0700
+++ 25-akpm/fs/nfs/unlink.c 2004-05-08 03:04:02.561197096 -0700
@@ -180,7 +180,9 @@ nfs_async_unlink(struct dentry *dentry)
task->tk_action = nfs_async_unlink_init;
task->tk_release = nfs_async_unlink_release;
+ spin_lock(&dcache_lock);
dentry->d_flags |= DCACHE_NFSFS_RENAMED;
+ spin_unlock(&dcache_lock);
data->cred = rpcauth_lookupcred(clnt->cl_auth, 0);
rpc_sleep_on(&nfs_delete_queue, task, NULL, NULL);
@@ -210,7 +212,9 @@ nfs_complete_unlink(struct dentry *dentr
return;
data->count++;
nfs_copy_dname(dentry, data);
+ spin_lock(&dcache_lock);
dentry->d_flags &= ~DCACHE_NFSFS_RENAMED;
+ spin_unlock(&dcache_lock);
if (data->task.tk_rpcwait == &nfs_delete_queue)
rpc_wake_up_task(&data->task);
nfs_put_unlinkdata(data);
diff -puN fs/afs/dir.c~d_flags-locking-fix fs/afs/dir.c
--- 25/fs/afs/dir.c~d_flags-locking-fix 2004-05-08 03:03:14.404518024 -0700
+++ 25-akpm/fs/afs/dir.c 2004-05-08 03:04:20.190517032 -0700
@@ -615,7 +615,9 @@ static int afs_d_revalidate(struct dentr
/* the dirent, if it exists, now points to a different vnode */
not_found:
+ spin_lock(&dcache_lock);
dentry->d_flags |= DCACHE_NFSFS_RENAMED;
+ spin_unlock(&dcache_lock);
out_bad:
if (inode) {
_
* Re: dentry bloat.
2004-05-08 10:11 ` Andrew Morton
2004-05-08 10:12 ` Andrew Morton
@ 2004-05-08 10:28 ` viro
2004-05-08 10:41 ` Andrew Morton
2004-05-08 10:52 ` Andrew Morton
2004-05-08 10:31 ` Manfred Spraul
2004-05-08 17:28 ` Linus Torvalds
3 siblings, 2 replies; 62+ messages in thread
From: viro @ 2004-05-08 10:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: manfred, davej, torvalds, wli, linux-kernel
On Sat, May 08, 2004 at 03:11:59AM -0700, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Here be another patch.
>
>
> Can't help myself!
>
>
> - d_vfs_flags can be removed - just use d_flags. It looks like someone
> added d_vfs_flags with the intent of doing bitops on it, but all
> modifications are under dcache_lock. 4 bytes saved.
Bzzert. ->d_vfs_flags modifications are under dcache_lock; ->d_flags ones
are *not* - they are up to whatever filesystem code happens to use them.
> - Pack things so that dentry.d_name.len and dentry.d_flags occupy the same
> word. 4 bytes saved.
d_name.len is accessed on very hot paths. In particular, we are looking
at it while traversing hash chains and we do that without dcache_lock
(see callers of ->d_compare()).
If we are going to hold dcache_lock in __d_lookup(), we can get rid of
a _lot_ more than 2 bytes. 32 bytes that came from RCU are there pretty
much for a single reason - to avoid dcache_lock on that path.
* Re: dentry bloat.
2004-05-08 10:11 ` Andrew Morton
2004-05-08 10:12 ` Andrew Morton
2004-05-08 10:28 ` viro
@ 2004-05-08 10:31 ` Manfred Spraul
2004-05-08 17:28 ` Linus Torvalds
3 siblings, 0 replies; 62+ messages in thread
From: Manfred Spraul @ 2004-05-08 10:31 UTC (permalink / raw)
To: Andrew Morton; +Cc: davej, torvalds, wli, linux-kernel
Andrew Morton wrote:
>Andrew Morton <akpm@osdl.org> wrote:
>
>
>>Here be another patch.
>>
>>
>
>
>Can't help myself!
>
>
>- d_vfs_flags can be removed - just use d_flags. It looks like someone
> added d_vfs_flags with the intent of doing bitops on it, but all
> modifications are under dcache_lock. 4 bytes saved.
>
>
IIRC it was intended to access one of the flag fields under dentry->d_lock.
--
Manfred
* Re: dentry bloat.
2004-05-08 10:28 ` viro
@ 2004-05-08 10:41 ` Andrew Morton
2004-05-08 10:52 ` Andrew Morton
1 sibling, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 10:41 UTC (permalink / raw)
To: viro; +Cc: manfred, davej, torvalds, wli, linux-kernel
viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> On Sat, May 08, 2004 at 03:11:59AM -0700, Andrew Morton wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > Here be another patch.
> >
> >
> > Can't help myself!
> >
> >
> > - d_vfs_flags can be removed - just use d_flags. It looks like someone
> > added d_vfs_flags with the intent of doing bitops on it, but all
> > modifications are under dcache_lock. 4 bytes saved.
>
> Bzzert. ->d_vfs_flags modifications are under dcache_lock; ->d_flags ones
> are *not* - they are up to whatever filesystem code happens to use them.
yup, I hunted down the rest.
> > - Pack things so that dentry.d_name.len and dentry.d_flags occupy the same
> > word. 4 bytes saved.
>
> d_name.len is accessed on very hot paths. In particular, we are looking
> at it while traversing hash chains and we do that without dcache_lock
> (see callers of ->d_compare()).
That's OK. The d_move() d_move_count and locking logic takes care of that.
But the ushorts are a bit dopey - I'll give those 4 bytes back.
* Re: dentry bloat.
2004-05-08 10:28 ` viro
2004-05-08 10:41 ` Andrew Morton
@ 2004-05-08 10:52 ` Andrew Morton
1 sibling, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 10:52 UTC (permalink / raw)
To: viro; +Cc: manfred, davej, torvalds, wli, linux-kernel
viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> Bzzert. ->d_vfs_flags modifications are under dcache_lock; ->d_flags ones
> are *not* - they are up to whatever filesystem code happens to use them.
All but one of the d_vfs_flags modifications are under ->d_lock, so I fixed
that one up, converted all the d_flags modifiers to take ->d_lock and moved
everything over to d_flags.
* Re: dentry bloat.
2004-05-08 10:11 ` Andrew Morton
` (2 preceding siblings ...)
2004-05-08 10:31 ` Manfred Spraul
@ 2004-05-08 17:28 ` Linus Torvalds
2004-05-08 18:19 ` David S. Miller
2004-05-08 19:01 ` Andrew Morton
3 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-08 17:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: manfred, davej, wli, linux-kernel
On Sat, 8 May 2004, Andrew Morton wrote:
> */
> struct qstr {
> const unsigned char *name;
> - unsigned int len;
> unsigned int hash;
> -};
> + unsigned short len;
> +} __attribute__((packed));
This would make me nervous.
Since we don't use this thing in an array, this may be right. But on the
other hand, if this thing is preceded by an "unsigned short" in a
structure (or "int" on a 64-bit architecture), I think you just made
"name" be unaligned, which can cause serious problems. Because I think
"packed" _also_ means that it doesn't honor the alignment of the thing
when laying it out in structures.
Also, in your previous patch (which I'm not as convinced might be wrong),
the d_qstr pointer removal makes me worry:
- struct qstr * d_qstr; /* quick str ptr used in lockless lookup and concurrent d_move */
I thought the point of d_qstr was that when we do the lockless lookup,
we're guaranteed to always see "stable storage" in the sense that when we
follow the d_qstr, we will always get a "char *" + "len" that match, and
we could never see a partial update (ie len points to the old one, and
"char *" points to the new one).
In particular, think about the "d_compare(parent, qstr, name)" /
"memcmp(qstr->name, str, len)" part - what if "len" doesn't match str,
because a concurrent d_move() is updating them, and maybe we will compare
past the end of kernel mapped memory or something?
(In other words, the "move_count" check should protect us from returning a
wrong dentry, but I'd worry that we'd do something that could cause
serious problems before we even get to the "move_count" check).
Hmm?
Btw, I'd love to be proven wrong, since I hate that d_qstr, and I think
your d_qstr removal patch otherwise looked wonderful.
Linus
* Re: dentry bloat.
2004-05-08 17:28 ` Linus Torvalds
@ 2004-05-08 18:19 ` David S. Miller
2004-05-08 19:01 ` Andrew Morton
1 sibling, 0 replies; 62+ messages in thread
From: David S. Miller @ 2004-05-08 18:19 UTC (permalink / raw)
To: Linus Torvalds; +Cc: akpm, manfred, davej, wli, linux-kernel
On Sat, 8 May 2004 10:28:29 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:
> > + unsigned short len;
> > +} __attribute__((packed));
...
> I think you just made
> "name" be unaligned, which can cause serious problems. Because I think
> "packed" _also_ means that it doesn't honor the alignment of the thing
> when laying it out in structures.
That's correct, alignment concerns are thrown out the door once
you specify packed to the compiler.
* Re: dentry bloat.
2004-05-08 17:28 ` Linus Torvalds
2004-05-08 18:19 ` David S. Miller
@ 2004-05-08 19:01 ` Andrew Morton
2004-05-08 19:13 ` Linus Torvalds
2004-05-08 21:00 ` Dipankar Sarma
1 sibling, 2 replies; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 19:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: manfred, davej, wli, linux-kernel
Linus Torvalds <torvalds@osdl.org> wrote:
>
>
>
> On Sat, 8 May 2004, Andrew Morton wrote:
> > */
> > struct qstr {
> > const unsigned char *name;
> > - unsigned int len;
> > unsigned int hash;
> > -};
> > + unsigned short len;
> > +} __attribute__((packed));
>
> This would make me nervous.
yeah, that was just mucking about.
> Also, in your previous patch (which I'm not as convinced might be wrong),
> the d_qstr pointer removal makes me worry:
>
> - struct qstr * d_qstr; /* quick str ptr used in lockless lookup and concurrent d_move */
>
> I thought the point of d_qstr was that when we do the lockless lookup,
> we're guaranteed to always see "stable storage" in the sense that when we
> follow the d_qstr, we will always get a "char *" + "len" that match, and
> we could never see a partial update (ie len points to the old one, and
> "char *" points to the new one).
It looks that way.
> In particular, think about the "d_compare(parent, qstr, name)" /
> "memcmp(qstr->name, str, len)" part - what if "len" doesn't match str,
> because a concurrent d_move() is updating them, and maybe we will compare
> past the end of kernel mapped memory or something?
>
> (In other words, the "move_count" check should protect us from returning a
> wrong dentry, but I'd worry that we'd do something that could cause
> serious problems before we even get to the "move_count" check).
>
> Hmm?
>
> Btw, I'd love to be proven wrong, since I hate that d_qstr, and I think
> your d_qstr removal patch otherwise looked wonderful.
It looks very similar to 2.4 actually ;)
I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
serialise against d_move(), fixing the problem which you mention, and also
makes d_movecount go away.
* Re: dentry bloat.
2004-05-08 19:01 ` Andrew Morton
@ 2004-05-08 19:13 ` Linus Torvalds
2004-05-08 19:27 ` Andrew Morton
` (2 more replies)
2004-05-08 21:00 ` Dipankar Sarma
1 sibling, 3 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-08 19:13 UTC (permalink / raw)
To: Andrew Morton; +Cc: manfred, davej, wli, linux-kernel
On Sat, 8 May 2004, Andrew Morton wrote:
>
> I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> serialise against d_move(), fixing the problem which you mention, and also
> makes d_movecount go away.
If you do that, RCU basically loses most of its meaning.
You'll be taking a lock for - and dirtying in the cache - every single
dentry on the hash chain, which is pretty much guaranteed to be slower
than just taking the dcache_lock _once_, even if that one jumps across
CPU's a lot.
In other words, no, I don't think that's a good idea. We really want to
take the dentry lock only after we're pretty sure we have the right
dentry. Otherwise the dentry chains will be bouncing from CPU to CPU all
the time.
Linus
* Re: dentry bloat.
2004-05-08 19:13 ` Linus Torvalds
@ 2004-05-08 19:27 ` Andrew Morton
2004-05-08 19:27 ` Linus Torvalds
2004-05-08 20:13 ` Dipankar Sarma
2 siblings, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 19:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: manfred, davej, wli, linux-kernel
Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Sat, 8 May 2004, Andrew Morton wrote:
> >
> > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> > serialise against d_move(), fixing the problem which you mention, and also
> > makes d_movecount go away.
>
> If you do that, RCU basically loses most of its meaning.
>
> You'll be taking a lock for - and dirtying in the cache - every single
> dentry on the hash chain, which is pretty much guaranteed to be slower
> than just taking the dcache_lock _once_, even if that one jumps across
> CPU's a lot.
Can we take the lock after comparing the hash and the parent?
And if we recheck those after locking, is d_movecount unneeded?
* Re: dentry bloat.
2004-05-08 19:13 ` Linus Torvalds
2004-05-08 19:27 ` Andrew Morton
@ 2004-05-08 19:27 ` Linus Torvalds
2004-05-08 20:42 ` Dipankar Sarma
2004-05-08 20:13 ` Dipankar Sarma
2 siblings, 1 reply; 62+ messages in thread
From: Linus Torvalds @ 2004-05-08 19:27 UTC (permalink / raw)
To: Andrew Morton
Cc: manfred, davej, wli, Kernel Mailing List, Dipankar Sarma,
Maneesh Soni
On Sat, 8 May 2004, Linus Torvalds wrote:
>
>
> On Sat, 8 May 2004, Andrew Morton wrote:
> >
> > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> > serialise against d_move(), fixing the problem which you mention, and also
> > makes d_movecount go away.
>
> If you do that, RCU basically loses most of its meaning.
Hmm.. Maybe I'm wrong.
In particular, it should be safe to at least do the name hash and parent
comparison without holding any lock (since even if they are invalidated by
a concurrent "move()" operation, doing the comparison is safe). By the
time those have matched, we are probably pretty safe in taking the lock,
since the likelihood of the other checks matching should be pretty high.
And yes, removing d_movecount would be ok by then, as long as we re-test
the parent inside d_lock (we don't need to re-test "hash", since if we
tested the full name inside the lock, the hash had better match too ;)
Comments?
Linus
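The pattern being converged on here — compare optimistically without the lock, then take ->d_lock and re-test — can be modeled in plain userspace C. This is a hedged sketch with invented names (toy_dentry, mover, reader), not the kernel code: the mover plays d_move(), the reader plays __d_lookup(), and the invariant checked is that the locked re-test can never observe a torn (name, len) pair.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

/* Toy stand-in for a dentry; all names here are invented. */
struct toy_dentry {
	pthread_mutex_t d_lock;
	const char *name;	/* stands in for d_name.name */
	int len;		/* stands in for d_name.len  */
};

static struct toy_dentry d = { PTHREAD_MUTEX_INITIALIZER, "short", 5 };
static atomic_int stop;
static int inconsistent;	/* set if a locked re-check sees a torn pair */

/* d_move() analogue: changes name and len, but only under ->d_lock. */
static void *mover(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop)) {
		pthread_mutex_lock(&d.d_lock);
		if (d.len == 5) {
			d.name = "longername";
			d.len = 10;
		} else {
			d.name = "short";
			d.len = 5;
		}
		pthread_mutex_unlock(&d.d_lock);
	}
	return NULL;
}

/* __d_lookup() analogue: lock-free peek only as a hint, authoritative
 * comparison done under the lock. */
static void *reader(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		int hint = (d.len == 10);	/* optimistic, may be stale */
		(void)hint;
		pthread_mutex_lock(&d.d_lock);
		if ((int)strlen(d.name) != d.len)
			inconsistent = 1;
		pthread_mutex_unlock(&d.d_lock);
	}
	return NULL;
}

int toy_lookup_race(void)
{
	pthread_t m, r;

	pthread_create(&m, NULL, mover, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(r, NULL);
	atomic_store(&stop, 1);
	pthread_join(m, NULL);
	return inconsistent;	/* 0: every re-check saw a consistent pair */
}
```

Note that the unlocked peek is still formally a data race; in the kernel, that slot is where the RCU-protected read and the subsequent parent/name re-test under d_lock would go.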
* Re: dentry bloat.
2004-05-08 19:13 ` Linus Torvalds
2004-05-08 19:27 ` Andrew Morton
2004-05-08 19:27 ` Linus Torvalds
@ 2004-05-08 20:13 ` Dipankar Sarma
2004-10-06 12:58 ` Maneesh Soni
2 siblings, 1 reply; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-08 20:13 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, manfred, davej, wli, linux-kernel
On Sat, May 08, 2004 at 12:13:09PM -0700, Linus Torvalds wrote:
> On Sat, 8 May 2004, Andrew Morton wrote:
> >
> > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> > serialise against d_move(), fixing the problem which you mention, and also
> > makes d_movecount go away.
>
> If you do that, RCU basically loses most of its meaning.
>
> You'll be taking a lock for - and dirtying in the cache - every single
> dentry on the hash chain, which is pretty much guaranteed to be slower
> than just taking the dcache_lock _once_, even if that one jumps across
> CPU's a lot.
>
> In other words, no, I don't think that's a good idea. We really want to
> take the dentry lock only after we're pretty sure we have the right
> dentry. Otherwise the dentry chains will be bouncing from CPU to CPU all
> the time.
Exactly. Taking ->d_lock for every dentry while traversing the hash
chain should be avoided. As it is, the atomic operations on that path
are already getting costly.
Thanks
Dipankar
* Re: dentry bloat.
2004-05-08 19:27 ` Linus Torvalds
@ 2004-05-08 20:42 ` Dipankar Sarma
2004-05-08 20:55 ` Andrew Morton
2004-05-09 7:28 ` Manfred Spraul
0 siblings, 2 replies; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-08 20:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, manfred, davej, wli, Kernel Mailing List,
Maneesh Soni
On Sat, May 08, 2004 at 12:27:50PM -0700, Linus Torvalds wrote:
>
> On Sat, 8 May 2004, Linus Torvalds wrote:
> > On Sat, 8 May 2004, Andrew Morton wrote:
> > >
> > > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> > > serialise against d_move(), fixing the problem which you mention, and also
> > > makes d_movecount go away.
> >
> > If you do that, RCU basically loses most of its meaning.
>
> In particular, it should be safe to at least do the name hash and parent
> comparison without holding any lock (since even if they are invalidated by
> a concurrent "move()" operation, doing the comparison is safe). By the
> time those have matched, we are probably pretty safe in taking the lock,
> since the likelihood of the other checks matching should be pretty high.
>
> And yes, removing d_movecount would be ok by then, as long as we re-test
> the parent inside d_lock (we don't need to re-test "hash", since if we
> tested the full name inside the lock, the hash had better match too ;)
There are a couple of issues that need to be checked -
1. Re-doing the parent comparison and full name under ->d_lock
needs to be benchmarked using dcachebench. That part of the code
is extremely performance sensitive and I remember that the
current d_movecount based solution was done after a lot of
benchmarking of various alternatives.
2. We need to check if doing ->d_compare() under ->d_lock will
result in locking hierarchy problems.
Thanks
Dipankar
* Re: dentry bloat.
2004-05-08 20:42 ` Dipankar Sarma
@ 2004-05-08 20:55 ` Andrew Morton
2004-05-08 21:19 ` Dipankar Sarma
2004-05-09 7:28 ` Manfred Spraul
1 sibling, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 20:55 UTC (permalink / raw)
To: dipankar; +Cc: torvalds, manfred, davej, wli, linux-kernel, maneesh
Dipankar Sarma <dipankar@in.ibm.com> wrote:
>
> > And yes, removing d_movecount would be ok by then, as long as we re-test
> > the parent inside d_lock (we don't need to re-test "hash", since if we
> > tested the full name inside the lock, the hash had better match too ;)
>
> There are a couple of issues that need to be checked -
>
> 1. Re-doing the parent comparison and full name under ->d_lock
> needs to be benchmarked using dcachebench. That part of the code
> is extremely performance sensitive and I remember that the
> current d_movecount based solution was done after a lot of
> benchmarking of various alternatives.
There's a speed-space tradeoff here as well. Making the dentry smaller
means that more can be cached, which reduces disk seeks. On all
machines...
But yes, when I've finished mucking with this I'll be asking you to put it
all through your performance/correctness/stress tests please.
One thing which needs to be reviewed is the layout of the dentry, too.
> 2. We need to check if doing ->d_compare() under ->d_lock will
> result in locking hierarchy problems.
Yup, I checked that. Is OK. It's hard to see how a ->d_compare
implementation could care. And ->d_compare is called under dcache_lock
in 2.4.
* Re: dentry bloat.
2004-05-08 19:01 ` Andrew Morton
2004-05-08 19:13 ` Linus Torvalds
@ 2004-05-08 21:00 ` Dipankar Sarma
1 sibling, 0 replies; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-08 21:00 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, manfred, davej, wli, linux-kernel, Maneesh Soni
On Sat, May 08, 2004 at 12:01:48PM -0700, Andrew Morton wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> > Also, in your previous patch (which I'm not as convinced might be wrong),
> > the d_qstr pointer removal makes me worry:
> >
> > - struct qstr * d_qstr; /* quick str ptr used in lockless lookup and concurrent d_move */
> >
> > I thought the point of d_qstr was that when we do the lockless lookup,
> > we're guaranteed to always see "stable storage" in the sense that when we
> > follow the d_qstr, we will always get a "char *" + "len" that match, and
> > we could never see a partial update (ie len points to the old one, and
> > "char *" points to the new one).
>
> It looks that way.
Yes, that is exactly why d_qstr was introduced. Switching the single
d_qstr pointer updates the "len" and the storage for the name together.
>
> > In particular, think about the "d_compare(parent, qstr, name)" /
> > "memcmp(qstr->name, str, len)" part - what if "len" doesn't match str,
> > because a concurrent d_move() is updating them, and maybe we will compare
> > past the end of kernel mapped memory or something?
> >
> > (In other words, the "move_count" check should protect us from returning a
> > wrong dentry, but I'd worry that we'd do something that could cause
> > serious problems before we even get to the "move_count" check).
> >
> > Hmm?
Yes, that is indeed why we had to have d_qstr.
> I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> serialise against d_move(), fixing the problem which you mention, and also
> makes d_movecount go away.
Repeating some of the tests under ->d_lock is worth looking at, but
we have to be careful about performance. ISTR there was another
issue related to calling ->d_compare() under ->d_lock. I will dig
into this a little, or Maneesh may remember.
Thanks
Dipankar
* Re: dentry bloat.
2004-05-08 20:55 ` Andrew Morton
@ 2004-05-08 21:19 ` Dipankar Sarma
2004-05-09 0:10 ` Andrew Morton
0 siblings, 1 reply; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-08 21:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: torvalds, manfred, davej, wli, linux-kernel, maneesh
On Sat, May 08, 2004 at 01:55:12PM -0700, Andrew Morton wrote:
> Dipankar Sarma <dipankar@in.ibm.com> wrote:
> > There are a couple of issues that need to be checked -
> >
> > 1. Re-doing the parent comparison and full name under ->d_lock
> > needs to be benchmarked using dcachebench. That part of the code
> > is extremely performance sensitive and I remember that the
> > current d_movecount based solution was done after a lot of
> > benchmarking of various alternatives.
>
> There's a speed-space tradeoff here as well. Making the dentry smaller
> means that more can be cached, which reduces disk seeks. On all
> machines...
Another thing that would help is the singly linked rcu patch.
It shaves off 8 bytes per rcu_head on x86. Should I revive
that?
> But yes, when I've finished mucking with this I'll be asking you to put it
> all through your performance/correctness/stress tests please.
Yes, sure.
> One thing which needs to be reviewed is the layout of the dentry, too.
IIRC, Maneesh did some experiments with this and found that any
changes in the layout he did only degraded performance :)
Thanks
Dipankar
* Re: dentry bloat.
2004-05-08 21:19 ` Dipankar Sarma
@ 2004-05-09 0:10 ` Andrew Morton
2004-05-09 2:55 ` Linus Torvalds
0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-09 0:10 UTC (permalink / raw)
To: dipankar; +Cc: torvalds, manfred, davej, wli, linux-kernel, maneesh
Dipankar Sarma <dipankar@in.ibm.com> wrote:
>
> On Sat, May 08, 2004 at 01:55:12PM -0700, Andrew Morton wrote:
> > Dipankar Sarma <dipankar@in.ibm.com> wrote:
> > > There are a couple of issues that need to be checked -
> > >
> > > 1. Re-doing the parent comparison and full name under ->d_lock
> > > needs to be benchmarked using dcachebench. That part of the code
> > > is extremely performance sensitive and I remember that the
> > > current d_movecount based solution was done after a lot of
> > > benchmarking of various alternatives.
> >
> > There's a speed-space tradeoff here as well. Making the dentry smaller
> > means that more can be cached, which reduces disk seeks. On all
> > machines...
>
> Another thing that would help is the singly linked rcu patch.
> It shaves off 8-bytes per-rcu_head on x86. Should I revive
> that ?
>
May as well.
> > But yes, when I've finished mucking with this I'll be asking you to put it
> > all through your performance/correctness/stress tests please.
>
> Yes, sure.
>
> > One thing which needs to be reviewed is the layout of the dentry, too.
>
> IIRC, Maneesh did some experiments with this and found that any
> changes in the layout he did only degraded performance :)
Well. Bear in mind that the dentry used to be 256 bytes on a 128-byte
boundary so there was really nothing to improve, unless you were using a
PIII or something.
But I just made it 160 bytes on a 16-byte boundary, so some dentries will
now straddle cachelines. We need to identify the fields used on the
lookup path and try to scrunch them into a 16-byte-aligned 16-byte chunk.
I'll have a go at that.
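The straddling can be quantified with a little offset arithmetic. Assuming 160-byte dentries packed back-to-back on 16-byte boundaries with 64-byte cachelines (the x86 numbers above; the helper name is invented):

```c
#include <assert.h>

/* Which 64-byte cacheline does byte `off` of the i-th 160-byte
 * dentry in a page fall into? */
static int cacheline(int obj, int off)
{
	return (obj * 160 + off) / 64;
}
```

Since 160 mod 64 is 32, objects start at 0 or 32 mod 64, so a 16-byte chunk at the start of each dentry never crosses a line, while the 160-byte object as a whole always spans three - hence the payoff of packing the lookup-hot fields into that first chunk.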
* Re: dentry bloat.
2004-05-09 0:10 ` Andrew Morton
@ 2004-05-09 2:55 ` Linus Torvalds
2004-05-09 3:12 ` David S. Miller
2004-05-09 4:12 ` Andrew Morton
0 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 2:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh
Ok,
while we're discussing the other things, how about for now
just having this fairly minimal patch to get rid of the most egregious
memory mis-use.
This removes the SLAB_HWCACHE_ALIGN and the cache line alignment on
"struct dentry", and to compensate for the fact that the inline name
length will shrink quite a bit it changes DNAME_INLINE_LEN_MIN from 16 to
24.
It also re-orders the first few fields of the dentry, since the "count"
and "d_lock" should be 32-bit on most 64-bit architectures, so to avoid
having unnecessary unused space, we should just put them together
(everything else there is going to be comprised of 64-bit entities, I
think).
So this should bring the size of a dentry down to 160 bytes on x86, from
either 192 or 256 bytes (depending on cacheline size). And 24 bytes of
inline name (it can be a bit more on some architectures, due to longword
alignment or similar) should still capture most of the dentries by far.
Let's see what Andrew's more complex patches can do to bring the size of
"struct dentry" down further.
Linus
--- 1.71/fs/dcache.c Mon Apr 26 22:07:34 2004
+++ edited/fs/dcache.c Sat May 8 17:41:30 2004
@@ -1558,14 +1558,11 @@
* A constructor could be added for stable state like the lists,
* but it is probably not worth it because of the cache nature
* of the dcache.
- * If fragmentation is too bad then the SLAB_HWCACHE_ALIGN
- * flag could be removed here, to hint to the allocator that
- * it should not try to get multiple page regions.
*/
dentry_cache = kmem_cache_create("dentry_cache",
sizeof(struct dentry),
0,
- SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ SLAB_RECLAIM_ACCOUNT,
NULL, NULL);
if (!dentry_cache)
panic("Cannot create dentry cache");
===== include/linux/dcache.h 1.36 vs edited =====
--- 1.36/include/linux/dcache.h Mon Jan 19 15:38:11 2004
+++ edited/include/linux/dcache.h Sat May 8 19:39:47 2004
@@ -74,14 +74,14 @@
return end_name_hash(hash);
}
-#define DNAME_INLINE_LEN_MIN 16
+#define DNAME_INLINE_LEN_MIN 24
struct dcookie_struct;
struct dentry {
atomic_t d_count;
- unsigned long d_vfs_flags; /* moved here to be on same cacheline */
spinlock_t d_lock; /* per dentry lock */
+ unsigned long d_vfs_flags; /* moved here to be on same cacheline */
struct inode * d_inode; /* Where the name belongs to - NULL is negative */
struct list_head d_lru; /* LRU list */
struct list_head d_child; /* child of parent list */
@@ -102,7 +102,7 @@
struct hlist_node d_hash; /* lookup hash list */
struct hlist_head * d_bucket; /* lookup hash bucket */
unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
-} ____cacheline_aligned;
+};
#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
* Re: dentry bloat.
2004-05-09 2:55 ` Linus Torvalds
@ 2004-05-09 3:12 ` David S. Miller
2004-05-09 3:53 ` Linus Torvalds
2004-05-09 4:12 ` Andrew Morton
1 sibling, 1 reply; 62+ messages in thread
From: David S. Miller @ 2004-05-09 3:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: akpm, dipankar, manfred, davej, wli, linux-kernel, maneesh
FWIW this patch does not change the dentry size on sparc64.
It's 256 bytes both before and after. But of course the
inline string length is larger.
* Re: dentry bloat.
2004-05-09 3:12 ` David S. Miller
@ 2004-05-09 3:53 ` Linus Torvalds
2004-05-09 21:03 ` Matt Mackall
0 siblings, 1 reply; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 3:53 UTC (permalink / raw)
To: David S. Miller
Cc: akpm, dipankar, manfred, davej, wli, linux-kernel, maneesh
On Sat, 8 May 2004, David S. Miller wrote:
>
> FWIW this patch does not change the dentry size on sparc64.
> It's 256 bytes both before and after. But of course the
> inline string length is larger.
Yup. I tested it on ppc64; that was part of the reason for the increase in
the inline string size ("reclaim the lost space" rather than "make the
dentry shrink from 256 bytes to 248 bytes").
That shows how broken the old setup was, btw: on 64-bit architectures, the
old code resulted in the inline string size being 16 bytes, while on a P4
it would be something like 120 bytes. That makes no sense.
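Both numbers fall out of the sizeof/offsetof arithmetic and can be reproduced with toy structs. The field sizes below are stand-ins, not the real layout; 128 bytes of fields ahead of the name roughly matches the P4 SMP build:

```c
#include <assert.h>
#include <stddef.h>

struct aligned_dentry {			/* the old ____cacheline_aligned way */
	char fields[128];		/* everything before d_iname */
	char d_iname[16];		/* DNAME_INLINE_LEN_MIN */
} __attribute__((aligned(128)));	/* P4 cacheline */

struct plain_dentry {			/* same fields, no forced alignment */
	char fields[128];
	char d_iname[16];
};

/* DNAME_INLINE_LEN as defined in dcache.h */
#define INLINE_LEN(t) (sizeof(struct t) - offsetof(struct t, d_iname))
```

The 144 raw bytes round up to 256 under 128-byte alignment, and the tail padding all lands in the inline name: 128 usable bytes. Drop the alignment and the same struct yields only the declared 16.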
Btw, the cacheline alignment was actually likely to guarantee the
_worst_ possible cache behaviour on a P4: the fields we access on lookup
are:
- d_hash (pointer *2)
- d_bucket (pointer)
- d_move_count (unsigned long)
- d_name.hash (part of a struct with 1 pointer + 2 ints)
- d_parent (pointer)
and they were all close together, but I think "d_bucket" was guaranteed to
be in another cacheline exactly because of the thing always being aligned.
So removing the alignment is likely to cause more of the lookups to only
need one cacheline instead of two like it was before.
I think. Maybe I counted the offsets wrong.
Linus
* Re: dentry bloat.
2004-05-09 2:55 ` Linus Torvalds
2004-05-09 3:12 ` David S. Miller
@ 2004-05-09 4:12 ` Andrew Morton
2004-05-09 4:25 ` Linus Torvalds
1 sibling, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-09 4:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh
Linus Torvalds <torvalds@osdl.org> wrote:
>
> while we're discussing the other things, how about for now
> just having this fairly minimal patch to get rid of the most egregious
> memory mis-use.
>
> This removes the SLAB_HWCACHE_ALIGN and the cache line alignment on
> "struct dentry", and to compensate for the fact that the inline name
> length will shrink quite a bit it changes DNAME_INLINE_LEN_MIN from 16 to
> 24.
I think that'll crash. DNAME_INLINE_LEN_MIN is the *minimum* storage in
the dentry. The actual storage is assumed to be DNAME_INLINE_LEN and the
patch doesn't change that value.
The calculation of DNAME_INLINE_LEN assumes that slab padded the dentry out
to the next cacheline.
If we want a quickie deporkification, this will do it. But I don't see a
rush on it.
Save a bit of dcache space.
Currently we use all the space from the start of dentry.d_name to the next
cacheline for the inline name. That can be 128 bytes or more nowadays, which
is excessive. In fact it's exactly 128 bytes of inline name length for an x86
P4 build (both UP and SMP).
Rework things so that the inline name length is between 31 and 48 bytes.
On SMP P4-compiled x86 each dentry goes from 256 bytes (15 per page) to 160
bytes (24 per page). The downside is that we'll have more out-of-line names.
Here's the histogram of name lengths on all 1.5M files on my workstation:
1: 0%
2: 0%
3: 1%
4: 5%
5: 8%
6: 13%
7: 19%
8: 26%
9: 33%
10: 42%
11: 49%
12: 55%
13: 60%
14: 64%
15: 67%
16: 69%
17: 71%
18: 73%
19: 75%
20: 76%
21: 78%
22: 79%
23: 80%
24: 81%
25: 82%
26: 83%
27: 85%
28: 86%
29: 87%
30: 88%
31: 89%
32: 90%
33: 91%
34: 92%
35: 93%
36: 94%
37: 95%
38: 96%
39: 96%
40: 96%
41: 96%
42: 96%
43: 96%
44: 97%
45: 97%
46: 97%
47: 97%
48: 97%
49: 98%
50: 98%
51: 98%
52: 98%
53: 98%
54: 98%
55: 98%
56: 98%
57: 98%
58: 98%
59: 98%
60: 99%
61: 99%
62: 99%
63: 99%
64: 99%
So on x86 we'll fit 89% of filenames into the inline name.
The patch also removes the NAME_ALLOC_LEN() rounding-up of the storage for the
out-of-line names. That seems unnecessary.
---
25-akpm/fs/dcache.c | 16 ++++++++++------
25-akpm/include/linux/dcache.h | 8 ++------
2 files changed, 12 insertions(+), 12 deletions(-)
diff -puN fs/dcache.c~dentry-shrinkage fs/dcache.c
--- 25/fs/dcache.c~dentry-shrinkage 2004-05-08 21:09:00.620787784 -0700
+++ 25-akpm/fs/dcache.c 2004-05-08 21:09:37.413194488 -0700
@@ -41,6 +41,13 @@ EXPORT_SYMBOL(dcache_lock);
static kmem_cache_t *dentry_cache;
/*
+ * The allocation size for each dentry. It is a multiple of 16 bytes. We
+ * leave the final 32-47 bytes for the inline name.
+ */
+#define DENTRY_STORAGE (((sizeof(struct dentry)+32) + 15) & ~15)
+#define DNAME_INLINE_LEN (DENTRY_STORAGE - sizeof(struct dentry))
+
+/*
* This is the single most critical data structure when it comes
* to the dcache: the hashtable for lookups. Somebody should try
* to make this good - I've just made it work.
@@ -665,8 +672,6 @@ static int shrink_dcache_memory(int nr,
return dentry_stat.nr_unused;
}
-#define NAME_ALLOC_LEN(len) ((len+16) & ~15)
-
/**
* d_alloc - allocate a dcache entry
* @parent: parent of entry to allocate
@@ -688,8 +693,7 @@ struct dentry * d_alloc(struct dentry *
return NULL;
if (name->len > DNAME_INLINE_LEN-1) {
- qstr = kmalloc(sizeof(*qstr) + NAME_ALLOC_LEN(name->len),
- GFP_KERNEL);
+ qstr = kmalloc(sizeof(*qstr) + name->len + 1, GFP_KERNEL);
if (!qstr) {
kmem_cache_free(dentry_cache, dentry);
return NULL;
@@ -1563,9 +1567,9 @@ static void __init dcache_init(unsigned
* it should not try to get multiple page regions.
*/
dentry_cache = kmem_cache_create("dentry_cache",
- sizeof(struct dentry),
+ DENTRY_STORAGE,
0,
- SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+ SLAB_RECLAIM_ACCOUNT,
NULL, NULL);
if (!dentry_cache)
panic("Cannot create dentry cache");
diff -puN include/linux/dcache.h~dentry-shrinkage include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-shrinkage 2004-05-08 21:09:00.621787632 -0700
+++ 25-akpm/include/linux/dcache.h 2004-05-08 21:09:00.628786568 -0700
@@ -74,8 +74,6 @@ full_name_hash(const unsigned char *name
return end_name_hash(hash);
}
-#define DNAME_INLINE_LEN_MIN 16
-
struct dcookie_struct;
struct dentry {
@@ -101,11 +99,9 @@ struct dentry {
struct qstr d_name;
struct hlist_node d_hash; /* lookup hash list */
struct hlist_head * d_bucket; /* lookup hash bucket */
- unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
-} ____cacheline_aligned;
+ unsigned char d_iname[0]; /* small names */
+};
-#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
-
struct dentry_operations {
int (*d_revalidate)(struct dentry *, struct nameidata *);
int (*d_hash) (struct dentry *, struct qstr *);
_
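The rounding in the patch above can be sanity-checked in isolation (the helper below is a stand-in that takes sizeof(struct dentry) as a plain argument):

```c
#include <assert.h>

/* DENTRY_STORAGE from the patch: round (sizeof + 32) up to a
 * multiple of 16; the slack becomes the inline name. */
static int inline_len(int dentry_size)
{
	int storage = ((dentry_size + 32) + 15) & ~15;
	return storage - dentry_size;
}

/* The old rounding of out-of-line names that the patch removes. */
#define NAME_ALLOC_LEN(len) (((len) + 16) & ~15)
```

For any struct size the inline name comes out between 32 and 47 bytes, matching the comment in the patch. NAME_ALLOC_LEN rounded a kmalloced name up to the next multiple of 16, e.g. 16 bytes for a 15-character name but 32 for a 16-character one.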
* Re: dentry bloat.
2004-05-09 4:12 ` Andrew Morton
@ 2004-05-09 4:25 ` Linus Torvalds
2004-05-09 4:36 ` Andrew Morton
2004-05-09 4:43 ` Linus Torvalds
0 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 4:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh
On Sat, 8 May 2004, Andrew Morton wrote:
>
> I think that'll crash. DNAME_INLINE_LEN_MIN is the *minimum* storage in
> the dentry. The actual storage is assumed to be DNAME_INLINE_LEN and the
> patch doesn't change that value.
It doesn't need to, the value is correct.
> The calculation of DNAME_INLINE_LEN assumes that slab padded the dentry out
> to the next cacheline.
No it doesn't. It's just:
#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
which is always right.
It used to assume the padding because of the "____cacheline_aligned" on
the struct dentry, which obviously also padded out the "sizeof()". But
since I removed that, DNAME_INLINE_LEN is still correct.
NOTE! It's absolutely true that DNAME_INLINE_LEN may still be different
from DNAME_INLINE_LEN_MIN. In particular, if something inside the struct
makes the alignment of "struct dentry" be bigger than the offset of the
last field, then DNAME_INLINE_LEN will be different from (bigger than)
DNAME_INLINE_LEN_MIN.
It's just that with current architectures, I don't believe that will
happen in practice. But I left it as-is, because at least in theory it
could happen.
Linus
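The NOTE is easy to check: the sizeof-minus-offsetof definition silently absorbs whatever tail padding the struct's alignment introduces. A minimal toy (not the real dentry; the 8-byte alignment is forced artificially to stand in for a member that would require it):

```c
#include <assert.h>
#include <stddef.h>

struct toy {
	char fields[9];
	char d_iname[3];	/* declared "minimum" inline name */
} __attribute__((aligned(8)));	/* pretend a member forces this */

#define TOY_INLINE_LEN (sizeof(struct toy) - offsetof(struct toy, d_iname))
```

The 12 raw bytes round up to 16, d_iname sits at offset 9, so the usable inline length is 7 - bigger than the declared 3, exactly the case described above.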
* Re: dentry bloat.
2004-05-09 4:25 ` Linus Torvalds
@ 2004-05-09 4:36 ` Andrew Morton
2004-05-09 5:14 ` Linus Torvalds
2004-05-09 4:43 ` Linus Torvalds
1 sibling, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-09 4:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh
Linus Torvalds <torvalds@osdl.org> wrote:
>
> > The calculation of DNAME_INLINE_LEN assumes that slab padded the dentry out
> > to the next cacheline.
>
> No it doesn't. It's just:
>
> #define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
>
> which is always right.
erk. OK. Things are (much) worse than I thought. The 24 byte limit means
that 20% of my names will be externally allocated, but that's no worse than
what we had before.
We should make this change to 2.4.x.
* Re: dentry bloat.
2004-05-09 4:25 ` Linus Torvalds
2004-05-09 4:36 ` Andrew Morton
@ 2004-05-09 4:43 ` Linus Torvalds
1 sibling, 0 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 4:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh
On Sat, 8 May 2004, Linus Torvalds wrote:
>
> NOTE! It's absolutely true that DNAME_INLINE_LEN may still be different
> from DNAME_INLINE_LEN_MIN. In particular, if something inside the struct
> makes the alignment of "struct dentry" be bigger than the offset of the
> last field, then DNAME_INLINE_LEN will be different from (bigger than)
> DNAME_INLINE_LEN_MIN.
Btw, this does depend on the fact that a regular "kmem_cache_create()"
with an alignment of 0 had better return a pointer that is always "aligned
enough" for the architecture in question.
But that had better be true anyway, since otherwise everything would
basically have to specify the alignment by hand, and the alignment of 0
would be useless.
Linus
* Re: dentry bloat.
2004-05-09 4:36 ` Andrew Morton
@ 2004-05-09 5:14 ` Linus Torvalds
2004-05-09 7:36 ` Rodolfo Guluarte Hale
2004-05-09 9:10 ` Guennadi Liakhovetski
0 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 5:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, Kernel Mailing List, maneesh
On Sat, 8 May 2004, Andrew Morton wrote:
>
> erk. OK. Things are (much) worse than I thought. The 24 byte limit means
> that 20% of my names will be externally allocated, but that's no worse than
> what we had before.
In fact, it's better than what we had before at least on 64-bit
architectures.
But I'd be happy to make the DNAME_INLINE_LEN_MIN #define larger - I just
think we should try to shrink the internal structure fields first.
Btw, at least for the kernel sources, my statistics say that filename
distribution (in a built tree, and with BK) is
1: 5.04 % ( 5.04 % cum -- 2246)
2: 5.19 % ( 10.23 % cum -- 2312)
3: 0.55 % ( 10.79 % cum -- 247)
4: 3.30 % ( 14.08 % cum -- 1469)
5: 3.35 % ( 17.43 % cum -- 1492)
6: 4.35 % ( 21.79 % cum -- 1940)
7: 7.55 % ( 29.34 % cum -- 3365)
8: 9.64 % ( 38.98 % cum -- 4293)
9: 9.17 % ( 48.15 % cum -- 4084)
10: 10.98 % ( 59.12 % cum -- 4891)
11: 7.65 % ( 66.77 % cum -- 3406)
12: 7.01 % ( 73.78 % cum -- 3122)
13: 5.16 % ( 78.94 % cum -- 2298)
14: 3.83 % ( 82.77 % cum -- 1706)
15: 3.47 % ( 86.24 % cum -- 1545)
16: 2.11 % ( 88.34 % cum -- 939)
17: 1.47 % ( 89.81 % cum -- 655)
18: 1.06 % ( 90.87 % cum -- 472)
19: 0.68 % ( 91.55 % cum -- 303)
20: 0.42 % ( 91.97 % cum -- 188)
21: 0.29 % ( 92.26 % cum -- 128)
22: 0.24 % ( 92.50 % cum -- 107)
23: 0.14 % ( 92.64 % cum -- 63)
ie we've reached 92% of all names with a 24-byte inline name.
For my whole disk, I have similar stats:
1: 6.59 % ( 6.59 % cum -- 71690)
2: 6.86 % ( 13.45 % cum -- 74611)
3: 1.59 % ( 15.04 % cum -- 17292)
4: 3.77 % ( 18.81 % cum -- 40992)
5: 3.11 % ( 21.92 % cum -- 33884)
6: 4.13 % ( 26.05 % cum -- 44898)
7: 6.97 % ( 33.01 % cum -- 75774)
8: 8.13 % ( 41.15 % cum -- 88451)
9: 7.81 % ( 48.96 % cum -- 84987)
10: 9.56 % ( 58.52 % cum -- 104021)
11: 7.67 % ( 66.19 % cum -- 83403)
12: 8.07 % ( 74.26 % cum -- 87826)
13: 4.38 % ( 78.65 % cum -- 47690)
14: 3.36 % ( 82.01 % cum -- 36592)
15: 2.71 % ( 84.71 % cum -- 29431)
16: 1.78 % ( 86.49 % cum -- 19311)
17: 1.35 % ( 87.84 % cum -- 14703)
18: 1.05 % ( 88.89 % cum -- 11410)
19: 0.82 % ( 89.71 % cum -- 8952)
20: 0.77 % ( 90.49 % cum -- 8423)
21: 0.85 % ( 91.34 % cum -- 9264)
22: 0.72 % ( 92.06 % cum -- 7798)
23: 0.69 % ( 92.75 % cum -- 7534)
so it appears that I'm either a sad case with a lot of source code on my
disk, or you have overlong filenames that bring up your stats.
Or my program is broken. Entirely possible.
Whee. 149 characters is my winning entry:
/usr/share/doc/HTML/en/kdelibs-3.1-apidocs/kdecore/html/classKGenericFactory_3_01KTypeList_3_01Product_00_01ProductListTail_01_4_00_01KTypeList_3_01ParentType_00_01ParentTypeListTail_01_4_01_4-members.html
That's obscene.
Linus
-----
/*
 * (C) Copyright 2003 Linus Torvalds
 *
 * "bkr" - recursive "bk" invocations aka "bk -r"
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <fcntl.h>
#include <dirent.h>
#include <string.h>
#include <regex.h>
/*
 * Very generic directory tree handling.
 */
static int bkr(const char *path, int pathlen,
	void (*regcallback)(const char *path, int pathlen, const char *name, int namelen),
	void (*dircallback)(const char *path, int pathlen, const char *name, int namelen))
{
	struct dirent *de;
	char fullname[MAXPATHLEN + 1];
	char *ptr = fullname + pathlen;
	DIR *base = opendir(path);
	if (!base)
		return 0;
	memcpy(fullname, path, pathlen);
	while ((de = readdir(base)) != NULL) {
		int len;
		len = strlen(de->d_name);
		memcpy(ptr, de->d_name, len+1);
		if (dircallback) {
			switch (de->d_type) {
			struct stat st;
			case DT_UNKNOWN:
				if (stat(fullname, &st))
					break;
				if (!S_ISDIR(st.st_mode))
					break;
			case DT_DIR:
				if (de->d_name[0] == '.') {
					if (len == 1)
						break;
					if (de->d_name[1] == '.' && len == 2)
						break;
				}
				ptr[len] = '/';
				ptr[len+1] = '\0';
				dircallback(fullname, pathlen + len + 1, de->d_name, len);
				continue;
			}
		}
		regcallback(fullname, pathlen + len, de->d_name, len);
	}
	closedir(base);
	return 0;
}
static int total;
static int len[256];
static void file(const char *path, int pathlen, const char *name, int namelen)
{
	total++;
	len[namelen]++;
}
static void dir(const char *path, int pathlen, const char *name, int namelen)
{
	file(path, pathlen, name, namelen);
	bkr(path, pathlen, file, dir);
}
int main(int argc, char **argv)
{
	int i;
	double sum = 0.0;
	bkr(".", 0, file, dir);
	for (i = 0; i < 256; i++) {
		int nr = len[i];
		if (nr) {
			double this = (double) nr * 100.0 / total;
			sum += this;
			printf("%4i: %8.2f %% (%8.2f %% cum -- %d)\n", i, this, sum, nr);
		}
	}
	return 0;
}
* Re: dentry bloat.
2004-05-08 20:42 ` Dipankar Sarma
2004-05-08 20:55 ` Andrew Morton
@ 2004-05-09 7:28 ` Manfred Spraul
2004-05-09 15:33 ` Dipankar Sarma
1 sibling, 1 reply; 62+ messages in thread
From: Manfred Spraul @ 2004-05-09 7:28 UTC (permalink / raw)
To: dipankar
Cc: Linus Torvalds, Andrew Morton, davej, wli, Kernel Mailing List,
Maneesh Soni
Dipankar Sarma wrote:
>On Sat, May 08, 2004 at 12:27:50PM -0700, Linus Torvalds wrote:
>
>
>>And yes, removing d_movecount would be ok by then, as long as we re-test
>>the parent inside d_lock (we don't need to re-test "hash", since if we
>>tested the full name inside the lock, the hash had better match too ;)
>>
>>
What's the purpose of d_move_count?
AFAICS it protects against a double rename: first to a different bucket,
then back to the original bucket. This changes the position of the dentry in
the hash chain and a concurrent lookup would skip entries.
d_lock wouldn't prevent that.
But I think d_bucket could be removed: for __d_lookup the test appears
to be redundant with the d_move_count test. The remaining users are not
performance critical, they could recalculate the bucket from d_parent
and d_name.hash.
--
Manfred
* Re: dentry bloat.
2004-05-09 5:14 ` Linus Torvalds
@ 2004-05-09 7:36 ` Rodolfo Guluarte Hale
2004-05-09 9:10 ` Guennadi Liakhovetski
1 sibling, 0 replies; 62+ messages in thread
From: Rodolfo Guluarte Hale @ 2004-05-09 7:36 UTC (permalink / raw)
To: linux-kernel
----- Original Message -----
From: "Linus Torvalds" <torvalds@osdl.org>
To: "Andrew Morton" <akpm@osdl.org>
Cc: <dipankar@in.ibm.com>; <manfred@colorfullife.com>; <davej@redhat.com>;
<wli@holomorphy.com>; "Kernel Mailing List" <linux-kernel@vger.kernel.org>;
<maneesh@in.ibm.com>
Sent: Saturday, May 08, 2004 11:14 PM
Subject: Re: dentry bloat.
>
>
> On Sat, 8 May 2004, Andrew Morton wrote:
> >
> > erk. OK. Things are (much) worse than I thought. The 24 byte limit means
> > that 20% of my names will be externally allocated, but that's no worse than
> > what we had before.
>
> In fact, it's better than what we had before at least on 64-bit
> architectures.
>
> But I'd be happy to make the DNAME_INLINE_LEN_MIN #define larger - I just
> think we should try to shrink the internal structure fields first.
>
> Btw, at least for the kernel sources, my statistics say that filename
> distribution (in a built tree, and with BK) is
>
> 1: 5.04 % ( 5.04 % cum -- 2246)
> 2: 5.19 % ( 10.23 % cum -- 2312)
> 3: 0.55 % ( 10.79 % cum -- 247)
> 4: 3.30 % ( 14.08 % cum -- 1469)
> 5: 3.35 % ( 17.43 % cum -- 1492)
> 6: 4.35 % ( 21.79 % cum -- 1940)
> 7: 7.55 % ( 29.34 % cum -- 3365)
> 8: 9.64 % ( 38.98 % cum -- 4293)
> 9: 9.17 % ( 48.15 % cum -- 4084)
> 10: 10.98 % ( 59.12 % cum -- 4891)
> 11: 7.65 % ( 66.77 % cum -- 3406)
> 12: 7.01 % ( 73.78 % cum -- 3122)
> 13: 5.16 % ( 78.94 % cum -- 2298)
> 14: 3.83 % ( 82.77 % cum -- 1706)
> 15: 3.47 % ( 86.24 % cum -- 1545)
> 16: 2.11 % ( 88.34 % cum -- 939)
> 17: 1.47 % ( 89.81 % cum -- 655)
> 18: 1.06 % ( 90.87 % cum -- 472)
> 19: 0.68 % ( 91.55 % cum -- 303)
> 20: 0.42 % ( 91.97 % cum -- 188)
> 21: 0.29 % ( 92.26 % cum -- 128)
> 22: 0.24 % ( 92.50 % cum -- 107)
> 23: 0.14 % ( 92.64 % cum -- 63)
>
> ie we've reached 92% of all names with 24-byte inline thing.
>
> For my whole disk, I have similar stats:
>
> 1: 6.59 % ( 6.59 % cum -- 71690)
> 2: 6.86 % ( 13.45 % cum -- 74611)
> 3: 1.59 % ( 15.04 % cum -- 17292)
> 4: 3.77 % ( 18.81 % cum -- 40992)
> 5: 3.11 % ( 21.92 % cum -- 33884)
> 6: 4.13 % ( 26.05 % cum -- 44898)
> 7: 6.97 % ( 33.01 % cum -- 75774)
> 8: 8.13 % ( 41.15 % cum -- 88451)
> 9: 7.81 % ( 48.96 % cum -- 84987)
> 10: 9.56 % ( 58.52 % cum -- 104021)
> 11: 7.67 % ( 66.19 % cum -- 83403)
> 12: 8.07 % ( 74.26 % cum -- 87826)
> 13: 4.38 % ( 78.65 % cum -- 47690)
> 14: 3.36 % ( 82.01 % cum -- 36592)
> 15: 2.71 % ( 84.71 % cum -- 29431)
> 16: 1.78 % ( 86.49 % cum -- 19311)
> 17: 1.35 % ( 87.84 % cum -- 14703)
> 18: 1.05 % ( 88.89 % cum -- 11410)
> 19: 0.82 % ( 89.71 % cum -- 8952)
> 20: 0.77 % ( 90.49 % cum -- 8423)
> 21: 0.85 % ( 91.34 % cum -- 9264)
> 22: 0.72 % ( 92.06 % cum -- 7798)
> 23: 0.69 % ( 92.75 % cum -- 7534)
>
> so it appears that I'm either a sad case with a lot of source code on my
> disk, or you have overlong filenames that bring up your stats.
>
> Or my program is broken. Entirely possible.
>
> Whee. 149 characters is my winning entry:
>
>
> /usr/share/doc/HTML/en/kdelibs-3.1-apidocs/kdecore/html/classKGenericFactory_3_01KTypeList_3_01Product_00_01ProductListTail_01_4_00_01KTypeList_3_01ParentType_00_01ParentTypeListTail_01_4_01_4-members.html
>
> That's obscene.
>
> Linus
>
http://www.ocioyocio.com/
* Re: dentry bloat.
2004-05-09 5:14 ` Linus Torvalds
2004-05-09 7:36 ` Rodolfo Guluarte Hale
@ 2004-05-09 9:10 ` Guennadi Liakhovetski
2004-05-09 9:23 ` viro
2004-05-09 15:35 ` Linus Torvalds
1 sibling, 2 replies; 62+ messages in thread
From: Guennadi Liakhovetski @ 2004-05-09 9:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List,
maneesh
On Sat, 8 May 2004, Linus Torvalds wrote:
> 1: 5.04 % ( 5.04 % cum -- 2246)
> 2: 5.19 % ( 10.23 % cum -- 2312)
Ok, at the risk of stating the obvious - it was intentional to count "."s
and ".."s, wasn't it? It just makes these statistics a bit non-trivial to
compare with Andrew's.
> 23: 0.14 % ( 92.64 % cum -- 63)
>
> ie we've reached 92% of all names with 24-byte inline thing.
[OT, educational]: Do "." and ".." actually take dentries?
Thanks
Guennadi
---
Guennadi Liakhovetski
* Re: dentry bloat.
2004-05-09 9:10 ` Guennadi Liakhovetski
@ 2004-05-09 9:23 ` viro
2004-05-09 15:35 ` Linus Torvalds
1 sibling, 0 replies; 62+ messages in thread
From: viro @ 2004-05-09 9:23 UTC (permalink / raw)
To: Guennadi Liakhovetski
Cc: Linus Torvalds, Andrew Morton, dipankar, manfred, davej, wli,
Kernel Mailing List, maneesh
On Sun, May 09, 2004 at 11:10:30AM +0200, Guennadi Liakhovetski wrote:
> [OT, educational]: Do "." and ".." actually take dentries?
No.
* Re: dentry bloat.
2004-05-09 7:28 ` Manfred Spraul
@ 2004-05-09 15:33 ` Dipankar Sarma
2004-05-09 22:17 ` viro
0 siblings, 1 reply; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-09 15:33 UTC (permalink / raw)
To: Manfred Spraul
Cc: Linus Torvalds, Andrew Morton, davej, wli, Kernel Mailing List,
Maneesh Soni
On Sun, May 09, 2004 at 09:28:46AM +0200, Manfred Spraul wrote:
> What's the prupose of d_move_count?
> AFAICS it protects against a double rename: first to different bucket,
> then back to original bucket. This changes the position of the dentry in
> the hash chain and a concurrent lookup would skip entries.
> d_lock wouldn't prevent that.
Actually, what may happen is that since dentries are added at the
front of the chain, a double move like that could result in the hash
chain traversal looping. Timing dependent and unlikely, but d_move_count
avoided that theoretical possibility. It is not about skipping
dentries, which is safe because a miss would result in a real_lookup()
anyway. If we do the comparison within d_lock, then at least
we are guaranteed to get the right file. The remaining question
is whether we violate POSIX rename semantics in some twisted way.
> But I think d_bucket could be removed: for __d_lookup the test appears
> to be redundant with the d_move_count test. The remaining users are not
> performance critical, they could recalculate the bucket from d_parent
> and d_name.hash.
Yes, afaics, d_move_count can effectively be used to work around
the possibility of the dentry moving to a different hash bucket.
Thanks
Dipankar
* Re: dentry bloat.
2004-05-09 9:10 ` Guennadi Liakhovetski
2004-05-09 9:23 ` viro
@ 2004-05-09 15:35 ` Linus Torvalds
2004-05-09 18:11 ` Matt Mackall
` (2 more replies)
1 sibling, 3 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 15:35 UTC (permalink / raw)
To: Guennadi Liakhovetski
Cc: Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List,
maneesh
On Sun, 9 May 2004, Guennadi Liakhovetski wrote:
> On Sat, 8 May 2004, Linus Torvalds wrote:
>
> > 1: 5.04 % ( 5.04 % cum -- 2246)
> > 2: 5.19 % ( 10.23 % cum -- 2312)
>
> Ok, risking to state the obvious - it was intentional to count "."s and
> ".."s, wasn't it? Just this makes it a bit non-trivial to compare this
> statistics with Andrew's.
Ok, here's a version that doesn't count "." and "..". My numbers don't
really change much for the kernel:
1: 0.00 % ( 0.00 % cum -- 2)
2: 0.17 % ( 0.17 % cum -- 68)
3: 0.62 % ( 0.79 % cum -- 247)
4: 3.67 % ( 4.46 % cum -- 1469)
5: 3.72 % ( 8.18 % cum -- 1492)
6: 4.84 % ( 13.03 % cum -- 1940)
7: 8.40 % ( 21.43 % cum -- 3365)
8: 10.72 % ( 32.14 % cum -- 4293)
9: 10.19 % ( 42.34 % cum -- 4084)
10: 12.21 % ( 54.55 % cum -- 4891)
11: 8.50 % ( 63.05 % cum -- 3406)
12: 7.79 % ( 70.84 % cum -- 3122)
13: 5.74 % ( 76.58 % cum -- 2298)
14: 4.26 % ( 80.84 % cum -- 1706)
15: 3.86 % ( 84.69 % cum -- 1545)
16: 2.34 % ( 87.04 % cum -- 939)
17: 1.64 % ( 88.67 % cum -- 655)
18: 1.18 % ( 89.85 % cum -- 472)
19: 0.76 % ( 90.61 % cum -- 303)
20: 0.47 % ( 91.08 % cum -- 188)
21: 0.32 % ( 91.40 % cum -- 128)
22: 0.27 % ( 91.66 % cum -- 107)
23: 0.16 % ( 91.82 % cum -- 63)
and we've reached over 90% coverage with the 24-byte inline name.
(We don't strictly _need_ the terminating '\0' at the end of the dentry
name, since they are all counted strings anyway, but it certainly is safer
that way).
Same goes for my full tree:
1: 0.11 % ( 0.11 % cum -- 1073)
2: 0.41 % ( 0.53 % cum -- 3918)
3: 1.82 % ( 2.35 % cum -- 17256)
4: 4.32 % ( 6.68 % cum -- 40925)
5: 3.58 % ( 10.25 % cum -- 33860)
6: 4.74 % ( 15.00 % cum -- 44874)
7: 8.00 % ( 23.00 % cum -- 75750)
8: 9.35 % ( 32.35 % cum -- 88451)
9: 8.98 % ( 41.33 % cum -- 84987)
10: 10.99 % ( 52.32 % cum -- 104021)
11: 8.81 % ( 61.13 % cum -- 83404)
12: 9.28 % ( 70.41 % cum -- 87826)
13: 5.04 % ( 75.45 % cum -- 47690)
14: 3.87 % ( 79.32 % cum -- 36592)
15: 3.11 % ( 82.43 % cum -- 29431)
16: 2.04 % ( 84.47 % cum -- 19311)
17: 1.55 % ( 86.02 % cum -- 14703)
18: 1.21 % ( 87.23 % cum -- 11410)
19: 0.95 % ( 88.17 % cum -- 8952)
20: 0.89 % ( 89.07 % cum -- 8423)
21: 0.98 % ( 90.04 % cum -- 9264)
22: 0.82 % ( 90.87 % cum -- 7798)
23: 0.80 % ( 91.66 % cum -- 7534)
so Andrew really must have a fairly different setup if he sees 20% of
filenames being > 23 characters.
> [OT, educational]: Do "." and ".." actually take dentries?
Nope, they are resolved early. So you're right, I shouldn't have counted
them.
Linus
* Re: dentry bloat.
2004-05-09 15:35 ` Linus Torvalds
@ 2004-05-09 18:11 ` Matt Mackall
2004-05-09 22:08 ` Francois Romieu
2004-05-09 23:51 ` Paul Jackson
2004-05-10 7:17 ` Florian Weimer
2004-05-10 14:12 ` Rik van Riel
2 siblings, 2 replies; 62+ messages in thread
From: Matt Mackall @ 2004-05-09 18:11 UTC (permalink / raw)
To: Linus Torvalds
Cc: Guennadi Liakhovetski, Andrew Morton, dipankar, manfred, davej,
wli, Kernel Mailing List, maneesh
On Sun, May 09, 2004 at 08:35:55AM -0700, Linus Torvalds wrote:
>
>
> On Sun, 9 May 2004, Guennadi Liakhovetski wrote:
>
> > On Sat, 8 May 2004, Linus Torvalds wrote:
> >
> > > 1: 5.04 % ( 5.04 % cum -- 2246)
> > > 2: 5.19 % ( 10.23 % cum -- 2312)
> >
> > Ok, risking to state the obvious - it was intentional to count "."s and
> > ".."s, wasn't it? Just this makes it a bit non-trivial to compare this
> > statistics with Andrew's.
>
> Ok, here's a version that doesn't count "." and "..". My numbers don't
> really change much for the kernel:
>
> and we've reached over 90% coverage with the 24-byte inline name.
I hacked up something and ran it on my webserver (which has something
like 200 shell accounts). My histogram peaked at 12 rather than 10 but
still hit 90% cumulative at 21 characters.
I suspect the worst case is a large LAN fileserver with Samba shares and
whatnot; there are probably a large number of "A picture of my cute
puppy I took last summer.JPG" style filenames there. Anyone have stats
for such an FS?
--
Matt Mackall : http://www.selenic.com : Linux development and consulting
* Re: dentry bloat.
2004-05-09 3:53 ` Linus Torvalds
@ 2004-05-09 21:03 ` Matt Mackall
2004-05-10 8:27 ` Helge Hafting
0 siblings, 1 reply; 62+ messages in thread
From: Matt Mackall @ 2004-05-09 21:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: David S. Miller, akpm, dipankar, manfred, davej, wli,
linux-kernel, maneesh, Arjan van de Ven
One also wonders whether all the RCU stuff is needed on UP. I'm
not sure I grok all the fine points here, but it looks like the
answer is no and that we can make struct rcu_head empty and have
call_rcu fall directly through to the callback. This would save
something like 16-32 bytes (32/64-bit), not to mention a bunch of
dinking around with lists and whatnot.
So what am I missing?
--
Matt Mackall : http://www.selenic.com : Linux development and consulting
* Re: dentry bloat.
2004-05-09 18:11 ` Matt Mackall
@ 2004-05-09 22:08 ` Francois Romieu
2004-05-09 23:51 ` Paul Jackson
1 sibling, 0 replies; 62+ messages in thread
From: Francois Romieu @ 2004-05-09 22:08 UTC (permalink / raw)
To: Matt Mackall
Cc: Linus Torvalds, Guennadi Liakhovetski, Andrew Morton, dipankar,
manfred, davej, wli, Kernel Mailing List, maneesh
Matt Mackall <mpm@selenic.com> :
[...]
> I suspect worst case is a large LAN fileserver with Samba shares and
> whatnot, I suspect there are a large number of "A picture of my cute
> puppy I took last summer.JPG" style filenames there. Anyone have stats
> for such an FS?
General purpose server (mail/smb/cvs/rpm repo) for 15 users: closer to
Andrew's results. "." and ".." do not appear in the results below.
1: 0.30 % ( 0.30 % cum -- 4263)
2: 1.36 % ( 1.65 % cum -- 19491)
3: 2.64 % ( 4.29 % cum -- 37905)
4: 2.43 % ( 6.72 % cum -- 34930)
5: 2.00 % ( 8.72 % cum -- 28816)
6: 3.04 % ( 11.75 % cum -- 43652)
7: 4.80 % ( 16.56 % cum -- 69065)
8: 8.50 % ( 25.06 % cum -- 122244)
9: 4.96 % ( 30.01 % cum -- 71310)
10: 10.29 % ( 40.30 % cum -- 147964)
11: 6.29 % ( 46.60 % cum -- 90527)
12: 6.70 % ( 53.30 % cum -- 96428)
13: 4.02 % ( 57.32 % cum -- 57766)
14: 2.84 % ( 60.16 % cum -- 40865)
15: 2.36 % ( 62.52 % cum -- 34000)
16: 2.31 % ( 64.84 % cum -- 33288)
17: 3.58 % ( 68.41 % cum -- 51421)
18: 2.87 % ( 71.28 % cum -- 41308)
19: 2.10 % ( 73.39 % cum -- 30263)
20: 2.36 % ( 75.75 % cum -- 33959)
21: 2.78 % ( 78.53 % cum -- 39951)
22: 1.48 % ( 80.01 % cum -- 21300)
23: 2.01 % ( 82.02 % cum -- 28975)
24: 2.17 % ( 84.19 % cum -- 31169)
25: 1.32 % ( 85.51 % cum -- 19047)
26: 1.02 % ( 86.54 % cum -- 14713)
27: 0.85 % ( 87.39 % cum -- 12282)
28: 0.82 % ( 88.21 % cum -- 11795)
29: 0.75 % ( 88.96 % cum -- 10783)
30: 0.73 % ( 89.69 % cum -- 10507)
31: 0.59 % ( 90.28 % cum -- 8454)
[...]
57: 0.15 % ( 99.06 % cum -- 2198)
Assuming one is only interested in the users' home directories:
1: 0.37 % ( 0.37 % cum -- 1638)
2: 1.52 % ( 1.90 % cum -- 6672)
3: 3.40 % ( 5.30 % cum -- 14907)
4: 2.61 % ( 7.91 % cum -- 11447)
5: 1.71 % ( 9.62 % cum -- 7498)
6: 2.75 % ( 12.37 % cum -- 12069)
7: 4.83 % ( 17.20 % cum -- 21186)
8: 10.27 % ( 27.48 % cum -- 45034)
9: 4.79 % ( 32.27 % cum -- 21005)
10: 6.23 % ( 38.50 % cum -- 27302)
11: 6.51 % ( 45.01 % cum -- 28551)
12: 5.99 % ( 51.00 % cum -- 26267)
13: 4.13 % ( 55.14 % cum -- 18107)
14: 2.85 % ( 57.98 % cum -- 12479)
15: 2.26 % ( 60.25 % cum -- 9921)
16: 2.28 % ( 62.53 % cum -- 10015)
17: 2.88 % ( 65.41 % cum -- 12604)
18: 2.04 % ( 67.45 % cum -- 8955)
19: 2.26 % ( 69.71 % cum -- 9903)
20: 2.57 % ( 72.28 % cum -- 11260)
21: 2.02 % ( 74.30 % cum -- 8874)
22: 1.48 % ( 75.78 % cum -- 6498)
23: 2.08 % ( 77.86 % cum -- 9109)
24: 2.50 % ( 80.36 % cum -- 10969)
25: 1.35 % ( 81.71 % cum -- 5921)
26: 1.01 % ( 82.72 % cum -- 4428)
27: 0.85 % ( 83.57 % cum -- 3705)
28: 0.84 % ( 84.41 % cum -- 3692)
29: 0.80 % ( 85.21 % cum -- 3489)
30: 0.76 % ( 85.97 % cum -- 3335)
31: 0.63 % ( 86.60 % cum -- 2757)
32: 0.83 % ( 87.43 % cum -- 3633)
33: 0.49 % ( 87.92 % cum -- 2151)
34: 0.53 % ( 88.45 % cum -- 2327)
35: 0.45 % ( 88.90 % cum -- 1959)
36: 0.40 % ( 89.29 % cum -- 1737)
37: 0.47 % ( 89.76 % cum -- 2068)
38: 0.48 % ( 90.24 % cum -- 2103)
[...]
60: 0.13 % ( 99.00 % cum -- 583)
--
Ueimor
* Re: dentry bloat.
2004-05-09 15:33 ` Dipankar Sarma
@ 2004-05-09 22:17 ` viro
2004-05-09 22:27 ` Andrew Morton
2004-05-10 18:39 ` Dipankar Sarma
0 siblings, 2 replies; 62+ messages in thread
From: viro @ 2004-05-09 22:17 UTC (permalink / raw)
To: Dipankar Sarma
Cc: Manfred Spraul, Linus Torvalds, Andrew Morton, davej, wli,
Kernel Mailing List, Maneesh Soni
On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote:
> Actually, what may happen is that since the dentries are added
> in the front, a double move like that would result in hash chain
> traversal looping. Timing dependent and unlikely, but d_move_count
> avoided that theoritical possibility. It is not about skipping
> dentries which is safe because a miss would result in a real_lookup()
Not really. A miss could result in another dentry getting allocated
for the same object (e.g. a directory), which is *NOT* harmless at all.
* Re: dentry bloat.
2004-05-09 22:17 ` viro
@ 2004-05-09 22:27 ` Andrew Morton
2004-05-11 5:26 ` Maneesh Soni
2004-05-10 18:39 ` Dipankar Sarma
1 sibling, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-09 22:27 UTC (permalink / raw)
To: viro; +Cc: dipankar, manfred, torvalds, davej, wli, linux-kernel, maneesh
viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote:
>
> > Actually, what may happen is that since the dentries are added
> > in the front, a double move like that would result in hash chain
> > traversal looping. Timing dependent and unlikely, but d_move_count
> > avoided that theoritical possibility. It is not about skipping
> > dentries which is safe because a miss would result in a real_lookup()
>
> Not really. A miss could result in getting another dentry allocated
> for the same e.g. directory, which is *NOT* harmless at all.
The d_bucket logic does look a bit odd.
        dentry = hlist_entry(node, struct dentry, d_hash);

        /* if lookup ends up in a different bucket
         * due to concurrent rename, fail it
         */
        if (unlikely(dentry->d_bucket != head))
                break;

        /*
         * We must take a snapshot of d_move_count followed by
         * read memory barrier before any search key comparison
         */
        move_count = dentry->d_move_count;
There is a window between the d_bucket test and sampling of d_move_count.
What happens if the dentry gets moved around in there?
Anyway, regardless of that, it is more efficient to test d_bucket _after_
performing the hash comparison. And it seems saner to perform the d_bucket
check when things are pinned down by d_lock.
25-akpm/fs/dcache.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff -puN fs/dcache.c~dentry-d_bucket-fix fs/dcache.c
--- 25/fs/dcache.c~dentry-d_bucket-fix 2004-05-09 14:06:51.816554824 -0700
+++ 25-akpm/fs/dcache.c 2004-05-09 14:07:41.932935976 -0700
@@ -975,12 +975,6 @@ struct dentry * __d_lookup(struct dentry
smp_read_barrier_depends();
dentry = hlist_entry(node, struct dentry, d_hash);
- /* if lookup ends up in a different bucket
- * due to concurrent rename, fail it
- */
- if (unlikely(dentry->d_bucket != head))
- break;
-
smp_rmb();
if (dentry->d_name.hash != hash)
@@ -991,6 +985,13 @@ struct dentry * __d_lookup(struct dentry
spin_lock(&dentry->d_lock);
/*
+ * If lookup ends up in a different bucket due to concurrent
+ * rename, fail it
+ */
+ if (unlikely(dentry->d_bucket != head))
+ goto terminate;
+
+ /*
* Recheck the dentry after taking the lock - d_move may have
* changed things. Don't bother checking the hash because we're
* about to compare the whole name anyway.
@@ -1014,6 +1015,7 @@ struct dentry * __d_lookup(struct dentry
atomic_inc(&dentry->d_count);
found = dentry;
}
+terminate:
spin_unlock(&dentry->d_lock);
break;
next:
_
* Re: dentry bloat.
2004-05-09 18:11 ` Matt Mackall
2004-05-09 22:08 ` Francois Romieu
@ 2004-05-09 23:51 ` Paul Jackson
1 sibling, 0 replies; 62+ messages in thread
From: Paul Jackson @ 2004-05-09 23:51 UTC (permalink / raw)
To: Matt Mackall
Cc: torvalds, g.liakhovetski, akpm, dipankar, manfred, davej, wli,
linux-kernel, maneesh
> Anyone have stats for such an FS?
Not a Samba share, but a dual-boot Linux box with a few Windows file
systems mounted on various subdirectories of path length 6 to 16 chars.
You have to go to length 67 to get 80% of the names:
length 3, cum 4, percent 0
length 4, cum 17, percent 0
length 5, cum 23, percent 0
length 6, cum 27, percent 0
length 7, cum 49, percent 0
length 8, cum 101, percent 0
length 9, cum 149, percent 0
length 10, cum 223, percent 0
length 11, cum 314, percent 0
length 12, cum 584, percent 0
length 13, cum 876, percent 0
length 14, cum 1321, percent 0
length 15, cum 1726, percent 0
length 16, cum 2289, percent 0
length 17, cum 3004, percent 0
length 18, cum 3618, percent 0
length 19, cum 4282, percent 0
length 20, cum 5024, percent 0
length 21, cum 6059, percent 1
length 22, cum 7114, percent 1
length 23, cum 8323, percent 1
length 24, cum 9885, percent 1
length 25, cum 11786, percent 1
length 26, cum 14184, percent 2
length 27, cum 16704, percent 2
length 28, cum 19470, percent 3
length 29, cum 23400, percent 3
length 30, cum 28310, percent 4
length 31, cum 35104, percent 5
length 32, cum 40083, percent 6
length 33, cum 46084, percent 7
length 34, cum 52765, percent 8
length 35, cum 59267, percent 9
length 36, cum 66206, percent 11
length 37, cum 73745, percent 12
length 38, cum 82686, percent 13
length 39, cum 93252, percent 15
length 40, cum 105353, percent 17
length 41, cum 118474, percent 19
length 42, cum 130596, percent 22
length 43, cum 143802, percent 24
length 44, cum 156347, percent 26
length 45, cum 169459, percent 28
length 46, cum 184349, percent 31
length 47, cum 199955, percent 33
length 48, cum 215665, percent 36
length 49, cum 232193, percent 39
length 50, cum 248706, percent 41
length 51, cum 263703, percent 44
length 52, cum 278759, percent 47
length 53, cum 294684, percent 49
length 54, cum 310449, percent 52
length 55, cum 326860, percent 55
length 56, cum 343146, percent 57
length 57, cum 359223, percent 60
length 58, cum 375097, percent 63
length 59, cum 389867, percent 65
length 60, cum 409873, percent 69
length 61, cum 421748, percent 71
length 62, cum 433091, percent 73
length 63, cum 443854, percent 74
length 64, cum 454093, percent 76
length 65, cum 463521, percent 78
length 66, cum 472333, percent 79
length 67, cum 480730, percent 81
length 68, cum 488440, percent 82
length 69, cum 494620, percent 83
length 70, cum 500712, percent 84
length 71, cum 506007, percent 85
length 72, cum 510948, percent 86
length 73, cum 515491, percent 86
length 74, cum 520553, percent 87
length 75, cum 524475, percent 88
length 76, cum 528410, percent 89
length 77, cum 532247, percent 89
length 78, cum 535946, percent 90
length 79, cum 539595, percent 90
length 80, cum 543056, percent 91
length 81, cum 546411, percent 92
length 82, cum 549284, percent 92
length 83, cum 552563, percent 93
length 84, cum 555361, percent 93
length 85, cum 558000, percent 94
length 86, cum 560718, percent 94
length 87, cum 563665, percent 95
length 88, cum 566046, percent 95
length 89, cum 568362, percent 95
length 90, cum 570559, percent 96
length 91, cum 572596, percent 96
length 92, cum 574224, percent 96
length 93, cum 575820, percent 97
length 94, cum 577350, percent 97
length 95, cum 578693, percent 97
length 96, cum 580016, percent 97
length 97, cum 581175, percent 98
length 98, cum 582349, percent 98
length 99, cum 583471, percent 98
length 100, cum 584385, percent 98
length 101, cum 585189, percent 98
length 102, cum 585926, percent 98
length 103, cum 586489, percent 98
length 104, cum 587178, percent 99
length 105, cum 587746, percent 99
length 106, cum 588257, percent 99
length 107, cum 588664, percent 99
length 108, cum 589181, percent 99
length 109, cum 589491, percent 99
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
* Re: dentry bloat.
2004-05-09 15:35 ` Linus Torvalds
2004-05-09 18:11 ` Matt Mackall
@ 2004-05-10 7:17 ` Florian Weimer
2004-05-10 14:12 ` Rik van Riel
2 siblings, 0 replies; 62+ messages in thread
From: Florian Weimer @ 2004-05-10 7:17 UTC (permalink / raw)
To: Linus Torvalds
Cc: Guennadi Liakhovetski, Andrew Morton, dipankar, manfred, davej,
wli, Kernel Mailing List, maneesh
* Linus Torvalds:
> so Andrew really must have a fairly different setup if he sees 20%
> filenames being > 23 characters.
A few Maildir folders are more than sufficient for that.
--
Current mail filters: many dial-up/DSL/cable modem hosts, and the
following domains: atlas.cz, bigpond.com, di-ve.com, hotmail.com,
jumpy.it, libero.it, netscape.net, postino.it, simplesnet.pt,
tiscali.co.uk, tiscali.cz, tiscali.it, voila.fr, yahoo.com.
* Re: dentry bloat.
2004-05-09 21:03 ` Matt Mackall
@ 2004-05-10 8:27 ` Helge Hafting
2004-05-10 8:32 ` Arjan van de Ven
0 siblings, 1 reply; 62+ messages in thread
From: Helge Hafting @ 2004-05-10 8:27 UTC (permalink / raw)
To: Matt Mackall; +Cc: linux-kernel
Matt Mackall wrote:
>One also wonders about whether all the RCU stuff is needed on UP. I'm
>not sure if I grok all the finepoints here, but it looks like the
>answer is no and that we can make struct_rcu head empty and have
>call_rcu fall directly through to the callback. This would save
>something like 16-32 bytes (32/64bit), not to mention a bunch of
>dinking around with lists and whatnot.
>
>So what am I missing?
>
>
Preempt can happen anytime, I believe.
Helge Hafting
* Re: dentry bloat.
2004-05-10 8:27 ` Helge Hafting
@ 2004-05-10 8:32 ` Arjan van de Ven
2004-05-10 9:46 ` Andrew Morton
0 siblings, 1 reply; 62+ messages in thread
From: Arjan van de Ven @ 2004-05-10 8:32 UTC (permalink / raw)
To: Helge Hafting; +Cc: Matt Mackall, linux-kernel
On Mon, 2004-05-10 at 10:27, Helge Hafting wrote:
> Matt Mackall wrote:
>
> >One also wonders about whether all the RCU stuff is needed on UP. I'm
> >not sure if I grok all the finepoints here, but it looks like the
> >answer is no and that we can make struct_rcu head empty and have
> >call_rcu fall directly through to the callback. This would save
> >something like 16-32 bytes (32/64bit), not to mention a bunch of
> >dinking around with lists and whatnot.
> >
> >So what am I missing?
> >
> >
> Preempt can happen anytime, I believe.
ok so for UP-non-preempt we can still get those 16 bytes back from the
dentry....
* Re: dentry bloat.
2004-05-10 8:32 ` Arjan van de Ven
@ 2004-05-10 9:46 ` Andrew Morton
2004-05-10 14:54 ` Matt Mackall
0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-10 9:46 UTC (permalink / raw)
To: arjanv; +Cc: helgehaf, mpm, linux-kernel
Arjan van de Ven <arjanv@redhat.com> wrote:
>
> On Mon, 2004-05-10 at 10:27, Helge Hafting wrote:
> > Matt Mackall wrote:
> >
> > >One also wonders about whether all the RCU stuff is needed on UP. I'm
> > >not sure if I grok all the finepoints here, but it looks like the
> > >answer is no and that we can make struct_rcu head empty and have
> > >call_rcu fall directly through to the callback. This would save
> > >something like 16-32 bytes (32/64bit), not to mention a bunch of
> > >dinking around with lists and whatnot.
> > >
> > >So what am I missing?
> > >
> > >
> > Preempt can happen anytime, I believe.
>
> ok so for UP-non-preempt we can still get those 16 bytes back from the
> dentry....
I suppose so. And on small SMP, really. We chose not to play those games
early on so the code got the best testing coverage.
* Re: dentry bloat.
2004-05-09 15:35 ` Linus Torvalds
2004-05-09 18:11 ` Matt Mackall
2004-05-10 7:17 ` Florian Weimer
@ 2004-05-10 14:12 ` Rik van Riel
2 siblings, 0 replies; 62+ messages in thread
From: Rik van Riel @ 2004-05-10 14:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: Guennadi Liakhovetski, Andrew Morton, dipankar, manfred, davej,
wli, Kernel Mailing List, maneesh
On Sun, 9 May 2004, Linus Torvalds wrote:
> so Andrew really must have a fairly different setup if he sees 20%
> filenames being > 23 characters.
Just look at the patch names in the -mm tree ;)
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
* Re: dentry bloat.
2004-05-10 9:46 ` Andrew Morton
@ 2004-05-10 14:54 ` Matt Mackall
2004-05-10 16:26 ` Paul E. McKenney
0 siblings, 1 reply; 62+ messages in thread
From: Matt Mackall @ 2004-05-10 14:54 UTC (permalink / raw)
To: Andrew Morton; +Cc: arjanv, helgehaf, linux-kernel
On Mon, May 10, 2004 at 02:46:58AM -0700, Andrew Morton wrote:
> Arjan van de Ven <arjanv@redhat.com> wrote:
> >
> > On Mon, 2004-05-10 at 10:27, Helge Hafting wrote:
> > > Matt Mackall wrote:
> > >
> > > >One also wonders about whether all the RCU stuff is needed on UP. I'm
> > > >not sure if I grok all the fine points here, but it looks like the
> > > >answer is no and that we can make struct rcu_head empty and have
> > > >call_rcu fall directly through to the callback. This would save
> > > >something like 16-32 bytes (32/64bit), not to mention a bunch of
> > > >dinking around with lists and whatnot.
> > > >
> > > >So what am I missing?
> > > >
> > > >
> > > Preempt can happen anytime, I believe.
> >
> > ok so for UP-non-preempt we can still get those 16 bytes back from the
> > dentry....
>
> I suppose so. And on small SMP, really. We chose not to play those games
> early on so the code got the best testing coverage.
Ok, I can spin something up. I'll start with a generic no-RCU-on-UP
and then we can think about the small SMP case a bit later.
--
Matt Mackall : http://www.selenic.com : Linux development and consulting
* Re: dentry bloat.
2004-05-10 14:54 ` Matt Mackall
@ 2004-05-10 16:26 ` Paul E. McKenney
2004-05-10 18:34 ` Dipankar Sarma
0 siblings, 1 reply; 62+ messages in thread
From: Paul E. McKenney @ 2004-05-10 16:26 UTC (permalink / raw)
To: Matt Mackall; +Cc: Andrew Morton, arjanv, helgehaf, linux-kernel
On Mon, May 10, 2004 at 09:54:04AM -0500, Matt Mackall wrote:
> On Mon, May 10, 2004 at 02:46:58AM -0700, Andrew Morton wrote:
> > Arjan van de Ven <arjanv@redhat.com> wrote:
> > >
> > > On Mon, 2004-05-10 at 10:27, Helge Hafting wrote:
> > > > Matt Mackall wrote:
> > > >
> > > > >One also wonders about whether all the RCU stuff is needed on UP. I'm
> > > > >not sure if I grok all the fine points here, but it looks like the
> > > > >answer is no and that we can make struct rcu_head empty and have
> > > > >call_rcu fall directly through to the callback. This would save
> > > > >something like 16-32 bytes (32/64bit), not to mention a bunch of
> > > > >dinking around with lists and whatnot.
> > > > >
> > > > >So what am I missing?
> > > > >
> > > > >
> > > > Preempt can happen anytime, I believe.
> > >
> > > ok so for UP-non-preempt we can still get those 16 bytes back from the
> > > dentry....
> >
> > I suppose so. And on small SMP, really. We chose not to play those games
> > early on so the code got the best testing coverage.
>
> Ok, I can spin something up. I'll start with a generic no-RCU-on-UP
> and then we can think about the small SMP case a bit later.
Hello, Matt,
You may wish to start with the dcache portion of Dipankar's earlier patch:
http://sourceforge.net/mailarchive/forum.php?thread_id=3644163&forum_id=730
It does not remove the extra rcu_head from the dentry, but does the
rest of the work.
Thanx, Paul
* Re: dentry bloat.
2004-05-10 16:26 ` Paul E. McKenney
@ 2004-05-10 18:34 ` Dipankar Sarma
0 siblings, 0 replies; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-10 18:34 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Matt Mackall, Andrew Morton, arjanv, helgehaf, linux-kernel
On Mon, May 10, 2004 at 09:26:45AM -0700, Paul E. McKenney wrote:
> On Mon, May 10, 2004 at 09:54:04AM -0500, Matt Mackall wrote:
> > Ok, I can spin something up. I'll start with a generic no-RCU-on-UP
> > and then we can think about the small SMP case a bit later.
>
> Hello, Matt,
>
> You may wish to start with the dcache portion of Dipankar's earlier patch:
>
> http://sourceforge.net/mailarchive/forum.php?thread_id=3644163&forum_id=730
>
> It does not remove the extra rcu_head from the dentry, but does the
> rest of the work.
Matt and I had discussed this privately some time ago and I will
send him somewhat cleaned-up patches that introduce call_rcu_direct(),
which, in a UP kernel, invokes the callback directly. The corresponding
read-side critical section will have to protect itself with
rcu_read_lock_bh() or rcu_read_lock_irq() if an update can happen
from softirq or irq context.
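[Editor's note: the UP shortcut described here can be sketched roughly as follows. This is a hypothetical userspace illustration, not Dipankar's actual patch; the `CONFIG_SMP` switch, the field layout, and the demo helpers are assumptions made for the sketch.]

```c
/* On SMP, rcu_head carries queueing state for the deferred callback.
 * On UP non-preempt it could be empty (saving those bytes per dentry),
 * because nothing can be inside an RCU read-side critical section at
 * the point where call_rcu_direct() runs.  The fields are kept here
 * unconditionally only so this sketch compiles either way. */
struct rcu_head {
	struct rcu_head *next;			/* unused on UP */
	void (*func)(struct rcu_head *head);
};

static void call_rcu_direct(struct rcu_head *head,
			    void (*func)(struct rcu_head *head))
{
#ifdef CONFIG_SMP
	head->func = func;
	head->next = NULL;	/* real code: queue until a grace period elapses */
#else
	func(head);		/* UP: invoke the callback immediately */
#endif
}

/* Tiny demo: on a UP build the callback runs synchronously. */
static int demo_ran;
static void demo_cb(struct rcu_head *head) { (void)head; demo_ran = 1; }

int demo_call_rcu_direct(void)
{
	struct rcu_head h = { 0 };
	demo_ran = 0;
	call_rcu_direct(&h, demo_cb);
	return demo_ran;	/* 1: callback already ran */
}
```

On an SMP build the same call would have to go through the normal deferred queueing, which is why the read side then needs the stronger protection described above.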
Thanks
Dipankar
* Re: dentry bloat.
2004-05-09 22:17 ` viro
2004-05-09 22:27 ` Andrew Morton
@ 2004-05-10 18:39 ` Dipankar Sarma
2004-05-11 5:17 ` Maneesh Soni
1 sibling, 1 reply; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-10 18:39 UTC (permalink / raw)
To: viro
Cc: Manfred Spraul, Linus Torvalds, Andrew Morton, davej, wli,
Kernel Mailing List, Maneesh Soni
On Sun, May 09, 2004 at 11:17:12PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote:
>
> > Actually, what may happen is that since the dentries are added
> > in the front, a double move like that would result in hash chain
> > traversal looping. Timing dependent and unlikely, but d_move_count
> > avoided that theoretical possibility. It is not about skipping
> > dentries which is safe because a miss would result in a real_lookup()
>
> Not really. A miss could result in getting another dentry allocated
> for the same e.g. directory, which is *NOT* harmless at all.
AFAICS, a miss in __d_lookup would result either in a repeat lookup
under dcache_lock, in which case we are safe, or in real_lookup(),
which in turn does another lookup with dcache_lock. Is there
a path that I am missing here?
Thanks
Dipankar
* Re: dentry bloat.
2004-05-10 18:39 ` Dipankar Sarma
@ 2004-05-11 5:17 ` Maneesh Soni
0 siblings, 0 replies; 62+ messages in thread
From: Maneesh Soni @ 2004-05-11 5:17 UTC (permalink / raw)
To: Dipankar Sarma
Cc: viro, Manfred Spraul, Linus Torvalds, Andrew Morton, davej, wli,
Kernel Mailing List
On Tue, May 11, 2004 at 12:09:25AM +0530, Dipankar Sarma wrote:
> On Sun, May 09, 2004 at 11:17:12PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote:
> > On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote:
> >
> > > Actually, what may happen is that since the dentries are added
> > > in the front, a double move like that would result in hash chain
> > > traversal looping. Timing dependent and unlikely, but d_move_count
> > > avoided that theoretical possibility. It is not about skipping
> > > dentries which is safe because a miss would result in a real_lookup()
> >
> > Not really. A miss could result in getting another dentry allocated
> > for the same e.g. directory, which is *NOT* harmless at all.
>
> AFAICS, a miss in __d_lookup would result in a repeat lookup
> under dcache_lock in which case we are safe or real_lookup()
> which in turn does another lookup with dcache_lock. Is there
> a path that I am missing here ?
Actually, real_lookup() is done with the parent's i_sem held (avoiding a
rename in the same directory), and it also uses rename_lock (a sequence
lock), which provides protection against d_move()
(real_lookup() --> d_lookup() --> __d_lookup()).
So since real_lookup() does repeat the cached lookup, I don't see any chance
of missing a hashed dentry and allocating one more dentry with the same name.
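[Editor's note: the rename_lock retry loop that d_lookup() wraps around __d_lookup() can be mimicked with a tiny userspace seqlock. This is a simplified single-threaded sketch; the "racing rename" is simulated inline, and the helpers mirror but do not reproduce the kernel's implementation.]

```c
/* Minimal seqlock in the style of the kernel's rename_lock: writers
 * bump the sequence twice (odd while in progress); readers retry if
 * the sequence was odd or changed during their walk. */
typedef struct { unsigned sequence; } seqlock_t;

static unsigned read_seqbegin(const seqlock_t *sl) { return sl->sequence; }
static int read_seqretry(const seqlock_t *sl, unsigned seq)
{
	return (seq & 1) || sl->sequence != seq;
}
static void write_seqlock(seqlock_t *sl)   { sl->sequence++; }
static void write_sequnlock(seqlock_t *sl) { sl->sequence++; }

static seqlock_t rename_lock;
static int lookup_calls;

/* d_lookup()-shaped loop: if a (simulated) d_move() raced with the
 * hash-chain walk, the walk is simply repeated.  Returns the number
 * of retries that were needed. */
int demo_d_lookup(void)
{
	unsigned seq;
	int retries = 0;
	do {
		seq = read_seqbegin(&rename_lock);
		/* ... walk the hash chain here (simulated miss) ... */
		if (++lookup_calls == 1) {	/* a rename races the first pass */
			write_seqlock(&rename_lock);
			write_sequnlock(&rename_lock);
		}
	} while (read_seqretry(&rename_lock, seq) && ++retries < 10);
	return retries;	/* 1: exactly one retry after the racing rename */
}
```

This is why a __d_lookup miss cannot silently coexist with a concurrent d_move(): the sequence check forces the cached lookup to be repeated.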
Thanks
Maneesh
--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/
* Re: dentry bloat.
2004-05-09 22:27 ` Andrew Morton
@ 2004-05-11 5:26 ` Maneesh Soni
0 siblings, 0 replies; 62+ messages in thread
From: Maneesh Soni @ 2004-05-11 5:26 UTC (permalink / raw)
To: Andrew Morton; +Cc: viro, dipankar, manfred, torvalds, davej, wli, linux-kernel
On Sun, May 09, 2004 at 03:27:20PM -0700, Andrew Morton wrote:
> viro@parcelfarce.linux.theplanet.co.uk wrote:
> >
> > On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote:
> >
> > > Actually, what may happen is that since the dentries are added
> > > in the front, a double move like that would result in hash chain
> > > traversal looping. Timing dependent and unlikely, but d_move_count
> > > avoided that theoretical possibility. It is not about skipping
> > > dentries which is safe because a miss would result in a real_lookup()
> >
> > Not really. A miss could result in getting another dentry allocated
> > for the same e.g. directory, which is *NOT* harmless at all.
>
> The d_bucket logic does look a bit odd.
>
> dentry = hlist_entry(node, struct dentry, d_hash);
>
> /* if lookup ends up in a different bucket
> * due to concurrent rename, fail it
> */
> if (unlikely(dentry->d_bucket != head))
> break;
>
> /*
> * We must take a snapshot of d_move_count followed by
> * read memory barrier before any search key comparison
> */
> move_count = dentry->d_move_count;
>
> There is a window between the d_bucket test and sampling of d_move_count.
> What happens if the dentry gets moved around in there?
>
> Anyway, regardless of that, it is more efficient to test d_bucket _after_
> performing the hash comparison. And it seems saner to perform the d_bucket
> check when things are pinned down by d_lock.
>
This should be fine. Earlier, the d_bucket check was done before "continue"
because the lookup used to loop infinitely. The reason for the infinite
looping was the lookup moving to a different hash bucket due to a concurrent
d_move() and never finding the list head from where it started.
After the introduction of hlist, there is less chance of the lookup looping
infinitely even if it is moved to a different hash bucket, as an hlist ends
with NULL.
But I still see a theoretical possibility of increased looping: a double
rename can keep putting the lookup back at the head of a hash chain so that
the hlist end is never seen.
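[Editor's note: the hlist property being relied on can be shown in a few lines. This is an illustrative userspace sketch, not the kernel's list implementation; the demo helper is an assumption made for the example.]

```c
/* An hlist chain terminates in NULL, so a walk that a concurrent
 * d_move() shifts onto another chain still ends.  The old circular
 * list terminated only on reaching the head it started from, which
 * a moved walk could never find -- hence the infinite loop. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

int hlist_walk_length(const struct hlist_head *head)
{
	int n = 0;
	const struct hlist_node *pos;
	for (pos = head->first; pos; pos = pos->next)
		n++;	/* __d_lookup compares hashes/names here */
	return n;
}

/* Demo: a three-entry chain is walked to its NULL end and stops. */
int demo_hlist_walk(void)
{
	struct hlist_node n3 = { 0 }, n2 = { &n3 }, n1 = { &n2 };
	struct hlist_head head = { &n1 };
	return hlist_walk_length(&head);
}
```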
--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/
* Re: dentry bloat.
2004-10-06 12:58 ` Maneesh Soni
@ 2004-05-11 20:22 ` Andrew Morton
2004-05-14 10:33 ` Raghavan
0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-11 20:22 UTC (permalink / raw)
To: maneesh; +Cc: dipankar, torvalds, manfred, davej, wli, linux-kernel
Maneesh Soni <maneesh@in.ibm.com> wrote:
>
> We can see this happening in the following numbers taken using dcachebench*
> gathered on a 2-way P4 Xeon 2.4GHz SMP box with 4.5GB RAM. The benchmark was run
> with the following parameters and averaged over 10 runs.
> ./dcachebench -p 32 -b testdir
>
>              Average microseconds/iteration   Std. deviation
>              (lower is better)
> 2.6.6        10204                            161.5
> 2.6.6-mm1    10571                             51.5
>
Well.. this could be anything. If the hash is any good -mm shouldn't be
doing significantly more locked operations. (I think - didn't check very
closely).
Also the inode and dentry hash algorithms were changed in -mm. You can
evaluate the effect of that by comparing 2.6.6 with current Linus -bk.
If we compare 2.6.6-bk with 2.6.6-mm1 and see a slowdown on SMP and no
slowdown on UP then yup, it might be due to additional locking.
But we should evaluate the hash changes separately.
Summary:
2.6.6-rc3: baseline
2.6.6: dentry size+alignment changes
2.6.6-bk: dentry size+alignment changes, hash changes
2.6.6-mm1: dentry size+alignment changes, hash changes, lots of other stuff.
* Re: dentry bloat.
2004-05-11 20:22 ` Andrew Morton
@ 2004-05-14 10:33 ` Raghavan
2004-05-14 10:50 ` Paul Jackson
2004-05-14 11:18 ` Dipankar Sarma
0 siblings, 2 replies; 62+ messages in thread
From: Raghavan @ 2004-05-14 10:33 UTC (permalink / raw)
To: Andrew Morton
Cc: maneesh, dipankar, torvalds, manfred, davej, wli, linux-kernel
Hi,
The following is the output of dcachebench run on the 4 kernels:
Environment - 2-way P4 Xeon 2.4GHz SMP box with 4.5GB RAM.
Tests were run for 10 iterations to calculate the milliseconds/iteration
and then mean and deviation were calculated.
Kernel version          Mean    Standard deviation
--------------          ----    ------------------
2.6.6-rc3 (baseline)    10578   221
2.6.6                   10280   110
2.6.6-bk                10862    30
2.6.6-mm1               10626    36
To find out if the huge performance dip between the 2.6.6
and 2.6.6-bk is because of the hash changes, I removed the hash patch
from 2.6.6-bk and applied it to 2.6.6.
2.6.6-bk with old hash  10685    34
2.6.6 with new hash     10496   125
Looks like the new hashing function has brought down the performance.
Also some code outside dcache.c and inode.c seems to have pushed down
the performance in 2.6.6-bk.
Raghav
On Tue, May 11, 2004 at 01:22:05PM -0700, Andrew Morton wrote:
> Maneesh Soni <maneesh@in.ibm.com> wrote:
> >
> > We can see this happening in the following numbers taken using dcachebench*
> > gathered on a 2-way P4 Xeon 2.4GHz SMP box with 4.5GB RAM. The benchmark was run
> > with the following parameters and averaged over 10 runs.
> > ./dcachebench -p 32 -b testdir
> >
> >              Average microseconds/iteration   Std. deviation
> >              (lower is better)
> > 2.6.6        10204                            161.5
> > 2.6.6-mm1    10571                             51.5
> >
>
> Well.. this could be anything. If the hash is any good -mm shouldn't be
> doing significantly more locked operations. (I think - didn't check very
> closely).
>
> Also the inode and dentry hash algorithms were changed in -mm. You can
> evaluate the effect of that by comparing 2.6.6 with current Linus -bk.
>
> If we compare 2.6.6-bk with 2.6.6-mm1 and see a slowdown on SMP and no
> slowdown on UP then yup, it might be due to additional locking.
>
> But we should evaluate the hash changes separately.
>
> Summary:
>
> 2.6.6-rc3: baseline
> 2.6.6: dentry size+alignment changes
> 2.6.6-bk: dentry size+alignment changes, hash changes
> 2.6.6-mm1: dentry size+alignment changes, hash changes, lots of other stuff.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
* Re: dentry bloat.
2004-05-14 10:33 ` Raghavan
@ 2004-05-14 10:50 ` Paul Jackson
2004-05-14 11:04 ` Jens Axboe
2004-05-14 11:18 ` Dipankar Sarma
1 sibling, 1 reply; 62+ messages in thread
From: Paul Jackson @ 2004-05-14 10:50 UTC (permalink / raw)
To: raghav; +Cc: akpm, maneesh, dipankar, torvalds, manfred, davej, wli,
linux-kernel
> Looks like the new hashing function has brought down the performance.
"brought down the performance" as in "better, faster", right?
That is, your numbers show that the new hashing function is good, right?
Or do I have it backwards?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
* Re: dentry bloat.
2004-05-14 10:50 ` Paul Jackson
@ 2004-05-14 11:04 ` Jens Axboe
2004-05-14 11:14 ` Paul Jackson
0 siblings, 1 reply; 62+ messages in thread
From: Jens Axboe @ 2004-05-14 11:04 UTC (permalink / raw)
To: Paul Jackson
Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli,
linux-kernel
On Fri, May 14 2004, Paul Jackson wrote:
> > Looks like the new hashing function has brought down the performance.
>
> "brought down the performance" as in "better, faster", right?
>
> That is, your numbers show that the new hashing function is good, right?
>
> Or do I have it backwards?
The numbers were ms/iteration, so I guess you do.
--
Jens Axboe
* Re: dentry bloat.
2004-05-14 11:04 ` Jens Axboe
@ 2004-05-14 11:14 ` Paul Jackson
2004-05-14 11:24 ` Jens Axboe
2004-05-14 11:24 ` Dipankar Sarma
0 siblings, 2 replies; 62+ messages in thread
From: Paul Jackson @ 2004-05-14 11:14 UTC (permalink / raw)
To: Jens Axboe
Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli,
linux-kernel
> so I guess you do.
Sorry - I'm being thick.
Is the new hashing good or bad?
(Usually, performance is thought of as something 'good', so when you say
it is 'brought down', that sounds 'bad', but since it's ms/iteration,
I'm guessing that you mean to say that the ms/iteration is lower, which
would I guess improves performance, so I'm guessing that bringing
performance down is 'good' in this case, which is not idiomatic to the
particular version of English I happen to speak ... So please favor
this poor old brain of mine and state outright whether the new hash is
good or bad. Does the new hash make performance better or worse?)
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
* Re: dentry bloat.
2004-05-14 10:33 ` Raghavan
2004-05-14 10:50 ` Paul Jackson
@ 2004-05-14 11:18 ` Dipankar Sarma
2004-05-14 14:44 ` Linus Torvalds
1 sibling, 1 reply; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-14 11:18 UTC (permalink / raw)
To: Raghavan
Cc: Andrew Morton, maneesh, torvalds, manfred, davej, wli,
linux-kernel
On Fri, May 14, 2004 at 04:03:23PM +0530, Raghavan wrote:
>
> Environment - 2-way P4 Xeon 2.4GHz SMP box with 4.5GB RAM.
> Tests were run for 10 iterations to calculate the milliseconds/iteration
> and then mean and deviation were calculated.
I think it is microseconds/iteration. The lower, the better.
> Kernel version Mean Standard Deviation
> --------------- ---- ------------------
>
> 2.6.6-rc3(baseline) 10578 221
>
> 2.6.6 10280 110
So alignment changes helped.
>
> 2.6.6-bk 10862 30
Hash function changes regressed.
> 2.6.6-mm1 10626 36
dentry size change patchset helps.
> To find out if the huge performance dip between the 2.6.6
> and 2.6.6-bk is because of the hash changes, I removed the hash patch
> from 2.6.6-bk and applied it to 2.6.6.
>
> 2.6.6-bk with old hash 10685 34
>
> 2.6.6 with new hash 10496 125
>
> Looks like the new hashing function has brought down the performance.
> Also some code outside dcache.c and inode.c seems to have pushed down
> the performance in 2.6.6-bk.
OK, I am confused. These numbers show that the new hash function
is better. It contradicts your conclusion. And why are you
comparing 2.6.6-bk+old hash with 2.6.6+new hash? Why not
2.6.6-bk vs. 2.6.6-bk-with-old-hash?
Thanks
Dipankar
* Re: dentry bloat.
2004-05-14 11:14 ` Paul Jackson
@ 2004-05-14 11:24 ` Jens Axboe
2004-05-14 11:30 ` Paul Jackson
2004-05-14 11:24 ` Dipankar Sarma
1 sibling, 1 reply; 62+ messages in thread
From: Jens Axboe @ 2004-05-14 11:24 UTC (permalink / raw)
To: Paul Jackson
Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli,
linux-kernel
On Fri, May 14 2004, Paul Jackson wrote:
> > so I guess you do.
>
> Sorry - I'm being thick.
>
> Is the new hashing good or bad?
>
> (Usually, performance is thought of as something 'good', so when you say
> it is 'brought down', that sounds 'bad', but since it's ms/iteration,
> I'm guessing that you mean to say that the ms/iteration is lower, which
> would I guess improves performance, so I'm guessing that bringing
> performance down is 'good' in this case, which is not idiomatic to the
> particular version of English I happen to speak ... So please favor
> this poor old brain of mine and state outright whether the new hash is
> good or bad. Does the new hash make performance better or worse?)
:-)
I can only say that the way I read the numbers, the new hashing scores a
higher ms/iteration, which is a bad thing. So when it is stated that
'performance is brought down' I completely agree that it describes the
situation: performance is worse than before.
The first table shows 2.6.6 (with the old hash) doing better than 2.6.6-bk
with the new hash. It then shows 2.6.6-bk with the old hash still doing
worse than 2.6.6, so it's not just the hash that has slowed things down.
2.6.6-new_hash does worse than 2.6.6-stock.
--
Jens Axboe
* Re: dentry bloat.
2004-05-14 11:14 ` Paul Jackson
2004-05-14 11:24 ` Jens Axboe
@ 2004-05-14 11:24 ` Dipankar Sarma
1 sibling, 0 replies; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-14 11:24 UTC (permalink / raw)
To: Paul Jackson
Cc: Jens Axboe, raghav, akpm, maneesh, torvalds, manfred, davej, wli,
linux-kernel
On Fri, May 14, 2004 at 04:14:33AM -0700, Paul Jackson wrote:
> > so I guess you do.
>
> Sorry - I'm being thick.
>
> Is the new hashing good or bad?
That I don't know from the contradicting descriptions, but...
> (Usually, performance is thought of as something 'good', so when you say
> it is 'brought down', that sounds 'bad', but since it's ms/iteration,
> I'm guessing that you mean to say that the ms/iteration is lower, which
> would I guess improves performance, so I'm guessing that bringing
> performance down is 'good' in this case, which is not idiomatic to the
> particular version of English I happen to speak ... So please favor
In this case, the performance is inversely proportional to the
benchmark metric. Lower benchmark metric means better performance.
Thanks
Dipankar
* Re: dentry bloat.
2004-05-14 11:24 ` Jens Axboe
@ 2004-05-14 11:30 ` Paul Jackson
0 siblings, 0 replies; 62+ messages in thread
From: Paul Jackson @ 2004-05-14 11:30 UTC (permalink / raw)
To: Jens Axboe
Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli,
linux-kernel
> I can only say the way I read the numbers,
Ah - it wasn't your reading of the numbers that I had in doubt.
It was my ability to read that I doubted.
> ... performance is worse than before.
> ... the hash that has slowed things down
Ok - that's clear. Thank-you for your patience.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373
* Re: dentry bloat.
2004-05-14 11:18 ` Dipankar Sarma
@ 2004-05-14 14:44 ` Linus Torvalds
0 siblings, 0 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-14 14:44 UTC (permalink / raw)
To: Dipankar Sarma
Cc: Raghavan, Andrew Morton, maneesh, manfred, davej, wli,
linux-kernel
On Fri, 14 May 2004, Dipankar Sarma wrote:
> >
> > 2.6.6 10280 110
> >
> > 2.6.6-bk 10862 30
>
> > To find out if the huge performance dip between the 2.6.6
> > and 2.6.6-bk is because of the hash changes, I removed the hash patch
> > from 2.6.6-bk and applied it to 2.6.6.
> >
> > 2.6.6-bk with old hash 10685 34
> >
> > 2.6.6 with new hash 10496 125
> >
> > Looks like the new hashing function has brought down the performance.
> > Also some code outside dcache.c and inode.c seems to have pushed down
> > the performance in 2.6.6-bk.
>
> OK, I am confused. These numbers show that the new hash function
> is better.
No, look again.
              old hash   new hash
2.6.6:        10280      10496
2.6.6-bk:     10685      10862
In both cases, the new hash makes each iteration about 200us (or whatever
the metric is) slower.
There's something _else_ going on too, since plain 2.6.6 is so clearly
better than the others. I don't know why - the only thing -bk does is the
hash change and some changes to GFP_NOFS behaviour (different return value
from shrink_dentries or whatever). Which shouldn't even trigger, I'd have
assumed this is all cached.
NOTE! Just "simple" things like variations in I$ layout of the kernel code
can make a pretty big difference if you're unlucky. And the new dentry
code doesn't align the things on a D$ cache line boundary, so there could
easily be "consistent" differences from that - just from the order of
dentries allocated etc.
But it's interesting to note that the hash does make a difference. The old
hash was very much optimized for simplicity (those hash-calculation
routines are some of the hottest in the kernel). But I don't see that a
few extra instructions should make that big of a difference.
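[Editor's note: for reference, the "optimized for simplicity" name-hash helpers of that era were only a shift-add-shift and a multiply by 11 per character, folded over the whole name. Reproduced below as a compilable userspace copy of the include/linux/dcache.h helpers; the standalone form is an adaptation.]

```c
/* Hash one character into the running hash: cheap shift/add plus a
 * multiply by 11.  These routines are among the hottest in the kernel,
 * which is why simplicity was the priority. */
static inline unsigned long init_name_hash(void)
{
	return 0;
}

static inline unsigned long partial_name_hash(unsigned long c,
					      unsigned long prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

static inline unsigned long end_name_hash(unsigned long hash)
{
	return (unsigned int)hash;	/* fold to the bucket-index width */
}

/* Hash a full path component, character by character. */
unsigned long full_name_hash(const unsigned char *name, unsigned int len)
{
	unsigned long hash = init_name_hash();
	while (len--)
		hash = partial_name_hash(*name++, hash);
	return end_name_hash(hash);
}
```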
Linus
* Re: dentry bloat.
2004-05-08 20:13 ` Dipankar Sarma
@ 2004-10-06 12:58 ` Maneesh Soni
2004-05-11 20:22 ` Andrew Morton
0 siblings, 1 reply; 62+ messages in thread
From: Maneesh Soni @ 2004-10-06 12:58 UTC (permalink / raw)
To: Dipankar Sarma
Cc: Linus Torvalds, Andrew Morton, manfred, davej, wli, linux-kernel
On Sat, May 08, 2004 at 08:30:55PM +0000, Dipankar Sarma wrote:
> On Sat, May 08, 2004 at 12:13:09PM -0700, Linus Torvalds wrote:
> > On Sat, 8 May 2004, Andrew Morton wrote:
> > >
> > > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> > > serialise against d_move(), fixing the problem which you mention, and also
> > > makes d_movecount go away.
> >
> > If you do that, RCU basically loses most of it's meaning.
> >
> > You'll be taking a lock for - and dirtying in the cache - every single
> > dentry on the hash chain, which is pretty much guaranteed to be slower
> > than just taking the dcache_lock _once_, even if that one jumps across
> > CPU's a lot.
> >
> > In other words, no, I don't think that's a good idea. We really want to
> > take the dentry lock only after we're pretty sure we have the right
> > dentry. Otherwise the dentry chains will be bouncing from CPU to CPU all
> > the time.
>
> Exactly. Taking ->d_lock for every dentry while traversing the hash
> chain should be avoided. As such, atomic operations on that path
> are getting costly.
We can see this happening in the following numbers taken using dcachebench*
gathered on a 2-way P4 Xeon 2.4GHz SMP box with 4.5GB RAM. The benchmark was run
with the following parameters and averaged over 10 runs.
./dcachebench -p 32 -b testdir
             Average microseconds/iteration   Std. deviation
             (lower is better)
2.6.6        10204                            161.5
2.6.6-mm1    10571                             51.5
*dcachebench is a microbenchmark written by Bill Hartner and is available at
http://www-124.ibm.com/developerworks/opensource/linuxperf/dcachebench/dcachebench.html
--
Maneesh Soni
Linux Technology Center,
IBM Software Lab, Bangalore, India
email: maneesh@in.ibm.com
Phone: 91-80-25044999 Fax: 91-80-25268553
T/L : 9243696