* Re: dentry bloat.

From: Andrew Morton @ 2004-05-08  8:23 UTC
To: Manfred Spraul
Cc: davej, torvalds, wli, linux-kernel

(added lkml)

Manfred Spraul <manfred@colorfullife.com> wrote:
>
> Andrew Morton wrote:
>
> > It does seem a little excessive.  Maybe we should round the size up by "the
> > next multiple of sizeof(void*) which is greater than 32".
>
> Sounds good.
> But the SLAB_HWCACHE_ALIGN parameter from the kmem_cache_create call
> must be removed as well - otherwise slab will round the size up to 256
> again.

There are a couple of things I don't understand in the dentry name handling:

a) How come switch_names() does a memcpy of d_iname even if the name might
   not be in there?

b) Why does d_alloc() round the storage for the out-of-line name up to the
   next multiple of 16?

c) Why is it that when we generate an out-of-line name we allocate a
   complete new qstr rather than just storage for the name?  The dentry has
   a whole qstr just sitting there doing nothing, and from a quick squizz
   it seems that we could simply remove dentry.d_qstr, make
   dentry.d_name.name point at the kmalloced name and not allocate all the
   other parts of the qstr?

Anyway, here be a patch.


Save a bit of dcache space.

Currently we use all the space from the start of dentry.d_name to the next
cacheline boundary for the inline name.  That can be 128 bytes or more
nowadays, which is excessive.  In fact it is exactly 128 bytes of inline
name space for an x86 P4 build (both UP and SMP).

Rework things so that the inline name length is between 32 and 47 bytes.
On SMP P4-compiled x86 each dentry goes from 256 bytes (15 per page) to
160 bytes (24 per page).

The downside is that we'll have more out-of-line names.
Here's the histogram of name lengths on all 1.5M files on my workstation
(cumulative percentages):

   1:  0%    2:  0%    3:  1%    4:  5%    5:  8%    6: 13%    7: 19%    8: 26%
   9: 33%   10: 42%   11: 49%   12: 55%   13: 60%   14: 64%   15: 67%   16: 69%
  17: 71%   18: 73%   19: 75%   20: 76%   21: 78%   22: 79%   23: 80%   24: 81%
  25: 82%   26: 83%   27: 85%   28: 86%   29: 87%   30: 88%   31: 89%   32: 90%
  33: 91%   34: 92%   35: 93%   36: 94%   37: 95%   38: 96%   39: 96%   40: 96%
  41: 96%   42: 96%   43: 96%   44: 97%   45: 97%   46: 97%   47: 97%   48: 97%
  49: 98%   50: 98%   51: 98%   52: 98%   53: 98%   54: 98%   55: 98%   56: 98%
  57: 98%   58: 98%   59: 98%   60: 99%   61: 99%   62: 99%   63: 99%   64: 99%

So on x86 we'll fit 89% of filenames into the inline name.

The patch also removes the NAME_ALLOC_LEN() rounding-up of the storage for
the out-of-line names.  That seems unnecessary.

---

 25-akpm/fs/dcache.c            |   16 ++++++++++------
 25-akpm/include/linux/dcache.h |    8 ++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff -puN fs/dcache.c~dentry-shrinkage fs/dcache.c
--- 25/fs/dcache.c~dentry-shrinkage	2004-05-08 00:55:58.762312080 -0700
+++ 25-akpm/fs/dcache.c	2004-05-08 00:56:22.590689616 -0700
@@ -42,6 +42,13 @@ EXPORT_SYMBOL(dcache_lock);
 static kmem_cache_t *dentry_cache;
 
 /*
+ * The allocation size for each dentry.  It is a multiple of 16 bytes.  We
+ * leave the final 32-47 bytes for the inline name.
+ */
+#define DENTRY_STORAGE	(((sizeof(struct dentry)+32) + 15) & ~15)
+#define DNAME_INLINE_LEN	(DENTRY_STORAGE - sizeof(struct dentry))
+
+/*
  * This is the single most critical data structure when it comes
  * to the dcache: the hashtable for lookups. Somebody should try
  * to make this good - I've just made it work.
@@ -665,8 +672,6 @@ static int shrink_dcache_memory(int nr,
 	return dentry_stat.nr_unused;
 }
 
-#define NAME_ALLOC_LEN(len)	((len+16) & ~15)
-
 /**
  * d_alloc	-	allocate a dcache entry
  * @parent: parent of entry to allocate
@@ -688,8 +693,7 @@ struct dentry * d_alloc(struct dentry *
 		return NULL;
 
 	if (name->len > DNAME_INLINE_LEN-1) {
-		qstr = kmalloc(sizeof(*qstr) + NAME_ALLOC_LEN(name->len),
-				GFP_KERNEL);
+		qstr = kmalloc(sizeof(*qstr) + name->len + 1, GFP_KERNEL);
 		if (!qstr) {
 			kmem_cache_free(dentry_cache, dentry);
 			return NULL;
@@ -1563,8 +1567,8 @@ static void __init dcache_init(unsigned
 	 * flag could be removed here, to hint to the allocator that
 	 * it should not try to get multiple page regions.
 	 */
-	dentry_cache = kmem_cache_create("dentry_cache", sizeof(struct dentry),
-			0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
+	dentry_cache = kmem_cache_create("dentry_cache", DENTRY_STORAGE,
+			0, SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
 			NULL, NULL);
 
 	set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
diff -puN include/linux/dcache.h~dentry-shrinkage include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-shrinkage	2004-05-08 00:55:58.774310256 -0700
+++ 25-akpm/include/linux/dcache.h	2004-05-08 00:55:58.779309496 -0700
@@ -74,8 +74,6 @@ full_name_hash(const unsigned char *name
 	return end_name_hash(hash);
 }
 
-#define DNAME_INLINE_LEN_MIN 16
-
 struct dcookie_struct;
 
 struct dentry {
@@ -101,11 +99,9 @@ struct dentry {
 	struct qstr d_name;
 	struct hlist_node d_hash;	/* lookup hash list */
 	struct hlist_head * d_bucket;	/* lookup hash bucket */
-	unsigned char d_iname[DNAME_INLINE_LEN_MIN];	/* small names */
-} ____cacheline_aligned;
+	unsigned char d_iname[0];	/* small names */
+};
 
-#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
-
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
 	int (*d_hash) (struct dentry *, struct qstr *);
_

^ permalink raw reply	[flat|nested] 62+ messages in thread
* Re: dentry bloat.

From: Andrew Morton @ 2004-05-08  9:23 UTC
To: manfred, davej, torvalds, wli, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:
>
> There are a couple of things I don't understand in the dentry name handling:
>
> a) How come switch_names() does a memcpy of d_iname even if the name might
>    not be in there?

OK - that's just in case the target dentry's name was internal.

> b) Why does d_alloc() resize the storage for the out-of-line name to the
>    next multiple of 16?

No reason I can see.

> c) Why is it that when we generate an out-of-line name we allocate a
>    complete new qstr rather than just storage for the name?  The dentry has
>    a whole qstr just sitting there doing nothing, and from a quick squizz
>    it seems that we could simply remove dentry.d_qstr, make
>    dentry.d_name.name point at the kmalloced name and not allocate all the
>    other parts of the qstr?

Great idea!  Here be another patch.


When dentries are given an external name we currently allocate an entire
qstr for the external name.  This isn't needed.  We can use the dentry's
internal qstr and kmalloc only the string itself.

This saves 12 bytes from externally-allocated names and 4 bytes from the
dentry itself.

The saving of 4 bytes from the dentry doesn't actually decrease the
dentry's storage requirements, but it makes four more bytes available for
internal names, taking the internal/external ratio from 89% up to 93% on
my 1.5M files.
---

 25-akpm/fs/dcache.c            |   94 +++++++++++++++++++++--------------------
 25-akpm/include/linux/dcache.h |    8 +--
 2 files changed, 53 insertions(+), 49 deletions(-)

diff -puN include/linux/dcache.h~dentry-qstr-consolidation include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-qstr-consolidation	2004-05-08 02:16:19.700418088 -0700
+++ 25-akpm/include/linux/dcache.h	2004-05-08 02:16:19.706417176 -0700
@@ -29,10 +29,9 @@ struct vfsmount;
  * saves "metadata" about the string (ie length and the hash).
  */
 struct qstr {
-	const unsigned char * name;
+	const unsigned char *name;
 	unsigned int len;
 	unsigned int hash;
-	char name_str[0];
 };
 
 struct dentry_stat_t {
@@ -94,7 +93,6 @@ struct dentry {
 	struct rcu_head d_rcu;
 	struct dcookie_struct * d_cookie; /* cookie, if any */
 	unsigned long d_move_count;	/* to indicated moved dentry while lockless lookup */
-	struct qstr * d_qstr;		/* quick str ptr used in lockless lookup and concurrent d_move */
 	struct dentry * d_parent;	/* parent directory */
 	struct qstr d_name;
 	struct hlist_node d_hash;	/* lookup hash list */
@@ -184,9 +182,9 @@ static inline void d_drop(struct dentry
 	spin_unlock(&dcache_lock);
 }
 
-static inline int dname_external(struct dentry *d)
+static inline int dname_external(struct dentry *dentry)
 {
-	return d->d_name.name != d->d_iname;
+	return dentry->d_name.name != dentry->d_iname;
 }
 
 /*
diff -puN fs/dcache.c~dentry-qstr-consolidation fs/dcache.c
--- 25/fs/dcache.c~dentry-qstr-consolidation	2004-05-08 02:16:19.702417784 -0700
+++ 25-akpm/fs/dcache.c	2004-05-08 02:16:19.709416720 -0700
@@ -73,9 +73,8 @@ static void d_callback(void *arg)
 {
 	struct dentry * dentry = (struct dentry *)arg;
 
-	if (dname_external(dentry)) {
-		kfree(dentry->d_qstr);
-	}
+	if (dname_external(dentry))
+		kfree(dentry->d_name.name);
 	kmem_cache_free(dentry_cache, dentry);
 }
 
@@ -682,34 +681,30 @@ static int shrink_dcache_memory(int nr,
  * copied and the copy passed in may be reused after this call.
  */
 
-struct dentry * d_alloc(struct dentry * parent, const struct qstr *name)
+struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 {
-	char * str;
 	struct dentry *dentry;
-	struct qstr * qstr;
+	char *dname;
 
 	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
 	if (!dentry)
 		return NULL;
 
 	if (name->len > DNAME_INLINE_LEN-1) {
-		qstr = kmalloc(sizeof(*qstr) + name->len + 1, GFP_KERNEL);
-		if (!qstr) {
+		dname = kmalloc(name->len + 1, GFP_KERNEL);
+		if (!dname) {
 			kmem_cache_free(dentry_cache, dentry);
 			return NULL;
 		}
-		qstr->name = qstr->name_str;
-		qstr->len = name->len;
-		qstr->hash = name->hash;
-		dentry->d_qstr = qstr;
-		str = qstr->name_str;
 	} else {
-		dentry->d_qstr = &dentry->d_name;
-		str = dentry->d_iname;
+		dname = dentry->d_iname;
 	}
+	dentry->d_name.name = dname;
 
-	memcpy(str, name->name, name->len);
-	str[name->len] = 0;
+	dentry->d_name.len = name->len;
+	dentry->d_name.hash = name->hash;
+	memcpy(dname, name->name, name->len);
+	dname[name->len] = 0;
 
 	atomic_set(&dentry->d_count, 1);
 	dentry->d_vfs_flags = DCACHE_UNHASHED;
@@ -719,9 +714,6 @@ struct dentry * d_alloc(struct dentry *
 	dentry->d_parent = NULL;
 	dentry->d_move_count = 0;
 	dentry->d_sb = NULL;
-	dentry->d_name.name = str;
-	dentry->d_name.len = name->len;
-	dentry->d_name.hash = name->hash;
 	dentry->d_op = NULL;
 	dentry->d_fsdata = NULL;
 	dentry->d_mounted = 0;
@@ -788,7 +780,8 @@ struct dentry * d_alloc_root(struct inod
 	struct dentry *res = NULL;
 
 	if (root_inode) {
-		static const struct qstr name = { .name = "/", .len = 1, .hash = 0 };
+		static const struct qstr name = { .name = "/", .len = 1 };
+
 		res = d_alloc(NULL, &name);
 		if (res) {
 			res->d_sb = root_inode->i_sb;
@@ -829,7 +822,7 @@ static inline struct hlist_head *d_hash(
 
 struct dentry * d_alloc_anon(struct inode *inode)
 {
-	static const struct qstr anonstring = { "", 0, 0};
+	static const struct qstr anonstring = { .name = "" };
 	struct dentry *tmp;
 	struct dentry *res;
 
@@ -1003,7 +996,7 @@ struct dentry * __d_lookup(struct dentry
 		if (dentry->d_parent != parent)
 			continue;
 
-		qstr = dentry->d_qstr;
+		qstr = &dentry->d_name;
 		smp_read_barrier_depends();
 		if (parent->d_op && parent->d_op->d_compare) {
 			if (parent->d_op->d_compare(parent, qstr, name))
@@ -1145,28 +1138,40 @@ void d_rehash(struct dentry * entry)
 * then no longer matches the actual (corrupted) string of the target.
 * The hash value has to match the hash queue that the dentry is on..
 */
-static inline void switch_names(struct dentry * dentry, struct dentry * target)
+static inline void switch_names(struct dentry *dentry, struct dentry *target)
 {
-	const unsigned char *old_name, *new_name;
-	struct qstr *old_qstr, *new_qstr;
-
-	memcpy(dentry->d_iname, target->d_iname, DNAME_INLINE_LEN);
-	old_qstr = target->d_qstr;
-	old_name = target->d_name.name;
-	new_qstr = dentry->d_qstr;
-	new_name = dentry->d_name.name;
-	if (old_name == target->d_iname) {
-		old_name = dentry->d_iname;
-		old_qstr = &dentry->d_name;
-	}
-	if (new_name == dentry->d_iname) {
-		new_name = target->d_iname;
-		new_qstr = &target->d_name;
-	}
-	target->d_name.name = new_name;
-	dentry->d_name.name = old_name;
-	target->d_qstr = new_qstr;
-	dentry->d_qstr = old_qstr;
+	if (dname_external(target)) {
+		if (dname_external(dentry)) {
+			/*
+			 * Both external: swap the pointers
+			 */
+			do_switch(target->d_name.name, dentry->d_name.name);
+		} else {
+			/*
+			 * dentry:internal, target:external.  Steal target's
+			 * storage and make target internal.
+			 */
+			dentry->d_name.name = target->d_name.name;
+			target->d_name.name = target->d_iname;
+		}
+	} else {
+		if (dname_external(dentry)) {
+			/*
+			 * dentry:external, target:internal.  Give dentry's
+			 * storage to target and make dentry internal
+			 */
+			memcpy(dentry->d_iname, target->d_name.name,
+					target->d_name.len + 1);
+			target->d_name.name = dentry->d_name.name;
+			dentry->d_name.name = dentry->d_iname;
+		} else {
+			/*
+			 * Both are internal.  Just copy target to dentry
+			 */
+			memcpy(dentry->d_iname, target->d_name.name,
+					target->d_name.len + 1);
+		}
+	}
}
 
 /*
@@ -1342,6 +1347,7 @@ char * d_path(struct dentry *dentry, str
 	char *res;
 	struct vfsmount *rootmnt;
 	struct dentry *root;
+
 	read_lock(&current->fs->lock);
 	rootmnt = mntget(current->fs->rootmnt);
 	root = dget(current->fs->root);
_
* Re: dentry bloat.

From: Andrew Morton @ 2004-05-08 10:11 UTC
To: manfred, davej, torvalds, wli, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:
>
> Here be another patch.

Can't help myself!


- d_vfs_flags can be removed - just use d_flags.  It looks like someone
  added d_vfs_flags with the intent of doing bitops on it, but all
  modifications are under dcache_lock.  4 bytes saved.

- make d_flags a ushort.

- Assume that nobody wants filenames longer than 64k and make qstr.len
  unsigned short too.

- Pack things so that dentry.d_name.len and dentry.d_flags occupy the same
  word.  4 bytes saved.

- NFS and AFS are modifying d_flags without dcache_lock.

On x86 this takes the internal string size up to 44 bytes.  The
internal/external ratio on my 1.5M files hits 96%.

---

 25-akpm/fs/dcache.c            |   17 ++++++++---------
 25-akpm/include/linux/dcache.h |   33 +++++++++++++++++----------------
 2 files changed, 25 insertions(+), 25 deletions(-)

diff -puN include/linux/dcache.h~dentry-more-shrinkage include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-more-shrinkage	2004-05-08 02:46:53.134693712 -0700
+++ 25-akpm/include/linux/dcache.h	2004-05-08 02:56:16.687020736 -0700
@@ -30,9 +30,9 @@ struct vfsmount;
  */
 struct qstr {
 	const unsigned char *name;
-	unsigned int len;
 	unsigned int hash;
-};
+	unsigned short len;
+} __attribute__((packed));
 
 struct dentry_stat_t {
 	int nr_dentry;
@@ -77,28 +77,29 @@ struct dcookie_struct;
 
 struct dentry {
 	atomic_t d_count;
-	unsigned long d_vfs_flags;	/* moved here to be on same cacheline */
 	spinlock_t d_lock;		/* per dentry lock */
-	struct inode * d_inode;	/* Where the name belongs to - NULL is negative */
+	struct inode *d_inode;		/* Where the name belongs to - NULL is
+					 * negative */
 	struct list_head d_lru;		/* LRU list */
 	struct list_head d_child;	/* child of parent list */
 	struct list_head d_subdirs;	/* our children */
 	struct list_head d_alias;	/* inode alias list */
 	unsigned long d_time;		/* used by d_revalidate */
-	struct dentry_operations *d_op;
-	struct super_block * d_sb;	/* The root of the dentry tree */
-	unsigned int d_flags;
+	struct dentry_operations *d_op;
+	struct super_block *d_sb;	/* The root of the dentry tree */
 	int d_mounted;
-	void * d_fsdata;		/* fs-specific data */
+	void *d_fsdata;			/* fs-specific data */
 	struct rcu_head d_rcu;
-	struct dcookie_struct * d_cookie; /* cookie, if any */
-	unsigned long d_move_count;	/* to indicated moved dentry while lockless lookup */
-	struct dentry * d_parent;	/* parent directory */
+	struct dcookie_struct *d_cookie; /* cookie, if any */
+	unsigned long d_move_count;	/* to indicated moved dentry while
+					 * lockless lookup */
+	struct dentry *d_parent;	/* parent directory */
 	struct qstr d_name;
+	unsigned short d_flags;
 	struct hlist_node d_hash;	/* lookup hash list */
-	struct hlist_head * d_bucket;	/* lookup hash bucket */
+	struct hlist_head *d_bucket;	/* lookup hash bucket */
 	unsigned char d_iname[0];	/* small names */
-};
+} __attribute__((packed));
 
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
@@ -169,8 +170,8 @@ extern spinlock_t dcache_lock;
 
 static inline void __d_drop(struct dentry *dentry)
 {
-	if (!(dentry->d_vfs_flags & DCACHE_UNHASHED)) {
-		dentry->d_vfs_flags |= DCACHE_UNHASHED;
+	if (!(dentry->d_flags & DCACHE_UNHASHED)) {
+		dentry->d_flags |= DCACHE_UNHASHED;
 		hlist_del_rcu(&dentry->d_hash);
 	}
 }
@@ -281,7 +282,7 @@ extern struct dentry * dget_locked(struc
 
 static inline int d_unhashed(struct dentry *dentry)
 {
-	return (dentry->d_vfs_flags & DCACHE_UNHASHED);
+	return (dentry->d_flags & DCACHE_UNHASHED);
}
 
 static inline struct dentry *dget_parent(struct dentry *dentry)
diff -puN fs/dcache.c~dentry-more-shrinkage fs/dcache.c
--- 25/fs/dcache.c~dentry-more-shrinkage	2004-05-08 02:55:55.284274448 -0700
+++ 25-akpm/fs/dcache.c	2004-05-08 02:57:10.581827480 -0700
@@ -168,7 +168,7 @@ repeat:
 	if (d_unhashed(dentry))
 		goto kill_it;
 	if (list_empty(&dentry->d_lru)) {
-		dentry->d_vfs_flags |= DCACHE_REFERENCED;
+		dentry->d_flags |= DCACHE_REFERENCED;
 		list_add(&dentry->d_lru, &dentry_unused);
 		dentry_stat.nr_unused++;
 	}
@@ -401,8 +401,8 @@ static void prune_dcache(int count)
 			continue;
 		}
 		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_vfs_flags & DCACHE_REFERENCED) {
-			dentry->d_vfs_flags &= ~DCACHE_REFERENCED;
+		if (dentry->d_flags & DCACHE_REFERENCED) {
+			dentry->d_flags &= ~DCACHE_REFERENCED;
 			list_add(&dentry->d_lru, &dentry_unused);
 			dentry_stat.nr_unused++;
 			spin_unlock(&dentry->d_lock);
@@ -707,9 +707,8 @@ struct dentry *d_alloc(struct dentry * p
 	dname[name->len] = 0;
 
 	atomic_set(&dentry->d_count, 1);
-	dentry->d_vfs_flags = DCACHE_UNHASHED;
+	dentry->d_flags = DCACHE_UNHASHED;
 	dentry->d_lock = SPIN_LOCK_UNLOCKED;
-	dentry->d_flags = 0;
 	dentry->d_inode = NULL;
 	dentry->d_parent = NULL;
 	dentry->d_move_count = 0;
@@ -855,7 +854,7 @@ struct dentry * d_alloc_anon(struct inod
 			res->d_inode = inode;
 			res->d_bucket = d_hash(res, res->d_name.hash);
 			res->d_flags |= DCACHE_DISCONNECTED;
-			res->d_vfs_flags &= ~DCACHE_UNHASHED;
+			res->d_flags &= ~DCACHE_UNHASHED;
 			list_add(&res->d_alias, &inode->i_dentry);
 			hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
 			spin_unlock(&res->d_lock);
@@ -1117,7 +1116,7 @@ void d_rehash(struct dentry * entry)
 {
 	struct hlist_head *list = d_hash(entry->d_parent, entry->d_name.hash);
 	spin_lock(&dcache_lock);
-	entry->d_vfs_flags &= ~DCACHE_UNHASHED;
+	entry->d_flags &= ~DCACHE_UNHASHED;
 	entry->d_bucket = list;
 	hlist_add_head_rcu(&entry->d_hash, list);
 	spin_unlock(&dcache_lock);
@@ -1214,14 +1213,14 @@ void d_move(struct dentry * dentry, stru
 	}
 
 	/* Move the dentry to the target hash queue, if on different bucket */
-	if (dentry->d_vfs_flags & DCACHE_UNHASHED)
+	if (dentry->d_flags & DCACHE_UNHASHED)
 		goto already_unhashed;
 	if (dentry->d_bucket != target->d_bucket) {
 		hlist_del_rcu(&dentry->d_hash);
already_unhashed:
 		dentry->d_bucket = target->d_bucket;
 		hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
-		dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
+		dentry->d_flags &= ~DCACHE_UNHASHED;
 	}
 
 	/* Unhash the target: dput() will then get rid of it */
_
* Re: dentry bloat.

From: Andrew Morton @ 2004-05-08 10:12 UTC
To: manfred, davej, torvalds, wli, linux-kernel

Andrew Morton <akpm@osdl.org> wrote:
>
> - NFS and AFS are modifying d_flags without dcache_lock.

 25-akpm/fs/afs/dir.c    |    2 ++
 25-akpm/fs/nfs/unlink.c |    4 ++++
 2 files changed, 6 insertions(+)

diff -puN fs/nfs/unlink.c~d_flags-locking-fix fs/nfs/unlink.c
--- 25/fs/nfs/unlink.c~d_flags-locking-fix	2004-05-08 03:03:14.075568032 -0700
+++ 25-akpm/fs/nfs/unlink.c	2004-05-08 03:04:02.561197096 -0700
@@ -180,7 +180,9 @@ nfs_async_unlink(struct dentry *dentry)
 	task->tk_action = nfs_async_unlink_init;
 	task->tk_release = nfs_async_unlink_release;
 
+	spin_lock(&dcache_lock);
 	dentry->d_flags |= DCACHE_NFSFS_RENAMED;
+	spin_unlock(&dcache_lock);
 
 	data->cred = rpcauth_lookupcred(clnt->cl_auth, 0);
 	rpc_sleep_on(&nfs_delete_queue, task, NULL, NULL);
@@ -210,7 +212,9 @@ nfs_complete_unlink(struct dentry *dentr
 		return;
 	data->count++;
 	nfs_copy_dname(dentry, data);
+	spin_lock(&dcache_lock);
 	dentry->d_flags &= ~DCACHE_NFSFS_RENAMED;
+	spin_unlock(&dcache_lock);
 	if (data->task.tk_rpcwait == &nfs_delete_queue)
 		rpc_wake_up_task(&data->task);
 	nfs_put_unlinkdata(data);
diff -puN fs/afs/dir.c~d_flags-locking-fix fs/afs/dir.c
--- 25/fs/afs/dir.c~d_flags-locking-fix	2004-05-08 03:03:14.404518024 -0700
+++ 25-akpm/fs/afs/dir.c	2004-05-08 03:04:20.190517032 -0700
@@ -615,7 +615,9 @@ static int afs_d_revalidate(struct dentr
 	/* the dirent, if it exists, now points to a different vnode */
 not_found:
+	spin_lock(&dcache_lock);
 	dentry->d_flags |= DCACHE_NFSFS_RENAMED;
+	spin_unlock(&dcache_lock);
 
 out_bad:
 	if (inode) {
_
* Re: dentry bloat.

From: viro @ 2004-05-08 10:28 UTC
To: Andrew Morton
Cc: manfred, davej, torvalds, wli, linux-kernel

On Sat, May 08, 2004 at 03:11:59AM -0700, Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
> >
> > Here be another patch.
>
> Can't help myself!
>
> - d_vfs_flags can be removed - just use d_flags.  It looks like someone
>   added d_vfs_flags with the intent of doing bitops on it, but all
>   modifications are under dcache_lock.  4 bytes saved.

Bzzert.  ->d_vfs_flags modifications are under dcache_lock; ->d_flags ones
are *not* - they are up to whatever filesystem code happens to use them.

> - Pack things so that dentry.d_name.len and dentry.d_flags occupy the same
>   word.  4 bytes saved.

d_name.len is accessed on very hot paths.  In particular, we are looking
at it while traversing hash chains and we do that without dcache_lock
(see callers of ->d_compare()).  If we are going to hold dcache_lock in
__d_lookup(), we can get rid of a _lot_ more than 2 bytes - the 32 bytes
that came from RCU are there pretty much for a single reason: to avoid
dcache_lock on that path.
* Re: dentry bloat.

From: Andrew Morton @ 2004-05-08 10:41 UTC
To: viro
Cc: manfred, davej, torvalds, wli, linux-kernel

viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> On Sat, May 08, 2004 at 03:11:59AM -0700, Andrew Morton wrote:
> > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > Here be another patch.
> >
> > Can't help myself!
> >
> > - d_vfs_flags can be removed - just use d_flags.  It looks like someone
> >   added d_vfs_flags with the intent of doing bitops on it, but all
> >   modifications are under dcache_lock.  4 bytes saved.
>
> Bzzert.  ->d_vfs_flags modifications are under dcache_lock; ->d_flags ones
> are *not* - they are up to whatever filesystem code happens to use them.

Yup, I hunted down the rest.

> > - Pack things so that dentry.d_name.len and dentry.d_flags occupy the same
> >   word.  4 bytes saved.
>
> d_name.len is accessed on very hot paths.  In particular, we are looking
> at it while traversing hash chains and we do that without dcache_lock
> (see callers of ->d_compare()).

That's OK.  The d_move() d_move_count and locking logic takes care of that.

But the ushorts are a bit dopey - I'll give those 4 bytes back.
* Re: dentry bloat.

From: Andrew Morton @ 2004-05-08 10:52 UTC
To: viro
Cc: manfred, davej, torvalds, wli, linux-kernel

viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> Bzzert.  ->d_vfs_flags modifications are under dcache_lock; ->d_flags ones
> are *not* - they are up to whatever filesystem code happens to use them.

All but one of the d_vfs_flags modifications are under ->d_lock, so I fixed
that one up, converted all the d_flags modifiers to take ->d_lock and moved
everything over to d_flags.
* Re: dentry bloat.

From: Manfred Spraul @ 2004-05-08 10:31 UTC
To: Andrew Morton
Cc: davej, torvalds, wli, linux-kernel

Andrew Morton wrote:
> Andrew Morton <akpm@osdl.org> wrote:
>
> > Here be another patch.
>
> Can't help myself!
>
> - d_vfs_flags can be removed - just use d_flags.  It looks like someone
>   added d_vfs_flags with the intent of doing bitops on it, but all
>   modifications are under dcache_lock.  4 bytes saved.

IIRC it was intended to access one of the flag fields under dentry->d_lock.

--
	Manfred
* Re: dentry bloat.

From: Linus Torvalds @ 2004-05-08 17:28 UTC
To: Andrew Morton
Cc: manfred, davej, wli, linux-kernel

On Sat, 8 May 2004, Andrew Morton wrote:
>  */
>  struct qstr {
>  	const unsigned char *name;
> -	unsigned int len;
>  	unsigned int hash;
> -};
> +	unsigned short len;
> +} __attribute__((packed));

This would make me nervous.  Since we don't use this thing in an array,
this may be right.  But on the other hand, if this thing is preceded by an
"unsigned short" in a structure (or "int" on a 64-bit architecture), I
think you just made "name" be unaligned, which can cause serious problems.
Because I think "packed" _also_ means that it doesn't honor the alignment
of the thing when laying it out in structures.

Also, in your previous patch (which I'm not as convinced might be wrong),
the d_qstr pointer removal makes me worry:

-	struct qstr * d_qstr;	/* quick str ptr used in lockless lookup and concurrent d_move */

I thought the point of d_qstr was that when we do the lockless lookup,
we're guaranteed to always see "stable storage" in the sense that when we
follow the d_qstr, we will always get a "char *" + "len" that match, and
we could never see a partial update (ie len points to the old one, and
"char *" points to the new one).

In particular, think about the "d_compare(parent, qstr, name)" /
"memcmp(qstr->name, str, len)" part - what if "len" doesn't match str,
because a concurrent d_move() is updating them, and maybe we will compare
past the end of kernel mapped memory or something?

(In other words, the "move_count" check should protect us from returning a
wrong dentry, but I'd worry that we'd do something that could cause
serious problems before we even get to the "move_count" check.)

Hmm?

Btw, I'd love to be proven wrong, since I hate that d_qstr, and I think
your d_qstr removal patch otherwise looked wonderful.

		Linus
* Re: dentry bloat.

From: David S. Miller @ 2004-05-08 18:19 UTC
To: Linus Torvalds
Cc: akpm, manfred, davej, wli, linux-kernel

On Sat, 8 May 2004 10:28:29 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> > +	unsigned short len;
> > +} __attribute__((packed));
 ...
> I think you just made
> "name" be unaligned, which can cause serious problems.  Because I think
> "packed" _also_ means that it doesn't honor the alignment of the thing
> when laying it out in structures.

That's correct, alignment concerns are thrown out the door once you
specify packed to the compiler.
* Re: dentry bloat.

From: Andrew Morton @ 2004-05-08 19:01 UTC
To: Linus Torvalds
Cc: manfred, davej, wli, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Sat, 8 May 2004, Andrew Morton wrote:
> >  */
> >  struct qstr {
> >  	const unsigned char *name;
> > -	unsigned int len;
> >  	unsigned int hash;
> > -};
> > +	unsigned short len;
> > +} __attribute__((packed));
>
> This would make me nervous.

Yeah, that was just mucking about.

> Also, in your previous patch (which I'm not as convinced might be wrong),
> the d_qstr pointer removal makes me worry:
>
> -	struct qstr * d_qstr;	/* quick str ptr used in lockless lookup and concurrent d_move */
>
> I thought the point of d_qstr was that when we do the lockless lookup,
> we're guaranteed to always see "stable storage" in the sense that when we
> follow the d_qstr, we will always get a "char *" + "len" that match, and
> we could never see a partial update (ie len points to the old one, and
> "char *" points to the new one).

It looks that way.

> In particular, think about the "d_compare(parent, qstr, name)" /
> "memcmp(qstr->name, str, len)" part - what if "len" doesn't match str,
> because a concurrent d_move() is updating them, and maybe we will compare
> past the end of kernel mapped memory or something?
>
> (In other words, the "move_count" check should protect us from returning a
> wrong dentry, but I'd worry that we'd do something that could cause
> serious problems before we even get to the "move_count" check.)
>
> Hmm?
>
> Btw, I'd love to be proven wrong, since I hate that d_qstr, and I think
> your d_qstr removal patch otherwise looked wonderful.

It looks very similar to 2.4 actually ;)

I think we can simply take ->d_lock a bit earlier in __d_lookup().  That
will serialise against d_move(), fixing the problem which you mention, and
also makes d_move_count go away.
* Re: dentry bloat.

From: Linus Torvalds @ 2004-05-08 19:13 UTC
To: Andrew Morton
Cc: manfred, davej, wli, linux-kernel

On Sat, 8 May 2004, Andrew Morton wrote:
>
> I think we can simply take ->d_lock a bit earlier in __d_lookup().  That
> will serialise against d_move(), fixing the problem which you mention, and
> also makes d_move_count go away.

If you do that, RCU basically loses most of its meaning.

You'll be taking a lock for - and dirtying in the cache - every single
dentry on the hash chain, which is pretty much guaranteed to be slower
than just taking the dcache_lock _once_, even if that one jumps across
CPUs a lot.

In other words, no, I don't think that's a good idea.  We really want to
take the dentry lock only after we're pretty sure we have the right
dentry.  Otherwise the dentry chains will be bouncing from CPU to CPU all
the time.

		Linus
* Re: dentry bloat. 2004-05-08 19:13 ` Linus Torvalds @ 2004-05-08 19:27 ` Andrew Morton 2004-05-08 19:27 ` Linus Torvalds 2004-05-08 20:13 ` Dipankar Sarma 2 siblings, 0 replies; 62+ messages in thread
From: Andrew Morton @ 2004-05-08 19:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: manfred, davej, wli, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:
>
> On Sat, 8 May 2004, Andrew Morton wrote:
> >
> > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> > serialise against d_move(), fixing the problem which you mention, and also
> > makes d_movecount go away.
>
> If you do that, RCU basically loses most of it's meaning.
>
> You'll be taking a lock for - and dirtying in the cache - every single
> dentry on the hash chain, which is pretty much guaranteed to be slower
> than just taking the dcache_lock _once_, even if that one jumps across
> CPU's a lot.

Can we take the lock after comparing the hash and the parent? And if we
recheck those after locking, d_movecount is unneeded?
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-08 19:13 ` Linus Torvalds 2004-05-08 19:27 ` Andrew Morton @ 2004-05-08 19:27 ` Linus Torvalds 2004-05-08 20:42 ` Dipankar Sarma 2004-05-08 20:13 ` Dipankar Sarma 2 siblings, 1 reply; 62+ messages in thread From: Linus Torvalds @ 2004-05-08 19:27 UTC (permalink / raw) To: Andrew Morton Cc: manfred, davej, wli, Kernel Mailing List, Dipankar Sarma, Maneesh Soni On Sat, 8 May 2004, Linus Torvalds wrote: > > > On Sat, 8 May 2004, Andrew Morton wrote: > > > > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will > > serialise against d_move(), fixing the problem which you mention, and also > > makes d_movecount go away. > > If you do that, RCU basically loses most of it's meaning. Hmm.. Maybe I'm wrong. In particular, it should be safe to at least do the name hash and parent comparison without holding any lock (since even if they are invalidated by a concurrent "move()" operation, doing the comparison is safe). By the time those have matched, we are probably pretty safe in taking the lock, since the likelihood of the other checks matching should be pretty high. And yes, removing d_movecount would be ok by then, as long as we re-test the parent inside d_lock (we don't need to re-test "hash", since if we tested the full name inside the lock, the hash had better match too ;) Comments? Linus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-08 19:27 ` Linus Torvalds @ 2004-05-08 20:42 ` Dipankar Sarma 2004-05-08 20:55 ` Andrew Morton 2004-05-09 7:28 ` Manfred Spraul 0 siblings, 2 replies; 62+ messages in thread
From: Dipankar Sarma @ 2004-05-08 20:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Morton, manfred, davej, wli, Kernel Mailing List, Maneesh Soni

On Sat, May 08, 2004 at 12:27:50PM -0700, Linus Torvalds wrote:
>
> On Sat, 8 May 2004, Linus Torvalds wrote:
> > On Sat, 8 May 2004, Andrew Morton wrote:
> > >
> > > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will
> > > serialise against d_move(), fixing the problem which you mention, and also
> > > makes d_movecount go away.
> >
> > If you do that, RCU basically loses most of it's meaning.
>
> In particular, it should be safe to at least do the name hash and parent
> comparison without holding any lock (since even if they are invalidated by
> a concurrent "move()" operation, doing the comparison is safe). By the
> time those have matched, we are probably pretty safe in taking the lock,
> since the likelihood of the other checks matching should be pretty high.
>
> And yes, removing d_movecount would be ok by then, as long as we re-test
> the parent inside d_lock (we don't need to re-test "hash", since if we
> tested the full name inside the lock, the hash had better match too ;)

There are a couple of issues that need to be checked -

1. Re-doing the parent comparison and full name under ->d_lock
   needs to be benchmarked using dcachebench. That part of the code
   is extremely performance sensitive and I remember that the
   current d_movecount based solution was done after a lot of
   benchmarking of various alternatives.

2. We need to check if doing ->d_compare() under ->d_lock will
   result in locking hierarchy problems.

Thanks
Dipankar
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-08 20:42 ` Dipankar Sarma @ 2004-05-08 20:55 ` Andrew Morton 2004-05-08 21:19 ` Dipankar Sarma 2004-05-09 7:28 ` Manfred Spraul 1 sibling, 1 reply; 62+ messages in thread From: Andrew Morton @ 2004-05-08 20:55 UTC (permalink / raw) To: dipankar; +Cc: torvalds, manfred, davej, wli, linux-kernel, maneesh Dipankar Sarma <dipankar@in.ibm.com> wrote: > > > And yes, removing d_movecount would be ok by then, as long as we re-test > > the parent inside d_lock (we don't need to re-test "hash", since if we > > tested the full name inside the lock, the hash had better match too ;) > > There are couple of issues that need to be checked - > > 1. Re-doing the parent comparison and full name under ->d_lock > need to be benchmarked using dcachebench. That part of code > is extrememly performance sensitive and I remember that the > current d_movecount based solution was done after a lot of > benchmarking of various alternatives. There's a speed-space tradeoff here as well. Making the dentry smaller means that more can be cached, which reduces disk seeks. On all machines... But yes, when I've finished mucking with this I'll be asking you to put it all through your performance/correctness/stress tests please. One thing which needs to be reviewed is the layout of the dentry, too. > 2. We need to check if doing ->d_compare() under ->d_lock will > result in locking hierarchy problems. Yup, I checked that. Is OK. It's hard to see how a ->d_compare implementation could care. And ->d_compare is called under dcache_lock in 2.4. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-08 20:55 ` Andrew Morton @ 2004-05-08 21:19 ` Dipankar Sarma 2004-05-09 0:10 ` Andrew Morton 0 siblings, 1 reply; 62+ messages in thread From: Dipankar Sarma @ 2004-05-08 21:19 UTC (permalink / raw) To: Andrew Morton; +Cc: torvalds, manfred, davej, wli, linux-kernel, maneesh On Sat, May 08, 2004 at 01:55:12PM -0700, Andrew Morton wrote: > Dipankar Sarma <dipankar@in.ibm.com> wrote: > > There are couple of issues that need to be checked - > > > > 1. Re-doing the parent comparison and full name under ->d_lock > > need to be benchmarked using dcachebench. That part of code > > is extrememly performance sensitive and I remember that the > > current d_movecount based solution was done after a lot of > > benchmarking of various alternatives. > > There's a speed-space tradeoff here as well. Making the dentry smaller > means that more can be cached, which reduces disk seeks. On all > machines... Another thing that would help is the singly linked rcu patch. It shaves off 8-bytes per-rcu_head on x86. Should I revive that ? > But yes, when I've finished mucking with this I'll be asking you to put it > all through your performance/correctness/stress tests please. Yes, sure. > One thing which needs to be reviewed is the layout of the dentry, too. IIRC, Maneesh did some experiments with this and found that any changes in the layout he did only degraded performance :) Thanks Dipankar ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-08 21:19 ` Dipankar Sarma @ 2004-05-09 0:10 ` Andrew Morton 2004-05-09 2:55 ` Linus Torvalds 0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-09 0:10 UTC (permalink / raw)
To: dipankar; +Cc: torvalds, manfred, davej, wli, linux-kernel, maneesh

Dipankar Sarma <dipankar@in.ibm.com> wrote:
>
> On Sat, May 08, 2004 at 01:55:12PM -0700, Andrew Morton wrote:
> > Dipankar Sarma <dipankar@in.ibm.com> wrote:
> > > There are couple of issues that need to be checked -
> > >
> > > 1. Re-doing the parent comparison and full name under ->d_lock
> > > need to be benchmarked using dcachebench. That part of code
> > > is extrememly performance sensitive and I remember that the
> > > current d_movecount based solution was done after a lot of
> > > benchmarking of various alternatives.
> >
> > There's a speed-space tradeoff here as well. Making the dentry smaller
> > means that more can be cached, which reduces disk seeks. On all
> > machines...
>
> Another thing that would help is the singly linked rcu patch.
> It shaves off 8-bytes per-rcu_head on x86. Should I revive
> that ?

May as well.

> > But yes, when I've finished mucking with this I'll be asking you to put it
> > all through your performance/correctness/stress tests please.
>
> Yes, sure.
>
> > One thing which needs to be reviewed is the layout of the dentry, too.
>
> IIRC, Maneesh did some experiments with this and found that any
> changes in the layout he did only degraded performance :)

Well. Bear in mind that the dentry used to be 256 bytes on a 128 byte
boundary so there was really nothing to improve, unless you were using a
PIII or something.

But I just made it 160 bytes on a 16-byte boundary, so some dentries will
now straddle cachelines. We need to identify the used fields on the
lookup path and try to scrunch them into a 16-byte-aligned 16-byte chunk.

I'll have a go at that.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 0:10 ` Andrew Morton @ 2004-05-09 2:55 ` Linus Torvalds 2004-05-09 3:12 ` David S. Miller 2004-05-09 4:12 ` Andrew Morton 0 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 2:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh

Ok, while we're discussing the other things, how about for now just having
this fairly minimal patch to get rid of the most egregious memory mis-use.

This removes the SLAB_HWCACHE_ALIGN and the cache line alignment on
"struct dentry", and to compensate for the fact that the inline name
length will shrink quite a bit it changes DNAME_INLINE_LEN_MIN from 16 to
24.

It also re-orders the first few fields of the dentry, since the "count"
and "d_lock" should be 32-bit on most 64-bit architectures, so to avoid
having unnecessary unused space, we should just put them together
(everything else there is going to be comprised of 64-bit entities, I
think).

So this should bring the size of a dentry down to 160 bytes on x86, from
either 192 or 256 bytes (depending on cacheline size). And 24 bytes of
inline name (it can be a bit more on some architectures, due to longword
alignment or similar) should still capture most of the dentries by far.

Let's see what Andrew's more complex patches can do to bring the size of
"struct dentry" down further.

		Linus

--- 1.71/fs/dcache.c	Mon Apr 26 22:07:34 2004
+++ edited/fs/dcache.c	Sat May  8 17:41:30 2004
@@ -1558,14 +1558,11 @@
  * A constructor could be added for stable state like the lists,
  * but it is probably not worth it because of the cache nature
  * of the dcache.
- * If fragmentation is too bad then the SLAB_HWCACHE_ALIGN
- * flag could be removed here, to hint to the allocator that
- * it should not try to get multiple page regions.
  */
 	dentry_cache = kmem_cache_create("dentry_cache",
 					 sizeof(struct dentry),
 					 0,
-					 SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+					 SLAB_RECLAIM_ACCOUNT,
 					 NULL, NULL);
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
===== include/linux/dcache.h 1.36 vs edited =====
--- 1.36/include/linux/dcache.h	Mon Jan 19 15:38:11 2004
+++ edited/include/linux/dcache.h	Sat May  8 19:39:47 2004
@@ -74,14 +74,14 @@
 	return end_name_hash(hash);
 }
 
-#define DNAME_INLINE_LEN_MIN 16
+#define DNAME_INLINE_LEN_MIN 24
 
 struct dcookie_struct;
 
 struct dentry {
 	atomic_t d_count;
-	unsigned long d_vfs_flags;	/* moved here to be on same cacheline */
 	spinlock_t d_lock;		/* per dentry lock */
+	unsigned long d_vfs_flags;	/* moved here to be on same cacheline */
 	struct inode * d_inode;		/* Where the name belongs to - NULL is negative */
 	struct list_head d_lru;		/* LRU list */
 	struct list_head d_child;	/* child of parent list */
@@ -102,7 +102,7 @@
 	struct hlist_node d_hash;	/* lookup hash list */
 	struct hlist_head * d_bucket;	/* lookup hash bucket */
 	unsigned char d_iname[DNAME_INLINE_LEN_MIN];	/* small names */
-} ____cacheline_aligned;
+};
 
 #define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 2:55 ` Linus Torvalds @ 2004-05-09 3:12 ` David S. Miller 2004-05-09 3:53 ` Linus Torvalds 2004-05-09 4:12 ` Andrew Morton 1 sibling, 1 reply; 62+ messages in thread From: David S. Miller @ 2004-05-09 3:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: akpm, dipankar, manfred, davej, wli, linux-kernel, maneesh FWIW this patch does not change the dentry size on sparc64. It's 256 bytes both before and after. But of course the inline string length is larger. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 3:12 ` David S. Miller @ 2004-05-09 3:53 ` Linus Torvalds 2004-05-09 21:03 ` Matt Mackall 0 siblings, 1 reply; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 3:53 UTC (permalink / raw)
To: David S. Miller
Cc: akpm, dipankar, manfred, davej, wli, linux-kernel, maneesh

On Sat, 8 May 2004, David S. Miller wrote:
>
> FWIW this patch does not change the dentry size on sparc64.
> It's 256 bytes both before and after. But of course the
> inline string length is larger.

Yup. I tested it on ppc64, that was part of the reason for the increase in
the inline string size ("reclaim the lost space" rather than "make the
dentry shrink from 256 bytes to 248 bytes").

That shows how broken the old setup was, btw: on 64-bit architectures, the
old code resulted in the inline string size being 16 bytes, while on a P4
it would be something like 120 bytes. That makes no sense.

Btw, the cacheline alignment was actually likely to guarantee the _worst_
possible cache behaviour on a P4: the fields we access on lookup are:

 - d_hash (pointer *2)
 - d_bucket (pointer)
 - d_move_count (unsigned long)
 - d_name.hash (part of a struct with 1 pointer + 2 ints)
 - d_parent (pointer)

and they were all close together, but I think "d_bucket" was guaranteed to
be in another cacheline exactly because of the thing always being aligned.

So removing the alignment is likely to cause more of the lookups to only
need one cacheline instead of two like it was before.

I think. Maybe I counted the offsets wrong.

		Linus
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 3:53 ` Linus Torvalds @ 2004-05-09 21:03 ` Matt Mackall 2004-05-10 8:27 ` Helge Hafting 0 siblings, 1 reply; 62+ messages in thread
From: Matt Mackall @ 2004-05-09 21:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: David S. Miller, akpm, dipankar, manfred, davej, wli, linux-kernel, maneesh, Arjan van de Ven

One also wonders about whether all the RCU stuff is needed on UP. I'm not
sure if I grok all the fine points here, but it looks like the answer is
no and that we can make struct rcu_head empty and have call_rcu fall
directly through to the callback. This would save something like 16-32
bytes (32/64bit), not to mention a bunch of dinking around with lists and
whatnot.

So what am I missing?

--
Matt Mackall : http://www.selenic.com : Linux development and consulting
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 21:03 ` Matt Mackall @ 2004-05-10 8:27 ` Helge Hafting 2004-05-10 8:32 ` Arjan van de Ven 0 siblings, 1 reply; 62+ messages in thread From: Helge Hafting @ 2004-05-10 8:27 UTC (permalink / raw) To: Matt Mackall; +Cc: linux-kernel Matt Mackall wrote: >One also wonders about whether all the RCU stuff is needed on UP. I'm >not sure if I grok all the finepoints here, but it looks like the >answer is no and that we can make struct_rcu head empty and have >call_rcu fall directly through to the callback. This would save >something like 16-32 bytes (32/64bit), not to mention a bunch of >dinking around with lists and whatnot. > >So what am I missing? > > Preempt can happen anytime, I believe. Helge Hafting ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-10 8:27 ` Helge Hafting @ 2004-05-10 8:32 ` Arjan van de Ven 2004-05-10 9:46 ` Andrew Morton 0 siblings, 1 reply; 62+ messages in thread From: Arjan van de Ven @ 2004-05-10 8:32 UTC (permalink / raw) To: Helge Hafting; +Cc: Matt Mackall, linux-kernel [-- Attachment #1: Type: text/plain, Size: 644 bytes --] On Mon, 2004-05-10 at 10:27, Helge Hafting wrote: > Matt Mackall wrote: > > >One also wonders about whether all the RCU stuff is needed on UP. I'm > >not sure if I grok all the finepoints here, but it looks like the > >answer is no and that we can make struct_rcu head empty and have > >call_rcu fall directly through to the callback. This would save > >something like 16-32 bytes (32/64bit), not to mention a bunch of > >dinking around with lists and whatnot. > > > >So what am I missing? > > > > > Preempt can happen anytime, I believe. ok so for UP-non-preempt we can still get those 16 bytes back from the dentry.... [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-10 8:32 ` Arjan van de Ven @ 2004-05-10 9:46 ` Andrew Morton 2004-05-10 14:54 ` Matt Mackall 0 siblings, 1 reply; 62+ messages in thread From: Andrew Morton @ 2004-05-10 9:46 UTC (permalink / raw) To: arjanv; +Cc: helgehaf, mpm, linux-kernel Arjan van de Ven <arjanv@redhat.com> wrote: > > On Mon, 2004-05-10 at 10:27, Helge Hafting wrote: > > Matt Mackall wrote: > > > > >One also wonders about whether all the RCU stuff is needed on UP. I'm > > >not sure if I grok all the finepoints here, but it looks like the > > >answer is no and that we can make struct_rcu head empty and have > > >call_rcu fall directly through to the callback. This would save > > >something like 16-32 bytes (32/64bit), not to mention a bunch of > > >dinking around with lists and whatnot. > > > > > >So what am I missing? > > > > > > > > Preempt can happen anytime, I believe. > > ok so for UP-non-preempt we can still get those 16 bytes back from the > dentry.... I suppose so. And on small SMP, really. We chose not to play those games early on so the code got the best testing coverage. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-10 9:46 ` Andrew Morton @ 2004-05-10 14:54 ` Matt Mackall 2004-05-10 16:26 ` Paul E. McKenney 0 siblings, 1 reply; 62+ messages in thread From: Matt Mackall @ 2004-05-10 14:54 UTC (permalink / raw) To: Andrew Morton; +Cc: arjanv, helgehaf, linux-kernel On Mon, May 10, 2004 at 02:46:58AM -0700, Andrew Morton wrote: > Arjan van de Ven <arjanv@redhat.com> wrote: > > > > On Mon, 2004-05-10 at 10:27, Helge Hafting wrote: > > > Matt Mackall wrote: > > > > > > >One also wonders about whether all the RCU stuff is needed on UP. I'm > > > >not sure if I grok all the finepoints here, but it looks like the > > > >answer is no and that we can make struct_rcu head empty and have > > > >call_rcu fall directly through to the callback. This would save > > > >something like 16-32 bytes (32/64bit), not to mention a bunch of > > > >dinking around with lists and whatnot. > > > > > > > >So what am I missing? > > > > > > > > > > > Preempt can happen anytime, I believe. > > > > ok so for UP-non-preempt we can still get those 16 bytes back from the > > dentry.... > > I suppose so. And on small SMP, really. We chose not to play those games > early on so the code got the best testing coverage. Ok, I can spin something up. I'll start with a generic no-RCU-on-UP and then we can think about the small SMP case a bit later. -- Matt Mackall : http://www.selenic.com : Linux development and consulting ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-10 14:54 ` Matt Mackall @ 2004-05-10 16:26 ` Paul E. McKenney 2004-05-10 18:34 ` Dipankar Sarma 0 siblings, 1 reply; 62+ messages in thread From: Paul E. McKenney @ 2004-05-10 16:26 UTC (permalink / raw) To: Matt Mackall; +Cc: Andrew Morton, arjanv, helgehaf, linux-kernel On Mon, May 10, 2004 at 09:54:04AM -0500, Matt Mackall wrote: > On Mon, May 10, 2004 at 02:46:58AM -0700, Andrew Morton wrote: > > Arjan van de Ven <arjanv@redhat.com> wrote: > > > > > > On Mon, 2004-05-10 at 10:27, Helge Hafting wrote: > > > > Matt Mackall wrote: > > > > > > > > >One also wonders about whether all the RCU stuff is needed on UP. I'm > > > > >not sure if I grok all the finepoints here, but it looks like the > > > > >answer is no and that we can make struct_rcu head empty and have > > > > >call_rcu fall directly through to the callback. This would save > > > > >something like 16-32 bytes (32/64bit), not to mention a bunch of > > > > >dinking around with lists and whatnot. > > > > > > > > > >So what am I missing? > > > > > > > > > > > > > > Preempt can happen anytime, I believe. > > > > > > ok so for UP-non-preempt we can still get those 16 bytes back from the > > > dentry.... > > > > I suppose so. And on small SMP, really. We chose not to play those games > > early on so the code got the best testing coverage. > > Ok, I can spin something up. I'll start with a generic no-RCU-on-UP > and then we can think about the small SMP case a bit later. Hello, Matt, You may wish to start with the dcache portion of Dipankar's earlier patch: http://sourceforge.net/mailarchive/forum.php?thread_id=3644163&forum_id=730 It does not remove the extra rcu_head from the dentry, but does the rest of the work. Thanx, Paul ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-10 16:26 ` Paul E. McKenney @ 2004-05-10 18:34 ` Dipankar Sarma 0 siblings, 0 replies; 62+ messages in thread From: Dipankar Sarma @ 2004-05-10 18:34 UTC (permalink / raw) To: Paul E. McKenney Cc: Matt Mackall, Andrew Morton, arjanv, helgehaf, linux-kernel On Mon, May 10, 2004 at 09:26:45AM -0700, Paul E. McKenney wrote: > On Mon, May 10, 2004 at 09:54:04AM -0500, Matt Mackall wrote: > > Ok, I can spin something up. I'll start with a generic no-RCU-on-UP > > and then we can think about the small SMP case a bit later. > > Hello, Matt, > > You may wish to start with the dcache portion of Dipankar's earlier patch: > > http://sourceforge.net/mailarchive/forum.php?thread_id=3644163&forum_id=730 > > It does not remove the extra rcu_head from the dentry, but does the > rest of the work. Matt and I had discussed this privately sometime ago and I will send him somewhat cleaned up patches that introduces call_rcu_direct() which, in UP kernel, invokes the callback directly. The corresponding read-side critical section will have to protect itself with rcu_read_lock_bh() or rcu_read_lock_irq() if an update can happen from softirq or irq context. Thanks Dipankar ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 2:55 ` Linus Torvalds 2004-05-09 3:12 ` David S. Miller @ 2004-05-09 4:12 ` Andrew Morton 2004-05-09 4:25 ` Linus Torvalds 1 sibling, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2004-05-09 4:12 UTC (permalink / raw)
To: Linus Torvalds; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh

Linus Torvalds <torvalds@osdl.org> wrote:
>
> while we're discussing the other things, how about for now
> just having this fairly minimal patch to get rid of the most egregious
> memory mis-use.
>
> This removes the SLAB_HWCACHE_ALIGN and the cache line alignment on
> "struct dentry", and to compensate for the fact that the inline name
> length will shrink quite a bit it changes DNAME_INLINE_LEN_MIN from 16 to
> 24.

I think that'll crash. DNAME_INLINE_LEN_MIN is the *minimum* storage in
the dentry. The actual storage is assumed to be DNAME_INLINE_LEN and the
patch doesn't change that value.

The calculation of DNAME_INLINE_LEN assumes that slab padded the dentry
out to the next cacheline.

If we want a quickie deporkification, this will do it. But I don't see a
rush on it.

Save a bit of dcache space.

Currently we use all the space from the start of dentry.d_name to the next
cacheline for the inline name. That can be 128 bytes or more nowadays,
which is excessive. In fact it's exactly 128 bytes of inline name length
for an x86 P4 build (both UP and SMP).

Rework things so that the inline name length is between 31 and 48 bytes.
On SMP P4-compiled x86 each dentry goes from 256 bytes (15 per page) to
160 bytes (24 per page).

The downside is that we'll have more out-of-line names.
Here's the histogram of name lengths on all 1.5M files on my workstation:

 1:  0%    2:  0%    3:  1%    4:  5%    5:  8%    6: 13%    7: 19%    8: 26%
 9: 33%   10: 42%   11: 49%   12: 55%   13: 60%   14: 64%   15: 67%   16: 69%
17: 71%   18: 73%   19: 75%   20: 76%   21: 78%   22: 79%   23: 80%   24: 81%
25: 82%   26: 83%   27: 85%   28: 86%   29: 87%   30: 88%   31: 89%   32: 90%
33: 91%   34: 92%   35: 93%   36: 94%   37: 95%   38: 96%   39: 96%   40: 96%
41: 96%   42: 96%   43: 96%   44: 97%   45: 97%   46: 97%   47: 97%   48: 97%
49: 98%   50: 98%   51: 98%   52: 98%   53: 98%   54: 98%   55: 98%   56: 98%
57: 98%   58: 98%   59: 98%   60: 99%   61: 99%   62: 99%   63: 99%   64: 99%

So on x86 we'll fit 89% of filenames into the inline name.

The patch also removes the NAME_ALLOC_LEN() rounding-up of the storage for
the out-of-line names. That seems unnecessary.

---

 25-akpm/fs/dcache.c            |   16 ++++++++++------
 25-akpm/include/linux/dcache.h |    8 ++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff -puN fs/dcache.c~dentry-shrinkage fs/dcache.c
--- 25/fs/dcache.c~dentry-shrinkage	2004-05-08 21:09:00.620787784 -0700
+++ 25-akpm/fs/dcache.c	2004-05-08 21:09:37.413194488 -0700
@@ -41,6 +41,13 @@ EXPORT_SYMBOL(dcache_lock);
 
 static kmem_cache_t *dentry_cache;
 
 /*
+ * The allocation size for each dentry. It is a multiple of 16 bytes. We
+ * leave the final 32-47 bytes for the inline name.
+ */
+#define DENTRY_STORAGE (((sizeof(struct dentry)+32) + 15) & ~15)
+#define DNAME_INLINE_LEN (DENTRY_STORAGE - sizeof(struct dentry))
+
+/*
  * This is the single most critical data structure when it comes
  * to the dcache: the hashtable for lookups. Somebody should try
  * to make this good - I've just made it work.
@@ -665,8 +672,6 @@ static int shrink_dcache_memory(int nr,
 	return dentry_stat.nr_unused;
 }
 
-#define NAME_ALLOC_LEN(len)	((len+16) & ~15)
-
 /**
  * d_alloc - allocate a dcache entry
  * @parent: parent of entry to allocate
@@ -688,8 +693,7 @@ struct dentry * d_alloc(struct dentry *
 		return NULL;
 
 	if (name->len > DNAME_INLINE_LEN-1) {
-		qstr = kmalloc(sizeof(*qstr) + NAME_ALLOC_LEN(name->len),
-				GFP_KERNEL);
+		qstr = kmalloc(sizeof(*qstr) + name->len + 1, GFP_KERNEL);
 		if (!qstr) {
 			kmem_cache_free(dentry_cache, dentry);
 			return NULL;
@@ -1563,9 +1567,9 @@ static void __init dcache_init(unsigned
 	 * it should not try to get multiple page regions.
 	 */
 	dentry_cache = kmem_cache_create("dentry_cache",
-					 sizeof(struct dentry),
+					 DENTRY_STORAGE,
 					 0,
-					 SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
+					 SLAB_RECLAIM_ACCOUNT,
 					 NULL, NULL);
 	if (!dentry_cache)
 		panic("Cannot create dentry cache");
diff -puN include/linux/dcache.h~dentry-shrinkage include/linux/dcache.h
--- 25/include/linux/dcache.h~dentry-shrinkage	2004-05-08 21:09:00.621787632 -0700
+++ 25-akpm/include/linux/dcache.h	2004-05-08 21:09:00.628786568 -0700
@@ -74,8 +74,6 @@ full_name_hash(const unsigned char *name
 	return end_name_hash(hash);
 }
 
-#define DNAME_INLINE_LEN_MIN 16
-
 struct dcookie_struct;
 
 struct dentry {
@@ -101,11 +99,9 @@ struct dentry {
 	struct qstr d_name;
 	struct hlist_node d_hash;	/* lookup hash list */
 	struct hlist_head * d_bucket;	/* lookup hash bucket */
-	unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
-} ____cacheline_aligned;
+	unsigned char d_iname[0];	/* small names */
+};
 
-#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))
-
 struct dentry_operations {
 	int (*d_revalidate)(struct dentry *, struct nameidata *);
 	int (*d_hash) (struct dentry *, struct qstr *);
_
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 4:12 ` Andrew Morton @ 2004-05-09 4:25 ` Linus Torvalds 2004-05-09 4:36 ` Andrew Morton 2004-05-09 4:43 ` Linus Torvalds 0 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 4:25 UTC (permalink / raw)
To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh

On Sat, 8 May 2004, Andrew Morton wrote:
>
> I think that'll crash. DNAME_INLINE_LEN_MIN is the *minimum* storage in
> the dentry. The actual storage is assumed to be DNAME_INLINE_LEN and the
> patch doesn't change that value.

It doesn't need to, the value is correct.

> The calculation of DNAME_INLINE_LEN assumes that slab padded the dentry out
> to the next cacheline.

No it doesn't. It's just:

	#define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname))

which is always right. It used to assume the padding because of the
"____cacheline_aligned" on the struct dentry, which obviously also padded
out the "sizeof()". But since I removed that, DNAME_INLINE_LEN is still
correct.

NOTE! It's absolutely true that DNAME_INLINE_LEN may still be different
from DNAME_INLINE_LEN_MIN. In particular, if something inside the struct
makes the alignment of "struct dentry" be bigger than the offset of the
last field, then DNAME_INLINE_LEN will be different from (bigger than)
DNAME_INLINE_LEN_MIN.

It's just that with current architectures, I don't believe that will
happen in practice. But I left it as-is, because at least in theory it
could happen.

		Linus
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 4:25 ` Linus Torvalds @ 2004-05-09 4:36 ` Andrew Morton 2004-05-09 5:14 ` Linus Torvalds 2004-05-09 4:43 ` Linus Torvalds 1 sibling, 1 reply; 62+ messages in thread From: Andrew Morton @ 2004-05-09 4:36 UTC (permalink / raw) To: Linus Torvalds; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh Linus Torvalds <torvalds@osdl.org> wrote: > > > The calculation of DNAME_INLINE_LEN assumes that slab padded the dentry out > > to the next cacheline. > > No it doesn't. It's just: > > #define DNAME_INLINE_LEN (sizeof(struct dentry)-offsetof(struct dentry,d_iname)) > > which is always right. erk. OK. Things are (much) worse than I thought. The 24 byte limit means that 20% of my names will be externally allocated, but that's no worse than what we had before. We should make this change to 2.4.x. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 4:36 ` Andrew Morton @ 2004-05-09 5:14 ` Linus Torvalds 2004-05-09 7:36 ` Rodolfo Guluarte Hale 2004-05-09 9:10 ` Guennadi Liakhovetski 0 siblings, 2 replies; 62+ messages in thread
From: Linus Torvalds @ 2004-05-09 5:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, Kernel Mailing List, maneesh

On Sat, 8 May 2004, Andrew Morton wrote:
>
> erk. OK. Things are (much) worse than I thought. The 24 byte limit means
> that 20% of my names will be externally allocated, but that's no worse than
> what we had before.

In fact, it's better than what we had before at least on 64-bit
architectures. But I'd be happy to make the DNAME_INLINE_LEN_MIN #define
larger - I just think we should try to shrink the internal structure
fields first.

Btw, at least for the kernel sources, my statistics say that filename
distribution (in a built tree, and with BK) is

   1:   5.04 % (  5.04 % cum -- 2246)
   2:   5.19 % ( 10.23 % cum -- 2312)
   3:   0.55 % ( 10.79 % cum -- 247)
   4:   3.30 % ( 14.08 % cum -- 1469)
   5:   3.35 % ( 17.43 % cum -- 1492)
   6:   4.35 % ( 21.79 % cum -- 1940)
   7:   7.55 % ( 29.34 % cum -- 3365)
   8:   9.64 % ( 38.98 % cum -- 4293)
   9:   9.17 % ( 48.15 % cum -- 4084)
  10:  10.98 % ( 59.12 % cum -- 4891)
  11:   7.65 % ( 66.77 % cum -- 3406)
  12:   7.01 % ( 73.78 % cum -- 3122)
  13:   5.16 % ( 78.94 % cum -- 2298)
  14:   3.83 % ( 82.77 % cum -- 1706)
  15:   3.47 % ( 86.24 % cum -- 1545)
  16:   2.11 % ( 88.34 % cum -- 939)
  17:   1.47 % ( 89.81 % cum -- 655)
  18:   1.06 % ( 90.87 % cum -- 472)
  19:   0.68 % ( 91.55 % cum -- 303)
  20:   0.42 % ( 91.97 % cum -- 188)
  21:   0.29 % ( 92.26 % cum -- 128)
  22:   0.24 % ( 92.50 % cum -- 107)
  23:   0.14 % ( 92.64 % cum -- 63)

ie we've reached 92% of all names with 24-byte inline thing.
For my whole disk, I have similar stats:

   1:     6.59 % (    6.59 % cum -- 71690)
   2:     6.86 % (   13.45 % cum -- 74611)
   3:     1.59 % (   15.04 % cum -- 17292)
   4:     3.77 % (   18.81 % cum -- 40992)
   5:     3.11 % (   21.92 % cum -- 33884)
   6:     4.13 % (   26.05 % cum -- 44898)
   7:     6.97 % (   33.01 % cum -- 75774)
   8:     8.13 % (   41.15 % cum -- 88451)
   9:     7.81 % (   48.96 % cum -- 84987)
  10:     9.56 % (   58.52 % cum -- 104021)
  11:     7.67 % (   66.19 % cum -- 83403)
  12:     8.07 % (   74.26 % cum -- 87826)
  13:     4.38 % (   78.65 % cum -- 47690)
  14:     3.36 % (   82.01 % cum -- 36592)
  15:     2.71 % (   84.71 % cum -- 29431)
  16:     1.78 % (   86.49 % cum -- 19311)
  17:     1.35 % (   87.84 % cum -- 14703)
  18:     1.05 % (   88.89 % cum -- 11410)
  19:     0.82 % (   89.71 % cum -- 8952)
  20:     0.77 % (   90.49 % cum -- 8423)
  21:     0.85 % (   91.34 % cum -- 9264)
  22:     0.72 % (   92.06 % cum -- 7798)
  23:     0.69 % (   92.75 % cum -- 7534)

so it appears that I'm either a sad case with a lot of source code on my disk, or you have overlong filenames that bring up your stats.

Or my program is broken. Entirely possible.

Whee. 149 characters is my winning entry:

/usr/share/doc/HTML/en/kdelibs-3.1-apidocs/kdecore/html/classKGenericFactory_3_01KTypeList_3_01Product_00_01ProductListTail_01_4_00_01KTypeList_3_01ParentType_00_01ParentTypeListTail_01_4_01_4-members.html

That's obscene.

Linus

-----
/*
 * (C) Copyright 2003 Linus Torvalds
 *
 * "bkr" - recursive "bk" invocations aka "bk -r"
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/param.h>
#include <fcntl.h>
#include <dirent.h>
#include <string.h>
#include <regex.h>

/*
 * Very generic directory tree handling.
 */
static int bkr(const char *path, int pathlen,
	void (*regcallback)(const char *path, int pathlen, const char *name, int namelen),
	void (*dircallback)(const char *path, int pathlen, const char *name, int namelen))
{
	struct dirent *de;
	char fullname[MAXPATHLEN + 1];
	char *ptr = fullname + pathlen;
	DIR *base = opendir(path);

	if (!base)
		return 0;
	memcpy(fullname, path, pathlen);

	while ((de = readdir(base)) != NULL) {
		int len;

		len = strlen(de->d_name);
		memcpy(ptr, de->d_name, len+1);

		if (dircallback) {
			switch (de->d_type) {
				struct stat st;
			case DT_UNKNOWN:
				if (stat(fullname, &st))
					break;
				if (!S_ISDIR(st.st_mode))
					break;
			case DT_DIR:
				if (de->d_name[0] == '.') {
					if (len == 1)
						break;
					if (de->d_name[1] == '.' && len == 2)
						break;
				}
				ptr[len] = '/';
				ptr[len+1] = '\0';
				dircallback(fullname, pathlen + len + 1, de->d_name, len);
				continue;
			}
		}
		regcallback(fullname, pathlen + len, de->d_name, len);
	}
	closedir(base);
	return 0;
}

static int total;
static int len[256];

static void file(const char *path, int pathlen, const char *name, int namelen)
{
	total++;
	len[namelen]++;
}

static void dir(const char *path, int pathlen, const char *name, int namelen)
{
	file(path, pathlen, name, namelen);
	bkr(path, pathlen, file, dir);
}

int main(int argc, char **argv)
{
	int i;
	double sum = 0.0;

	bkr(".", 0, file, dir);
	for (i = 0; i < 256; i++) {
		int nr = len[i];
		if (nr) {
			double this = (double) nr * 100.0 / total;
			sum += this;
			printf("%4i: %8.2f %% (%8.2f %% cum -- %d)\n", i, this, sum, nr);
		}
	}
}

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 5:14 ` Linus Torvalds @ 2004-05-09 7:36 ` Rodolfo Guluarte Hale 2004-05-09 9:10 ` Guennadi Liakhovetski 1 sibling, 0 replies; 62+ messages in thread From: Rodolfo Guluarte Hale @ 2004-05-09 7:36 UTC (permalink / raw) To: linux-kernel

----- Original Message -----
From: "Linus Torvalds" <torvalds@osdl.org>
To: "Andrew Morton" <akpm@osdl.org>
Cc: <dipankar@in.ibm.com>; <manfred@colorfullife.com>; <davej@redhat.com>; <wli@holomorphy.com>; "Kernel Mailing List" <linux-kernel@vger.kernel.org>; <maneesh@in.ibm.com>
Sent: Saturday, May 08, 2004 11:14 PM
Subject: Re: dentry bloat.

http://www.ocioyocio.com/ ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 5:14 ` Linus Torvalds 2004-05-09 7:36 ` Rodolfo Guluarte Hale @ 2004-05-09 9:10 ` Guennadi Liakhovetski 2004-05-09 9:23 ` viro 2004-05-09 15:35 ` Linus Torvalds 1 sibling, 2 replies; 62+ messages in thread From: Guennadi Liakhovetski @ 2004-05-09 9:10 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List, maneesh

On Sat, 8 May 2004, Linus Torvalds wrote:

> 1: 5.04 % ( 5.04 % cum -- 2246)
> 2: 5.19 % ( 10.23 % cum -- 2312)

Ok, at the risk of stating the obvious - it was intentional to count "."s and ".."s, wasn't it? That alone makes it a bit non-trivial to compare these statistics with Andrew's.

> 23: 0.14 % ( 92.64 % cum -- 63)
>
> ie we've reached 92% of all names with 24-byte inline thing.

[OT, educational]: Do "." and ".." actually take dentries?

Thanks
Guennadi
---
Guennadi Liakhovetski

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 9:10 ` Guennadi Liakhovetski @ 2004-05-09 9:23 ` viro 2004-05-09 15:35 ` Linus Torvalds 1 sibling, 0 replies; 62+ messages in thread From: viro @ 2004-05-09 9:23 UTC (permalink / raw) To: Guennadi Liakhovetski Cc: Linus Torvalds, Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List, maneesh On Sun, May 09, 2004 at 11:10:30AM +0200, Guennadi Liakhovetski wrote: > [OT, educational]: Do "." and ".." actually take dentries? No. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 9:10 ` Guennadi Liakhovetski 2004-05-09 9:23 ` viro @ 2004-05-09 15:35 ` Linus Torvalds 2004-05-09 18:11 ` Matt Mackall ` (2 more replies) 1 sibling, 3 replies; 62+ messages in thread From: Linus Torvalds @ 2004-05-09 15:35 UTC (permalink / raw) To: Guennadi Liakhovetski Cc: Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List, maneesh

On Sun, 9 May 2004, Guennadi Liakhovetski wrote:

> On Sat, 8 May 2004, Linus Torvalds wrote:
>
> > 1: 5.04 % ( 5.04 % cum -- 2246)
> > 2: 5.19 % ( 10.23 % cum -- 2312)
>
> Ok, risking to state the obvious - it was intentional to count "."s and
> ".."s, wasn't it? Just this makes it a bit non-trivial to compare this
> statistics with Andrew's.

Ok, here's a version that doesn't count "." and "..". My numbers don't really change much for the kernel:

   1:     0.00 % (    0.00 % cum -- 2)
   2:     0.17 % (    0.17 % cum -- 68)
   3:     0.62 % (    0.79 % cum -- 247)
   4:     3.67 % (    4.46 % cum -- 1469)
   5:     3.72 % (    8.18 % cum -- 1492)
   6:     4.84 % (   13.03 % cum -- 1940)
   7:     8.40 % (   21.43 % cum -- 3365)
   8:    10.72 % (   32.14 % cum -- 4293)
   9:    10.19 % (   42.34 % cum -- 4084)
  10:    12.21 % (   54.55 % cum -- 4891)
  11:     8.50 % (   63.05 % cum -- 3406)
  12:     7.79 % (   70.84 % cum -- 3122)
  13:     5.74 % (   76.58 % cum -- 2298)
  14:     4.26 % (   80.84 % cum -- 1706)
  15:     3.86 % (   84.69 % cum -- 1545)
  16:     2.34 % (   87.04 % cum -- 939)
  17:     1.64 % (   88.67 % cum -- 655)
  18:     1.18 % (   89.85 % cum -- 472)
  19:     0.76 % (   90.61 % cum -- 303)
  20:     0.47 % (   91.08 % cum -- 188)
  21:     0.32 % (   91.40 % cum -- 128)
  22:     0.27 % (   91.66 % cum -- 107)
  23:     0.16 % (   91.82 % cum -- 63)

and we've reached over 90% coverage with the 24-byte inline name.

(We don't strictly _need_ the terminating '\0' at the end of the dentry name, since they are all counted strings anyway, but it certainly is safer that way).
Same goes for my full tree:

   1:     0.11 % (    0.11 % cum -- 1073)
   2:     0.41 % (    0.53 % cum -- 3918)
   3:     1.82 % (    2.35 % cum -- 17256)
   4:     4.32 % (    6.68 % cum -- 40925)
   5:     3.58 % (   10.25 % cum -- 33860)
   6:     4.74 % (   15.00 % cum -- 44874)
   7:     8.00 % (   23.00 % cum -- 75750)
   8:     9.35 % (   32.35 % cum -- 88451)
   9:     8.98 % (   41.33 % cum -- 84987)
  10:    10.99 % (   52.32 % cum -- 104021)
  11:     8.81 % (   61.13 % cum -- 83404)
  12:     9.28 % (   70.41 % cum -- 87826)
  13:     5.04 % (   75.45 % cum -- 47690)
  14:     3.87 % (   79.32 % cum -- 36592)
  15:     3.11 % (   82.43 % cum -- 29431)
  16:     2.04 % (   84.47 % cum -- 19311)
  17:     1.55 % (   86.02 % cum -- 14703)
  18:     1.21 % (   87.23 % cum -- 11410)
  19:     0.95 % (   88.17 % cum -- 8952)
  20:     0.89 % (   89.07 % cum -- 8423)
  21:     0.98 % (   90.04 % cum -- 9264)
  22:     0.82 % (   90.87 % cum -- 7798)
  23:     0.80 % (   91.66 % cum -- 7534)

so Andrew really must have a fairly different setup if he sees 20% filenames being > 23 characters.

> [OT, educational]: Do "." and ".." actually take dentries?

Nope, they are resolved early. So you're right, I shouldn't have counted them.

Linus

^ permalink raw reply [flat|nested] 62+ messages in thread
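The same histogram can be reproduced without compiling the C scanner, using a shell pipeline (this assumes GNU find for `-printf`; `-mindepth 1` excludes the starting directory itself, and find never emits "." or ".." entries, matching the corrected statistics above):

```shell
# Histogram of basename lengths with cumulative percentages,
# in roughly the same output format as the C program.
find . -mindepth 1 -printf '%f\n' 2>/dev/null |
awk '{ len[length($0)]++; total++ }
     END {
         cum = 0
         for (i = 1; i <= 255; i++)
             if (len[i]) {
                 cum += len[i]
                 printf "%4d: %8.2f %% (%8.2f %% cum -- %d)\n", i, len[i] * 100.0 / total, cum * 100.0 / total, len[i]
             }
     }'
```

Run from the root of the tree of interest; on filesystems where `find -printf` is unavailable, `find . -mindepth 1 | xargs -n1 basename` is a slower substitute.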
* Re: dentry bloat. 2004-05-09 15:35 ` Linus Torvalds @ 2004-05-09 18:11 ` Matt Mackall 2004-05-09 22:08 ` Francois Romieu 2004-05-09 23:51 ` Paul Jackson 2004-05-10 7:17 ` Florian Weimer 2004-05-10 14:12 ` Rik van Riel 2 siblings, 2 replies; 62+ messages in thread From: Matt Mackall @ 2004-05-09 18:11 UTC (permalink / raw) To: Linus Torvalds Cc: Guennadi Liakhovetski, Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List, maneesh

On Sun, May 09, 2004 at 08:35:55AM -0700, Linus Torvalds wrote:
>
> On Sun, 9 May 2004, Guennadi Liakhovetski wrote:
>
> > On Sat, 8 May 2004, Linus Torvalds wrote:
> >
> > > 1: 5.04 % ( 5.04 % cum -- 2246)
> > > 2: 5.19 % ( 10.23 % cum -- 2312)
> >
> > Ok, risking to state the obvious - it was intentional to count "."s and
> > ".."s, wasn't it? Just this makes it a bit non-trivial to compare this
> > statistics with Andrew's.
>
> Ok, here's a version that doesn't count "." and "..". My numbers don't
> really change much for the kernel:
>
> and we've reached over 90% coverage with the 24-byte inline name.

I hacked up something and ran it on my webserver (which has something like 200 shell accounts). My histogram peaked at 12 rather than 10 but still hit 90% cumulative at 21 characters.

I suspect the worst case is a large LAN fileserver with Samba shares and whatnot; I suspect there are a large number of "A picture of my cute puppy I took last summer.JPG" style filenames there. Anyone have stats for such an FS?

--
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 18:11 ` Matt Mackall @ 2004-05-09 22:08 ` Francois Romieu 2004-05-09 23:51 ` Paul Jackson 1 sibling, 0 replies; 62+ messages in thread From: Francois Romieu @ 2004-05-09 22:08 UTC (permalink / raw) To: Matt Mackall Cc: Linus Torvalds, Guennadi Liakhovetski, Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List, maneesh

Matt Mackall <mpm@selenic.com> :
[...]
> I suspect worst case is a large LAN fileserver with Samba shares and
> whatnot, I suspect there are a large number of "A picture of my cute
> puppy I took last summer.JPG" style filenames there. Anyone have stats
> for such an FS?

General purpose server (mail/smb/cvs/rpm repo) for 15 users: closer to Andrew's results. "." and ".." do not appear in the results below.

   1:     0.30 % (    0.30 % cum -- 4263)
   2:     1.36 % (    1.65 % cum -- 19491)
   3:     2.64 % (    4.29 % cum -- 37905)
   4:     2.43 % (    6.72 % cum -- 34930)
   5:     2.00 % (    8.72 % cum -- 28816)
   6:     3.04 % (   11.75 % cum -- 43652)
   7:     4.80 % (   16.56 % cum -- 69065)
   8:     8.50 % (   25.06 % cum -- 122244)
   9:     4.96 % (   30.01 % cum -- 71310)
  10:    10.29 % (   40.30 % cum -- 147964)
  11:     6.29 % (   46.60 % cum -- 90527)
  12:     6.70 % (   53.30 % cum -- 96428)
  13:     4.02 % (   57.32 % cum -- 57766)
  14:     2.84 % (   60.16 % cum -- 40865)
  15:     2.36 % (   62.52 % cum -- 34000)
  16:     2.31 % (   64.84 % cum -- 33288)
  17:     3.58 % (   68.41 % cum -- 51421)
  18:     2.87 % (   71.28 % cum -- 41308)
  19:     2.10 % (   73.39 % cum -- 30263)
  20:     2.36 % (   75.75 % cum -- 33959)
  21:     2.78 % (   78.53 % cum -- 39951)
  22:     1.48 % (   80.01 % cum -- 21300)
  23:     2.01 % (   82.02 % cum -- 28975)
  24:     2.17 % (   84.19 % cum -- 31169)
  25:     1.32 % (   85.51 % cum -- 19047)
  26:     1.02 % (   86.54 % cum -- 14713)
  27:     0.85 % (   87.39 % cum -- 12282)
  28:     0.82 % (   88.21 % cum -- 11795)
  29:     0.75 % (   88.96 % cum -- 10783)
  30:     0.73 % (   89.69 % cum -- 10507)
  31:     0.59 % (   90.28 % cum -- 8454)
[...]
  57:     0.15 % (   99.06 % cum -- 2198)

Assuming one is only interested in the users' home directories:

   1:     0.37 % (    0.37 % cum -- 1638)
   2:     1.52 % (    1.90 % cum -- 6672)
   3:     3.40 % (    5.30 % cum -- 14907)
   4:     2.61 % (    7.91 % cum -- 11447)
   5:     1.71 % (    9.62 % cum -- 7498)
   6:     2.75 % (   12.37 % cum -- 12069)
   7:     4.83 % (   17.20 % cum -- 21186)
   8:    10.27 % (   27.48 % cum -- 45034)
   9:     4.79 % (   32.27 % cum -- 21005)
  10:     6.23 % (   38.50 % cum -- 27302)
  11:     6.51 % (   45.01 % cum -- 28551)
  12:     5.99 % (   51.00 % cum -- 26267)
  13:     4.13 % (   55.14 % cum -- 18107)
  14:     2.85 % (   57.98 % cum -- 12479)
  15:     2.26 % (   60.25 % cum -- 9921)
  16:     2.28 % (   62.53 % cum -- 10015)
  17:     2.88 % (   65.41 % cum -- 12604)
  18:     2.04 % (   67.45 % cum -- 8955)
  19:     2.26 % (   69.71 % cum -- 9903)
  20:     2.57 % (   72.28 % cum -- 11260)
  21:     2.02 % (   74.30 % cum -- 8874)
  22:     1.48 % (   75.78 % cum -- 6498)
  23:     2.08 % (   77.86 % cum -- 9109)
  24:     2.50 % (   80.36 % cum -- 10969)
  25:     1.35 % (   81.71 % cum -- 5921)
  26:     1.01 % (   82.72 % cum -- 4428)
  27:     0.85 % (   83.57 % cum -- 3705)
  28:     0.84 % (   84.41 % cum -- 3692)
  29:     0.80 % (   85.21 % cum -- 3489)
  30:     0.76 % (   85.97 % cum -- 3335)
  31:     0.63 % (   86.60 % cum -- 2757)
  32:     0.83 % (   87.43 % cum -- 3633)
  33:     0.49 % (   87.92 % cum -- 2151)
  34:     0.53 % (   88.45 % cum -- 2327)
  35:     0.45 % (   88.90 % cum -- 1959)
  36:     0.40 % (   89.29 % cum -- 1737)
  37:     0.47 % (   89.76 % cum -- 2068)
  38:     0.48 % (   90.24 % cum -- 2103)
[...]
  60:     0.13 % (   99.00 % cum -- 583)

--
Ueimor

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 18:11 ` Matt Mackall 2004-05-09 22:08 ` Francois Romieu @ 2004-05-09 23:51 ` Paul Jackson 1 sibling, 0 replies; 62+ messages in thread From: Paul Jackson @ 2004-05-09 23:51 UTC (permalink / raw) To: Matt Mackall Cc: torvalds, g.liakhovetski, akpm, dipankar, manfred, davej, wli, linux-kernel, maneesh

> Anyone have stats for such an FS?

Not a samba share but a dual-boot linux box with a few windows file systems located on various subdirectories of path length 6 to 16 chars. You have to go to length 67 to get 80% of the names:

length   3, cum      4, percent  0
length   4, cum     17, percent  0
length   5, cum     23, percent  0
length   6, cum     27, percent  0
length   7, cum     49, percent  0
length   8, cum    101, percent  0
length   9, cum    149, percent  0
length  10, cum    223, percent  0
length  11, cum    314, percent  0
length  12, cum    584, percent  0
length  13, cum    876, percent  0
length  14, cum   1321, percent  0
length  15, cum   1726, percent  0
length  16, cum   2289, percent  0
length  17, cum   3004, percent  0
length  18, cum   3618, percent  0
length  19, cum   4282, percent  0
length  20, cum   5024, percent  0
length  21, cum   6059, percent  1
length  22, cum   7114, percent  1
length  23, cum   8323, percent  1
length  24, cum   9885, percent  1
length  25, cum  11786, percent  1
length  26, cum  14184, percent  2
length  27, cum  16704, percent  2
length  28, cum  19470, percent  3
length  29, cum  23400, percent  3
length  30, cum  28310, percent  4
length  31, cum  35104, percent  5
length  32, cum  40083, percent  6
length  33, cum  46084, percent  7
length  34, cum  52765, percent  8
length  35, cum  59267, percent  9
length  36, cum  66206, percent 11
length  37, cum  73745, percent 12
length  38, cum  82686, percent 13
length  39, cum  93252, percent 15
length  40, cum 105353, percent 17
length  41, cum 118474, percent 19
length  42, cum 130596, percent 22
length  43, cum 143802, percent 24
length  44, cum 156347, percent 26
length  45, cum 169459, percent 28
length  46, cum 184349, percent 31
length  47, cum 199955, percent 33
length  48, cum 215665, percent 36
length  49, cum 232193, percent 39
length  50, cum 248706, percent 41
length  51, cum 263703, percent 44
length  52, cum 278759, percent 47
length  53, cum 294684, percent 49
length  54, cum 310449, percent 52
length  55, cum 326860, percent 55
length  56, cum 343146, percent 57
length  57, cum 359223, percent 60
length  58, cum 375097, percent 63
length  59, cum 389867, percent 65
length  60, cum 409873, percent 69
length  61, cum 421748, percent 71
length  62, cum 433091, percent 73
length  63, cum 443854, percent 74
length  64, cum 454093, percent 76
length  65, cum 463521, percent 78
length  66, cum 472333, percent 79
length  67, cum 480730, percent 81
length  68, cum 488440, percent 82
length  69, cum 494620, percent 83
length  70, cum 500712, percent 84
length  71, cum 506007, percent 85
length  72, cum 510948, percent 86
length  73, cum 515491, percent 86
length  74, cum 520553, percent 87
length  75, cum 524475, percent 88
length  76, cum 528410, percent 89
length  77, cum 532247, percent 89
length  78, cum 535946, percent 90
length  79, cum 539595, percent 90
length  80, cum 543056, percent 91
length  81, cum 546411, percent 92
length  82, cum 549284, percent 92
length  83, cum 552563, percent 93
length  84, cum 555361, percent 93
length  85, cum 558000, percent 94
length  86, cum 560718, percent 94
length  87, cum 563665, percent 95
length  88, cum 566046, percent 95
length  89, cum 568362, percent 95
length  90, cum 570559, percent 96
length  91, cum 572596, percent 96
length  92, cum 574224, percent 96
length  93, cum 575820, percent 97
length  94, cum 577350, percent 97
length  95, cum 578693, percent 97
length  96, cum 580016, percent 97
length  97, cum 581175, percent 98
length  98, cum 582349, percent 98
length  99, cum 583471, percent 98
length 100, cum 584385, percent 98
length 101, cum 585189, percent 98
length 102, cum 585926, percent 98
length 103, cum 586489, percent 98
length 104, cum 587178, percent 99
length 105, cum 587746, percent 99
length 106, cum 588257, percent 99
length 107, cum 588664, percent 99
length 108, cum 589181, percent 99
length 109, cum 589491, percent 99

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 15:35 ` Linus Torvalds 2004-05-09 18:11 ` Matt Mackall @ 2004-05-10 7:17 ` Florian Weimer 2004-05-10 14:12 ` Rik van Riel 2 siblings, 0 replies; 62+ messages in thread From: Florian Weimer @ 2004-05-10 7:17 UTC (permalink / raw) To: Linus Torvalds Cc: Guennadi Liakhovetski, Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List, maneesh * Linus Torvalds: > so Andrew really must have a fairly different setup if he sees 20% > filenames being > 23 characters. A few Maildir folders are more than sufficient for that. -- Current mail filters: many dial-up/DSL/cable modem hosts, and the following domains: atlas.cz, bigpond.com, di-ve.com, hotmail.com, jumpy.it, libero.it, netscape.net, postino.it, simplesnet.pt, tiscali.co.uk, tiscali.cz, tiscali.it, voila.fr, yahoo.com. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 15:35 ` Linus Torvalds 2004-05-09 18:11 ` Matt Mackall 2004-05-10 7:17 ` Florian Weimer @ 2004-05-10 14:12 ` Rik van Riel 2 siblings, 0 replies; 62+ messages in thread From: Rik van Riel @ 2004-05-10 14:12 UTC (permalink / raw) To: Linus Torvalds Cc: Guennadi Liakhovetski, Andrew Morton, dipankar, manfred, davej, wli, Kernel Mailing List, maneesh On Sun, 9 May 2004, Linus Torvalds wrote: > so Andrew really must have a fairly different setup if he sees 20% > filenames being > 23 characters. Just look at the patch names in the -mm tree ;) -- "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 4:25 ` Linus Torvalds 2004-05-09 4:36 ` Andrew Morton @ 2004-05-09 4:43 ` Linus Torvalds 1 sibling, 0 replies; 62+ messages in thread From: Linus Torvalds @ 2004-05-09 4:43 UTC (permalink / raw) To: Andrew Morton; +Cc: dipankar, manfred, davej, wli, linux-kernel, maneesh

On Sat, 8 May 2004, Linus Torvalds wrote:
>
> NOTE! It's absolutely true that DNAME_INLINE_LEN may still be different
> from DNAME_INLINE_LEN_MIN. In particular, if something inside the struct
> makes the alignment of "struct dentry" be bigger than the offset of the
> last field, then DNAME_INLINE_LEN will be different from (bigger than)
> DNAME_INLINE_LEN_MIN.

Btw, this does depend on the fact that a regular "kmem_cache_create()" with an alignment of 0 had better return a pointer that is always "aligned enough" for the architecture in question. But that had better be true anyway, since otherwise everything would basically have to specify the alignment by hand, and the alignment of 0 would be useless.

Linus

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-08 20:42 ` Dipankar Sarma 2004-05-08 20:55 ` Andrew Morton @ 2004-05-09 7:28 ` Manfred Spraul 2004-05-09 15:33 ` Dipankar Sarma 1 sibling, 1 reply; 62+ messages in thread From: Manfred Spraul @ 2004-05-09 7:28 UTC (permalink / raw) To: dipankar Cc: Linus Torvalds, Andrew Morton, davej, wli, Kernel Mailing List, Maneesh Soni

Dipankar Sarma wrote:

>On Sat, May 08, 2004 at 12:27:50PM -0700, Linus Torvalds wrote:
>
>>And yes, removing d_movecount would be ok by then, as long as we re-test
>>the parent inside d_lock (we don't need to re-test "hash", since if we
>>tested the full name inside the lock, the hash had better match too ;)
>>

What's the purpose of d_move_count? AFAICS it protects against a double rename: first to a different bucket, then back to the original bucket. This changes the position of the dentry in the hash chain and a concurrent lookup would skip entries. d_lock wouldn't prevent that.

But I think d_bucket could be removed: for __d_lookup the test appears to be redundant with the d_move_count test. The remaining users are not performance critical; they could recalculate the bucket from d_parent and d_name.hash.

--
Manfred

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 7:28 ` Manfred Spraul @ 2004-05-09 15:33 ` Dipankar Sarma 2004-05-09 22:17 ` viro 0 siblings, 1 reply; 62+ messages in thread From: Dipankar Sarma @ 2004-05-09 15:33 UTC (permalink / raw) To: Manfred Spraul Cc: Linus Torvalds, Andrew Morton, davej, wli, Kernel Mailing List, Maneesh Soni

On Sun, May 09, 2004 at 09:28:46AM +0200, Manfred Spraul wrote:
> What's the purpose of d_move_count?
> AFAICS it protects against a double rename: first to different bucket,
> then back to original bucket. This changes the position of the dentry in
> the hash chain and a concurrent lookup would skip entries.
> d_lock wouldn't prevent that.

Actually, what may happen is that since the dentries are added in the front, a double move like that would result in hash chain traversal looping. Timing dependent and unlikely, but d_move_count avoided that theoretical possibility. It is not about skipping dentries, which is safe because a miss would result in a real_lookup() anyway. If we do the comparison within d_lock, then at least we are guaranteed to get the right file. The remaining question is whether we violate POSIX rename semantics in some twisted way.

> But I think d_bucket could be removed: for __d_lookup the test appears
> to be redundant with the d_move_count test. The remaining users are not
> performance critical, they could recalculate the bucket from d_parent
> and d_name.hash.

Yes, afaics, d_move_count can effectively be used to work around the possibility of the dentry moving to a different hash bucket.

Thanks
Dipankar

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 15:33 ` Dipankar Sarma @ 2004-05-09 22:17 ` viro 2004-05-09 22:27 ` Andrew Morton 2004-05-10 18:39 ` Dipankar Sarma 0 siblings, 2 replies; 62+ messages in thread From: viro @ 2004-05-09 22:17 UTC (permalink / raw) To: Dipankar Sarma Cc: Manfred Spraul, Linus Torvalds, Andrew Morton, davej, wli, Kernel Mailing List, Maneesh Soni

On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote:

> Actually, what may happen is that since the dentries are added
> in the front, a double move like that would result in hash chain
> traversal looping. Timing dependent and unlikely, but d_move_count
> avoided that theoretical possibility. It is not about skipping
> dentries which is safe because a miss would result in a real_lookup()

Not really. A miss could result in getting another dentry allocated for the same e.g. directory, which is *NOT* harmless at all.

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 22:17 ` viro @ 2004-05-09 22:27 ` Andrew Morton 2004-05-11 5:26 ` Maneesh Soni 2004-05-10 18:39 ` Dipankar Sarma 1 sibling, 1 reply; 62+ messages in thread From: Andrew Morton @ 2004-05-09 22:27 UTC (permalink / raw) To: viro; +Cc: dipankar, manfred, torvalds, davej, wli, linux-kernel, maneesh

viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote:
>
> > Actually, what may happen is that since the dentries are added
> > in the front, a double move like that would result in hash chain
> > traversal looping. Timing dependent and unlikely, but d_move_count
> > avoided that theoretical possibility. It is not about skipping
> > dentries which is safe because a miss would result in a real_lookup()
>
> Not really. A miss could result in getting another dentry allocated
> for the same e.g. directory, which is *NOT* harmless at all.

The d_bucket logic does look a bit odd.

	dentry = hlist_entry(node, struct dentry, d_hash);

	/* if lookup ends up in a different bucket
	 * due to concurrent rename, fail it
	 */
	if (unlikely(dentry->d_bucket != head))
		break;

	/*
	 * We must take a snapshot of d_move_count followed by
	 * read memory barrier before any search key comparison
	 */
	move_count = dentry->d_move_count;

There is a window between the d_bucket test and the sampling of d_move_count. What happens if the dentry gets moved around in there?

Anyway, regardless of that, it is more efficient to test d_bucket _after_ performing the hash comparison. And it seems saner to perform the d_bucket check when things are pinned down by d_lock.

 25-akpm/fs/dcache.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff -puN fs/dcache.c~dentry-d_bucket-fix fs/dcache.c
--- 25/fs/dcache.c~dentry-d_bucket-fix	2004-05-09 14:06:51.816554824 -0700
+++ 25-akpm/fs/dcache.c	2004-05-09 14:07:41.932935976 -0700
@@ -975,12 +975,6 @@ struct dentry * __d_lookup(struct dentry
 		smp_read_barrier_depends();
 		dentry = hlist_entry(node, struct dentry, d_hash);
 
-		/* if lookup ends up in a different bucket
-		 * due to concurrent rename, fail it
-		 */
-		if (unlikely(dentry->d_bucket != head))
-			break;
-
 		smp_rmb();
 
 		if (dentry->d_name.hash != hash)
@@ -991,6 +985,13 @@ struct dentry * __d_lookup(struct dentry
 		spin_lock(&dentry->d_lock);
 
 		/*
+		 * If lookup ends up in a different bucket due to concurrent
+		 * rename, fail it
+		 */
+		if (unlikely(dentry->d_bucket != head))
+			goto terminate;
+
+		/*
 		 * Recheck the dentry after taking the lock - d_move may have
 		 * changed things. Don't bother checking the hash because we're
 		 * about to compare the whole name anyway.
@@ -1014,6 +1015,7 @@ struct dentry * __d_lookup(struct dentry
 			atomic_inc(&dentry->d_count);
 			found = dentry;
 		}
+terminate:
 		spin_unlock(&dentry->d_lock);
 		break;
 next:
_

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-09 22:27 ` Andrew Morton @ 2004-05-11 5:26 ` Maneesh Soni 0 siblings, 0 replies; 62+ messages in thread From: Maneesh Soni @ 2004-05-11 5:26 UTC (permalink / raw) To: Andrew Morton; +Cc: viro, dipankar, manfred, torvalds, davej, wli, linux-kernel On Sun, May 09, 2004 at 03:27:20PM -0700, Andrew Morton wrote: > viro@parcelfarce.linux.theplanet.co.uk wrote: > > > > On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote: > > > > > Actually, what may happen is that since the dentries are added > > > in the front, a double move like that would result in hash chain > > > traversal looping. Timing dependent and unlikely, but d_move_count > > > avoided that theoritical possibility. It is not about skipping > > > dentries which is safe because a miss would result in a real_lookup() > > > > Not really. A miss could result in getting another dentry allocated > > for the same e.g. directory, which is *NOT* harmless at all. > > The d_bucket logic does look a bit odd. > > dentry = hlist_entry(node, struct dentry, d_hash); > > /* if lookup ends up in a different bucket * due to concurrent rename, fail it */ > if (unlikely(dentry->d_bucket != head)) > break; > > /* > * We must take a snapshot of d_move_count followed by > * read memory barrier before any search key comparison > */ > move_count = dentry->d_move_count; > > There is a window between the d_bucket test and sampling of d_move_count. > What happens if the dentry gets moved around in there? > > Anyway, regardless of that, it is more efficient to test d_bucket _after_ > performing the hash comparison. And it seems saner to perform the d_bucket > check when things are pinned down by d_lock. > This should be fine. Earlier, the d_bucket check was done before "continue" because the lookup used to loop infinitely. The infinite looping happened when a lookup moved to a different hash bucket due to a concurrent d_move and then never found the list head from where it started. 
After the introduction of hlist there is less chance of the lookup looping infinitely even if it is moved to a different hash bucket, as an hlist ends with NULL. But I still see a theoretical possibility of increased looping: a double rename can keep putting the lookup back at the head of a hash chain, so the hlist end is never seen. -- Maneesh Soni IBM Linux Technology Center, IBM India Software Lab, Bangalore. Phone: +91-80-5044999 email: maneesh@in.ibm.com http://lse.sourceforge.net/
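[Editor's sketch] The hlist shape Maneesh refers to looks roughly like this (a simplified user-space sketch; the real kernel version lives in include/linux/list.h and adds pprev maintenance helpers and RCU barriers):

```c
#include <assert.h>
#include <stddef.h>

/* hlist as in 2.6: singly linked with a pprev back-pointer, and --
 * unlike the old circular list_head -- terminated by NULL. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A traversal is naturally bounded by the NULL terminator, so a
 * reader carried onto another chain by a concurrent rename still
 * stops when it reaches the end of whatever chain it is now on;
 * with the old circular lists it could spin looking for a list
 * head it would never find. */
static int chain_len(const struct hlist_head *h)
{
    int n = 0;
    const struct hlist_node *p;

    for (p = h->first; p; p = p->next)
        n++;
    return n;
}
```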
* Re: dentry bloat. 2004-05-09 22:17 ` viro 2004-05-09 22:27 ` Andrew Morton @ 2004-05-10 18:39 ` Dipankar Sarma 2004-05-11 5:17 ` Maneesh Soni 1 sibling, 1 reply; 62+ messages in thread From: Dipankar Sarma @ 2004-05-10 18:39 UTC (permalink / raw) To: viro Cc: Manfred Spraul, Linus Torvalds, Andrew Morton, davej, wli, Kernel Mailing List, Maneesh Soni On Sun, May 09, 2004 at 11:17:12PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote: > > > Actually, what may happen is that since the dentries are added > > in the front, a double move like that would result in hash chain > > traversal looping. Timing dependent and unlikely, but d_move_count > > avoided that theoritical possibility. It is not about skipping > > dentries which is safe because a miss would result in a real_lookup() > > Not really. A miss could result in getting another dentry allocated > for the same e.g. directory, which is *NOT* harmless at all. AFAICS, a miss in __d_lookup would result either in a repeat lookup under dcache_lock, in which case we are safe, or in real_lookup(), which in turn does another lookup under dcache_lock. Is there a path that I am missing here? Thanks Dipankar
* Re: dentry bloat. 2004-05-10 18:39 ` Dipankar Sarma @ 2004-05-11 5:17 ` Maneesh Soni 0 siblings, 0 replies; 62+ messages in thread From: Maneesh Soni @ 2004-05-11 5:17 UTC (permalink / raw) To: Dipankar Sarma Cc: viro, Manfred Spraul, Linus Torvalds, Andrew Morton, davej, wli, Kernel Mailing List On Tue, May 11, 2004 at 12:09:25AM +0530, Dipankar Sarma wrote: > On Sun, May 09, 2004 at 11:17:12PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > > On Sun, May 09, 2004 at 09:03:16PM +0530, Dipankar Sarma wrote: > > > > > Actually, what may happen is that since the dentries are added > > > in the front, a double move like that would result in hash chain > > > traversal looping. Timing dependent and unlikely, but d_move_count > > > avoided that theoritical possibility. It is not about skipping > > > dentries which is safe because a miss would result in a real_lookup() > > > > Not really. A miss could result in getting another dentry allocated > > for the same e.g. directory, which is *NOT* harmless at all. > > AFAICS, a miss in __d_lookup would result in a repeat lookup > under dcache_lock in which case we are safe or real_lookup() > which in turn does another lookup with dcache_lock. Is there > a path that I am missing here ? Actually, real_lookup() is done with the parent's i_sem held (avoiding a concurrent rename in the same directory), and it also uses rename_lock (a sequence lock), which provides protection against d_move() (real_lookup() --> d_lookup() --> __d_lookup()). Since real_lookup() does repeat the cached lookup, I don't see any chance of missing a hashed dentry and allocating one more dentry with the same name. Thanks Maneesh -- Maneesh Soni IBM Linux Technology Center, IBM India Software Lab, Bangalore. Phone: +91-80-5044999 email: maneesh@in.ibm.com http://lse.sourceforge.net/
* Re: dentry bloat. 2004-05-08 19:13 ` Linus Torvalds 2004-05-08 19:27 ` Andrew Morton 2004-05-08 19:27 ` Linus Torvalds @ 2004-05-08 20:13 ` Dipankar Sarma 2004-10-06 12:58 ` Maneesh Soni 2 siblings, 1 reply; 62+ messages in thread From: Dipankar Sarma @ 2004-05-08 20:13 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, manfred, davej, wli, linux-kernel On Sat, May 08, 2004 at 12:13:09PM -0700, Linus Torvalds wrote: > On Sat, 8 May 2004, Andrew Morton wrote: > > > > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will > > serialise against d_move(), fixing the problem which you mention, and also > > makes d_movecount go away. > > If you do that, RCU basically loses most of it's meaning. > > You'll be taking a lock for - and dirtying in the cache - every single > dentry on the hash chain, which is pretty much guaranteed to be slower > than just taking the dcache_lock _once_, even if that one jumps across > CPU's a lot. > > In other words, no, I don't think that's a good idea. We really want to > take the dentry lock only after we're pretty sure we have the right > dentry. Otherwise the dentry chains will be bouncing from CPU to CPU all > the time. Exactly. Taking ->d_lock for every dentry while traversing the hash chain should be avoided. As such, atomic operations on that path are getting costly. Thanks Dipankar ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-08 20:13 ` Dipankar Sarma @ 2004-10-06 12:58 ` Maneesh Soni 2004-05-11 20:22 ` Andrew Morton 0 siblings, 1 reply; 62+ messages in thread From: Maneesh Soni @ 2004-10-06 12:58 UTC (permalink / raw) To: Dipankar Sarma Cc: Linus Torvalds, Andrew Morton, manfred, davej, wli, linux-kernel On Sat, May 08, 2004 at 08:30:55PM +0000, Dipankar Sarma wrote: > On Sat, May 08, 2004 at 12:13:09PM -0700, Linus Torvalds wrote: > > On Sat, 8 May 2004, Andrew Morton wrote: > > > > > > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will > > > serialise against d_move(), fixing the problem which you mention, and also > > > makes d_movecount go away. > > > > If you do that, RCU basically loses most of it's meaning. > > > > You'll be taking a lock for - and dirtying in the cache - every single > > dentry on the hash chain, which is pretty much guaranteed to be slower > > than just taking the dcache_lock _once_, even if that one jumps across > > CPU's a lot. > > > > In other words, no, I don't think that's a good idea. We really want to > > take the dentry lock only after we're pretty sure we have the right > > dentry. Otherwise the dentry chains will be bouncing from CPU to CPU all > > the time. > > Exactly. Taking ->d_lock for every dentry while traversing the hash > chain should be avoided. As such, atomic operations on that path > are getting costly. We can see this happening in the following numbers taken using dcachebench* gathered on 2-way P4 Xeon 2.4MHz SMP box with 4.5GB RAM. The benchmark was run with the following parameters and averaged over 10 runs. ./dcachebench -p 32 -b testdir Average microseconds/iterations Std. 
Deviation (lesser is better) 2.6.6 10204 161.5 2.6.6-mm1 10571 51.5 *dcachebench is a microbenchmark written by Bill Hartner and is available at http://www-124.ibm.com/developerworks/opensource/linuxperf/dcachebench/dcachebench.html -- Maneesh Soni Linux Technology Center, IBM Software Lab, Bangalore, India email: maneesh@in.ibm.com Phone: 91-80-25044999 Fax: 91-80-25268553 T/L : 9243696 ^ permalink raw reply [flat|nested] 62+ messages in thread
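[Editor's sketch] The mean and standard deviation quoted for the ten runs can be reproduced as follows (a generic sketch, not dcachebench code; whether the quoted deviation is the population or sample form is not stated, so this uses the population form, and the square root is done by Newton iteration to keep the example free of a libm dependency):

```c
#include <assert.h>

/* Arithmetic mean over n benchmark runs. */
static double run_mean(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s / n;
}

/* Newton iteration for the square root. */
static double newton_sqrt(double v)
{
    double r = v < 1.0 ? 1.0 : v;
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Population standard deviation: sqrt of the mean squared
 * deviation from the mean. */
static double run_stddev(const double *x, int n)
{
    double m = run_mean(x, n), s = 0.0;
    for (int i = 0; i < n; i++)
        s += (x[i] - m) * (x[i] - m);
    return newton_sqrt(s / n);
}
```

A small deviation relative to the mean (as in the tables above, roughly 0.3-2%) is what makes the ~200-unit differences between kernels worth discussing at all.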
* Re: dentry bloat. 2004-10-06 12:58 ` Maneesh Soni @ 2004-05-11 20:22 ` Andrew Morton 2004-05-14 10:33 ` Raghavan 0 siblings, 1 reply; 62+ messages in thread From: Andrew Morton @ 2004-05-11 20:22 UTC (permalink / raw) To: maneesh; +Cc: dipankar, torvalds, manfred, davej, wli, linux-kernel Maneesh Soni <maneesh@in.ibm.com> wrote: > > We can see this happening in the following numbers taken using dcachebench* > gathered on 2-way P4 Xeon 2.4MHz SMP box with 4.5GB RAM. The benchmark was run > with the following parameters and averaged over 10 runs. > ./dcachebench -p 32 -b testdir > > Average microseconds/iterations Std. Deviation > (lesser is better) > 2.6.6 10204 161.5 > 2.6.6-mm1 10571 51.5 > Well.. this could be anything. If the hash is any good -mm shouldn't be doing significantly more locked operations. (I think - didn't check very closely). Also the inode and dentry hash algorithms were changed in -mm. You can evaluate the effect of that by comparing 2.6.6 with current Linus -bk. If we compare 2.6.6-bk with 2.6.6-mm1 and see a slowdown on SMP and no slowdown on UP then yup, it might be due to additional locking. But we should evaluate the hash changes separately. Summary: 2.6.6-rc3: baseline 2.6.6: dentry size+alignment changes 2.6.6-bk: dentry size+alignment changes, hash changes 2.6.6-mm1: dentry size+alignment changes, hash changes, lots of other stuff. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-11 20:22 ` Andrew Morton @ 2004-05-14 10:33 ` Raghavan 2004-05-14 10:50 ` Paul Jackson 2004-05-14 11:18 ` Dipankar Sarma 0 siblings, 2 replies; 62+ messages in thread From: Raghavan @ 2004-05-14 10:33 UTC (permalink / raw) To: Andrew Morton Cc: maneesh, dipankar, torvalds, manfred, davej, wli, linux-kernel Hi, The following is the output of dcachebench run on the 4 kernels: Environment - 2-way P4 Xeon 2.4MHz SMP box with 4.5GB RAM. Tests were run for 10 iterations to calculate the milliseconds/iteration and then mean and deviation were calculated. Kernel version Mean Standard Deviation --------------- ---- ------------------ 2.6.6-rc3(baseline) 10578 221 2.6.6 10280 110 2.6.6-bk 10862 30 2.6.6-mm1 10626 36 To find out if the huge performance dip between the 2.6.6 and 2.6.6-bk is because of the hash changes, I removed the hash patch from 2.6.6-bk and applied it to 2.6.6. 2.6.6-bk with old hash 10685 34 2.6.6 with new hash 10496 125 Looks like the new hashing function has brought down the performance. Also some code outside dcache.c and inode.c seems to have pushed down the performance in 2.6.6-bk. Raghav On Tue, May 11, 2004 at 01:22:05PM -0700, Andrew Morton wrote: > Maneesh Soni <maneesh@in.ibm.com> wrote: > > > > We can see this happening in the following numbers taken using dcachebench* > > gathered on 2-way P4 Xeon 2.4MHz SMP box with 4.5GB RAM. The benchmark was run > > with the following parameters and averaged over 10 runs. > > ./dcachebench -p 32 -b testdir > > > > Average microseconds/iterations Std. Deviation > > (lesser is better) > > 2.6.6 10204 161.5 > > 2.6.6-mm1 10571 51.5 > > > > Well.. this could be anything. If the hash is any good -mm shouldn't be > doing significantly more locked operations. (I think - didn't check very > closely). > > Also the inode and dentry hash algorithms were changed in -mm. You can > evaluate the effect of that by comparing 2.6.6 with current Linus -bk. 
> > If we compare 2.6.6-bk with 2.6.6-mm1 and see a slowdown on SMP and no > slowdown on UP then yup, it might be due to additional locking. > > But we should evaluate the hash changes separately. > > Summary: > > 2.6.6-rc3: baseline > 2.6.6: dentry size+alignment changes > 2.6.6-bk: dentry size+alignment changes, hash changes > 2.6.6-mm1: dentry size+alignment changes, hash changes, lots of other stuff. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 10:33 ` Raghavan @ 2004-05-14 10:50 ` Paul Jackson 2004-05-14 11:04 ` Jens Axboe 2004-05-14 11:18 ` Dipankar Sarma 1 sibling, 1 reply; 62+ messages in thread From: Paul Jackson @ 2004-05-14 10:50 UTC (permalink / raw) To: raghav; +Cc: akpm, maneesh, dipankar, torvalds, manfred, davej, wli, linux-kernel > Looks like the new hashing function has brought down the performance. "brought down the performance" as in "better, faster", right? That is, your numbers show that the new hashing function is good, right? Or do I have it backwards? -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 10:50 ` Paul Jackson @ 2004-05-14 11:04 ` Jens Axboe 2004-05-14 11:14 ` Paul Jackson 0 siblings, 1 reply; 62+ messages in thread From: Jens Axboe @ 2004-05-14 11:04 UTC (permalink / raw) To: Paul Jackson Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli, linux-kernel On Fri, May 14 2004, Paul Jackson wrote: > > Looks like the new hashing function has brought down the performance. > > "brought down the performance" as in "better, faster", right? > > That is, your numbers show that the new hashing function is good, right? > > Or do I have it backwards? The numbers were ms/iteration, so I guess you do. -- Jens Axboe ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 11:04 ` Jens Axboe @ 2004-05-14 11:14 ` Paul Jackson 2004-05-14 11:24 ` Dipankar Sarma 2004-05-14 11:24 ` Jens Axboe 0 siblings, 2 replies; 62+ messages in thread From: Paul Jackson @ 2004-05-14 11:14 UTC (permalink / raw) To: Jens Axboe Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli, linux-kernel > so I guess you do. Sorry - I'm being thick. Is the new hashing good or bad? (Usually, performance is thought of as something 'good', so when you say it is 'brought down', that sounds 'bad', but since it's ms/iteration, I'm guessing that you mean to say that the ms/iteration is lower, which would I guess improves performance, so I'm guessing that bringing performance down is 'good' in this case, which is not idiomatic to the particular version of English I happen to speak ... So please favor this poor old brain of mine and state outright whether the new hash is good or bad. Does the new hash makes performance better or worse?) -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 11:14 ` Paul Jackson @ 2004-05-14 11:24 ` Dipankar Sarma 2004-05-14 11:24 ` Jens Axboe 1 sibling, 0 replies; 62+ messages in thread From: Dipankar Sarma @ 2004-05-14 11:24 UTC (permalink / raw) To: Paul Jackson Cc: Jens Axboe, raghav, akpm, maneesh, torvalds, manfred, davej, wli, linux-kernel On Fri, May 14, 2004 at 04:14:33AM -0700, Paul Jackson wrote: > > so I guess you do. > > Sorry - I'm being thick. > > Is the new hashing good or bad? That I don't know from the contradicting descriptions, but... > (Usually, performance is thought of as something 'good', so when you say > it is 'brought down', that sounds 'bad', but since it's ms/iteration, > I'm guessing that you mean to say that the ms/iteration is lower, which > would I guess improves performance, so I'm guessing that bringing > performance down is 'good' in this case, which is not idiomatic to the > particular version of English I happen to speak ... So please favor In this case, the performance is inversely proportional to the benchmark metric. Lower benchmark metric means better performance. Thanks Dipankar ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 11:14 ` Paul Jackson 2004-05-14 11:24 ` Dipankar Sarma @ 2004-05-14 11:24 ` Jens Axboe 2004-05-14 11:30 ` Paul Jackson 1 sibling, 1 reply; 62+ messages in thread From: Jens Axboe @ 2004-05-14 11:24 UTC (permalink / raw) To: Paul Jackson Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli, linux-kernel On Fri, May 14 2004, Paul Jackson wrote: > > so I guess you do. > > Sorry - I'm being thick. > > Is the new hashing good or bad? > > (Usually, performance is thought of as something 'good', so when you say > it is 'brought down', that sounds 'bad', but since it's ms/iteration, > I'm guessing that you mean to say that the ms/iteration is lower, which > would I guess improves performance, so I'm guessing that bringing > performance down is 'good' in this case, which is not idiomatic to the > particular version of English I happen to speak ... So please favor > this poor old brain of mine and state outright whether the new hash is > good or bad. Does the new hash makes performance better or worse?) :-) I can only say the way I read the numbers, the new hashing scores higher ms/iteration which is a bad thing. So when it is stated that 'performance is brought down' I completely agree that it describes the situation, performance is worse than before. First table shows 2.6.6 (with old hash) doing better than 2.6.6-BK with new hash. It then shows 2.6.6-Bk with old hash doing worse than 2.6.6 still, so it's not just the hash that has slowed things down. 2.6.6-new_hash does worse than 2.6.6-stock. -- Jens Axboe ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 11:24 ` Jens Axboe @ 2004-05-14 11:30 ` Paul Jackson 0 siblings, 0 replies; 62+ messages in thread From: Paul Jackson @ 2004-05-14 11:30 UTC (permalink / raw) To: Jens Axboe Cc: raghav, akpm, maneesh, dipankar, torvalds, manfred, davej, wli, linux-kernel > I can only say the way I read the numbers, Ah - it wasn't your reading of the numbers that I had in doubt. It was my ability to read that I doubted. > ... performance is worse than before. > ... the hash that has slowed things down Ok - that's clear. Thank-you for your patience. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj@sgi.com> 1.650.933.1373 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 10:33 ` Raghavan 2004-05-14 10:50 ` Paul Jackson @ 2004-05-14 11:18 ` Dipankar Sarma 2004-05-14 14:44 ` Linus Torvalds 1 sibling, 1 reply; 62+ messages in thread From: Dipankar Sarma @ 2004-05-14 11:18 UTC (permalink / raw) To: Raghavan Cc: Andrew Morton, maneesh, torvalds, manfred, davej, wli, linux-kernel On Fri, May 14, 2004 at 04:03:23PM +0530, Raghavan wrote: > > Environment - 2-way P4 Xeon 2.4MHz SMP box with 4.5GB RAM. > Tests were run for 10 iterations to calculate the milliseconds/iteration > and then mean and deviation were calculated. I think it is microseconds/iteration. Lesser the better. > Kernel version Mean Standard Deviation > --------------- ---- ------------------ > > 2.6.6-rc3(baseline) 10578 221 > > 2.6.6 10280 110 So alignment changes helped. > > 2.6.6-bk 10862 30 Hash function changes regressed. > 2.6.6-mm1 10626 36 dentry size change patchset helps. > To find out if the huge performance dip between the 2.6.6 > and 2.6.6-bk is because of the hash changes, I removed the hash patch > from 2.6.6-bk and applied it to 2.6.6. > > 2.6.6-bk with old hash 10685 34 > > 2.6.6 with new hash 10496 125 > > Looks like the new hashing function has brought down the performance. > Also some code outside dcache.c and inode.c seems to have pushed down > the performance in 2.6.6-bk. OK, I am confused. These numbers show that the new hash function is better. It contradicts your conclusion. And why are you comparing 2.6.6-bk+old has with 2.6.6+new hash ? Why not 2.6.6-bk vs. 2.6.6-bk-with-old-hash ? Thanks Dipankar ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: dentry bloat. 2004-05-14 11:18 ` Dipankar Sarma @ 2004-05-14 14:44 ` Linus Torvalds 0 siblings, 0 replies; 62+ messages in thread From: Linus Torvalds @ 2004-05-14 14:44 UTC (permalink / raw) To: Dipankar Sarma Cc: Raghavan, Andrew Morton, maneesh, manfred, davej, wli, linux-kernel On Fri, 14 May 2004, Dipankar Sarma wrote: > > > > 2.6.6 10280 110 > > > > 2.6.6-bk 10862 30 > > > To find out if the huge performance dip between the 2.6.6 > > and 2.6.6-bk is because of the hash changes, I removed the hash patch > > from 2.6.6-bk and applied it to 2.6.6. > > > > 2.6.6-bk with old hash 10685 34 > > > > 2.6.6 with new hash 10496 125 > > > > Looks like the new hashing function has brought down the performance. > > Also some code outside dcache.c and inode.c seems to have pushed down > > the performance in 2.6.6-bk. > > OK, I am confused. These numbers show that the new hash function > is better. No, look again. old hash new hash 2.6.6: 10280 10496 2.6.6-bk: 10685 10862 in both cases, the new hash makes each iteration about ~200us (or whatever the metric is) slower. There's something _else_ going on too, since plain 2.6.6 is so clearly better than the others. I don't know why - the only thing -bk does is the hash change and some changes to GFP_NOFS behaviour (different return value from shrink_dentries or whatever). Which shouldn't even trigger, I'd have assumed this is all cached. NOTE! Just "simple" things like variations in I$ layout of the kernel code can make a pretty big difference if you're unlucky. And the new dentry code doesn't align the things on a D$ cache line boundary, so there could easily be "consistent" differences from that - just from the order of dentries allocated etc. But it's interesting to note that the hash does make a difference. The old hash was very much optimized for simplicity (those hash-calculation routines are some of the hottest in the kernel). But I don't see that a few extra instructions should make that big of a difference. 
Linus ^ permalink raw reply [flat|nested] 62+ messages in thread
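[Editor's sketch] The "hash-calculation routines" Linus calls some of the hottest in the kernel are, as far as I recall from the 2.6 headers, shaped like this — a couple of shifts, an add and a multiply per byte (constants reproduced from memory and best treated as illustrative, not authoritative):

```c
#include <assert.h>

/* Per-character mixing step, in the style of the old 2.6
 * partial_name_hash() from include/linux/dcache.h.  The exact
 * constants here are from memory. */
static unsigned long partial_name_hash(unsigned long c, unsigned long prevhash)
{
    return (prevhash + (c << 4) + (c >> 4)) * 11;
}

/* Hash a full component name, byte by byte. */
static unsigned int full_name_hash(const unsigned char *name, unsigned int len)
{
    unsigned long hash = 0;

    while (len--)
        hash = partial_name_hash(*name++, hash);
    return (unsigned int)hash;
}
```

Because this runs once per byte of every path component on every lookup, even "a few extra instructions" in a replacement hash are multiplied by one of the highest call frequencies in the kernel.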
* Re: dentry bloat. 2004-05-08 19:01 ` Andrew Morton 2004-05-08 19:13 ` Linus Torvalds @ 2004-05-08 21:00 ` Dipankar Sarma 1 sibling, 0 replies; 62+ messages in thread From: Dipankar Sarma @ 2004-05-08 21:00 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, manfred, davej, wli, linux-kernel, Maneesh Soni On Sat, May 08, 2004 at 12:01:48PM -0700, Andrew Morton wrote: > Linus Torvalds <torvalds@osdl.org> wrote: > > Also, in your previous patch (which I'm not as convinced might be wrong), > > the d_qstr pointer removal makes me worry: > > > > - struct qstr * d_qstr; /* quick str ptr used in lockless lookup and concurrent d_move */ > > > > I thought the point of d_qstr was that when we do the lockless lookup, > > we're guaranteed to always see "stable storage" in the sense that when we > > follow the d_qstr, we will always get a "char *" + "len" that match, and > > we could never see a partial update (ie len points to the old one, and > > "char *" points to the new one). > > It looks that way. Yes, that is exactly why d_qstr was introduced. The "len" and the storage for the name are then a single update through d_qstr. > > > In particular, think about the "d_compare(parent, qstr, name)" / > "memcmp(qstr->name, str, len)" part - what if "len" doesn't match str, > > because a concurrent d_move() is updating them, and maybe we will compare > > past the end of kernel mapped memory or something? > > > > (In other words, the "move_count" check should protect us from returning a > > wrong dentry, but I'd worry that we'd do something that could cause > > serious problems before we even get to the "move_count" check). > > > > Hmm? Yes, that is indeed why we had to have d_qstr. > I think we can simply take ->d_lock a bit earlier in __d_lookup. That will > serialise against d_move(), fixing the problem which you mention, and also > makes d_movecount go away. Repeating some of the tests under ->d_lock is worth looking at, but we have to be careful about performance. 
ISTR there was another issue related to calling ->d_compare() under ->d_lock. I will dig a little bit on this, or Maneesh may remember. Thanks Dipankar
Thread overview: 62+ messages
[not found] <20040506200027.GC26679@redhat.com>
[not found] ` <20040506150944.126bb409.akpm@osdl.org>
[not found] ` <409B1511.6010500@colorfullife.com>
2004-05-08 8:23 ` dentry bloat Andrew Morton
2004-05-08 9:23 ` Andrew Morton
2004-05-08 10:11 ` Andrew Morton
2004-05-08 10:12 ` Andrew Morton
2004-05-08 10:28 ` viro
2004-05-08 10:41 ` Andrew Morton
2004-05-08 10:52 ` Andrew Morton
2004-05-08 10:31 ` Manfred Spraul
2004-05-08 17:28 ` Linus Torvalds
2004-05-08 18:19 ` David S. Miller
2004-05-08 19:01 ` Andrew Morton
2004-05-08 19:13 ` Linus Torvalds
2004-05-08 19:27 ` Andrew Morton
2004-05-08 19:27 ` Linus Torvalds
2004-05-08 20:42 ` Dipankar Sarma
2004-05-08 20:55 ` Andrew Morton
2004-05-08 21:19 ` Dipankar Sarma
2004-05-09 0:10 ` Andrew Morton
2004-05-09 2:55 ` Linus Torvalds
2004-05-09 3:12 ` David S. Miller
2004-05-09 3:53 ` Linus Torvalds
2004-05-09 21:03 ` Matt Mackall
2004-05-10 8:27 ` Helge Hafting
2004-05-10 8:32 ` Arjan van de Ven
2004-05-10 9:46 ` Andrew Morton
2004-05-10 14:54 ` Matt Mackall
2004-05-10 16:26 ` Paul E. McKenney
2004-05-10 18:34 ` Dipankar Sarma
2004-05-09 4:12 ` Andrew Morton
2004-05-09 4:25 ` Linus Torvalds
2004-05-09 4:36 ` Andrew Morton
2004-05-09 5:14 ` Linus Torvalds
2004-05-09 7:36 ` Rodolfo Guluarte Hale
2004-05-09 9:10 ` Guennadi Liakhovetski
2004-05-09 9:23 ` viro
2004-05-09 15:35 ` Linus Torvalds
2004-05-09 18:11 ` Matt Mackall
2004-05-09 22:08 ` Francois Romieu
2004-05-09 23:51 ` Paul Jackson
2004-05-10 7:17 ` Florian Weimer
2004-05-10 14:12 ` Rik van Riel
2004-05-09 4:43 ` Linus Torvalds
2004-05-09 7:28 ` Manfred Spraul
2004-05-09 15:33 ` Dipankar Sarma
2004-05-09 22:17 ` viro
2004-05-09 22:27 ` Andrew Morton
2004-05-11 5:26 ` Maneesh Soni
2004-05-10 18:39 ` Dipankar Sarma
2004-05-11 5:17 ` Maneesh Soni
2004-05-08 20:13 ` Dipankar Sarma
2004-10-06 12:58 ` Maneesh Soni
2004-05-11 20:22 ` Andrew Morton
2004-05-14 10:33 ` Raghavan
2004-05-14 10:50 ` Paul Jackson
2004-05-14 11:04 ` Jens Axboe
2004-05-14 11:14 ` Paul Jackson
2004-05-14 11:24 ` Dipankar Sarma
2004-05-14 11:24 ` Jens Axboe
2004-05-14 11:30 ` Paul Jackson
2004-05-14 11:18 ` Dipankar Sarma
2004-05-14 14:44 ` Linus Torvalds
2004-05-08 21:00 ` Dipankar Sarma