* [patch][rfc] fs: shrink struct dentry
@ 2008-12-01 8:33 Nick Piggin
2008-12-01 11:09 ` Andi Kleen
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Nick Piggin @ 2008-12-01 8:33 UTC (permalink / raw)
To: linux-fsdevel, Linux Memory Management List, robert.richter,
oprofile-list
Hi,
Comments?
Thanks,
Nick
--
struct dentry is one of the most critical structures in the kernel. So it's
sad to see it going neglected.
With CONFIG_PROFILING turned on (which is probably the common case at least
for distros and kernel developers), sizeof(struct dcache) == 208 here
(64-bit). This gives 19 objects per slab.
I packed d_mounted into a hole, and took another 4 bytes off the inline
name length to take the padding out from the end of the structure. This
shinks it to 200 bytes. I could have gone the other way and increased the
length to 40, but I'm aiming for a magic number, read on...
I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
why was this ever a good idea? The cookie system should increase its hash
size or use a tree or something if lookups are a problem. Also the "fast
dcookie lookups" in oprofile should be moved into the dcookie code -- how
can oprofile possibly care about the dcookie_mutex? It gets dropped after
get_dcookie() returns so it can't be providing any sort of protection.
At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system
with ~140 000 entries allocated. 192 is also a multiple of 64, so we get
nice cacheline alignment on 64 and 32 byte line systems -- any given dentry
will now require 3 cachelines to touch all fields wheras previously it
would require 4.
I know the inline name size was chosen quite carefully, however with the
reduction in cacheline footprint, it should actually be just about as fast
to do a name lookup for a 36 character name as it was before the patch (and
faster for other sizes). The memory footprint savings for names which are
<= 32 or > 36 bytes long should more than make up for the memory cost for
33-36 byte names.
Performance is a feature...
---
arch/powerpc/oprofile/cell/spu_task_sync.c | 2 +-
drivers/oprofile/buffer_sync.c | 2 +-
fs/dcache.c | 4 ----
fs/dcookies.c | 28 +++++++++++++++++++---------
include/linux/dcache.h | 21 ++++++++++++++-------
5 files changed, 35 insertions(+), 22 deletions(-)
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -75,14 +75,22 @@ full_name_hash(const unsigned char *name
return end_name_hash(hash);
}
-struct dcookie_struct;
-
-#define DNAME_INLINE_LEN_MIN 36
+/*
+ * Try to keep struct dentry aligned on 64 byte cachelines (this will
+ * give reasonable cacheline footprint with larger lines without the
+ * large memory footprint increase).
+ */
+#ifdef CONFIG_64BIT
+#define DNAME_INLINE_LEN_MIN 32 /* 192 bytes */
+#else
+#define DNAME_INLINE_LEN_MIN 40 /* 128 bytes */
+#endif
struct dentry {
atomic_t d_count;
unsigned int d_flags; /* protected by d_lock */
spinlock_t d_lock; /* per dentry lock */
+ int d_mounted;
struct inode *d_inode; /* Where the name belongs to - NULL is
* negative */
/*
@@ -107,10 +115,7 @@ struct dentry {
struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
void *d_fsdata; /* fs-specific data */
-#ifdef CONFIG_PROFILING
- struct dcookie_struct *d_cookie; /* cookie, if any */
-#endif
- int d_mounted;
+
unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
};
@@ -177,6 +182,8 @@ d_iput: no no no yes
#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020 /* Parent inode is watched */
+#define DCACHE_COOKIE 0x0040 /* For use by dcookie subsystem */
+
extern spinlock_t dcache_lock;
extern seqlock_t rename_lock;
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -34,7 +34,6 @@
#include <linux/bootmem.h>
#include "internal.h"
-
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
@@ -948,9 +947,6 @@ struct dentry *d_alloc(struct dentry * p
dentry->d_op = NULL;
dentry->d_fsdata = NULL;
dentry->d_mounted = 0;
-#ifdef CONFIG_PROFILING
- dentry->d_cookie = NULL;
-#endif
INIT_HLIST_NODE(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
INIT_LIST_HEAD(&dentry->d_subdirs);
Index: linux-2.6/fs/dcookies.c
===================================================================
--- linux-2.6.orig/fs/dcookies.c
+++ linux-2.6/fs/dcookies.c
@@ -93,10 +93,15 @@ static struct dcookie_struct *alloc_dcoo
{
struct dcookie_struct *dcs = kmem_cache_alloc(dcookie_cache,
GFP_KERNEL);
+ struct dentry *d;
if (!dcs)
return NULL;
- path->dentry->d_cookie = dcs;
+ d = path->dentry;
+ spin_lock(&d->d_lock);
+ d->d_flags |= DCACHE_COOKIE;
+ spin_unlock(&d->d_lock);
+
dcs->path = *path;
path_get(path);
hash_dcookie(dcs);
@@ -119,14 +124,14 @@ int get_dcookie(struct path *path, unsig
goto out;
}
- dcs = path->dentry->d_cookie;
-
- if (!dcs)
+ if (path->dentry->d_flags & DCACHE_COOKIE) {
+ dcs = find_dcookie((unsigned long)path->dentry);
+ } else {
dcs = alloc_dcookie(path);
-
- if (!dcs) {
- err = -ENOMEM;
- goto out;
+ if (!dcs) {
+ err = -ENOMEM;
+ goto out;
+ }
}
*cookie = dcookie_value(dcs);
@@ -251,7 +256,12 @@ out_kmem:
static void free_dcookie(struct dcookie_struct * dcs)
{
- dcs->path.dentry->d_cookie = NULL;
+ struct dentry *d = dcs->path.dentry;
+
+ spin_lock(&d->d_lock);
+ d->d_flags &= ~DCACHE_COOKIE;
+ spin_unlock(&d->d_lock);
+
path_put(&dcs->path);
kmem_cache_free(dcookie_cache, dcs);
}
Index: linux-2.6/drivers/oprofile/buffer_sync.c
===================================================================
--- linux-2.6.orig/drivers/oprofile/buffer_sync.c
+++ linux-2.6/drivers/oprofile/buffer_sync.c
@@ -200,7 +200,7 @@ static inline unsigned long fast_get_dco
{
unsigned long cookie;
- if (path->dentry->d_cookie)
+ if (path->dentry->d_flags & DCACHE_COOKIE)
return (unsigned long)path->dentry;
get_dcookie(path, &cookie);
return cookie;
Index: linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c
===================================================================
--- linux-2.6.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -297,7 +297,7 @@ static inline unsigned long fast_get_dco
{
unsigned long cookie;
- if (path->dentry->d_cookie)
+ if (path->dentry->d_flags & DCACHE_COOKIE)
return (unsigned long)path->dentry;
get_dcookie(path, &cookie);
return cookie;
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [patch][rfc] fs: shrink struct dentry
2008-12-01 8:33 [patch][rfc] fs: shrink struct dentry Nick Piggin
@ 2008-12-01 11:09 ` Andi Kleen
2008-12-01 11:26 ` Nick Piggin
2008-12-01 17:51 ` John Levon
2008-12-10 6:00 ` [patch] " Nick Piggin
2 siblings, 1 reply; 14+ messages in thread
From: Andi Kleen @ 2008-12-01 11:09 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-fsdevel, Linux Memory Management List, oprofile-list
Nick Piggin <npiggin@suse.de> writes:
> Hi,
> Comments?
> Thanks,
> Nick
>
> --
> struct dentry is one of the most critical structures in the kernel. So it's
> sad to see it going neglected.
Very nice. But the sad thing is that such optimizations tend to quickly
bit rot again. At least add big fat comments.
How does it look like on 32bit hosts?
Since the size is variable depending on word size it might be a
good idea to auto adjust inline name length to always give a nice
end result for slab.
Also I think with some effort it would be possible to shrink it more.
But since you already reached cache lines, it would just allow
to increase inline name length. Ok perhaps it would help more on 32bit.
Further possibilities to shrink:
- Eliminate name.length. It seems of dubious utility
(in general I'm not sure struct qstr is all that useful)
- Change some of the children/alias list_heads to hlist_heads. I don't
think these lists typically need O(1) access to the end.
- If the maximum mount nest was limited d_mounted could migrate
into d_flags (that would be probably desparate)
-Andi
--
ak@linux.intel.com
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-01 11:09 ` Andi Kleen
@ 2008-12-01 11:26 ` Nick Piggin
0 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2008-12-01 11:26 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-fsdevel, Linux Memory Management List, oprofile-list
On Mon, Dec 01, 2008 at 12:09:12PM +0100, Andi Kleen wrote:
> Nick Piggin <npiggin@suse.de> writes:
>
> > Hi,
> > Comments?
> > Thanks,
> > Nick
> >
> > --
> > struct dentry is one of the most critical structures in the kernel. So it's
> > sad to see it going neglected.
>
> Very nice. But the sad thing is that such optimizations tend to quickly
> bit rot again. At least add big fat comments.
I was tempted to add a "Don't add anything to struct dentry" comment :)
> How does it look like on 32bit hosts?
Actually 32bit does not gain anything from packing d_mounted, but it
does benefit from removing d_cookie, so in that case I just added
another 4 bytes to the inline name length in order to keep it at 128
bytes.
Removing the CONFIG_PROFILING difference is quite nice because the
last thing you want is to try to profile something in the dcache and
find that cache access characteristics change when you enable
oprofile :P
> Since the size is variable depending on word size it might be a
> good idea to auto adjust inline name length to always give a nice
> end result for slab.
Yeah, well I didn't auto adjust, but I took care to try to make it
good values on each platform.
> Also I think with some effort it would be possible to shrink it more.
> But since you already reached cache lines, it would just allow
> to increase inline name length. Ok perhaps it would help more on 32bit.
>
> Further possibilities to shrink:
> - Eliminate name.length. It seems of dubious utility
> (in general I'm not sure struct qstr is all that useful)
> - Change some of the children/alias list_heads to hlist_heads. I don't
> think these lists typically need O(1) access to the end.
> - If the maximum mount nest was limited d_mounted could migrate
> into d_flags (that would be probably desparate)
Yeah, well it would always be nice to add more bytes to the name
length... CONFIG_TINY would probably not care about reaching cache
line sizes either, so small systems I'm sure would like to see
savings.
Thanks,
Nick
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-01 8:33 [patch][rfc] fs: shrink struct dentry Nick Piggin
2008-12-01 11:09 ` Andi Kleen
@ 2008-12-01 17:51 ` John Levon
2008-12-01 18:04 ` Nick Piggin
2008-12-10 6:00 ` [patch] " Nick Piggin
2 siblings, 1 reply; 14+ messages in thread
From: John Levon @ 2008-12-01 17:51 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-fsdevel, Linux Memory Management List, oprofile-list
On Mon, Dec 01, 2008 at 09:33:43AM +0100, Nick Piggin wrote:
> I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
> why was this ever a good idea? The cookie system should increase its hash
> size or use a tree or something if lookups are a problem.
Are you saying you've made this change without even testing its
performance impact?
john
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-01 17:51 ` John Levon
@ 2008-12-01 18:04 ` Nick Piggin
2008-12-01 19:38 ` John Levon
0 siblings, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2008-12-01 18:04 UTC (permalink / raw)
To: John Levon
Cc: linux-fsdevel, Linux Memory Management List, robert.richter,
oprofile-list
On Mon, Dec 01, 2008 at 05:51:13PM +0000, John Levon wrote:
> On Mon, Dec 01, 2008 at 09:33:43AM +0100, Nick Piggin wrote:
>
> > I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
> > why was this ever a good idea? The cookie system should increase its hash
> > size or use a tree or something if lookups are a problem.
>
> Are you saying you've made this change without even testing its
> performance impact?
For oprofile case (maybe if you are profiling hundreds of vmas and
overflow the 4096 byte hash table), no. That case is uncommon and
must be fixed in the dcookie code (as I said, trivial with changing
data structure). I don't want this pointer in struct dentry
regardless of a possible tiny benefit for oprofile.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-01 18:04 ` Nick Piggin
@ 2008-12-01 19:38 ` John Levon
2008-12-02 7:06 ` Nick Piggin
0 siblings, 1 reply; 14+ messages in thread
From: John Levon @ 2008-12-01 19:38 UTC (permalink / raw)
To: Nick Piggin
Cc: linux-fsdevel, Linux Memory Management List, robert.richter,
oprofile-list
On Mon, Dec 01, 2008 at 07:04:55PM +0100, Nick Piggin wrote:
> On Mon, Dec 01, 2008 at 05:51:13PM +0000, John Levon wrote:
> > On Mon, Dec 01, 2008 at 09:33:43AM +0100, Nick Piggin wrote:
> >
> > > I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
> > > why was this ever a good idea? The cookie system should increase its hash
> > > size or use a tree or something if lookups are a problem.
> >
> > Are you saying you've made this change without even testing its
> > performance impact?
>
> For oprofile case (maybe if you are profiling hundreds of vmas and
> overflow the 4096 byte hash table), no. That case is uncommon and
> must be fixed in the dcookie code (as I said, trivial with changing
> data structure). I don't want this pointer in struct dentry
> regardless of a possible tiny benefit for oprofile.
Don't you even have a differential profile showing the impact of
removing d_cookie? This hash table lookup will now happen on *every*
userspace sample that's processed. That's, uh, a lot.
(By all means make your change, but I don't get how it's OK to regress
other code, and provide no evidence at all as to its impact.)
john
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-01 19:38 ` John Levon
@ 2008-12-02 7:06 ` Nick Piggin
2008-12-02 13:04 ` John Levon
0 siblings, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2008-12-02 7:06 UTC (permalink / raw)
To: John Levon
Cc: linux-fsdevel, Linux Memory Management List, robert.richter,
oprofile-list
On Mon, Dec 01, 2008 at 07:38:18PM +0000, John Levon wrote:
> On Mon, Dec 01, 2008 at 07:04:55PM +0100, Nick Piggin wrote:
>
> > On Mon, Dec 01, 2008 at 05:51:13PM +0000, John Levon wrote:
> > > On Mon, Dec 01, 2008 at 09:33:43AM +0100, Nick Piggin wrote:
> > >
> > > > I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
> > > > why was this ever a good idea? The cookie system should increase its hash
> > > > size or use a tree or something if lookups are a problem.
> > >
> > > Are you saying you've made this change without even testing its
> > > performance impact?
> >
> > For oprofile case (maybe if you are profiling hundreds of vmas and
> > overflow the 4096 byte hash table), no. That case is uncommon and
> > must be fixed in the dcookie code (as I said, trivial with changing
> > data structure). I don't want this pointer in struct dentry
> > regardless of a possible tiny benefit for oprofile.
>
> Don't you even have a differential profile showing the impact of
> removing d_cookie? This hash table lookup will now happen on *every*
> userspace sample that's processed. That's, uh, a lot.
I don't know what you mean by every sample that's processed, but
won't the hash lookup only happen for the *first* time that a given
name is asked for a dcookie (ie. fast_get_dcookie, which, as I said,
should actually be moved to fs/dcookies.c).
If get_dcookie is called "a lot" of times, then this profiling code
is broken anyway. There is a global mutex in that function. It's bad
enough that it takes mmap_sem and does find_vma...
> (By all means make your change, but I don't get how it's OK to regress
> other code, and provide no evidence at all as to its impact.)
Tradeoffs are made all the time. This is obviously a good one, and
I provided evidence of the impact of the improvement in the common
case. I also acknowledge it can slow down the uncommon case, but
showed ways that can easily be improved. Do you want me to just try
to make an artificial case where I mmap thousands of tiny shared
libraries and try to overflow the hash and try to detect a difference?
Did you add d_cookie? If so, then surely at the time you must have
justified that with some numbers to show a significant improvement
to outweigh the clear downsides. Care to share? Then I might be able
to just reuse your test case.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-02 7:06 ` Nick Piggin
@ 2008-12-02 13:04 ` John Levon
2008-12-02 13:49 ` Nick Piggin
0 siblings, 1 reply; 14+ messages in thread
From: John Levon @ 2008-12-02 13:04 UTC (permalink / raw)
To: Nick Piggin
Cc: linux-fsdevel, Linux Memory Management List, robert.richter,
oprofile-list
On Tue, Dec 02, 2008 at 08:06:08AM +0100, Nick Piggin wrote:
> > Don't you even have a differential profile showing the impact of
> > removing d_cookie? This hash table lookup will now happen on *every*
> > userspace sample that's processed. That's, uh, a lot.
>
> I don't know what you mean by every sample that's processed, but
> won't the hash lookup only happen for the *first* time that a given
> name is asked for a dcookie (ie. fast_get_dcookie, which, as I said,
> should actually be moved to fs/dcookies.c)
I mis-read your changes.
> > (By all means make your change, but I don't get how it's OK to regress
> > other code, and provide no evidence at all as to its impact.)
>
> Tradeoffs are made all the time. This is obviously a good one, and
^^^^^^^^^^^^^^^^^^^^
By all means make your change, but I don't get how it's OK to regress
other code, and provide no evidence at all as to its impact.
> I provided evidence of the impact of the improvement in the common
> case. I also acknowledge it can slow down the uncommon case, but
> showed ways that can easily be improved. Do you want me to just try
> to make an artificial case where I mmap thousands of tiny shared
> libraries and try to overflow the hash and try to detect a difference?
You haven't even bothered to show that it hasn't affected normal
oprofile use yet.
I can't believe I'm having to argue that you need to test your code. So
I think I'll stop.
> Did you add d_cookie? If so, then surely at the time you must have
It was added along with the rest of oprofile, so I don't have breakout
numbers. I did have oprofile overhead numbers, though I doubt I could
find them now.
john
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [patch][rfc] fs: shrink struct dentry
2008-12-02 13:04 ` John Levon
@ 2008-12-02 13:49 ` Nick Piggin
2008-12-02 14:49 ` John Levon
0 siblings, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2008-12-02 13:49 UTC (permalink / raw)
To: John Levon
Cc: linux-fsdevel, Linux Memory Management List, robert.richter,
oprofile-list
On Tue, Dec 02, 2008 at 01:04:10PM +0000, John Levon wrote:
> On Tue, Dec 02, 2008 at 08:06:08AM +0100, Nick Piggin wrote:
>
> > > Don't you even have a differential profile showing the impact of
> > > removing d_cookie? This hash table lookup will now happen on *every*
> > > userspace sample that's processed. That's, uh, a lot.
> >
> > I don't know what you mean by every sample that's processed, but
> > won't the hash lookup only happen for the *first* time that a given
> > name is asked for a dcookie (ie. fast_get_dcookie, which, as I said,
> > should actually be moved to fs/dcookies.c)
>
> I mis-read your changes.
>
> > > (By all means make your change, but I don't get how it's OK to regress
> > > other code, and provide no evidence at all as to its impact.)
> >
> > Tradeoffs are made all the time. This is obviously a good one, and
> ^^^^^^^^^^^^^^^^^^^^
>
> By all means make your change, but I don't get how it's OK to regress
> other code, and provide no evidence at all as to its impact.
Provide me the test case you used to justify bloating struct dentry
in the first place. Then I will test and return you the numbers
after my patch.
> > I provided evidence of the impact of the improvement in the common
> > case. I also acknowledge it can slow down the uncommon case, but
> > showed ways that can easily be improved. Do you want me to just try
> > to make an artificial case where I mmap thousands of tiny shared
> > libraries and try to overflow the hash and try to detect a difference?
>
> You haven't even bothered to show that it hasn't affected normal
> oprofile use yet.
>
> I can't believe I'm having to argue that you need to test your code. So
> I think I'll stop.
Code was tested. It doesn't affect my normal oprofile usage (it's
utterly within the noise, in case that wasn't obvious to you). But
what is "normal" for oprofile? You must have some test case in mind.
Can you go back and read the original mail for god's sake? I'm not
arguing against anything of the sort. To start with, the patch is an
RFC, and I cc'ed it to you as the oprofile maintainer I thought you
might help me by saying "oh that should be fine because it is so
uncommon", or "better test this crazy type of workload that might
slowdown".
And secondly, I acknowledged the slowdown possibility in the first
mail and I provided 2 good possibilities that most slowdown should
be able to be eliminated anyway if we should find one.
> > Did you add d_cookie? If so, then surely at the time you must have
>
> It was added along with the rest of oprofile, so I don't have breakout
> numbers. I did have oprofile overhead numbers, though I doubt I could
> find them now.
No, I mean the overhead of adding d_cookie pointer to struct dentry
to get the d_cookie thing _directly_, rather than doing a data
structure lookup to get the value. So I'll repeat: you obviously must
have had some important case that showed really improved performance
from that d_cookie pointer to be able to justify the very significant
overhead. Please share it with me then I can test.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-02 13:49 ` Nick Piggin
@ 2008-12-02 14:49 ` John Levon
2008-12-02 15:11 ` Nick Piggin
0 siblings, 1 reply; 14+ messages in thread
From: John Levon @ 2008-12-02 14:49 UTC (permalink / raw)
To: Nick Piggin; +Cc: linux-fsdevel, Linux Memory Management List, oprofile-list
On Tue, Dec 02, 2008 at 02:49:26PM +0100, Nick Piggin wrote:
> > I can't believe I'm having to argue that you need to test your code. So
> > I think I'll stop.
>
> Code was tested. It doesn't affect my normal oprofile usage (it's
> utterly within the noise, in case that wasn't obvious to you).
Then, heck, why didn't you say so?! I just went and read the whole
exchange and this is the first time you actually stated you tested the
impact of your patch on oprofile overhead.
It's in the noise, so it's fine.
regards
john
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [patch][rfc] fs: shrink struct dentry
2008-12-02 14:49 ` John Levon
@ 2008-12-02 15:11 ` Nick Piggin
0 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2008-12-02 15:11 UTC (permalink / raw)
To: John Levon
Cc: linux-fsdevel, Linux Memory Management List, robert.richter,
oprofile-list
On Tue, Dec 02, 2008 at 02:49:18PM +0000, John Levon wrote:
> On Tue, Dec 02, 2008 at 02:49:26PM +0100, Nick Piggin wrote:
>
> > > I can't believe I'm having to argue that you need to test your code. So
> > > I think I'll stop.
> >
> > Code was tested. It doesn't affect my normal oprofile usage (it's
> > utterly within the noise, in case that wasn't obvious to you).
>
> Then, heck, why didn't you say so?! I just went and read the whole
> exchange and this is the first time you actually stated you tested the
> impact of your patch on oprofile overhead.
This was just in running some silly benchmark like oprofile+tbench,
but I'm fairly sure it a) probably didn't have many entries in the
cookies cache -- at least not enough to create big hash chains, and
b) wasn't hitting get_dcookie very often, and c) AFAIKS, all paths to
get_dcookie go through fast_get_dcookie so I actually can't see any
possible way that this patch could increase the number of hash lookups
anyway.
Given that, I was 99.9% sure it will be OK. But I like confirmation
from you.
I didn't do any major test where I try to force thousands of entries
into dcookie subsystem or anything, but if you care to give a test
case I can try.
mmap_sem and find_vma is much more annoying in oprofile :) Speaking of
which: is there a mode it can be set in to do kernel only profiling so
it does not bother with userspace samples at all? That would be really
nice for profiling the kernel in the kinds of workloads that hit
mmap_sem and vma cache often because oprofile interferes with that.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [patch] fs: shrink struct dentry
2008-12-01 8:33 [patch][rfc] fs: shrink struct dentry Nick Piggin
2008-12-01 11:09 ` Andi Kleen
2008-12-01 17:51 ` John Levon
@ 2008-12-10 6:00 ` Nick Piggin
2008-12-10 6:53 ` Al Viro
2 siblings, 1 reply; 14+ messages in thread
From: Nick Piggin @ 2008-12-10 6:00 UTC (permalink / raw)
To: Andrew Morton, linux-fsdevel, robert.richter, oprofile-list
OK, this is a new patch with updated changelog, and slightly tweaked
change to dcookie subsystem (now we'll *never* do any more hash lookups
on dcookie lookup, ignoring that "fast" dcookie lookups would effectively
mean my previous version wouldn't have done any more lookups anyway).
Andrew, please apply.
--
struct dentry is one of the most critical structures in the kernel.
With CONFIG_PROFILING turned on (which is probably the common case at least for
distros and kernel developers), sizeof(struct dcache) == 208 here (64-bit).
This gives 19 objects per slab.
Get rid of the d_cookie pointer. This shrinks it to 192 bytes. Why was this
ever a good idea? With the observation that dcookie == dentry pointer, there is
nothing that d_cookie provided that can't be found by other means. The only
thing it provides actually is a flag as to whether a dentry has a dcookie or
not. Use a new d_flag bit for this. Saves 4/8 bytes with CONFIG_PROFILING.
Pack d_mounted into a hole, and take another 4 bytes off the inline name length
to take the padding out from the end of the structure. This shinks it to 200
bytes. We could have gone the other way and increased the length to 40, but I'm
aiming for a magic number, read on...
At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system with
~140 000 entries allocated. 192 is also a multiple of 64, so we get nice
cacheline alignment on 64 and 32 byte line systems -- any given dentry will now
require 3 cachelines to touch all fields wheras previously it would require 4.
32-bit systems can increase the inline name length to 40, giving them a nice
size of 128 bytes.
I know the inline name size was chosen quite carefully, however with the
reduction in cacheline footprint, it should actually be just about as fast to
do a name lookup for a 36 character name as it was before the patch (and faster
for other sizes). The memory footprint savings for names which are <= 32 or >
36 bytes long should more than make up for the memory cost for 33-36 byte
names.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
arch/powerpc/oprofile/cell/spu_task_sync.c | 2 -
drivers/oprofile/buffer_sync.c | 2 -
fs/dcache.c | 5 ----
fs/dcookies.c | 31 +++++++++++++++++------------
include/linux/dcache.h | 21 +++++++++++++------
5 files changed, 36 insertions(+), 25 deletions(-)
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -75,14 +75,22 @@ full_name_hash(const unsigned char *name
return end_name_hash(hash);
}
-struct dcookie_struct;
-
-#define DNAME_INLINE_LEN_MIN 36
+/*
+ * Keep struct dentry aligned on 64 byte cachelines (this will give
+ * reasonable cacheline footprint on larger lines, without the large
+ * memory footprint increase of full cacheline alignment).
+ */
+#ifdef CONFIG_64BIT
+#define DNAME_INLINE_LEN_MIN 32 /* 192 bytes */
+#else
+#define DNAME_INLINE_LEN_MIN 40 /* 128 bytes */
+#endif
struct dentry {
atomic_t d_count;
unsigned int d_flags; /* protected by d_lock */
spinlock_t d_lock; /* per dentry lock */
+ int d_mounted;
struct inode *d_inode; /* Where the name belongs to - NULL is
* negative */
/*
@@ -107,10 +115,7 @@ struct dentry {
struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
void *d_fsdata; /* fs-specific data */
-#ifdef CONFIG_PROFILING
- struct dcookie_struct *d_cookie; /* cookie, if any */
-#endif
- int d_mounted;
+
unsigned char d_iname[DNAME_INLINE_LEN_MIN]; /* small names */
};
@@ -177,6 +182,8 @@ d_iput: no no no yes
#define DCACHE_INOTIFY_PARENT_WATCHED 0x0020 /* Parent inode is watched */
+#define DCACHE_COOKIE 0x0040 /* For use by dcookie subsystem */
+
extern spinlock_t dcache_lock;
extern seqlock_t rename_lock;
Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -38,7 +38,7 @@
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
- __cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
+__cacheline_aligned_in_smp DEFINE_SPINLOCK(dcache_lock);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
EXPORT_SYMBOL(dcache_lock);
@@ -948,9 +948,6 @@ struct dentry *d_alloc(struct dentry * p
dentry->d_op = NULL;
dentry->d_fsdata = NULL;
dentry->d_mounted = 0;
-#ifdef CONFIG_PROFILING
- dentry->d_cookie = NULL;
-#endif
INIT_HLIST_NODE(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
INIT_LIST_HEAD(&dentry->d_subdirs);
Index: linux-2.6/fs/dcookies.c
===================================================================
--- linux-2.6.orig/fs/dcookies.c
+++ linux-2.6/fs/dcookies.c
@@ -93,10 +93,15 @@ static struct dcookie_struct *alloc_dcoo
{
struct dcookie_struct *dcs = kmem_cache_alloc(dcookie_cache,
GFP_KERNEL);
+ struct dentry *d;
if (!dcs)
return NULL;
- path->dentry->d_cookie = dcs;
+ d = path->dentry;
+ spin_lock(&d->d_lock);
+ d->d_flags |= DCACHE_COOKIE;
+ spin_unlock(&d->d_lock);
+
dcs->path = *path;
path_get(path);
hash_dcookie(dcs);
@@ -110,7 +115,6 @@ static struct dcookie_struct *alloc_dcoo
int get_dcookie(struct path *path, unsigned long *cookie)
{
int err = 0;
- struct dcookie_struct * dcs;
mutex_lock(&dcookie_mutex);
@@ -119,17 +123,15 @@ int get_dcookie(struct path *path, unsig
goto out;
}
- dcs = path->dentry->d_cookie;
-
- if (!dcs)
- dcs = alloc_dcookie(path);
-
- if (!dcs) {
- err = -ENOMEM;
- goto out;
+ if (!path->dentry->d_flags & DCACHE_COOKIE) {
+ if (!alloc_dcookie(path)) {
+ err = -ENOMEM;
+ goto out;
+ }
+ BUG_ON(!(path->dentry->d_flags & DCACHE_COOKIE));
}
- *cookie = dcookie_value(dcs);
+ *cookie = (unsigned long)path->dentry;
out:
mutex_unlock(&dcookie_mutex);
@@ -251,7 +253,12 @@ out_kmem:
static void free_dcookie(struct dcookie_struct * dcs)
{
- dcs->path.dentry->d_cookie = NULL;
+ struct dentry *d = dcs->path.dentry;
+
+ spin_lock(&d->d_lock);
+ d->d_flags &= ~DCACHE_COOKIE;
+ spin_unlock(&d->d_lock);
+
path_put(&dcs->path);
kmem_cache_free(dcookie_cache, dcs);
}
Index: linux-2.6/drivers/oprofile/buffer_sync.c
===================================================================
--- linux-2.6.orig/drivers/oprofile/buffer_sync.c
+++ linux-2.6/drivers/oprofile/buffer_sync.c
@@ -200,7 +200,7 @@ static inline unsigned long fast_get_dco
{
unsigned long cookie;
- if (path->dentry->d_cookie)
+ if (path->dentry->d_flags & DCACHE_COOKIE)
return (unsigned long)path->dentry;
get_dcookie(path, &cookie);
return cookie;
Index: linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c
===================================================================
--- linux-2.6.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
+++ linux-2.6/arch/powerpc/oprofile/cell/spu_task_sync.c
@@ -297,7 +297,7 @@ static inline unsigned long fast_get_dco
{
unsigned long cookie;
- if (path->dentry->d_cookie)
+ if (path->dentry->d_flags & DCACHE_COOKIE)
return (unsigned long)path->dentry;
get_dcookie(path, &cookie);
return cookie;
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [patch] fs: shrink struct dentry
2008-12-10 6:00 ` [patch] " Nick Piggin
@ 2008-12-10 6:53 ` Al Viro
2008-12-10 7:19 ` Nick Piggin
0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2008-12-10 6:53 UTC (permalink / raw)
To: Nick Piggin; +Cc: Andrew Morton, linux-fsdevel, robert.richter, oprofile-list
On Wed, Dec 10, 2008 at 07:00:48AM +0100, Nick Piggin wrote:
> OK, this is a new patch with updated changelog, and slightly tweaked
> change to dcookie subsystem (now we'll *never* do any more hash lookups
> on dcookie lookup, ignoring that "fast" dcookie lookups would effectively
> mean my previous version wouldn't have done any more lookups anyway).
I agree with reordering part, and I don't have too serious objections
against the rest, but...
Just what the bleeding fsck is that stuff doing to paths?
a) get two instances of fs mounted somewhere (or just have a binding, or
have two namespaces) and you'll get very funny results from dcookie -
it'll stick with vfsmount it had run into originally. Nevermind that
it might be not reachable for you. Or reachable for anyone, for that
matter, since the namespace it had been in might be long gone by now.
b) when is it evicted, anyway? We are stuck with a bunch of randomly
selected pinned dentries and vfsmounts...
c) may I have whatever dcookie_user inventor had been smoking?
struct dcookie_user {
struct list_head next;
};
with a cyclic list going through those and the only use of that list
is "is it empty?" checks. That's one hell of a way to implement a
counter, of course, but...
Anyway, I'll put it into VFS queue.
Al, currently one turd away from being through with audit queue for
the next merge window...
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [patch] fs: shrink struct dentry
2008-12-10 6:53 ` Al Viro
@ 2008-12-10 7:19 ` Nick Piggin
0 siblings, 0 replies; 14+ messages in thread
From: Nick Piggin @ 2008-12-10 7:19 UTC (permalink / raw)
To: Al Viro; +Cc: Andrew Morton, linux-fsdevel, robert.richter, oprofile-list
On Wed, Dec 10, 2008 at 06:53:21AM +0000, Al Viro wrote:
> On Wed, Dec 10, 2008 at 07:00:48AM +0100, Nick Piggin wrote:
> > OK, this is a new patch with updated changelog, and slightly tweaked
> > change to dcookie subsystem (now we'll *never* do any more hash lookups
> > on dcookie lookup, ignoring that "fast" dcookie lookups would effectively
> > mean my previous version wouldn't have done any more lookups anyway).
>
> I agree with reordering part, and I don't have too serious objections
> against the rest, but...
>
> Just what the bleeding fsck is that stuff doing to paths?
Honestly, I tried to look at as little dcookie / oprofile code as
possible... These are good questions for the oprofile people cced,
though.
> a) get two instances of fs mounted somewhere (or just have a binding, or
> have two namespaces) and you'll get very funny results from dcookie -
> it'll stick with vfsmount it had run into originally. Nevermind that
> it might be not reachable for you. Or reachable for anyone, for that
> matter, since the namespace it had been in might be long gone by now.
Good question. What's the high level requirement? To look up a path
name from a lighter-weight data item? Should it rather be a pcookie
subsystem? Anyway, whatever the case, it really must create and
manage its own data structures to perform the relevant mappings,
rather than add fields to existing structures, I think.
> b) when is it evicted, anyway? We are stuck with a bunch of randomly
> selected pinned dentries and vfsmounts...
When oprofile daemon stops sampling, AFAIKT. Yes there will be
randomly pinned executable files until then.
> c) may I have whatever dcookie_user inventor had been smoking?
> struct dcookie_user {
> struct list_head next;
> };
> with a cyclic list going through those and the only use of that list
> is "is it empty?" checks. That's one hell of a way to implement a
> counter, of course, but...
Heh. Maybe they had some grand plans for it.
> Anyway, I'll put it into VFS queue.
Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2008-12-10 7:20 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-01 8:33 [patch][rfc] fs: shrink struct dentry Nick Piggin
2008-12-01 11:09 ` Andi Kleen
2008-12-01 11:26 ` Nick Piggin
2008-12-01 17:51 ` John Levon
2008-12-01 18:04 ` Nick Piggin
2008-12-01 19:38 ` John Levon
2008-12-02 7:06 ` Nick Piggin
2008-12-02 13:04 ` John Levon
2008-12-02 13:49 ` Nick Piggin
2008-12-02 14:49 ` John Levon
2008-12-02 15:11 ` Nick Piggin
2008-12-10 6:00 ` [patch] " Nick Piggin
2008-12-10 6:53 ` Al Viro
2008-12-10 7:19 ` Nick Piggin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).