* [PATCH] Properly init address_space->writeback_index
@ 2008-08-14 18:13 Chris Mason
2008-08-14 19:05 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Chris Mason @ 2008-08-14 18:13 UTC (permalink / raw)
To: linux-fsdevel, Andrew Morton
write_cache_pages uses i_mapping->writeback_index to pick up where it
left off the last time a given inode was found by pdflush or
balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
alloc_inode should set it to a sane value so that writeback doesn't start
in the middle of a file. It is somewhat difficult to notice the bug since
write_cache_pages will loop around to the start of the file and the
elevator helps hide the resulting seeks.
For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of
the linux kernel in series runs at 47MB/s on a single sata drive. With
this fix, it jumps to 62MB/s.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
diff -r 67160ae21a58 fs/inode.c
--- a/fs/inode.c Wed Aug 06 19:26:20 2008 -0700
+++ b/fs/inode.c Thu Aug 14 10:15:49 2008 -0400
@@ -166,6 +166,7 @@ static struct inode *alloc_inode(struct
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
mapping->assoc_mapping = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
+ mapping->writeback_index = 0;
/*
* If the block_device provides a backing_dev_info for client
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Properly init address_space->writeback_index
2008-08-14 18:13 [PATCH] Properly init address_space->writeback_index Chris Mason
@ 2008-08-14 19:05 ` Andrew Morton
2008-08-14 19:35 ` Chris Mason
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2008-08-14 19:05 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-fsdevel
On Thu, 14 Aug 2008 14:13:40 -0400
Chris Mason <chris.mason@oracle.com> wrote:
> write_cache_pages uses i_mapping->writeback_index to pick up where it
> left off the last time a given inode was found by pdflush or
> balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
>
> alloc_inode should set it to a sane value so that writeback doesn't start
> in the middle of a file. It is somewhat difficult to notice the bug since
> write_cache_pages will loop around to the start of the file and the
> elevator helps hide the resulting seeks.
>
> For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of
> the linux kernel in series runs at 47MB/s on a single sata drive. With
> this fix, it jumps to 62MB/s.
>
> Signed-off-by: Chris Mason <chris.mason@oracle.com>
>
> diff -r 67160ae21a58 fs/inode.c
> --- a/fs/inode.c Wed Aug 06 19:26:20 2008 -0700
> +++ b/fs/inode.c Thu Aug 14 10:15:49 2008 -0400
> @@ -166,6 +166,7 @@ static struct inode *alloc_inode(struct
> mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
> mapping->assoc_mapping = NULL;
> mapping->backing_dev_info = &default_backing_dev_info;
> + mapping->writeback_index = 0;
>
> /*
> * If the block_device provides a backing_dev_info for client
>
inode_cachep's ctor memsets the whole inode* to zero.
Does btrfs have an ->alloc_inode handler? If so, perhaps it is btrfs
which needs to zap that field.
Formally, the way slab works (which is arguably silly from a CPU cache
efficiency POV) is that the object should be returned to slab
(kmem_cache_free()) in a "constructed" manner. Which means that this
zeroing should happen in destroy_inode().
otoh, the fs/inode.c inode creation/destruction code isn't designed
that way - it chose to initialise everything synchronously in
alloc_inode(), which is arguably wrong.
One wonders if that big memset in inode_init_once() actually does
anything useful.
Ho hum, so what to do? I'm suspecting that we should just zap the
kmem_cachep constructor altogether, use a synchronous call to
inode_init_once() to initialise all inode fields and then remove all
those open-coded zeroings.
Something like this?
From: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/inode.c | 40 +++++++++-------------------------------
1 file changed, 9 insertions(+), 31 deletions(-)
diff -puN fs/inode.c~a fs/inode.c
--- a/fs/inode.c~a
+++ a/fs/inode.c
@@ -115,34 +115,24 @@ static struct inode *alloc_inode(struct
static const struct file_operations empty_fops;
struct inode *inode;
- if (sb->s_op->alloc_inode)
+ if (sb->s_op->alloc_inode) {
inode = sb->s_op->alloc_inode(sb);
- else
- inode = (struct inode *) kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+ } else {
+ inode = kmem_cache_alloc(inode_cachep, GFP_KERNEL);
+ if (inode)
+ inode_init_once(inode);
+ }
if (inode) {
struct address_space * const mapping = &inode->i_data;
inode->i_sb = sb;
inode->i_blkbits = sb->s_blocksize_bits;
- inode->i_flags = 0;
atomic_set(&inode->i_count, 1);
inode->i_op = &empty_iops;
inode->i_fop = &empty_fops;
inode->i_nlink = 1;
atomic_set(&inode->i_writecount, 0);
- inode->i_size = 0;
- inode->i_blocks = 0;
- inode->i_bytes = 0;
- inode->i_generation = 0;
-#ifdef CONFIG_QUOTA
- memset(&inode->i_dquot, 0, sizeof(inode->i_dquot));
-#endif
- inode->i_pipe = NULL;
- inode->i_bdev = NULL;
- inode->i_cdev = NULL;
- inode->i_rdev = 0;
- inode->dirtied_when = 0;
if (security_inode_alloc(inode)) {
if (inode->i_sb->s_op->destroy_inode)
inode->i_sb->s_op->destroy_inode(inode);
@@ -158,15 +148,13 @@ static struct inode *alloc_inode(struct
lockdep_set_class(&inode->i_mutex, &sb->s_type->i_mutex_key);
init_rwsem(&inode->i_alloc_sem);
- lockdep_set_class(&inode->i_alloc_sem, &sb->s_type->i_alloc_sem_key);
+ lockdep_set_class(&inode->i_alloc_sem,
+ &sb->s_type->i_alloc_sem_key);
mapping->a_ops = &empty_aops;
mapping->host = inode;
- mapping->flags = 0;
mapping_set_gfp_mask(mapping, GFP_HIGHUSER_PAGECACHE);
- mapping->assoc_mapping = NULL;
mapping->backing_dev_info = &default_backing_dev_info;
- mapping->writeback_index = 0;
/*
* If the block_device provides a backing_dev_info for client
@@ -181,7 +169,6 @@ static struct inode *alloc_inode(struct
bdi = sb->s_bdev->bd_inode->i_mapping->backing_dev_info;
mapping->backing_dev_info = bdi;
}
- inode->i_private = NULL;
inode->i_mapping = mapping;
}
return inode;
@@ -197,7 +184,6 @@ void destroy_inode(struct inode *inode)
kmem_cache_free(inode_cachep, (inode));
}
-
/*
* These are initializations that only need to be done
* once, because the fields are idempotent across use
@@ -222,16 +208,8 @@ void inode_init_once(struct inode *inode
mutex_init(&inode->inotify_mutex);
#endif
}
-
EXPORT_SYMBOL(inode_init_once);
-static void init_once(void *foo)
-{
- struct inode * inode = (struct inode *) foo;
-
- inode_init_once(inode);
-}
-
/*
* inode_lock must be held
*/
@@ -1400,7 +1378,7 @@ void __init inode_init(void)
0,
(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
SLAB_MEM_SPREAD),
- init_once);
+ NULL);
register_shrinker(&icache_shrinker);
/* Hash may have been set up in inode_init_early */
_
Note that this assumes that a filesystem which implements
->alloc_inode() will call inode_init_once() within its ->alloc_inode().
Which means, I think, that we can just remove inode_init_once()
altogether and move its initialisations into alloc_inode() along with
all the existing ones.
What do you think?
I suppose that's all a fun project but doesn't preclude merging your
patchlet. I'll do that for now ;)
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Properly init address_space->writeback_index
2008-08-14 19:05 ` Andrew Morton
@ 2008-08-14 19:35 ` Chris Mason
2008-08-14 19:55 ` Andrew Morton
0 siblings, 1 reply; 4+ messages in thread
From: Chris Mason @ 2008-08-14 19:35 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-fsdevel
On Thu, 2008-08-14 at 12:05 -0700, Andrew Morton wrote:
> On Thu, 14 Aug 2008 14:13:40 -0400
> Chris Mason <chris.mason@oracle.com> wrote:
>
> > write_cache_pages uses i_mapping->writeback_index to pick up where it
> > left off the last time a given inode was found by pdflush or
> > balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
> >
> > alloc_inode should set it to a sane value so that writeback doesn't start
> > in the middle of a file. It is somewhat difficult to notice the bug since
> > write_cache_pages will loop around to the start of the file and the
> > elevator helps hide the resulting seeks.
> >
> > For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of
> > the linux kernel in series runs at 47MB/s on a single sata drive. With
> > this fix, it jumps to 62MB/s.
[ ... ]
> inode_cachep's ctor memsets the whole inode* to zero.
>
> Does btrfs have an ->alloc_inode handler? If so, perhaps it is btrfs
> which needs to zap that field.
>
Yes, but I wasn't touching ->i_mapping at all. I do have a patch in
btrfs now to zero the writeback_index field for now though.
> Formally, the way slab works (which is arguably silly from a CPU cache
> efficiency POV) is that the object should be returned to slab
> (kmem_cache_free()) in a "constructed" manner. Which means that this
> zeroing should happen in destroy_inode().
>
> otoh, the fs/inode.c inode creation/destruction code isn't designed
> that way - it chose to initialise everything synchronously in
> alloc_inode(), which is arguably wrong.
>
At least for filesystems, it seems much less error prone to force us to
init on alloc.
> One wonders if that big memset in inode_init_once() actually does
> anything useful.
>
>
> Ho hum, so what to do? I'm suspecting that we should just zap the
> kmem_cachep constructor altogether, use a synchronous call to
> inode_init_once() to initialise all inode fields and then remove all
> those open-coded zeroings.
>
>
> Something like this?
>
[ looks good ]
>
> Note that this assumes that a filesystem which implements
> ->alloc_inode() will call inode_init_once() within its ->alloc_inode().
>
> Which means, I think, that we can just remove inode_init_once()
> altogether and move its initialisations into alloc_inode() along with
> all the existing ones.
>
> What do you think?
>
There's the silent breakage of out of tree FS, but, your patch seems
much cleaner to me.
>
> I suppose that's all a fun project but doesn't preclude merging your
> patchlet. I'll do that for now ;)
>
;) thanks.
-chris
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Properly init address_space->writeback_index
2008-08-14 19:35 ` Chris Mason
@ 2008-08-14 19:55 ` Andrew Morton
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Morton @ 2008-08-14 19:55 UTC (permalink / raw)
To: Chris Mason; +Cc: linux-fsdevel
On Thu, 14 Aug 2008 15:35:33 -0400
Chris Mason <chris.mason@oracle.com> wrote:
> >
> > Note that this assumes that a filesystem which implements
> > ->alloc_inode() will call inode_init_once() within its ->alloc_inode().
> >
> > Which means, I think, that we can just remove inode_init_once()
> > altogether and move its initialisations into alloc_inode() along with
> > all the existing ones.
> >
> > What do you think?
> >
>
> There's the silent breakage of out of tree FS,
We could leave
void inode_init_once(struct inode *inode)
{
WARN_ON_ONCE(1);
}
EXPORT_SYMBOL(inode_init_once);
in place for a while.
> but, your patch seems
> much cleaner to me.
It has the downside that we'll need to reinitialise all those
list_heads on each inode allocation, whereas currently that only
happens at slab-page-allocation time. Trade that off against all the
open-coded zeroings which got removed then it's probably a wash from a
performance POV.
Or not. A memset followed by a sprinkle of random writes might be
faster than no-memset followed by a sprinkle of random writes. Or not ;)
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2008-08-14 19:55 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-14 18:13 [PATCH] Properly init address_space->writeback_index Chris Mason
2008-08-14 19:05 ` Andrew Morton
2008-08-14 19:35 ` Chris Mason
2008-08-14 19:55 ` Andrew Morton
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.