From: Chris Mason <chris.mason@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] Properly init address_space->writeback_index
Date: Thu, 14 Aug 2008 15:35:33 -0400 [thread overview]
Message-ID: <1218742533.15342.409.camel@think.oraclecorp.com> (raw)
In-Reply-To: <20080814120533.2c8a8da3.akpm@linux-foundation.org>
On Thu, 2008-08-14 at 12:05 -0700, Andrew Morton wrote:
> On Thu, 14 Aug 2008 14:13:40 -0400
> Chris Mason <chris.mason@oracle.com> wrote:
>
> > write_cache_pages uses i_mapping->writeback_index to pick up where it
> > left off the last time a given inode was found by pdflush or
> > balance_dirty_pages (or anyone else who sets wbc->range_cyclic)
> >
> > alloc_inode should set it to a sane value so that writeback doesn't start
> > in the middle of a file. It is somewhat difficult to notice the bug since
> > write_cache_pages will loop around to the start of the file and the
> > elevator helps hide the resulting seeks.
> >
> > For whatever reason, Btrfs hits this often. Unpatched, untarring 30 copies of
> > the linux kernel in series runs at 47MB/s on a single sata drive. With
> > this fix, it jumps to 62MB/s.
[ ... ]
> inode_cachep's ctor memsets the whole inode* to zero.
>
> Does btrfs have an ->alloc_inode handler? If so, perhaps it is btrfs
> which needs to zap that field.
>
Yes, but I wasn't touching ->i_mapping at all. I do have a patch in
btrfs now to zero the writeback_index field for now though.
> Formally, the way slab works (which is arguably silly from a CPU cache
> efficiency POV) is that the object should be returned to slab
> (kmem_cache_free()) in a "constructed" manner. Which means that this
> zeroing should happen in destroy_inode().
>
> otoh, the fs/inode.c inode creation/destruction code isn't designed
> that way - it chose to initialise everything synchronously in
> alloc_inode(), which is arguably wrong.
>
At least for filesystems, it seems much less error prone to force us to
init on alloc.
> One wonders if that big memset in inode_init_once() actually does
> anything useful.
>
>
> Ho hum, so what to do? I'm suspecting that we should just zap the
> kmem_cachep constructor altogether, use a synchronous call to
> inode_init_once() to initialise all inode fields and then remove all
> those open-coded zeroings.
>
>
> Something like this?
>
[ looks good ]
>
> Note that this assumes that a filesystem which implements
> ->alloc_inode() will call inode_init_once() within its ->alloc_inode().
>
> Which means, I think, that we can just remove inode_init_once()
> altogether and move its initialisations into alloc_inode() along with
> all the existing ones.
>
> What do you think?
>
There's the silent breakage of out of tree FS, but, your patch seems
much cleaner to me.
>
> I suppose that's all a fun project but doesn't preclude merging your
> patchlet. I'll do that for now ;)
>
;) thanks.
-chris
next prev parent reply other threads:[~2008-08-14 19:46 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-14 18:13 [PATCH] Properly init address_space->writeback_index Chris Mason
2008-08-14 19:05 ` Andrew Morton
2008-08-14 19:35 ` Chris Mason [this message]
2008-08-14 19:55 ` Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1218742533.15342.409.camel@think.oraclecorp.com \
--to=chris.mason@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.