From: Wu Fengguang <fengguang.wu@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Arjan van de Ven <arjan@infradead.org>,
Jens Axboe <jens.axboe@oracle.com>,
"Li, Shaohua" <shaohua.li@intel.com>,
lkml <linux-kernel@vger.kernel.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Andrew Morton <akpm@linux-foundation.org>,
Chris Mason <chris.mason@oracle.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Jan Kara <jack@suse.cz>
Subject: Re: [RFC] page-writeback: move indoes from one superblock together
Date: Fri, 25 Sep 2009 13:09:18 +0800
Message-ID: <20090925050918.GA25501@localhost>
In-Reply-To: <20090925041619.GB9464@discord.disaster>

On Fri, Sep 25, 2009 at 12:16:19PM +0800, Dave Chinner wrote:
> On Thu, Sep 24, 2009 at 10:09:19PM +0800, Wu Fengguang wrote:
> > On Thu, Sep 24, 2009 at 09:52:17PM +0800, Arjan van de Ven wrote:
> > > On Thu, 24 Sep 2009 21:46:25 +0800
> > > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > >
> > > > Note that dirty_time may not be unique, so need some workaround. And
> > > > the resulted rbtree implementation may not be more efficient than
> > > > several list traversals even for a very large list (as long as
> > > > superblocks numbers are low).
> > > >
> > > > The good side is, once sb+dirty_time rbtree is implemented, it should
> > > > be trivial to switch the key to sb+inode_number (also may not be
> > > > unique), and to do location ordered writeback ;)
> > >
> > > would you want to sort by dirty time, or by inode number?
> > > (assuming inode number is loosely related to location on disk)
> >
> > Sort by inode number; dirty time will also be considered when judging
> > whether the traversed inode is old enough(*) to be eligible for writeback.
>
> Even if the inode number is directly related to location on disk
> (like for XFS), there is no guarantee that the data or related
> metadata (indirect blocks) writeback location is in any way related
> to the inode number. e.g when using the 32 bit allocator on XFS
> (default for > 1TB filesystems), there is _zero correlation_ between
> the inode number and the data location. Hence writeback by inode
> number will not improve writeback patterns at all.

The location ordering is mainly an optimization for _small files_,
so indirect blocks do not come into play. A good filesystem will place
metadata and data as close together as possible for small files. Is
that true for XFS?

> Only the filesystem knows what the best writeback pattern really is;
> any change is going to affect filesystems differently.
>
> > The more detailed algorithm would be:
> >
> > - put inodes to rbtree with key sb+inode_number
> > - in each per-5s writeback, traverse a range of 1/5 rbtree
> > - in each traverse, sync inodes that is dirtied more than 5s ago
> >
> > So the user visible result would be
> > - on every 5s, roughly a 1/5 disk area will be visited
> > - for each dirtied inode, it will be synced after 5-30s
>
> Personally, I'd prefer that writeback calls a vector that says
> "writeback inodes older than N" and implement something like the
> above as the generic mechanism. That way filesystems can override
> the generic algorithm if there is a better way to track and write
> back dirty inodes for that filesystem.

We have wbc->older_than_this. Is it good enough for XFS?
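
To make the sweep quoted above more concrete, here is a rough userspace
sketch in Python (not kernel code). A sorted list stands in for the
rbtree keyed by sb+inode_number, and a timestamp cut-off plays the role
that wbc->older_than_this would play. The names DirtyIndex, mark_dirty,
sweep, SLICES and MIN_AGE are made up for illustration only, and the
"1/5 of the tree" range is approximated by position in the sorted list
rather than by disk area.

```python
import bisect
from dataclasses import dataclass, field

SLICES = 5      # each sweep visits roughly 1/5 of the sorted keys
MIN_AGE = 5.0   # seconds an inode must have been dirty to be written back


@dataclass
class DirtyIndex:
    """Sorted list standing in for an rbtree keyed by (sb, inode_number)."""
    keys: list = field(default_factory=list)        # sorted (sb, ino) tuples
    dirtied_at: dict = field(default_factory=dict)  # (sb, ino) -> first dirty time
    last_key: object = None                         # last key visited by a sweep

    def mark_dirty(self, sb, ino, now):
        key = (sb, ino)
        if key not in self.dirtied_at:
            bisect.insort(self.keys, key)   # keep keys ordered by sb, then ino
            self.dirtied_at[key] = now      # remember only the first dirty time

    def sweep(self, now):
        """Visit the next slice of keys; return the inodes old enough to sync."""
        n = len(self.keys)
        if n == 0:
            return []
        count = -(-n // SLICES)             # ceil(n / SLICES)
        start = 0
        if self.last_key is not None:
            # resume right after the last visited key, wrapping at the end
            start = bisect.bisect_right(self.keys, self.last_key)
        visited = [self.keys[(start + i) % n] for i in range(count)]
        self.last_key = visited[-1]

        older_than_this = now - MIN_AGE     # cut-off, in the spirit of wbc->older_than_this
        synced = [k for k in visited if self.dirtied_at[k] <= older_than_this]
        for k in synced:
            del self.dirtied_at[k]          # "written back": no longer dirty
        self.keys = [k for k in self.keys if k in self.dirtied_at]
        return synced
```

With ten inodes on one superblock all dirtied at t=0, a sweep at t=3
visits the first two keys and syncs nothing (too young), while a sweep
at t=6 visits the next two and syncs both. Inodes skipped for being too
young are simply picked up on a later pass over their range.
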
Thanks,
Fengguang