From: David Chinner <dgc@sgi.com>
To: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michael Rubin <mrubin@google.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] Converting writeback linked lists to a tree based data structure
Date: Thu, 17 Jan 2008 09:35:10 +1100
Message-ID: <20080116223510.GY155407@sgi.com>
In-Reply-To: <E1JF4Ey-0000x4-5p@localhost.localdomain>

On Wed, Jan 16, 2008 at 05:07:20PM +0800, Fengguang Wu wrote:
> On Tue, Jan 15, 2008 at 09:51:49PM -0800, Andrew Morton wrote:
> > > Then to do better ordering by adopting radix tree(or rbtree
> > > if radix tree is not enough),
> >
> > ordering of what?
>
> Switch from time to location.

Note that data writeback may be adversely affected by location-based
writeback rather than time-based writeback - think of the effect of
location-based data writeback on an app that creates lots of
short-term (<30s) temp files and then removes them before they are
written back. With time-based writeback those files typically never
reach the disk; a location-ordered sweep can write them out needlessly.
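
As a rough illustration of the temp-file case (a toy user-space model, not kernel code), the sketch below compares which files each policy would write. The `DirtyFile` type, the `block` field, and both function names are hypothetical; the 30-second expiry mirrors the "<30s" figure above and is assumed, not read from any kernel setting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DirtyFile:
    name: str
    dirtied_at: float            # time the file was dirtied, in seconds
    deleted_at: Optional[float]  # time the file was removed, or None
    block: int                   # hypothetical on-disk location


def time_based_writes(files: List[DirtyFile], expire: float = 30.0) -> List[str]:
    """Oldest-first writeback: a file is written only if it is still
    alive once it has been dirty for `expire` seconds."""
    written = []
    for f in sorted(files, key=lambda f: f.dirtied_at):
        due = f.dirtied_at + expire
        if f.deleted_at is None or f.deleted_at > due:
            written.append(f.name)
    return written


def location_based_writes(files: List[DirtyFile], sweep_time: float) -> List[str]:
    """Location-ordered sweep: every dirty file alive at `sweep_time`
    is written in block order, regardless of its age."""
    alive = [
        f for f in files
        if f.dirtied_at <= sweep_time
        and (f.deleted_at is None or f.deleted_at > sweep_time)
    ]
    return [f.name for f in sorted(alive, key=lambda f: f.block)]


if __name__ == "__main__":
    files = [
        DirtyFile("tmp1", 0.0, 10.0, 500),  # short-lived temp file
        DirtyFile("tmp2", 2.0, 12.0, 100),  # short-lived temp file
        DirtyFile("log", 1.0, None, 300),   # long-lived file
    ]
    print(time_based_writes(files))           # ['log']
    print(location_based_writes(files, 5.0))  # ['tmp2', 'log', 'tmp1']
```

In this model the time-ordered policy never writes the two temp files, while a location-ordered sweep at t=5s writes both of them shortly before they are deleted.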

Also, the data writeback location cannot be easily derived from
the inode number in pretty much all cases. "Near" in terms
of XFS means the same AG, which means the data could be up to
a TB away from the inode, and if you have >1TB filesystems
using the default inode32 allocator, file data is *never*
placed near the inode - the inodes are in the first TB of
the filesystem, the data is rotored around the rest of the
filesystem.

And with delayed allocation, you don't know where the data is even
going to be written ahead of the filesystem ->writepage call, so you
can't do optimal location ordering for data in this case.

> > > and lastly get rid of the list_heads to
> > > avoid locking. Does it sound like a good path?
> >
> > I'd have thought that replacing list_heads with another data structure
> > would be a single commit.
>
> That would be easy. s_more_io and s_more_io_wait can all be converted
> to radix trees.

Makes sense for location-based writeback of the inodes themselves,
but not for data.

Hmmmm - I'm wondering if we'd do better to split data writeback from
inode writeback, i.e. we do two passes. The first pass writes all
the data back in time order, the second pass writes all the inodes
back in location order.

Right now we interleave data and inode writeback (i.e. we do data,
inode, data, inode, data, inode, ....). I'd much prefer to see all
data written out first, then the inodes. ->writepage often dirties
the inode, and hence if we need to do multiple do_writepages() calls
on an inode to flush all the data (e.g. congestion, large amounts of
data to be written, etc), we really shouldn't be calling
write_inode() after every do_writepages() call. The inode
should not be written until all the data is written....
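
The difference between the interleaved and two-pass orderings can be sketched with a small user-space model (not kernel code). The `Inode` type, its fields, and both function names are hypothetical; each `("data", ino)` entry stands in for one do_writepages() call and each `("inode", ino)` entry for one write_inode() call, with the inode number assumed to be a proxy for the inode's on-disk location.

```python
from dataclasses import dataclass
from typing import List, Tuple

Op = Tuple[str, int]  # (kind, inode number)


@dataclass
class Inode:
    ino: int           # assumed proxy for the inode's on-disk location
    dirtied_at: float  # time the inode's data was first dirtied
    data_chunks: int   # do_writepages()-style calls needed to flush its data


def interleaved(inodes: List[Inode]) -> List[Op]:
    """Model of the current behaviour: an inode write follows every
    data flush call, all in time order."""
    ops: List[Op] = []
    for inode in sorted(inodes, key=lambda i: i.dirtied_at):
        for _ in range(inode.data_chunks):
            ops.append(("data", inode.ino))
            ops.append(("inode", inode.ino))
    return ops


def two_pass(inodes: List[Inode]) -> List[Op]:
    """Model of the proposal: all data in time order first, then each
    inode exactly once in location order."""
    ops: List[Op] = []
    for inode in sorted(inodes, key=lambda i: i.dirtied_at):
        ops.extend(("data", inode.ino) for _ in range(inode.data_chunks))
    for inode in sorted(inodes, key=lambda i: i.ino):
        ops.append(("inode", inode.ino))
    return ops


if __name__ == "__main__":
    inodes = [Inode(30, 1.0, 2), Inode(10, 2.0, 1), Inode(20, 3.0, 3)]
    count = lambda ops: sum(1 for kind, _ in ops if kind == "inode")
    print(count(interleaved(inodes)))  # 6 inode writes
    print(count(two_pass(inodes)))     # 3 inode writes
```

In this model the two-pass ordering issues one inode write per inode instead of one per data flush call, and the inode writes come out sorted by location.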

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group