From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: David Chinner <dgc@sgi.com>
Cc: Chris Mason <chris.mason@oracle.com>,
Andrew Morton <akpm@osdl.org>, Ken Chen <kenchen@google.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [PATCH 0/6] writeback time order/delay fixes take 3
Date: Wed, 29 Aug 2007 15:53:30 +0800 [thread overview]
Message-ID: <388374011.01681@ustc.edu.cn> (raw)
Message-ID: <20070829075330.GA5960@mail.ustc.edu.cn> (raw)
In-Reply-To: <20070828163308.GE61154114@sgi.com>
On Wed, Aug 29, 2007 at 02:33:08AM +1000, David Chinner wrote:
> On Tue, Aug 28, 2007 at 11:08:20AM -0400, Chris Mason wrote:
> > On Wed, 29 Aug 2007 00:55:30 +1000
> > David Chinner <dgc@sgi.com> wrote:
> > > On Fri, Aug 24, 2007 at 09:55:04PM +0800, Fengguang Wu wrote:
> > > > On Thu, Aug 23, 2007 at 12:33:06PM +1000, David Chinner wrote:
> > > > > On Wed, Aug 22, 2007 at 09:18:41AM +0800, Fengguang Wu wrote:
> > > > > > On Tue, Aug 21, 2007 at 08:23:14PM -0400, Chris Mason wrote:
> > > > > > Notes:
> > > > > > (1) I'm not sure inode number is correlated to disk location in
> > > > > > filesystems other than ext2/3/4. Or parent dir?
> > > > >
> > > > > The correspond to the exact location on disk on XFS. But, XFS has
> > > > > it's own inode clustering (see xfs_iflush) and it can't be moved
> > > > > up into the generic layers because of locking and integration into
> > > > > the transaction subsystem.
> > > > >
> > > > > > (2) It duplicates some function of elevators. Why is it
> > > > > > necessary?
> > > > >
> > > > > The elevators have no clue as to how the filesystem might treat
> > > > > adjacent inodes. In XFS, inode clustering is a fundamental
> > > > > feature of the inode reading and writing and that is something no
> > > > > elevator can hope to acheive....
> > > >
> > > > Thank you. That explains the linear write curve(perfect!) in Chris'
> > > > graph.
> > > >
> > > > I wonder if XFS can benefit any more from the general writeback
> > > > clustering. How large would be a typical XFS cluster?
> > >
> > > Depends on inode size. typically they are 8k in size, so anything
> > > from 4-32 inodes. The inode writeback clustering is pretty tightly
> > > integrated into the transaction subsystem and has some intricate
> > > locking, so it's not likely to be easy (or perhaps even possible) to
> > > make it more generic.
> >
> > When I talked to hch about this, he said the order file data pages got
> > written in XFS was still dictated by the order the higher layers sent
> > things down.
>
> Sure, that's file data. I was talking about the inode writeback, not the
> data writeback.
>
> > Shouldn't the clustering still help to have delalloc done
> > in inode order instead of in whatever random order pdflush sends things
> > down now?
>
> Depends on how things are being allocated. if you've got inode32 allocation
> and >1TB filesytsem, then data is nowhere near the inodes. If you've got large
> allocation groups, then data is typically nowhere near the inodes, either. If
> you've got full AGs, data will be nowehere near the inodes. If you've got
> large files and lots of data to write, then clustering multiple files together
> for writing is not needed. So in many cases, clustering delalloc writes by
> inode number doesn't provide any better I/o patterns than not clustering...
>
> The only difference we may see is that if we flush all the data on inodes
> in a single cluster, we can get away with a single inode cluster write
> for all of the inodes....
So we end up with two major cases:
- small files: inode and its data are expected to be close enough,
hence it can help I_DIRTY_SYNC and/or I_DIRTY_PAGES
- big files: inode and its data may or may not be separated
- I_DIRTY_SYNC: could be improved
- I_DIRTY_PAGES: no better, no worse(it's big I/O, the seek
cost is not relevant in any case)
Conclusion: _inode_ writeback clustering is enough.
Isn't it simple? ;-)
Fengguang
next prev parent reply other threads:[~2007-08-29 7:53 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-08-12 9:11 [PATCH 0/6] writeback time order/delay fixes take 3 Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-22 0:23 ` Chris Mason
2007-08-22 1:18 ` Fengguang Wu
2007-08-22 1:18 ` Fengguang Wu
2007-08-22 12:42 ` Chris Mason
2007-08-23 2:47 ` David Chinner
2007-08-23 12:13 ` Chris Mason
2007-08-24 12:56 ` Fengguang Wu
2007-08-24 12:56 ` Fengguang Wu
2007-08-24 13:24 ` Fengguang Wu
2007-08-24 13:24 ` Fengguang Wu
2007-08-24 14:36 ` Chris Mason
2007-08-23 2:33 ` David Chinner
2007-08-24 13:55 ` Fengguang Wu
2007-08-24 13:55 ` Fengguang Wu
2007-08-28 14:55 ` David Chinner
2007-08-28 15:08 ` Chris Mason
2007-08-28 16:33 ` David Chinner
2007-08-28 16:57 ` Chris Mason
2007-08-29 7:53 ` Fengguang Wu [this message]
2007-08-29 7:53 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 1/6] writeback: fix time ordering of the per superblock inode lists 8 Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 2/6] writeback: fix ntfs with sb_has_dirty_inodes() Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 3/6] writeback: remove pages_skipped accounting in __block_write_full_page() Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-13 1:03 ` David Chinner
2007-08-13 10:30 ` Fengguang Wu
2007-08-13 10:30 ` Fengguang Wu
2007-08-17 7:13 ` Fengguang Wu
2007-08-17 7:13 ` Fengguang Wu
2007-08-17 7:13 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 4/6] check dirty inode list Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 5/6] prevent time-ordering warnings Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
2007-08-12 9:11 ` [PATCH 6/6] track redirty_tail() calls Fengguang Wu
2007-08-12 9:11 ` Fengguang Wu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=388374011.01681@ustc.edu.cn \
--to=wfg@mail.ustc.edu.cn \
--cc=akpm@osdl.org \
--cc=chris.mason@oracle.com \
--cc=dgc@sgi.com \
--cc=jens.axboe@oracle.com \
--cc=kenchen@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.