Re: [PATCH 0/6] writeback time order/delay fixes take 3


From: Fengguang Wu <wfg@mail.ustc.edu.cn>
To: David Chinner <dgc@sgi.com>
Cc: Chris Mason <chris.mason@oracle.com>,
	Andrew Morton <akpm@osdl.org>, Ken Chen <kenchen@google.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [PATCH 0/6] writeback time order/delay fixes take 3
Date: Wed, 29 Aug 2007 15:53:30 +0800
Message-ID: <388374011.01681@ustc.edu.cn>
Message-ID: <20070829075330.GA5960@mail.ustc.edu.cn>
In-Reply-To: <20070828163308.GE61154114@sgi.com>

On Wed, Aug 29, 2007 at 02:33:08AM +1000, David Chinner wrote:
> On Tue, Aug 28, 2007 at 11:08:20AM -0400, Chris Mason wrote:
> > On Wed, 29 Aug 2007 00:55:30 +1000
> > David Chinner <dgc@sgi.com> wrote:
> > > On Fri, Aug 24, 2007 at 09:55:04PM +0800, Fengguang Wu wrote:
> > > > On Thu, Aug 23, 2007 at 12:33:06PM +1000, David Chinner wrote:
> > > > > On Wed, Aug 22, 2007 at 09:18:41AM +0800, Fengguang Wu wrote:
> > > > > > On Tue, Aug 21, 2007 at 08:23:14PM -0400, Chris Mason wrote:
> > > > > > Notes:
> > > > > > (1) I'm not sure inode number is correlated to disk location in
> > > > > >     filesystems other than ext2/3/4. Or parent dir?
> > > > > 
> > > > > > They correspond to the exact location on disk on XFS. But, XFS has
> > > > > > its own inode clustering (see xfs_iflush) and it can't be moved
> > > > > up into the generic layers because of locking and integration into
> > > > > the transaction subsystem.
> > > > >
> > > > > > (2) It duplicates some function of elevators. Why is it
> > > > > > necessary?
> > > > > 
> > > > > The elevators have no clue as to how the filesystem might treat
> > > > > adjacent inodes. In XFS, inode clustering is a fundamental
> > > > > feature of the inode reading and writing and that is something no
> > > > > > elevator can hope to achieve....
> > > >  
> > > > Thank you. That explains the linear write curve (perfect!) in Chris'
> > > > graph.
> > > > 
> > > > I wonder if XFS can benefit any more from the general writeback
> > > > clustering. How large would be a typical XFS cluster?
> > > 
> > > Depends on inode size. Typically they are 8k in size, so anything
> > > from 4-32 inodes. The inode writeback clustering is pretty tightly
> > > integrated into the transaction subsystem and has some intricate
> > > locking, so it's not likely to be easy (or perhaps even possible) to
> > > make it more generic.
> > 
> > When I talked to hch about this, he said the order file data pages got
> > written in XFS was still dictated by the order the higher layers sent
> > things down.
> 
> Sure, that's file data. I was talking about the inode writeback, not the
> data writeback.
> 
> > Shouldn't the clustering still help to have delalloc done
> > in inode order instead of in whatever random order pdflush sends things
> > down now?
> 
> Depends on how things are being allocated. If you've got inode32 allocation
> and a >1TB filesystem, then data is nowhere near the inodes. If you've got large
> allocation groups, then data is typically nowhere near the inodes, either. If
> you've got full AGs, data will be nowhere near the inodes. If you've got
> large files and lots of data to write, then clustering multiple files together
> for writing is not needed.  So in many cases, clustering delalloc writes by
> inode number doesn't provide any better I/O patterns than not clustering...
> 
> The only difference we may see is that if we flush all the data on inodes
> in a single cluster, we can get away with a single inode cluster write
> for all of the inodes....

So we end up with two major cases:

- small files: an inode and its data are expected to be close together,
  hence clustering can help I_DIRTY_SYNC and/or I_DIRTY_PAGES writeback

- big files: inode and its data may or may not be separated
        - I_DIRTY_SYNC:  could be improved
        - I_DIRTY_PAGES: no better, no worse (it's big I/O, the seek
          cost is not relevant in any case)

Conclusion: _inode_ writeback clustering is enough.

Isn't it simple? ;-)
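
To make the I_DIRTY_SYNC case concrete, here is a toy userspace sketch (not
kernel code, and not taken from any patch in this series). It sorts a batch of
dirty inodes by inode number and counts how many distinct inode clusters they
fall into, which is roughly the number of cluster writes an inode-ordered flush
would need. Everything in it is an assumption for illustration: struct
dirty_inode, count_cluster_writes(), the figure of 32 inodes per cluster (an
8k cluster of 256-byte inodes), and the flat "ino / 32" mapping, which ignores
how a real filesystem such as XFS encodes allocation groups in inode numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dirty inode; not the kernel's struct inode. */
struct dirty_inode {
	unsigned long ino;	/* inode number */
};

/* Assumed for illustration: an 8k cluster holding 256-byte inodes. */
#define INODES_PER_CLUSTER 32UL

/* qsort() comparator: ascending inode number, without overflow. */
static int cmp_ino(const void *a, const void *b)
{
	const struct dirty_inode *x = a, *y = b;

	return (x->ino > y->ino) - (x->ino < y->ino);
}

/*
 * Sort the dirty inodes by inode number (in place) and return the number
 * of distinct clusters they touch. Inodes that share a cluster end up
 * adjacent after the sort, so one cluster write can cover all of them.
 */
size_t count_cluster_writes(struct dirty_inode *v, size_t n)
{
	size_t i, writes = 0;
	unsigned long last = 0;

	assert(v != NULL || n == 0);
	if (n == 0)
		return 0;

	qsort(v, n, sizeof(*v), cmp_ino);

	for (i = 0; i < n; i++) {
		/* Simplified mapping from inode number to cluster index. */
		unsigned long cluster = v[i].ino / INODES_PER_CLUSTER;

		if (i == 0 || cluster != last) {
			writes++;
			last = cluster;
		}
	}
	return writes;
}
```

With six dirty inodes numbered 70, 3, 33, 5, 64 and 1, the sorted order is
1, 3, 5, 33, 64, 70 and the function reports three clusters, where flushing in
the original order would have revisited the same clusters several times.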

Fengguang
