linux-fsdevel.vger.kernel.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: "Li, Shaohua" <shaohua.li@intel.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrew Morton <akpm@linux-foundation.org>,
	Chris Mason <chris.mason@oracle.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [RFC] page-writeback: move indoes from one superblock together
Date: Thu, 24 Sep 2009 15:29:50 +0200
Message-ID: <20090924132949.GH23126@kernel.dk>
In-Reply-To: <20090924132252.GA696@localhost>

On Thu, Sep 24 2009, Wu Fengguang wrote:
> On Thu, Sep 24, 2009 at 08:35:19PM +0800, Jens Axboe wrote:
> > On Thu, Sep 24 2009, Wu Fengguang wrote:
> > > On Thu, Sep 24, 2009 at 02:54:20PM +0800, Li, Shaohua wrote:
> > > > __mark_inode_dirty adds inodes to the wb dirty list in random order. If a disk
> > > > has several partitions, writeback might keep the spindle moving between
> > > > partitions. To reduce the movement, it is better to write a big chunk of one
> > > > partition and then move to another. Inodes from one fs are usually in one
> > > > partition, so ideally moving inodes from one fs together should reduce spindle
> > > > movement. This patch tries to address this. Before per-bdi writeback was added,
> > > > the behavior was to write inodes from one fs first and then another, so the
> > > > patch restores the previous behavior. The loop in the patch is a bit ugly;
> > > > should we add a dirty list for each superblock in bdi_writeback?
> > > > 
> > > > A test on a two-partition disk with the attached fio script shows about a
> > > > 3% to 6% improvement.
> > > 
> > > A side note: given the noticeable performance gain, I wonder if it
> > > is worth generalizing the idea to do whole-disk, location-ordered
> > > writeback. That should benefit many small-file workloads by more than
> > > 10%, because this patch only sorts across 2 partitions and inodes in a
> > > 5s time window, while the patch below roughly divides the disk into
> > > 5 areas and sorts inodes in a larger 25s time window.
> > > 
> > >         http://lkml.org/lkml/2007/8/27/45
> > > 
> > > Judging from this old patch, the complexity cost would be about 250
> > > lines of code (it needs an rbtree).
> > 
> > First of all, nice patch, I'll add it to the current tree. I too was
> 
> You mean Shaohua's patch? It should be a good addition for 2.6.32.

Yes indeed, the parent patch.

> In the long term move_expired_inodes() needs some rework, because it
> could be time consuming to move around all the inodes in a large
> system, and thus hold inode_lock() for too long (and this patch
> scales up the locked time).

It does. As mentioned in my reply, for 100 inodes or less, it will still
be faster than e.g. using an rbtree. But the more "reliable" runtime of an
rbtree-based solution is appealing. It's not hugely critical, though.
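
For readers following along, the list-based grouping under discussion can be
sketched in userspace roughly as below. This is a minimal model, not the
patch itself: struct dirty_inode, group_by_sb(), count_sb_switches() and
MAX_INODES are hypothetical names, and arrays stand in for the kernel's
list_head queues and inode_lock. It only shows the reordering and why its
cost grows with both the number of inodes and the number of superblocks.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INODES 128  /* arbitrary cap for this sketch */

/* Hypothetical userspace stand-in; not the kernel's struct inode. */
struct dirty_inode {
    int sb_id;  /* identifies the superblock (filesystem) */
    int ino;    /* inode number, for illustration only */
};

/* Count how often consecutive entries belong to different superblocks. */
size_t count_sb_switches(const struct dirty_inode *v, size_t n)
{
    size_t switches = 0;

    for (size_t i = 1; i < n; i++)
        if (v[i].sb_id != v[i - 1].sb_id)
            switches++;
    return switches;
}

/*
 * Copy in[0..n) to out[0..n) so that entries sharing an sb_id are adjacent.
 * Groups appear in order of first occurrence, and entries keep their
 * relative order within a group. Each distinct superblock triggers one
 * scan over the remaining entries, so the worst case is O(n * k) for k
 * superblocks.
 */
void group_by_sb(const struct dirty_inode *in, struct dirty_inode *out,
                 size_t n)
{
    unsigned char moved[MAX_INODES] = {0};
    size_t w = 0;

    assert(n <= MAX_INODES);

    for (size_t i = 0; i < n; i++) {
        if (moved[i])
            continue;
        int sb = in[i].sb_id;
        for (size_t j = i; j < n; j++) {
            if (!moved[j] && in[j].sb_id == sb) {
                out[w++] = in[j];
                moved[j] = 1;
            }
        }
    }
}
```

With inodes arriving interleaved from two superblocks, the grouped output
crosses from one superblock to the other only once, which is the effect the
patch is after on a multi-partition disk.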

> So we would need to split the list moves into smaller pieces in the
> future, or change the data structure.

Yes, those are the two options.

> > pondering using an rbtree for sb+dirty_time insertion and extraction.
> 
> FYI Michael Rubin did some work on an rbtree implementation, just
> in case you are interested:
> 
>         http://lkml.org/lkml/2008/1/15/25

Thanks, I'll take a look.
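
As an illustration of the sb+dirty_time ordering mentioned above, here is a
small sketch of the composite key. It deliberately uses a plain array and
qsort() rather than an rbtree, so it shows only the ordering an rbtree keyed
this way would maintain, not the insertion and extraction costs. The names
struct dirty_key, dirty_key_cmp() and sort_dirty_keys() are assumptions made
for this example and do not come from any of the patches in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical composite key: superblock first, then dirtied time. */
struct dirty_key {
    int sb_id;              /* identifies the superblock */
    unsigned long dirtied;  /* when the inode was dirtied, arbitrary units */
};

/* Order by superblock, then by oldest dirtied time within a superblock. */
int dirty_key_cmp(const void *a, const void *b)
{
    const struct dirty_key *x = a;
    const struct dirty_key *y = b;

    if (x->sb_id != y->sb_id)
        return x->sb_id < y->sb_id ? -1 : 1;
    if (x->dirtied != y->dirtied)
        return x->dirtied < y->dirtied ? -1 : 1;
    return 0;
}

/* Sort keys in place so each superblock's inodes are adjacent, oldest first. */
void sort_dirty_keys(struct dirty_key *v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof(*v), dirty_key_cmp);
}
```

An rbtree using the same comparison would keep the entries in this order at
all times, at O(log n) per insertion, instead of regrouping the whole list
on each writeback pass.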

-- 
Jens Axboe

Thread overview: 15+ messages
     [not found] <1253775260.10618.10.camel@sli10-desk.sh.intel.com>
2009-09-24  7:14 ` [RFC] page-writeback: move indoes from one superblock together Wu Fengguang
2009-09-24  7:29   ` Arjan van de Ven
2009-09-24  7:36     ` Wu Fengguang
2009-09-24  7:44   ` Shaohua Li
2009-09-24 13:17     ` Jens Axboe
2009-09-24 13:29       ` Wu Fengguang
2009-09-24 10:01 ` Wu Fengguang
2009-09-24 12:35   ` Jens Axboe
2009-09-24 13:22     ` Wu Fengguang
2009-09-24 13:29       ` Jens Axboe [this message]
2009-09-24 13:46         ` Wu Fengguang
2009-09-24 13:52           ` Arjan van de Ven
2009-09-24 14:09             ` Wu Fengguang
2009-09-25  4:16               ` Dave Chinner
2009-09-25  5:09                 ` Wu Fengguang
