linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Mason <chris.mason@oracle.com>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Tso <tytso@mit.edu>, Jens Axboe <jens.axboe@oracle.com>,
	Christoph Hellwig <hch@infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"jack@suse.cz" <jack@suse.cz>
Subject: Re: [PATCH 0/7] Per-bdi writeback flusher threads v20
Date: Mon, 21 Sep 2009 09:53:21 -0400	[thread overview]
Message-ID: <20090921135321.GD6259@think> (raw)
In-Reply-To: <20090919042607.GA19752@localhost>

On Sat, Sep 19, 2009 at 12:26:07PM +0800, Wu Fengguang wrote:
> On Sat, Sep 19, 2009 at 12:00:51PM +0800, Wu Fengguang wrote:
> > On Sat, Sep 19, 2009 at 11:58:35AM +0800, Wu Fengguang wrote:
> > > On Sat, Sep 19, 2009 at 01:52:52AM +0800, Theodore Tso wrote:
> > > > On Fri, Sep 11, 2009 at 10:39:29PM +0800, Wu Fengguang wrote:
> > > > > 
> > > > > That would be good. Sorry for the late work. I'll allocate some time
> > > > > in mid next week to help review and benchmark recent writeback works,
> > > > > and hope to get things done in this merge window.
> > > > 
> > > > Did you have some chance to get more work done on the your writeback
> > > > patches?
> > > 
> > > Sorry for the delay, I'm now testing the patches with commands
> > > 
> > >  cp /dev/zero /mnt/test/zero0 &
> > >  dd if=/dev/zero of=/mnt/test/zero1 &
> > > 
> > > and the attached debug patch.
> > > 
> > > One problem I found with ext3/4 is, redirty_tail() is called repeatedly
> > > in the traces, which could slow down the inode writeback significantly.
> > 
> > FYI, it's this redirty_tail() called in writeback_single_inode():
> > 
> >                         /*
> >                          * Someone redirtied the inode while were writing back
> >                          * the pages.
> >                          */
> >                         redirty_tail(inode);
> 
> Hmm, this looks like an old fashioned problem get blew up by the
> 128MB MAX_WRITEBACK_PAGES.

I'm starting to rethink the 128MB MAX_WRITEBACK_PAGES.  128MB is the
right answer for the flusher thread on sequential IO, but definitely not
on random IO.  We don't want the flusher to get bogged down on random
writeback and start ignoring every other file.

My btrfs performance branch has long had a change to bump the
nr_to_write up based on the size of the delayed allocation that we're
doing.  It helped, but not as much as I really expected it too, and a
similar patch from Christoph for XFS was good but not great.

It turns out the problem is in write_cache_pages.  It processes a whole
pagevec at a time, something like this:

while(!done) {
	for each page in the pagegvec {
		writepage()
		if (wbc->nr_to_write <= 0)
			done = 1;
	}
}

If the filesystem decides to bump nr_to_write to cover a whole
extent (or a max reasonable size), the new value of nr_to_write may
be ignored if nr_to_write had already gone done to zero.

I fixed btrfs to recheck nr_to_write every time, and the results are
much smoother.  This is what it looks like to write out all the .o files
in the kernel.

http://oss.oracle.com/~mason/seekwatcher/btrfs-nr-to-write.png

In this graph, Btrfs is writing the full extent or 8192 pages, whichever
is smaller.  The write_cache_pages change is here, but it is local to
the btrfs copy of write_cache_pages:

http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=commit;h=f85d7d6c8f2ad4a86a1f4f4e3791f36dede2fa76

I'd rather see a more formal use of hints from the FS about efficient IO
than a blanket increase of the writeback max.  It's more work than
bumping a single #define, but even with the #define at 1GB, we're going
to end up splitting extents and seeking when nr_to_write does finally
get down to zero.

Btrfs currently only bumps the nr_to_write when it creates the extent, I
need to change it to also bump it when it finds an existing extent.

-chris


  parent reply	other threads:[~2009-09-21 13:54 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-11  7:34 [PATCH 0/7] Per-bdi writeback flusher threads v20 Jens Axboe
2009-09-11  7:34 ` [PATCH 1/7] writeback: get rid of generic_sync_sb_inodes() export Jens Axboe
2009-09-11  7:34 ` [PATCH 2/7] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-09-11  7:34 ` [PATCH 3/7] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-09-11  7:34 ` [PATCH 4/7] writeback: get rid of pdflush completely Jens Axboe
2009-09-11  7:34 ` [PATCH 5/7] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-09-11  7:34 ` [PATCH 6/7] writeback: add name to backing_dev_info Jens Axboe
2009-09-11  7:34 ` [PATCH 7/7] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-09-11 13:42 ` [PATCH 0/7] Per-bdi writeback flusher threads v20 Theodore Tso
2009-09-11 13:45   ` Chris Mason
2009-09-11 14:04     ` Jens Axboe
2009-09-11 14:16   ` Christoph Hellwig
2009-09-11 14:29     ` Jens Axboe
2009-09-11 14:39       ` Wu Fengguang
2009-09-18 17:52         ` Theodore Tso
2009-09-19  3:58           ` Wu Fengguang
2009-09-19  4:00             ` Wu Fengguang
2009-09-19  4:26               ` Wu Fengguang
     [not found]               ` <20090919042607.GA19752@localhost>
2009-09-19 15:03                 ` Wu Fengguang
     [not found]                 ` <20090919150351.GA19880@localhost>
2009-09-20 19:00                   ` Jan Kara
2009-09-21  3:04                     ` Wu Fengguang
2009-09-21  5:35                       ` Wu Fengguang
2009-09-21  9:53                         ` Wu Fengguang
2009-09-21 10:02                           ` Jan Kara
2009-09-21 10:18                             ` Wu Fengguang
2009-09-21 12:42                       ` Jan Kara
2009-09-21 15:12                         ` Wu Fengguang
2009-09-21 16:08                           ` Jan Kara
2009-09-22  5:10                             ` Wu Fengguang
2009-09-21 13:53                 ` Chris Mason [this message]
2009-09-22 10:13                   ` Wu Fengguang
     [not found]                   ` <20090922101335.GA27432@localhost>
2009-09-22 11:30                     ` Jan Kara
2009-09-22 13:33                       ` Wu Fengguang
2009-09-22 11:30                     ` Chris Mason
2009-09-22 11:45                       ` Jan Kara
2009-09-22 12:47                         ` Wu Fengguang
2009-09-22 17:41                         ` Chris Mason
2009-09-22 13:18                       ` Wu Fengguang
2009-09-22 15:59                         ` Chris Mason
2009-09-23  1:05                           ` Wu Fengguang
2009-09-23 14:08                             ` Chris Mason
2009-09-24  1:32                               ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090921135321.GD6259@think \
    --to=chris.mason@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=fengguang.wu@intel.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=jens.axboe@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).