linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	ext4 <linux-ext4@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCH] Improve buffered streaming write ordering
Date: Fri, 10 Oct 2008 16:13:39 +1100
Message-ID: <20081010051339.GD8181@disturbed>
In-Reply-To: <1223565080.14090.28.camel@think.oraclecorp.com>

On Thu, Oct 09, 2008 at 11:11:20AM -0400, Chris Mason wrote:
> On Fri, 2008-10-03 at 09:43 +1000, Dave Chinner wrote:
> > On Thu, Oct 02, 2008 at 11:48:56PM +0530, Aneesh Kumar K.V wrote:
> > > On Thu, Oct 02, 2008 at 08:20:54AM -0400, Chris Mason wrote:
> > > > On Wed, 2008-10-01 at 21:52 -0700, Andrew Morton wrote:
> > > > > For a 4.5GB streaming buffered write, this printk inside
> > > > > ext4_da_writepage shows up 372,429 times in /var/log/messages.
> > > > 
> > > 
> > > > Part of that can happen due to shrink_page_list -> pageout -> writepage
> > > > call back with lots of unallocated buffer_heads (blocks).
> > 
> > Quite frankly, a simple streaming buffered write should *never*
> > trigger writeback from the LRU in memory reclaim. That indicates
> > that some feedback loop has broken down and we are not cleaning
> > pages fast enough or perhaps in the correct order. Page reclaim in
> > this case should be reclaiming clean pages (those that have already
> > been written back), not writing back random dirty pages.
> 
> Here are some go faster stripes for the XFS buffered writeback.  This
> patch has a lot of debatable features to it, but the idea is to show
> which knobs are slowing us down today.
> 
> The first change is to avoid calling balance_dirty_pages_ratelimited on
> every page.  When we know we're doing a largeish write it makes more
> sense to balance things less often.  This might just mean our
> ratelimit_pages magic value is too small.

Ok, so how about doing something like this to reduce the
number of balances on large writes, but causing at least one
balance call for every write that occurs:

	int	nr = 0;
	.....
	while() {
		....
		if (!(nr % 256)) {
			/* do balance */
		}
		nr++;
		....
	}

That way you get a balance on the first page of every write,
but then hold off balancing on that write again for some
number of pages.
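
As a minimal userspace sketch of the loop above (not kernel code), the
following counts how often the balance step would run for a write of a
given size. The names do_balance(), write_pages() and BALANCE_INTERVAL
are hypothetical stand-ins chosen for illustration; only the interval
of 256 pages comes from the pseudocode.

```c
#include <stddef.h>

/* Assumed interval: 256 pages between balance calls, as in the pseudocode. */
#define BALANCE_INTERVAL 256

/* Hypothetical stand-in for the real throttling call; it only counts. */
size_t balance_calls;

void do_balance(void)
{
	balance_calls++;
}

/*
 * Simulate one buffered write of nr_pages pages and return how many
 * times do_balance() ran during that write.
 */
size_t write_pages(size_t nr_pages)
{
	size_t before = balance_calls;
	size_t nr;

	for (nr = 0; nr < nr_pages; nr++) {
		/* nr is 0 on the first page, so every non-empty write balances once. */
		if (!(nr % BALANCE_INTERVAL))
			do_balance();
		/* ... a real write would copy one page into the page cache here ... */
	}
	return balance_calls - before;
}
```

Under these assumptions a single-page write still balances once, while
a 1024-page write balances four times rather than 1024 times.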

> The second change makes xfs bump wbc->nr_to_write (suggested by
> Christoph), which probably makes delalloc go in bigger chunks.

Hmmmm.  Reasonable theory. We used to do gigantic delalloc extents -
we paid no attention to congestion and could allocate and write
several GB at a time. Latency was an issue, though, so it got
changed to be bound by nr_to_write.

I guess we need to be issuing larger allocations. Can you remove
your patches and see what effect using the allocsize mount
option has on throughput? This changes the default delalloc EOF
preallocation size, which means more or fewer allocations. The
default is 64k and it can go as high as 1GB, IIRC.
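
To put rough numbers on "more or fewer allocations", here is a small
arithmetic sketch. It assumes, purely for illustration, that a
sequential write is covered by fixed chunks of exactly allocsize bytes;
the real allocator need not behave that way, so treat the result as an
estimate only. The function name max_prealloc_allocations() is
hypothetical.

```c
#include <stdint.h>

/*
 * Rough estimate of how many allocsize-sized chunks are needed to cover
 * a sequential write: ceil(write_bytes / allocsize_bytes).
 * Returns 0 for an allocsize of 0, which is treated as invalid input.
 */
uint64_t max_prealloc_allocations(uint64_t write_bytes, uint64_t allocsize_bytes)
{
	if (allocsize_bytes == 0)
		return 0;
	return write_bytes / allocsize_bytes +
	       (write_bytes % allocsize_bytes != 0);
}
```

With binary units, a 4.5GB write (4831838208 bytes) works out to 73728
chunks at 64k and 5 chunks at 1GB under this simplified model.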

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
