linux-kernel.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"Li, Shaohua" <shaohua.li@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"richard@rsk.demon.co.uk" <richard@rsk.demon.co.uk>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>
Subject: Re: regression in page writeback
Date: Wed, 30 Sep 2009 10:11:58 -0400
Message-ID: <20090930141158.GG24383@mit.edu>
In-Reply-To: <20090930052657.GA17268@localhost>

On Wed, Sep 30, 2009 at 01:26:57PM +0800, Wu Fengguang wrote:
> It's good to increase MAX_WRITEBACK_PAGES, however I'm afraid
> max_contig_writeback_mb may be a burden in future: either it is not
> necessary, or a per-bdi counterpart must be introduced for all
> filesystems.

The per-filesystem tunable was just a short-term hack; the reason why
I did it that way was it was clear that a global tunable wouldn't fly,
and rightly so --- what might be suitable for a slow USB stick might
be very different than a super-fast RAID array, and someone might very
well have both on the same system.

> And it's preferred to automatically handle slow devices well with the
> increased chunk size, instead of adding another parameter.

Agreed; long-term what we probably need is something which is
automatically tunable.  My thinking was that we should tune the
initial nr_to_write parameter based on how many blocks could be
written in some time interval, which is tunable.  So if we decide that
1 second is a suitable time period to be writing out one inode's dirty
pages, then for a fast server-class SATA disk, we might want to set
nr_to_write to be around 128MB worth of pages.  For a laptop SATA
disk, it might be around 64MB, and for a really slow USB stick, it
might be more like 16MB.  For a super-fast enterprise RAID array, 128MB
might be too small!
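
To make that arithmetic concrete, here is a minimal user-space sketch
(not kernel code).  The helper name nr_to_write_for() and the 4 KiB page
size are assumptions for illustration only; the idea is simply "pages
that fit in the chosen interval at the device's sustained bandwidth".

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the sketch; real systems may differ. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: how many pages can be written in interval_ms
 * at a sustained rate of bytes_per_sec.  Always returns at least one
 * page so that writeback can make progress on very slow devices.
 */
static uint64_t nr_to_write_for(uint64_t bytes_per_sec, uint64_t interval_ms)
{
	uint64_t bytes, pages;

	assert(interval_ms > 0);

	bytes = bytes_per_sec * interval_ms / 1000;
	pages = bytes / SKETCH_PAGE_SIZE;

	return pages ? pages : 1;
}
```

With a 1 second interval, a device sustaining 128 MiB/s gets 32768
pages (128MB), one sustaining 16 MiB/s gets 4096 pages (16MB), and
halving the interval halves both.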

If we get timing and/or congestion information from the block layer,
it wouldn't be hard to figure out the optimal number of pages that
should be sent down to the filesystem, and to tune this automatically.
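
As a rough illustration of what "timing information" could feed, the
sketch below keeps a smoothed bandwidth estimate from completed writes.
Everything here is hypothetical: struct bw_estimate, bw_update() and the
1/8 smoothing weight are assumptions made for the example, not an
existing block-layer interface.

```c
#include <stdint.h>

/* Hypothetical per-device state: smoothed bandwidth in bytes/second. */
struct bw_estimate {
	uint64_t bytes_per_sec;	/* 0 means "no sample seen yet" */
};

/*
 * Fold one completed write (bytes written in elapsed_ms) into the
 * estimate using an exponentially weighted moving average that gives
 * the new sample a weight of 1/8 (an assumed smoothing factor).
 */
static void bw_update(struct bw_estimate *e, uint64_t bytes,
		      uint64_t elapsed_ms)
{
	uint64_t sample;

	if (elapsed_ms == 0)
		return;		/* too fast to measure; skip this sample */

	sample = bytes * 1000 / elapsed_ms;

	if (e->bytes_per_sec == 0)
		e->bytes_per_sec = sample;	/* first sample seeds it */
	else
		e->bytes_per_sec = (e->bytes_per_sec * 7 + sample) / 8;
}
```

The smoothed figure, multiplied by the target time interval, would then
give the number of bytes (and hence pages) to hand to the filesystem
for that device.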

> I scratched up a patch to demo the ideas collected in recent discussions.
> Can you check if it serves your needs? Thanks.

Sure, I'll definitely play with it, thanks.

> The wbc.timeout (when used per-file) is mainly a safeguard against slow
> devices, which may take too long time to sync 128MB data.

Maybe I'm missing something, but I don't think the wbc.timeout
approach is sufficient.  Consider the scenario of someone who is
ripping a DVD disc to an 8 gig USB stick.  The USB stick will be very
slow, but since the file is contiguous the filesystem will very
happily try to push it out there 128MB at a time, and the wbc.timeout
value isn't really going to help since a single call to writepages
could easily cause 128MB worth of data to be streamed out to the USB
stick.
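
A small user-space model of that scenario, under two loudly stated
assumptions: the timeout is only consulted between writepages calls,
and each call streams its whole chunk at the device's speed.  The
function name and the 8 MiB/s USB figure used in the note below are
made up for illustration.

```c
#include <stdint.h>

/*
 * Model a writeback loop over total_bytes of contiguous dirty data.
 * Each iteration stands in for one writepages call that writes up to
 * chunk_bytes and cannot be interrupted; the timeout is checked only
 * after the call returns (an assumption of this sketch).
 * Returns the elapsed time in milliseconds when the loop exits.
 */
static uint64_t writeback_elapsed_ms(uint64_t total_bytes,
				     uint64_t chunk_bytes,
				     uint64_t bytes_per_sec,
				     uint64_t timeout_ms)
{
	uint64_t elapsed = 0, done = 0;

	while (done < total_bytes) {
		uint64_t left = total_bytes - done;
		uint64_t n = left < chunk_bytes ? left : chunk_bytes;

		elapsed += n * 1000 / bytes_per_sec;	/* one full call */
		done += n;

		if (elapsed >= timeout_ms)
			break;		/* timeout seen only between calls */
	}
	return elapsed;
}
```

With a 1 second timeout and a stick that sustains 8 MiB/s, a 128MB
chunk keeps the loop busy for 16 seconds before the timeout is even
looked at, whereas 4MB chunks let it stop after 1 second.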

This is why MAX_WRITEBACK_PAGES really needs to be tuned on a
per-bdi basis; either manually, via a sysfs tunable, or automatically,
by auto-tuning based on how fast the storage device is or by some kind
of congestion-based approach.  This is certainly the best long-term
solution; my concern was that it might take a long time for us to get
the auto-tunable just right, so in the meantime I added a
per-mounted-filesystem tunable and put the hack in the filesystem
layer.  I would like nothing better than to rip it out, once we have a
long-term solution.

Regards,

							- Ted

