From: Wu Fengguang <fengguang.wu@intel.com>
To: Chris Mason <chris.mason@oracle.com>,
Peter Zijlstra <peterz@infradead.org>,
Artem Bityutskiy <dedekind1@gmail.com>,
Jens Axboe <jens.axboe@oracle.com>, linux-kernel@vger.kernel.org,
Subject: Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
Date: Wed, 9 Sep 2009 17:29:01 +0800 [thread overview]
Message-ID: <20090909092901.GA24185@localhost> (raw)
In-Reply-To: <20090908162936.GA2975@think>
On Tue, Sep 08, 2009 at 12:29:36PM -0400, Chris Mason wrote:
> On Tue, Sep 08, 2009 at 06:06:23PM +0200, Peter Zijlstra wrote:
> > On Tue, 2009-09-08 at 13:37 +0300, Artem Bityutskiy wrote:
> > > Hi,
> > >
> > > On 09/08/2009 12:23 PM, Jens Axboe wrote:
> > > > From: Theodore Ts'o<tytso@mit.edu>
> > > >
> > > > Originally, MAX_WRITEBACK_PAGES was hard-coded to 1024 because of a
> > > > concern of not holding I_SYNC for too long. (At least, that was the
> > > > comment previously.) This doesn't make sense now because the only
> > > > time we wait for I_SYNC is if we are calling sync or fsync, and in
> > > > that case we need to write out all of the data anyway. Previously
> > > > there may have been other code paths that waited on I_SYNC, but not
> > > > any more.
> > > >
> > > > According to Christoph, the current writeback size is way too small,
> > > > and XFS had a hack that bumped out nr_to_write to four times the value
> > > > sent by the VM to be able to saturate medium-sized RAID arrays. This
> > > > value was also problematic for ext4 as well, as it caused large files
> > > > to be come interleaved on disk by in 8 megabyte chunks (we bumped up
> > > > the nr_to_write by a factor of two).
> > > >
> > > > So, in this patch, we make the MAX_WRITEBACK_PAGES a tunable,
> > > > max_writeback_mb, and set it to a default value of 128 megabytes.
> > > >
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=13930
> > > >
> > > > Signed-off-by: "Theodore Ts'o"<tytso@mit.edu>
> > > > Signed-off-by: Jens Axboe<jens.axboe@oracle.com>
> > >
> > > Would be nice to update doc files like
> > >
> > > Documentation/sysctl/vm.txt
> > > Documentation/filesystems/proc.txt
> >
> > I'm still not convinced this knob is worth the patch and I'm inclined to
> > flat out NAK it..
> >
> > The whole point of MAX_WRITEBACK_PAGES seems to occasionally check the
> > dirty stats again and not write out too much.
>
> The problem is that 'too much' is a very abstract thing. When a process
> is stuck in balance_dirty_pages, we want them to do the minimal amount
> of work (or waiting) required to get them safely back inside file_write().
It seems that balance_dirty_pages() is not coupled with MAX_WRITEBACK_PAGES.
Instead it uses the much smaller (ratelimit_pages + ratelimit_pages / 2).
So I feel that we could just increase MAX_WRITEBACK_PAGES. It won't
lead to bumpy throttled writes. It does affect fairness of background
writes, ie. small files will have to wait more time for large files.
But I'm fine with MAX_WRITEBACK_PAGES=64MB, which means for desktop
that a large file may only delay others for 1 second, which is small
enough comparing to the 30 second dirty expire time.
On the other hand, I find that the (ratelimit_pages + ratelimit_pages / 2)
used for balance_dirty_pages() may fall below the real number of
dirtied pages, which is not safe if some filesystem choose to dirty
2 * ratelimit_pages before calling balance_dirty_pages_ratelimited_nr().
So, how about this patch?
Thanks,
Fengguang
---
writeback: balance_dirty_pages() shall write more than dirtied pages
Some filesystem may choose to write much more than ratelimit_pages
before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
determine number to write based on real number of dirtied pages.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
mm/page-writeback.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- linux.orig/mm/page-writeback.c 2009-09-09 16:53:15.000000000 +0800
+++ linux/mm/page-writeback.c 2009-09-09 17:05:14.000000000 +0800
@@ -44,12 +44,12 @@ static long ratelimit_pages = 32;
/*
* When balance_dirty_pages decides that the caller needs to perform some
* non-background writeback, this is how many pages it will attempt to write.
- * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably
+ * It should be somewhat larger than dirtied pages to ensure that reasonably
* large amounts of I/O are submitted.
*/
-static inline long sync_writeback_pages(void)
+static inline long sync_writeback_pages(unsigned long dirtied)
{
- return ratelimit_pages + ratelimit_pages / 2;
+ return dirtied + dirtied / 2;
}
/* The following parameters are exported via /proc/sys/vm */
@@ -481,7 +481,8 @@ get_dirty_limits(unsigned long *pbackgro
* If we're over `background_thresh' then pdflush is woken to perform some
* writeout.
*/
-static void balance_dirty_pages(struct address_space *mapping)
+static void balance_dirty_pages(struct address_space *mapping,
+ unsigned long write_chunk)
{
long nr_reclaimable, bdi_nr_reclaimable;
long nr_writeback, bdi_nr_writeback;
@@ -489,7 +490,6 @@ static void balance_dirty_pages(struct a
unsigned long dirty_thresh;
unsigned long bdi_thresh;
unsigned long pages_written = 0;
- unsigned long write_chunk = sync_writeback_pages();
struct backing_dev_info *bdi = mapping->backing_dev_info;
@@ -634,9 +634,10 @@ void balance_dirty_pages_ratelimited_nr(
p = &__get_cpu_var(ratelimits);
*p += nr_pages_dirtied;
if (unlikely(*p >= ratelimit)) {
+ ratelimit = sync_writeback_pages(*p);
*p = 0;
preempt_enable();
- balance_dirty_pages(mapping);
+ balance_dirty_pages(mapping, ratelimit);
return;
}
preempt_enable();
next prev parent reply other threads:[~2009-09-09 9:29 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-08 9:23 [PATCH 0/8] Per-bdi writeback flusher threads v19 Jens Axboe
2009-09-08 9:23 ` [PATCH 1/8] writeback: get rid of generic_sync_sb_inodes() export Jens Axboe
2009-09-08 10:27 ` Artem Bityutskiy
2009-09-08 10:41 ` Jens Axboe
2009-09-08 10:52 ` Artem Bityutskiy
2009-09-08 10:57 ` Jens Axboe
2009-09-08 11:01 ` Artem Bityutskiy
2009-09-08 11:05 ` Jens Axboe
2009-09-08 11:31 ` Artem Bityutskiy
2009-09-08 9:23 ` [PATCH 2/8] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-09-08 9:23 ` [PATCH 3/8] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-09-08 13:46 ` Daniel Walker
2009-09-08 14:21 ` Jens Axboe
2009-09-08 9:23 ` [PATCH 4/8] writeback: get rid of pdflush completely Jens Axboe
2009-09-08 9:23 ` [PATCH 5/8] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-09-08 9:23 ` [PATCH 6/8] writeback: add name to backing_dev_info Jens Axboe
2009-09-08 9:23 ` [PATCH 7/8] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-09-08 9:23 ` [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb Jens Axboe
2009-09-08 10:37 ` Artem Bityutskiy
2009-09-08 16:06 ` Peter Zijlstra
2009-09-08 16:29 ` Chris Mason
2009-09-08 16:56 ` Peter Zijlstra
2009-09-08 17:28 ` Chris Mason
2009-09-08 17:46 ` Peter Zijlstra
2009-09-08 17:55 ` Peter Zijlstra
2009-09-08 18:32 ` Peter Zijlstra
2009-09-09 14:23 ` Jan Kara
2009-09-09 14:37 ` Wu Fengguang
2009-09-10 15:49 ` Peter Zijlstra
2009-09-14 11:17 ` Jan Kara
2009-09-24 8:33 ` Wu Fengguang
2009-09-24 15:38 ` Peter Zijlstra
2009-09-25 1:33 ` Wu Fengguang
2009-09-29 17:35 ` Jan Kara
2009-09-30 1:24 ` Wu Fengguang
2009-09-30 11:55 ` Jan Kara
2009-09-30 12:10 ` Jens Axboe
2009-10-01 15:17 ` Wu Fengguang
2009-10-01 13:36 ` Wu Fengguang
2009-10-01 14:22 ` Jan Kara
2009-10-01 14:54 ` Wu Fengguang
2009-10-01 21:35 ` Jan Kara
2009-10-02 2:25 ` Wu Fengguang
2009-10-02 9:54 ` Jan Kara
2009-10-02 10:34 ` Wu Fengguang
2009-09-08 18:35 ` Chris Mason
2009-09-08 17:57 ` Chris Mason
2009-09-08 18:28 ` Peter Zijlstra
2009-09-09 1:53 ` Dave Chinner
2009-09-09 3:52 ` Wu Fengguang
2009-09-08 18:06 ` Theodore Tso
[not found] ` <20090908181937.GA11545@infradead.org>
2009-09-08 19:34 ` Theodore Tso
2009-09-09 9:29 ` Wu Fengguang [this message]
2009-09-09 12:28 ` Christoph Hellwig
2009-09-09 12:32 ` Wu Fengguang
2009-09-09 12:36 ` Artem Bityutskiy
2009-09-09 12:37 ` Jens Axboe
2009-09-09 12:43 ` Christoph Hellwig
2009-09-09 12:44 ` Jens Axboe
2009-09-09 12:51 ` Christoph Hellwig
2009-09-09 12:57 ` Wu Fengguang
-- strict thread matches above, loose matches on Subject: below --
2009-09-04 7:46 [PATCH 0/8] Per-bdi writeback flusher threads v18 Jens Axboe
2009-09-04 7:46 ` [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb Jens Axboe
2009-09-04 15:28 ` Richard Kennedy
2009-09-05 13:26 ` Jamie Lokier
2009-09-05 16:18 ` Richard Kennedy
2009-09-05 16:46 ` Theodore Tso
2009-09-07 19:09 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090909092901.GA24185@localhost \
--to=fengguang.wu@intel.com \
--cc=chris.mason@oracle.com \
--cc=dedekind1@gmail.com \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).