From: Wu Fengguang <fengguang.wu@intel.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
Dave Chinner <david@fromorbit.com>,
Greg Thelen <gthelen@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Vivek Goyal <vgoyal@redhat.com>,
Andrea Righi <arighi@develer.com>, linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 10/18] writeback: dirty position control - bdi reserve area
Date: Wed, 28 Sep 2011 22:02:05 +0800
Message-ID: <20110928140205.GA26617@localhost>
In-Reply-To: <20110918144751.GA18645@localhost>
[-- Attachment #1: Type: text/plain, Size: 7040 bytes --]
Hi Peter,
On Sun, Sep 18, 2011 at 10:47:51PM +0800, Wu Fengguang wrote:
> > BTW, I also compared the IO-less patchset and the vanilla kernel's
> > JBOD performance. Basically, the performance is slightly improved
> > under large memory, and reduced a lot in small memory servers.
> >
> > vanilla IO-less
> > --------------------------------------------------------------------------------
> [...]
> > 26508063 17706200 -33.2% JBOD-10HDD-thresh=100M/xfs-100dd-1M-16p-5895M-100M
> > 23767810 23374918 -1.7% JBOD-10HDD-thresh=100M/xfs-10dd-1M-16p-5895M-100M
> > 28032891 20659278 -26.3% JBOD-10HDD-thresh=100M/xfs-1dd-1M-16p-5895M-100M
> > 26049973 22517497 -13.6% JBOD-10HDD-thresh=100M/xfs-2dd-1M-16p-5895M-100M
> >
> > There are still some itches in JBOD..
>
> OK, in the dirty_bytes=100M case, I find that the bdi threshold _and_
> writeout bandwidth may drop close to 0 for long periods. This change
> may avoid one bdi being stuck:
>
> /*
> * bdi reserve area, safeguard against dirty pool underrun and disk idle
> *
> * It may push the desired control point of global dirty pages higher
> * than setpoint. It's not necessary in single-bdi case because a
> * minimal pool of @freerun dirty pages will already be guaranteed.
> */
> - x_intercept = min(write_bw, freerun);
> + x_intercept = min(write_bw + MIN_WRITEBACK_PAGES, freerun);
After lots of experiments, I ended up with this bdi reserve point:
+	x_intercept = bdi_thresh / 2 + MIN_WRITEBACK_PAGES;
together with this chunk to keep a bdi from getting stuck in the
bdi_thresh=0 state:
@@ -590,6 +590,7 @@ static unsigned long bdi_position_ratio(
 	 */
 	if (unlikely(bdi_thresh > thresh))
 		bdi_thresh = thresh;
+	bdi_thresh = max(bdi_thresh, (limit - dirty) / 8);
 	/*
 	 * scale global setpoint to bdi's:
 	 *	bdi_setpoint = setpoint * bdi_thresh / thresh
The above changes are good enough to keep a reasonable amount of bdi
dirty pages, so the bdi underrun flag ("[PATCH 11/18] block: add bdi
flag to indicate risk of io queue underrun") is dropped.
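
For readers following along outside the kernel tree, the two changes
above can be tried in user space. This is only a sketch under stated
assumptions: the helper names are made up for illustration, the
MIN_WRITEBACK_PAGES_SKETCH value is a placeholder rather than the
kernel's definition, and the guard against dirty > limit is added here
because the arithmetic is unsigned. It is not the patch itself.

```c
#include <stdio.h>

/* Placeholder value for illustration only; the kernel has its own
 * MIN_WRITEBACK_PAGES definition. */
#define MIN_WRITEBACK_PAGES_SKETCH 1024UL

/* Hypothetical helper: larger of two page counts. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical helper mirroring the hunk above: cap bdi_thresh at the
 * global thresh, then keep it from collapsing to 0 by flooring it at
 * 1/8 of the remaining global headroom (limit - dirty).
 */
static unsigned long floor_bdi_thresh(unsigned long bdi_thresh,
				      unsigned long thresh,
				      unsigned long limit,
				      unsigned long dirty)
{
	unsigned long headroom;

	if (bdi_thresh > thresh)
		bdi_thresh = thresh;

	/* Assumption of this sketch: treat dirty >= limit as zero
	 * headroom so the unsigned subtraction cannot wrap. */
	headroom = limit > dirty ? limit - dirty : 0;

	return max_ul(bdi_thresh, headroom / 8);
}

/* Hypothetical helper: the bdi reserve point quoted above. */
static unsigned long reserve_x_intercept(unsigned long bdi_thresh)
{
	return bdi_thresh / 2 + MIN_WRITEBACK_PAGES_SKETCH;
}
```

With these definitions, a bdi whose threshold has dropped to 0 while
the global pool still has 12800 pages of headroom gets
floor_bdi_thresh(0, 25600, 25600, 12800) == 1600, and its reserve point
becomes reserve_x_intercept(1600) == 1824 pages.
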
I also tried various bdi freerun patches, but the results were not
satisfactory. Basically, the bdi reserve area approach (this patch)
yields noticeably smoother and more resilient behavior than the
freerun/underrun approaches. I noticed that the bdi underrun flag
could lead to a sudden surge of dirty pages (especially if not
safeguarded by the dirty_exceeded condition) in the very small
window.
To dig performance increases/drops out of the large number of test
results, I wrote a convenient script (attached) to compare the
vmstat:nr_written numbers between two or more sets of test runs. It
helped a lot in fine-tuning the parameters for different cases.
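
The percentage column in the tables below can be cross-checked by
hand. The following is a minimal sketch of that arithmetic, assuming
the column is the relative change of the second nr_written count
against the first; the function name is hypothetical and this is not
taken from the attached compare.rb.

```c
#include <stdio.h>

/*
 * Hypothetical helper: relative change, in percent, of a test run's
 * nr_written count against a baseline run's count.
 * Returns 0 for a zero baseline to avoid dividing by zero
 * (an assumption of this sketch).
 */
static double percent_change(unsigned long long base,
			     unsigned long long test)
{
	if (base == 0)
		return 0.0;

	return 100.0 * ((double)test - (double)base) / (double)base;
}
```

For example, percent_change(52934365ULL, 54643527ULL) comes to about
3.2, matching the first JBOD row, and
printf("%+.1f%%\n", percent_change(base, test)) prints a value in the
same signed one-decimal style as the tables.
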
The current JBOD performance numbers are encouraging:
$ ./compare.rb JBOD*/*-vanilla+ JBOD*/*-bgthresh3+
3.1.0-rc4-vanilla+ 3.1.0-rc4-bgthresh3+
------------------------ ------------------------
52934365 +3.2% 54643527 JBOD-10HDD-thresh=100M/ext4-100dd-1M-24p-16384M-100M:10-X
45488896 +18.2% 53785605 JBOD-10HDD-thresh=100M/ext4-10dd-1M-24p-16384M-100M:10-X
47217534 +12.2% 53001031 JBOD-10HDD-thresh=100M/ext4-1dd-1M-24p-16384M-100M:10-X
32286924 +25.4% 40492312 JBOD-10HDD-thresh=100M/xfs-10dd-1M-24p-16384M-100M:10-X
38676965 +14.2% 44177606 JBOD-10HDD-thresh=100M/xfs-1dd-1M-24p-16384M-100M:10-X
59662173 +11.1% 66269621 JBOD-10HDD-thresh=800M/ext4-10dd-1M-24p-16384M-800M:10-X
57510438 +2.3% 58855181 JBOD-10HDD-thresh=800M/ext4-1dd-1M-24p-16384M-800M:10-X
63691922 +64.0% 104460352 JBOD-10HDD-thresh=800M/xfs-100dd-1M-24p-16384M-800M:10-X
51978567 +16.0% 60298210 JBOD-10HDD-thresh=800M/xfs-10dd-1M-24p-16384M-800M:10-X
47641062 +6.4% 50681038 JBOD-10HDD-thresh=800M/xfs-1dd-1M-24p-16384M-800M:10-X
The common single disk cases also see good numbers except for slight
drops in the dirty_bytes=100MB case:
$ ./compare.rb thresh*/*vanilla+ thresh*/*bgthresh3+
3.1.0-rc4-vanilla+ 3.1.0-rc4-bgthresh3+
------------------------ ------------------------
4092719 -2.5% 3988742 thresh=100M/ext4-10dd-4k-8p-4096M-100M:10-X
4956323 -4.0% 4758884 thresh=100M/ext4-1dd-4k-8p-4096M-100M:10-X
4640118 -0.4% 4621240 thresh=100M/ext4-2dd-4k-8p-4096M-100M:10-X
3545136 -3.5% 3420717 thresh=100M/xfs-10dd-4k-8p-4096M-100M:10-X
4399437 -0.9% 4361830 thresh=100M/xfs-1dd-4k-8p-4096M-100M:10-X
4100655 -3.3% 3964043 thresh=100M/xfs-2dd-4k-8p-4096M-100M:10-X
4780624 -0.1% 4776216 thresh=1G/ext4-10dd-4k-8p-4096M-1024M:10-X
4904565 +0.0% 4905293 thresh=1G/ext4-1dd-4k-8p-4096M-1024M:10-X
3578539 +9.1% 3903390 thresh=1G/xfs-10dd-4k-8p-4096M-1024M:10-X
4029890 +0.8% 4063717 thresh=1G/xfs-1dd-4k-8p-4096M-1024M:10-X
2449031 +20.0% 2937926 thresh=1M/ext4-10dd-4k-8p-4096M-1M:10-X
4161896 +7.5% 4472552 thresh=1M/ext4-1dd-4k-8p-4096M-1M:10-X
3437787 +18.8% 4085707 thresh=1M/ext4-2dd-4k-8p-4096M-1M:10-X
1921914 +14.8% 2206897 thresh=1M/xfs-10dd-4k-8p-4096M-1M:10-X
2537481 +65.8% 4207336 thresh=1M/xfs-1dd-4k-8p-4096M-1M:10-X
3329176 +12.3% 3739888 thresh=1M/xfs-2dd-4k-8p-4096M-1M:10-X
4587856 +1.8% 4672501 thresh=400M-300M/ext4-10dd-4k-8p-4096M-400M:300M-X
4883525 +0.0% 4884957 thresh=400M-300M/ext4-1dd-4k-8p-4096M-400M:300M-X
4799105 +2.3% 4907525 thresh=400M-300M/ext4-2dd-4k-8p-4096M-400M:300M-X
3931315 +3.0% 4048277 thresh=400M-300M/xfs-10dd-4k-8p-4096M-400M:300M-X
4238389 +3.9% 4401927 thresh=400M-300M/xfs-1dd-4k-8p-4096M-400M:300M-X
4032798 +2.3% 4123838 thresh=400M-300M/xfs-2dd-4k-8p-4096M-400M:300M-X
2425253 +35.2% 3279302 thresh=8M/ext4-10dd-4k-8p-4096M-8M:10-X
4728506 +2.2% 4834878 thresh=8M/ext4-1dd-4k-8p-4096M-8M:10-X
2782860 +62.1% 4511120 thresh=8M/ext4-2dd-4k-8p-4096M-8M:10-X
1966133 +24.3% 2443874 thresh=8M/xfs-10dd-4k-8p-4096M-8M:10-X
4238402 +1.7% 4308416 thresh=8M/xfs-1dd-4k-8p-4096M-8M:10-X
3299446 +13.3% 3739810 thresh=8M/xfs-2dd-4k-8p-4096M-8M:10-X
Thanks,
Fengguang
[-- Attachment #2: compare.rb --]
[-- Type: application/x-ruby, Size: 2755 bytes --]