From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: zhejiang <zhe.jiang@intel.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Reducing the bdi proportion calculation period to speed up disk write
Date: Tue, 11 Dec 2007 11:11:01 +0100 [thread overview]
Message-ID: <1197367861.6985.14.camel@twins> (raw)
In-Reply-To: <1197354339.668.62.camel@localhost.localdomain>
On Tue, 2007-12-11 at 14:25 +0800, zhejiang wrote:
> The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
> device dirty threshold. It works well.
> However, the period for the proportion calculation may be too large.
> For 8G of memory, calc_period_shift() will return 19 as the shift.
>
> When we switch write operations between different disks, there may be
> a potential performance issue.
>
> For example, we first write to disk A, then write to disk B.
> The proportion for disk B will increase slowly because the denominator
> is too large (it's 2^18 + (global_count & counter_mask)).
> Disk B will get a small dirty page quota for a long time, and it will
> get blocked frequently even though the total number of dirty pages is
> under the dirty page limit.
>
> Peter provided a patch to avoid this issue; it allows violation of the
> bdi limits if there is a lot of room on the system.
> It looks like:
>
> +	if (nr_reclaimable + nr_writeback <
> +			(background_thresh + dirty_thresh) / 2)
> +		break;
>
> This patch really helps to avoid congestion, but if the dirty pages
> exceed about 3/4 of dirty_thresh, congestion still happens when we
> write to another disk.
>
> I think that we can reduce the period to speed up the proportion
> adjustment.
>
> diff -Nur a/page-writeback.c b/page-writeback.c
> --- a/page-writeback.c 2007-12-11 13:46:30.000000000 +0800
> +++ b/page-writeback.c 2007-12-11 13:47:11.000000000 +0800
> @@ -128,10 +128,7 @@
> */
> static int calc_period_shift(void)
> {
> -	unsigned long dirty_total;
> -
> -	dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) / 100;
> -	return 2 + ilog2(dirty_total - 1);
> +	return 12;
> }
It's a heuristic and might need some tuning, but a static value is
wrong. I think it's generally true that the larger a machine's memory,
the faster its storage subsystem, and the more likely it is to have
more disks.
One of the reasons this value isn't static is that with your fixed 12 it
becomes very hard to balance over more than 4096 active devices. Of
course, it takes quite a special set-up to get into that situation.
As it is, it now takes about 2 * dirty limit to switch over; you could
start by making that just a single, or maybe even half a, dirty limit.
Also, I'm not quite convinced your benchmark is all that useful. Do you
really think it matches an actual frequently occurring usage pattern?
Thread overview: 3+ messages
2007-12-11  6:25 Reducing the bdi proportion calculation period to speed up disk write zhejiang
2007-12-11 10:11 ` Peter Zijlstra [this message]
2007-12-14 6:45 ` Zhang, Yanmin