From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: zhejiang <zhe.jiang@intel.com>, linux-kernel@vger.kernel.org
Subject: Re: Reducing the bdi proporion calculation period to speed up disk write
Date: Fri, 14 Dec 2007 14:45:26 +0800 [thread overview]
Message-ID: <1197614726.6362.137.camel@ymzhang> (raw)
In-Reply-To: <1197367861.6985.14.camel@twins>
On Tue, 2007-12-11 at 11:11 +0100, Peter Zijlstra wrote:
> On Tue, 2007-12-11 at 14:25 +0800, zhejiang wrote:
> > The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
> > device dirty threshold. It works well.
> > However, the period for proportion calculation may be too large.
> > For 8G memory, the calc_period_shift() will return 19 as the shift.
> >
> > When we switch writing operation between different disks, there may be
> > potential performance issue.
> >
> > For example, we first write to disk A, then write to disk B.
> > The proportion for disk B will increase slowly because the denominator
> > is too large (It's 2^18 + (global_count & counter_mask)).
> > The disk B will get small dirty page quota for a long time,
> > it will get blocked frequently though the total dirty page is under the
> > dirty page limit.
> >
> > Peter provided a patch to avoid this issue, this patch allow violation
> > of bdi limits if there is a lot of room on the system.
> > It looks like:
> >
> > +if (nr_reclaimable + nr_writeback < (background_thresh +
> > dirty_thresh) / 2)
> > + break;
> >
> > This patch really help to avoid congestion, but if the dirty pages
> > exceed about 3/4 of the dirty_thresh, congestion still happens if we
> > write to another disk.
> >
> > I think that we can reduce the period to speed up the proportion
> > adjustment.
> >
> > diff -Nur a/page-writeback.c b/page-writeback.c
> > --- a/page-writeback.c 2007-12-11 13:46:30.000000000 +0800
> > +++ b/page-writeback.c 2007-12-11 13:47:11.000000000 +0800
> > @@ -128,10 +128,7 @@
> > */
> > static int calc_period_shift(void)
> > {
> > - unsigned long dirty_total;
> > -
> > - dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> > 100;
> > - return 2 + ilog2(dirty_total - 1);
> > + return 12;
> > }
>
> Its a heuristic, it might need some tuning, but a static value is wrong.
> I think its generally true that the larger the machine memory size, the
> faster the storage subsystem. And the more likely it has more disks.
>
> One of the reasons this value isn't static is that with your fixed 12 it
> becomes very hard to balance over more than 4096 active devices. Of
> course, it takes quite a special set-up to get into that situation.
I strongly agree with you that a static value is not a good idea.
>
> As it is, it now takes about 2 * dirty limit to switch over, you could
> start by making that just a single, or maybe even half a, dirty limit.
We will do more testing to choose a better formular based on dirty_ratio
and total memory.
>
>
> Also, I'm not quite convinced your benchmark is all that useful. Do you
> really think it matches an actual frequently occurring usage pattern?
We used iozone to test 1.2GB sequential write/rewrite. It's hard to match
exactly an actual usage pattern, but I have an example. Administrator
might backup big files to other free disks periodically although he/she might
not need it fast.
-yanmin
prev parent reply other threads:[~2007-12-14 6:50 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-12-11 6:25 Reducing the bdi proporion calculation period to speed up disk write zhejiang
2007-12-11 10:11 ` Peter Zijlstra
2007-12-14 6:45 ` Zhang, Yanmin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1197614726.6362.137.camel@ymzhang \
--to=yanmin_zhang@linux.intel.com \
--cc=a.p.zijlstra@chello.nl \
--cc=linux-kernel@vger.kernel.org \
--cc=zhe.jiang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox