public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: zhejiang <zhe.jiang@intel.com>, linux-kernel@vger.kernel.org
Subject: Re: Reducing the bdi proporion calculation period to speed up disk write
Date: Fri, 14 Dec 2007 14:45:26 +0800	[thread overview]
Message-ID: <1197614726.6362.137.camel@ymzhang> (raw)
In-Reply-To: <1197367861.6985.14.camel@twins>

On Tue, 2007-12-11 at 11:11 +0100, Peter Zijlstra wrote:
> On Tue, 2007-12-11 at 14:25 +0800, zhejiang wrote:
> > The patch 04fbfdc14e5f48463820d6b9807daa5e9c92c51f implemented bdi per
> > device dirty threshold. It works well.
> > However, the period for proportion calculation may be too large.
> > For 8G memory, the calc_period_shift() will return 19 as the shift.
> > 
> > When we switch writing operation between different disks, there may be
> > potential performance issue.
> > 
> > For example, we first write to disk A, then write to disk B.
> > The proportion for disk B will increase slowly because the denominator
> > is too large (It's 2^18 + (global_count & counter_mask)).
> > The disk B will get small dirty page quota for a long time,
> > it will get blocked frequently though the total dirty page is under the
> > dirty page limit.
> > 
> > Peter provided a patch to avoid this issue, this patch allow violation
> > of bdi limits if there is a lot of room on the system.
> > It looks like:
> > 
> > +if (nr_reclaimable + nr_writeback < (background_thresh +
> > dirty_thresh) / 2)
> > +                     break; 
> > 
> > This patch really help to avoid congestion, but if the dirty pages
> > exceed about 3/4 of the dirty_thresh, congestion still happens if we
> > write to another disk. 
> > 
> > I think that we can reduce the period to speed up the proportion
> > adjustment. 
> > 
> > diff -Nur a/page-writeback.c b/page-writeback.c
> > --- a/page-writeback.c  2007-12-11 13:46:30.000000000 +0800
> > +++ b/page-writeback.c  2007-12-11 13:47:11.000000000 +0800
> > @@ -128,10 +128,7 @@
> >   */
> >  static int calc_period_shift(void)
> >  {
> > -       unsigned long dirty_total;
> > -
> > -       dirty_total = (vm_dirty_ratio * determine_dirtyable_memory()) /
> > 100;
> > -       return 2 + ilog2(dirty_total - 1);
> > +       return 12;
> >  }
> 
> Its a heuristic, it might need some tuning, but a static value is wrong.
> I think its generally true that the larger the machine memory size, the
> faster the storage subsystem. And the more likely it has more disks.
> 
> One of the reasons this value isn't static is that with your fixed 12 it
> becomes very hard to balance over more than 4096 active devices. Of
> course, it takes quite a special set-up to get into that situation.
I strongly agree with you that a static value is not a good idea.

> 
> As it is, it now takes about 2 * dirty limit to switch over, you could
> start by making that just a single, or maybe even half a, dirty limit.
We will do more testing to choose a better formular based on dirty_ratio
and total memory.

> 
> 
> Also, I'm not quite convinced your benchmark is all that useful. Do you
> really think it matches an actual frequently occurring usage pattern?
We used iozone to test 1.2GB sequential write/rewrite. It's hard to match
exactly an actual usage pattern, but I have an example. Administrator
might backup big files to other free disks periodically although he/she might
not need it fast.

-yanmin



      reply	other threads:[~2007-12-14  6:50 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-11  6:25 Reducing the bdi proporion calculation period to speed up disk write zhejiang
2007-12-11 10:11 ` Peter Zijlstra
2007-12-14  6:45   ` Zhang, Yanmin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1197614726.6362.137.camel@ymzhang \
    --to=yanmin_zhang@linux.intel.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zhe.jiang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox