From: Wu Fengguang <fengguang.wu@intel.com>
To: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"mm-commits@vger.kernel.org" <mm-commits@vger.kernel.org>,
"chris.mason@oracle.com" <chris.mason@oracle.com>,
"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
"mbligh@mbligh.org" <mbligh@mbligh.org>,
"miklos@szeredi.hu" <miklos@szeredi.hu>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-c ache-references.patch added to -mm tree
Date: Thu, 3 Sep 2009 10:22:23 +0800 [thread overview]
Message-ID: <20090903022223.GB9474@localhost> (raw)
In-Reply-To: <1251899612.2284.29.camel@castor>
On Wed, Sep 02, 2009 at 09:53:31PM +0800, Richard Kennedy wrote:
> On Wed, 2009-09-02 at 12:45 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-09-02 at 17:57 +0800, Wu Fengguang wrote:
> > > On Wed, Sep 02, 2009 at 04:31:40PM +0800, Peter Zijlstra wrote:
> > > > On Sat, 2009-08-22 at 20:11 +0200, Peter Zijlstra wrote:
> > > > > > > + /* always throttle if over threshold */
> > > > > > > + if (nr_reclaimable + nr_writeback < dirty_thresh) {
> > > > > >
> > > > > > That 'if' is a big behavior change. It effectively blocks every one
> > > > > > and canceled Peter's proportional throttling work: the less a process
> > > > > > dirtied, the less it should be throttled.
> > > > >
> > > > > Hmm, I think you're right, I had not considered that, thanks for
> > > > > catching that.
> > > >
> > > > So in retrospect I think I might have been wrong here.
> > > >
> > > > The per task thing causes the bdi limit to be lower than the bdi limit
> > > > based on writeback speed alone. That is, the more a task dirties, the
> > > > lower the bdi limit is as seen for that task.
> > >
> > > Right. If I understand it right, there will be a safety margin of about
> > > (1/8) * dirty_limit for 1 heavy dirtier case, and that gap scales down
> > > when there are more concurrent heavy dirtiers.
> >
> > Right, with say 4 heavy writers the gap will be 1/4-th of 1/8-th, which
> > is 1/32-nd.
> >
> > With the side node that I think 1/8 is too much on large memory systems,
> > and I have posted a sqrt patch numerous times, but I don't think we've
> > ever found out if that helps or not...
> >
> > > In principle, the ceiling will be a bit higher for a light dirtier to
> > > make it easy to pass in the presence of more heavy dirtiers.
> >
> > Right.
> >
> > > > So if we get a task that generates tons of dirty pages (dd) then it
> > > > won't ever actually hit the full dirty limit, even if its the only task
> > > > on the system, and this outer if() will always be true.
> > >
> > > Right, we have the safety margin :)
> > >
> > > > Only when we actually saturate the full dirty limit will we fall through
> > > > and throttle, but that is ok -- we want to enforce the full limit.
> > > >
> > > > In short, a very aggressive dirtier will have a bdi limit lower than the
> > > > total limit (at all times) leaving a little room at the top for the
> > > > occasional dirtier to make quick progress.
> > > >
> > > > Wu, does that cover the scenario you had in mind?
> > >
> > > Yes thanks! Please correct me if wrong:
> > > - the lower-ceiling-for-heavier-dirtier algorithm in task_dirty_limit()
> > > is elegant enough to prevent heavy dirtier to block light ones
> >
> > ack
> >
> > > - the test (nr_reclaimable + nr_writeback < dirty_thresh) is not
> > > relevant in normal, but can be kept for safety in the form of
> > >
> > > if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > > nr_reclaimable + nr_writeback < dirty_thresh)
> > > break;
> >
> > ack
> >
> > > - clip_bdi_dirty_limit() could be removed: we have been secured by the
> > > above test
> >
> > ack.
>
>
> I've noticed that there's a difference in the handling of the
> dirty_exceeded flag, because this change no longer clips the bdi_thresh
> then the flag may get cleared more quickly here :-
>
> if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> bdi->dirty_exceeded)
> bdi->dirty_exceeded = 0;
>
> So it then could call balance_dirty_pages a lot less often.
I guess in normal situations, clip_bdi_dirty_limit() is simply a
no-op, or just lowers bdi_thresh slightly (otherwise could a bug).
So it could be removed without causing much side effects, including
the influence on dirty_exceeded.
> I've got an updated version of this patch that moves the clip_bdi logic
> up into balance_dirty_pages that should be closer to the existing
> behavior & tests so far look good. I can post it for comments if you're
> interested ?
So I suggested just remove clip_bdi_dirty_limit(). To be sure, could
run with the following patch and check if big numbers are showed.
Thanks,
Fengguang
---
mm/page-writeback.c | 6 ++++++
1 file changed, 6 insertions(+)
--- linux-mm.orig/mm/page-writeback.c 2009-09-02 17:16:51.000000000 +0800
+++ linux-mm/mm/page-writeback.c 2009-09-03 10:19:00.000000000 +0800
@@ -258,6 +258,7 @@ static void clip_bdi_dirty_limit(struct
unsigned long dirty, unsigned long *pbdi_dirty)
{
unsigned long avail_dirty;
+ unsigned long delta;
avail_dirty = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_WRITEBACK) +
@@ -272,6 +273,11 @@ static void clip_bdi_dirty_limit(struct
avail_dirty += bdi_stat(bdi, BDI_RECLAIMABLE) +
bdi_stat(bdi, BDI_WRITEBACK);
+ delta = *pbdi_dirty - min(*pbdi_dirty, avail_dirty);
+ delta *= 1024;
+ delta /= *pbdi_dirty + 1;
+ printk("clip_bdi_dirty_limit %lu\n", delta);
+
*pbdi_dirty = min(*pbdi_dirty, avail_dirty);
}
next prev parent reply other threads:[~2009-09-03 2:22 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-21 22:50 + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-cache-references.patch added to -mm tree akpm
2009-08-22 2:51 ` + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-c ache-references.patch " Wu Fengguang
2009-08-22 2:51 ` Wu Fengguang
2009-08-22 18:11 ` Peter Zijlstra
2009-08-23 1:32 ` Wu Fengguang
2009-08-23 5:31 ` Peter Zijlstra
2009-08-23 7:27 ` Wu Fengguang
2009-08-23 7:45 ` Peter Zijlstra
2009-09-02 8:31 ` Peter Zijlstra
2009-09-02 9:57 ` Wu Fengguang
2009-09-02 10:45 ` Peter Zijlstra
2009-09-02 13:53 ` Richard Kennedy
2009-09-03 2:22 ` Wu Fengguang [this message]
2009-09-03 3:09 ` Wu Fengguang
2009-09-03 9:48 ` Richard Kennedy
2009-09-03 11:05 ` Wu Fengguang
2009-09-03 12:26 ` Richard Kennedy
2009-09-03 4:53 ` Wu Fengguang
2009-08-23 9:33 ` Richard Kennedy
2009-08-23 9:33 ` Richard Kennedy
2009-08-23 13:00 ` Wu Fengguang
2009-08-23 13:00 ` Wu Fengguang
2009-08-23 13:46 ` Richard Kennedy
2009-08-23 13:46 ` Richard Kennedy
2009-08-24 1:41 ` Wu Fengguang
2009-08-24 1:41 ` Wu Fengguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090903022223.GB9474@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=chris.mason@oracle.com \
--cc=jens.axboe@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mbligh@mbligh.org \
--cc=miklos@szeredi.hu \
--cc=mm-commits@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=richard@rsk.demon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.