From: Wu Fengguang <fengguang.wu@intel.com>
To: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"mm-commits@vger.kernel.org" <mm-commits@vger.kernel.org>,
"chris.mason@oracle.com" <chris.mason@oracle.com>,
"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
"mbligh@mbligh.org" <mbligh@mbligh.org>,
"miklos@szeredi.hu" <miklos@szeredi.hu>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-c ache-references.patch added to -mm tree
Date: Thu, 3 Sep 2009 10:22:23 +0800 [thread overview]
Message-ID: <20090903022223.GB9474@localhost> (raw)
In-Reply-To: <1251899612.2284.29.camel@castor>
On Wed, Sep 02, 2009 at 09:53:31PM +0800, Richard Kennedy wrote:
> On Wed, 2009-09-02 at 12:45 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-09-02 at 17:57 +0800, Wu Fengguang wrote:
> > > On Wed, Sep 02, 2009 at 04:31:40PM +0800, Peter Zijlstra wrote:
> > > > On Sat, 2009-08-22 at 20:11 +0200, Peter Zijlstra wrote:
> > > > > > > + /* always throttle if over threshold */
> > > > > > > + if (nr_reclaimable + nr_writeback < dirty_thresh) {
> > > > > >
> > > > > > That 'if' is a big behavior change. It effectively blocks every one
> > > > > > and canceled Peter's proportional throttling work: the less a process
> > > > > > dirtied, the less it should be throttled.
> > > > >
> > > > > Hmm, I think you're right, I had not considered that, thanks for
> > > > > catching that.
> > > >
> > > > So in retrospect I think I might have been wrong here.
> > > >
> > > > The per task thing causes the bdi limit to be lower than the bdi limit
> > > > based on writeback speed alone. That is, the more a task dirties, the
> > > > lower the bdi limit is as seen for that task.
> > >
> > > Right. If I understand it right, there will be a safety margin of about
> > > (1/8) * dirty_limit for 1 heavy dirtier case, and that gap scales down
> > > when there are more concurrent heavy dirtiers.
> >
> > Right, with say 4 heavy writers the gap will be 1/4-th of 1/8-th, which
> > is 1/32-nd.
> >
> > With the side node that I think 1/8 is too much on large memory systems,
> > and I have posted a sqrt patch numerous times, but I don't think we've
> > ever found out if that helps or not...
> >
> > > In principle, the ceiling will be a bit higher for a light dirtier to
> > > make it easy to pass in the presence of more heavy dirtiers.
> >
> > Right.
> >
> > > > So if we get a task that generates tons of dirty pages (dd) then it
> > > > won't ever actually hit the full dirty limit, even if its the only task
> > > > on the system, and this outer if() will always be true.
> > >
> > > Right, we have the safety margin :)
> > >
> > > > Only when we actually saturate the full dirty limit will we fall through
> > > > and throttle, but that is ok -- we want to enforce the full limit.
> > > >
> > > > In short, a very aggressive dirtier will have a bdi limit lower than the
> > > > total limit (at all times) leaving a little room at the top for the
> > > > occasional dirtier to make quick progress.
> > > >
> > > > Wu, does that cover the scenario you had in mind?
> > >
> > > Yes thanks! Please correct me if wrong:
> > > - the lower-ceiling-for-heavier-dirtier algorithm in task_dirty_limit()
> > > is elegant enough to prevent heavy dirtier to block light ones
> >
> > ack
> >
> > > - the test (nr_reclaimable + nr_writeback < dirty_thresh) is not
> > > relevant in normal, but can be kept for safety in the form of
> > >
> > > if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > > nr_reclaimable + nr_writeback < dirty_thresh)
> > > break;
> >
> > ack
> >
> > > - clip_bdi_dirty_limit() could be removed: we have been secured by the
> > > above test
> >
> > ack.
>
>
> I've noticed that there's a difference in the handling of the
> dirty_exceeded flag, because this change no longer clips the bdi_thresh
> then the flag may get cleared more quickly here :-
>
> if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> bdi->dirty_exceeded)
> bdi->dirty_exceeded = 0;
>
> So it then could call balance_dirty_pages a lot less often.
I guess in normal situations, clip_bdi_dirty_limit() is simply a
no-op, or just lowers bdi_thresh slightly (otherwise could a bug).
So it could be removed without causing much side effects, including
the influence on dirty_exceeded.
> I've got an updated version of this patch that moves the clip_bdi logic
> up into balance_dirty_pages that should be closer to the existing
> behavior & tests so far look good. I can post it for comments if you're
> interested ?
So I suggested just remove clip_bdi_dirty_limit(). To be sure, could
run with the following patch and check if big numbers are showed.
Thanks,
Fengguang
---
mm/page-writeback.c | 6 ++++++
1 file changed, 6 insertions(+)
--- linux-mm.orig/mm/page-writeback.c 2009-09-02 17:16:51.000000000 +0800
+++ linux-mm/mm/page-writeback.c 2009-09-03 10:19:00.000000000 +0800
@@ -258,6 +258,7 @@ static void clip_bdi_dirty_limit(struct
unsigned long dirty, unsigned long *pbdi_dirty)
{
unsigned long avail_dirty;
+ unsigned long delta;
avail_dirty = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_WRITEBACK) +
@@ -272,6 +273,11 @@ static void clip_bdi_dirty_limit(struct
avail_dirty += bdi_stat(bdi, BDI_RECLAIMABLE) +
bdi_stat(bdi, BDI_WRITEBACK);
+ delta = *pbdi_dirty - min(*pbdi_dirty, avail_dirty);
+ delta *= 1024;
+ delta /= *pbdi_dirty + 1;
+ printk("clip_bdi_dirty_limit %lu\n", delta);
+
*pbdi_dirty = min(*pbdi_dirty, avail_dirty);
}
next prev parent reply other threads:[~2009-09-03 2:22 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <200908212250.n7LMox3g029154@imap1.linux-foundation.org>
2009-08-22 2:51 ` + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-c ache-references.patch added to -mm tree Wu Fengguang
2009-08-22 18:11 ` Peter Zijlstra
2009-08-23 1:32 ` Wu Fengguang
2009-08-23 5:31 ` Peter Zijlstra
2009-08-23 7:27 ` Wu Fengguang
2009-08-23 7:45 ` Peter Zijlstra
2009-09-02 8:31 ` Peter Zijlstra
2009-09-02 9:57 ` Wu Fengguang
2009-09-02 10:45 ` Peter Zijlstra
2009-09-02 13:53 ` Richard Kennedy
2009-09-03 2:22 ` Wu Fengguang [this message]
2009-09-03 3:09 ` Wu Fengguang
2009-09-03 9:48 ` Richard Kennedy
2009-09-03 11:05 ` Wu Fengguang
2009-09-03 12:26 ` Richard Kennedy
2009-09-03 4:53 ` Wu Fengguang
2009-08-23 9:33 ` Richard Kennedy
2009-08-23 13:00 ` Wu Fengguang
2009-08-23 13:46 ` Richard Kennedy
2009-08-24 1:41 ` Wu Fengguang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090903022223.GB9474@localhost \
--to=fengguang.wu@intel.com \
--cc=akpm@linux-foundation.org \
--cc=chris.mason@oracle.com \
--cc=jens.axboe@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mbligh@mbligh.org \
--cc=miklos@szeredi.hu \
--cc=mm-commits@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=richard@rsk.demon.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).