From: Wu Fengguang <fengguang.wu@intel.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"mm-commits@vger.kernel.org" <mm-commits@vger.kernel.org>,
"richard@rsk.demon.co.uk" <richard@rsk.demon.co.uk>,
"chris.mason@oracle.com" <chris.mason@oracle.com>,
"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
"mbligh@mbligh.org" <mbligh@mbligh.org>,
"miklos@szeredi.hu" <miklos@szeredi.hu>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-cache-references.patch added to -mm tree
Date: Sun, 23 Aug 2009 09:32:52 +0800
Message-ID: <20090823013252.GA7661@localhost>
In-Reply-To: <1250964701.7538.101.camel@twins>
On Sun, Aug 23, 2009 at 02:11:41AM +0800, Peter Zijlstra wrote:
> On Sat, 2009-08-22 at 10:51 +0800, Wu Fengguang wrote:
>
> > > +++ a/mm/page-writeback.c
>
> > > @@ -465,7 +439,6 @@ get_dirty_limits(unsigned long *pbackgro
> > > bdi_dirty = dirty * bdi->max_ratio / 100;
> > >
> > > *pbdi_dirty = bdi_dirty;
> > > task_dirty_limit(current, pbdi_dirty);
> > > }
> > > }
> > > @@ -499,45 +472,12 @@ static void balance_dirty_pages(struct a
> > > };
> > >
> > > get_dirty_limits(&background_thresh, &dirty_thresh,
> > > + &bdi_thresh, bdi);
> > >
> > > nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> > > + global_page_state(NR_UNSTABLE_NFS);
> > > + nr_writeback = global_page_state(NR_WRITEBACK) +
> > > + global_page_state(NR_WRITEBACK_TEMP);
> > >
> > > /*
> > > * In order to avoid the stacked BDI deadlock we need
> > > @@ -557,16 +497,48 @@ static void balance_dirty_pages(struct a
> > > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> > > }
> > >
> >
> > > + /* always throttle if over threshold */
> > > + if (nr_reclaimable + nr_writeback < dirty_thresh) {
> >
> > That 'if' is a big behavior change. It effectively blocks everyone
> > and cancels Peter's proportional throttling work: the less a process
> > dirtied, the less it should be throttled.
>
> Hmm, I think you're right, I had not considered that, thanks for
> catching that.
Thank you :)
> > I'd propose to remove the above 'if' and liberate the following three 'if's.
>
> That might work, but it loses the total dirty_thresh constraint. The
> sum of per-bdi dirties _should_ not be larger than that, but I'm not
> sure it won't ever be.
>
> The clip code Richard removed ensured that, and I think I wrote that out
> of more than sheer paranoia, but I'm not sure anymore :/
Oh, I assumed that your per-bdi throttling is not so permissive that
it lets us exceed the global dirty_thresh. In theory the per-bdi
throttling should be able to quickly stop the growth of
(nr_reclaimable + nr_writeback). Once dirty_thresh is reached we have
already lost.
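
To make the proposal concrete, here is a minimal user-space sketch of the loop exit decision once the outer 'if' is removed and the three inner checks are "liberated". The struct and the function name are hypothetical and exist only for illustration; the field names mirror the local variables in the quoted patch, and this is not the kernel code itself.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters balance_dirty_pages() looks at. */
struct dirty_state {
	unsigned long nr_reclaimable, nr_writeback;         /* global */
	unsigned long bdi_nr_reclaimable, bdi_nr_writeback; /* this bdi */
	unsigned long background_thresh, dirty_thresh, bdi_thresh;
	unsigned long pages_written, write_chunk;
};

/* Returns true when the dirtier may leave the throttle loop. */
static bool may_stop_throttling(const struct dirty_state *s)
{
	/* This bdi is within its own (proportional) share. */
	if (s->bdi_nr_reclaimable + s->bdi_nr_writeback <= s->bdi_thresh)
		return true;

	/* Globally, background writeback is still keeping up. */
	if (s->nr_reclaimable + s->nr_writeback <
	    (s->background_thresh + s->dirty_thresh) / 2)
		return true;

	/* This task has already written its chunk. */
	if (s->pages_written >= s->write_chunk)
		return true;

	return false;
}
```

Note that nothing in this sketch compares against dirty_thresh directly, which is exactly the constraint Peter is worried about losing.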
> I'll go over the numeric stuff again to see where it could go wrong.
>
> If we can bound the error (I'm suspecting it was some numerical error
> bound) we should be good and can indeed do this.
Yes, that error bound should be smaller than (dirty_thresh -
background_thresh) / 2, unless the user sets the two thresholds
insanely close together (for that case we may add some sanity checks
in dirty_bytes_handler(), dirty_background_bytes_handler() etc.).
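
As a rough illustration of that bound, the sketch below computes the slack between the throttling midpoint and dirty_thresh and checks a given error against it. Both helper names and the max_error parameter are assumptions made for this example; they are not existing kernel functions, and the real sanity checks would live in the sysctl handlers.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: room left between the point where throttling
 * kicks in, (background_thresh + dirty_thresh) / 2, and dirty_thresh.
 */
static unsigned long dirty_slack(unsigned long background_thresh,
				 unsigned long dirty_thresh)
{
	if (dirty_thresh <= background_thresh)
		return 0;
	return (dirty_thresh - background_thresh) / 2;
}

/*
 * Hypothetical sanity check: the per-bdi accounting error (in pages)
 * must stay strictly below the slack, otherwise the global limit can
 * be overshot.
 */
static bool thresholds_sane(unsigned long background_thresh,
			    unsigned long dirty_thresh,
			    unsigned long max_error)
{
	return max_error < dirty_slack(background_thresh, dirty_thresh);
}
```

With background_thresh = 100 and dirty_thresh = 200 pages the slack is 50 pages, so any error bound below 50 pages passes the check.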
Anyway, we may do something like this for now?

	if (dirty_thresh exceeded) {
		WARN_ONCE
		block write more
	}

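
Spelled out a little more, the fallback could look like the user-space sketch below. warn_once() is only a stand-in for the kernel's WARN_ONCE() macro, global_limit_exceeded() is a hypothetical name, and treating "exceeded" as strictly greater than dirty_thresh is an assumption on my side.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts emitted warnings so the "once" behaviour can be observed. */
static int warn_count;

/* User-space stand-in for WARN_ONCE(): reports only the first time. */
static void warn_once(const char *msg)
{
	static bool warned;

	if (!warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

/*
 * Returns true when the global limit is exceeded, i.e. the caller
 * must keep the dirtier blocked regardless of the per-bdi state.
 */
static bool global_limit_exceeded(unsigned long nr_reclaimable,
				  unsigned long nr_writeback,
				  unsigned long dirty_thresh)
{
	if (nr_reclaimable + nr_writeback > dirty_thresh) {
		warn_once("dirty_thresh exceeded despite per-bdi throttling");
		return true;
	}
	return false;
}
```

The warning fires a single time, so a misbehaving workload would be visible in the log without flooding it, while the throttle itself stays in effect for every call.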
Thanks,
Fengguang
> > > +
> > > + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > > + break;
> > > +
> > > + /*
> > > + * Throttle it only when the background writeback cannot
> > > + * catch-up. This avoids (excessively) small writeouts
> > > + * when the bdi limits are ramping up.
> > > + */
> > > + if (nr_reclaimable + nr_writeback <
> > > + (background_thresh + dirty_thresh) / 2)
> > > + break;
> > > +
> > > + /* done enough? */
> > > + if (pages_written >= write_chunk)
> > > + break;
> > > + }
> > > + if (!bdi->dirty_exceeded)
> > > + bdi->dirty_exceeded = 1;
> > >
> > > + /* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
> > > + * Unstable writes are a feature of certain networked
> > > + * filesystems (i.e. NFS) in which data may have been
> > > + * written to the server's write cache, but has not yet
> > > + * been flushed to permanent storage.
> > > + * Only move pages to writeback if this bdi is over its
> > > + * threshold otherwise wait until the disk writes catch
> > > + * up.
> > > + */
> > > + if (bdi_nr_reclaimable > bdi_thresh) {
> > > + writeback_inodes(&wbc);
> > > + pages_written += write_chunk - wbc.nr_to_write;
> >
> > > + if (wbc.nr_to_write == 0)
> > > + continue;
> >
> > What's the purpose of the above 2 lines?
>
> I think I should slow down, I seem to have totally missed these two
> lines when I read the patch :/
>
> > > + }
> > > congestion_wait(BLK_RW_ASYNC, HZ/10);
> > > }
> > >
> > > if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > > + bdi->dirty_exceeded)
> > > bdi->dirty_exceeded = 0;
> > >
> > > if (writeback_in_progress(bdi))
> > > @@ -580,10 +552,8 @@ static void balance_dirty_pages(struct a
> > > * In normal mode, we start background writeout at the lower
> > > * background_thresh, to keep the amount of dirty memory low.
> > > */
> > > + if ((laptop_mode && pages_written) || (!laptop_mode &&
> > > + (nr_reclaimable > background_thresh)))
> > > bdi_start_writeback(bdi, NULL, 0, WB_SYNC_NONE);
> > > }
>