From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Hellwig <hch@infradead.org>,
Dave Chinner <david@fromorbit.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: [PATCH] mm: Properly reflect task dirty limits in dirty_exceeded logic
Date: Mon, 4 Jul 2011 09:06:19 +0800 [thread overview]
Message-ID: <20110704010618.GA3841@localhost> (raw)
In-Reply-To: <1309458764-9153-1-git-send-email-jack@suse.cz>
Hi Jan,
On Fri, Jul 01, 2011 at 02:32:44AM +0800, Jan Kara wrote:
> We set bdi->dirty_exceeded (and thus ratelimiting code starts to
> call balance_dirty_pages() every 8 pages) when a per-bdi limit is
> exceeded or global limit is exceeded. But per-bdi limit also depends
> on the task. Thus different tasks reach the limit on that bdi at
> different levels of dirty pages. The result is that with current code
> bdi->dirty_exceeded ping-ponged between 1 and 0 depending on which task
> just got into balance_dirty_pages().
>
> We fix the issue by clearing bdi->dirty_exceeded only when per-bdi amount
> of dirty pages drops below the threshold (7/8 * bdi_dirty_limit) where task
> limits already do not have any influence.
The end result, I think, is that the dirty pages are kept more tightly
under control, with the average number a slightly lowered than before.
This reduces the risk to throttle light dirtiers and hence more
responsive. However it does introduce more overheads by enforcing
balance_dirty_pages() calls on every 8 pages.
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Christoph Hellwig <hch@infradead.org>
> CC: Dave Chinner <david@fromorbit.com>
> CC: Wu Fengguang <fengguang.wu@intel.com>
> CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> mm/page-writeback.c | 29 ++++++++++++++++++++++-------
> 1 files changed, 22 insertions(+), 7 deletions(-)
>
> This is the patch fixing dirty_exceeded logic I promised you last week.
> I based it on current Linus's tree as it is a relatively direct fix so I
> expect it can be somewhere in the beginning of the patch series and merged
> relatively quickly. Can you please add it to your tree? Thanks.
OK. I noticed that bdi_thresh is no longer used. What if we just
rename it? But I admit that the patch in its current form looks a bit
more clear in concept.
Thanks,
Fengguang
> Honza
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 31f6988..d8b395f 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -274,12 +274,13 @@ static inline void task_dirties_fraction(struct task_struct *tsk,
> * effectively curb the growth of dirty pages. Light dirtiers with high enough
> * dirty threshold may never get throttled.
> */
> +#define TASK_LIMIT_FRACTION 8
> static unsigned long task_dirty_limit(struct task_struct *tsk,
> unsigned long bdi_dirty)
> {
> long numerator, denominator;
> unsigned long dirty = bdi_dirty;
> - u64 inv = dirty >> 3;
> + u64 inv = dirty / TASK_LIMIT_FRACTION;
>
> task_dirties_fraction(tsk, &numerator, &denominator);
> inv *= numerator;
> @@ -290,6 +291,12 @@ static unsigned long task_dirty_limit(struct task_struct *tsk,
> return max(dirty, bdi_dirty/2);
> }
>
> +/* Minimum limit for any task */
> +static unsigned long task_min_dirty_limit(unsigned long bdi_dirty)
> +{
> + return bdi_dirty - bdi_dirty / TASK_LIMIT_FRACTION;
> +}
> +
> /*
> *
> */
> @@ -483,9 +490,12 @@ static void balance_dirty_pages(struct address_space *mapping,
> unsigned long background_thresh;
> unsigned long dirty_thresh;
> unsigned long bdi_thresh;
> + unsigned long task_bdi_thresh;
> + unsigned long min_task_bdi_thresh;
> unsigned long pages_written = 0;
> unsigned long pause = 1;
> bool dirty_exceeded = false;
> + bool clear_dirty_exceeded = true;
> struct backing_dev_info *bdi = mapping->backing_dev_info;
>
> for (;;) {
> @@ -512,7 +522,8 @@ static void balance_dirty_pages(struct address_space *mapping,
> break;
>
> bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
> - bdi_thresh = task_dirty_limit(current, bdi_thresh);
> + min_task_bdi_thresh = task_min_dirty_limit(bdi_thresh);
> + task_bdi_thresh = task_dirty_limit(current, bdi_thresh);
>
> /*
> * In order to avoid the stacked BDI deadlock we need
> @@ -524,7 +535,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> * actually dirty; with m+n sitting in the percpu
> * deltas.
> */
> - if (bdi_thresh < 2*bdi_stat_error(bdi)) {
> + if (task_bdi_thresh < 2 * bdi_stat_error(bdi)) {
> bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK);
> } else {
> @@ -539,8 +550,11 @@ static void balance_dirty_pages(struct address_space *mapping,
> * the last resort safeguard.
> */
> dirty_exceeded =
> - (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh)
> - || (nr_reclaimable + nr_writeback > dirty_thresh);
> + (bdi_nr_reclaimable + bdi_nr_writeback > task_bdi_thresh)
> + || (nr_reclaimable + nr_writeback > dirty_thresh);
> + clear_dirty_exceeded =
> + (bdi_nr_reclaimable + bdi_nr_writeback <= min_task_bdi_thresh)
> + && (nr_reclaimable + nr_writeback <= dirty_thresh);
>
> if (!dirty_exceeded)
> break;
> @@ -558,7 +572,7 @@ static void balance_dirty_pages(struct address_space *mapping,
> * up.
> */
> trace_wbc_balance_dirty_start(&wbc, bdi);
> - if (bdi_nr_reclaimable > bdi_thresh) {
> + if (bdi_nr_reclaimable > task_bdi_thresh) {
> writeback_inodes_wb(&bdi->wb, &wbc);
> pages_written += write_chunk - wbc.nr_to_write;
> trace_wbc_balance_dirty_written(&wbc, bdi);
> @@ -578,7 +592,8 @@ static void balance_dirty_pages(struct address_space *mapping,
> pause = HZ / 10;
> }
>
> - if (!dirty_exceeded && bdi->dirty_exceeded)
> + /* Clear dirty_exceeded flag only when no task can exceed the limit */
> + if (clear_dirty_exceeded && bdi->dirty_exceeded)
> bdi->dirty_exceeded = 0;
>
> if (writeback_in_progress(bdi))
> --
> 1.7.1
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-07-04 1:06 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-30 18:32 [PATCH] mm: Properly reflect task dirty limits in dirty_exceeded logic Jan Kara
2011-07-04 1:06 ` Wu Fengguang [this message]
2011-07-11 17:06 ` Jan Kara
2011-07-13 23:02 ` Wu Fengguang
2011-07-14 21:34 ` Jan Kara
2011-07-23 7:43 ` Wu Fengguang
2011-07-25 16:04 ` Jan Kara
2011-07-26 4:13 ` Wu Fengguang
2011-07-26 13:57 ` Jan Kara
2011-07-27 14:04 ` Wu Fengguang
2011-07-27 15:10 ` Christoph Hellwig
2011-07-28 15:31 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110704010618.GA3841@localhost \
--to=fengguang.wu@intel.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-mm@kvack.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.