From: Wu Fengguang <fengguang.wu@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
    Theodore Ts'o <tytso@mit.edu>,
    Chris Mason <chris.mason@oracle.com>,
    Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
    Jens Axboe <axboe@kernel.dk>, Mel Gorman <mel@csn.ul.ie>,
    Rik van Riel <riel@redhat.com>,
    KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
    Christoph Hellwig <hch@lst.de>, linux-mm <linux-mm@kvack.org>,
    "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
    LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH v2] writeback: safety margin for bdi stat error
Date: Wed, 8 Dec 2010 12:37:46 +0800
Message-ID: <20101208043746.GA15357@localhost>
In-Reply-To: <20101208043004.GB15322@localhost>

In a simple dd test on an 8p system with "mem=256M", I find that all light
dirtier tasks on the root fs get heavily throttled. That happens
because the global limit is exceeded. It's unbelievable at first sight,
because the test fs doing the heavy dd is under its bdi limit. After
doing some tracing, it's discovered that

	bdi_dirty < bdi_dirty_limit() < global_dirty_limit() < nr_dirty

So the root cause is that bdi_dirty is well under the global nr_dirty
due to accounting errors. This can be fixed by using bdi_stat_sum(),
but that's costly on large NUMA machines. So do a less costly fix
of lowering the bdi limit, so that the accounting errors won't lead to
the absurd situation "global limit exceeded but bdi limit not exceeded".

This provides a guarantee when there is only 1 heavily dirtied bdi, and
works opportunistically for 2+ heavily dirtied bdi's (hopefully they won't
reach a big error _and_ exceed their bdi limit at the same time).

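A user-space sketch of how such an accounting error can arise, assuming
bdi_dirty is kept in a per-CPU batched counter whose cheap read skips the
deltas not yet folded into the shared total. The names approx_counter,
counter_add(), counter_read(), counter_sum() and worst_case_error(), and
the NR_CPUS and BATCH values, are illustrative assumptions and not the
kernel's implementation:

```c
#define NR_CPUS 8	/* assumed CPU count, matching the 8p test box */
#define BATCH   32	/* assumed per-CPU batch size, illustration only */

struct approx_counter {
	long count;		/* shared total, cheap to read */
	long local[NR_CPUS];	/* per-CPU deltas not yet folded in */
};

/* Add to one CPU's local delta; fold into the shared total at BATCH. */
static void counter_add(struct approx_counter *c, int cpu, long amount)
{
	long v = c->local[cpu] + amount;

	if (v >= BATCH || v <= -BATCH) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap read: ignores the per-CPU deltas (plays the role of bdi_stat()). */
static long counter_read(const struct approx_counter *c)
{
	return c->count;
}

/* Exact read: visits every CPU (plays the role of bdi_stat_sum()). */
static long counter_sum(const struct approx_counter *c)
{
	long sum = c->count;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum;
}

/* Every CPU dirties BATCH - 1 pages one at a time, so nothing is folded. */
static long worst_case_error(void)
{
	struct approx_counter bdi_dirty = { 0 };
	int cpu, i;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		for (i = 0; i < BATCH - 1; i++)
			counter_add(&bdi_dirty, cpu, 1);

	return counter_sum(&bdi_dirty) - counter_read(&bdi_dirty);
}
```

With these assumed values, worst_case_error() returns 248: the cheap read
sees 0 dirty pages while 248 are really outstanding, which stays under the
NR_CPUS * BATCH bound of 256. The exact sum removes the error but has to
visit every CPU, which is the cost the changelog wants to avoid on large
NUMA machines.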
Acked-by: Rik van Riel <riel@redhat.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
v2: add kernel doc and correct the terms in changelog.
mm/page-writeback.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
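
The margin step added by the patch below can also be read in isolation.
This is a user-space sketch under stated assumptions: stat_error() is a
hypothetical stand-in for bdi_stat_error() that assumes the error bound
scales as nr_cpus * batch, and margin_adjusted_limit() is an illustrative
name, not a kernel function:

```c
/* Hypothetical stand-in for bdi_stat_error(): assumed bound nr_cpus * batch. */
static unsigned long stat_error(unsigned int nr_cpus, unsigned int batch)
{
	return (unsigned long)nr_cpus * batch;
}

/*
 * Lower the global limit by the worst-case counter error before the
 * bdi's share is computed from it.  The explicit comparison matters
 * because the values are unsigned: subtracting a larger error would
 * wrap around to a huge limit instead of going negative.
 */
static unsigned long margin_adjusted_limit(unsigned long dirty,
					   unsigned long err)
{
	if (dirty > err)
		return dirty - err;
	return 0;	/* the limit is within the error margin */
}
```

For example, with an assumed 8 CPUs and batch of 32, a global limit of
10000 pages is lowered to 9744, while a limit of 200 pages is smaller than
the 256-page error and yields 0. In the patch itself the second case
returns 0 directly from bdi_dirty_limit().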
--- linux-next.orig/mm/page-writeback.c 2010-12-08 12:26:16.000000000 +0800
+++ linux-next/mm/page-writeback.c 2010-12-08 12:30:45.000000000 +0800
@@ -434,10 +434,16 @@ void global_dirty_limits(unsigned long *
 	*pdirty = dirty;
 }
 
-/*
+/**
  * bdi_dirty_limit - @bdi's share of dirty throttling threshold
+ * @bdi: the backing_dev_info to query
+ * @dirty: global dirty limit in pages
+ * @dirty_pages: current number of dirty pages
  *
- * Allocate high/low dirty limits to fast/slow devices, in order to prevent
+ * Returns @bdi's dirty limit in pages. The term "dirty" in the context of
+ * dirty balancing includes all PG_dirty, PG_writeback and NFS unstable pages.
+ *
+ * It allocates high/low dirty limits to fast/slow devices, in order to prevent
  * - starving fast devices
  * - piling up dirty pages (that will take long time to sync) on slow devices
  *
@@ -458,6 +464,14 @@ unsigned long bdi_dirty_limit(struct bac
 	long numerator, denominator;
 
 	/*
+	 * try to prevent "global limit exceeded but bdi limit not exceeded"
+	 */
+	if (likely(dirty > bdi_stat_error(bdi)))
+		dirty -= bdi_stat_error(bdi);
+	else
+		return 0;
+
+	/*
 	 * Provide a global safety margin of ~1%, or up to 32MB for a 20GB box.
 	 */
 	dirty -= min(dirty / 128, 32768ULL >> (PAGE_SHIFT-10));