From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wu Fengguang
Subject: [PATCH 02/47] writeback: safety margin for bdi stat error
Date: Mon, 13 Dec 2010 14:42:51 +0800
Message-ID: <20101213064837.177415939@intel.com>
References: <20101213064249.648862451@intel.com>
Return-path:
Received: from kanga.kvack.org ([205.233.56.17]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1PS2EN-0005Rp-BV for glkm-linux-mm-2@m.gmane.org; Mon, 13 Dec 2010 07:49:55 +0100
Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with SMTP id 08B9B6B0092 for ; Mon, 13 Dec 2010 01:49:39 -0500 (EST)
Content-Disposition: inline; filename=writeback-bdi-error.patch
Sender: owner-linux-mm@kvack.org
To: Andrew Morton
Cc: Jan Kara, Peter Zijlstra, Wu Fengguang, Christoph Hellwig, Trond Myklebust, Dave Chinner, Theodore Ts'o, Chris Mason, Mel Gorman, Rik van Riel, KOSAKI Motohiro, Greg Thelen, Minchan Kim, linux-mm, linux-fsdevel@vger.kernel.org, LKML
List-Id: linux-mm.kvack.org

In a simple dd test on an 8p system with "mem=256M", I find that all light
dirtier tasks on the root fs get heavily throttled. That happens because
the global limit is exceeded. It's unbelievable at first sight, because
the test fs doing the heavy dd is under its bdi limit. After doing some
tracing, it's discovered that

	bdi_dirty < bdi_dirty_limit() < global_dirty_limit() < nr_dirty

So the root cause is that bdi_dirty is well under the global nr_dirty due
to accounting errors. This can be fixed by using bdi_stat_sum(), however
that's costly on large NUMA machines. So do a less costly fix of lowering
the bdi limit, so that the accounting errors won't lead to the absurd
situation "global limit exceeded but bdi limit not exceeded".

This provides a guarantee when there is only 1 heavily dirtied bdi, and
works by opportunity for 2+ heavily dirtied bdi's (hopefully they won't
reach a big error _and_ exceed their bdi limit at the same time).
CC: Peter Zijlstra
Signed-off-by: Wu Fengguang
---
 mm/page-writeback.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2010-12-08 22:44:21.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-12-08 22:44:21.000000000 +0800
@@ -434,10 +434,16 @@ void global_dirty_limits(unsigned long *
 	*pdirty = dirty;
 }
 
-/*
+/**
  * bdi_dirty_limit - @bdi's share of dirty throttling threshold
+ * @bdi: the backing_dev_info to query
+ * @dirty: global dirty limit in pages
+ * @dirty_pages: current number of dirty pages
  *
- * Allocate high/low dirty limits to fast/slow devices, in order to prevent
+ * Returns @bdi's dirty limit in pages. The term "dirty" in the context of
+ * dirty balancing includes all PG_dirty, PG_writeback and NFS unstable pages.
+ *
+ * It allocates high/low dirty limits to fast/slow devices, in order to prevent
  * - starving fast devices
  * - piling up dirty pages (that will take long time to sync) on slow devices
  *
@@ -458,6 +464,14 @@ unsigned long bdi_dirty_limit(struct bac
 	long numerator, denominator;
 
 	/*
+	 * try to prevent "global limit exceeded but bdi limit not exceeded"
+	 */
+	if (likely(dirty > bdi_stat_error(bdi)))
+		dirty -= bdi_stat_error(bdi);
+	else
+		return 0;
+
+	/*
 	 * Provide a global safety margin of ~1%, or up to 32MB for a 20GB box.
 	 */
 	dirty -= min(dirty / 128, 32768ULL >> (PAGE_SHIFT-10));