From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wu Fengguang
Subject: [PATCH 31/47] writeback: increase min pause time on concurrent dirtiers
Date: Mon, 13 Dec 2010 14:43:20 +0800
Message-ID: <20101213064840.799225309@intel.com>
References: <20101213064249.648862451@intel.com>
Cc: Jan Kara, Dave Chinner, Wu Fengguang
To: Andrew Morton
Return-path:
Received: from mga03.intel.com ([143.182.124.21]:35379 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755214Ab0LMGtn (ORCPT ); Mon, 13 Dec 2010 01:49:43 -0500
CC: Christoph Hellwig
CC: Trond Myklebust
CC: Theodore Ts'o
CC: Chris Mason
CC: Peter Zijlstra
CC: Mel Gorman
CC: Rik van Riel
CC: KOSAKI Motohiro
CC: Greg Thelen
CC: Minchan Kim
Cc: linux-mm
Cc:
Cc: LKML
Content-Disposition: inline; filename=writeback-min-pause-time-for-concurrent-dirtiers.patch
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Target for >60ms pause time when there are 100+ heavy dirtiers per bdi.
(will average around 100ms given 200ms max pause time)

It's OK for 1 dd task doing 100MB/s to be throttle paused 100 times per
second. However, when there are 100 tasks writing to the same disk, that
sums up to 100*100 balance_dirty_pages() calls per second and may lead to
massive cacheline bouncing on accessing the global page states in NUMA
machines. Even in single socket boxes, we easily see >10% CPU time
reduction by increasing the pause time.

CC: Dave Chinner
Signed-off-by: Wu Fengguang
---
 mm/page-writeback.c |   23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

--- linux-next.orig/mm/page-writeback.c	2010-12-09 12:24:45.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-12-09 12:24:47.000000000 +0800
@@ -666,6 +666,27 @@ static unsigned long max_pause(unsigned
 }
 
 /*
+ * Scale up pause time for concurrent dirtiers in order to reduce CPU overheads.
+ * But ensure reasonably large [min_pause, max_pause] range size, so that
+ * nr_dirtied_pause (and hence future pause time) can stay reasonably stable.
+ */
+static unsigned long min_pause(struct backing_dev_info *bdi,
+			       unsigned long max)
+{
+	unsigned long hi = ilog2(bdi->write_bandwidth);
+	unsigned long lo = ilog2(bdi->throttle_bandwidth);
+	unsigned long t;
+
+	if (lo >= hi)
+		return 1;
+
+	/* (N * 10ms) on 2^N concurrent tasks */
+	t = (hi - lo) * (10 * HZ) / 1024;
+
+	return clamp_val(t, 1, max / 2);
+}
+
+/*
  * balance_dirty_pages() must be called by processes which are generating dirty
  * data. It looks at the number of dirty pages in the machine and will force
  * the caller to perform writeback if the system is over `vm_dirty_ratio'.
@@ -833,7 +854,7 @@ pause:
 
 	if (pause == 0 && nr_dirty < background_thresh)
 		current->nr_dirtied_pause = ratelimit_pages(bdi);
-	else if (pause == 1)
+	else if (pause <= min_pause(bdi, pause_max))
 		current->nr_dirtied_pause += current->nr_dirtied_pause / 32 + 1;
 	else if (pause >= pause_max)
 		/*