Date: Wed, 24 Nov 2010 21:14:38 +0800
From: Wu Fengguang
To: Peter Zijlstra
Cc: Andrew Morton, Jan Kara, "Li, Shaohua", Christoph Hellwig,
 Dave Chinner, "Theodore Ts'o", Chris Mason, Mel Gorman, Rik van Riel,
 KOSAKI Motohiro, linux-mm, "linux-fsdevel@vger.kernel.org", LKML
Subject: Re: [PATCH 06/13] writeback: bdi write bandwidth estimation
Message-ID: <20101124131437.GE10413@localhost>
References: <20101117042720.033773013@intel.com>
 <20101117042850.002299964@intel.com>
 <1290596732.2072.450.camel@laptop>
 <20101124121046.GA8333@localhost>
 <1290603047.2072.465.camel@laptop>
In-Reply-To: <1290603047.2072.465.camel@laptop>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 24, 2010 at 08:50:47PM +0800, Peter Zijlstra wrote:
> On Wed, 2010-11-24 at 20:10 +0800, Wu Fengguang wrote:
> > > > + /*
> > > > + * When there lots of tasks throttled in balance_dirty_pages(), they
> > > > + * will each try to update the bandwidth for the same period, making
> > > > + * the bandwidth drift much faster than the desired rate (as in the
> > > > + * single dirtier case). So do some rate limiting.
> > > > + */
> > > > + if (jiffies - bdi->write_bandwidth_update_time < elapsed)
> > > > + goto snapshot;
> > >
> > > Why this goto snapshot and not simply return? This is the second call
> > > (bdi_update_bandwidth equivalent).
> >
> > Good question. The loop inside balance_dirty_pages() normally run only
> > once, however wb_writeback() may loop over and over again. If we just
> > return here, the condition
> >
> > (jiffies - bdi->write_bandwidth_update_time < elapsed)
> >
> > cannot be reset, then future bdi_update_bandwidth() calls in the same
> > wb_writeback() loop will never find it OK to update the bandwidth.
>
> But the thing is, you don't want to reset that, it might loop so fast
> you'll throttle all of them, if you keep the pre-throttle value you'll
> eventually pass, no?

A task (let's call it A) only resets its _local_ bw_* vars when the
condition

	(jiffies - bdi->write_bandwidth_update_time < elapsed)

tells it that someone else (call it B) has already updated the _global_
bandwidth within the time range A planned to measure. So part of A's
range may not be covered by B's update, but the range is never skipped
entirely without a bandwidth update.

> > It does assume no races between CPUs.. We may need some per-cpu based
> > estimation.
>
> But that multi-writer race is valid even for the balance_dirty_pages()
> call, two or more could interleave on the bw_time and bw_written
> variables.

The race only exists in each task's local vars (their bw_* ranges will
overlap). The update of bdi->write_bandwidth* itself is safeguarded by
the above check: when a task is scheduled back, it may find an updated
write_bandwidth_update_time and hence give up its own estimation. This
is rather tricky..

Thanks,
Fengguang
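For readers following the rate-limiting argument above, here is a minimal user-space sketch of the "skip the global update but still re-snapshot" logic. It is a hypothetical model, not the patch: struct bdi_sim, struct task_sim, SIM_HZ, update_bandwidth_sim() and the 7/8 + 1/8 smoothing are all made up for illustration, and "now" stands in for jiffies. Only the comparison against write_bandwidth_update_time and the goto snapshot path mirror the quoted code.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the rate limiting discussed above.
 * None of these names come from the patch; "now" stands in for jiffies.
 */
#define SIM_HZ 100UL	/* assumed ticks per second */

struct bdi_sim {
	unsigned long write_bandwidth;             /* estimate, pages per second */
	unsigned long write_bandwidth_update_time; /* tick of last global update */
	unsigned long written;                     /* pages written so far */
};

/* Per-task snapshot, playing the role of the local bw_time/bw_written. */
struct task_sim {
	unsigned long bw_time;
	unsigned long bw_written;
};

/* Returns 1 if this call updated the shared estimate, 0 otherwise. */
static int update_bandwidth_sim(struct bdi_sim *bdi, struct task_sim *t,
				unsigned long now)
{
	unsigned long elapsed, bw;
	int updated = 0;

	assert(bdi && t);
	elapsed = now - t->bw_time;
	if (elapsed == 0)
		return 0;

	/*
	 * The shared estimate was updated after our snapshot was taken, so
	 * another task already covered (part of) our window: skip the global
	 * update, but still re-snapshot so our next window starts fresh.
	 */
	if (now - bdi->write_bandwidth_update_time < elapsed)
		goto snapshot;

	bw = (bdi->written - t->bw_written) * SIM_HZ / elapsed;
	/* Assumed smoothing for the sketch: new = 7/8 old + 1/8 sample. */
	bdi->write_bandwidth = (bdi->write_bandwidth * 7 + bw) / 8;
	bdi->write_bandwidth_update_time = now;
	updated = 1;
snapshot:
	t->bw_time = now;
	t->bw_written = bdi->written;
	return updated;
}
```

For example, if tasks A and B both snapshot at tick 0 and B updates the estimate at tick 10, A's call at tick 10 sees the update inside its window and only re-snapshots; its next call at tick 20 then measures the fresh 10-tick range and updates the estimate. Had A simply returned instead, its bw_time would stay at 0, and every later call by A would find B's update inside its ever-growing window and skip the update.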