From: Fengguang Wu
Subject: Re: [PATCH] writeback: fix writeback cache thrashing
Date: Sat, 5 Jan 2013 17:55:46 +0800
Message-ID: <20130105095546.GA13670@localhost>
References: <20121231113054.GC7564@quack.suse.cz> <20130102134334.GB30633@quack.suse.cz> <1357261151.5105.2.camel@kernel.cn.ibm.com> <1357346803.5273.10.camel@kernel.cn.ibm.com> <20130105032642.GA8188@localhost> <1357363603.5273.16.camel@kernel.cn.ibm.com> <20130105073846.GA11811@localhost> <1357378914.8716.3.camel@kernel.cn.ibm.com>
Cc: Namjae Jeon, Jan Kara, Wanpeng Li, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Namjae Jeon, Vivek Trivedi, Dave Chinner
To: Simon Jeons
In-Reply-To: <1357378914.8716.3.camel@kernel.cn.ibm.com>

On Sat, Jan 05, 2013 at 03:41:54AM -0600, Simon Jeons wrote:
> On Sat, 2013-01-05 at 15:38 +0800, Fengguang Wu wrote:
> > On Fri, Jan 04, 2013 at 11:26:43PM -0600, Simon Jeons wrote:
> > > On Sat, 2013-01-05 at 11:26 +0800, Fengguang Wu wrote:
> > > > > > > Hi Namjae,
> > > > > > >
> > > > > > > Why use bdi_stat_error here? What's the meaning of its comment "maximal
> > > > > > > error of a stat counter"?
> > > > > > Hi Simon,
> > > > > >
> > > > > > As you know, bdi stats (BDI_RECLAIMABLE, BDI_WRITEBACK …) are kept in
> > > > > > percpu counters.
> > > > > > When these percpu counters are incremented/decremented simultaneously
> > > > > > on multiple CPUs by small amounts (each CPU's local count staying below
> > > > > > the threshold BDI_STAT_BATCH),
> > > > > > it is possible that we get an approximate value (not the exact value) of
> > > > > > these percpu counters.
> > > > > > To account for this percpu counter error, we use
> > > > > > bdi_stat_error. bdi_stat_error is the maximum error that can occur
> > > > > > in percpu bdi stats accounting.
> > > > > >
> > > > > > bdi_stat(bdi, BDI_RECLAIMABLE);
> > > > > > -> This gives an approximate value of BDI_RECLAIMABLE by reading only
> > > > > > the shared count, without the per-CPU deltas not yet folded into it.
> > > > > >
> > > > > > bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > > > > -> This gives the exact value of BDI_RECLAIMABLE. It takes a lock
> > > > > > and adds up the current percpu counts of the individual CPUs.
> > > > > > It is not recommended to use it frequently, as it is expensive. It
> > > > > > is better to use “bdi_stat” and work with approximate bdi stats.
> > > > > >
> > > > >
> > > > > Hi Namjae, thanks for the clarification.
> > > > >
> > > > > But why compare the error stat count to bdi_bground_thresh? What's the
> > > >
> > > > It's not comparing bdi_stat_error to bdi_bground_thresh, but rather,
> > > > in concept, comparing bdi_stat (with error bound adjustments) to
> > > > bdi_bground_thresh.
> > > >
> > > > > relationship between them? I also see bdi_stat_error compared to
> > > > > bdi_thresh/bdi_dirty in the function balance_dirty_pages.
> > > >
> > >
> > > Hi Fengguang,
> > >
> > > > Here, it's trying to use bdi_stat_sum(), the accurate (but more
> > > > costly) version of bdi_stat(), if the error could be large:
> > >
> > > Why use bdi_stat_sum when the error is large, and bdi_stat when the error is small?
> >
>
> Thanks for your response Fengguang! :)

You are welcome.

> > It's the opposite. Please check this per-cpu counter routine to get an idea:
> >
> > /*
> >  * Add up all the per-cpu counts, return the result. This is a more accurate
> >  * but much slower version of percpu_counter_read_positive()
> >  */
> > s64 __percpu_counter_sum(struct percpu_counter *fbc)
> >
> > > >
> > > >         if (bdi_thresh < 2 * bdi_stat_error(bdi)) {
> > > >                 bdi_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> > > >                 //...
> > > >         } else {
> > > >                 bdi_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> > > >                 //...
> > > >         }
> > > >
>
> The comment above this code:
>
>          * In order to avoid the stacked BDI deadlock we need
>          * to ensure we accurately count the 'dirty' pages when
>          * the threshold is low.
>
> Why do you take a low threshold to mean a large error?

Because bdi_reclaimable is normally less than, or at least comparable
to, bdi_thresh. So (bdi_thresh < 2 * bdi_stat_error(bdi)) means the
resulting bdi_reclaimable will be small, so small that the bdi_stat()
error is no longer negligible and should be avoided.

Thanks,
Fengguang

>
> > > > Here the comment should have explained it well:
> > > >
> > > >          * In theory 1 page is enough to keep the consumer-producer
> > > >          * pipe going: the flusher cleans 1 page => the task dirties 1
> > > >          * more page. However bdi_dirty has accounting errors. So use
> > >
> > > Why does bdi_dirty have accounting errors?
> >
> > Because it typically uses bdi_stat() to get the rough sum of the per-cpu
> > counters.
> >
> > Thanks,
> > Fengguang
> >
> > > >          * the larger and more IO friendly bdi_stat_error.
> > > >          */
> > > >         if (bdi_dirty <= bdi_stat_error(bdi))
> > > >                 break;
> > > >
> > > >
> > > > Thanks,
> > > > Fengguang
> > >
>
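To make the error bound discussed in this thread concrete, here is a minimal user-space sketch of a batched per-CPU counter. It is a hypothetical model, not kernel code: the struct, the sim_* helpers, NR_CPUS_SIM and BATCH are invented for illustration, with BATCH standing in for BDI_STAT_BATCH. Each CPU accumulates a local delta and folds it into the shared count only once the delta reaches BATCH, so a cheap read of the shared count (the role bdi_stat() plays) can lag the exact sum (the role of bdi_stat_sum()) by less than one batch per CPU. The sketch assumes the error bound is NR_CPUS_SIM * BATCH and models the two decisions quoted above: pay for the exact sum when the threshold is below twice that bound, and treat a dirty count within the bound as small enough to stop.

```c
#include <assert.h>
#include <stdlib.h> /* llabs() */

/* Hypothetical model: NR_CPUS_SIM and BATCH are illustrative values,
 * not the kernel's nr_cpu_ids or BDI_STAT_BATCH. */
#define NR_CPUS_SIM 4
#define BATCH 8

struct pcpu_counter_sim {
	long long count;              /* shared total, updated only in batches */
	long long local[NR_CPUS_SIM]; /* per-CPU deltas not yet folded in */
};

/* Add delta on one CPU; fold the local delta into the shared count once
 * its magnitude reaches BATCH, so each pending delta stays below BATCH. */
static void sim_add(struct pcpu_counter_sim *c, int cpu, long long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SIM);
	c->local[cpu] += delta;
	if (llabs(c->local[cpu]) >= BATCH) {
		c->count += c->local[cpu];
		c->local[cpu] = 0;
	}
}

/* Cheap, approximate read: the shared count only (plays bdi_stat()'s role). */
static long long sim_read(const struct pcpu_counter_sim *c)
{
	return c->count;
}

/* Exact, costlier read: shared count plus every pending per-CPU delta
 * (plays bdi_stat_sum()'s role; this model takes no lock). */
static long long sim_sum(const struct pcpu_counter_sim *c)
{
	long long sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS_SIM; cpu++)
		sum += c->local[cpu];
	return sum;
}

/* Assumed upper bound on |sim_sum() - sim_read()|: less than one batch
 * can be pending on each CPU (plays bdi_stat_error()'s role). */
static long long sim_stat_error(void)
{
	return (long long)NR_CPUS_SIM * BATCH;
}

/* Threshold test from the thread: when the threshold is below twice the
 * error bound, the cheap read could be mostly error, so sum exactly. */
static long long sim_reclaimable(const struct pcpu_counter_sim *c,
				 long long thresh)
{
	if (thresh < 2 * sim_stat_error())
		return sim_sum(c);
	return sim_read(c);
}

/* Exit test from the thread: a dirty count within the error bound is
 * treated as small enough, instead of waiting for an exact 0. */
static int sim_dirty_within_error(long long dirty)
{
	return dirty <= sim_stat_error();
}
```

For example, seven increments on each of the four simulated CPUs leave the shared count at 0 while the exact sum is 28. With a threshold of 10 (below 2 * 32) the exact sum is used; with a threshold of 1000 the cheap read of 0 is accepted, because an error of at most 32 is small relative to that threshold.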