From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Zijlstra Subject: Re: [PATCH 10/18] writeback: dirty position control - bdi reserve area Date: Tue, 06 Sep 2011 16:09:39 +0200 Message-ID: <1315318179.14232.3.camel@twins> References: <20110904015305.367445271@intel.com> <20110904020915.942753370@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: linux-fsdevel@vger.kernel.org, Andrew Morton , Jan Kara , Christoph Hellwig , Dave Chinner , Greg Thelen , Minchan Kim , Vivek Goyal , Andrea Righi , linux-mm , LKML To: Wu Fengguang Return-path: In-Reply-To: <20110904020915.942753370@intel.com> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org On Sun, 2011-09-04 at 09:53 +0800, Wu Fengguang wrote: > plain text document attachment (bdi-reserve-area) > Keep a minimal pool of dirty pages for each bdi, so that the disk IO > queues won't underrun. >=20 > It's particularly useful for JBOD and small memory system. >=20 > Note that this is not enough when memory is really tight (in comparison > to write bandwidth). It may result in (pos_ratio > 1) at the setpoint > and push the dirty pages high. This is more or less intended because the > bdi is in the danger of IO queue underflow. However the global dirty > pages, when pushed close to limit, will eventually conteract our desire > to push up the low bdi_dirty. >=20 > In low memory JBOD tests we do see disks under-utilized from time to > time. The additional fix may be to add a BDI_async_underrun flag to > indicate that the block write queue is running low and it's time to > quickly fill the queue by unthrottling the tasks regardless of the > global limit. >=20 > Signed-off-by: Wu Fengguang > --- > mm/page-writeback.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) >=20 > --- linux-next.orig/mm/page-writeback.c 2011-08-26 20:12:19.000000000 +08= 00 > +++ linux-next/mm/page-writeback.c 2011-08-26 20:13:21.000000000 +0800 > @@ -487,6 +487,16 @@ unsigned long bdi_dirty_limit(struct bac > * 0 +------------.------------------.----------------------*---------= ----> > * freerun^ setpoint^ limit^ dirty = pages > * > + * (o) bdi reserve area > + * > + * The bdi reserve area tries to keep a reasonable number of dirty pages= for > + * preventing block queue underrun. > + * > + * reserve area, scale up rate as dirty pages drop low > + * |<----------------------------------------------->| > + * |-------------------------------------------------------*-------|----= ------ > + * 0 bdi setpoint^ ^bdi_= thresh So why not call the thing bdi freerun ? > * (o) bdi control lines > * > * The control lines for the global/bdi setpoints both stretch up to @li= mit. > @@ -634,6 +644,22 @@ static unsigned long bdi_position_ratio( > pos_ratio *=3D x_intercept - bdi_dirty; > do_div(pos_ratio, x_intercept - bdi_setpoint + 1); > =20 > + /* > + * bdi reserve area, safeguard against dirty pool underrun and disk idl= e > + * > + * It may push the desired control point of global dirty pages higher > + * than setpoint. It's not necessary in single-bdi case because a > + * minimal pool of @freerun dirty pages will already be guaranteed. > + */ > + x_intercept =3D min(write_bw, freerun); > + if (bdi_dirty < x_intercept) { So the point of the freerun point is that we never throttle before it, so basically all the below shouldn't be needed at all, right?=20 > + if (bdi_dirty > x_intercept / 8) { > + pos_ratio *=3D x_intercept; > + do_div(pos_ratio, bdi_dirty); > + } else > + pos_ratio *=3D 8; > + } > + > return pos_ratio; > } So why not add: if (likely(dirty < freerun)) return 2; at the start of this function and leave it at that? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org