From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fengguang Wu
Subject: Re: [PATCH 3/3] writeback: add dirty_ratio_time per bdi variable (NFS write performance)
Date: Tue, 21 Aug 2012 21:04:58 +0800
Message-ID: <20120821130458.GD22321@localhost>
References: <1345283402-7889-1-git-send-email-linkinjeon@gmail.com> <20120819025724.GC16796@localhost> <20120820145032.GA7469@localhost>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Namjae Jeon, linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, Dave Chinner
To: Namjae Jeon
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Tue, Aug 21, 2012 at 03:00:13PM +0900, Namjae Jeon wrote:
> 2012/8/20, Fengguang Wu:
> > On Mon, Aug 20, 2012 at 09:48:42AM +0900, Namjae Jeon wrote:
> >> 2012/8/19, Fengguang Wu:
> >> > On Sat, Aug 18, 2012 at 05:50:02AM -0400, Namjae Jeon wrote:
> >> >> From: Namjae Jeon
> >> >>
> >> >> This patch is based on suggestion by Wu Fengguang:
> >> >> https://lkml.org/lkml/2011/8/19/19
> >> >>
> >> >> kernel has mechanism to do writeback as per dirty_ratio and
> >> >> dirty_background ratio. It also maintains per task dirty rate
> >> >> limit to keep balance of dirty pages at any given instance by
> >> >> doing bdi bandwidth estimation.
> >> >>
> >> >> Kernel also has max_ratio/min_ratio tunables to specify percentage
> >> >> of writecache to control per bdi dirty limits and task throttling.
> >> >>
> >> >> However, there might be a usecase where user wants a writeback tuning
> >> >> parameter to flush dirty data at desired/tuned time interval.
> >> >>
> >> >> dirty_background_time provides an interface where user can tune
> >> >> background writeback start time using
> >> >> /sys/block/sda/bdi/dirty_background_time
> >> >>
> >> >> dirty_background_time is used along with average bdi write bandwidth
> >> >> estimation to start background writeback.
> >> >
> >> > Here lies my major concern about dirty_background_time: the write
> >> > bandwidth estimation is an _estimation_ and will sure become wildly
> >> > wrong in some cases. So the dirty_background_time implementation based
> >> > on it will not always work to the user expectations.
> >> >
> >> > One important case is, some users (eg. Dave Chinner) explicitly take
> >> > advantage of the existing behavior to quickly create & delete a big
> >> > 1GB temp file without worrying about triggering unnecessary IOs.
> >> >
> >> Hi. Wu.
> >> Okay, I have a question.
> >>
> >> If making dirty_writeback_interval per bdi to tune short interval
> >> instead of background_time, We can get similar performance
> >> improvement.
> >> /sys/block//bdi/dirty_writeback_interval
> >> /sys/block//bdi/dirty_expire_interval
> >>
> >> NFS write performance improvement is just one usecase.
> >>
> >> If we can set interval/time per bdi, other usecases will be created
> >> by applying.
> >
> > Per-bdi interval/time tunables, if there comes such a need, will in
> > essence be for data caching and safety. If turning them into some
> > requirement for better performance, the users will potentially be
> > stretched on choosing the "right" value for balanced data cache,
> > safety and performance. Hmm, not a comfortable prospect.
> Hi Wu.
> First, Thanks for shared information.
>
> I change writeback interval on NFS server only.

OK..sorry for missing that part!

> I think that this does not affect data cache/page behaviour(caching)
> change on NFS client. NFS client will start sending write requests as
> per default NFS/writeback logic.
> So, no change in NFS client data
> caching behaviour.
>
> Also, on NFS server it does not make change in system-wide caching
> behaviour. It only modifies caching/writeback behaviour of a
> particular “bdi” on NFS server so that NFS client could see better
> WRITE speed.

But would you default to dirty_background_time=0, where the special
value 0 means no change of the original behavior? That will address
David's very reasonable concern. Otherwise quite a few users are going
to be surprised by the new behavior after upgrading kernel.

> I will share several performance test results as Dave's opinion.
>
> >
> >> >The numbers are impressive! FYI, I tried another NFS specific approach
> >> >to avoid big NFS COMMITs, which achieved similar performance gains:
> >>
> >> >nfs: writeback pages wait queue
> >> >https://lkml.org/lkml/2011/10/20/235
> This patch looks like a client side optimization to me. (need to check more)

Yes.

> Do we need the optimization of server side as Bruce's opinion ?

Sure.

Thanks,
Fengguang
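
For illustration only, the dirty_background_time=0 default discussed above
could look roughly like the sketch below. It is not taken from the patch:
the helper name bdi_background_thresh(), the millisecond unit, and the
choice to take the smaller of the two thresholds are all assumptions made
for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): number of dirty pages at
 * which background writeback would start for one bdi.
 *
 *   ratio_thresh_pages  threshold derived from dirty_background_ratio
 *   avg_write_bw_pps    estimated bdi write bandwidth, in pages/second
 *   background_time_ms  per-bdi dirty_background_time; the unit
 *                       (milliseconds) is an assumption of this sketch
 */
static uint64_t bdi_background_thresh(uint64_t ratio_thresh_pages,
                                      uint64_t avg_write_bw_pps,
                                      uint32_t background_time_ms)
{
    uint64_t time_thresh;

    /* Special value 0: keep the original, ratio-based behavior. */
    if (background_time_ms == 0)
        return ratio_thresh_pages;

    /* Pages the device is estimated to write in the configured time. */
    time_thresh = avg_write_bw_pps * background_time_ms / 1000;

    /* Assumed policy: the time-based value may only lower the threshold. */
    return time_thresh < ratio_thresh_pages ? time_thresh
                                            : ratio_thresh_pages;
}
```

With the tunable left at 0 the function returns the ratio-based threshold
untouched, so a big short-lived temp file would be handled as before. The
time-based value depends entirely on the bandwidth estimate, so it inherits
whatever error that estimate has.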