From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mga11.intel.com ([192.55.52.93]:48898 "EHLO mga11.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1757314Ab2HUNFH (ORCPT ); Tue, 21 Aug 2012 09:05:07 -0400
Date: Tue, 21 Aug 2012 21:04:58 +0800
From: Fengguang Wu
To: Namjae Jeon
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Namjae Jeon,
	linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org, Dave Chinner
Subject: Re: [PATCH 3/3] writeback: add dirty_ratio_time per bdi variable (NFS write performance)
Message-ID: <20120821130458.GD22321@localhost>
References: <1345283402-7889-1-git-send-email-linkinjeon@gmail.com>
	<20120819025724.GC16796@localhost>
	<20120820145032.GA7469@localhost>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To:
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Tue, Aug 21, 2012 at 03:00:13PM +0900, Namjae Jeon wrote:
> 2012/8/20, Fengguang Wu:
> > On Mon, Aug 20, 2012 at 09:48:42AM +0900, Namjae Jeon wrote:
> >> 2012/8/19, Fengguang Wu:
> >> > On Sat, Aug 18, 2012 at 05:50:02AM -0400, Namjae Jeon wrote:
> >> >> From: Namjae Jeon
> >> >>
> >> >> This patch is based on suggestion by Wu Fengguang:
> >> >> https://lkml.org/lkml/2011/8/19/19
> >> >>
> >> >> kernel has mechanism to do writeback as per dirty_ratio and
> >> >> dirty_background ratio. It also maintains per task dirty rate limit
> >> >> to keep balance of dirty pages at any given instance by doing bdi
> >> >> bandwidth estimation.
> >> >>
> >> >> Kernel also has max_ratio/min_ratio tunables to specify percentage
> >> >> of writecache to control per bdi dirty limits and task throtelling.
> >> >>
> >> >> However, there might be a usecase where user wants a writeback tuning
> >> >> parameter to flush dirty data at desired/tuned time interval.
> >> >>
> >> >> dirty_background_time provides an interface where user can tune
> >> >> background writeback start time using
> >> >> /sys/block/sda/bdi/dirty_background_time
> >> >>
> >> >> dirty_background_time is used alongwith average bdi write bandwidth
> >> >> estimation to start background writeback.
> >> >
> >> > Here lies my major concern about dirty_background_time: the write
> >> > bandwidth estimation is an _estimation_ and will sure become wildly
> >> > wrong in some cases. So the dirty_background_time implementation based
> >> > on it will not always work to the user expectations.
> >> >
> >> > One important case is, some users (eg. Dave Chinner) explicitly take
> >> > advantage of the existing behavior to quickly create & delete a big
> >> > 1GB temp file without worrying about triggering unnecessary IOs.
> >> >
> >> Hi. Wu.
> >> Okay, I have a question.
> >>
> >> If making dirty_writeback_interval per bdi to tune short interval
> >> instead of background_time, We can get similar performance
> >> improvement.
> >> /sys/block//bdi/dirty_writeback_interval
> >> /sys/block//bdi/dirty_expire_interval
> >>
> >> NFS write performance improvement is just one usecase.
> >>
> >> If we can set interval/time per bdi, other usecases will be created
> >> by applying.
> >
> > Per-bdi interval/time tunables, if there comes such a need, will in
> > essential be for data caching and safety. If turning them into some
> > requirement for better performance, the users will potential be
> > stretched on choosing the "right" value for balanced data cache,
> > safety and performance. Hmm, not a comfortable prospection.
> Hi Wu.
> First, Thanks for shared information.
>
> I change writeback interval on NFS server only.

OK..sorry for missing that part!

> I think that this does not affect data cache/page behaviour(caching)
> change on NFS client. NFS client will start sending write requests as
> per default NFS/writeback logic.
> So, no change in NFS client data
> caching behaviour.
>
> Also, on NFS server it does not make change in system-wide caching
> behaviour. It only modifies caching/writeback behaviour of a
> particular “bdi” on NFS server so that NFS client could see better
> WRITE speed.

But would you default to dirty_background_time=0, where the special
value 0 means no change of the original behavior? That will address
David's very reasonable concern. Otherwise quite a few users are going
to be surprised by the new behavior after upgrading kernel.

> I will share several performancetest results as Dave's opinion.
>
> >
> >> >The numbers are impressive! FYI, I tried another NFS specific approach
> >> >to avoid big NFS COMMITs, which achieved similar performance gains:
> >>
> >> >nfs: writeback pages wait queue
> >> >https://lkml.org/lkml/2011/10/20/235

> This patch looks client side optimization to me.(need to check more)

Yes.

> Do we need the optimization of server side as Bruce's opinion ?

Sure.

Thanks,
Fengguang
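
For illustration only, the "0 means no change" default suggested above
could look roughly like the user-space sketch below. It is not code from
the patch: the helper name bdi_background_thresh(), the millisecond unit
for dirty_background_time and the pages-per-second unit for the
bandwidth estimate are all assumptions made for the example.

```c
/*
 * Hypothetical sketch, NOT kernel code and NOT taken from the patch.
 *
 * Assumed units (not stated in the thread):
 *   global_bg_thresh          - existing background threshold, in pages
 *   avg_write_bw              - estimated bdi write bandwidth, in pages/s
 *   dirty_background_time_ms  - per-bdi tunable, in milliseconds
 */
static unsigned long long
bdi_background_thresh(unsigned long long global_bg_thresh,
		      unsigned long long avg_write_bw,
		      unsigned int dirty_background_time_ms)
{
	unsigned long long timed_thresh;

	/* Special value 0: tunable unset, keep the original behavior. */
	if (dirty_background_time_ms == 0)
		return global_bg_thresh;

	/* Pages the device is estimated to write within the tuned time. */
	timed_thresh = avg_write_bw * dirty_background_time_ms / 1000;

	/* Never start background writeback later than the old threshold. */
	return timed_thresh < global_bg_thresh ? timed_thresh
					       : global_bg_thresh;
}
```

With the default of 0 the function returns the old threshold untouched,
so a create-and-delete of a big temp file behaves as before unless the
admin sets the tunable. The result for nonzero values is only as good as
the bandwidth estimate fed into it.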