From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wu Fengguang
Subject: Re: [PATCH 01/45] writeback: reduce calls to global_page_state in balance_dirty_pages()
Date: Sun, 11 Oct 2009 18:50:12 +0800
Message-ID: <20091011105012.GB10409@localhost>
References: <20091007073818.318088777@intel.com>
 <20091007074901.251116016@intel.com>
 <20091009151230.GF7654@duck.suse.cz>
 <1255101512.8802.65.camel@laptop>
 <20091009154759.GJ7654@duck.suse.cz>
 <20091011022813.GA21315@localhost>
 <1255247080.11081.3.camel@twins>
Mime-Version: 1.0
Content-Type: text/plain; charset=gb2312
Cc: Jan Kara, Andrew Morton, Theodore Tso, Christoph Hellwig, Dave Chinner,
 Chris Mason, "Li, Shaohua", Myklebust Trond, "jens.axboe@oracle.com",
 Nick Piggin, "linux-fsdevel@vger.kernel.org", Richard Kennedy, LKML
To: Peter Zijlstra
Return-path:
Content-Disposition: inline
In-Reply-To: <1255247080.11081.3.camel@twins>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Sun, Oct 11, 2009 at 03:44:40PM +0800, Peter Zijlstra wrote:
> On Sun, 2009-10-11 at 10:28 +0800, Wu Fengguang wrote:
> >
> > Note that the total limit check itself may not be sufficient. For
> > example, there are no nr_writeback limit for NFS (and maybe btrfs)
> > after removing the congestion waits. Therefore it is very possible
> >
> >         nr_writeback => dirty_thresh
> >         nr_dirty => 0
> >
> > which is obviously undesirable: everything newly dirtied are soon put
> > to writeback. It violates the 30s expire time and the background
> > threshold rules, and will hurt write-and-truncate operations (ie. temp
> > files).
> >
> > So the better solution would be to impose a nr_writeback limit for
> > every filesystem that didn't already have one (the block io queue).
> > NFS used to have that limit with congestion_wait, but now we need
> > to do a wait queue for it.
> >
> > With the nr_writeback wait queue, it can be guaranteed that once
> > balance_dirty_pages() asks for writing 1500 pages, it will be done
> > with necessary sleeping in the bdi flush thread. So we can safely
> > remove the loop and double checking of global dirty limit in
> > balance_dirty_pages().
>
> nr_reclaim = nr_dirty + nr_writeback + nr_unstable, so anything calling
> into balance_dirty_pages() would still block on seeing such large
> amounts of nr_writeback.

Our terms are a bit different. In my previous mail,

        nr_reclaim = nr_dirty + nr_unstable

and nr_writeback is added separately when comparing against dirty_thresh,
just as the code in balance_dirty_pages() does. But that's fine.

You are right that the application will be blocked and the dirty limit
guaranteed, if we do

        while (over dirty limit) {
                bdi_writeback_wait(pages to write);
        }

But that has a problem: as long as the bdi-flush thread for NFS does not
limit nr_writeback, its nr_writeback will grow to near
(dirty_thresh - nr_unstable), and its nr_dirty will approach 0. That is
not desirable. So I did this:

-       while (over dirty limit) {
+       if (over dirty limit) {
                bdi_writeback_wait(pages to write);
        }

_after_ adding the NFS nr_writeback wait queue ([PATCH 20/45] NFS:
introduce writeback wait queue). With that it is safe to remove the loop.

> Having the constraint nr_dirty + nr_writeback + nr_unstable <
> dirty_thresh should ensure we never have nr_writeback > dirty_thresh,
> simply because you cannot dirty more, which then cannot be converted to
> more writeback.
>
> Or am I missing something?

You are right, with the assumption that the loop is still there.
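
To illustrate what such a per-filesystem nr_writeback limit could look
like, here is a minimal, untested sketch of the wait-queue idea -- this
is not the actual [PATCH 20/45] code; the nfs_wb_* names and the 16MB
cap are made up for illustration:

        #include <linux/wait.h>
        #include <linux/mm.h>

        /* cap on pages the filesystem keeps under writeback (example value) */
        #define NFS_WB_MAX_PAGES        (16 << (20 - PAGE_SHIFT))

        static atomic_long_t nfs_wb_pages = ATOMIC_LONG_INIT(0);
        static DECLARE_WAIT_QUEUE_HEAD(nfs_wb_waitq);

        /* called before putting a page under writeback */
        static void nfs_wb_throttle(void)
        {
                /* block the submitter until completions bring us under the cap;
                 * the check-then-inc is racy, but a small overshoot is fine */
                wait_event(nfs_wb_waitq,
                           atomic_long_read(&nfs_wb_pages) < NFS_WB_MAX_PAGES);
                atomic_long_inc(&nfs_wb_pages);
        }

        /* called from the writeback completion path */
        static void nfs_wb_done(void)
        {
                if (atomic_long_dec_return(&nfs_wb_pages) < NFS_WB_MAX_PAGES)
                        wake_up(&nfs_wb_waitq);
        }

With a cap like this in place, bdi_writeback_wait() can rely on the
flush thread making real progress instead of parking an ever-growing
number of pages in nr_writeback.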
Sorry for the confusion. What I mean is: filesystems have to limit
nr_writeback (directly, or indirectly via the block io queue), otherwise
we either drive nr_dirty to 0 (with the loop), or let nr_writeback grow
out of control (without the loop).

Thanks,
Fengguang
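
P.S. For reference, a rough model of what the single check in
balance_dirty_pages() could look like once every filesystem bounds its
nr_writeback. The bdi_writeback_wait() signature below is only a guess
at the interface from this patch series; global_page_state() and the
NR_* counters are the existing ones:

        #include <linux/vmstat.h>
        #include <linux/backing-dev.h>

        static void balance_dirty_pages_model(struct backing_dev_info *bdi,
                                              unsigned long dirty_thresh,
                                              unsigned long write_chunk)
        {
                unsigned long nr_reclaimable, nr_writeback;

                /* nr_reclaim = nr_dirty + nr_unstable, as above */
                nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
                                 global_page_state(NR_UNSTABLE_NFS);
                nr_writeback   = global_page_state(NR_WRITEBACK);

                /*
                 * Single check, no loop: the per-filesystem nr_writeback
                 * cap guarantees the requested pages really get written,
                 * so one bdi_writeback_wait() is enough.
                 */
                if (nr_reclaimable + nr_writeback > dirty_thresh)
                        bdi_writeback_wait(bdi, write_chunk);
        }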