From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757413AbZJKKvL (ORCPT ); Sun, 11 Oct 2009 06:51:11 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756675AbZJKKvG (ORCPT ); Sun, 11 Oct 2009 06:51:06 -0400 Received: from mga14.intel.com ([143.182.124.37]:7638 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756352AbZJKKvD (ORCPT ); Sun, 11 Oct 2009 06:51:03 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.44,541,1249282800"; d="scan'208";a="197615364" Date: Sun, 11 Oct 2009 18:50:12 +0800 From: Wu Fengguang To: Peter Zijlstra Cc: Jan Kara , Andrew Morton , Theodore Tso , Christoph Hellwig , Dave Chinner , Chris Mason , "Li, Shaohua" , Myklebust Trond , "jens.axboe@oracle.com" , Nick Piggin , "linux-fsdevel@vger.kernel.org" , Richard Kennedy , LKML Subject: Re: [PATCH 01/45] writeback: reduce calls to global_page_state in balance_dirty_pages() Message-ID: <20091011105012.GB10409@localhost> References: <20091007073818.318088777@intel.com> <20091007074901.251116016@intel.com> <20091009151230.GF7654@duck.suse.cz> <1255101512.8802.65.camel@laptop> <20091009154759.GJ7654@duck.suse.cz> <20091011022813.GA21315@localhost> <1255247080.11081.3.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=gb2312 Content-Disposition: inline In-Reply-To: <1255247080.11081.3.camel@twins> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Oct 11, 2009 at 03:44:40PM +0800, Peter Zijlstra wrote: > On Sun, 2009-10-11 at 10:28 +0800, Wu Fengguang wrote: > > > > Note that the total limit check itself may not be sufficient. For > > example, there are no nr_writeback limit for NFS (and maybe btrfs) > > after removing the congestion waits. Therefore it is very possible > > > > nr_writeback => dirty_thresh > > nr_dirty => 0 > > > > which is obviously undesirable: everything newly dirtied are soon put > > to writeback. It violates the 30s expire time and the background > > threshold rules, and will hurt write-and-truncate operations (ie. temp > > files). > > > > So the better solution would be to impose a nr_writeback limit for > > every filesystem that didn't already have one (the block io queue). > > NFS used to have that limit with congestion_wait, but now we need > > to do a wait queue for it. > > > > With the nr_writeback wait queue, it can be guaranteed that once > > balance_dirty_pages() asks for writing 1500 pages, it will be done > > with necessary sleeping in the bdi flush thread. So we can safely > > remove the loop and double checking of global dirty limit in > > balance_dirty_pages(). > > nr_reclaim = nr_dirty + nr_writeback + nr_unstable, so anything calling > into balance_dirty_pages() would still block on seeing such large > amounts of nr_writeback. Our terms are a bit different. In my previous mail, nr_reclaim = nr_dirty + nr_unstable nr_writeback is added separated when comparing with dirty_thresh, just as the code in balance_dirty_pages(). But that's fine. You are right that the application will be blocked and dirty limit be guaranteed, if we do while (over dirty limit) { bdi_writeback_wait(pages to write); } But it has a problem: as long as the bdi-flush thread for NFS don't limit nr_writeback, its nr_writeback will grow to near (dirty_thresh-nr_unstable), and its nr_dirty will approach 0. That's not desirable. So I did this: - while (over dirty limit) { + if (over dirty limit) { bdi_writeback_wait(pages to write); } _after_ adding the NFS nr_writeback wait queue ([PATCH 20/45] NFS: introduce writeback wait queue). With that it's safe to remove the loop. > Having the constraint nr_dirty + nr_writeback + nr_unstable < > dirty_thresh should ensure we never have nr_writeback > dirty_thresh, > simply because you cannot dirty more, which then cannot be converted to > more writeback. > > Or am I missing something? You are right with the assumption that the loop is still there. Sorry for the confusion, but I mean, filesystems have to limit nr_writeback (directly or indirectly via the block io queue), otherwise it either hit nr_dirty to 0 (with the loop), or let nr_writeback grow out of control (without the loop). Thanks, Fengguang