From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wu Fengguang
Subject: Re: [PATCH 20/45] NFS: introduce writeback wait queue
Date: Wed, 7 Oct 2009 17:07:22 +0800
Message-ID: <20091007090722.GA5646@localhost>
References: <20091007073818.318088777@intel.com>
	<20091007074903.749770316@intel.com>
	<1254905600.26976.218.camel@twins>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Andrew Morton, Theodore Tso, Christoph Hellwig, Dave Chinner,
	Chris Mason, "Li, Shaohua", Trond Myklebust,
	"jens.axboe@oracle.com", Jan Kara, Nick Piggin,
	"linux-fsdevel@vger.kernel.org"
To: Peter Zijlstra
Return-path:
Received: from mga14.intel.com ([143.182.124.37]:5014 "EHLO mga14.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S933106AbZJGJIl (ORCPT ); Wed, 7 Oct 2009 05:08:41 -0400
Content-Disposition: inline
In-Reply-To: <1254905600.26976.218.camel@twins>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Wed, Oct 07, 2009 at 04:53:20PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-10-07 at 15:38 +0800, Wu Fengguang wrote:
> > plain text document attachment (writeback-nfs-request-queue.patch)
> > The generic writeback routines are departing from congestion_wait()
> > in favor of get_request_wait(), i.e. waiting on the block queues.
> >
> > Introduce the missing writeback wait queue for NFS; otherwise its
> > number of writeback pages may grow out of control.
> >
> > In particular, balance_dirty_pages() will exit after it pushes
> > write_chunk pages into the PG_writeback page pool, _OR_ when the
> > background writeback work quits. The latter is new behavior: the
> > background work can quit not only (normally) after dropping below
> > the background threshold, but also when it finds _zero_ dirty pages
> > to write. The latter case lets the number of PG_writeback pages
> > grow out of control if it is not explicitly limited.
> >
> > CC: Jens Axboe
> > CC: Chris Mason
> > CC: Peter Zijlstra
> > CC: Trond Myklebust
> > Signed-off-by: Wu Fengguang
> > ---
> >
> > The wait time and network throughput vary a lot! This is a major
> > problem. It means nfs_end_page_writeback() is not called smoothly
> > over time, even when there are plenty of PG_writeback pages on the
> > client side.
>
> Could that be some ack batching from the nfs protocol/server?

Yes, possibly. Another possibility is that some work items in the
nfsiod workqueue take a long time.

> > +static void nfs_wakeup_congested(long nr, long limit,
> > +				 struct backing_dev_info *bdi,
> > +				 wait_queue_head_t *wqh)
> > +{
> > +	if (nr < 2*limit - min(limit/8, NFS_WAIT_PAGES)) {
> > +		if (test_bit(BDI_sync_congested, &bdi->state))
> > +			clear_bdi_congested(bdi, BLK_RW_SYNC);
> > +		if (waitqueue_active(&wqh[BLK_RW_SYNC])) {
> > +			smp_mb__after_clear_bit();
> > +			wake_up(&wqh[BLK_RW_SYNC]);
> > +		}
> > +	}
> > +	if (nr < limit - min(limit/8, NFS_WAIT_PAGES)) {
> > +		if (test_bit(BDI_async_congested, &bdi->state))
> > +			clear_bdi_congested(bdi, BLK_RW_ASYNC);
> > +		if (waitqueue_active(&wqh[BLK_RW_ASYNC])) {
> > +			smp_mb__after_clear_bit();
> > +			wake_up(&wqh[BLK_RW_ASYNC]);
> > +		}
> > +	}
> > +}
>
> wakeup implies a full memory barrier.

If so, is this smp_mb__after_clear_bit() line in clear_bdi_congested()
also unnecessary?

	void clear_bdi_congested(struct backing_dev_info *bdi, int sync)
	{
		//...
		clear_bit(bit, &bdi->state);
		smp_mb__after_clear_bit();
		if (waitqueue_active(wqh))
			wake_up(wqh);
	}
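To make the question concrete, here is a minimal user-space sketch of
the waker/sleeper pairing under discussion (compile with cc -pthread;
all names here, congested, nr_waiters, wait_for_uncongested,
clear_congested, are made up for illustration, none are from the
patch):

	/*
	 * Sketch only: a user-space analogy, with made-up names, of
	 *
	 *	clear_bit(); smp_mb__after_clear_bit();
	 *	if (waitqueue_active()) wake_up();
	 */
	#include <pthread.h>
	#include <stdatomic.h>

	static atomic_int congested = 1; /* ~ the BDI_*_congested bit */
	static atomic_int nr_waiters;	 /* ~ what waitqueue_active() reads */
	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t wqh = PTHREAD_COND_INITIALIZER;

	/* sleeper: announce ourselves as a waiter, then check the flag */
	static void wait_for_uncongested(void)
	{
		atomic_fetch_add(&nr_waiters, 1);
		pthread_mutex_lock(&lock);
		while (atomic_load(&congested))
			pthread_cond_wait(&wqh, &lock);
		pthread_mutex_unlock(&lock);
		atomic_fetch_sub(&nr_waiters, 1);
	}

	/* waker: mirrors the clear_bdi_congested() sequence */
	static void clear_congested(void)
	{
		atomic_store(&congested, 0);	/* ~ clear_bit() */
		/*
		 * ~ smp_mb__after_clear_bit(): redundant here, since the
		 * surrounding atomics are already seq_cst, but kept to
		 * mirror where the kernel barrier sits: after the clear
		 * and before the unlocked waiter check.
		 */
		atomic_thread_fence(memory_order_seq_cst);
		if (atomic_load(&nr_waiters)) { /* ~ waitqueue_active() */
			pthread_mutex_lock(&lock);
			pthread_cond_broadcast(&wqh); /* ~ wake_up() */
			pthread_mutex_unlock(&lock);
		}
	}

The point of the sketch: the sleeper's increment is ordered before its
read of "congested", and the waker's clear is ordered before its read
of "nr_waiters", so at least one side must observe the other and the
wakeup cannot be lost. It does not by itself settle whether wake_up()'s
implied barrier could stand in for the explicit one, since that barrier
only runs when the waiter check succeeds.

Thanks,
Fengguang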