Subject: Re: [RFC PATCH] mm: balance_dirty_pages. reduce calls to global_page_state to reduce cache references
From: Peter Zijlstra
To: Richard Kennedy
Cc: Andrew Morton, "chris.mason", lkml, Jens Axboe, miklos
Date: Fri, 21 Aug 2009 16:04:54 +0200
Message-Id: <1250863494.7538.49.camel@twins>
In-Reply-To: <1250855961.2226.94.camel@castor>
References: <1250855961.2226.94.camel@castor>

(removed linux-mm because it seems to be ill atm)

On Fri, 2009-08-21 at 12:59 +0100, Richard Kennedy wrote:
> Reducing the number of times balance_dirty_pages calls global_page_state
> reduces the cache references and so improves write performance on a
> variety of workloads.
>
> 'perf stat' of simple fio write tests shows the reduction in cache
> accesses. The test is fio 'write,mmap,600Mb,pre_read' on an AMD AthlonX2
> with 3Gb memory (dirty_threshold approx 600Mb), running each test 10
> times and taking the average & standard deviation:
>
> 		average (s.d.) in millions (10^6)
> 2.6.31-rc6	661 (9.88)
> +patch		604 (4.19)

Nice.

> This reduction is achieved by dropping clip_bdi_dirty_limit, which
> rereads the counters to apply the dirty_threshold, and by moving that
> check up into balance_dirty_pages, where the counters have already been
> read.

OK, so what you did is first check the total dirty limit, and only if
that is ok, check the per-BDI limit -- now why didn't I think of that ;-)

> Also, rearranging the for loop so that it contains only one copy of the
> limit tests allows the pdflush test after the loop to use the local
> copies of the counters rather than rereading them.
>
> In the common case with no throttling it now calls global_page_state 5
> fewer times and bdi_stat 2 fewer.
>
> I have tried to retain the existing behavior as much as possible, but
> have added NR_WRITEBACK_TEMP to nr_writeback. This counter was used in
> clip_bdi_dirty_limit but not in balance_dirty_pages; grep suggests it is
> only used by FUSE, but I haven't done any testing on that. It does seem
> logical to count all the WRITEBACK pages when making the throttling
> decisions, so this change should be more correct ;)

Right, the NR_WRITEBACK_TEMP thing is a FUSE feature, it's used in
writable mmap() support for FUSE. I must admit to forgetting the exact
semantics of it, maybe Miklos can remind us.
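For anyone reading along, the reordering being discussed boils down to
something like the sketch below -- a stand-alone, simplified model in
plain C, not the kernel code itself. The names mirror the patch, but the
locking, the per-CPU counter error bounds and the actual writeout are all
left out:

#include <stdbool.h>

/*
 * Simplified model of one pass of the throttling loop: the global dirty
 * limit is checked first, and only when the global numbers say we may
 * stop do the per-BDI numbers get consulted.
 */
static bool may_stop_throttling(unsigned long nr_reclaimable,
				unsigned long nr_writeback,
				unsigned long bdi_nr_reclaimable,
				unsigned long bdi_nr_writeback,
				unsigned long background_thresh,
				unsigned long dirty_thresh,
				unsigned long bdi_thresh,
				unsigned long pages_written,
				unsigned long write_chunk)
{
	/* over the global dirty limit: always keep throttling */
	if (nr_reclaimable + nr_writeback >= dirty_thresh)
		return false;

	/* this backing device is below its own limit */
	if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
		return true;

	/* background writeback can still catch up by itself */
	if (nr_reclaimable + nr_writeback <
	    (background_thresh + dirty_thresh) / 2)
		return true;

	/* we already wrote out a full chunk ourselves */
	return pages_written >= write_chunk;
}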
> Signed-off-by: Richard Kennedy

Looks good here.

Acked-by: Peter Zijlstra

> ---
>  page-writeback.c |  116 ++++++++++++++++++++-----------------------------------
>  1 file changed, 43 insertions(+), 73 deletions(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 81627eb..6f18e40 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -512,45 +485,12 @@ static void balance_dirty_pages(struct address_space *mapping)
>  		};
>
>  		get_dirty_limits(&background_thresh, &dirty_thresh,
> +				 &bdi_thresh, bdi);
>
>  		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> +					global_page_state(NR_UNSTABLE_NFS);
> +		nr_writeback = global_page_state(NR_WRITEBACK) +
> +					global_page_state(NR_WRITEBACK_TEMP);
>
>  		/*
>  		 * In order to avoid the stacked BDI deadlock we need
> @@ -570,16 +510,48 @@ static void balance_dirty_pages(struct address_space *mapping)
>  			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
>  		}
>
> +		/* always throttle if over threshold */
> +		if (nr_reclaimable + nr_writeback < dirty_thresh) {
> +
> +			if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> +				break;
> +
> +			/*
> +			 * Throttle it only when the background writeback cannot
> +			 * catch-up. This avoids (excessively) small writeouts
> +			 * when the bdi limits are ramping up.
> +			 */
> +			if (nr_reclaimable + nr_writeback <
> +				(background_thresh + dirty_thresh) / 2)
> +				break;
> +
> +			/* done enough? */
> +			if (pages_written >= write_chunk)
> +				break;
> +		}
> +		if (!bdi->dirty_exceeded)
> +			bdi->dirty_exceeded = 1;
>
> +		/* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
> +		 * Unstable writes are a feature of certain networked
> +		 * filesystems (i.e. NFS) in which data may have been
> +		 * written to the server's write cache, but has not yet
> +		 * been flushed to permanent storage.
> +		 * Only move pages to writeback if this bdi is over its
> +		 * threshold otherwise wait until the disk writes catch
> +		 * up.
> +		 */
> +		if (bdi_nr_reclaimable > bdi_thresh) {
> +			writeback_inodes(&wbc);
> +			pages_written += write_chunk - wbc.nr_to_write;
> +			if (wbc.nr_to_write == 0)
> +				continue;
> +		}
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
>  	}
>
>  	if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> +			bdi->dirty_exceeded)
>  		bdi->dirty_exceeded = 0;
>
>  	if (writeback_in_progress(bdi))
> @@ -593,10 +565,8 @@ static void balance_dirty_pages(struct address_space *mapping)
>  	 * In normal mode, we start background writeout at the lower
>  	 * background_thresh, to keep the amount of dirty memory low.
>  	 */
> +	if ((laptop_mode && pages_written) || (!laptop_mode &&
> +				(nr_reclaimable > background_thresh)))
>  		pdflush_operation(background_writeout, 0);
>  }
>
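One more detail worth calling out: the pdflush trigger after the loop now
reuses the nr_reclaimable value sampled inside the loop instead of calling
global_page_state() again. A minimal stand-alone rendering of that
decision (names taken from the patch, the helper itself is just for
illustration):

#include <stdbool.h>

static bool start_background_writeout(int laptop_mode,
				      unsigned long pages_written,
				      unsigned long nr_reclaimable,
				      unsigned long background_thresh)
{
	/*
	 * In laptop mode, only kick writeout if something was already
	 * written (to batch disk spin-ups); otherwise kick it as soon as
	 * the already-sampled reclaimable count exceeds the background
	 * threshold.
	 */
	return (laptop_mode && pages_written) ||
	       (!laptop_mode && nr_reclaimable > background_thresh);
}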