From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB
From: "Zhang, Yanmin"
To: Nick Piggin
Cc: Jan Kara, Andrew Morton, linux-fsdevel@vger.kernel.org, LKML,
	npiggin@suse.de
In-Reply-To: <200902031224.20856.nickpiggin@yahoo.com.au>
References: <20090130012315.GB19554@duck.suse.cz>
	<200902031224.20856.nickpiggin@yahoo.com.au>
Content-Type: text/plain
Date: Tue, 03 Feb 2009 09:54:26 +0800
Message-Id: <1233626066.2604.114.camel@ymzhang>
Mime-Version: 1.0
X-Mailer: Evolution 2.22.1 (2.22.1-2.fc9)
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2009-02-03 at 12:24 +1100, Nick Piggin wrote:
> On Friday 30 January 2009 12:23:15 Jan Kara wrote:
> >   Hi,
> >
> >   today I found that commit 31a12666d8f0c22235297e1c1575f82061480029 (mm:
> > write_cache_pages cyclic fix) slows down operations over Berkeley DB.
> > Without this "fix", I can add 100k entries in about 5 minutes 30s; with
> > that change it takes about 20 minutes.
> >   What is IMO happening is that previously we scanned to the end of the
> > file, left writeback_index at the end of the file, and went to write the
> > next file. With the fix, we wrap around (seek) and after writing some
> > more we go to the next file (seek again).

We also found this commit causes about a 40-50% regression with iozone
mmap-rand-write.
#iozone -B -r 4k -s 64k -s 512m -s 1200m
My machine has 8GB memory.

> Hmm, but isn't that what pdflush has asked for? It is wanting to flush
> some of the dirty data out of this file, and hence it wants to start
> from where it last flushed out and then cycle back and flush more?
>
> >   Anyway, I think the original semantics of "cyclic" makes more sense,
> > just the name was chosen poorly. What we should do is really scan to the
> > end of the file, reset the index to start from the beginning next time,
> > and go for the next file.
>
> Well, if we think of a file as containing a set of dirty pages (as it
> appears to the high level mm), rather than a sequence, then the behaviour
> of my patch is correct (ie. there should be no distinction between dirty
> pages at different offsets in the file).
>
> However, clearly there is some problem with that assumption if you're
> seeing a 4x slowdown :P I'd really like to know how it messes up the IO
> patterns. How many files in the BDB workload? Are filesystem blocks
> being allocated at the end of the file while writeout is happening?
> Delayed allocation?
>
> >   I can write a patch to introduce this semantics but I'd like to hear
> > opinions of other people before I do so.
>
> I like dirty page cleaning to be offset-agnostic as far as possible,
> but I can't argue with numbers like that. Though maybe it would be
> possible to solve it some other way.