From: Chris Mason
Subject: Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS
Date: Mon, 18 Jun 2007 10:29:47 -0400
Message-ID: <20070618142947.GC2061@think.oraclecorp.com>
To: "Florian D."
Cc: linux-fsdevel@vger.kernel.org

On Sat, Jun 16, 2007 at 11:31:47AM +0200, Florian D. wrote:
> Chris Mason wrote:
> > Strange, these numbers are not quite what I was expecting ;) Could you
> > please post your fio job files? Also, how much ram does the machine
> > have? Only writing doesn't seem like enough to fill the ram.
> >
> > -chris
>
> Sure:
>
> [global]
> directory=/mnt/temp/default
> filename=testfile
> size=300m
> randrepeat=1
> overwrite=1
> end_fsync=1

[ very bad results on btrfs with these parameters ]

Ok, the numbers make more sense now. Basically, what is happening is that during the random IO phase, fio is hitting every single block in the file. Btrfs allocates new blocks sequentially, in the order the writes arrive, but the fsync does writeback in page (file offset) order. So the fsync sees a completely random on-disk block ordering, and then we see it again on the reads.
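To make the ordering problem concrete, here is a toy simulation. It is a sketch under simplifying assumptions, not btrfs code: the file is a set of page indexes, the hypothetical `cow_layout` allocator hands out disk blocks strictly in write order starting at an arbitrary block 1000, and "discontiguities" (jumps to a non-adjacent block while walking the file in page order) stand in for seeks. `NUM_PAGES` and the seed are arbitrary choices.

```python
import random

def cow_layout(write_order, first_free=1000):
    """Toy copy-on-write allocator: each write gets the next free disk
    block, so blocks are handed out in the order the writes arrive."""
    layout = {}
    next_block = first_free
    for page in write_order:
        layout[page] = next_block  # rewriting a page moves it to a new block
        next_block += 1
    return layout

def discontiguities_in_page_order(layout):
    """Walk the file in page order (as fsync writeback and a sequential
    read would) and count jumps to a non-adjacent disk block."""
    blocks = [layout[page] for page in sorted(layout)]
    return sum(1 for prev, nxt in zip(blocks, blocks[1:]) if nxt != prev + 1)

NUM_PAGES = 1000
rng = random.Random(1)  # fixed seed so reruns repeat the same pattern
random_order = list(range(NUM_PAGES))
rng.shuffle(random_order)  # random overwrite that touches every page once

random_cow = cow_layout(random_order)
sequential_cow = cow_layout(range(NUM_PAGES))

print("random writes, COW:", discontiguities_in_page_order(random_cow))
print("sequential writes, COW:", discontiguities_in_page_order(sequential_cow))
```

The allocator itself is perfectly sequential in both runs; only the write order differs. With sequential writes the page-order walk never jumps, while with random writes nearly every step lands on an unrelated block.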
In ext3, even though the writes are random, they overwrite the existing blocks in place, so the fsync follows the original (sequential) ordering of the blocks and everything works nicely.

The fix is either delayed allocation or defrag-on-writeback. Another option (which I'll have to do for O_SYNC performance) is to leave space in the blocks allocated to the file for COWs (basically, strides of allocated blocks). I'll do the defrag-on-writeback right after the ENOSPC work.

-chris
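The fixes differ mainly in when blocks are chosen. Below is a minimal sketch of the delayed-allocation idea under the same kind of toy model (pages map to integer block numbers, allocation comes from one contiguous free region starting at an arbitrary `FIRST_FREE`, and non-adjacent jumps approximate seeks). The hypothetical `delayed_allocation` policy only records which pages are dirty and picks blocks at writeback time, walking the dirty pages in page order. It does not model defrag-on-writeback or the stride reservation, and it is not how btrfs or any other filesystem actually implements allocation.

```python
import random

FIRST_FREE = 1000  # arbitrary start of free space in this toy model

def in_place(write_order):
    """ext3-style overwrite: page i keeps its original block i."""
    return {page: page for page in write_order}

def cow_at_write_time(write_order):
    """Allocate a new block as each write arrives (write order)."""
    return {page: FIRST_FREE + i for i, page in enumerate(write_order)}

def delayed_allocation(write_order):
    """Only remember which pages are dirty; choose blocks at writeback
    time, walking the dirty pages in page order."""
    dirty = sorted(set(write_order))
    return {page: FIRST_FREE + i for i, page in enumerate(dirty)}

def discontiguities(layout):
    """Count jumps to a non-adjacent block when reading in page order."""
    blocks = [layout[page] for page in sorted(layout)]
    return sum(1 for prev, nxt in zip(blocks, blocks[1:]) if nxt != prev + 1)

order = list(range(1000))
random.Random(1).shuffle(order)  # every page overwritten once, randomly

for name, policy in [("in place", in_place),
                     ("COW at write time", cow_at_write_time),
                     ("delayed allocation", delayed_allocation)]:
    print(f"{name}: {discontiguities(policy(order))}")
```

In this model both in-place overwrite and delayed allocation leave the file contiguous in page order, while allocating at write time reproduces the random layout.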