linux-fsdevel.vger.kernel.org archive mirror
From: Chris Mason <chris.mason@oracle.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: Btrfs v0.16 released
Date: Thu, 14 Aug 2008 17:00:56 -0400
Message-ID: <1218747656.15342.439.camel@think.oraclecorp.com>
In-Reply-To: <1218221293.15342.263.camel@think.oraclecorp.com>

On Fri, 2008-08-08 at 14:48 -0400, Chris Mason wrote:
> On Thu, 2008-08-07 at 20:02 +0200, Andi Kleen wrote:
> > Chris Mason <chris.mason@oracle.com> writes:
> > >
> > > Metadata is duplicated by default even on single spindle drives, 
> > 
> > Can you please say a bit how much that impacts performance? That sounds 
> > costly.
> 
> Most metadata is allocated in groups of 128k or 256k, and so most of the
> writes are nicely sized.  The mirroring code has areas of the disk
> dedicated to mirror other areas. 

[ ... ]

> So, the mirroring turns a single large write into two large writes.
> Definitely not free, but always a fixed cost.

> With /sys/block/sdb/queue/nr_requests at 8192 to hide my IO ordering
> submission problems:
> 
> Btrfs defaults: 57MB/s
> Btrfs no mirror: 61.51MB/s
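To make "areas of the disk dedicated to mirror other areas" concrete, here is a minimal Python model of single-device metadata duplication. It assumes a logical chunk backed by two physical copies at fixed offsets on the same disk. `DupChunk`, `map_write` and the offsets are hypothetical illustrations, not btrfs's actual chunk tree or on-disk format.

```python
# Hypothetical model of single-device metadata duplication: one logical
# chunk is backed by two physical copies in separate regions of the disk.
# Names and offsets are illustrative assumptions, not btrfs's on-disk format.
from dataclasses import dataclass


@dataclass(frozen=True)
class DupChunk:
    logical_start: int  # start of the chunk in the logical address space
    length: int         # chunk length in bytes
    phys_a: int         # physical offset of the first copy
    phys_b: int         # physical offset of the second copy

    def map_write(self, logical: int, size: int) -> list[tuple[int, int]]:
        """Return the (physical_offset, size) writes for one logical write."""
        end = self.logical_start + self.length
        if not (self.logical_start <= logical and logical + size <= end):
            raise ValueError("write falls outside this chunk")
        delta = logical - self.logical_start
        # One contiguous logical write becomes two contiguous physical writes,
        # one into each dedicated region.
        return [(self.phys_a + delta, size), (self.phys_b + delta, size)]


chunk = DupChunk(logical_start=0, length=256 << 20,
                 phys_a=1 << 30, phys_b=(1 << 30) + (256 << 20))
writes = chunk.map_write(128 << 10, 256 << 10)
```

In this model a 256KiB metadata write always turns into exactly two 256KiB writes, which is the "single large write into two large writes" fixed cost described above; the extra cost is the second write plus the seek between the two regions.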

I spent a bunch of time hammering on different ways to fix this without
increasing nr_requests, and the fix turned out to be a mixture of better
tuning in btrfs and initializing mapping->writeback_index on inode
allocation.
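Why an uninitialized writeback_index hurts can be shown with a simplified model of cyclic writeback. In the kernel, range-cyclic writeback resumes scanning a file's dirty pages at mapping->writeback_index and wraps to the start of the file. If a freshly allocated inode inherits a stale value, its first writeback can presumably start mid-file and then wrap back, adding a seek. The sketch below is an assumption-laden model in Python; `cyclic_writeback_order` and `count_backward_seeks` are hypothetical helpers, not kernel functions.

```python
# Simplified model (not kernel code) of range-cyclic writeback: the scan of
# dirty page indexes starts at a saved per-file cursor and wraps to page 0.


def cyclic_writeback_order(dirty_pages, writeback_index):
    """Return dirty page indexes in the order a cyclic scan would visit them."""
    pages = sorted(dirty_pages)
    tail = [p for p in pages if p >= writeback_index]  # from the cursor to EOF
    head = [p for p in pages if p < writeback_index]   # wrapped-around part
    return tail + head


def count_backward_seeks(order):
    """Count places where the next page is behind the previous one."""
    return sum(1 for prev, cur in zip(order, order[1:]) if cur < prev)


fresh = cyclic_writeback_order(range(8), 0)  # cursor initialized to 0
stale = cyclic_writeback_order(range(8), 5)  # cursor left over from a reused inode
```

With the cursor at 0 the new file is written in one sequential pass; with a stale cursor of 5 the order becomes 5, 6, 7, 0, 1, 2, 3, 4, which costs a backward seek for every such file.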

So, today's numbers for creating 30 kernel trees in sequence:

Btrfs defaults                  57.41 MB/s
Btrfs dup no csum               74.59 MB/s 
Btrfs no duplication            76.83 MB/s
Btrfs no dup no csum no inline  76.85 MB/s

Ext4 data=writeback, delalloc   60.50 MB/s
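The table is easier to read as relative slowdowns. The snippet below only restates the MB/s figures quoted above and does the arithmetic; the percentages are derived from those numbers, not additional measurements.

```python
# Throughput figures from the table above, in MB/s.
results = {
    "Btrfs defaults": 57.41,
    "Btrfs dup no csum": 74.59,
    "Btrfs no duplication": 76.83,
    "Btrfs no dup no csum no inline": 76.85,
    "Ext4 data=writeback, delalloc": 60.50,
}


def slowdown_vs(baseline: str, config: str) -> float:
    """Percent of throughput lost by `config` relative to `baseline`."""
    return 100.0 * (results[baseline] - results[config]) / results[baseline]


dup_with_csum = slowdown_vs("Btrfs no duplication", "Btrfs defaults")
dup_without_csum = slowdown_vs("Btrfs no dup no csum no inline", "Btrfs dup no csum")
```

By this arithmetic the defaults run about 25.3% slower than the run without duplication, while duplication with checksums off is only about 2.9% slower than the fastest run. Those last two runs also differ in inline extents, so the second figure is only a rough comparison.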

I may be able to get the duplication numbers higher by tuning metadata
writeback.  My current code deliberately holds metadata throughput back
in order to leave some spindle time for data writes.

This graph may give you an idea of how the duplication goes to disk:

http://oss.oracle.com/~mason/seekwatcher/btrfs-dup/btrfs-default.png

Compared with the result of mkfs.btrfs -m single (no duplication):

http://oss.oracle.com/~mason/seekwatcher/btrfs-dup/btrfs-single.png

Both runs on one graph are a little hard to read:

http://oss.oracle.com/~mason/seekwatcher/btrfs-dup/btrfs-dup-compare.png

Here is btrfs with duplication on but checksumming off.  Even with
inline extents on, the checksums (which are stored in the btree) seem to
cause most of the metadata-related syncing:
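The reason checksums drive metadata writes is that every data block written needs a checksum item in the btree, so data writeback dirties metadata too. A minimal sketch follows, assuming CRC-32C checksums computed per 4KiB block and keyed by disk byte offset. The `block_checksums` helper and its dict "item layout" are hypothetical, and btrfs's exact seeding and item format are not modeled here.

```python
# Standard CRC-32C (Castagnoli), bitwise, reflected polynomial 0x82F63B78.
BLOCK = 4096  # assumed checksum granularity in bytes


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def block_checksums(disk_offset: int, data: bytes) -> dict[int, int]:
    """Map each block's disk byte offset to its CRC-32C (hypothetical layout).

    Each entry stands in for a checksum item that would have to be
    inserted into, and later written out with, the btree.
    """
    return {disk_offset + i: crc32c(data[i:i + BLOCK])
            for i in range(0, len(data), BLOCK)}


items = block_checksums(1 << 20, bytes(3 * BLOCK))  # three new metadata entries
```

Even a small 12KiB data write produces three checksum entries here, so with many smallish files the checksum metadata gets dirtied constantly, which fits the extra metadata syncing visible in the graphs.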

http://oss.oracle.com/~mason/seekwatcher/btrfs-dup/btrfs-dup-nosum.png

It is worth noting that with checksumming on, I hand the checksumming
off to async kthreads, and they may be reordering the IO a bit as they
submit it.  So, I'm not 100% sure the extra seeks aren't coming from my
async code.
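How async workers can reorder IO is easy to see in a deterministic toy model. Blocks are queued in disk order, and each worker checksums a block and submits it as soon as it finishes, so a slow job lets later blocks overtake it. The sketch below, with hypothetical helpers `completion_order` and `release_in_order`, models that effect. It also shows one generic mitigation, an in-order release buffer. This is not a description of what btrfs does or will do.

```python
import heapq


def completion_order(jobs, workers):
    """jobs: (block, cost) pairs in queue order.  Each job goes to the worker
    that frees up first; return block numbers in the order they finish."""
    free_at = [(0.0, w) for w in range(workers)]  # (time worker is free, id)
    heapq.heapify(free_at)
    done = []
    for seq, (block, cost) in enumerate(jobs):
        start, w = heapq.heappop(free_at)
        finish = start + cost
        done.append((finish, seq, block))  # ties broken by queue order
        heapq.heappush(free_at, (finish, w))
    return [block for _, _, block in sorted(done)]


def release_in_order(completed, expected):
    """Reorder buffer: hold finished blocks until all earlier ones are done."""
    pending, out = set(), []
    it = iter(expected)
    nxt = next(it, None)
    for block in completed:
        pending.add(block)
        while nxt is not None and nxt in pending:
            pending.remove(nxt)
            out.append(nxt)
            nxt = next(it, None)
    return out


jobs = [(0, 3), (1, 1), (2, 1), (3, 1)]  # block 0 is slow to checksum
raw = completion_order(jobs, workers=2)  # submission order without buffering
fixed = release_in_order(raw, [0, 1, 2, 3])
```

Here the raw submission order is 1, 2, 0, 3, so the disk sees a backward seek, while the buffered order is sequential again at the cost of holding blocks 1 and 2 until block 0 completes.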

And Ext4:

http://oss.oracle.com/~mason/seekwatcher/btrfs-dup/ext4-writeback.png

This benchmark has questionable real-world value, but since it includes
a number of smallish files, it is a good place to look at the cost of
metadata and metadata duplication.

I'll push the btrfs related changes for this out tonight after some
stress testing.

-chris



