linux-fsdevel.vger.kernel.org archive mirror
* More Large blocksize benchmarks
@ 2007-10-16  0:22 Chris Mason
  2007-10-16  0:44 ` Christoph Lameter
  2007-10-16  2:36 ` David Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Chris Mason @ 2007-10-16  0:22 UTC (permalink / raw)
  To: David Chinner
  Cc: Linus Torvalds, Nathan Scott, Andrea Arcangeli, Nick Piggin,
	Christoph Lameter, Mel Gorman, linux-fsdevel, linux-kernel,
	Christoph Hellwig, William Lee Irwin III, Jens Axboe,
	Badari Pulavarty, Maxim Levitsky, Fengguang Wu, swin wang,
	totty.lu, hugh, joern

Hello everyone,

I'm stealing the cc list and reviving an old thread because I've
finally got some numbers to go along with the Btrfs variable blocksize
feature.  The basic idea is to create a read/write interface to
map a range of bytes on the address space, and use it in Btrfs for all
metadata operations (file operations have always been extent based).

So, instead of casting buffer_head->b_data to some structure, I read and
write at offsets in a struct extent_buffer.  The extent buffer is very
small and backed by an address space, and I get large block sizes the
same way file_write gets to write to 16k at a time, by finding the
appropriate page in the address space.  This is an oversimplification
since I try to cache these mapping decisions to avoid using too much
CPU, but hopefully you get the idea.
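
As a rough illustration of reading and writing at offsets in a block
that spans several pages, here is a minimal userspace sketch in C.  It
is not the Btrfs code: struct eb_model, eb_init, eb_copy, eb_write and
eb_read are hypothetical names, the 4k page size is an assumption, and
plain calloc'd buffers stand in for pages found in an address space.
The caching of mapping decisions mentioned above is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define EB_PAGE_SIZE 4096u  /* assumed page size */
#define EB_MAX_PAGES 16u    /* this model supports blocks up to 64k */

/* Hypothetical stand-in for an extent buffer: one logical block whose
 * backing pages are allocated separately and are not contiguous. */
struct eb_model {
    size_t len;                          /* logical block size in bytes */
    unsigned char *pages[EB_MAX_PAGES];
};

/* Allocate zeroed pages for a block of len bytes; 0 on success. */
static int eb_init(struct eb_model *eb, size_t len)
{
    size_t i, n = (len + EB_PAGE_SIZE - 1) / EB_PAGE_SIZE;

    if (len == 0 || n > EB_MAX_PAGES)
        return -1;
    memset(eb, 0, sizeof(*eb));
    eb->len = len;
    for (i = 0; i < n; i++) {
        eb->pages[i] = calloc(1, EB_PAGE_SIZE);
        if (!eb->pages[i]) {
            while (i--)
                free(eb->pages[i]);
            return -1;
        }
    }
    return 0;
}

static void eb_free(struct eb_model *eb)
{
    size_t i;

    for (i = 0; i < EB_MAX_PAGES; i++) {
        free(eb->pages[i]);
        eb->pages[i] = NULL;
    }
    eb->len = 0;
}

/* Copy len bytes between a flat buffer and the paged block, one page
 * fragment at a time.  to_eb != 0 copies from buf into the block. */
static int eb_copy(struct eb_model *eb, size_t start, void *buf,
                   size_t len, int to_eb)
{
    unsigned char *p = buf;

    if (start > eb->len || len > eb->len - start)
        return -1;                       /* range falls outside the block */
    while (len > 0) {
        size_t idx = start / EB_PAGE_SIZE;   /* which page */
        size_t off = start % EB_PAGE_SIZE;   /* offset inside that page */
        size_t chunk = EB_PAGE_SIZE - off;   /* bytes left in that page */

        if (chunk > len)
            chunk = len;
        if (to_eb)
            memcpy(eb->pages[idx] + off, p, chunk);
        else
            memcpy(p, eb->pages[idx] + off, chunk);
        p += chunk;
        start += chunk;
        len -= chunk;
    }
    return 0;
}

static int eb_write(struct eb_model *eb, size_t start, const void *src,
                    size_t len)
{
    return eb_copy(eb, start, (void *)src, len, 1);
}

static int eb_read(struct eb_model *eb, size_t start, void *dst, size_t len)
{
    return eb_copy(eb, start, dst, len, 0);
}
```

A caller would eb_init() a 16k block, eb_write() a structure at any
byte offset (including one that straddles a page boundary), and later
eb_read() it back, never holding a pointer into the block itself.  That
is the property that replaces casting b_data to a structure.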

The advantage to this approach is the changes are all inside Btrfs.  No
extra kernel patches were required.

Dave reported that XFS saw much higher write throughput with large
blocksizes, but so far I'm seeing the most benefits during reads.

The next step is a bunch more benchmarks.  I've done the first round
and posted it here:

http://oss.oracle.com/~mason/blocksizes/

The Btrfs code makes it relatively easy to experiment, and so this may
be a good step toward figuring out if some automagic solution is worth
it in general.  I can even use different sizes for nodes and leaves,
although I haven't done much testing at all there yet.

-chris



* Re: More Large blocksize benchmarks
  2007-10-16  0:22 More Large blocksize benchmarks Chris Mason
@ 2007-10-16  0:44 ` Christoph Lameter
  2007-10-16  2:36 ` David Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Christoph Lameter @ 2007-10-16  0:44 UTC (permalink / raw)
  To: Chris Mason
  Cc: David Chinner, Linus Torvalds, Nathan Scott, Andrea Arcangeli,
	Nick Piggin, Mel Gorman, linux-fsdevel, linux-kernel,
	Christoph Hellwig, William Lee Irwin III, Jens Axboe,
	Badari Pulavarty, Maxim Levitsky, Fengguang Wu, swin wang,
	totty.lu, hugh, joern

On Mon, 15 Oct 2007, Chris Mason wrote:

> Dave reported that XFS saw much higher write throughput with large
> blocksizes, but so far I'm seeing the most benefits during reads.

Dave's tests were done with an early large blocksize patchset that had 
issues with readahead. More recent versions have the fixes by Fengguang 
that address the issue.


* Re: More Large blocksize benchmarks
  2007-10-16  0:22 More Large blocksize benchmarks Chris Mason
  2007-10-16  0:44 ` Christoph Lameter
@ 2007-10-16  2:36 ` David Chinner
  2007-10-16 13:01   ` Chris Mason
  1 sibling, 1 reply; 4+ messages in thread
From: David Chinner @ 2007-10-16  2:36 UTC (permalink / raw)
  To: Chris Mason
  Cc: David Chinner, Linus Torvalds, Nathan Scott, Andrea Arcangeli,
	Nick Piggin, Christoph Lameter, Mel Gorman, linux-fsdevel,
	linux-kernel, Christoph Hellwig, William Lee Irwin III,
	Jens Axboe, Badari Pulavarty, Maxim Levitsky, Fengguang Wu,
	swin wang, totty.lu, hugh, joern

On Mon, Oct 15, 2007 at 08:22:31PM -0400, Chris Mason wrote:
> Hello everyone,
> 
> I'm stealing the cc list and reviving an old thread because I've
> finally got some numbers to go along with the Btrfs variable blocksize
> feature.  The basic idea is to create a read/write interface to
> map a range of bytes on the address space, and use it in Btrfs for all
> metadata operations (file operations have always been extent based).
> 
> So, instead of casting buffer_head->b_data to some structure, I read and
> write at offsets in a struct extent_buffer.  The extent buffer is very
> small and backed by an address space, and I get large block sizes the
> same way file_write gets to write to 16k at a time, by finding the
> appropriate page in the address space.  This is an oversimplification
> since I try to cache these mapping decisions to avoid using too much
> CPU, but hopefully you get the idea.
> 
> The advantage to this approach is the changes are all inside Btrfs.  No
> extra kernel patches were required.
> 
> Dave reported that XFS saw much higher write throughput with large
> blocksizes, but so far I'm seeing the most benefits during reads.

Apples to oranges, Chris ;)

btrfs linearises writes due to its COW behaviour, and this trades
off read speed, i.e. we take more seeks to read data so we can keep
the write speed high. By using large blocks, you're reducing the
number of seeks needed to find anything, and hence the read speed
will increase. Write speed will be pretty much unchanged because
btrfs does linear writes no matter the block size.
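
To put rough numbers on the seek argument, the following C sketch
counts the tree levels needed to reach an item for a given block size.
It is only a model under stated assumptions: tree_levels() is a
hypothetical helper, every block is assumed completely full, and the
same fixed entry size is used for leaves and interior nodes, which
real filesystems do not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Levels in a tree indexing nitems entries when each block holds
 * block_size / entry_size entries.  Each level is, at worst, one
 * more seek on a cold cache. */
static unsigned tree_levels(uint64_t nitems, uint32_t block_size,
                            uint32_t entry_size)
{
    uint64_t fanout, blocks = nitems;
    unsigned levels = 0;

    if (entry_size == 0 || nitems == 0)
        return 0;
    fanout = block_size / entry_size;
    if (fanout < 2)
        return 0;                       /* a tree needs fanout >= 2 */
    do {
        blocks = (blocks + fanout - 1) / fanout;  /* blocks at this level */
        levels++;
    } while (blocks > 1);
    return levels;
}
```

With an assumed 64-byte entry and 10 million items, this gives 4 levels
for 4k blocks and 3 levels for 16k or 64k blocks, so the gain from
larger blocks shows up as fewer levels to walk on a read and flattens
out once the tree is already shallow.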

XFS doesn't linearise writes and optimises its layout for a large
number of disks and a low number of seeks on reads - the opposite
of btrfs. Hence large block sizes reduce the number of writes XFS
needs to issue for a given set of data+metadata, and so write speed
increases much more than the read speed (until you get to large tree
traversals).

The basic conclusion is that different filesystems will benefit in
different ways with large block sizes....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: More Large blocksize benchmarks
  2007-10-16  2:36 ` David Chinner
@ 2007-10-16 13:01   ` Chris Mason
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Mason @ 2007-10-16 13:01 UTC (permalink / raw)
  To: David Chinner
  Cc: Linus Torvalds, Nathan Scott, Andrea Arcangeli, Nick Piggin,
	Christoph Lameter, Mel Gorman, linux-fsdevel, linux-kernel,
	Christoph Hellwig, William Lee Irwin III, Jens Axboe,
	Badari Pulavarty, Maxim Levitsky, Fengguang Wu, swin wang,
	totty.lu, hugh, joern

On Tue, 2007-10-16 at 12:36 +1000, David Chinner wrote:
> On Mon, Oct 15, 2007 at 08:22:31PM -0400, Chris Mason wrote:
> > Hello everyone,
> > 
> > I'm stealing the cc list and reviving an old thread because I've
> > finally got some numbers to go along with the Btrfs variable blocksize
> > feature.  The basic idea is to create a read/write interface to
> > map a range of bytes on the address space, and use it in Btrfs for all
> > metadata operations (file operations have always been extent based).
> > 
> > So, instead of casting buffer_head->b_data to some structure, I read and
> > write at offsets in a struct extent_buffer.  The extent buffer is very
> > small and backed by an address space, and I get large block sizes the
> > same way file_write gets to write to 16k at a time, by finding the
> > appropriate page in the address space.  This is an oversimplification
> > since I try to cache these mapping decisions to avoid using too much
> > CPU, but hopefully you get the idea.
> > 
> > The advantage to this approach is the changes are all inside Btrfs.  No
> > extra kernel patches were required.
> > 
> > Dave reported that XFS saw much higher write throughput with large
> > blocksizes, but so far I'm seeing the most benefits during reads.
> 
> Apples to oranges, Chris ;)
> 

Grin, if the two were the same, there'd be no reason to write a new one.
I didn't expect faster writes on btrfs, at least not for workloads that
did not require reads.  The basic idea is to show there are a variety of
ways the larger blocks can improve (and hurt) performance.

Also, vmap isn't the only implementation path.  It's true the Btrfs
changes for this were huge, but a big chunk of the changes were for
different leaf/node blocksizes, something that may never get used in
practice.

-chris



