linux-btrfs.vger.kernel.org archive mirror
* Why does btrfs benchmark so badly in this case?
@ 2013-08-08 16:13 John Williams
  2013-08-08 17:29 ` Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: John Williams @ 2013-08-08 16:13 UTC (permalink / raw)
  To: linux-btrfs

Phoronix periodically runs benchmarks on filesystems, and one thing I
have noticed is that btrfs always does terribly on their fio "Intel
IOMeter fileserver access pattern" benchmark:

http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2

Here, btrfs is more than 6 times slower than ext4, and about 3 times
slower than XFS.

Lest we attribute it to an unavoidable downside of COW filesystems and
move on...no, we cannot do that, because ZFS does well here -- btrfs
is about 6 times slower than ZFS!

Note that btrfs does quite well in the other Phoronix benchmarks. It
is just the fio fileserver benchmark that btrfs has problems with.

What is going on here? Why is btrfs doing so poorly?


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 16:13 Why does btrfs benchmark so badly in this case? John Williams
@ 2013-08-08 17:29 ` Josef Bacik
  2013-08-08 18:37 ` Clemens Eisserer
  2013-08-08 19:40 ` Josef Bacik
  2 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2013-08-08 17:29 UTC (permalink / raw)
  To: John Williams; +Cc: linux-btrfs

On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
> Phoronix periodically runs benchmarks on filesystems, and one thing I
> have noticed is that btrfs always does terribly on their fio "Intel
> IOMeter fileserver access pattern" benchmark:
> 
> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
> 
> Here, btrfs is more than 6 times slower than ext4, and about 3 times
> slower than XFS.
> 
> Lest we attribute it to an unavoidable downside of COW filesystems and
> move on...no, we cannot do that, because ZFS does well here -- btrfs
> is about 6 times slower than ZFS!
> 
> Note that btrfs does quite well in the other Phoronix benchmarks. It
> is just the fio fileserver benchmark that btrfs has problems with.
> 
> What is going on here? Why is btrfs doing so poorly?

Excellent question, I'll get back to you on that.  Thanks,

Josef


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 16:13 Why does btrfs benchmark so badly in this case? John Williams
  2013-08-08 17:29 ` Josef Bacik
@ 2013-08-08 18:37 ` Clemens Eisserer
  2013-08-08 19:40 ` Josef Bacik
  2 siblings, 0 replies; 10+ messages in thread
From: Clemens Eisserer @ 2013-08-08 18:37 UTC (permalink / raw)
  To: linux-btrfs

> What is going on here? Why is btrfs doing so poorly?

Funny thing, I was thinking exactly the same when reading the article ;)

Regards


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 16:13 Why does btrfs benchmark so badly in this case? John Williams
  2013-08-08 17:29 ` Josef Bacik
  2013-08-08 18:37 ` Clemens Eisserer
@ 2013-08-08 19:40 ` Josef Bacik
  2013-08-08 20:23   ` John Williams
  2 siblings, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2013-08-08 19:40 UTC (permalink / raw)
  To: John Williams; +Cc: linux-btrfs

On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
> Phoronix periodically runs benchmarks on filesystems, and one thing I
> have noticed is that btrfs always does terribly on their fio "Intel
> IOMeter fileserver access pattern" benchmark:
> 
> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
> 
> Here, btrfs is more than 6 times slower than ext4, and about 3 times
> slower than XFS.
> 
> Lest we attribute it to an unavoidable downside of COW filesystems and
> move on...no, we cannot do that, because ZFS does well here -- btrfs
> is about 6 times slower than ZFS!
> 
> Note that btrfs does quite well in the other Phoronix benchmarks. It
> is just the fio fileserver benchmark that btrfs has problems with.
> 
> What is going on here? Why is btrfs doing so poorly?

So the reason this workload sucks for btrfs is that we fall back to buffered
IO, because fio does not do block-size-aligned writes for this workload.  If
you add

ba=4k

to the iometer fio file then we go the same speed as xfs and ext4.  There is
not a whole lot we can do about this, since unaligned writes mean we have to
read in pages to cow the block properly, which is why we fall back to
buffered.  Once we do that we end up with a lot of page locking that gets in
the way and makes us twice as slow.  Thanks,

Josef
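
A minimal sketch of the alignment test described above, assuming a 4096-byte
filesystem block size.  The helper name and the sample request sizes are
illustrative assumptions, not taken from btrfs or from the fio job file.  A
request counts as aligned only when both its offset and its length are
multiples of the block size:

```shell
#!/bin/sh
# Assumed filesystem block size in bytes; check your own filesystem.
BLOCK_SIZE=4096

# classify_io OFFSET LENGTH (hypothetical helper, both values in bytes).
# Prints "aligned" when offset and length are multiples of BLOCK_SIZE,
# otherwise "unaligned".
classify_io() {
    io_offset=$1
    io_length=$2
    if [ $((io_offset % BLOCK_SIZE)) -eq 0 ] && [ $((io_length % BLOCK_SIZE)) -eq 0 ]; then
        echo "aligned"
    else
        echo "unaligned"
    fi
}

# Illustrative request sizes, all issued at offset 0.
for len in 512 2048 4096 8192; do
    echo "$len bytes: $(classify_io 0 "$len")"
done
```

With these sample sizes, the 512-byte and 2048-byte requests are the ones that
would need the read+modify+write handling.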


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 19:40 ` Josef Bacik
@ 2013-08-08 20:23   ` John Williams
  2013-08-08 20:38     ` Josef Bacik
  2013-08-08 20:59     ` Chris Murphy
  0 siblings, 2 replies; 10+ messages in thread
From: John Williams @ 2013-08-08 20:23 UTC (permalink / raw)
  Cc: linux-btrfs

On Thu, Aug 8, 2013 at 12:40 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
>> Phoronix periodically runs benchmarks on filesystems, and one thing I
>> have noticed is that btrfs always does terribly on their fio "Intel
>> IOMeter fileserver access pattern" benchmark:
>>
>> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2

> So the reason this workload sucks for btrfs is because we fall back on buffered
> IO because fio does not do block size aligned writes for this workload.  If you
> add
>
> ba=4k
>
> to the iometer fio file then we go the same speed as xfs and ext4.  Not a whole
> lot we can do about this since unaligned writes means we have to read in pages
> to cow the block properly, which is why we fall back to buffered.  Once we do
> that we end up having a lot of page locking stuff that gets in the way and makes
> us twice as slow.  Thanks,

Thanks for looking into it.

So I guess the reason that ZFS does well with that workload is that
ZFS is using smaller blocks, maybe just 512B ?

I wonder how common these types of non-4K aligned workloads are.
Apparently, people with such workloads should avoid btrfs, but maybe
these types of workloads are very rare?


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 20:23   ` John Williams
@ 2013-08-08 20:38     ` Josef Bacik
  2013-08-09 21:35       ` Kai Krakow
  2013-08-08 20:59     ` Chris Murphy
  1 sibling, 1 reply; 10+ messages in thread
From: Josef Bacik @ 2013-08-08 20:38 UTC (permalink / raw)
  To: John Williams; +Cc: linux-btrfs

On Thu, Aug 08, 2013 at 01:23:22PM -0700, John Williams wrote:
> On Thu, Aug 8, 2013 at 12:40 PM, Josef Bacik <jbacik@fusionio.com> wrote:
> > On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
> >> Phoronix periodically runs benchmarks on filesystems, and one thing I
> >> have noticed is that btrfs always does terribly on their fio "Intel
> >> IOMeter fileserver access pattern" benchmark:
> >>
> >> http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
> 
> > So the reason this workload sucks for btrfs is because we fall back on buffered
> > IO because fio does not do block size aligned writes for this workload.  If you
> > add
> >
> > ba=4k
> >
> > to the iometer fio file then we go the same speed as xfs and ext4.  Not a whole
> > lot we can do about this since unaligned writes means we have to read in pages
> > to cow the block properly, which is why we fall back to buffered.  Once we do
> > that we end up having a lot of page locking stuff that gets in the way and makes
> > us twice as slow.  Thanks,
> 
> Thanks for looking into it.
> 
> So I guess the reason that ZFS does well with that workload is that
> ZFS is using smaller blocks, maybe just 512B ?
> 

Yeah I'm not sure what ZFS does, but if you are writing over a block and the
size/offset isn't aligned then you'd see similar issues with ZFS since it would
have to read+modify+write.  It is likely that ZFS just is using a smaller
blocksize.

> I wonder how common these types of non-4K aligned workloads are.
> Apparently, people with such workloads should avoid btrfs, but maybe
> these types of workloads are very rare?

So most people who use AIO/O_DIRECT have really specific setups which generally
can adjust how they align stuff (for databases, for example, this would be the
db page, and those are usually large, like 16k-32k), or they use virtual images,
which will hopefully be doing block-aligned IO, but this depends on the host
OS.  Like I said, there isn't a whole lot we can do about this: you can use
NOCOW if you want to get around it without changing your application, or you
can change the app to be blocksize aligned.  Thanks,

Josef
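
One way to get NOCOW, sketched under assumptions: this uses the chattr C
(No_COW) attribute on a directory so that files created in it afterwards
inherit the attribute.  The helper name and the path in the usage line are
hypothetical; the nodatacow mount option is another route.  The attribute is
meant for new or empty files, so set it before the data files are created:

```shell
# make_nocow_dir DIR (hypothetical helper): create DIR and request the
# No_COW attribute on it, so files created there later inherit it.
# chattr +C only succeeds on filesystems that support it, such as btrfs.
make_nocow_dir() {
    mkdir -p "$1" || return 1
    if ! chattr +C "$1" 2>/dev/null; then
        echo "could not set +C on $1 (not btrfs?)" >&2
        return 1
    fi
    # Show the resulting attributes of the directory itself.
    lsattr -d "$1"
}

# Usage (hypothetical path):
#   make_nocow_dir /mnt/btrfs/vm-images
```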


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 20:23   ` John Williams
  2013-08-08 20:38     ` Josef Bacik
@ 2013-08-08 20:59     ` Chris Murphy
  2013-08-08 21:25       ` Zach Brown
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Murphy @ 2013-08-08 20:59 UTC (permalink / raw)
  To: Btrfs BTRFS


On Aug 8, 2013, at 2:23 PM, John Williams <jwilliams4200@gmail.com> wrote:
> 
> So I guess the reason that ZFS does well with that workload is that
> ZFS is using smaller blocks, maybe just 512B ?

Likely. It uses a variable block size.


> I wonder how common these types of non-4K aligned workloads are.
> Apparently, people with such workloads should avoid btrfs, but maybe
> these types of workloads are very rare?

I can't directly answer the question, but all of the typical file systems on
OS X, Linux, and Windows have defaulted to 4KB block sizes for many years now,
baked in at creation time. On OS X, the block size varies automatically with
volume size at fs creation time (it goes to 8KB block sizes above 2TB, and
scales up to 1MB block sizes), but still isn't ever less than 4KB unless
manually created that way. So I'd think such workloads are rare.

I also don't know if any commonly used fs has an optimization whereby just the
modified sector(s) are overwritten, rather than all sectors making up the file
system block being modified.

Chris Murphy


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 20:59     ` Chris Murphy
@ 2013-08-08 21:25       ` Zach Brown
  0 siblings, 0 replies; 10+ messages in thread
From: Zach Brown @ 2013-08-08 21:25 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

> I also don't know if any common use fs has an optimization whereby
> just the modified sector(s) is overwritten, rather than all sectors
> making up the file system block being modified.

Most of them do.  The generic direct io path allows sector sized dio.
The very first bit of do_blockdev_direct_IO() is testing first for file
system block size alignment then for block device sector size alignment.

You can see this easily with dd conv=notrunc oflag=direct and blktrace.

# blockdev --getss /dev/sda
512
# blockdev --getbsz /dev/sda
4096

# blktrace -d /dev/sda -a issue -o - | blkparse -i - &

$ dd if=/dev/zero of=file bs=4096 count=1 oflag=direct conv=notrunc
  8,0    3       14    35.957320002 17941  D  WS 137297704 + 8 [dd]

$ dd if=/dev/zero of=file bs=512 count=1 oflag=direct conv=notrunc
  8,0    1        4    31.405641362 17940  D  WS 137297704 + 1 [dd]

- z
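
In the blkparse lines above, the number after "+" is the request length in
512-byte sectors, so the two traces differ only in how many sectors were
written.  A small sketch of that arithmetic (the helper name is hypothetical,
and it assumes the write starts on a sector boundary):

```shell
# sectors_for BYTES SECTOR_SIZE (hypothetical helper): number of sectors
# covered by a write of BYTES that starts on a sector boundary.
sectors_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

sectors_for 4096 512   # the bs=4096 write
sectors_for 512 512    # the bs=512 write
```

This prints 8 and 1, matching the "+ 8" and "+ 1" in the traces.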


* Re: Why does btrfs benchmark so badly in this case?
  2013-08-08 20:38     ` Josef Bacik
@ 2013-08-09 21:35       ` Kai Krakow
  2013-08-12 13:48         ` Josef Bacik
  0 siblings, 1 reply; 10+ messages in thread
From: Kai Krakow @ 2013-08-09 21:35 UTC (permalink / raw)
  To: linux-btrfs

Josef Bacik <jbacik@fusionio.com> wrote:

>> So I guess the reason that ZFS does well with that workload is that
>> ZFS is using smaller blocks, maybe just 512B ?
> 
> Yeah I'm not sure what ZFS does, but if you are writing over a block and
> the size/offset isn't aligned then you'd see similar issues with ZFS since
> it would
> have to read+modify+write.  It is likely that ZFS just is using a smaller
> blocksize.

From what I remember, ZFS uses dynamic block sizes. However, block size can
be forced and thus tuned for workloads that require it:

http://www.joyent.com/blog/bruning-questions-zfs-record-size

Maybe that's the reason...

It would be interesting to see how the benchmarks performed with forced 
block size.

Regards,
Kai



* Re: Why does btrfs benchmark so badly in this case?
  2013-08-09 21:35       ` Kai Krakow
@ 2013-08-12 13:48         ` Josef Bacik
  0 siblings, 0 replies; 10+ messages in thread
From: Josef Bacik @ 2013-08-12 13:48 UTC (permalink / raw)
  To: Kai Krakow; +Cc: linux-btrfs

On Fri, Aug 09, 2013 at 11:35:33PM +0200, Kai Krakow wrote:
> Josef Bacik <jbacik@fusionio.com> wrote:
> 
> >> So I guess the reason that ZFS does well with that workload is that
> >> ZFS is using smaller blocks, maybe just 512B ?
> > 
> > Yeah I'm not sure what ZFS does, but if you are writing over a block and
> > the size/offset isn't aligned then you'd see similar issues with ZFS since
> > it would
> > have to read+modify+write.  It is likely that ZFS just is using a smaller
> > blocksize.
> 
> From what I remember, ZFS uses dynamic block sizes. However, block size can 
> be forced and thus tuned for workloads that require it:
> 
> http://www.joyent.com/blog/bruning-questions-zfs-record-size
> 
> Maybe that's the reason...
> 
> It would be interesting to see how the benchmarks performed with forced 
> block size.
> 

When I did bs=4k in the fio job to force it to use 4k blocksizes we performed
the same as ext4 and xfs.  Thanks,

Josef
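
A rough sketch of what a 4k-only direct IO job could look like, written out
from a shell helper.  Every value here (job name, IO engine, queue depth, file
size) is an illustrative assumption; this is not the job file used by Phoronix
or in the test above.  Only bs=4k and the use of direct IO come from the
thread:

```shell
# write_fio_job FILE (hypothetical helper): write a minimal fio job that
# issues 4k direct random writes.  All values are illustrative.
write_fio_job() {
    cat > "$1" <<'EOF'
[aligned-4k]
rw=randwrite
direct=1
ioengine=libaio
iodepth=64
bs=4k
size=1g
EOF
}

# Usage (requires fio to be installed):
#   write_fio_job aligned.fio && fio aligned.fio
```

To try the alignment variant mentioned earlier in the thread instead, keep the
job's mixed block sizes and add ba=4k in place of bs=4k.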
