public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
* Benchmarking btrfs on HW Raid ... BAD
@ 2009-09-28  8:06 Tobias Oetiker
  2009-09-28  8:17 ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Tobias Oetiker @ 2009-09-28  8:06 UTC (permalink / raw)
  To: linux-btrfs

List,

I have done some benchmarking of different file systems on a HW RAID
(Areca RAID6 with 7 disks). My test focuses on the behaviour of the
system under competing read and write workloads.

My benchmark runs 3 writer and 3 reader processes in parallel; each
reader gets its own artificial 5 GB home-directory file tree, while
the writers create similar trees.

Running this on a single disk, I get quite acceptable results.
When running on top of an Areca HW RAID6 (LVM-partitioned),
both read and write performance drop by at least two orders
of magnitude.

I am not on this list, so please cc me on any replies.

My Test Software is here:

    http://oss.oetiker.ch/optools/wiki/FsOpBench

And these are the results:

2.6.31 - btrfs - cfq - single sata disk
######################################################################

1 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  56400    min    0.001 ms    max   96.106 ms    mean    0.053 ms    stdev   0.973
B lstat file      cnt  52652    min    0.006 ms    max   34.721 ms    mean    0.057 ms    stdev   0.680
C open file       cnt  41411    min    0.014 ms    max    0.277 ms    mean    0.017 ms    stdev   0.003
D rd 1st byte     cnt  41412    min    0.019 ms    max   51.501 ms    mean    0.327 ms    stdev   1.774
E read rate     164.741 MB/s (data)  44.940 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  21322    min    0.001 ms    max   72.704 ms    mean    0.073 ms    stdev   1.544
B lstat file      cnt  19881    min    0.006 ms    max  103.878 ms    mean    0.145 ms    stdev   2.055
C open file       cnt  15558    min    0.014 ms    max    0.109 ms    mean    0.018 ms    stdev   0.003
D rd 1st byte     cnt  15558    min    0.020 ms    max 2528.137 ms    mean    1.312 ms    stdev  21.358
E read rate     106.851 MB/s (data)  15.778 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open      cnt  15428    min    0.057 ms    max  898.478 ms    mean    0.390 ms    stdev  13.349
G wr 1st byte     cnt  15428    min    0.006 ms    max   15.889 ms    mean    0.009 ms    stdev   0.147
H write close     cnt  15428    min    0.016 ms    max  533.088 ms    mean    0.222 ms    stdev   9.099
I mkdir           cnt   1350    min    0.031 ms    max   77.956 ms    mean    0.127 ms    stdev   2.218
J write rate     30.738 MB/s (data)  23.177 MB/s (open + 1st byte + data)

A read dir        cnt   3382    min    0.001 ms    max 1586.615 ms    mean    0.831 ms    stdev  29.901
B lstat file      cnt   3158    min    0.007 ms    max  427.770 ms    mean    0.390 ms    stdev   9.328
C open file       cnt   2489    min    0.014 ms    max    2.644 ms    mean    0.020 ms    stdev   0.071
D rd 1st byte     cnt   2489    min    0.021 ms    max 2033.881 ms    mean    8.327 ms    stdev  68.468
E read rate      11.927 MB/s (data)   2.169 MB/s (readdir + open + 1st byte + data)

2.6.31 - btrfs - cfq - areca raid6 (7 disks) lvm partitioned
######################################################################

1 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  78845    min    0.001 ms    max   29.713 ms    mean    0.027 ms    stdev   0.421
B lstat file      cnt  73600    min    0.006 ms    max   21.639 ms    mean    0.038 ms    stdev   0.273
C open file       cnt  57862    min    0.013 ms    max    0.100 ms    mean    0.017 ms    stdev   0.003
D rd 1st byte     cnt  57861    min    0.014 ms    max   70.214 ms    mean    0.209 ms    stdev   0.919
E read rate     185.464 MB/s (data)  63.842 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  41222    min    0.001 ms    max  169.195 ms    mean    0.056 ms    stdev   1.113
B lstat file      cnt  38447    min    0.006 ms    max   79.977 ms    mean    0.064 ms    stdev   0.746
C open file       cnt  30122    min    0.013 ms    max    0.042 ms    mean    0.018 ms    stdev   0.003
D rd 1st byte     cnt  30122    min    0.014 ms    max  597.264 ms    mean    0.535 ms    stdev   6.646
E read rate     124.144 MB/s (data)  31.197 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open      cnt    107    min    0.063 ms    max   70.593 ms    mean    0.760 ms    stdev   6.784
G wr 1st byte     cnt    107    min    0.006 ms    max    0.014 ms    mean    0.007 ms    stdev   0.002
H write close     cnt    107    min    0.017 ms    max 1784.192 ms    mean   20.830 ms    stdev 176.474
I mkdir           cnt      9    min    0.049 ms    max    9.184 ms    mean    1.079 ms    stdev   2.865
J write rate      0.200 MB/s (data)   0.199 MB/s (open + 1st byte + data)

A read dir        cnt   1215    min    0.001 ms    max 2661.328 ms    mean    4.008 ms    stdev  81.513
B lstat file      cnt   1144    min    0.007 ms    max  377.476 ms    mean    1.827 ms    stdev  18.844
C open file       cnt    928    min    0.014 ms    max    1.596 ms    mean    0.021 ms    stdev   0.056
D rd 1st byte     cnt    928    min    0.015 ms    max 1936.262 ms    mean   25.187 ms    stdev 123.755
E read rate       9.199 MB/s (data)   0.792 MB/s (readdir + open + 1st byte + data)



-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Benchmarking btrfs on HW Raid ... BAD
  2009-09-28  8:06 Benchmarking btrfs on HW Raid ... BAD Tobias Oetiker
@ 2009-09-28  8:17 ` Florian Weimer
  2009-09-28  9:19   ` Tobias Oetiker
  2009-09-28  9:19   ` Daniel J Blueman
  0 siblings, 2 replies; 7+ messages in thread
From: Florian Weimer @ 2009-09-28  8:17 UTC (permalink / raw)
  To: Tobias Oetiker; +Cc: linux-btrfs

* Tobias Oetiker:

> Running this on a single disk, I get quite acceptable results.
> When running on top of an Areca HW RAID6 (LVM-partitioned),
> both read and write performance drop by at least two orders
> of magnitude.

Does the HW RAID use write caching (preferably battery-backed)?

-- 
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Benchmarking btrfs on HW Raid ... BAD
  2009-09-28  8:17 ` Florian Weimer
@ 2009-09-28  9:19   ` Tobias Oetiker
  2009-09-28  9:19   ` Daniel J Blueman
  1 sibling, 0 replies; 7+ messages in thread
From: Tobias Oetiker @ 2009-09-28  9:19 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-btrfs

Hi Florian,

Today Florian Weimer wrote:

> * Tobias Oetiker:
>
> > Running this on a single disk, I get quite acceptable results.
> > When running on top of an Areca HW RAID6 (LVM-partitioned),
> > both read and write performance drop by at least two orders
> > of magnitude.
>
> Does the HW RAID use write caching (preferably battery-backed)?

Yes, it does ... is there some magic switch to be set for btrfs to
act accordingly?

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Benchmarking btrfs on HW Raid ... BAD
  2009-09-28  8:17 ` Florian Weimer
  2009-09-28  9:19   ` Tobias Oetiker
@ 2009-09-28  9:19   ` Daniel J Blueman
  2009-09-28  9:39     ` Tobias Oetiker
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel J Blueman @ 2009-09-28  9:19 UTC (permalink / raw)
  To: Tobias Oetiker; +Cc: Florian Weimer, linux-btrfs

On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer <fweimer@bfk.de> wrote:
> * Tobias Oetiker:
>
>> Running this on a single disk, I get quite acceptable results.
>> When running on top of an Areca HW RAID6 (LVM-partitioned),
>> both read and write performance drop by at least two orders
>> of magnitude.
>
> Does the HW RAID use write caching (preferably battery-backed)?

I believe Areca controllers have an option for writeback or
writethrough caching, so it's worth checking this and that you're
running the current firmware, in case of errata. Ironically, disabling
writeback will give the OS tighter control of request latency, but
throughput may drop a lot. I still can't help thinking that this is
down to the behaviour of the controller, due to the 1-disk case
working well.

One way would be to configure the array as 6 or 7 devices and let
btrfs/DM manage the array, then see whether performance under write
load is better, with and without writeback caching...
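For the curious, such a setup might look roughly like this. This is a
sketch only: the device names are hypothetical, it assumes the Areca
controller can export the disks individually (pass-through/JBOD mode),
and the `btrfs filesystem show` subcommand is from current btrfs-progs
(older releases shipped it as a separate `btrfs-show` binary):

```shell
# Hypothetical devices /dev/sd[b-g]; adjust to what the controller exports.
# Mirror metadata, stripe+mirror data across the six disks.
mkfs.btrfs -m raid1 -d raid10 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

mount /dev/sdb /mnt/test         # any member device can be named at mount time
btrfs filesystem show            # confirm all six devices joined the filesystem
```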

Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Benchmarking btrfs on HW Raid ... BAD
  2009-09-28  9:19   ` Daniel J Blueman
@ 2009-09-28  9:39     ` Tobias Oetiker
  2009-09-30 14:35       ` Ric Wheeler
  0 siblings, 1 reply; 7+ messages in thread
From: Tobias Oetiker @ 2009-09-28  9:39 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Florian Weimer, linux-btrfs

Hi Daniel,

Today Daniel J Blueman wrote:

> On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer <fweimer@bfk.de> wrote:
> > * Tobias Oetiker:
> >
> >> Running this on a single disk, I get quite acceptable results.
> >> When running on top of an Areca HW RAID6 (LVM-partitioned),
> >> both read and write performance drop by at least two orders
> >> of magnitude.
> >
> > Does the HW RAID use write caching (preferably battery-backed)?
>
> I believe Areca controllers have an option for writeback or
> writethrough caching, so it's worth checking this and that you're
> running the current firmware, in case of errata. Ironically, disabling
> writeback will give the OS tighter control of request latency, but
> throughput may drop a lot. I still can't help thinking that this is
> down to the behaviour of the controller, due to the 1-disk case
> working well.

It certainly is down to a behaviour of the controller, or the
results would be the same as with a single SATA disk :-) It would
be interesting to see what results others get on HW RAID
controllers ...

> One way would be to configure the array as 6 or 7 devices, and allow
> BTRFS/DM to manage the array, then see if performance under write load
> is better, and with or without writeback caching...

I can imagine that this would help, but since btrfs aims to be
multipurpose, it does not really help all that much: it fundamentally
alters the 'conditions', as at the moment the RAID contains different
filesystems and is partitioned using LVM ...

cheers
tobi

the results for ext3 fs look like this ...



-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Benchmarking btrfs on HW Raid ... BAD
  2009-09-28  9:39     ` Tobias Oetiker
@ 2009-09-30 14:35       ` Ric Wheeler
  2009-09-30 18:44         ` Tobias Oetiker
  0 siblings, 1 reply; 7+ messages in thread
From: Ric Wheeler @ 2009-09-30 14:35 UTC (permalink / raw)
  To: Tobias Oetiker; +Cc: Daniel J Blueman, Florian Weimer, linux-btrfs

On 09/28/2009 05:39 AM, Tobias Oetiker wrote:
> Hi Daniel,
>
> Today Daniel J Blueman wrote:
>
>> On Mon, Sep 28, 2009 at 9:17 AM, Florian Weimer <fweimer@bfk.de> wrote:
>>> * Tobias Oetiker:
>>>
>>>> Running this on a single disk, I get quite acceptable results.
>>>> When running on top of an Areca HW RAID6 (LVM-partitioned),
>>>> both read and write performance drop by at least two orders
>>>> of magnitude.
>>>
>>> Does the HW RAID use write caching (preferably battery-backed)?
>>
>> I believe Areca controllers have an option for writeback or
>> writethrough caching, so it's worth checking this and that you're
>> running the current firmware, in case of errata. Ironically, disabling
>> writeback will give the OS tighter control of request latency, but
>> throughput may drop a lot. I still can't help thinking that this is
>> down to the behaviour of the controller, due to the 1-disk case
>> working well.
>
> It certainly is down to a behaviour of the controller, or the
> results would be the same as with a single SATA disk :-) It would
> be interesting to see what results others get on HW RAID
> controllers ...
>
>> One way would be to configure the array as 6 or 7 devices, and allow
>> BTRFS/DM to manage the array, then see if performance under write load
>> is better, and with or without writeback caching...
>
> I can imagine that this would help, but since btrfs aims to be
> multipurpose, it does not really help all that much: it fundamentally
> alters the 'conditions', as at the moment the RAID contains different
> filesystems and is partitioned using LVM ...
>
> cheers
> tobi
>
> the results for ext3 fs look like this ...

I would be more suspicious of the barrier/flushes being issued. If your 
write cache is non-volatile, we really do not want to send them down to 
this type of device. Flushing this type of cache could certainly be 
very, very expensive and slow.

Try "mount -o nobarrier" and see if your performance (write cache still
enabled on the controller) is back to expected levels.
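For anyone wanting to try this, the one-off remount and the persistent
variant might look like the following. A sketch only: the mount point
and LVM device path are placeholders, and note that nobarrier trades
crash safety for speed unless the controller cache really is
battery-backed:

```shell
# One-off: disable write barriers on an already-mounted btrfs filesystem
mount -o remount,nobarrier /mnt/test

# Persistent: the corresponding /etc/fstab line (placeholder device path)
# /dev/vg0/test  /mnt/test  btrfs  defaults,nobarrier  0 0
```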

Ric


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Benchmarking btrfs on HW Raid ... BAD
  2009-09-30 14:35       ` Ric Wheeler
@ 2009-09-30 18:44         ` Tobias Oetiker
  0 siblings, 0 replies; 7+ messages in thread
From: Tobias Oetiker @ 2009-09-30 18:44 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Daniel J Blueman, Florian Weimer, linux-btrfs, Roman Plessl

Hi Ric,

Today Ric Wheeler wrote:

> I would be more suspicious of the barrier/flushes being issued. If your write
> cache is non-volatile, we really do not want to send them down to this type of
> device. Flushing this type of cache could certainly be very, very expensive
> and slow.
>
> Try "mount -o nobarrier" and see if your performance (write cache still
> enabled on the controller) is back to expected levels,

wow, indeed ...

Without special mount options I get the following from my RAID6 with non-volatile cache:
######################################################################

1 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  78845    min    0.001 ms    max   29.713 ms    mean    0.027 ms    stdev   0.421
B lstat file      cnt  73600    min    0.006 ms    max   21.639 ms    mean    0.038 ms    stdev   0.273
C open file       cnt  57862    min    0.013 ms    max    0.100 ms    mean    0.017 ms    stdev   0.003
D rd 1st byte     cnt  57861    min    0.014 ms    max   70.214 ms    mean    0.209 ms    stdev   0.919
E read rate     185.464 MB/s (data)  63.842 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  41222    min    0.001 ms    max  169.195 ms    mean    0.056 ms    stdev   1.113
B lstat file      cnt  38447    min    0.006 ms    max   79.977 ms    mean    0.064 ms    stdev   0.746
C open file       cnt  30122    min    0.013 ms    max    0.042 ms    mean    0.018 ms    stdev   0.003
D rd 1st byte     cnt  30122    min    0.014 ms    max  597.264 ms    mean    0.535 ms    stdev   6.646
E read rate     124.144 MB/s (data)  31.197 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open      cnt    107    min    0.063 ms    max   70.593 ms    mean    0.760 ms    stdev   6.784
G wr 1st byte     cnt    107    min    0.006 ms    max    0.014 ms    mean    0.007 ms    stdev   0.002
H write close     cnt    107    min    0.017 ms    max 1784.192 ms    mean   20.830 ms    stdev 176.474
I mkdir           cnt      9    min    0.049 ms    max    9.184 ms    mean    1.079 ms    stdev   2.865
J write rate      0.200 MB/s (data)   0.199 MB/s (open + 1st byte + data)

A read dir        cnt   1215    min    0.001 ms    max 2661.328 ms    mean    4.008 ms    stdev  81.513
B lstat file      cnt   1144    min    0.007 ms    max  377.476 ms    mean    1.827 ms    stdev  18.844
C open file       cnt    928    min    0.014 ms    max    1.596 ms    mean    0.021 ms    stdev   0.056
D rd 1st byte     cnt    928    min    0.015 ms    max 1936.262 ms    mean   25.187 ms    stdev 123.755
E read rate       9.199 MB/s (data)   0.792 MB/s (readdir + open + 1st byte + data)


Mounting with -o nobarrier, I get ...
######################################################################

1 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  78876    min    0.001 ms    max   19.803 ms    mean    0.013 ms    stdev   0.228
B lstat file      cnt  73624    min    0.006 ms    max   18.032 ms    mean    0.034 ms    stdev   0.210
C open file       cnt  57868    min    0.014 ms    max    0.041 ms    mean    0.017 ms    stdev   0.003
D rd 1st byte     cnt  57869    min    0.019 ms    max  417.725 ms    mean    0.225 ms    stdev   2.459
E read rate     177.779 MB/s (data)  63.375 MB/s (readdir + open + 1st byte + data)

3 readers (30s)
----------------------------------------------------------------------
A read dir        cnt  38209    min    0.001 ms    max   26.745 ms    mean    0.025 ms    stdev   0.472
B lstat file      cnt  35624    min    0.006 ms    max   26.019 ms    mean    0.048 ms    stdev   0.410
C open file       cnt  27874    min    0.014 ms    max    1.257 ms    mean    0.017 ms    stdev   0.008
D rd 1st byte     cnt  27874    min    0.020 ms    max 3197.520 ms    mean    0.626 ms    stdev  20.279
E read rate      98.242 MB/s (data)  27.763 MB/s (readdir + open + 1st byte + data)

3 readers, 3 writers (30s)
----------------------------------------------------------------------
F write open      cnt   5957    min    0.061 ms    max  591.787 ms    mean    0.457 ms    stdev   9.956
G wr 1st byte     cnt   5956    min    0.006 ms    max    0.136 ms    mean    0.007 ms    stdev   0.002
H write close     cnt   5957    min    0.017 ms    max 1340.145 ms    mean    0.818 ms    stdev  22.442
I mkdir           cnt    574    min    0.034 ms    max   11.094 ms    mean    0.083 ms    stdev   0.543
J write rate      9.766 MB/s (data)   8.705 MB/s (open + 1st byte + data)

A read dir        cnt  15183    min    0.001 ms    max  439.260 ms    mean    0.130 ms    stdev   4.150
B lstat file      cnt  14199    min    0.006 ms    max  200.212 ms    mean    0.152 ms    stdev   3.420
C open file       cnt  11250    min    0.014 ms    max    6.641 ms    mean    0.019 ms    stdev   0.084
D rd 1st byte     cnt  11250    min    0.021 ms    max 1649.488 ms    mean    1.715 ms    stdev  19.472
E read rate      52.022 MB/s (data)  10.858 MB/s (readdir + open + 1st byte + data)

amazing effect ...

Unfortunately the system also crashes quite often when running this
test, so I guess we have to wait a bit longer before this is ready
for prime time ...

A further observation is that both the RAID and the SATA case show
max latency values way over 1 second ... which is a bit much ... not
that other filesystems were much better, but then again it would be
cool if btrfs could best the others ...

cheers
tobi

-- 
Tobi Oetiker, OETIKER+PARTNER AG, Aarweg 15 CH-4600 Olten, Switzerland
http://it.oetiker.ch tobi@oetiker.ch ++41 62 775 9902 / sb: -9900

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-09-30 18:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-28  8:06 Benchmarking btrfs on HW Raid ... BAD Tobias Oetiker
2009-09-28  8:17 ` Florian Weimer
2009-09-28  9:19   ` Tobias Oetiker
2009-09-28  9:19   ` Daniel J Blueman
2009-09-28  9:39     ` Tobias Oetiker
2009-09-30 14:35       ` Ric Wheeler
2009-09-30 18:44         ` Tobias Oetiker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox