* write performance of HW RAID VS MD RAID
@ 2015-06-10 22:27 Ming Lin
2015-06-10 23:00 ` Neil Brown
2015-06-11 0:27 ` Roman Mamedov
0 siblings, 2 replies; 9+ messages in thread
From: Ming Lin @ 2015-06-10 22:27 UTC (permalink / raw)
To: linux-raid; +Cc: Neil Brown
Hi NeilBrown,
As you may have already seen, I ran a lot of tests with 10 HDDs for the patchset
"simplify block layer based on immutable biovecs"
Here is the summary.
http://minggr.net/pub/20150608/fio_results/summary.log
MD RAID6 read performance is OK.
But write performance is much lower than HW RAID6.
Is it a known issue?
Thanks.
* Re: write performance of HW RAID VS MD RAID
2015-06-10 22:27 write performance of HW RAID VS MD RAID Ming Lin
@ 2015-06-10 23:00 ` Neil Brown
2015-06-10 23:34 ` Steven Haigh
2015-06-10 23:59 ` Ming Lin
2015-06-11 0:27 ` Roman Mamedov
1 sibling, 2 replies; 9+ messages in thread
From: Neil Brown @ 2015-06-10 23:00 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-raid
On Wed, 10 Jun 2015 15:27:07 -0700
Ming Lin <mlin@kernel.org> wrote:
> Hi NeilBrown,
>
> As you may already see, I run a lot of tests with 10 HDDs for the patchset
> "simplify block layer based on immutable biovecs"
>
> Here is the summary.
> http://minggr.net/pub/20150608/fio_results/summary.log
>
> MD RAID6 read performance is OK.
> But write performance is much lower than HW RAID6.
>
> Is it a known issue?
It is not unexpected.
There are two likely reasons.
One is that HW RAID cards often have on-board NVRAM which is used as a
write-behind cache. This allows better throughput by hiding latency and more
often gathering full-stripe writes. HW RAID cards may also have accelerators
for the parity calculations, but that is not likely to make a big difference.
What sort of RAID6 controller do you have?
The other is that it is not easy for MD/RAID6 to schedule stripe writes
optimally. It doesn't really know whether more writes are coming, in which case
it should wait, or whether it already has everything, in which case it should
get to work straight away.
It is possible that it could reply to writes as soon as they are in the
(volatile) cache and only force things to storage when a REQ_FUA or REQ_FLUSH
arrives. That might help ... or it might corrupt filesystems :-(
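To make the flush point concrete: seen from user space, the moment an application asks for durability is typically an fsync() call, which the filesystem normally turns into a cache flush request to the block device underneath it. The following is only a minimal Python sketch of that application side; the helper name `durable_write` and any path passed to it are hypothetical, and nothing here describes what MD does internally.

```python
import os

def durable_write(path, data):
    """Write data to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Until this call returns, the data may exist only in volatile caches.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Everything written before the fsync() is the window in which an early acknowledgement from a volatile cache could lose data on power failure.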
As long as the patches don't make things obviously worse, I'm happy.
Thanks,
NeilBrown
* Re: write performance of HW RAID VS MD RAID
2015-06-10 23:00 ` Neil Brown
@ 2015-06-10 23:34 ` Steven Haigh
2015-06-10 23:59 ` Ming Lin
1 sibling, 0 replies; 9+ messages in thread
From: Steven Haigh @ 2015-06-10 23:34 UTC (permalink / raw)
To: linux-raid
[-- Attachment #1: Type: text/plain, Size: 2852 bytes --]
On Thu, 11 Jun 2015 09:00:54 AM Neil Brown wrote:
> On Wed, 10 Jun 2015 15:27:07 -0700
>
> Ming Lin <mlin@kernel.org> wrote:
> > Hi NeilBrown,
> >
> > As you may already see, I run a lot of tests with 10 HDDs for the patchset
> > "simplify block layer based on immutable biovecs"
> >
> > Here is the summary.
> > http://minggr.net/pub/20150608/fio_results/summary.log
> >
> > MD RAID6 read performance is OK.
> > But write performance is much lower than HW RAID6.
> >
> > Is it a known issue?
>
> It is not unexpected.
> There are two likely reasons.
> One is that HW RAID cards often have on-board NVRAM which is used as a
> write-behind cache. This allows better throughput by hiding latency and
> more often gathering full-stripe writes. HW RAID cards may also have
> accelerators for the parity calculations, but that is not likely to make a
> big difference. What sort of RAID6 controller do you have?
>
> The other is that it is not easy for MD/RAID6 to schedule writes stripes
> optimally. It doesn't really know if more writes are coming, so it should
> wait, or if it already has everything - so it should get to work straight
> away. It is possible that it could reply to writes as soon as they are in
> the (volatile) cache and only force things to storage when a REQ_FUA or
> REQ_FLUSH arrives. That might help ... or it might corrupt filesystems :-(
And this here is the problem. Any conceptual changes that risk filesystem and
therefore data integrity are bad. For something as simple as benchmarks it
isn't really worth the risk of losing data integrity.
In a hardware card setup, one would hope that the write cache is battery
backed - or flash - or something that won't lose data if the power goes out.
When you're running this in software, you can't magically keep data if you
lose power - so the longer something is not flushed to disk, the longer the
risk period for a write.
If you want to extend this concept - then you're not safe from writes between
the write buffer in the kernel and the (hopefully) battery backed RAM on the
hardware card if power is lost. You're also not safe when the card is writing
to the physical disk - modern hard drives have massive caches! If the drive
has the write in its cache and loses power, is the data gone?
Guaranteed data integrity these days is a difficult subject. The kernel may say
the data is written properly - but is it? The HW RAID card may say the data is
written properly - but is it? Or is it still in cache? Or has it just hit the
HDD cache?
What we currently have trades a little performance for minimising risk
(as far as practical anyway) - and I'm OK with this.
--
Steven Haigh
Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
* Re: write performance of HW RAID VS MD RAID
2015-06-10 23:00 ` Neil Brown
2015-06-10 23:34 ` Steven Haigh
@ 2015-06-10 23:59 ` Ming Lin
2015-06-11 0:28 ` Neil Brown
1 sibling, 1 reply; 9+ messages in thread
From: Ming Lin @ 2015-06-10 23:59 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Wed, Jun 10, 2015 at 4:00 PM, Neil Brown <neilb@suse.de> wrote:
> On Wed, 10 Jun 2015 15:27:07 -0700
> Ming Lin <mlin@kernel.org> wrote:
>
>> Hi NeilBrown,
>>
>> As you may already see, I run a lot of tests with 10 HDDs for the patchset
>> "simplify block layer based on immutable biovecs"
>>
>> Here is the summary.
>> http://minggr.net/pub/20150608/fio_results/summary.log
>>
>> MD RAID6 read performance is OK.
>> But write performance is much lower than HW RAID6.
>>
>> Is it a known issue?
>
> It is not unexpected.
> There are two likely reasons.
> One is that HW RAID cards often have on-board NVRAM which is used as a
> write-behind cache. This allows better throughput by hiding latency and more
> often gathering full-stripe writes. HW RAID cards may also have accelerators
> for the parity calculations, but that is not likely to make a big difference.
> What sort of RAID6 controller do you have?
PERC H730 Mini
* Re: write performance of HW RAID VS MD RAID
2015-06-10 22:27 write performance of HW RAID VS MD RAID Ming Lin
2015-06-10 23:00 ` Neil Brown
@ 2015-06-11 0:27 ` Roman Mamedov
2015-06-11 5:39 ` AW: " Markus Stockhausen
1 sibling, 1 reply; 9+ messages in thread
From: Roman Mamedov @ 2015-06-11 0:27 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-raid, Neil Brown
[-- Attachment #1: Type: text/plain, Size: 611 bytes --]
On Wed, 10 Jun 2015 15:27:07 -0700
Ming Lin <mlin@kernel.org> wrote:
> Hi NeilBrown,
>
> As you may already see, I run a lot of tests with 10 HDDs for the patchset
> "simplify block layer based on immutable biovecs"
>
> Here is the summary.
> http://minggr.net/pub/20150608/fio_results/summary.log
>
> MD RAID6 read performance is OK.
> But write performance is much lower than HW RAID6.
>
> Is it a known issue?
Did you tune the stripe_cache_size for the array? Try 32768.
https://peterkieser.com/2009/11/29/raid-mdraid-stripe_cache_size-vs-write-transfer/
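For anyone trying this: stripe_cache_size is set through sysfs, and the kernel md documentation gives its memory cost as page size times number of member devices times stripe_cache_size. A minimal Python sketch follows, assuming the array is md0 (the thread does not name the device) and a 4 KiB page size; both helper names are made up for illustration.

```python
def stripe_cache_bytes(stripe_cache_size, nr_disks, page_size=4096):
    """Memory used by the md stripe cache: page_size * nr_disks * entries."""
    return page_size * nr_disks * stripe_cache_size

def set_stripe_cache_size(entries, md_name="md0"):
    """Write the new value to sysfs. Needs root; md0 is an assumed name."""
    path = f"/sys/block/{md_name}/md/stripe_cache_size"
    with open(path, "w") as f:
        f.write(str(entries))

if __name__ == "__main__":
    # 10 member disks as in the test setup; 32768 entries as suggested.
    cost = stripe_cache_bytes(32768, nr_disks=10)
    print(f"stripe cache would use about {cost / 2**20:.0f} MiB")
```

With 10 disks and 4 KiB pages, 32768 entries comes to 1280 MiB of RAM, so the setting is not free.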
--
With respect,
Roman
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: write performance of HW RAID VS MD RAID
2015-06-10 23:59 ` Ming Lin
@ 2015-06-11 0:28 ` Neil Brown
0 siblings, 0 replies; 9+ messages in thread
From: Neil Brown @ 2015-06-11 0:28 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-raid
On Wed, 10 Jun 2015 16:59:10 -0700
Ming Lin <mlin@kernel.org> wrote:
> On Wed, Jun 10, 2015 at 4:00 PM, Neil Brown <neilb@suse.de> wrote:
> > On Wed, 10 Jun 2015 15:27:07 -0700
> > Ming Lin <mlin@kernel.org> wrote:
> >
> >> Hi NeilBrown,
> >>
> >> As you may already see, I run a lot of tests with 10 HDDs for the patchset
> >> "simplify block layer based on immutable biovecs"
> >>
> >> Here is the summary.
> >> http://minggr.net/pub/20150608/fio_results/summary.log
> >>
> >> MD RAID6 read performance is OK.
> >> But write performance is much lower than HW RAID6.
> >>
> >> Is it a known issue?
> >
> > It is not unexpected.
> > There are two likely reasons.
> > One is that HW RAID cards often have on-board NVRAM which is used as a
> > write-behind cache. This allows better throughput by hiding latency and more
> > often gathering full-stripe writes. HW RAID cards may also have accelerators
> > for the parity calculations, but that is not likely to make a big difference.
> > What sort of RAID6 controller do you have?
>
> PERC H730 Mini
http://www.dell.com/learn/us/en/04/campaigns/dell-raid-controllers
1GB NV Flash Backed Cache on the H730
That would explain a lot of performance difference for writes.
NeilBrown
* AW: write performance of HW RAID VS MD RAID
2015-06-11 0:27 ` Roman Mamedov
@ 2015-06-11 5:39 ` Markus Stockhausen
2015-06-11 6:02 ` Ming Lin
0 siblings, 1 reply; 9+ messages in thread
From: Markus Stockhausen @ 2015-06-11 5:39 UTC (permalink / raw)
To: Ming Lin; +Cc: linux-raid@vger.kernel.org, Neil Brown, Roman Mamedov
[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]
> From: linux-raid-owner@vger.kernel.org [linux-raid-owner@vger.kernel.org] on behalf of Roman Mamedov [rm@romanrm.net]
> Sent: Thursday, 11 June 2015 02:27
> To: Ming Lin
> Cc: linux-raid@vger.kernel.org; Neil Brown
> Subject: Re: write performance of HW RAID VS MD RAID
>
> On Wed, 10 Jun 2015 15:27:07 -0700
> Ming Lin <mlin@kernel.org> wrote:
>
> > Hi NeilBrown,
> >
> > As you may already see, I run a lot of tests with 10 HDDs for the patchset
> > "simplify block layer based on immutable biovecs"
> >
> > Here is the summary.
> > http://minggr.net/pub/20150608/fio_results/summary.log
> >
> > MD RAID6 read performance is OK.
> > But write performance is much lower than HW RAID6.
> >
> > Is it a known issue?
>
> Did you tune the stripe_cache_size for the array? Try 32768.
> https://peterkieser.com/2009/11/29/raid-mdraid-stripe_cache_size-vs-write-transfer/
+1 for giving an increased cache size a try.
From the numbers I assume that you are doing sequential
read/write tests. Otherwise I would expect a write penalty for
the HW RAID setup too.
Markus
* Re: write performance of HW RAID VS MD RAID
2015-06-11 5:39 ` AW: " Markus Stockhausen
@ 2015-06-11 6:02 ` Ming Lin
2015-06-12 17:20 ` Ming Lin
0 siblings, 1 reply; 9+ messages in thread
From: Ming Lin @ 2015-06-11 6:02 UTC (permalink / raw)
To: Markus Stockhausen; +Cc: Roman Mamedov, linux-raid@vger.kernel.org, Neil Brown
On Wed, Jun 10, 2015 at 10:39 PM, Markus Stockhausen
<stockhausen@collogia.de> wrote:
>> From: linux-raid-owner@vger.kernel.org [linux-raid-owner@vger.kernel.org] on behalf of Roman Mamedov [rm@romanrm.net]
>> Sent: Thursday, 11 June 2015 02:27
>> To: Ming Lin
>> Cc: linux-raid@vger.kernel.org; Neil Brown
>> Subject: Re: write performance of HW RAID VS MD RAID
>>
>> On Wed, 10 Jun 2015 15:27:07 -0700
>> Ming Lin <mlin@kernel.org> wrote:
>>
>> > Hi NeilBrown,
>> >
>> > As you may already see, I run a lot of tests with 10 HDDs for the patchset
>> > "simplify block layer based on immutable biovecs"
>> >
>> > Here is the summary.
>> > http://minggr.net/pub/20150608/fio_results/summary.log
>> >
>> > MD RAID6 read performance is OK.
>> > But write performance is much lower than HW RAID6.
>> >
>> > Is it a known issue?
>>
>> Did you tune the stripe_cache_size for the array? Try 32768.
>> https://peterkieser.com/2009/11/29/raid-mdraid-stripe_cache_size-vs-write-transfer/
>
> +1 for giving an increased cache size a try.
Will try it.
>
> From the numbers I anticipate that you are doing sequential
> read/write tests. Otherwise I would expect a write penalty for
> the HW RAID setup too.
Yes,
[global]
ioengine=libaio
iodepth=64
direct=1
runtime=1800
time_based
group_reporting
numjobs=48
gtod_reduce=0
norandommap
write_iops_log=fs
[job1]
bs=640K
directory=/mnt
size=5G
rw=write
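A side note on the block size: whether a 640K write can be handled as a full-stripe write depends on the array geometry. The sketch below is a rough Python illustration, assuming 10 member disks with 2 used for parity and a 64 KiB per-disk chunk (the chunk size is an assumption, not stated in this job file), and treating write offsets as array offsets, which ignores filesystem layout. The function names are hypothetical.

```python
KIB = 1024

def full_stripe_bytes(nr_disks, chunk_bytes, parity_disks=2):
    """Data bytes in one full stripe: chunk size times number of data disks."""
    return (nr_disks - parity_disks) * chunk_bytes

def split_write(offset, length, stripe_bytes):
    """Return (full_stripes, partial_bytes) covered by one write."""
    end = offset + length
    first_full = -(-offset // stripe_bytes) * stripe_bytes  # round offset up
    last_full = (end // stripe_bytes) * stripe_bytes        # round end down
    if last_full <= first_full:
        return 0, length                                    # no whole stripe covered
    full = (last_full - first_full) // stripe_bytes
    return full, length - full * stripe_bytes

if __name__ == "__main__":
    stripe = full_stripe_bytes(nr_disks=10, chunk_bytes=64 * KIB)
    print(stripe // KIB, "KiB per full stripe")
    for i in range(4):
        # Consecutive 640 KiB writes, as one sequential job would issue them.
        print(i, split_write(i * 640 * KIB, 640 * KIB, stripe))
```

Under those assumptions a full stripe holds 512 KiB of data, so a single 640 KiB write never lines up with stripe boundaries on its own; MD has to combine neighbouring writes in the stripe cache to get full-stripe writes.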
* Re: write performance of HW RAID VS MD RAID
2015-06-11 6:02 ` Ming Lin
@ 2015-06-12 17:20 ` Ming Lin
0 siblings, 0 replies; 9+ messages in thread
From: Ming Lin @ 2015-06-12 17:20 UTC (permalink / raw)
To: Ming Lin
Cc: Markus Stockhausen, Roman Mamedov, linux-raid@vger.kernel.org,
Neil Brown
On Wed, Jun 10, 2015 at 11:02 PM, Ming Lin <mlin@kernel.org> wrote:
> On Wed, Jun 10, 2015 at 10:39 PM, Markus Stockhausen
> <stockhausen@collogia.de> wrote:
>>> From: linux-raid-owner@vger.kernel.org [linux-raid-owner@vger.kernel.org] on behalf of Roman Mamedov [rm@romanrm.net]
>>> Sent: Thursday, 11 June 2015 02:27
>>> To: Ming Lin
>>> Cc: linux-raid@vger.kernel.org; Neil Brown
>>> Subject: Re: write performance of HW RAID VS MD RAID
>>>
>>> On Wed, 10 Jun 2015 15:27:07 -0700
>>> Ming Lin <mlin@kernel.org> wrote:
>>>
>>> > Hi NeilBrown,
>>> >
>>> > As you may already see, I run a lot of tests with 10 HDDs for the patchset
>>> > "simplify block layer based on immutable biovecs"
>>> >
>>> > Here is the summary.
>>> > http://minggr.net/pub/20150608/fio_results/summary.log
>>> >
>>> > MD RAID6 read performance is OK.
>>> > But write performance is much lower than HW RAID6.
>>> >
>>> > Is it a known issue?
>>>
>>> Did you tune the stripe_cache_size for the array? Try 32768.
>>> https://peterkieser.com/2009/11/29/raid-mdraid-stripe_cache_size-vs-write-transfer/
>>
>> +1 for giving an increased cache size a try.
>
> Will try it.
>
>>
>> From the numbers I anticipate that you are doing sequential
>> read/write tests. Otherwise I would expect a write penalty for
>> the HW RAID setup too.
>
> Yes,
>
> [global]
> ioengine=libaio
> iodepth=64
> direct=1
> runtime=1800
> time_based
> group_reporting
> numjobs=48
> gtod_reduce=0
> norandommap
> write_iops_log=fs
>
> [job1]
> bs=640K
> directory=/mnt
> size=5G
> rw=write
I tested XFS writes on MD RAID6 (stripe size 64k) with different stripe_cache_size values.
stripe_cache_size    throughput (MB/s)
-----------------    -----------------
              256                181.7
              512                185.1
              768                178.3
             1024                194.4
             2048                227
             4096                247
             8192                300.3
            16834                312.4
            32768                304
For comparison, XFS write throughput on HW RAID6 is 753.16 MB/s.
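To put these numbers side by side, here is a small Python sketch that computes the gain from tuning and how far the best MD result remains from the HW RAID6 figure. The values are copied from the table above exactly as reported; the function and variable names are only illustrative.

```python
# stripe_cache_size -> throughput in MB/s, as reported in the table above
md_results = {
    256: 181.7, 512: 185.1, 768: 178.3, 1024: 194.4, 2048: 227.0,
    4096: 247.0, 8192: 300.3, 16834: 312.4, 32768: 304.0,
}
HW_RAID6_MBPS = 753.16

def summarize(results, hw_mbps):
    """Compare the best MD result with the smallest setting and with HW RAID."""
    best_size = max(results, key=results.get)
    best = results[best_size]
    baseline = results[min(results)]      # throughput at the smallest cache size
    return {
        "best_size": best_size,
        "gain_over_smallest": best / baseline,
        "fraction_of_hw": best / hw_mbps,
    }

if __name__ == "__main__":
    s = summarize(md_results, HW_RAID6_MBPS)
    print(f"best stripe_cache_size: {s['best_size']}")
    print(f"gain over smallest setting: {s['gain_over_smallest']:.2f}x")
    print(f"fraction of HW RAID6 throughput: {s['fraction_of_hw']:.0%}")
```

On these figures, tuning improves MD write throughput by roughly 1.7x, yet the best result is still only about 41% of the HW RAID6 number.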
end of thread, other threads:[~2015-06-12 17:20 UTC | newest]
Thread overview: 9+ messages
2015-06-10 22:27 write performance of HW RAID VS MD RAID Ming Lin
2015-06-10 23:00 ` Neil Brown
2015-06-10 23:34 ` Steven Haigh
2015-06-10 23:59 ` Ming Lin
2015-06-11 0:28 ` Neil Brown
2015-06-11 0:27 ` Roman Mamedov
2015-06-11 5:39 ` AW: " Markus Stockhausen
2015-06-11 6:02 ` Ming Lin
2015-06-12 17:20 ` Ming Lin