Linux RAID subsystem development
From: Steven Haigh <netwiz@crc.id.au>
To: linux-raid@vger.kernel.org
Subject: Re: write performance of HW RAID VS MD RAID
Date: Thu, 11 Jun 2015 09:34:48 +1000
Message-ID: <21405395.SSfhQRvNar@dell15>
In-Reply-To: <20150611090054.18daac07@home.neil.brown.name>


On Thu, 11 Jun 2015 09:00:54 AM Neil Brown wrote:
> On Wed, 10 Jun 2015 15:27:07 -0700
> 
> Ming Lin <mlin@kernel.org> wrote:
> > Hi NeilBrown,
> > 
> > As you may have already seen, I ran a lot of tests with 10 HDDs for the patchset
> > "simplify block layer based on immutable biovecs"
> > 
> > Here is the summary.
> > http://minggr.net/pub/20150608/fio_results/summary.log
> > 
> > MD RAID6 read performance is OK.
> > But write performance is much lower than HW RAID6.
> > 
> > Is it a known issue?
> 
> It is not unexpected.
> There are two likely reasons.
> One is that HW RAID cards often have on-board NVRAM which is used as a
> write-behind cache.  This allows better throughput by hiding latency and
> more often gathering full-stripe writes.  HW RAID cards may also have
> accelerators for the parity calculations, but that is not likely to make a
> big difference. What sort of RAID6 controller do you have?
> 
> The other is that it is not easy for MD/RAID6 to schedule stripe writes
> optimally.  It doesn't really know if more writes are coming, so it should
> wait, or if it already has everything - so it should get to work straight
> away. It is possible that it could reply to writes as soon as they are in
> the (volatile) cache and only force things to storage when a REQ_FUA or
> REQ_FLUSH arrives.  That might help ... or it might corrupt filesystems :-(

And this here is the problem. Any conceptual change that risks filesystem
integrity, and therefore data integrity, is bad. Better benchmark numbers
alone are not worth the risk of losing data.

In a hardware card setup, one would hope that the write cache is battery 
backed - or flash - or something that won't lose data if the power goes out. 
When you're running this in software, you can't magically keep data if you 
lose power - so the longer something is not flushed to disk, the longer the 
risk period for a write.
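
To make the "risk period" concrete, here is a minimal sketch of how an
application can shorten it from userspace on Linux. The helper name
durable_write is made up for illustration. It assumes a POSIX system where a
directory can be opened and fsync'd. Note that fsync only asks the kernel to
push the data to stable storage; whether every layer below honours that is
a separate question.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path, then ask the kernel to flush it to stable storage.

    Hypothetical helper for illustration; assumes Linux/POSIX semantics.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Until this returns, the data may exist only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flush the containing directory as well, so the new directory entry
    # is not lost if power fails right after the file is created.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.bin")
        durable_write(target, b"payload that should survive power loss")
        print(os.path.getsize(target), "bytes written and flushed")
```

Everything written before the fsync call returns is still inside the risk
period described above; the call narrows that window, it does not remove it.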

If you extend this concept, a write is also not safe while it is in flight
between the kernel's write buffer and the (hopefully) battery-backed RAM on
the hardware card, should power be lost. Nor is it safe while the card is
writing to the physical disk - modern hard drives have massive caches! If the
drive has the write in its cache and loses power, is the data gone?

Guaranteed data integrity these days is a difficult subject. The kernel may say 
the data is written properly - but is it? The HW RAID card may say the data is 
written properly - but is it? Or is it still in cache? Or has it just hit the 
HDD cache?

What we currently have is a slight tradeoff in performance for a minimisation 
of risk (as far as practical anyway) - and I'm ok with this.
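
On the full-stripe point in the quoted text: the rough model below shows why
gathering full-stripe writes matters for RAID6. It is a simplified sketch,
not MD's actual scheduling logic. The function name raid6_write_ios is made
up, and it assumes all 10 HDDs from the test sit in a single RAID6 array
(8 data chunks plus 2 parity chunks per stripe) and that the cheaper of
read-modify-write and reconstruct-write is always chosen.

```python
def raid6_write_ios(total_disks: int, chunks_written: int) -> dict:
    """Estimate per-stripe disk I/Os for a RAID6 write (simplified model).

    total_disks    -- disks in the array, including the 2 parity disks
    chunks_written -- data chunks of one stripe touched by the write
    """
    if total_disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    data_disks = total_disks - 2
    if not 1 <= chunks_written <= data_disks:
        raise ValueError("chunks_written must be between 1 and data_disks")

    if chunks_written == data_disks:
        # Full stripe: parity is computed from the new data alone.
        return {"strategy": "full-stripe", "reads": 0, "writes": total_disks}

    # Read-modify-write: read the old data being replaced plus both parities.
    rmw_reads = chunks_written + 2
    # Reconstruct-write: read the data chunks that are NOT being written.
    rcw_reads = data_disks - chunks_written

    if rmw_reads <= rcw_reads:
        strategy, reads = "read-modify-write", rmw_reads
    else:
        strategy, reads = "reconstruct-write", rcw_reads

    # Either way, the new data chunks and both parity chunks are written.
    return {"strategy": strategy, "reads": reads, "writes": chunks_written + 2}


if __name__ == "__main__":
    for chunks in (1, 6, 8):
        print(chunks, raid6_write_ios(10, chunks))
```

In this model a single-chunk write to the 10-disk array costs 3 reads and
3 writes, while a full-stripe write costs 10 writes and no reads. A
write-behind cache that can wait for the rest of the stripe avoids those
reads, which is the advantage the NVRAM on a HW RAID card provides.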

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


Thread overview: 9+ messages
2015-06-10 22:27 write performance of HW RAID VS MD RAID Ming Lin
2015-06-10 23:00 ` Neil Brown
2015-06-10 23:34   ` Steven Haigh [this message]
2015-06-10 23:59   ` Ming Lin
2015-06-11  0:28     ` Neil Brown
2015-06-11  0:27 ` Roman Mamedov
2015-06-11  5:39   ` AW: " Markus Stockhausen
2015-06-11  6:02     ` Ming Lin
2015-06-12 17:20       ` Ming Lin
