From: Marc MERLIN <marc@merlins.org>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Very long raid5 init/rebuild times
Date: Wed, 22 Jan 2014 09:48:54 -0800
Message-ID: <20140122174854.GF26014@merlins.org>
In-Reply-To: <52DF7976.6070808@hardwarefreak.com>

On Wed, Jan 22, 2014 at 01:55:34AM -0600, Stan Hoeppner wrote:
> >> Question #1:
> >> Is it better to dmcrypt the 5 drives and then make a raid5 on top, or the opposite
> >> (raid5 first, and then dmcrypt)
> 
> For maximum throughput and to avoid hitting a ceiling with one thread on
> one core, using one dmcrypt thread per physical device is a way to
> achieve this.
 
There is that, but at rebuild time, if dmcrypt is on top of raid5, the raid5
rebuild would happen without going through encryption, and hence would save
5 cores' worth of encryption bandwidth (for 5 drives), would it not?

I agree that during non-rebuild operation, I do get 5 cores of encryption
bandwidth instead of 1, so if I'm willing to pay the CPU cost at rebuild
time, it may be a good thing anyway.

> >> I used:
> >> cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain64 /dev/sd[mnopq]1
> 
> Changing the key size or the encryption method may decrease latency a
> bit, but likely not enough.

Ok, thanks.

> > I should have said that this is seemingly a stupid question since obviously
> > if you encrypt each drive separately, you're going through the encryption
> > layer 5 times during rebuilds instead of just once.
> 
> Each dmcrypt thread is handling 1/5th of the IOs.  The low init
> throughput isn't caused by using 5 threads.  One thread would likely do
> no better.

If crypt is on top of raid5, it seems (and that makes sense) that no
encryption is needed for the rebuild. However, in my test I can confirm that
the rebuild time is exactly the same. I only get 19MB/s of rebuild bandwidth
and I think that's because of the port multiplier.
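
For scale, here is a back-of-the-envelope sketch of what 19MB/s means for a
full rebuild. The 4TB member size is an assumption (the thread only mentions
16TB in total on a 5-drive raid5), as are the helper name rebuild_eta and
the md0 array name; decimal units are used throughout.

```shell
#!/bin/sh
# Estimate how long a full-member rebuild takes at a steady rate.
# $1 = member size in bytes, $2 = rebuild rate in bytes per second.
rebuild_eta() {
    secs=$(( $1 / $2 ))
    printf '%dh%02dm\n' $(( secs / 3600 )) $(( secs % 3600 / 60 ))
}

# Assumed 4 TB member at the 19 MB/s rate reported above.
rebuild_eta 4000000000000 19000000

# To watch the live rate instead (md0 is a placeholder array name):
#   cat /proc/mdstat
#   cat /sys/block/md0/md/sync_speed    # current speed in K/sec
```

With those assumed numbers the estimate comes to 58h28m.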

> > However in my case, I'm not CPU-bound, so that didn't seem to be an issue
> > and I was more curious to know if the dmcrypt and dmraid5 layers stacked the
> > same regardless of which one was on top and which one at the bottom.
> 
> You are not CPU bound, nor hardware bandwidth bound.  You are latency
> bound, just like every dmcrypt user.  dmcrypt adds a non trivial amount
> of latency to every IO.  Latency with serial IO equals low throughput.

Are you sure that applies here, during the rebuild? I see no crypt thread
running.

> Experiment with these things to increase throughput.  If you're using
> the CFQ elevator switch to deadline.  Try smaller md chunk sizes, key
> lengths, different ciphers, etc.  Turn off automatic CPU frequency
> scaling.  I've read reports of encryption causing the frequency to drop
> instead of increase.

I'll check those too, they can't hurt.
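
The scheduler and frequency-scaling suggestions can be checked from sysfs
before changing anything. A minimal sketch, assuming the members are still
sdm through sdq as in the luksFormat command quoted earlier; active_scheduler
is a hypothetical helper name, and the commented-out writes need root.

```shell
#!/bin/sh
# Print the active I/O scheduler from a sysfs "scheduler" file.
# The kernel marks the active one with square brackets.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# sdm..sdq are the member names from the luksFormat command.
for dev in sdm sdn sdo sdp sdq; do
    f=/sys/block/$dev/queue/scheduler
    if [ -r "$f" ]; then
        echo "$dev: $(active_scheduler "$f")"
    fi
done

# Switching the elevator, and pinning the clock with the performance
# governor (one common way to stop frequency scaling), both need root:
#   echo deadline > /sys/block/sdm/queue/scheduler
#   for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
#       echo performance > "$g"
#   done
```

Devices that are not present are skipped silently, so the loop is safe to
run on a machine with different drive names.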

> Once in production, if your application workloads do 1 or 2 above then
> you may see higher throughput than the 18MB/s you see with the init.  If
> your workloads are serial maybe not much more.
 
I expect to see more because the drives will move inside the array that is
directly connected to the SATA card without going through a PMP (with PMP
all the SATA IO is shared on a single SATA chip).

> Common sense says that encrypting 16TB of storage at the block level,
> using software libraries and optimized CPU instructions, is not a smart
> thing to do.  Not if one desires decent performance, and especially if
> one doesn't need all 16TB encrypted.

I encrypt everything now because I think it's good general hygiene, and I
don't want to think about where my drives and data end up 5 years later, or
worry if they get stolen.
Software encryption on Linux has been close enough to wire speed for a
little while now. I encrypt the 500MB/s-capable SSD in my laptop and barely
see slowdowns (except a bit of extra latency, as you point out).
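
One way to test the wire-speed claim on a given machine is to check for the
AES CPU flag and run cryptsetup's built-in benchmark, which works in memory
only and so takes the disks and the port multiplier out of the picture. A
sketch, assuming cryptsetup 1.6 or later (where the benchmark action exists);
has_aes_flag and bench_cipher are hypothetical helper names.

```shell
#!/bin/sh
# Succeed if the given cpuinfo file (default /proc/cpuinfo) lists "aes".
has_aes_flag() {
    grep -qw aes "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Benchmark the cipher and key size from the luksFormat command.
# Memory-only: results are an upper bound, not real disk throughput.
bench_cipher() {
    cryptsetup benchmark --cipher aes-xts-plain64 --key-size 256
}

if has_aes_flag; then
    echo "CPU advertises AES instructions"
else
    echo "no aes flag found; expect slower software AES"
fi

# Run the benchmark by hand when cryptsetup is installed:
#   bench_cipher
```

If the benchmark figure is far above 19MB/s, the cipher itself is unlikely
to be what limits the rebuild.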

> If you in fact don't need all 16TB encrypted, and I'd argue very few do,
> especially John and Jane Doe, then tear this down, build a regular
> array, and maintain an encrypted directory or few.

Not bad advice in general.

> If you actually *need* to encrypt all 16TB at the block level, and
> require decent performance, you need to acquire a dedicated crypto
> board.  One board will cost more than your complete server.  The cost of
> such devices should be a strong clue as to who does and does not need to
> encrypt their entire storage.

I'm not actually convinced that the CPU is the bottleneck, and as pointed
out, if I put dmcrypt on top of raid5, the rebuild happens without any
encryption.
Or did I miss something?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
