From: Stan Hoeppner <stan@hardwarefreak.com>
To: Peter Grandi <pg@lxra2.for.sabi.co.UK>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: RAID-10 explicitly defined drive pairs?
Date: Mon, 09 Jan 2012 21:54:56 -0600
Message-ID: <4F0BB690.4040904@hardwarefreak.com>
In-Reply-To: <20234.61352.443686.100881@tree.ty.sabi.co.UK>

On 1/9/2012 7:46 AM, Peter Grandi wrote:

> Those able to do a web search with the relevant keywords and
> read documentation can find some mentions of single SSD RMW and
> address/length alignment, for example here:
> 
>   http://research.cs.wisc.edu/adsl/Publications/ssd-usenix08.pdf
>   http://research.microsoft.com/en-us/projects/flashlight/winhec08-ssd.pptx
>   http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-09-2.pdf
> 
> Mentioned in passing as something pretty obvious, and there are
> other similar mentions that come up in web searches because it
> is a pretty natural application of thinking about RMW issues.

Yes, I've read such things.  I was alluding to the fact that there are
at least a half dozen different erase block sizes and algorithms in use
by different SSD manufacturers.  There is no standard, and not all of
them are published, so there is no reliable way to do such optimization
generically.

> Now I eagerly await your explanation of the amazing "Hoeppner
> effect" by which address/length aligned writes on RAID0/1/10
> have significant benefits and of the audacious "Hoeppner
> principle" by which 'concat' is as good as RAID0 over the same
> disks.

IIRC from a previous discussion I had with Neil Brown on this list,
mdraid0, as with all the striped array code, runs as a single kernel
thread, limiting its performance to that of a single CPU.  A linear
concatenation does not run as a kernel thread at all; it is simply an
offset calculation that, IIRC, executes on the same CPU as the caller.
Thus one can theoretically achieve near-100% CPU scalability when using
concat instead of mdraid0.  So the issue isn't partial stripe writes at
the media level, but the CPU overhead caused by millions of the little
bastards under heavy random IOPS workloads, along with the increased
number of smaller IOs through the SCSI/SATA interface, causing more
interrupts and thus more CPU time, etc.
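
If my recollection is right, this is easy enough to check from
userspace on any given box.  Purely as an illustration (thread names
vary by personality and device number):

  # Sample per-thread CPU usage; a saturated md kernel thread would
  # show up pinned near 100% of one core.  If no md thread appears at
  # all, the work is being done in the submitting processes' context.
  pidstat -t 1 5 | grep -E 'md[0-9]+_'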

I've not run into this single stripe-thread limitation myself, but I
have read multiple cases where OPs couldn't get maximum performance
from their storage hardware because their top-level mdraid stripe
thread was pegging a single CPU in their N-way system.  Moving from
RAID10 to a linear concat gets around this limitation for small-file
random IOPS workloads, though only when using XFS with a proper
allocation group (AG) configuration, obviously.  This is my
recollection of Neil's description of the code behavior.  I could very
well have misunderstood, and I'm sure he'll correct me if that's the
case, or you, or both. ;)
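
For concreteness, here is a minimal sketch of the layout I mean, with
hypothetical device names and an arbitrary four pairs:

  # Build the mirror pairs.
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[ab]
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sd[cd]
  mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sd[ef]
  mdadm --create /dev/md4 --level=1 --raid-devices=2 /dev/sd[gh]

  # Concatenate the pairs instead of striping them: no chunk-size
  # math, just an offset into whichever pair holds the target sector.
  mdadm --create /dev/md0 --level=linear --raid-devices=4 /dev/md[1-4]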

Dave Chinner had some input WRT XFS on concat for this type of
workload, stating that it's a little better than RAID10 (he was
ambiguous as to hardware vs. software RAID).  Did you read that thread,
Peter?  I know you're on the XFS list as well.  I can't exactly recall
Dave's specific reasoning at this time; I'll try to dig it up.  I'm
thinking it had to do with the different distribution of metadata IOs
between the two AG layouts, and with the total head seeking required
for the workload being somewhat higher for RAID10 than for the concat
of RAID1 pairs.  Again, I could be wrong on that, but it seems
familiar.  That discussion was many months ago.
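
If it was the AG angle, the usual trick, sketched here with made-up
numbers matching the example above, is to set the AG count to the
number of mirror pairs in the concat.  XFS places new directories in
different AGs, which spreads metadata and file IO across the pairs:

  # One allocation group per RAID1 pair (4 pairs in the sketch above).
  # XFS rotates new directories across AGs, so directory-heavy random
  # IO lands on all the pairs rather than piling onto one.
  mkfs.xfs -d agcount=4 /dev/md0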

-- 
Stan

Thread overview: 25+ messages
2011-12-12 11:54 RAID-10 explicitly defined drive pairs? Jan Kasprzak
2011-12-12 15:33 ` John Robinson
2012-01-06 15:08   ` Jan Kasprzak
2012-01-06 16:39     ` Peter Grandi
2012-01-06 19:16       ` Stan Hoeppner
2012-01-06 20:11       ` Jan Kasprzak
2012-01-06 22:55         ` Stan Hoeppner
2012-01-07 14:25           ` Peter Grandi
2012-01-07 16:25             ` Stan Hoeppner
2012-01-09 13:46               ` Peter Grandi
2012-01-10  3:54                 ` Stan Hoeppner [this message]
2012-01-10  4:13                   ` NeilBrown
2012-01-10 16:25                     ` Stan Hoeppner
2012-01-12 11:58                   ` Peter Grandi
2012-01-12 12:47               ` Peter Grandi
2012-01-12 21:24                 ` Stan Hoeppner
2012-01-06 20:55     ` NeilBrown
2012-01-06 21:02       ` Jan Kasprzak
2012-03-22 10:01       ` Alexander Lyakas
2012-03-22 10:31         ` NeilBrown
2012-03-25  9:30           ` Alexander Lyakas
2012-04-04 16:56             ` Alexander Lyakas
2014-06-09 14:26               ` Alexander Lyakas
2014-06-10  0:11                 ` NeilBrown
2014-06-11 16:05                   ` Alexander Lyakas
