From: NeilBrown <neilb@suse.de>
To: stan@hardwarefreak.com
Cc: Peter Grandi <pg@lxra2.for.sabi.co.UK>,
	Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: RAID-10 explicitly defined drive pairs?
Date: Tue, 10 Jan 2012 15:13:36 +1100
Message-ID: <20120110151336.568f5da4@notabene.brown>
In-Reply-To: <4F0BB690.4040904@hardwarefreak.com>

On Mon, 09 Jan 2012 21:54:56 -0600 Stan Hoeppner <stan@hardwarefreak.com>
wrote:

> On 1/9/2012 7:46 AM, Peter Grandi wrote:
> 
> > Those able to do a web search with the relevant keywords and
> > read documentation can find some mentions of single SSD RMW and
> > address/length alignment, for example here:
> > 
> >   http://research.cs.wisc.edu/adsl/Publications/ssd-usenix08.pdf
> >   http://research.microsoft.com/en-us/projects/flashlight/winhec08-ssd.pptx
> >   http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-09-2.pdf
> > 
> > Mentioned in passing as something pretty obvious, and there are
> > other similar mentions that come up in web searches because it
> > is a pretty natural application of thinking about RMW issues.
> 
> Yes, I've read such things.  I was alluding to the fact that there are at
> least a half dozen different erase block sizes and algorithms in use by
> different SSD manufacturers.  There is no standard.  And not all of them
> are published.  There is no reliable way to do such optimization
> generically.
> 
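
The alignment test itself is trivial once you assume an erase-block
size; the hard part, as you say, is that the real figure varies by
vendor and is rarely published.  A sketch, with a purely made-up size:

/*
 * Illustration only: checks whether a write touches nothing but whole
 * erase blocks.  ASSUMED_ERASE_BLOCK is a guess, not a standard; real
 * devices differ and rarely publish the true value.
 */
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_ERASE_BLOCK (512u * 1024u)	/* bytes; illustrative guess */

/* True if the write begins and ends on erase-block boundaries, so it
 * rewrites only whole blocks and cannot force a read-modify-write of
 * a partial one. */
static bool write_is_aligned(uint64_t offset, uint64_t len)
{
	return (offset % ASSUMED_ERASE_BLOCK) == 0 &&
	       (len    % ASSUMED_ERASE_BLOCK) == 0;
}
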
> > Now I eagerly await your explanation of the amazing "Hoeppner
> > effect" by which address/length aligned writes on RAID0/1/10
> > have significant benefits and of the audacious "Hoeppner
> > principle" by which 'concat' is as good as RAID0 over the same
> > disks.
> 
> IIRC from a previous discussion I had with Neil Brown on this list,
> mdraid0, as with all the striped array code, runs as a single kernel
> thread, limiting its performance to that of a single CPU.  A linear
> concatenation does not run as a single kernel thread, but is simply an
> offset calculation routine that, IIRC, executes on the same CPU as the
> caller.  Thus one can theoretically achieve near 100% CPU scalability
> when using concat instead of mdraid0.  So the issue isn't partial stripe
> writes at the media level, but the CPU overhead caused by millions of
> the little bastards with heavy random IOPS workloads, along with
> increased numbers of smaller IOs through the SCSI/SATA interface,
> causing more interrupts and thus more CPU time, etc.
> 
> I've not run into this single stripe-thread limitation myself, but have
> read about multiple cases where OPs couldn't get maximum performance from
> their storage hardware because their top-level mdraid stripe thread was
> maxing out a single CPU in their X-way system.  Moving from RAID10 to a linear
> concat gets around this limitation for small file random IOPS workloads.
>  Only when using XFS and a proper AG configuration, obviously.  This is
> my recollection of Neil's description of the code behavior.  I could
> very well have misunderstood, and I'm sure he'll correct me if that's
> the case, or you, or both. ;)

(oh dear, someone is Wrong on the Internet! Quick, duck into the telephone
booth and pop out as ....)

Hi Stan,
 I think you must be misremembering.
Neither RAID0 nor Linear has any threads involved.  They just redirect the
request to the appropriate devices.  Multiple threads can submit multiple
requests down through RAID0 and Linear concurrently.
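
Conceptually the Linear mapping is just a table lookup plus a
subtraction, performed in the context of whichever thread submitted
the I/O.  Very roughly (a simplified sketch with made-up names, not
the actual drivers/md/linear.c):

#include <stddef.h>

typedef unsigned long long sector_t;

struct member {
	sector_t end_sector;	/* cumulative end of this device's range */
	int	 dev_index;	/* which underlying device gets the I/O */
};

/*
 * Map a logical sector to a member device, rewriting *sector to the
 * offset within that member.  No thread, no queue: the caller's own
 * context does the work and submits to the member device directly.
 */
static struct member *linear_map(struct member *members, size_t n,
				 sector_t *sector)
{
	for (size_t i = 0; i < n; i++) {
		if (*sector < members[i].end_sector) {
			sector_t start = i ? members[i - 1].end_sector : 0;
			*sector -= start;
			return &members[i];
		}
	}
	return NULL;	/* beyond the end of the array */
}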

RAID1, RAID10, and RAID5/6 are different.  For reads they normally have
no contention with other requests, but for writes things do get
single-threaded at some point.
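
As a userspace analogy only (not the kernel code): the write side
behaves like many submitters feeding work to a single per-array
daemon thread, in the style of md's raid5d, which becomes the
serialization point:

#include <pthread.h>
#include <stddef.h>

struct work {
	struct work *next;	/* e.g. a stripe needing a parity update */
};

static struct work *pending;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  more = PTHREAD_COND_INITIALIZER;

/* Called concurrently from any number of submitting threads. */
void submit(struct work *w)
{
	pthread_mutex_lock(&lock);
	w->next = pending;
	pending = w;
	pthread_cond_signal(&more);
	pthread_mutex_unlock(&lock);
}

/* ONE of these per array: every write funnels through it eventually. */
void *array_daemon(void *unused)
{
	(void)unused;
	for (;;) {
		pthread_mutex_lock(&lock);
		while (!pending)
			pthread_cond_wait(&more, &lock);
		struct work *w = pending;
		pending = w->next;
		pthread_mutex_unlock(&lock);
		/* ... compute parity, update the bitmap, issue writes ... */
		(void)w;
	}
	return NULL;
}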

Hm... your text above sometimes talks about RAID0 vs Linear, and sometimes
about RAID10 vs Linear.  So maybe you are remembering correctly, but
presenting it incorrectly in part ....

NeilBrown

> 
> Dave Chinner had some input WRT XFS on concat for this type of workload,
> stating it's a little better than RAID10 (ambiguous as to hardware or
> software RAID).  Did you read that thread, Peter?  I know you're on the
> XFS list as well.  I can't exactly recall Dave's specific reasoning at
> this time; I'll try to dig it up.  I'm thinking it had to do with the
> different distribution of metadata IOs between the two AG layouts, and
> the amount of total head seeking required for the workload being somewhat
> higher for RAID10 than for the concat of RAID1 pairs.  Again, I could be
> wrong on that, but it seems familiar.  That discussion was many months ago.
> 


