From: Ric Wheeler <ric@emc.com>
To: Mark Hahn <hahn@physics.mcmaster.ca>
Cc: Dan Williams <dan.j.williams@gmail.com>, linux-raid@vger.kernel.org
Subject: Re: Accelerating Linux software raid
Date: Sat, 10 Sep 2005 22:06:21 -0400
Message-ID: <4323911D.8010307@emc.com>
In-Reply-To: <Pine.LNX.4.44.0509100927130.29141-100000@coffee.psychology.mcmaster.ca>

Mark Hahn wrote:

>>I think that the above holds for server applications, but there are lots 
>>of places where you will start to see a need for serious IO capabilities 
>>in low power, multi-core designs.  Think of your Tivo starting to store 
>>family photos - you don't want to bolt a server class box under your TV 
>>in order to get some reasonable data protection ;-)
>
>I understand your point, but are the numbers right?  it seems to me that 
>the main factor in appliance design is power dissipation, and I'm guessing
>a budget of say 20W for the CPU.  these days, that's a pretty fast processor,
>of the mobile-athlon-64 range - probably 3 GB/s xor performance.  I'd 
>guess it amounts to perhaps 5-10% cpu overhead if the appliance were,
>for some reason, writing at 100 MB/s.  of course, it is NOT writing at 
>that rate (remember, reading doesn't require xors, and appliances probably
>do more reads than writes...)

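(As a back-of-envelope check of the arithmetic above, taking those figures at face value: 100 MB/s of writes against 3 GB/s of XOR throughput is roughly 3% of the CPU per pass over the data, and a RAID5 parity update makes one to a few such passes per stripe, which lands in the quoted 5-10% range.)
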
I think your response shows a small misunderstanding of what this class of part is.  It is not a TOE in the classic sense, but rather a generally useful (if non-standard) execution unit that can do a restricted set of operations well; it is not intended to be used as a full second (or third, or fourth) CPU.  If we get the code and design right, the result will be a very simple driver calling functions that offload specific computations to these specialized execution units.

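To make that concrete, here is a minimal sketch of the shape such a driver entry point could take - plain C with hypothetical names, not any real kernel API - submitting the work to a hardware execution unit when one is present and falling back to the ordinary software loop when it is not:

#include <stddef.h>

struct xor_engine;	/* opaque handle to a hardware XOR unit, if any */

/* Hypothetical submit call provided by the offload driver; returns 0 on
 * success.  Stubbed out here so the sketch stands alone. */
static int engine_submit_xor(struct xor_engine *eng, void *dst,
			     const void *src, size_t len)
{
	(void)eng; (void)dst; (void)src; (void)len;
	return -1;	/* behave as if no engine accepted the work */
}

/* Software fallback: dst ^= src over len bytes (len assumed to be a
 * multiple of sizeof(unsigned long), as RAID block sizes are). */
static void xor_sw(unsigned long *dst, const unsigned long *src, size_t len)
{
	size_t i;

	for (i = 0; i < len / sizeof(unsigned long); i++)
		dst[i] ^= src[i];
}

/* The "very simple driver" described above: offload when we can,
 * compute on the CPU when we cannot.  Callers never know which happened. */
void xor_block(struct xor_engine *eng, unsigned long *dst,
	       const unsigned long *src, size_t len)
{
	if (eng && engine_submit_xor(eng, dst, src, len) == 0)
		return;
	xor_sw(dst, src, len);
}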

If you look at public power numbers for modern Intel architecture CPUs, say Tom's Hardware at:

    http://www.tomshardware.com/cpu/20050525/pentium4-02.html

you will see that the 20W budget you allocate for a modern CPU is much closer to the power budget of these embedded parts than to that of any modern desktop or server CPU.  Mobile parts draw much less power than server CPUs and come somewhat closer to your number.

>>In the Centera group where I work, we have a linux based box that is 
>>used for archival storage.  Customers understand why the cost of a box 
>>is related to the number of disks, but the strength of the CPU, memory 
>>subsystem, etc are all more or less thought of as overhead (not to 
>>mention that nasty software stuff that I work on ;-)).
>
>again, no offense meant, but I hear you saying "we under-designed the 
>centera host processor, and over-priced it, so that people are trying to 
>stretch their budget by piling on too many disks".  I'm actually a little
>surprised, since I figured the Centera design would be a sane, modern,
>building-block-based one, where you could cheaply scale the number of 
>host processors, not just disks (like an old-fashioned, not-mourned SAN.)
>I see a lot of people using a high-performance network like IB as an internal
>backplane-like way to tie together a cluster-in-a-box.  (and I expect they'll
>sprint from IB to 10G real soon now.)

These operations are not done only during ingest; they are also used to check the integrity of data that is already stored, to regenerate data, and so on.  I don't want to hawk Centera here, but ours is definitely a scalable design built from building blocks ;-)


What I tried to get across is the opposite of your summary: a customer who buys storage devices prefers to pay for storage capacity (media) rather than for the infrastructure used to provide it, and expects engineers to do the hard work of delivering that storage at the best possible price.

We definitely use commodity hardware; we just try to get as much out of it as possible.

>but then again, you did say this was an archive box.  so what is the
>bandwidth of data coming in?  that's the number that sizes your host cpu.
>being able to do xor at 12 GB/s is kind of pointless if the server has just
>one or two 2 Gb net links...

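(For scale: a 2 Gb/s link carries at most about 0.25 GB/s, so even two of them amount to roughly 4% of a hypothetical 12 GB/s XOR engine's capacity.)
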
Storage arrays like Centera are not block devices; we do a lot more high level functions (real file systems, scrubbing, indexing, etc.).  All of these functions consume CPU, disk, and so on, so anything we can save can be used to provide added functionality.

>>Also keep in mind that the Xor done for simple RAID is not the whole 
>>story - think of compression offload, encryption, etc which might also 
>>be able to leverage a well thought out solution.
>
>this is an excellent point, and one that argues *against* HW coprocessing.
>consider the NIC market: TOE never happened because adding tcp/ssl to a 
>separate card just moves the complexity and bugs from an easy-to-patch place 
>into a harder-to-patch place.  I'd much rather upgrade from a uni server to a
>dual and run the tcp/ssl in software than spend the same amount of money
>on a $2000 nic that runs its own OS.  my tcp stack bugs get fixed in a 
>few hours if I email netdev, but who knows how long bugs would linger in
>the firmware stack of a TOE card?

Again, I think you misunderstand the part and the intention of the project.  Not everyone (much to our sorrow) wants a huge storage system - some people might be able to make do with very small, quiet appliances for their archives.

>same thing here, except moreso.  making storage appliances smarter is great,
>but why put that smarts in some kind of opaque, inaccessible and hard-to-use
>coprocessor?  good, thoughtful design leads towards a loosely-coupled cluster
>of off-the-shelf components...
>
>regards, mark hahn.
>(I run a large supercomputing center, and spend a lot of effort specifying
>and using big compute and storage hardware...)

I am an ex-Thinking Machines OS developer who spent time working on the Paragon OS at OSF, so I have a fair appreciation for large customers with deep wallets.  If everyone wanted to buy large installations built from high-powered hardware, my life would be much easier ;-)

regards,

ric



