From: Tim Small <tim@buttersideup.com>
To: Julian Cowley <julian@lava.net>, linux-raid@vger.kernel.org
Subject: Software vs. Hardware RAID
Date: Tue, 03 Aug 2004 12:47:49 +0100
Message-ID: <410F7B65.3090407@buttersideup.com>
In-Reply-To: <Pine.LNX.4.58.0408021554070.7168@taurus.cesta.com>
Julian Cowley wrote:
>Recently I did a survey of this very question (hardware vs. software
>RAID) based on the comments from this mailing list:
>
>Software
>--------
>
>- CPU must handle operations
>- twice the I/O bandwidth when using RAID1
>
>
Yes (during writes)
>+ non-proprietary disk format
>+ open source implementation
>- limited or non-existent support for hot-swapping, even with SATA
> (see http://www.redhat.com/archives/fedora-test-list/2004-March/msg01204.html)
>
>
I've swapped out SCSI drives with software RAID on a live system. It
isn't 100% smooth, since the swap triggers a bus reset on these systems,
and hence about 15 seconds of no I/O, but the machine worked fine
afterwards, and no reboot was required. For SATA hot-swap, see this
article: http://kerneltrap.org/node/view/3432
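For the record, the rough sequence I use for the SCSI swap is below. A
sketch only - the md device, partition, and SCSI host/channel/id/lun
numbers are just examples, and raidtools users would use raidsetfaulty,
raidhotremove and raidhotadd instead of mdadm:

  # take the partition out of the array first
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  # detach the device from the SCSI layer (host channel id lun)
  echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi
  # ...physically swap the drive, then re-attach and re-add...
  echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi
  mdadm /dev/md0 --add /dev/sdb1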
>- OS-specific format (can't be shared between Linux, Windows, etc.)
>
>
Well, you can configure a partition as mirrored using Linux software
RAID, and then have Windows use the rest of the disk. Whether you could
then have Windows use its own software RAID on the rest of the disk, I
couldn't say. As long as you kept access read-only, you could probably
read the whole of the fs content from both OSes (why do you want to run
Windows anyway? :o)
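For the mirrored-partition part, something like this (a minimal sketch -
the partition layout is just an example):

  # mirror only hda1 and hdc1; the rest of each disk stays free
  # for other OSes/uses
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hdc1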
>+ drives can be anything (ie. a mixture of SATA, PATA, Firewire, USB, etc.)
>- disk surface testing must be done manually (7/2004)
>
>
Smartd can automate this. E.g. these lines in smartd.conf tell the
drives to run an extended self-test every Saturday, at 1am (hda) and
2am (hdc):

  /dev/hda -a -s L/../../6/01 -m root
  /dev/hdc -a -s L/../../6/02 -m root
This may catch blocks which are going bad before they become unreadable
(i.e. whilst the hardware and/or firmware ECC algorithms are still able
to reconstruct the data), and cause the drive to silently remap those
blocks - so these tests may well save you from a degraded array...
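You can also kick a test off by hand with smartctl (assuming
smartmontools is installed; the device name is just an example):

  smartctl -t long /dev/hda      # start an extended self-test now
  smartctl -l selftest /dev/hda  # read the results once it finishes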
>- no bad block relocation (7/2004)
>
>
Most drives will do this automatically, except in the event of data loss
(i.e. if the drive can't reconstruct the correct data, it will just
return a read error; if you then write the entire block, the drive will
remap it). With software RAID, a read error currently leaves you with a
degraded array. It would be cool if the software RAID subsystem would
try to rewrite individual blocks which have had read failures (assuming
it has the data on the other disks, or in RAM, to do this) before
marking the whole partition as bad, but it doesn't at the moment (AFAIK).
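You can do the rewrite by hand, though. A sketch (the sector number and
device are hypothetical - get the failing LBA from the kernel log or the
SMART self-test log first, and double-check the seek= value!):

  # overwrite one 512-byte sector so the drive remaps it
  # (this destroys the old contents - restore them from the mirror
  # or a backup afterwards)
  dd if=/dev/zero of=/dev/hda bs=512 seek=12345678 count=1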
I've had cases (on IBM 75GXP drives <spit>) where the two drives in a
mirror have independently developed different unreadable sectors, and
the hardware RAID controller has kicked out both drives and left the OS
with an unusable array (even though, between them, the two drives have
all the data - grrr). If this had been software RAID, the same thing
would have happened, but at least I would have been able to manually
copy the bad blocks from the failed drive using dd, without taking down
the OS.
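A minimal sketch of that sort of rescue copy (the device names are just
examples, and a dedicated tool like dd_rescue does this job better):

  # copy everything readable, zero-padding unreadable sectors
  # instead of aborting at the first read error
  dd if=/dev/hda of=/dev/hdc bs=512 conv=noerror,sync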
>- no parity verification (7/2004)
>- no mirror verification (7/2004)
>
>
True, but with the exception of kernel bugs, arrays shouldn't get into
these states. It would be a nice feature, though.
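For RAID1 you can do a crude manual check (only meaningful on a
quiesced or read-only array, and note that the md superblocks near the
end of each partition will legitimately differ):

  # byte-compare the two halves of the mirror
  cmp /dev/hda1 /dev/hdc1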
>+ reputedly, much better performance than hardware raid
>
>
Can be, I think, yes. E.g. I get ~120 MB/sec linear device reads/writes
on a software RAID5 array I've built from 3x 10k rpm 75GB drives (all on
a single U320 SCSI bus). With modern CPUs, the processing overhead
required for RAID is not very significant - a bit higher if an array is
degraded, and on RAID5 writes of course - e.g. see this kernel output on
a dual 2.8GHz Xeon box:
raid5: using function: pIII_sse (3649.600 MB/sec)
And this on a dual Opteron 248:
raid5: using function: generic_sse (6744.000 MB/sec)
So parity calculation is not a serious overhead these days, but the
extra I/O may be. On the 2.8GHz Xeon box (the aforementioned 3x 10k rpm
SCSI machine, running 2.4.26), I see:

  Read from RAID5:                  119 MB/sec, with 25% kernel CPU usage
  Read from RAID5 (degraded array): 127 MB/sec, with 60% kernel CPU usage
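(Those figures are from simple linear reads; a minimal way to reproduce
that kind of test, assuming the array is /dev/md0:)

  # pull 1GB off the array linearly and time it
  dd if=/dev/md0 of=/dev/null bs=1M count=1024
  # watch kernel CPU usage in another terminal, e.g. with: vmstat 1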
>Hardware
>--------
>
>+ off-loads the CPU
>+ I/O bandwidth needed on a RAID1 system is same as single disk
>
>
Again, this is only for writes; you get a similar effect with RAID5
(e.g. a full-stripe write on a four-disk RAID5 does 4 block writes for
every 3 blocks of data, i.e. about 1.33 times the write I/O).
>- proprietary disk format (although limited drivers are available for Linux)
>- proprietary implementation
>+ easy hot-swapping (some controllers even indicate the bad drive with an LED)
>+ non-OS-specific (can share between Linux, Windows, etc.)
>- some features may not be supported on non-Windows operating systems
>
>
You can also add "non-Red Hat kernels" to this list...
>+ able to create logical disks that seem like physical disks to the OS
>
>
And, associated with this, less trouble with boot loaders (e.g. when
booting with the root fs on a degraded array).
>+ bad sector relocation (on the fly?)
>
>
Depends on the controller, e.g. 3ware does now, but it didn't use to.
>- drives must connect to the controller and all must be same type (e.g. SATA)
>+ disk surface testing done automatically
>+ automatic bad block relocation
>+ parity verification
>+ mirror verification
>
>
You can add a "maybe" to the last four - it all depends on the
implementation, and if you can't get the management software to run on
your kernel/distribution, then you may not get any of them (or degraded
array notification!) without using the RAID controller's BIOS.
Add another negative to this: patchy SMART support (only 3ware supports
smartd pass-through at the moment, AFAIK) - useful if you want more
granularity than "drive good" or "drive bad", e.g. the ability to read
serial numbers, firmware versions, drive temperatures, SMART error log
entries, interface error counts, remapped block counts, spin-up counts,
power-on hours etc. whilst the OS is up and running.
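For the 3ware case, smartmontools can address individual drives behind
the controller, e.g. (the port number is an example, and the device
node depends on your driver version - older setups use the SCSI node,
e.g. /dev/sda, instead of /dev/twe0):

  # query the drive on 3ware port 0 via the controller's pass-through
  smartctl -a -d 3ware,0 /dev/twe0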
Tim.