* Best way to achieve large, expandable, cheap storage?
@ 2005-09-30 13:20 Robin Bowes
2005-09-30 13:29 ` Robin Bowes
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Robin Bowes @ 2005-09-30 13:20 UTC (permalink / raw)
To: linux-raid
Hi,
I have a business opportunity which would involve a large amount of
storage, possibly growing to 10TB in the first year, possibly more. This
would be to store media files - probably mainly .flac or .mp3 files.
Concurrency wouldn't be particularly important as I'd be the only person
accessing the storage, and I have no need for lightning speed.
It would be nice to be able to start smallish and grow as required, but
my experience of linux raid to date is that it's not possible to resize
arrays. (I have a 1TB array built from 6 x 250GB SATA discs on Promise
SATA150 TX4 controllers).
Can anyone offer recommendations as to the most cost-effective way to
achieve this sort of storage?
Are there any limitations I might run into using md on Linux?
For example, suppose I get something like this [1] and throw in an
appropriate mobo/processor etc. and 24 x 500GB SATA discs; would
md/mdadm be able to create a single 11TB RAID5 array, i.e. (23-1) x
500GB, with a hot-spare? Would this be a sensible thing to do?
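For concreteness, the sort of mdadm invocation I have in mind would be
something like this (a sketch only - device names are hypothetical):
# mdadm --create /dev/md0 --level=5 --raid-devices=23 \
        --spare-devices=1 /dev/sd[b-y]1
That's 23 active drives plus one spare, i.e. all 24 drives.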
What about file-system limitations, e.g. would ext3/reiser/XFS support
an 11TB partition?
Would I be better off creating smaller volumes and combining them with RAID0?
I'd appreciate any tips/suggestions/advice/pointers to further sources
of information.
Thanks,
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:20 Best way to achieve large, expandable, cheap storage? Robin Bowes
@ 2005-09-30 13:29 ` Robin Bowes
2005-09-30 18:28 ` Brad Dameron
2005-09-30 18:16 ` Gregory Seidman
2005-10-02 4:36 ` Christopher Smith
2 siblings, 1 reply; 19+ messages in thread
From: Robin Bowes @ 2005-09-30 13:29 UTC (permalink / raw)
To: linux-raid
Robin Bowes wrote:
> For example, suppose I get something like this [1]
Ooops.
http://www.cidesign.com/product_detail.jsp?productID=4
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:20 Best way to achieve large, expandable, cheap storage? Robin Bowes
2005-09-30 13:29 ` Robin Bowes
@ 2005-09-30 18:16 ` Gregory Seidman
2005-09-30 18:34 ` Andy Smith
2005-10-02 4:36 ` Christopher Smith
2 siblings, 1 reply; 19+ messages in thread
From: Gregory Seidman @ 2005-09-30 18:16 UTC (permalink / raw)
To: linux-raid
On Fri, Sep 30, 2005 at 02:20:11PM +0100, Robin Bowes wrote:
} I have a business opportunity which would involve a large amount of
} storage, possibly growing to 10TB in the first year, possibly more. This
} would be to store media files - probably mainly .flac or .mp3 files.
}
} Concurrency wouldn't be particularly important as I'd be the only person
} access the storage and I have no need for lightning speed.
If you aren't overly concerned about speed, you can use LVM. If you want
redundancy as well as disk-spanning, you can use RAID as well. That is what
I am planning on doing for myself. There are shortcomings, however. See
below.
} It would be nice to be able to start smallish and grow as required, but
} my experience of linux raid to date is that it's not possible to resize
} arrays. (I have a 1TB array built from 6 x 250GB SATA discs on Promise
} SATA150 TX4 controllers).
}
} Can anyone offer recommendations as to the most cost-effective way to
} achieve this sort of storage?
}
} Are there any limitations I might run into using md on Linux?
}
} For example, suppose I get something like this [1] and throw in an
} appropriate mobo/processor etc and 24 x 500 GB SATA discs; would md/mdadm
} be able to create a single 11TB RAID5 partition, ie (23-1) x 500, with a
} hot-spare? Would this be a sensible thing to do?
}
} What about file-system limitations, e.g. would ext3/reiser/XFS support an
} 11TB partition?
AFAIK, XFS can handle up to several exabytes, while ext3 and reiserfs
top out around 16TB, so an 11TB partition should be within range for
all three. I may be wrong, however.
} Would I be better off creating smaller volumes combining them with RAID0?
}
} I'd appreciate any tips/suggestions/advice/pointers to further sources of
} information.
LVM allows you to add more PVs (physical volumes, a.k.a. disks or
partitions) to a VG (volume group). You can then extend an LV (logical
volume) and the filesystem on it. Basically it is like a growable JBOD
(or, if you stripe your LVs, RAID0). It is even possible to retire old
PVs as long as there is sufficient room on other PVs to take up the
slack. This means that in five years, when you can get a 3TB disk for
$300, you'll be able to add them in and replace your old, outdated
250GB drives.
One advantage of LVM is snapshotting. It allows you to basically keep a
cheap diff backup of your disk. It's a really cool feature, but I'm not
going to go into detail about it here.
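(For the curious, creating one is a single command - a sketch with
made-up names and sizes:
# lvcreate -s -L 10G -n home_snap /dev/myvg/home
The snapshot only consumes space as the origin LV changes.)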
The main disadvantage is that while you can have a hot spare as part of a
RAID10, to be automatically used in any RAID1 pair as needed, LVM does not
integrate closely enough with md to allow that. You can have a warm spare
all ready to go, but you would have to actually assign it to the
appropriate md device for it to be used. Your best bet is to do a RAID10 with
however many hot spares on your set of disks and put LVM on top of that.
When you want to expand, add another PV to the LVM. Don't rely on LVM to
make a single device from a group of disks bought at any one time - use md
for that; rely on LVM just to add new storage.
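A minimal sketch of that layering, with made-up device names (four
active drives plus one spare in the RAID10, LVM on top):
# mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        --spare-devices=1 /dev/sd[b-f]1
# pvcreate /dev/md0
# vgcreate media_vg /dev/md0
# lvcreate -L 400G -n media media_vg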
} Thanks,
} R.
--Greg
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:29 ` Robin Bowes
@ 2005-09-30 18:28 ` Brad Dameron
2005-09-30 19:20 ` Dan Stromberg
0 siblings, 1 reply; 19+ messages in thread
From: Brad Dameron @ 2005-09-30 18:28 UTC (permalink / raw)
To: linux-raid
On Fri, 2005-09-30 at 14:29 +0100, Robin Bowes wrote:
> Robin Bowes wrote:
> > For example, suppose I get something like this [1]
>
> Ooops.
>
> http://www.cidesign.com/product_detail.jsp?productID=4
>
> R.
Only 24 drives?
http://www.rackmountpro.com/productpage.php?prodid=2079
And yes, you can create a single partition that large. However, you
might consider using something like this to control the drives instead
of software RAID:
http://www.areca.com.tw/products/html/pcix-sata.htm
Or, if you want the best performance you can get with SATA, look at this
one:
http://www.areca.com.tw/products/html/pcie-sata.htm
It has an 800MHz processor, does RAID6, etc.
Brad Dameron
SeaTab Software
www.seatab.com
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 18:16 ` Gregory Seidman
@ 2005-09-30 18:34 ` Andy Smith
0 siblings, 0 replies; 19+ messages in thread
From: Andy Smith @ 2005-09-30 18:34 UTC (permalink / raw)
To: linux-raid
On Fri, Sep 30, 2005 at 02:16:43PM -0400, Gregory Seidman wrote:
> One advantage of LVM is snapshotting. It allows you to basically keep a
> cheap diff backup of your disk. It's a really cool feature, but I'm not
> going to go into detail about it here.
I've found it (snapshots) too unstable for actual production use though.
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 18:28 ` Brad Dameron
@ 2005-09-30 19:20 ` Dan Stromberg
0 siblings, 0 replies; 19+ messages in thread
From: Dan Stromberg @ 2005-09-30 19:20 UTC (permalink / raw)
To: Brad Dameron; +Cc: linux-raid, strombrg
On Fri, 2005-09-30 at 11:28 -0700, Brad Dameron wrote:
> http://www.areca.com.tw/products/html/pcix-sata.htm
>
> Or if you want the best performance you can get with SATA look at this
> one:
Speaking of Areca stuff, someone I'm working with has a system that
appears to be running well with 3.5 terabytes of usable capacity. But
whenever they try to reboot it, it hangs on syncing disks (I haven't
seen it first-hand yet, but that's what I'm told is happening).
I haven't tried:
sync &
sync &
sync &
...followed by the reboot(2) syscall:
reboot(LINUX_REBOOT_CMD_RESTART);
...yet. The last part, I believe, is supposed to reboot without
syncing?
Has anyone encountered this before?
Thanks!
Some specifics follow:
Controller info:
[strombrg@hiperstore ~]$ dmesg | grep -i areca
ARECA RAID: 64BITS PCI BUS DMA ADDRESSING SUPPORTED
scsi0 : ARECA ARC1130 PCI-X 12 PORTS SATA RAID CONTROLLER
(RAID6-ENGINE Inside)
Vendor: Areca Model: ARC-1130-VOL#00 Rev: R001
CPU info:
[strombrg@hiperstore ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 244
stepping : 10
cpu MHz : 1794.825
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx
mmxext lm 3dnowext 3dnow
bogomips : 3597.17
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 244
stepping : 10
cpu MHz : 1794.825
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx
mmxext lm 3dnowext 3dnow
bogomips : 3590.12
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
OS info:
[strombrg@hiperstore ~]$ cat /etc/redhat-release
Fedora Core release 4 (Stentz)
Kernel info:
[strombrg@hiperstore ~]$ cat /proc/version
Linux version 2.6.13-mm2 (root@hiperstore) (gcc version 4.0.1
20050727 (Red Hat 4.0.1-5)) #2 SMP Thu Sep 8 23:10:46 PDT 2005
It appears to be an (otherwise) unmodified Andrew Morton kernel
they're using:
[strombrg@hiperstore linux-2.6.13-mm2]$ find . -type f -print | egrep -vi '\.ko$|\.cmd$|\.o$' | xargs filetime | highest -n 50
1126246262 ./drivers/parport/parport.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/parport/parport_serial.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/parport/parport_pc.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_block.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_config.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_core.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_proc.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_scsi.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/hw/mthca/ib_mthca.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_cm.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_core.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_sa.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_ucm.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_mad.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_umad.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/ulp/ipoib/ib_ipoib.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/firestream.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/he.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/horizon.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/idt77252.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/suni.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/ambassador.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/eni.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/lanai.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/atmtcp.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/backlight/lcd.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/backlight/backlight.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/vgastate.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/aty/aty128fb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/aty/radeonfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/aty/atyfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/vga16fb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/i2c-matroxfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_misc.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/g450_pll.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_g450.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_crtc2.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_accel.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_Ti3026.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_maven.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_base.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_DAC1064.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/savage/savagefb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/tdfxfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/cirrusfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/sstfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/neofb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/riva/rivafb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/macmodes.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/kyro/kyrofb.mod.c Thu Sep 8 23:11:02 2005
And the arcmsr driver that came with this mm kernel appears to be version "1.20.00.07 3/23/2005".
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:20 Best way to achieve large, expandable, cheap storage? Robin Bowes
2005-09-30 13:29 ` Robin Bowes
2005-09-30 18:16 ` Gregory Seidman
@ 2005-10-02 4:36 ` Christopher Smith
2005-10-02 7:09 ` Tyler
2005-10-03 16:33 ` Sebastian Kuzminsky
2 siblings, 2 replies; 19+ messages in thread
From: Christopher Smith @ 2005-10-02 4:36 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid
Robin Bowes wrote:
> Hi,
>
> I have a business opportunity which would involve a large amount of
> storage, possibly growing to 10TB in the first year, possibly more. This
> would be to store media files - probably mainly .flac or .mp3 files.
Here's what I do (bear in mind this is for a home setup, so the data
volumes aren't as large and I'd expand in smaller amounts than you - but
the principle is the same).
I use a combination of Linux's software RAID + LVM for a flexible,
expandable data store. I buy disks in sets of four, with a four-port
disk controller and a 4-drive, cooled chassis of some sort (lately, the
Coolermaster 4-in-3 part).
I RAID5 the drives together and glue multiple sets of 4 drives together
into a single usable chunk using LVM.
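In command terms, each new "drive set" amounts to something like this
(a sketch - device and VG names are made up):
# mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sd[e-h]1
# pvcreate /dev/md2
# vgextend data_vg /dev/md2    (vgcreate, for the very first set)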
Over the last ~5 years, this has allowed me to move from/to the
following disk configurations:
4x40GB -> 4x40GB + 4x120GB -> 4x40GB + 4x120GB + 4x250GB -> 4x120GB +
4x250GB -> 4x250GB + 4x250GB.
In the next couple of months I plan to add another 4x300GB "drive set"
to expand further. I add drives about once a year. I remove drives
either because I run out of physical room in the machine, or to re-use
them in other machines (eg: the 4x120GB drives are now scratch space on
my workstation, the 4x40GB drives went into machines I built for
relatives). The case I have now is capable of holding about 20 drives,
so I probably won't be removing any for a while (previous cases were
stretched to hold 8 drives).
Apart from the actual hardware installations and removals, the various
reconfigurations have been quite smooth and painless, with LVM allowing
easy migration of data to/from RAID devices, division of space, etc.
I've had 3 disk failures, none of which have resulted in any data loss.
The "data store" has been moved across 3 very different physical
machines and 3 different Linux installations (Redhat 9 -> RHEL3 -> FC4).
I would suggest not trying to resize existing arrays at all, and simply
accepting the "space wastage" as a cost of flexibility. Storage is cheap,
and a few dozen or a few hundred GB lost in exchange for long-term cost
savings is well worth it IMHO. The space I "lose" by not reconfiguring my
RAID arrays whenever I add more disks is more than made up for by the
money I save by not buying everything at once, or by the additional space
available at the same price point.
I would, however, suggest getting a case with a large amount of physical
space in it so you don't have to remove drives to add bigger ones.
But, basically, just buy as much space as you need now and then buy more
as required - it's trivially easy to do, and you'll save money in the
long run.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-02 4:36 ` Christopher Smith
@ 2005-10-02 7:09 ` Tyler
2005-10-03 3:19 ` Christopher Smith
2005-10-03 16:33 ` Sebastian Kuzminsky
1 sibling, 1 reply; 19+ messages in thread
From: Tyler @ 2005-10-02 7:09 UTC (permalink / raw)
To: Christopher Smith; +Cc: linux-raid
Christopher Smith wrote:
> Here's what I do (bear in mind this is for a home setup, so the data
> volumes aren't as large and I'd expand in smaller amounts than you -
> but the principle is the same).
>
> I use a combination of Linux's software RAID + LVM for a flexible,
> expandable data store. I buy disks in sets of four, with a four-port
> disk controller and a 4-drive, cooled chassis of some sort (lately,
> the Coolermaster 4-in-3 part).
>
> I RAID5 the drives together and glue multiple sets of 4 drives
> together into a single usable chunk using LVM.
[...]
What case and power supply(s) are you using? What RAID cards are you
using also?
Thanks,
Tyler.
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-02 7:09 ` Tyler
@ 2005-10-03 3:19 ` Christopher Smith
0 siblings, 0 replies; 19+ messages in thread
From: Christopher Smith @ 2005-10-03 3:19 UTC (permalink / raw)
To: Tyler; +Cc: linux-raid
Tyler wrote:
> What case and power supply(s)are you using? What raid cards are you
> using also?
The case is a no-name job I picked up from a local PC seller:
http://www.pcicase.com.au/sub_files01.htm
Its main attraction to me was the large number of 5.25" drive bays.
The PSU is just a bog-standard 450W Antec (although since I've recently
upgraded the machine to dual Xeons, I should get a beefier unit).
Currently the machine has 2 x Promise S150 TX4. Previously, as the 120GB
drives were PATA, it also had a Promise TX4000. However, since that
card wouldn't work with a 2.6 kernel, I used it as an excuse to get more
drives and upgrade to a newer distro :). The TX4s are 32-bit, 66MHz PCI
cards and are in 64-bit/133MHz PCI-X slots, so they handle four 7200rpm
SATA drives each quite well. I toyed with getting a single 8-port SATA
card, but all the ones I've seen are full-blown hardware RAID, making
them quite expensive; since I use software RAID and have 5 PCI-X slots
on the motherboard, it's not worth it. I'll run out of physical space in
the case before I run out of PCI-X slots to drop 4-port cards into.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-02 4:36 ` Christopher Smith
2005-10-02 7:09 ` Tyler
@ 2005-10-03 16:33 ` Sebastian Kuzminsky
2005-10-04 4:09 ` Christopher Smith
1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Kuzminsky @ 2005-10-03 16:33 UTC (permalink / raw)
To: Christopher Smith; +Cc: Robin Bowes, linux-raid
On 10/1/05, Christopher Smith <csmith@nighthawkrad.net> wrote:
> I use a combination of Linux's software RAID + LVM for a flexible,
> expandable data store. I buy disks in sets of four, with a four-port
> disk controller and a 4-drive, cooled chassis of some sort (lately, the
> Coolermaster 4-in-3 part).
>
> I RAID5 the drives together and glue multiple sets of 4 drives together
> into a single usable chunk using LVM.
Sounds pretty cool. I've used software RAID but never LVM; let me see
if I understand your setup:
At the lowest level, you have 4-disk controller cards, each connected
to a set of 4 disks. Each set of 4 has a software RAID-5. All the
RAID-5 arrays are used as LVM physical volumes. These PVs are part of
a single volume group, from which you make logical volumes as needed.
When you want more disk, you buy 4 big modern disks (and a 4x
controller if needed), RAID-5 them, extend the VG onto them, and
extend the LV(s) on the VG. Then I guess you have to unmount the
filesystem(s) on the LV(s), resize them, and remount them.
If you get low on room in the case or it gets too hot or noisy, you
have to free up an old, small RAID array. You unmount, resize, and
remount the filesystem(s), reduce the LV(s) and the VG, and then
you're free to pull the old RAID array from the case.
> Apart from the actual hardware installations and removals, the various
> reconfigurations have been quite smoothe and painless, with LVM allowing
> easy migration of data to/from RAID devices, division of space, etc.
> I've had 3 disk failures, none of which have resulted in any data loss.
> The "data store" has been moved across 3 very different physical
> machines and 3 different Linux installations (Redhat 9 -> RHEL3 -> FC4).
Your data survives one disk per PV croaking, but two disks out on any
one PV causes complete data loss, assuming you use the stripe mapping.
You use SATA, for which Linux doesn't support SMART yet, right? So you
get no warning of pending drive failures.
Nonetheless, it sounds like a nice flexible setup.
--
Sebastian Kuzminsky
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-03 16:33 ` Sebastian Kuzminsky
@ 2005-10-04 4:09 ` Christopher Smith
2005-10-20 10:23 ` Robin Bowes
0 siblings, 1 reply; 19+ messages in thread
From: Christopher Smith @ 2005-10-04 4:09 UTC (permalink / raw)
To: Sebastian Kuzminsky; +Cc: Robin Bowes, linux-raid
Sebastian Kuzminsky wrote:
> On 10/1/05, Christopher Smith <csmith@nighthawkrad.net> wrote:
>
>>I RAID5 the drives together and glue multiple sets of 4 drives together
>>into a single usable chunk using LVM.
>
>
> Sounds pretty cool. I've used software RAID but never LVM, let me see
> if I understand your setup:
>
> At the lowest level, you have 4-disk controller cards, each connected
> to a set of 4 disks. Each set of 4 has a software RAID-5. All the
> RAID-5 arrays are used as LVM physical volumes. These PVs are part of
> a single volume group, from which you make logical volumes as needed.
>
> When you want more disk, you buy 4 big modern disks (and a 4x
> controller if needed), RAID-5 them, extend the VG onto them, and
> extend the LV(s) on the VG. Then I guess you have to unmount the
> filesystem(s) on the LV(s), resize them, and remount them.
>
> If you get low on room in the case or it gets too hot or noisy, you
> have to free up an old, small RAID array. You unmount, resize, and
> remount the filesystem(s), reduce the LV(s) and the VG, and then
> you're free to pull the old RAID array from the case.
Yep, that's pretty much bang on. The only thing you've missed is using
pvmove to physically move the data off the soon-to-be-decommissioned
PVs (i.e. RAID arrays).
Be warned, for those who haven't used it before: pvmove is _very_ slow.
>>Apart from the actual hardware installations and removals, the various
>>reconfigurations have been quite smoothe and painless, with LVM allowing
>>easy migration of data to/from RAID devices, division of space, etc.
>>I've had 3 disk failures, none of which have resulted in any data loss.
>> The "data store" has been moved across 3 very different physical
>>machines and 3 different Linux installations (Redhat 9 -> RHEL3 -> FC4).
>
>
> Your data survives one disk per PV croaking, but two disks out on any
> one PV causes complete data loss, assuming you use the stripe mapping.
Yep, that's correct. I've never lost more than one disk out of an array
at once and I've always replaced any disk failures the same day. I lost
two of the 40GB drives (about 6 months apart - back before I had decent
cooling on them) and one of the 120GB drives.
> You use SATA, for which Linux doesn't support SMART yet, right? So
> you get no warning of pending drive failures.
Yep. The only annoyance. I eagerly await the ability to check my SATA
disks with SMART.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-04 4:09 ` Christopher Smith
@ 2005-10-20 10:23 ` Robin Bowes
2005-10-20 11:19 ` Gregory Seidman
2005-10-21 4:40 ` Christopher Smith
0 siblings, 2 replies; 19+ messages in thread
From: Robin Bowes @ 2005-10-20 10:23 UTC (permalink / raw)
To: Christopher Smith; +Cc: Sebastian Kuzminsky, Robin Bowes, linux-raid
Christopher Smith said the following on 04/10/2005 05:09:
> Yep, that's pretty much bang on. The only thing you've missed is using
> pvmove to physically move the data off the soon-to-be-decommissioned
> PVs (i.e. RAID arrays).
>
> Be warned, for those who haven't used it before: pvmove is _very_ slow.
I've just been re-reading this thread.
I'd like to just check if I understand how this will work.
Assume the following setup (hypothetical).
VG:
big_vg - contains /dev/md1, /dev/md2; 240GB
PV:
/dev/md1 - 4 x 40GB drives (RAID5 - 120GB total)
/dev/md2 - 4 x 40GB drives (RAID5 - 120GB total)
LV:
big_lv - in big_vg - 240GB
Filesystems:
/home - xfs filesystem in big_lv - 240GB
Suppose I then add a new PV:
/dev/md3 - 4 x 300GB drives (RAID5 - 900GB total)
I want to replace /dev/md1 with /dev/md3
I use pvmove something like this:
# pvmove /dev/md1 /dev/md3
When this finishes, big_vg will contain /dev/md2 + /dev/md3 (1020GB
total). /dev/md1 will be unused.
big_lv will still be using just 240GB of big_vg.
I then use lvextend to increase the size of big_lv
big_lv will now use all 1020GB of big_vg.
However, the /home filesystem will still just use 240GB of big_lv
I can then use xfs_growfs to expand the /home filesystem to use all
1020GB of big_lv.
Have I missed anything?
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 10:23 ` Robin Bowes
@ 2005-10-20 11:19 ` Gregory Seidman
2005-10-20 11:41 ` Robin Bowes
2005-10-21 4:42 ` Christopher Smith
2005-10-21 4:40 ` Christopher Smith
1 sibling, 2 replies; 19+ messages in thread
From: Gregory Seidman @ 2005-10-20 11:19 UTC (permalink / raw)
To: linux-raid
On Thu, Oct 20, 2005 at 11:23:30AM +0100, Robin Bowes wrote:
} Christopher Smith said the following on 04/10/2005 05:09:
} >Yep, that's pretty much bang on. The only thing you've missed is using
} >pvmove to physically move the data off the soon-to-be-decommissioned
} >PVs (i.e. RAID arrays).
} >
} >Be warned, for those who haven't used it before: pvmove is _very_ slow.
}
} I've just been re-reading this thread.
}
} I'd like to just check if I understand how this will work.
}
} Assume the following setup (hypothetical).
}
} VG:
} big_vg - contains /dev/md1, /dev/md2; 240GB
}
} PV:
} /dev/md1 - 4 x 40GB drives (RAID5 - 120GB total)
} /dev/md2 - 4 x 40GB drives (RAID5 - 120GB total)
You should at least read the following before using RAID5. You can agree or
disagree, but you should take the arguments into account:
http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
} LV:
} big_lv - in big_vg - 240GB
}
} Filesystems:
} /home - xfs filesystem in big_lv - 240GB
}
} Suppose I then add a new PV:
} /dev/md3 - 4 x 300GB drives (RAID5 - 900GB total)
You use pvcreate and vgextend to do so, incidentally.
} I want to replace /dev/md1 with /dev/md3
}
} I use pvmove something like this:
}
} # pvmove /dev/md1 /dev/md3
}
} When this finishes, big_vg will contain /dev/md2 + /dev/md3 (1020GB
} total). /dev/md1 will be unused.
/dev/md1 will still be a part of big_vg, but it won't have any data from
any LVs on it. You will need to use vgreduce to remove /dev/md1 from the
VG:
# vgreduce big_vg /dev/md1
} big_lv will still be using just 240GB of big_vg.
}
} I then use lvextend to increase the size of big_lv
}
} big_lv will now use all 1020GB of big_vg.
}
} However, the /home filesystem will still just use 240GB of big_lv
}
} I can then use xfs_growfs to expand the /home filesystem to use all
} 1020GB of big_lv.
All correct.
} Have I missed anything?
Just the vgreduce step (and removing the physical drives that make up
/dev/md1).
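Putting the whole cycle together with your example names (a sketch;
untested):
# pvcreate /dev/md3
# vgextend big_vg /dev/md3
# pvmove /dev/md1
# vgreduce big_vg /dev/md1
# lvextend -L 1020G /dev/big_vg/big_lv
# xfs_growfs /home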
} R.
--Greg
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 11:19 ` Gregory Seidman
@ 2005-10-20 11:41 ` Robin Bowes
2005-10-21 4:42 ` Christopher Smith
1 sibling, 0 replies; 19+ messages in thread
From: Robin Bowes @ 2005-10-20 11:41 UTC (permalink / raw)
To: linux-raid
Gregory Seidman said the following on 20/10/2005 12:19:
> } PV:
> } /dev/md1 - 4 x 40GB drives (RAID5 - 120GB total)
> } /dev/md2 - 4 x 40GB drives (RAID5 - 120GB total)
>
> You should at least read the following before using RAID5. You can agree or
> disagree, but you should take the arguments into account:
>
> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
This was just an example configuration. My current 1TB array is RAID5
with 1 hot spare but I'll most likely use RAID6 in production.
> } Suppose I then add a new PV:
> } /dev/md3 - 4 x 300GB drives (RAID5 - 900GB total)
>
> You use pvcreate and vgextend to do so, incidentally.
Yes, thanks for the detail.
> } When this finishes, big_vg will contain /dev/md2 + /dev/md3 (1020GB
> } total). /dev/md1 will be unused.
>
> /dev/md1 will still be a part of big_vg, but it won't have any data from
> any LVs on it. You will need to use vgreduce to remove /dev/md1 from the
> VG:
>
> # vgreduce big_vg /dev/md1
Ah, yes, forgot about that step.
Thanks for the validation of the methodology.
I'm going to give this a try on my test server (using much smaller disks!)
Thanks again,
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 10:23 ` Robin Bowes
2005-10-20 11:19 ` Gregory Seidman
@ 2005-10-21 4:40 ` Christopher Smith
1 sibling, 0 replies; 19+ messages in thread
From: Christopher Smith @ 2005-10-21 4:40 UTC (permalink / raw)
To: Robin Bowes; +Cc: Sebastian Kuzminsky, linux-raid
Robin Bowes wrote:
> Christopher Smith said the following on 04/10/2005 05:09:
>
>> Yep, that's pretty much bang on. The only thing you've missed is
>> using pvmove to physically move the data off the
>> soon-to-be-decommissioned PVs (i.e. RAID arrays).
>>
>> Be warned, for those who haven't used it before: pvmove is _very_ slow.
>
>
> I've just been re-reading this thread.
[...]
> I use pvmove something like this:
>
> # pvmove /dev/md1 /dev/md3
It would actually just be 'pvmove <old md device>', but the gist is correct.
Someone else has already responded to your questions, but something
else to be aware of with pvmove is that it might hang your system
(requiring a hard boot) when you try to use it, although the process
will proceed and complete without error (in the background) once you
have restarted.
It's been several months since I last used pvmove, so this bug may have
been fixed, but it was certainly present on FC4 back then. Basically
running pvmove would immediately hang the system (no response to
keyboard, etc), but after a hard reboot the pvmove process would start
up and then complete in the background.
Again, this may well have been fixed and you might not see it, but just
a word of warning so your first reaction isn't something rash ;). Since
pvmove appears to do its thing by *copying* everything from one PV to
another, rather than moving it, even if the machine crashes during the
process there's no data loss.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 11:19 ` Gregory Seidman
2005-10-20 11:41 ` Robin Bowes
@ 2005-10-21 4:42 ` Christopher Smith
2005-10-21 16:48 ` Gil
1 sibling, 1 reply; 19+ messages in thread
From: Christopher Smith @ 2005-10-21 4:42 UTC (permalink / raw)
To: gsslist+linuxraid; +Cc: linux-raid
Gregory Seidman wrote:
> You should at least read the following before using RAID5. You can agree or
> disagree, but you should take the arguments into account:
>
> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
This bloke makes some good points about the various downsides of RAID5
(which everyone involved in actually implementing production RAID
systems should already know), but IMHO he also makes some poor
assumptions and specious claims.
For example, his article suggests that "partial media failure" is a
problem that would only affect RAID5, when really it would negatively
impact any RAID system (your newly-synced mirror isn't much good if half
the data that just got mirrored to it was corrupted, nor is the speed
boost from RAID0 very helpful if half the data is corrupted). I'm also
not sure about his claims of RAID3 & 4 "always" checking parity - that
sounds like a vendor-specific implementation (and while I'm not a
developer, I fail to see why a RAID5 implementation couldn't be made to
do the same).
As another example, I'm 99% sure that SCSI drives *do* inform the OS
when they remap a bad sector and that any remotely modern IDE drive also
does sector remapping.
He also focuses solely on the worst-case scenario as a reason for
avoiding RAID5 completely. Certainly you have to take that into
account, but it's rather unfair to draw a general conclusion based only
on how a particular scenario might happen.
Added to that, he completely discounts a few things:
1. Where it's "handy" to keep lots of data easily available, but its
entire loss is not catastrophic - i.e. data volume is more important
than redundancy (my workplace has such a requirement, although we use
RAID6 - but RAID6 suffers most of the same "problems" he's talking
about).
2. Where cost is a significant factor. Certainly for a business, the
cost of going RAID10 over RAID5, when taking into account possible
losses, is probably not large. However, in a "home user" scenario,
where cost is almost always the deciding factor and performance is not
particularly important, going RAID10 over RAID5 is difficult to
justify. Similarly, if large amounts of data (tens of terabytes) are
being stored, the additional cost of RAID10 can become substantial.
3. You could potentially need a _lot_ more physical space to get the
same amount of logical storage in a RAID10 vs a RAID5, with associated
powering, cooling and logistical issues (see the rough numbers below).
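To put rough numbers on points 2 and 3 (hypothetical figures, assuming
500GB drives): 10TB usable via RAID10 means 20TB raw, i.e. 40 drives;
the same 10TB via RAID5 built from 12-drive sets (11 data + 1 parity)
needs about 22 drives - nearly half the drives, slots and watts.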
In short, RAID5 has its place. It's certainly not the
only-an-idiot-would-use-it train wreck that page makes it out to be.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-21 4:42 ` Christopher Smith
@ 2005-10-21 16:48 ` Gil
2005-10-21 20:08 ` Robin Bowes
0 siblings, 1 reply; 19+ messages in thread
From: Gil @ 2005-10-21 16:48 UTC (permalink / raw)
To: Christopher Smith; +Cc: gsslist+linuxraid, linux-raid
Christopher Smith wrote:
> Gregory Seidman wrote:
>
>> You should at least read the following before using RAID5. You
>> can agree or disagree, but you should take the arguments into
>> account:
>>
>> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
> For example, his article suggests that "partial media failure" is
> a problem that would only affect RAID5, when really it would
> negatively impact any RAID system (your newly-synced mirror isn't
> much good if half the data that just got mirrored to it was
> corrupted, nor is the speed boost from RAID0 very helpful if half
> the data is corrupted). I'm also not sure about his claims of
> RAID3 & 4 "always" checking parity - that sounds like a
> vendor-specific implementation (and while I'm not a developer, I
> fail to see why a RAID5 implementation couldn't be made to do the
> same).
The partial media failure problem described here is exactly why it's
important to run smartmontools in combination with your RAID array
of any level. By running regular checks of the disk surface you can
know well ahead of time that you're going to have trouble. In
practice this more than mitigates the risk of partial media failure.
http://smartmontools.sourceforge.net/
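A minimal sketch of such regular checks (device name assumed; the -s
directive syntax is from the smartmontools documentation):
# smartctl -t long /dev/hda     (kick off a full surface self-test)
or, in /etc/smartd.conf, to monitor all attributes and schedule a long
self-test every Sunday at 2am:
/dev/hda -a -s (L/../../7/02)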
--Gil
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-21 16:48 ` Gil
@ 2005-10-21 20:08 ` Robin Bowes
0 siblings, 0 replies; 19+ messages in thread
From: Robin Bowes @ 2005-10-21 20:08 UTC (permalink / raw)
To: linux-raid
Gil said the following on 21/10/2005 17:48:
> The partial media failure problem described here is exactly why it's
> important to run smartmontools in combination with your RAID array
> of any level. By running regular checks of the disk surface you can
> know well ahead of time that you're going to have trouble. In
> practice this more than mitigates the risk of partial media failure.
>
> http://smartmontools.sourceforge.net/
<sigh> I wish the code to enable SMART for SATA drives would make it
into mainstream *real* soon now.
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
@ 2005-10-27 19:12 Andrew Burgess
0 siblings, 0 replies; 19+ messages in thread
From: Andrew Burgess @ 2005-10-27 19:12 UTC (permalink / raw)
To: linux-raid
>> The partial media failure problem described here is exactly why it's
>> important to run smartmontools in combination with your RAID array
>> of any level. By running regular checks of the disk surface you can
>> know well ahead of time that you're going to have trouble. In
>> practice this more than mitigates the risk of partial media failure.
>>
>> http://smartmontools.sourceforge.net/
><sigh> I wish the code to enable SMART for SATA drives would make it
>into mainstream *real* soon now.
In addition to the libata patches, the 3ware SATA controllers support SMART.
They are about $250 used on eBay for a 12-drive card - still 4x as
expensive as a 4-drive controller, but only one slot vs. three...
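For what it's worth, smartctl can also address drives behind a 3ware
card directly via its -d option - a sketch, as device naming varies by
card series:
# smartctl -a -d 3ware,0 /dev/twe0
where 0 is the port number on the controller.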
HTH