* Best way to achieve large, expandable, cheap storage?
@ 2005-09-30 13:20 Robin Bowes
2005-09-30 13:29 ` Robin Bowes
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Robin Bowes @ 2005-09-30 13:20 UTC (permalink / raw)
To: linux-raid
Hi,
I have a business opportunity which would involve a large amount of
storage, possibly growing to 10TB in the first year, possibly more. This
would be to store media files - probably mainly .flac or .mp3 files.
Concurrency wouldn't be particularly important as I'd be the only person
accessing the storage, and I have no need for lightning speed.
It would be nice to be able to start smallish and grow as required, but
my experience of linux raid to date is that it's not possible to resize
arrays. (I have a 1TB array built from 6 x 250GB SATA discs on Promise
SATA150 TX4 controllers).
Can anyone offer recommendations as to the most cost-effective way to
achieve this sort of storage?
Are there any limitations I might run into using md on Linux?
For example, suppose I get something like this [1] and throw in an
appropriate mobo/processor etc. and 24 x 500GB SATA discs; would
md/mdadm be able to create a single 11TB RAID5 array, i.e. (23-1) x
500GB, with a hot-spare? Would this be a sensible thing to do?
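For concreteness, the sort of mdadm invocation I have in mind would be
something like this (a sketch only - device names are hypothetical):
# mdadm --create /dev/md0 --level=5 --raid-devices=23 \
        --spare-devices=1 /dev/sd[b-y]1
That's 23 active drives plus one spare, i.e. all 24 drives.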
What about file-system limitations, e.g. would ext3/reiser/XFS support
an 11TB partition?
Would I be better off creating smaller volumes and combining them with RAID0?
I'd appreciate any tips/suggestions/advice/pointers to further sources
of information.
Thanks,
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:20 Best way to achieve large, expandable, cheap storage? Robin Bowes
@ 2005-09-30 13:29 ` Robin Bowes
2005-09-30 18:28 ` Brad Dameron
2005-09-30 18:16 ` Gregory Seidman
2005-10-02 4:36 ` Christopher Smith
2 siblings, 1 reply; 19+ messages in thread
From: Robin Bowes @ 2005-09-30 13:29 UTC (permalink / raw)
To: linux-raid
Robin Bowes wrote:
> For example, suppose I get something like this [1]
Ooops.
http://www.cidesign.com/product_detail.jsp?productID=4
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:20 Best way to achieve large, expandable, cheap storage? Robin Bowes
2005-09-30 13:29 ` Robin Bowes
@ 2005-09-30 18:16 ` Gregory Seidman
2005-09-30 18:34 ` Andy Smith
2005-10-02 4:36 ` Christopher Smith
2 siblings, 1 reply; 19+ messages in thread
From: Gregory Seidman @ 2005-09-30 18:16 UTC (permalink / raw)
To: linux-raid
On Fri, Sep 30, 2005 at 02:20:11PM +0100, Robin Bowes wrote:
} I have a business opportunity which would involve a large amount of
} storage, possibly growing to 10TB in the first year, possibly more. This
} would be to store media files - probably mainly .flac or .mp3 files.
}
} Concurrency wouldn't be particularly important as I'd be the only person
} access the storage and I have no need for lightning speed.
If you aren't overly concerned about speed, you can use LVM. If you want
redundancy as well as disk-spanning, you can use RAID as well. That is what
I am planning on doing for myself. There are shortcomings, however. See
below.
} It would be nice to be able to start smallish and grow as required, but
} my experience of linux raid to date is that it's not possible to resize
} arrays. (I have a 1TB array built from 6 x 250GB SATA discs on Promise
} SATA150 TX4 controllers).
}
} Can anyone offer recommendations as to the most cost-effective way to
} achieve this sort of storage?
}
} Are there any limitations I might run into using md on Linux?
}
} For example, suppose I get something like this [1] and throw in an
} appropriate mobo/processor etc and 24 x 500 GB SATA discs; would md/mdadm
} be able to create a single 11TB RAID5 partition, ie (23-1) x 500, with a
} hot-spare? Would this be a sensible thing to do?
}
} What about file-system limitations, e.g. would ext3/reiser/XFS support an
} 11TB partition?
AFAIK, XFS can handle up to several exabytes, while ext3 and reiserfs
top out around 16TB, so an 11TB partition should be within range for
all three. I may be wrong, however.
} Would I be better off creating smaller volumes combining them with RAID0?
}
} I'd appreciate any tips/suggestions/advice/pointers to further sources of
} information.
LVM allows you to add more PVs (physical volumes, a.k.a. disks or
partitions) to a VG (volume group). You can then extend an LV (logical
volume) and the filesystem on it. Basically it is like a growable JBOD
(or, if you stripe your LVs, RAID0). It is even possible to retire old
PVs as long as there is sufficient room on other PVs to take up the
slack. This means that in five years, when you can get a 3TB disk for
$300, you'll be able to add them in and replace your old, outdated
250GB drives.
One advantage of LVM is snapshotting. It allows you to basically keep a
cheap diff backup of your disk. It's a really cool feature, but I'm not
going to go into detail about it here.
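(For the curious, creating one is a single command - a sketch with
made-up names and sizes:
# lvcreate -s -L 10G -n home_snap /dev/myvg/home
The snapshot only consumes space as the origin LV changes.)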
The main disadvantage is that while you can have a hot spare as part of a
RAID10, to be automatically used in any RAID1 pair as needed, LVM does not
integrate closely enough with md to allow that. You can have a warm spare
all ready to go, but you would have to actually assign it to the
appropriate md device for it to be used. Your best bet is to do a RAID10 with
however many hot spares on your set of disks and put LVM on top of that.
When you want to expand, add another PV to the LVM. Don't rely on LVM to
make a single device from a group of disks bought at any one time - use md
for that; rely on LVM just to add new storage.
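A minimal sketch of that layering, with made-up device names (four
active drives plus one spare in the RAID10, LVM on top):
# mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        --spare-devices=1 /dev/sd[b-f]1
# pvcreate /dev/md0
# vgcreate media_vg /dev/md0
# lvcreate -L 400G -n media media_vg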
} Thanks,
} R.
--Greg
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:29 ` Robin Bowes
@ 2005-09-30 18:28 ` Brad Dameron
2005-09-30 19:20 ` Dan Stromberg
0 siblings, 1 reply; 19+ messages in thread
From: Brad Dameron @ 2005-09-30 18:28 UTC (permalink / raw)
To: linux-raid
On Fri, 2005-09-30 at 14:29 +0100, Robin Bowes wrote:
> Robin Bowes wrote:
> > For example, suppose I get something like this [1]
>
> Ooops.
>
> http://www.cidesign.com/product_detail.jsp?productID=4
>
> R.
Only 24 drives?
http://www.rackmountpro.com/productpage.php?prodid=2079
And yes, you can create a single partition that large. However, you
might consider using something like this to control the drives instead
of software RAID:
http://www.areca.com.tw/products/html/pcix-sata.htm
Or, if you want the best performance you can get with SATA, look at this
one:
http://www.areca.com.tw/products/html/pcie-sata.htm
It has an 800MHz processor, does RAID6, etc.
Brad Dameron
SeaTab Software
www.seatab.com
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 18:16 ` Gregory Seidman
@ 2005-09-30 18:34 ` Andy Smith
0 siblings, 0 replies; 19+ messages in thread
From: Andy Smith @ 2005-09-30 18:34 UTC (permalink / raw)
To: linux-raid
On Fri, Sep 30, 2005 at 02:16:43PM -0400, Gregory Seidman wrote:
> One advantage of LVM is snapshotting. It allows you to basically keep a
> cheap diff backup of your disk. It's a really cool feature, but I'm not
> going to go into detail about it here.
I've found it (snapshots) too unstable for actual production use though.
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 18:28 ` Brad Dameron
@ 2005-09-30 19:20 ` Dan Stromberg
0 siblings, 0 replies; 19+ messages in thread
From: Dan Stromberg @ 2005-09-30 19:20 UTC (permalink / raw)
To: Brad Dameron; +Cc: linux-raid, strombrg
On Fri, 2005-09-30 at 11:28 -0700, Brad Dameron wrote:
> http://www.areca.com.tw/products/html/pcix-sata.htm
>
> Or if you want the best performance you can get with SATA look at this
> one:
Speaking of Areca stuff, someone I'm working with has a system that
appears to be running well with 3.5 terabytes of usable capacity. But
whenever they try to reboot it, it hangs on syncing disks (I haven't
seen it first-hand yet, but that's what I'm told is happening).
I haven't tried:
sync &
sync &
sync &
...followed by the reboot(2) syscall:
reboot(LINUX_REBOOT_CMD_RESTART);
...yet. The last part, I believe, is supposed to reboot without
syncing?
Has anyone encountered this before?
Thanks!
Some specifics follow:
Controller info:
[strombrg@hiperstore ~]$ dmesg | grep -i areca
ARECA RAID: 64BITS PCI BUS DMA ADDRESSING SUPPORTED
scsi0 : ARECA ARC1130 PCI-X 12 PORTS SATA RAID CONTROLLER
(RAID6-ENGINE Inside)
Vendor: Areca Model: ARC-1130-VOL#00 Rev: R001
CPU info:
[strombrg@hiperstore ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 244
stepping : 10
cpu MHz : 1794.825
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx
mmxext lm 3dnowext 3dnow
bogomips : 3597.17
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 244
stepping : 10
cpu MHz : 1794.825
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx
mmxext lm 3dnowext 3dnow
bogomips : 3590.12
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
OS info:
[strombrg@hiperstore ~]$ cat /etc/redhat-release
Fedora Core release 4 (Stentz)
Kernel info:
[strombrg@hiperstore ~]$ cat /proc/version
Linux version 2.6.13-mm2 (root@hiperstore) (gcc version 4.0.1
20050727 (Red Hat 4.0.1-5)) #2 SMP Thu Sep 8 23:10:46 PDT 2005
It appears to be an (otherwise) unmodified Andrew Morton kernel
they're using:
[strombrg@hiperstore linux-2.6.13-mm2]$ find . -type f -print | egrep -vi '\.ko$|\.cmd$|\.o$' | xargs filetime | highest -n 50
1126246262 ./drivers/parport/parport.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/parport/parport_serial.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/parport/parport_pc.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_block.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_config.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_core.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_proc.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/message/i2o/i2o_scsi.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/hw/mthca/ib_mthca.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_cm.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_core.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_sa.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_ucm.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_mad.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/core/ib_umad.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/infiniband/ulp/ipoib/ib_ipoib.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/firestream.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/he.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/horizon.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/idt77252.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/suni.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/ambassador.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/eni.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/lanai.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/atm/atmtcp.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/backlight/lcd.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/backlight/backlight.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/vgastate.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/aty/aty128fb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/aty/radeonfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/aty/atyfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/vga16fb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/i2c-matroxfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_misc.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/g450_pll.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_g450.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_crtc2.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_accel.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_Ti3026.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_maven.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_base.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/matrox/matroxfb_DAC1064.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/savage/savagefb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/tdfxfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/cirrusfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/sstfb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/neofb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/riva/rivafb.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/macmodes.mod.c Thu Sep 8 23:11:02 2005
1126246262 ./drivers/video/kyro/kyrofb.mod.c Thu Sep 8 23:11:02 2005
And the arcmsr driver that came with this mm kernel appears to be version "1.20.00.07 3/23/2005".
* Re: Best way to achieve large, expandable, cheap storage?
2005-09-30 13:20 Best way to achieve large, expandable, cheap storage? Robin Bowes
2005-09-30 13:29 ` Robin Bowes
2005-09-30 18:16 ` Gregory Seidman
@ 2005-10-02 4:36 ` Christopher Smith
2005-10-02 7:09 ` Tyler
2005-10-03 16:33 ` Sebastian Kuzminsky
2 siblings, 2 replies; 19+ messages in thread
From: Christopher Smith @ 2005-10-02 4:36 UTC (permalink / raw)
To: Robin Bowes; +Cc: linux-raid
Robin Bowes wrote:
> Hi,
>
> I have a business opportunity which would involve a large amount of
> storage, possibly growing to 10TB in the first year, possibly more. This
> would be to store media files - probably mainly .flac or .mp3 files.
Here's what I do (bear in mind this is for a home setup, so the data
volumes aren't as large and I'd expand in smaller amounts than you - but
the principle is the same).
I use a combination of Linux's software RAID + LVM for a flexible,
expandable data store. I buy disks in sets of four, with a four-port
disk controller and a 4-drive, cooled chassis of some sort (lately, the
Coolermaster 4-in-3 part).
I RAID5 the drives together and glue multiple sets of 4 drives together
into a single usable chunk using LVM.
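In command terms, each new "drive set" amounts to something like this
(a sketch - device and VG names are made up):
# mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sd[e-h]1
# pvcreate /dev/md2
# vgextend data_vg /dev/md2    (vgcreate, for the very first set)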
Over the last ~5 years, this has allowed me to move from/to the
following disk configurations:
4x40GB -> 4x40GB + 4x120GB -> 4x40GB + 4x120GB + 4x250GB -> 4x120GB +
4x250GB -> 4x250GB + 4x250GB.
In the next couple of months I plan to add another 4x300GB "drive set"
to expand further. I add drives about once a year. I remove drives
either because I run out of physical room in the machine, or to re-use
them in other machines (eg: the 4x120GB drives are now scratch space on
my workstation, the 4x40GB drives went into machines I built for
relatives). The case I have now is capable of holding about 20 drives,
so I probably won't be removing any for a while (previous cases were
stretched to hold 8 drives).
Apart from the actual hardware installations and removals, the various
reconfigurations have been quite smooth and painless, with LVM allowing
easy migration of data to/from RAID devices, division of space, etc.
I've had 3 disk failures, none of which have resulted in any data loss.
The "data store" has been moved across 3 very different physical
machines and 3 different Linux installations (Redhat 9 -> RHEL3 -> FC4).
I would suggest not trying to resize existing arrays at all, and simply
accepting the "space wastage" as a cost of flexibility. Storage is cheap,
and a few dozen or a few hundred GB lost in exchange for long-term cost
savings is well worth it IMHO. The space I "lose" by not reconfiguring my
RAID arrays whenever I add more disks is more than made up for by the
money I save by not buying everything at once, or by the additional space
available at the same price point.
I would, however, suggest getting a case with a large amount of physical
space in it so you don't have to remove drives to add bigger ones.
But, basically, just buy as much space as you need now and then buy more
as required - it's trivially easy to do, and you'll save money in the
long run.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-02 4:36 ` Christopher Smith
@ 2005-10-02 7:09 ` Tyler
2005-10-03 3:19 ` Christopher Smith
2005-10-03 16:33 ` Sebastian Kuzminsky
1 sibling, 1 reply; 19+ messages in thread
From: Tyler @ 2005-10-02 7:09 UTC (permalink / raw)
To: Christopher Smith; +Cc: linux-raid
Christopher Smith wrote:
> Here's what I do (bear in mind this is for a home setup, so the data
> volumes aren't as large and I'd expand in smaller amounts than you -
> but the principle is the same).
>
> I use a combination of Linux's software RAID + LVM for a flexible,
> expandable data store. I buy disks in sets of four, with a four-port
> disk controller and a 4-drive, cooled chassis of some sort (lately,
> the Coolermaster 4-in-3 part).
>
> I RAID5 the drives together and glue multiple sets of 4 drives
> together into a single usable chunk using LVM.
[...]
What case and power supply(s) are you using? What RAID cards are you
using also?
Thanks,
Tyler.
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-02 7:09 ` Tyler
@ 2005-10-03 3:19 ` Christopher Smith
0 siblings, 0 replies; 19+ messages in thread
From: Christopher Smith @ 2005-10-03 3:19 UTC (permalink / raw)
To: Tyler; +Cc: linux-raid
Tyler wrote:
> What case and power supply(s)are you using? What raid cards are you
> using also?
The case is a no-name job I picked up from a local PC seller:
http://www.pcicase.com.au/sub_files01.htm
Its main attraction to me was the large number of 5.25" drive bays.
The PSU is just a bog-standard 450W Antec (although since I've recently
upgraded the machine to dual Xeons, I should get a beefier unit).
Currently the machine has 2 x Promise S150 TX4. Previously, as the 120GB
drives were PATA, it also had a Promise TX4000. However, since that
card wouldn't work with a 2.6 kernel, I used it as an excuse to get more
drives and upgrade to a newer distro :). The TX4s are 32-bit, 66MHz PCI
cards and are in 64-bit/133MHz PCI-X slots, so they handle four 7200rpm
SATA drives each quite well. I toyed with getting a single 8-port SATA
card, but all the ones I've seen are full-blown hardware RAID, making
them quite expensive; since I use software RAID and have 5 PCI-X slots
on the motherboard, it's not worth it. I'll run out of physical space in
the case before I run out of PCI-X slots to drop 4-port cards into.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-02 4:36 ` Christopher Smith
2005-10-02 7:09 ` Tyler
@ 2005-10-03 16:33 ` Sebastian Kuzminsky
2005-10-04 4:09 ` Christopher Smith
1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Kuzminsky @ 2005-10-03 16:33 UTC (permalink / raw)
To: Christopher Smith; +Cc: Robin Bowes, linux-raid
On 10/1/05, Christopher Smith <csmith@nighthawkrad.net> wrote:
> I use a combination of Linux's software RAID + LVM for a flexible,
> expandable data store. I buy disks in sets of four, with a four-port
> disk controller and a 4-drive, cooled chassis of some sort (lately, the
> Coolermaster 4-in-3 part).
>
> I RAID5 the drives together and glue multiple sets of 4 drives together
> into a single usable chunk using LVM.
Sounds pretty cool. I've used software RAID but never LVM; let me see
if I understand your setup:
At the lowest level, you have 4-disk controller cards, each connected
to a set of 4 disks. Each set of 4 has a software RAID-5. All the
RAID-5 arrays are used as LVM physical volumes. These PVs are part of
a single volume group, from which you make logical volumes as needed.
When you want more disk, you buy 4 big modern disks (and a 4x
controller if needed), RAID-5 them, extend the VG onto them, and
extend the LV(s) on the VG. Then I guess you have to unmount the
filesystem(s) on the LV(s), resize them, and remount them.
If you get low on room in the case or it gets too hot or noisy, you
have to free up an old, small RAID array. You unmount, resize, and
remount the filesystem(s), reduce the LV(s) and the VG, and then
you're free to pull the old RAID array from the case.
> Apart from the actual hardware installations and removals, the various
> reconfigurations have been quite smoothe and painless, with LVM allowing
> easy migration of data to/from RAID devices, division of space, etc.
> I've had 3 disk failures, none of which have resulted in any data loss.
> The "data store" has been moved across 3 very different physical
> machines and 3 different Linux installations (Redhat 9 -> RHEL3 -> FC4).
Your data survives one disk per PV croaking, but two disks out on any
one PV causes complete data loss, assuming you use the stripe mapping.
You use SATA, for which Linux doesn't support SMART yet, right? So you
get no warning of pending drive failures.
Nonetheless, it sounds like a nice flexible setup.
--
Sebastian Kuzminsky
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-03 16:33 ` Sebastian Kuzminsky
@ 2005-10-04 4:09 ` Christopher Smith
2005-10-20 10:23 ` Robin Bowes
0 siblings, 1 reply; 19+ messages in thread
From: Christopher Smith @ 2005-10-04 4:09 UTC (permalink / raw)
To: Sebastian Kuzminsky; +Cc: Robin Bowes, linux-raid
Sebastian Kuzminsky wrote:
> On 10/1/05, Christopher Smith <csmith@nighthawkrad.net> wrote:
>
>>I RAID5 the drives together and glue multiple sets of 4 drives together
>>into a single usable chunk using LVM.
>
>
> Sounds pretty cool. I've used software RAID but never LVM, let me see
> if I understand your setup:
>
> At the lowest level, you have 4-disk controller cards, each connected
> to a set of 4 disks. Each set of 4 has a software RAID-5. All the
> RAID-5 arrays are used as LVM physical volumes. These PVs are part of
> a single volume group, from which you make logical volumes as needed.
>
> When you want more disk, you buy 4 big modern disks (and a 4x
> controller if needed), RAID-5 them, extend the VG onto them, and
> extend the LV(s) on the VG. Then I guess you have to unmount the
> filesystem(s) on the LV(s), resize them, and remount them.
>
> If you get low on room in the case or it gets too hot or noisy, you
> have to free up an old, small RAID array. You unmount, resize, and
> remount the filesystem(s), reduce the LV(s) and the VG, and then
> you're free to pull the old RAID array from the case.
Yep, that's pretty much bang on. The only thing you've missed is using
pvmove to physically move the data off the soon-to-be-decommissioned
PVs (i.e. RAID arrays).
Be warned, for those who haven't used it before: pvmove is _very_ slow.
>>Apart from the actual hardware installations and removals, the various
>>reconfigurations have been quite smoothe and painless, with LVM allowing
>>easy migration of data to/from RAID devices, division of space, etc.
>>I've had 3 disk failures, none of which have resulted in any data loss.
>> The "data store" has been moved across 3 very different physical
>>machines and 3 different Linux installations (Redhat 9 -> RHEL3 -> FC4).
>
>
> Your data survives one disk per PV croaking, but two disks out on any
> one PV causes complete data loss, assuming you use the stripe mapping.
Yep, that's correct. I've never lost more than one disk out of an array
at once and I've always replaced any disk failures the same day. I lost
two of the 40GB drives (about 6 months apart - back before I had decent
cooling on them) and one of the 120GB drives.
> You use SATA, for which Linux doesn't support SMART yet, right? So
> you get no warning of pending drive failures.
Yep. The only annoyance. I eagerly await the ability to check my SATA
disks with SMART.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-04 4:09 ` Christopher Smith
@ 2005-10-20 10:23 ` Robin Bowes
2005-10-20 11:19 ` Gregory Seidman
2005-10-21 4:40 ` Christopher Smith
0 siblings, 2 replies; 19+ messages in thread
From: Robin Bowes @ 2005-10-20 10:23 UTC (permalink / raw)
To: Christopher Smith; +Cc: Sebastian Kuzminsky, Robin Bowes, linux-raid
Christopher Smith said the following on 04/10/2005 05:09:
> Yep, that's pretty much bang on. The only thing you've missed is using
> pvmove to physically move the data off the soon-to-be-decommissioned
> PVs (i.e. RAID arrays).
>
> Be warned, for those who haven't used it before: pvmove is _very_ slow.
I've just been re-reading this thread.
I'd like to just check if I understand how this will work.
Assume the following setup (hypothetical).
VG:
big_vg - contains /dev/md1, /dev/md2; 240GB
PV:
/dev/md1 - 4 x 40GB drives (RAID5 - 120GB total)
/dev/md2 - 4 x 40GB drives (RAID5 - 120GB total)
LV:
big_lv - in big_vg - 240GB
Filesystems:
/home - xfs filesystem in big_lv - 240GB
Suppose I then add a new PV:
/dev/md3 - 4 x 300GB drives (RAID5 - 900GB total)
I want to replace /dev/md1 with /dev/md3
I use pvmove something like this:
# pvmove /dev/md1 /dev/md3
When this finishes, big_vg will contain /dev/md2 + /dev/md3 (1020GB
total). /dev/md1 will be unused.
big_lv will still be using just 240GB of big_vg.
I then use lvextend to increase the size of big_lv
big_lv will now use all 1020GB of big_vg.
However, the /home filesystem will still just use 240GB of big_lv
I can then use xfs_growfs to expand the /home filesystem to use all
1020GB of big_lv.
Have I missed anything?
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 10:23 ` Robin Bowes
@ 2005-10-20 11:19 ` Gregory Seidman
2005-10-20 11:41 ` Robin Bowes
2005-10-21 4:42 ` Christopher Smith
2005-10-21 4:40 ` Christopher Smith
1 sibling, 2 replies; 19+ messages in thread
From: Gregory Seidman @ 2005-10-20 11:19 UTC (permalink / raw)
To: linux-raid
On Thu, Oct 20, 2005 at 11:23:30AM +0100, Robin Bowes wrote:
} Christopher Smith said the following on 04/10/2005 05:09:
} >Yep, that's pretty much bang on. The only thing you've missed is using
} >pvmove to physically move the data off the soon-to-be-decommissioned
} >PVs (i.e. RAID arrays).
} >
} >Be warned, for those who haven't used it before: pvmove is _very_ slow.
}
} I've just been re-reading this thread.
}
} I'd like to just check if I understand how this will work.
}
} Assume the following setup (hypothetical).
}
} VG:
} big_vg - contains /dev/md1, /dev/md2; 240GB
}
} PV:
} /dev/md1 - 4 x 40GB drives (RAID5 - 120GB total)
} /dev/md2 - 4 x 40GB drives (RAID5 - 120GB total)
You should at least read the following before using RAID5. You can agree or
disagree, but you should take the arguments into account:
http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
} LV:
} big_lv - in big_vg - 240GB
}
} Filesystems:
} /home - xfs filesystem in big_lv - 240GB
}
} Suppose I then add a new PV:
} /dev/md3 - 4 x 300GB drives (RAID5 - 900GB total)
You use pvcreate and vgextend to do so, incidentally.
} I want to replace /dev/md1 with /dev/md3
}
} I use pvmove something like this:
}
} # pvmove /dev/md1 /dev/md3
}
} When this finishes, big_vg will contain /dev/md2 + /dev/md3 (1020GB
} total). /dev/md1 will be unused.
/dev/md1 will still be a part of big_vg, but it won't have any data from
any LVs on it. You will need to use vgreduce to remove /dev/md1 from the
VG:
# vgreduce big_vg /dev/md1
} big_lv will still be using just 240GB of big_vg.
}
} I then use lvextend to increase the size of big_lv
}
} big_lv will now use all 1020GB of big_vg.
}
} However, the /home filesystem will still just use 240GB of big_lv
}
} I can then use xfs_growfs to expand the /home filesystem to use all
} 1020GB of big_lv.
All correct.
} Have I missed anything?
Just the vgreduce step (and removing the physical drives that make up
/dev/md1).
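Putting the whole cycle together with your example names (a sketch;
untested):
# pvcreate /dev/md3
# vgextend big_vg /dev/md3
# pvmove /dev/md1
# vgreduce big_vg /dev/md1
# lvextend -L 1020G /dev/big_vg/big_lv
# xfs_growfs /home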
} R.
--Greg
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 11:19 ` Gregory Seidman
@ 2005-10-20 11:41 ` Robin Bowes
2005-10-21 4:42 ` Christopher Smith
1 sibling, 0 replies; 19+ messages in thread
From: Robin Bowes @ 2005-10-20 11:41 UTC (permalink / raw)
To: linux-raid
Gregory Seidman said the following on 20/10/2005 12:19:
> } PV:
> } /dev/md1 - 4 x 40GB drives (RAID5 - 120GB total)
> } /dev/md2 - 4 x 40GB drives (RAID5 - 120GB total)
>
> You should at least read the following before using RAID5. You can agree or
> disagree, but you should take the arguments into account:
>
> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
This was just an example configuration. My current 1TB array is RAID5
with 1 hot spare but I'll most likely use RAID6 in production.
> } Suppose I then add a new PV:
> } /dev/md3 - 4 x 300GB drives (RAID5 - 900GB total)
>
> You use pvcreate and vgextend to do so, incidentally.
Yes, thanks for the detail.
> } When this finishes, big_vg will contain /dev/md2 + /dev/md3 (1020GB
> } total). /dev/md1 will be unused.
>
> /dev/md1 will still be a part of big_vg, but it won't have any data from
> any LVs on it. You will need to use vgreduce to remove /dev/md1 from the
> VG:
>
> # vgreduce big_vg /dev/md1
Ah, yes, forgot about that step.
Thanks for the validation of the methodology.
I'm going to give this a try on my test server (using much smaller disks!)
Thanks again,
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 10:23 ` Robin Bowes
2005-10-20 11:19 ` Gregory Seidman
@ 2005-10-21 4:40 ` Christopher Smith
1 sibling, 0 replies; 19+ messages in thread
From: Christopher Smith @ 2005-10-21 4:40 UTC (permalink / raw)
To: Robin Bowes; +Cc: Sebastian Kuzminsky, linux-raid
Robin Bowes wrote:
> Christopher Smith said the following on 04/10/2005 05:09:
>
>> Yep, that's pretty much bang on. The only thing you've missed is
>> using pvmove to physically move the data off the
>> soon-to-be-decommissioned PVs (i.e. RAID arrays).
>>
>> Be warned, for those who haven't used it before: pvmove is _very_ slow.
>
>
> I've just been re-reading this thread.
[...]
> I use pvmove something like this:
>
> # pvmove /dev/md1 /dev/md3
It would actually just be 'pvmove <old md device>', but the gist is correct.
Someone else has already responded to your questions, but something
else to be aware of with pvmove is that it might hang your system
(requiring a hard boot) when you try to use it, although the process
will proceed and complete without error (in the background) once you
have restarted.
It's been several months since I last used pvmove, so this bug may have
been fixed, but it was certainly present on FC4 back then. Basically
running pvmove would immediately hang the system (no response to
keyboard, etc), but after a hard reboot the pvmove process would start
up and then complete in the background.
Again, this may well have been fixed and you might not see it, but just
a word of warning so your first reaction isn't something rash ;). Since
pvmove appears to do its thing by *copying* everything from one PV to
another, rather than moving it, even if the machine crashes during the
process there's no data loss.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-20 11:19 ` Gregory Seidman
2005-10-20 11:41 ` Robin Bowes
@ 2005-10-21 4:42 ` Christopher Smith
2005-10-21 16:48 ` Gil
1 sibling, 1 reply; 19+ messages in thread
From: Christopher Smith @ 2005-10-21 4:42 UTC (permalink / raw)
To: gsslist+linuxraid; +Cc: linux-raid
Gregory Seidman wrote:
> You should at least read the following before using RAID5. You can agree or
> disagree, but you should take the arguments into account:
>
> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
This bloke makes some good points about the various downsides of RAID5
(which everyone involved in actually implementing production RAID
systems should already know), but IMHO he also makes some poor
assumptions and specious claims.
For example, his article suggests that "partial media failure" is a
problem that would only affect RAID5, when really it would negatively
impact any RAID system (your newly-synced mirror isn't much good if half
the data that just got mirrored to it was corrupted, nor is the speed
boost from RAID0 very helpful if half the data is corrupted). I'm also
not sure about his claims of RAID3 & 4 "always" checking parity - that
sounds like a vendor-specific implementation (and while I'm not a
developer, I fail to see why a RAID5 implementation couldn't be made to
do the same).
As another example, I'm 99% sure that SCSI drives *do* inform the OS
when they remap a bad sector and that any remotely modern IDE drive also
does sector remapping.
He also focuses solely on the worst-case scenario as a reason for
avoiding RAID5 completely. Certainly you have to take that into
account, but it's rather unfair to draw a general conclusion based only
on how a particular scenario might happen.
Added to that, he completely discounts a few things:
1. Where it's "handy" to keep lots of data easily available, but its
entire loss is not catastrophic - i.e. data volume is more important
than redundancy (my workplace has such a requirement, although we use
RAID6 - but RAID6 suffers most of the same "problems" he's talking
about).
2. Where cost is a significant factor. Certainly for a business, the
cost of going RAID10 over RAID5, when taking into account possible
losses, is probably not large. However, in a "home user" scenario,
where cost is almost always the deciding factor and performance is not
particularly important, going RAID10 over RAID5 is difficult to
justify. Similarly, if large amounts of data (tens of terabytes) are
being stored, the additional cost of RAID10 can become substantial.
3. You could potentially need a _lot_ more physical space to get the
same amount of logical storage in a RAID10 vs a RAID5, with associated
powering, cooling and logistical issues (see the rough numbers below).
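To put rough numbers on points 2 and 3 (hypothetical figures, assuming
500GB drives): 10TB usable via RAID10 means 20TB raw, i.e. 40 drives;
the same 10TB via RAID5 built from 12-drive sets (11 data + 1 parity)
needs about 22 drives - nearly half the drives, slots and watts.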
In short, RAID5 has its place. It's certainly not the
only-an-idiot-would-use-it train wreck that page makes it out to be.
CS
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-21 4:42 ` Christopher Smith
@ 2005-10-21 16:48 ` Gil
2005-10-21 20:08 ` Robin Bowes
0 siblings, 1 reply; 19+ messages in thread
From: Gil @ 2005-10-21 16:48 UTC (permalink / raw)
To: Christopher Smith; +Cc: gsslist+linuxraid, linux-raid
Christopher Smith wrote:
> Gregory Seidman wrote:
>
>> You should at least read the following before using RAID5. You
>> can agree or disagree, but you should take the arguments into
>> account:
>>
>> http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
> For example, his article suggests that "partial media failure" is
> a problem that would only affect RAID5, when really it would
> negatively impact any RAID system (your newly-synced mirror isn't
> much good if half the data that just got mirrored to it was
> corrupted, nor is the speed boost from RAID0 very helpful if half
> the data is corrupted). I'm also not sure about his claims of
> RAID3 & 4 "always" checking parity - that sounds like a
> vendor-specific implementation (and while I'm not a developer, I
> fail to see why a RAID5 implementation couldn't be made to do the
> same).
The partial media failure problem described here is exactly why it's
important to run smartmontools in combination with your RAID array
of any level. By running regular checks of the disk surface you can
know well ahead of time that you're going to have trouble. In
practice this more than mitigates the risk of partial media failure.
http://smartmontools.sourceforge.net/
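A minimal sketch of such regular checks (device name assumed; the -s
directive syntax is from the smartmontools documentation):
# smartctl -t long /dev/hda     (kick off a full surface self-test)
or, in /etc/smartd.conf, to monitor all attributes and schedule a long
self-test every Sunday at 2am:
/dev/hda -a -s (L/../../7/02)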
--Gil
* Re: Best way to achieve large, expandable, cheap storage?
2005-10-21 16:48 ` Gil
@ 2005-10-21 20:08 ` Robin Bowes
0 siblings, 0 replies; 19+ messages in thread
From: Robin Bowes @ 2005-10-21 20:08 UTC (permalink / raw)
To: linux-raid
Gil said the following on 21/10/2005 17:48:
> The partial media failure problem described here is exactly why it's
> important to run smartmontools in combination with your RAID array
> of any level. By running regular checks of the disk surface you can
> know well ahead of time that you're going to have trouble. In
> practice this more than mitigates the risk of partial media failure.
>
> http://smartmontools.sourceforge.net/
<sigh> I wish the code to enable SMART for SATA drives would make it
into mainstream *real* soon now.
R.
--
http://robinbowes.com
If a man speaks in a forest,
and his wife's not there,
is he still wrong?
* Re: Best way to achieve large, expandable, cheap storage?
@ 2005-10-27 19:12 Andrew Burgess
0 siblings, 0 replies; 19+ messages in thread
From: Andrew Burgess @ 2005-10-27 19:12 UTC (permalink / raw)
To: linux-raid
>> The partial media failure problem described here is exactly why it's
>> important to run smartmontools in combination with your RAID array
>> of any level. By running regular checks of the disk surface you can
>> know well ahead of time that you're going to have trouble. In
>> practice this more than mitigates the risk of partial media failure.
>>
>> http://smartmontools.sourceforge.net/
><sigh> I wish the code to enable SMART for SATA drives would make it
>into mainstream *real* soon now.
In addition to the libata patches, the 3ware SATA controllers support SMART.
They are about $250 used on eBay for a 12-drive card - still 4x as
expensive as a 4-drive controller, but only one slot vs. three...
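For what it's worth, smartctl can also address drives behind a 3ware
card directly via its -d option - a sketch, as device naming varies by
card series:
# smartctl -a -d 3ware,0 /dev/twe0
where 0 is the port number on the controller.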
HTH