* Re: Partitioning md devices versus partitioining underlying devices
2006-04-06 15:25 Partitioning md devices versus partitioining underlying devices andy liebman
@ 2006-04-06 17:20 ` Bill Davidsen
2006-04-07 13:36 ` John Stoffel
1 sibling, 0 replies; 3+ messages in thread
From: Bill Davidsen @ 2006-04-06 17:20 UTC
To: andy liebman; +Cc: linux-raid
On Thu, 6 Apr 2006, andy liebman wrote:
> Hi,
>
> I have a fundamental question about WHERE it is best to do partitioning.
>
> Here's a concrete example. I have two 3ware RAID-5 arrays, each made up
> of 12 500 GB drives. When presented to Linux, these are /dev/sda and
> /dev/sdb -- each 5.5 TB in size.
>
> I want to stripe the two arrays together, so that 24 drives are all
> operating as one unit. However, I don't want an 11 TB filesystem. I want
> to keep my filesystems down below 6 TB.
>
> It seems I have two choices:
>
> 1) partition the 3ware devices to make /dev/sda1, /dev/sda2, /dev/sdb1
> and /dev/sdb2. Then I can create TWO md RAID-0 devices -- /dev/sda1 +
> /dev/sdb1 = /dev/md1, /dev/sda2 + /dev/sdb2 = /dev/md2
>
> OR
>
> 2) create /dev/md1 from the entire 3ware devices -- /dev/sda + /dev/sdb
> = /dev/md1 -- and then partition /dev/md1 into two devices.
>
> The question is, are these essentially equivalent alternatives? Is there
> any theoretical reason why one choice would be better than the other --
> in terms of security, performance, memory usage, etc.
>
> A knowledgeable answer would be appreciated. Thanks in advance.
There is one advantage to partitioning sda and sdb and then building
devices using the partitions... you can use a different stripe size on
each md device built from the partitions. *IF* you have different things
going on in the filesystems, you may be able to improve performance and
spread head motion by using tuned stripe sizes.
I did this for an application which had an index of 128-byte records
pointing to a bunch of 500-1000k data records. I used a small stripe size
on the index and a large one on the data, and was able to reduce the time
from request to data delivery by more than 20%. I was doing RAID-0 over
six SCSI drives.
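To make the stripe-size point concrete, here is a minimal Python sketch of how a RAID-0 layout maps a request onto member disks. It is an illustration only, not the setup described above: the function name `disks_touched` and the chunk sizes (4 KiB, 16 KiB, 1 MiB) are assumptions chosen for the example, and it assumes the usual round-robin layout where chunk k lives on disk k mod N.

```python
def disks_touched(offset, length, chunk_size, n_disks):
    """Return the set of member-disk indexes one request touches in RAID-0.

    Assumes a plain round-robin layout: chunk k is stored on disk k % n_disks.
    """
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    n_chunks = last_chunk - first_chunk + 1
    if n_chunks >= n_disks:
        # The request spans at least one full stripe, so every disk is hit.
        return set(range(n_disks))
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}


KIB = 1024
N_DISKS = 6  # six drives, as in the RAID-0 set mentioned above

# One 768 KiB data record (hypothetical size within the 500-1000k range).
record = 768 * KIB
small_chunk = disks_touched(0, record, 16 * KIB, N_DISKS)
large_chunk = disks_touched(0, record, 1024 * KIB, N_DISKS)
print("data record, 16 KiB chunks:", len(small_chunk), "disks")
print("data record, 1 MiB chunks: ", len(large_chunk), "disks")

# A 64 KiB region of the index (512 records of 128 bytes each).
index_region = 512 * 128
print("index region, 4 KiB chunks:",
      len(disks_touched(0, index_region, 4 * KIB, N_DISKS)), "disks")
print("index region, 1 MiB chunks:",
      len(disks_touched(0, index_region, 1024 * KIB, N_DISKS)), "disks")
```

With a small chunk size, neighbouring index lookups land on different disks; with a large chunk size, one data record can be read from a single disk, leaving the others free for other requests. Whether that helps in practice depends on the workload, so treat this as a way to reason about the layout rather than a tuning recipe.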
Assuming that you do the same thing on both filesystems, I see no benefit
to one way over the other; I was just answering your question as to a
possible benefit. Use of LVM or dm to do the same thing might allow you to
change f/s sizes and such after the fact. I have only tried that as a
learning exercise, so I can't say how well it works in practice, or which
is better for you.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with little computers since 1979
* Re: Partitioning md devices versus partitioining underlying devices
From: John Stoffel @ 2006-04-07 13:36 UTC
To: andy liebman; +Cc: linux-raid
andy> Here's a concrete example. I have two 3ware RAID-5 arrays, each
andy> made up of 12 500 GB drives. When presented to Linux, these are
andy> /dev/sda and /dev/sdb -- each 5.5 TB in size.
andy> I want to stripe the two arrays together, so that 24 drives are
andy> all operating as one unit. However, I don't want an 11 TB
andy> filesystem. I want to keep my filesystems down below 6 TB.
Why? What are your issues with large filesystems? I assume this is
related to your NAS -> NAS mirror question as well. Also, what will
you do if a single controller fails? Or do you care?
andy> 1) partition the 3ware devices to make /dev/sda1, /dev/sda2, /dev/sdb1
andy> and /dev/sdb2. Then I can create TWO md RAID-0 devices -- /dev/sda1 +
andy> /dev/sdb1 = /dev/md1, /dev/sda2 + /dev/sdb2 = /dev/md2
andy> OR
andy> 2) create /dev/md1 from the entire 3ware devices -- /dev/sda + /dev/sdb
andy> = /dev/md1 -- and then partition /dev/md1 into two devices.
The general plan I would use is to start at the low level and go from:
/dev/sda1 -> md1 -> LVM -> partition
But the question is whether to use Hardware RAID5, or Software RAID5.
If the data is really important, I'd probably think seriously about
using Neil's RAID6 patches because a single disk failure takes so long
to re-sync and recover from, and RAID6 helps close that gap a lot.
So I think I'd probably just ignore a controller failing issue, since
I'm mirroring the data to a totally separate device, and just build a
single large RAID6 device with a single hot spare disk. So you'd have
21 x 500GB worth of data.
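The capacity figures in this thread can be checked with a few lines of Python. This is only a worked-arithmetic sketch: the helper name `data_disks` is made up for the example, sizes are in decimal GB as quoted in the thread, and it ignores metadata and filesystem overhead.

```python
def data_disks(total, parity, spares=0):
    """Number of disks left for data after removing hot spares and parity."""
    data = total - spares - parity
    if data < 1:
        raise ValueError("no disks left for data")
    return data


DRIVE_GB = 500  # 500 GB drives, as described in the original question

# Original layout: two hardware RAID-5 arrays of 12 drives each,
# striped together with RAID-0.
raid5_each_gb = data_disks(12, parity=1) * DRIVE_GB
striped_gb = 2 * raid5_each_gb

# Alternative suggested above: one RAID6 set over all 24 drives
# with a single hot spare.
raid6_data_disks = data_disks(24, parity=2, spares=1)
raid6_gb = raid6_data_disks * DRIVE_GB

print("each RAID-5 array:", raid5_each_gb, "GB")
print("both arrays striped:", striped_gb, "GB")
print("RAID6 data disks:", raid6_data_disks)
print("RAID6 usable:", raid6_gb, "GB")
```

The single RAID6 set gives up 500 GB of capacity compared with the two striped RAID-5 arrays, in exchange for surviving any two simultaneous disk failures and having a spare ready for rebuild.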
Heck, I'd also look into getting a server with multiple PCI busses and
getting non-3ware controllers across more busses since I'd get better
performance. But the 3ware should hopefully hide single disk hot-swap
issues better. It's a tradeoff and time for testing.
Anyway, try to put each 3ware onto its own PCI bus if you can.
So, on top of that huge RAID6 volume, I'd stick LVM and then carve out
the PVs -> LVs and make the filesystems I want on the LVs.
andy> The question is, are these essentially equivalent alternatives?
andy> Is there any theoretical reason why one choice would be better
andy> than the other -- in terms of security, performance, memory
andy> usage, etc.
If you add in LVM to the mix, I think they are both equivalent, since
you use LVM as an interface layer to hide the details of the lower
layers from the filesystem. With LVM you can add/move/delete PVs
(Physical Volumes) from a system and move data around with the system
live.
This would allow you to do a quick shutdown to add new hardware/disks
and then bring up the system. With the system live and serving data,
you can then build new PVs, add them into LVM and then move data from
old controllers/disks to new disks, all while serving data and keeping
up redundancy. It's really cool.
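The idea behind that live migration can be sketched with a toy model in Python. This is not LVM code and does not call any LVM API: the class `VolumeGroup` and its methods are invented for illustration, loosely mirroring what the real tools `vgextend`, `pvmove` and `vgreduce` do. The point is that a logical volume is just a list of extents, so extents can be relocated between physical volumes without the logical view changing.

```python
class VolumeGroup:
    """Toy model: each PV is a list of slots holding (lv_name, index) or None."""

    def __init__(self):
        self.pvs = {}

    def add_pv(self, name, n_extents):
        # Roughly what adding a new PV to a volume group achieves.
        self.pvs[name] = [None] * n_extents

    def free_extents(self, exclude=None):
        return sum(slot is None
                   for name, slots in self.pvs.items() if name != exclude
                   for slot in slots)

    def create_lv(self, lv, n_extents):
        if n_extents > self.free_extents():
            raise ValueError("not enough free extents")
        idx = 0
        for slots in self.pvs.values():
            for i, owner in enumerate(slots):
                if owner is None and idx < n_extents:
                    slots[i] = (lv, idx)
                    idx += 1

    def move_off(self, old_pv):
        """Relocate every used extent on old_pv to free slots on other PVs."""
        used = sum(slot is not None for slot in self.pvs[old_pv])
        if used > self.free_extents(exclude=old_pv):
            raise ValueError("not enough free extents elsewhere")
        targets = ((name, i)
                   for name, slots in self.pvs.items() if name != old_pv
                   for i, slot in enumerate(slots) if slot is None)
        for i, owner in enumerate(self.pvs[old_pv]):
            if owner is not None:
                name, j = next(targets)
                self.pvs[name][j] = owner
                self.pvs[old_pv][i] = None

    def remove_pv(self, name):
        if any(slot is not None for slot in self.pvs[name]):
            raise ValueError("PV still holds data")
        del self.pvs[name]

    def lv_extents(self, lv):
        """Logical view of an LV: the sorted extent indexes it owns."""
        return sorted(slot[1] for slots in self.pvs.values()
                      for slot in slots if slot and slot[0] == lv)


vg = VolumeGroup()
vg.add_pv("old_array", 4)
vg.create_lv("data", 3)
before = vg.lv_extents("data")

vg.add_pv("new_array", 8)   # new hardware added
vg.move_off("old_array")    # data migrated while the LV stays in use
vg.remove_pv("old_array")   # old hardware retired

print("LV view unchanged:", before == vg.lv_extents("data"))
print("remaining PVs:", list(vg.pvs))
```

The filesystem sitting on the logical volume sees the same extents before and after the move; only their physical location has changed.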
You do take some performance hit while doing this, since you are
copying lots and lots of data around, but it's not bad at all.
Look for a stable filesystem which allows you to resize it while
mounted. I think XFS lets you do this, but double check.
John