* RAID5 with two drive sizes question
From: Joachim Otahal (privat) @ 2012-06-05 17:27 UTC
To: linux-raid

Hi,

Debian 6.0.4 / superblock 1.2
sdc1 = 1.5 TB
sdd1 = 1.5 TB (cannot be used during --create, still contains data)
sde1 = 1 TB
sdf1 = 1 TB
sdg1 = 1 TB

Target: RAID5 with 4.5 TB capacity.

The normal case would be:
mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
What I expect: since the first and second drives are 1.5 TB in size, the
third, fourth and fifth drives are treated like 2*1.5 TB, creating a
4.5 TB RAID.
What would really be created: I know there are people here who know
rather than guess :).

What my case actually is:
mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 missing /dev/sde1 /dev/sdf1 /dev/sdg1
Expected: this still creates a 4.5 TB array, since sdc1 is 1.5 TB, even
though sdd1 is missing for now.

Will it work as expected? If so, I would format md3, copy the contents of
sdd1 (which is currently still /dev/md2) into the RAID, then --add
/dev/sdd1 to the array and wait until the rebuild is done.

Jou
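For reference, a minimal sketch of the degraded-create sequence asked about
above, using the device names from the question; the filesystem choice and
mount point are illustrative assumptions, and the replies below clarify what
array size mdadm will actually produce:

    # Create the array with sdd1 left out as "missing" (degraded from the start)
    mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 \
        /dev/sdc1 missing /dev/sde1 /dev/sdf1 /dev/sdg1

    # Check the size mdadm actually produced before copying anything
    mdadm --detail /dev/md3 | grep 'Array Size'

    # Filesystem and mount point are assumptions, not from the thread
    mkfs.ext4 /dev/md3
    mount /dev/md3 /mnt/md3

    # ... copy the data off /dev/md2 (the old array holding sdd1) ...

    # Retire the old array, then add its disk as the missing RAID5 member
    mdadm --stop /dev/md2
    mdadm /dev/md3 --add /dev/sdd1

    # Watch the rebuild progress
    cat /proc/mdstat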
* Re: RAID5 with two drive sizes question
From: Roman Mamedov @ 2012-06-05 17:39 UTC
To: Joachim Otahal (privat)
Cc: linux-raid

On Tue, 05 Jun 2012 19:27:53 +0200
"Joachim Otahal (privat)" <Jou@gmx.net> wrote:

> Hi,
> Debian 6.0.4 / superblock 1.2
> sdc1 = 1.5 TB
> sdd1 = 1.5 TB (cannot be used during --create, still contains data)
> sde1 = 1 TB
> sdf1 = 1 TB
> sdg1 = 1 TB
>
> Target: RAID5 with 4.5 TB capacity.
>
> The normal case would be:
> mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
> What I expect: since the first and second drives are 1.5 TB in size, the
> third, fourth and fifth drives are treated like 2*1.5 TB, creating a
> 4.5 TB RAID.

Lolwhat.

> What would really be created: I know there are people here who know
> rather than guess :).

A 5x1TB RAID5. The lowest common device size across all RAID members is
what gets utilized in an array.

But what you can do after that is also create a separate 2x0.5TB RAID1
from the 1.5 TB drives' "tails", and join both arrays into a single
larger volume using LVM.

The result: 4.5 TB of usable space with one-drive-loss tolerance
(provided by RAID5 in the first 4 TB, and by RAID1 in the 0.5 TB "tail").

--
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
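A minimal sketch of the layout Roman describes above, assuming the two
1.5 TB drives are repartitioned into a 1 TB part (sdc1/sdd1) plus a 0.5 TB
"tail" (sdc2/sdd2); the md numbers, volume group and LV names, and the ext4
filesystem are illustrative assumptions, not values from the thread:

    # RAID5 across five equal 1 TB members -> ~4 TB usable
    mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 \
        /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1

    # RAID1 across the two 0.5 TB tails of the 1.5 TB drives -> ~0.5 TB usable
    mdadm -C /dev/md4 --bitmap=internal -l 1 -n 2 /dev/sdc2 /dev/sdd2

    # Join both arrays into one ~4.5 TB logical volume with LVM
    pvcreate /dev/md3 /dev/md4
    vgcreate vg_data /dev/md3 /dev/md4
    lvcreate -l 100%FREE -n lv_data vg_data
    mkfs.ext4 /dev/vg_data/lv_data

With LVM's default linear allocation the logical volume simply appends the
RAID1 space after the RAID5 space, so the combined volume still survives the
loss of any single physical disk.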
* Re: RAID5 with two drive sizes question
From: Joachim Otahal (privat) @ 2012-06-05 19:41 UTC
To: Roman Mamedov
Cc: linux-raid

Roman Mamedov schrieb:
> On Tue, 05 Jun 2012 19:27:53 +0200
> "Joachim Otahal (privat)" <Jou@gmx.net> wrote:
>
>> Hi,
>> Debian 6.0.4 / superblock 1.2
>> sdc1 = 1.5 TB
>> sdd1 = 1.5 TB (cannot be used during --create, still contains data)
>> sde1 = 1 TB
>> sdf1 = 1 TB
>> sdg1 = 1 TB
>>
>> Target: RAID5 with 4.5 TB capacity.
>>
>> The normal case would be:
>> mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
>> What I expect: since the first and second drives are 1.5 TB in size, the
>> third, fourth and fifth drives are treated like 2*1.5 TB, creating a
>> 4.5 TB RAID.
> Lolwhat.

Hey, there is a reason why I ask, no need to lol.

>> What would really be created: I know there are people here who know
>> rather than guess :).
> A 5x1TB RAID5. The lowest common device size across all RAID members is
> what gets utilized in an array.
>
> But what you can do after that is also create a separate 2x0.5TB RAID1
> from the 1.5 TB drives' "tails", and join both arrays into a single
> larger volume using LVM.
>
> The result: 4.5 TB of usable space with one-drive-loss tolerance
> (provided by RAID5 in the first 4 TB, and by RAID1 in the 0.5 TB "tail").

Thanks for clearing that up. I probably would have noticed when trying it
in a few weeks, but knowing beforehand helps.

To make you lol some more, the following would work too: use only 750 GB
partitions, use the 3*250 GB left over at the end of each 1 TB drive for a
fourth 750 GB piece, and RAID6 those 8*750 GB. The result is 4.5 TB with
one-drive-loss tolerance and really bad performance.
I spare you the 500 GB partition example, which also results in 4.5 TB
with one-drive-loss tolerance and really bad performance.

Jou
* Re: RAID5 with two drive sizes question
From: Roman Mamedov @ 2012-06-05 19:59 UTC
To: Joachim Otahal (privat)
Cc: linux-raid

On Tue, 05 Jun 2012 21:41:39 +0200
"Joachim Otahal (privat)" <Jou@gmx.net> wrote:

> Use only 750 GB partitions, use the 3*250 GB left over at the end of each
> 1 TB drive for a fourth 750 GB piece, and RAID6 those 8*750 GB. The result
> is 4.5 TB with one-drive-loss tolerance and really bad performance.
> I spare you the 500 GB partition example, which also results in 4.5 TB
> with one-drive-loss tolerance and really bad performance.

Except this would not make any sense even as a thought experiment. You
don't want a configuration where two or more areas of the same physical
disk need to be accessed in parallel for any read or write to the volume.
And it's pretty easy to avoid that.

--
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
* Re: RAID5 with two drive sizes question
From: Stan Hoeppner @ 2012-06-05 20:36 UTC
To: Roman Mamedov
Cc: Joachim Otahal (privat), linux-raid

On 6/5/2012 2:59 PM, Roman Mamedov wrote:
> On Tue, 05 Jun 2012 21:41:39 +0200
> "Joachim Otahal (privat)" <Jou@gmx.net> wrote:
>
>> Use only 750 GB partitions, use the 3*250 GB left over at the end of each
>> 1 TB drive for a fourth 750 GB piece, and RAID6 those 8*750 GB. The result
>> is 4.5 TB with one-drive-loss tolerance and really bad performance.
>> I spare you the 500 GB partition example, which also results in 4.5 TB
>> with one-drive-loss tolerance and really bad performance.
>
> Except this would not make any sense even as a thought experiment. You
> don't want a configuration where two or more areas of the same physical
> disk need to be accessed in parallel for any read or write to the volume.
> And it's pretty easy to avoid that.

You make a good point, but your backing argument is incorrect: XFS, by
design and by default, writes to 4 equal-sized regions of a disk in
parallel.

The real problem here is running multiple RAID arrays, especially of
different RAID levels, on the same physical disk. Under high IO load you
end up thrashing the heads due to excessive seeking, because the access
patterns are very different between the arrays. In some situations that
may not cause problems; in others it can.

For a home-type server with a light IO load you probably won't have any
problems. For anything with a high IO load, you don't want to do this type
of RAID setup. Anyone with such an IO load already knows this, which is
why it's typically only hobbyists who would consider such a configuration.

--
Stan
* Re: RAID5 with two drive sizes question
From: Joachim Otahal (privat) @ 2012-06-05 20:48 UTC
To: Mdadm

Stan Hoeppner schrieb:
> On 6/5/2012 2:59 PM, Roman Mamedov wrote:
>> On Tue, 05 Jun 2012 21:41:39 +0200
>> "Joachim Otahal (privat)" <Jou@gmx.net> wrote:
>>
>>> Use only 750 GB partitions, use the 3*250 GB left over at the end of each
>>> 1 TB drive for a fourth 750 GB piece, and RAID6 those 8*750 GB. The result
>>> is 4.5 TB with one-drive-loss tolerance and really bad performance.
>>> I spare you the 500 GB partition example, which also results in 4.5 TB
>>> with one-drive-loss tolerance and really bad performance.
>> Except this would not make any sense even as a thought experiment. You
>> don't want a configuration where two or more areas of the same physical
>> disk need to be accessed in parallel for any read or write to the volume.
>> And it's pretty easy to avoid that.
> You make a good point, but your backing argument is incorrect: XFS, by
> design and by default, writes to 4 equal-sized regions of a disk in
> parallel.
>
> The real problem here is running multiple RAID arrays, especially of
> different RAID levels, on the same physical disk. Under high IO load you
> end up thrashing the heads due to excessive seeking, because the access
> patterns are very different between the arrays. In some situations that
> may not cause problems; in others it can.
>
> For a home-type server with a light IO load you probably won't have any
> problems. For anything with a high IO load, you don't want to do this type
> of RAID setup. Anyone with such an IO load already knows this, which is
> why it's typically only hobbyists who would consider such a configuration.

Please stop. Next time I will use <irony></irony> tags.
A RAID5 of 1 TB pieces with the remaining 2*500 GB appended as RAID1 (as
suggested by Roman Mamedov) is indeed the only sensible way; everything
else is nonsense.
* Re: RAID5 with two drive sizes question
From: Roman Mamedov @ 2012-06-06 4:16 UTC
To: stan
Cc: Joachim Otahal (privat), linux-raid

On Tue, 05 Jun 2012 15:36:29 -0500
Stan Hoeppner <stan@hardwarefreak.com> wrote:

>> Except this would not make any sense even as a thought experiment. You
>> don't want a configuration where two or more areas of the same physical
>> disk need to be accessed in parallel for any read or write to the volume.
>> And it's pretty easy to avoid that.
>
> You make a good point, but your backing argument is incorrect: XFS, by
> design and by default, writes to 4 equal-sized regions of a disk in
> parallel.

I said: "...need to be accessed in parallel for any read or write".

With XFS you mean allocation groups. However, I don't think that writing
any large file sequentially to XFS will always cause the drive's head to
jump around between four areas because the file is written "in parallel",
striped to four different locations, which is the main problem we are
trying to avoid.

XFS allocation groups are each a bit like an independent filesystem, to
allow for some CPU- and RAM-access-level parallelization. However, spinning
devices and even SSDs can't really read or write quickly enough "in
parallel", so parallel access to different areas of the same device is used
in XFS not for *any* read or write, but only in those cases where it can be
beneficial for performance -- and even then it is likely managed carefully,
either by XFS or by the lower-level I/O schedulers, to minimize head
movements.

--
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
* Re: RAID5 with two drive sizes question
From: Stan Hoeppner @ 2012-06-07 0:39 UTC
To: Roman Mamedov
Cc: Joachim Otahal (privat), linux-raid

On 6/5/2012 11:16 PM, Roman Mamedov wrote:
> On Tue, 05 Jun 2012 15:36:29 -0500
> Stan Hoeppner <stan@hardwarefreak.com> wrote:
>
>>> Except this would not make any sense even as a thought experiment. You
>>> don't want a configuration where two or more areas of the same physical
>>> disk need to be accessed in parallel for any read or write to the volume.
>>> And it's pretty easy to avoid that.
>>
>> You make a good point, but your backing argument is incorrect: XFS, by
>> design and by default, writes to 4 equal-sized regions of a disk in
>> parallel.
>
> I said: "...need to be accessed in parallel for any read or write".
>
> With XFS you mean allocation groups. However, I don't think that writing
> any large file sequentially to XFS will always cause the drive's head to
> jump around between four areas because the file is written "in parallel",
> striped to four different locations, which is the main problem we are
> trying to avoid.

It depends on which allocator you use. Inode32, the default allocator, can
cause a sufficiently large file's blocks to be rotored across all AGs in
parallel. Inode64 writes one file to one AG.

> XFS allocation groups are each a bit like an independent filesystem,

This analogy may be somewhat relevant to the Inode64 allocator, which
stores directory metadata for a file in the same AG where the file is
stored. But it definitely does not describe the Inode32 allocator, which
stores all metadata in the first 1 TB of the FS and all file extents above
1 TB -- dependent on the total FS size, obviously. I described the maximal
design case here, where the FS is hard limited to 16 TB.

> to allow for some CPU- and RAM-access-level parallelization.

The focus of the concurrency mechanisms in XFS has always been on
maximizing disk array performance and flexibility with very large disk
counts and large numbers of concurrent accesses. Much of the parallel
CPU/memory locality efficiency is a side effect of this, not the main
target of the effort, though there has been some work in that direction.

> However, spinning devices and even SSDs can't really read or write
> quickly enough "in parallel", so parallel access to different areas of
> the same device is used in XFS not for *any* read or write, but only in
> those cases where it can be beneficial for performance

I just reread that 4 times. If I'm correctly reading what you stated, then
you are absolutely not correct. Please read about the XFS allocation group
design:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/Allocation_Groups.html
and the behavior of the allocators:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/xfs-allocators.html

> -- and even then it is likely managed carefully, either by XFS or by

XFS is completely unaware of actuator placement or any such parameters
internal to a block device. It operates above the block layer; it is,
after all, a filesystem.

> the lower-level I/O schedulers, to minimize head movements.

The Linux elevators aren't going to be able to do much to minimize actuator
movement in this scenario if/when there is concurrent full-stripe write
access to all md arrays on the drives. The problem will likely be further
exacerbated if XFS is the filesystem used on each array.

By default mkfs.xfs creates 16 AGs if the underlying device is a striped md
array. Thus, if you have 4 drives and 4 md RAID 10 arrays across 4
partitions on the drives, then format each with mkfs.xfs defaults, you end
up with 64 AGs in 4 XFS filesystems. With the default Inode32 allocator,
you could end up with 4 concurrent file writes causing 64 actuator seeks
per disk. With average 7.2k SATA drives this takes about 0.43 seconds to
write 64 sectors (32 KB) to each drive -- almost half a second for each
128 KB written to all arrays concurrently, or 1 second to write 256 KB
across 4 disks. If you used a single md RAID 10 array, you would cut your
seek load by a factor of 4.

Now, there are ways to manually tweak such a setup to reduce the number of
AGs and thus seeks, but this is only one of multiple reasons not to use
multiple striped md arrays on the same set of disks, which was/is my
original argument.

--
Stan
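A rough sketch of the kind of tweak Stan mentions at the end: fewer
allocation groups at mkfs time plus the inode64 allocator at mount time.
The device name, AG count, and mount point are assumptions for
illustration, and suitable values depend entirely on the actual array and
workload:

    # Cap the number of allocation groups instead of accepting the
    # striped-md default of 16 (the agcount value here is only an example)
    mkfs.xfs -d agcount=4 /dev/md3

    # Mount with the inode64 allocator so each file's data and its
    # directory metadata are kept within a single AG
    mount -o inode64 /dev/md3 /mnt/data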
Thread overview: 8+ messages
  2012-06-05 17:27 RAID5 with two drive sizes question  Joachim Otahal (privat)
  2012-06-05 17:39 ` Roman Mamedov
  2012-06-05 19:41   ` Joachim Otahal (privat)
  2012-06-05 19:59     ` Roman Mamedov
  2012-06-05 20:36       ` Stan Hoeppner
  2012-06-05 20:48         ` Joachim Otahal (privat)
  2012-06-06  4:16         ` Roman Mamedov
  2012-06-07  0:39           ` Stan Hoeppner