* RAID5 with two drive sizes question
From: Joachim Otahal (privat) @ 2012-06-05 17:27 UTC
To: linux-raid

Hi,

Debian 6.0.4 / superblock 1.2
sdc1 = 1.5 TB
sdd1 = 1.5 TB (cannot be used during --create, still contains data)
sde1 = 1 TB
sdf1 = 1 TB
sdg1 = 1 TB

Target: RAID5 with 4.5 TB capacity.

The normal case would be:
mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
What I expect: since the first and second drives are 1.5 TB in size, the
third, fourth and fifth drives are treated like 2*1.5 TB, creating a
4.5 TB RAID.
What would really be created: I know there are people here who know
rather than guess :).

What my case actually is:
mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 missing /dev/sde1 /dev/sdf1 /dev/sdg1
Expected: this still creates a 4.5 TB array, since sdc1 is 1.5 TB, even
though sdd1 is missing for now.

Will it work as expected? If so, I would format md3, copy the contents of
sdd1 (which is currently still /dev/md2) into the RAID, then --add
/dev/sdd1 to the array and wait until the rebuild is done.

Jou
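For reference, a minimal sketch of the degraded-create sequence asked about
above, using the device names from the question; the filesystem choice and
mount point are illustrative assumptions, and the replies below clarify what
array size mdadm will actually produce:

    # Create the array with sdd1 left out as "missing" (degraded from the start)
    mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 \
        /dev/sdc1 missing /dev/sde1 /dev/sdf1 /dev/sdg1

    # Check the size mdadm actually produced before copying anything
    mdadm --detail /dev/md3 | grep 'Array Size'

    # Filesystem and mount point are assumptions, not from the thread
    mkfs.ext4 /dev/md3
    mount /dev/md3 /mnt/md3

    # ... copy the data off /dev/md2 (the old array holding sdd1) ...

    # Retire the old array, then add its disk as the missing RAID5 member
    mdadm --stop /dev/md2
    mdadm /dev/md3 --add /dev/sdd1

    # Watch the rebuild progress
    cat /proc/mdstat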
* Re: RAID5 with two drive sizes question
From: Roman Mamedov @ 2012-06-05 17:39 UTC
To: Joachim Otahal (privat)
Cc: linux-raid

On Tue, 05 Jun 2012 19:27:53 +0200
"Joachim Otahal (privat)" <Jou@gmx.net> wrote:

> Hi,
> Debian 6.0.4 / superblock 1.2
> sdc1 = 1.5 TB
> sdd1 = 1.5 TB (cannot be used during --create, still contains data)
> sde1 = 1 TB
> sdf1 = 1 TB
> sdg1 = 1 TB
>
> Target: RAID5 with 4.5 TB capacity.
>
> The normal case would be:
> mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
> What I expect: since the first and second drives are 1.5 TB in size, the
> third, fourth and fifth drives are treated like 2*1.5 TB, creating a
> 4.5 TB RAID.

Lolwhat.

> What would really be created: I know there are people here who know
> rather than guess :).

A 5x1TB RAID5. The lowest common device size across all RAID members is
what gets utilized in an array.

But what you can do after that is also create a separate 2x0.5TB RAID1
from the 1.5 TB drives' "tails", and join both arrays into a single
larger volume using LVM.

The result: 4.5 TB of usable space with one-drive-loss tolerance
(provided by RAID5 in the first 4 TB, and by RAID1 in the 0.5 TB "tail").

--
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
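A minimal sketch of the layout Roman describes above, assuming the two
1.5 TB drives are repartitioned into a 1 TB part (sdc1/sdd1) plus a 0.5 TB
"tail" (sdc2/sdd2); the md numbers, volume group and LV names, and the ext4
filesystem are illustrative assumptions, not values from the thread:

    # RAID5 across five equal 1 TB members -> ~4 TB usable
    mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 \
        /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1

    # RAID1 across the two 0.5 TB tails of the 1.5 TB drives -> ~0.5 TB usable
    mdadm -C /dev/md4 --bitmap=internal -l 1 -n 2 /dev/sdc2 /dev/sdd2

    # Join both arrays into one ~4.5 TB logical volume with LVM
    pvcreate /dev/md3 /dev/md4
    vgcreate vg_data /dev/md3 /dev/md4
    lvcreate -l 100%FREE -n lv_data vg_data
    mkfs.ext4 /dev/vg_data/lv_data

With LVM's default linear allocation the logical volume simply appends the
RAID1 space after the RAID5 space, so the combined volume still survives the
loss of any single physical disk.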
* Re: RAID5 with two drive sizes question
From: Joachim Otahal (privat) @ 2012-06-05 19:41 UTC
To: Roman Mamedov
Cc: linux-raid

Roman Mamedov schrieb:
> On Tue, 05 Jun 2012 19:27:53 +0200
> "Joachim Otahal (privat)" <Jou@gmx.net> wrote:
>
>> Hi,
>> Debian 6.0.4 / superblock 1.2
>> sdc1 = 1.5 TB
>> sdd1 = 1.5 TB (cannot be used during --create, still contains data)
>> sde1 = 1 TB
>> sdf1 = 1 TB
>> sdg1 = 1 TB
>>
>> Target: RAID5 with 4.5 TB capacity.
>>
>> The normal case would be:
>> mdadm -C /dev/md3 --bitmap=internal -l 5 -n 5 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1
>> What I expect: since the first and second drives are 1.5 TB in size, the
>> third, fourth and fifth drives are treated like 2*1.5 TB, creating a
>> 4.5 TB RAID.
> Lolwhat.

Hey, there is a reason why I ask, no need to lol.

>> What would really be created: I know there are people here who know
>> rather than guess :).
> A 5x1TB RAID5. The lowest common device size across all RAID members is
> what gets utilized in an array.
>
> But what you can do after that is also create a separate 2x0.5TB RAID1
> from the 1.5 TB drives' "tails", and join both arrays into a single
> larger volume using LVM.
>
> The result: 4.5 TB of usable space with one-drive-loss tolerance
> (provided by RAID5 in the first 4 TB, and by RAID1 in the 0.5 TB "tail").

Thanks for clearing that up. I probably would have noticed when trying it
in a few weeks, but knowing beforehand helps.

To make you lol some more, the following would work too: use only 750 GB
partitions, use the 3*250 GB left over at the end of each 1 TB drive for a
fourth 750 GB piece, and RAID6 those 8*750 GB. The result is 4.5 TB with
one-drive-loss tolerance and really bad performance.
I spare you the 500 GB partition example, which also results in 4.5 TB
with one-drive-loss tolerance and really bad performance.

Jou
* Re: RAID5 with two drive sizes question
From: Roman Mamedov @ 2012-06-05 19:59 UTC
To: Joachim Otahal (privat)
Cc: linux-raid

On Tue, 05 Jun 2012 21:41:39 +0200
"Joachim Otahal (privat)" <Jou@gmx.net> wrote:

> Use only 750 GB partitions, use the 3*250 GB left over at the end of each
> 1 TB drive for a fourth 750 GB piece, and RAID6 those 8*750 GB. The result
> is 4.5 TB with one-drive-loss tolerance and really bad performance.
> I spare you the 500 GB partition example, which also results in 4.5 TB
> with one-drive-loss tolerance and really bad performance.

Except this would not make any sense even as a thought experiment. You
don't want a configuration where two or more areas of the same physical
disk need to be accessed in parallel for any read or write to the volume.
And it's pretty easy to avoid that.

--
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
* Re: RAID5 with two drive sizes question
From: Stan Hoeppner @ 2012-06-05 20:36 UTC
To: Roman Mamedov
Cc: Joachim Otahal (privat), linux-raid

On 6/5/2012 2:59 PM, Roman Mamedov wrote:
> On Tue, 05 Jun 2012 21:41:39 +0200
> "Joachim Otahal (privat)" <Jou@gmx.net> wrote:
>
>> Use only 750 GB partitions, use the 3*250 GB left over at the end of each
>> 1 TB drive for a fourth 750 GB piece, and RAID6 those 8*750 GB. The result
>> is 4.5 TB with one-drive-loss tolerance and really bad performance.
>> I spare you the 500 GB partition example, which also results in 4.5 TB
>> with one-drive-loss tolerance and really bad performance.
>
> Except this would not make any sense even as a thought experiment. You
> don't want a configuration where two or more areas of the same physical
> disk need to be accessed in parallel for any read or write to the volume.
> And it's pretty easy to avoid that.

You make a good point, but your backing argument is incorrect: XFS, by
design and by default, writes to 4 equal-sized regions of a disk in
parallel.

The real problem here is running multiple RAID arrays, especially of
different RAID levels, on the same physical disk. Under high IO load you
end up thrashing the heads due to excessive seeking, because the access
patterns are very different between the arrays. In some situations that
may not cause problems; in others it can.

For a home-type server with a light IO load you probably won't have any
problems. For anything with a high IO load, you don't want to do this type
of RAID setup. Anyone with such an IO load already knows this, which is
why it's typically only hobbyists who would consider such a configuration.

--
Stan
* Re: RAID5 with two drive sizes question
From: Joachim Otahal (privat) @ 2012-06-05 20:48 UTC
To: Mdadm

Stan Hoeppner schrieb:
> On 6/5/2012 2:59 PM, Roman Mamedov wrote:
>> On Tue, 05 Jun 2012 21:41:39 +0200
>> "Joachim Otahal (privat)" <Jou@gmx.net> wrote:
>>
>>> Use only 750 GB partitions, use the 3*250 GB left over at the end of each
>>> 1 TB drive for a fourth 750 GB piece, and RAID6 those 8*750 GB. The result
>>> is 4.5 TB with one-drive-loss tolerance and really bad performance.
>>> I spare you the 500 GB partition example, which also results in 4.5 TB
>>> with one-drive-loss tolerance and really bad performance.
>> Except this would not make any sense even as a thought experiment. You
>> don't want a configuration where two or more areas of the same physical
>> disk need to be accessed in parallel for any read or write to the volume.
>> And it's pretty easy to avoid that.
> You make a good point, but your backing argument is incorrect: XFS, by
> design and by default, writes to 4 equal-sized regions of a disk in
> parallel.
>
> The real problem here is running multiple RAID arrays, especially of
> different RAID levels, on the same physical disk. Under high IO load you
> end up thrashing the heads due to excessive seeking, because the access
> patterns are very different between the arrays. In some situations that
> may not cause problems; in others it can.
>
> For a home-type server with a light IO load you probably won't have any
> problems. For anything with a high IO load, you don't want to do this type
> of RAID setup. Anyone with such an IO load already knows this, which is
> why it's typically only hobbyists who would consider such a configuration.

Please stop. Next time I will use <irony></irony> tags.
A RAID5 of 1 TB pieces with the remaining 2*500 GB appended as RAID1 (as
suggested by Roman Mamedov) is indeed the only sensible way; everything
else is nonsense.
* Re: RAID5 with two drive sizes question
From: Roman Mamedov @ 2012-06-06 4:16 UTC
To: stan
Cc: Joachim Otahal (privat), linux-raid

On Tue, 05 Jun 2012 15:36:29 -0500
Stan Hoeppner <stan@hardwarefreak.com> wrote:

>> Except this would not make any sense even as a thought experiment. You
>> don't want a configuration where two or more areas of the same physical
>> disk need to be accessed in parallel for any read or write to the volume.
>> And it's pretty easy to avoid that.
>
> You make a good point, but your backing argument is incorrect: XFS, by
> design and by default, writes to 4 equal-sized regions of a disk in
> parallel.

I said: "...need to be accessed in parallel for any read or write".

With XFS you mean allocation groups. However, I don't think that writing
any large file sequentially to XFS will always cause the drive's head to
jump around between four areas because the file is written "in parallel",
striped to four different locations, which is the main problem we are
trying to avoid.

XFS allocation groups are each a bit like an independent filesystem, to
allow for some CPU- and RAM-access-level parallelization. However, spinning
devices and even SSDs can't really read or write quickly enough "in
parallel", so parallel access to different areas of the same device is used
in XFS not for *any* read or write, but only in those cases where it can be
beneficial for performance -- and even then it is likely managed carefully,
either by XFS or by the lower-level I/O schedulers, to minimize head
movements.

--
With respect,
Roman

~~~~~~~~~~~~~~~~~~~~~~~~~~~
"Stallman had a printer,
with code he could not see.
So he began to tinker,
and set the software free."
* Re: RAID5 with two drive sizes question
From: Stan Hoeppner @ 2012-06-07 0:39 UTC
To: Roman Mamedov
Cc: Joachim Otahal (privat), linux-raid

On 6/5/2012 11:16 PM, Roman Mamedov wrote:
> On Tue, 05 Jun 2012 15:36:29 -0500
> Stan Hoeppner <stan@hardwarefreak.com> wrote:
>
>>> Except this would not make any sense even as a thought experiment. You
>>> don't want a configuration where two or more areas of the same physical
>>> disk need to be accessed in parallel for any read or write to the volume.
>>> And it's pretty easy to avoid that.
>>
>> You make a good point, but your backing argument is incorrect: XFS, by
>> design and by default, writes to 4 equal-sized regions of a disk in
>> parallel.
>
> I said: "...need to be accessed in parallel for any read or write".
>
> With XFS you mean allocation groups. However, I don't think that writing
> any large file sequentially to XFS will always cause the drive's head to
> jump around between four areas because the file is written "in parallel",
> striped to four different locations, which is the main problem we are
> trying to avoid.

It depends on which allocator you use. Inode32, the default allocator, can
cause a sufficiently large file's blocks to be rotored across all AGs in
parallel. Inode64 writes one file to one AG.

> XFS allocation groups are each a bit like an independent filesystem,

This analogy may be somewhat relevant to the Inode64 allocator, which
stores directory metadata for a file in the same AG where the file is
stored. But it definitely does not describe the Inode32 allocator, which
stores all metadata in the first 1 TB of the FS and all file extents above
1 TB -- dependent on the total FS size, obviously. I described the maximal
design case here, where the FS is hard limited to 16 TB.

> to allow for some CPU- and RAM-access-level parallelization.

The focus of the concurrency mechanisms in XFS has always been on
maximizing disk array performance and flexibility with very large disk
counts and large numbers of concurrent accesses. Much of the parallel
CPU/memory locality efficiency is a side effect of this, not the main
target of the effort, though there has been some work in that direction.

> However, spinning devices and even SSDs can't really read or write
> quickly enough "in parallel", so parallel access to different areas of
> the same device is used in XFS not for *any* read or write, but only in
> those cases where it can be beneficial for performance

I just reread that 4 times. If I'm correctly reading what you stated, then
you are absolutely not correct. Please read about the XFS allocation group
design:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure//tmp/en-US/html/Allocation_Groups.html
and the behavior of the allocators:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/xfs-allocators.html

> -- and even then it is likely managed carefully, either by XFS or by

XFS is completely unaware of actuator placement or any such parameters
internal to a block device. It operates above the block layer; it is,
after all, a filesystem.

> the lower-level I/O schedulers, to minimize head movements.

The Linux elevators aren't going to be able to do much to minimize actuator
movement in this scenario if/when there is concurrent full-stripe write
access to all md arrays on the drives. The problem will likely be further
exacerbated if XFS is the filesystem used on each array.

By default mkfs.xfs creates 16 AGs if the underlying device is a striped md
array. Thus, if you have 4 drives and 4 md RAID 10 arrays across 4
partitions on the drives, then format each with mkfs.xfs defaults, you end
up with 64 AGs in 4 XFS filesystems. With the default Inode32 allocator,
you could end up with 4 concurrent file writes causing 64 actuator seeks
per disk. With average 7.2k SATA drives this takes about 0.43 seconds to
write 64 sectors (32 KB) to each drive -- almost half a second for each
128 KB written to all arrays concurrently, or 1 second to write 256 KB
across 4 disks. If you used a single md RAID 10 array, you would cut your
seek load by a factor of 4.

Now, there are ways to manually tweak such a setup to reduce the number of
AGs and thus seeks, but this is only one of multiple reasons not to use
multiple striped md arrays on the same set of disks, which was/is my
original argument.

--
Stan
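A rough sketch of the kind of tweak Stan mentions at the end: fewer
allocation groups at mkfs time plus the inode64 allocator at mount time.
The device name, AG count, and mount point are assumptions for
illustration, and suitable values depend entirely on the actual array and
workload:

    # Cap the number of allocation groups instead of accepting the
    # striped-md default of 16 (the agcount value here is only an example)
    mkfs.xfs -d agcount=4 /dev/md3

    # Mount with the inode64 allocator so each file's data and its
    # directory metadata are kept within a single AG
    mount -o inode64 /dev/md3 /mnt/data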
Thread overview: 8+ messages
  2012-06-05 17:27 RAID5 with two drive sizes question  Joachim Otahal (privat)
  2012-06-05 17:39 ` Roman Mamedov
  2012-06-05 19:41   ` Joachim Otahal (privat)
  2012-06-05 19:59     ` Roman Mamedov
  2012-06-05 20:36       ` Stan Hoeppner
  2012-06-05 20:48         ` Joachim Otahal (privat)
  2012-06-06  4:16         ` Roman Mamedov
  2012-06-07  0:39           ` Stan Hoeppner