* Best practice for large storage?
@ 2013-02-14 17:28 Roy Sigurd Karlsbakk
From: Roy Sigurd Karlsbakk @ 2013-02-14 17:28 UTC
To: Linux RAID
Hi all
It seems we may need some storage for video soon. This is a 20k-student college in Norway, with quite a few students in media-related studies. Since these students produce rather large amounts of raw material, typically stored for the duration of the semester, we may need some 50-100TiB, perhaps more. I have set up systems with this amount of storage before on ZFS, but may be using Linux MD for this project. I'm aware of the lack of checksumming, snapshots etc. with Linux, but may use it because the sysadmins here have more Linux knowledge. In such a setup, I guess nearline SAS drives on a SAS expander will be used, and with the amount of storage needed, I won't be using a single RAID-6 (too insecure) or RAID-10 (too expensive) for the lot. In ZFS-land I used smallish VDEVs (~10 drives each) in a large pool.
- Would using LVM on top of RAID-6 give me something similar?
- If so, should I stripe the RAID sets? And if striping them, will it be as easy to add new RAID sets as we run out of space?
Thanks, and best regards
roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
[In Norwegian:] In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
* Re: Best practice for large storage?
From: Jeff Johnson @ 2013-02-14 17:33 UTC
To: Roy Sigurd Karlsbakk, Linux RAID

Roy,

Why not use ZFS on Linux? (http://www.zfsonlinux.org)

ZFS on Linux is being merged into the upcoming Lustre 2.4 parallel filesystem release.

--Jeff

On 2/14/13 9:28 AM, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> It seems we may need some storage for video soon. [...] In ZFS-land I
> used smallish VDEVs (~10 drives each) in a large pool.
>
> - Would using LVM on top of RAID-6 give me something similar?
> - If so, should I stripe the RAID sets? And if striping them, will it
>   be as easy to add new RAID sets as we run out of space?

--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117
* Re: Best practice for large storage?
From: Roy Sigurd Karlsbakk @ 2013-02-14 17:39 UTC
To: Jeff Johnson; +Cc: Linux RAID

I don't feel like using the fuse version, but is zfsonlinux really stable yet?

----- Original message -----
> Roy,
>
> Why not use ZFS on Linux? (http://www.zfsonlinux.org)
>
> ZFS on Linux is being merged into the upcoming Lustre 2.4 parallel
> filesystem release.
>
> --Jeff
>
> [...]

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
* Re: Best practice for large storage?
From: Jeff Johnson @ 2013-02-14 17:48 UTC
To: Roy Sigurd Karlsbakk; +Cc: Linux RAID

Stable enough that it is being used at Lawrence Livermore Nat'l Labs on a 55PB Lustre resource.

I've been using it on a pre-release Lustre 2.4 and I have not had any issues.

On 2/14/13 9:39 AM, Roy Sigurd Karlsbakk wrote:
> I don't feel like using the fuse version, but is zfsonlinux really
> stable yet?
>
> [...]

--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

/* New Address */
4170 Morena Boulevard, Suite D - San Diego, CA 92117
* Re: Best practice for large storage?
From: Sebastian Riemer @ 2013-02-15 9:51 UTC
To: Jeff Johnson; +Cc: Roy Sigurd Karlsbakk, Linux RAID

On 14.02.2013 18:48, Jeff Johnson wrote:
> Stable enough that it is being used at Lawrence Livermore Nat'l Labs on
> a 55PB Lustre resource.
>
> I've been using it on a pre-release Lustre 2.4 and I have not had any
> issues.

ZFS fragments completely if you've got massive parallel write IO - especially with Solaris 11. You'll get only 2-3 MiB/s after some time, as everything ends up stored completely randomly. So if you don't really need the snapshots, you shouldn't use ZFS. NILFS is also good for snapshots.
* Re: Best practice for large storage?
From: Roy Sigurd Karlsbakk @ 2013-02-16 12:48 UTC
To: Sebastian Riemer; +Cc: Linux RAID, Jeff Johnson

> On 14.02.2013 18:48, Jeff Johnson wrote:
>> Stable enough that it is being used at Lawrence Livermore Nat'l Labs
>> on a 55PB Lustre resource.
>>
>> I've been using it on a pre-release Lustre 2.4 and I have not had any
>> issues.
>
> ZFS fragments completely if you've got massive parallel write IO -
> especially with Solaris 11. You'll get only 2-3 MiB/s after some time,
> as everything ends up stored completely randomly. So if you don't
> really need the snapshots, you shouldn't use ZFS. NILFS is also good
> for snapshots.

This won't be massively parallel I/O, just a fileserver with a limited number of users. Also, can you document this claim?

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
* Re: Best practice for large storage?
From: Sebastian Riemer @ 2013-02-18 10:35 UTC
To: Roy Sigurd Karlsbakk; +Cc: Linux RAID, Jeff Johnson

On 16.02.2013 13:48, Roy Sigurd Karlsbakk wrote:
>> ZFS fragments completely if you've got massive parallel write IO -
>> [...]
>
> This won't be massively parallel I/O, just a fileserver with a limited
> number of users. Also, can you document this claim?

Of course. We had ZFS in production in our IaaS public cloud. In such a cloud nearly everything is random: customers create and delete their storage from time to time, and some of them do a lot of small writes. It fills up quite fast. We didn't even use snapshots, and we already had the ZIL on dedicated enterprise SSDs.

http://thomas.gouverneur.name/2011/06/20110609zfs-fragmentation-issue-examining-the-zil/
http://www.racktopsystems.com/dedicated-zfs-intent-log-aka-slogzil-and-data-fragmentation/
http://www.eall.com.br/blog/?p=2481
http://www.techforce.com.br/news/layout/set/print/linux_blog/zfs_part_4_sustained_random_small_files_sync_write_iops

ZFS as a block device with COMSTAR exports is really crap: you've got mostly synchronous (and small database) IO. This is why we switched to Linux storage with LVM (without the thin-provisioning stuff). The customers run their own file systems inside their VMs anyway.

Cheers,
Sebastian
* Re: Best practice for large storage?
From: Stan Hoeppner @ 2013-02-16 5:40 UTC
To: Jeff Johnson; +Cc: Roy Sigurd Karlsbakk, Linux RAID

On 2/14/2013 11:48 AM, Jeff Johnson wrote:
> Stable enough that it is being used at Lawrence Livermore Nat'l Labs on
> a 55PB Lustre resource.

That's a tad misleading. LLNL's Sequoia has ZFS striped across three 8+2 hardware RAID6 arrays using 3TB drives. Lustre is then layered atop those. So here ZFS sits atop 72TB raw; it is not scaling to 55PB.

Something worth noting in this "if they use it so should you" context is that US gov't computer labs tend to live on the bleeding edge, and have the budget, resources, and personnel on staff to fix anything, including rewriting Lustre and ZFS to fit their needs.

The name Donald Becker may be familiar to many here. He wrote a good number of the Linux ethernet device drivers while building Beowulf clusters at NASA. They bought a bunch of hardware, no Linux drivers existed, so he wrote them to enable their hardware. Eventually they made it into mainline. The moral of this story should be obvious.

--
Stan
* Re: Best practice for large storage?
From: Chris Murphy @ 2013-02-15 2:18 UTC
To: Roy Sigurd Karlsbakk; +Cc: Linux RAID

On Feb 14, 2013, at 10:28 AM, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
> This is a 20k student college
> large amounts of raw material
> 50-100TiB, perhaps more
> I guess nearline SAS drives
> I won't be using a single RAID-6 (too insecure) or RAID-10 (too expensive)

This could be a case for GlusterFS or Ceph. You might look at those groups and see what they'd suggest for your use case. Even if it doesn't make sense right away, it might make sense sooner rather than later, in which case it's good to have an initial deployment that doesn't make it a hassle to move to Gluster when you're ready.

Stan makes a good case for SAS HBAs capable of doing RAID: they're inexpensive, fast, and reliable, you get support, and they don't cost really any more than the HBA you need anyway to connect all of these drives. Definitely use XFS for the resulting arrays. Then hand those over to GlusterFS as storage bricks (Ceph has a different arrangement and terms).

You can inquire whether the GlusterFS NFS client is suitable for this task if the clients are Windows or Mac, or whether it's better to set up one or more NFSv4 servers which are themselves using the native GlusterFS client. It likely depends on network bandwidth (for video, 10GigE is common), how many clients, etc.

Gluster also scales well in performance and capacity: just add more bricks. And you won't need to bust the drive cap in your arrays, or stripe them. As storage gets really big, the risk of non-drive failures increases to the point that it needs to be mitigated, and that's what a distributed file system will help you do.
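For a rough idea of what a basic GlusterFS deployment involves - a sketch only, with invented hostnames, volume and brick names, not a tested recipe:

# Each server exposes an XFS-formatted md array as a brick, mounted at
# e.g. /bricks/b1 (names are examples). From one node, join the peers:
gluster peer probe server2
gluster peer probe server3

# Create a distributed volume across the bricks and start it:
gluster volume create mediavol server1:/bricks/b1 \
    server2:/bricks/b1 server3:/bricks/b1
gluster volume start mediavol

# Clients mount it with the native FUSE client:
mount -t glusterfs server1:/mediavol /mnt/media

# Growing later means adding bricks, then rebalancing:
gluster volume add-brick mediavol server4:/bricks/b1
gluster volume rebalance mediavol start

Chris Murphy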
* Re: Best practice for large storage?
From: Stan Hoeppner @ 2013-02-16 7:09 UTC
To: Roy Sigurd Karlsbakk; +Cc: Linux RAID

On 2/14/2013 11:28 AM, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> It seems we may need some storage for video soon. [...] In ZFS-land I
> used smallish VDEVs (~10 drives each) in a large pool.
>
> - Would using LVM on top of RAID-6 give me something similar?

You would use LVM concatenation or md/linear to assemble the individual RAID6 arrays into a single logical device, which you'd format with XFS.

> - If so, should I stripe the RAID sets? And if striping them, will it
>   be as easy to add new RAID sets as we run out of space?

You *could* put a stripe over the RAID6s *IF* you build the system and leave it as is, permanently, as you can never expand a stripe. But even then it's not recommended, due to the complexity of determining the proper chunk sizes for the nested stripes and of aligning the filesystem to the resulting device.

It's better to create, say, 10+2 RAID6 arrays and add them to an md/linear array. This linear array is nearly infinitely expandable by adding more identical 10+2 RAID6 arrays. Your chunk size, and thus stripe width, stays the same as well. The default RAID6 chunk of 512KB is probably fine for large video files, as it yields a 5MB stripe width. When expanding with identical constituent RAID6 arrays, you don't have to touch the XFS stripe alignment configuration; you simply grow the filesystem after adding additional arrays to the md/linear array.

The reason I recommend the 10+2 is twofold. First, large video file ingestion works well with a wide RAID6 stripe. Second, you could start with an LSI 9207-8e and two of something like this chassis:

http://www.newegg.com/Product/Product.aspx?Item=N82E16816133047

using 48x 3TB Seagate Constellation (enterprise) SATA drives:

http://www.newegg.com/Product/Product.aspx?Item=N82E16822178324

all of which are rather inexpensive compared to Dell, HP, IBM, Fujitsu Siemens, etc., at least here in the US. I know this chassis is available in Switzerland and Germany, but I don't know about Norway.

Each chassis holds 24 drives, allowing for two 10+2 RAID6 arrays per chassis, four arrays total. You'd put the four md/RAID6 arrays in one md/linear array and format it with XFS, such as:

~$ mkfs.xfs -d su=512k,sw=10 /dev/md0

This will give you a filesystem with a little under 120TB of net free space, with 120 allocation groups evenly distributed over the 4 arrays. All AGs can be written in parallel, yielding a high performance video ingestion and playback system.
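Spelled out as a rough sketch - the device names, drive letters, and mount point below are invented examples, not a tested recipe:

# Four 10+2 RAID6 arrays, 12 drives each (md3 and md4 are created the
# same way from the remaining drives):
mdadm --create /dev/md1 --level=6 --chunk=512 --raid-devices=12 /dev/sd[b-m]
mdadm --create /dev/md2 --level=6 --chunk=512 --raid-devices=12 /dev/sd[n-y]

# Concatenate the four arrays into one md/linear device:
mdadm --create /dev/md0 --level=linear --raid-devices=4 \
    /dev/md1 /dev/md2 /dev/md3 /dev/md4

# Align XFS to one RAID6 leg: 512KiB chunk, 10 data spindles:
mkfs.xfs -d su=512k,sw=10 /dev/md0

# Growing later: build another identical 10+2 RAID6 (md5), append it to
# the linear array, then grow the filesystem online:
mdadm --grow /dev/md0 --add /dev/md5
xfs_growfs /mnt/storage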
Before mounting, you would modify fstab to include the inode64 option. Don't even bother attempting to use EXT3/4 on a 120TB filesystem -- they'll fall over after some use. JFS will work, but it's not well maintained, hasn't seen meaningful updates in many years, and is slower than XFS in most areas. XFS is the one *nix filesystem that was created and optimized specifically for large files and concurrent high-bandwidth streaming IO.

See 'man mdadm', 'man mkfs.xfs', 'man mount', and 'man xfs' for more specific information, commands, options, etc. This may be a little more detail than you wanted, but it should give a rough idea of at least one possible way to achieve your goal.

--
Stan
[parent not found: <CAH3kUhFbR3coJSwPvqqOGrqcsvoJpdAPgAAiAgc_th1ym33DzA@mail.gmail.com>]
* Re: Best practice for large storage?
From: Roy Sigurd Karlsbakk @ 2013-02-14 18:27 UTC
To: Roberto Spadim; +Cc: Jeff Johnson, Linux RAID

----- Original message -----
> why not btrfs or xfs?

Well, because btrfs isn't stable and doesn't support RAID-[56] unless you're using the git tree for testing that, which is rather more unstable than the rest, and XFS is just a filesystem that needs to sit on top of a RAID of some kind.

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
[parent not found: <CAH3kUhGVt1iyn9tt=2-+f6H++obOGSK3x0pBBPZV8CFUXjp5yw@mail.gmail.com>]
* Re: Best practice for large storage?
From: Roy Sigurd Karlsbakk @ 2013-02-14 23:23 UTC
To: Roberto Spadim; +Cc: Jeff Johnson, Linux RAID

xfs+lvm+mdadm works, but I'm still back at my original question. I don't care about what filesystem is on top, only about the storage underneath. Read the original question again, please.

----- Original message -----
> xfs+lvm+mdadm don't work? too complex?
>
> 2013/2/14 Roy Sigurd Karlsbakk <roy@karlsbakk.net>:
>> Well, because btrfs isn't stable and doesn't support RAID-[56] unless
>> you're using the git tree for testing that, which is rather more
>> unstable than the rest, and XFS is just a filesystem that needs to sit
>> on top of a RAID of some kind.
>
> --
> Roberto Spadim

--
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
* Re: Best practice for large storage?
From: Roberto Spadim @ 2013-02-14 23:42 UTC
To: Roy Sigurd Karlsbakk; +Cc: Jeff Johnson, Linux RAID

Is the problem checksumming? I really don't understand why the filesystem is not the problem, and why storage is the 'key'.

Storage -> mdadm + hard disks / RAID hardware + hard disks / network storage + hard disks. Here the key is data integrity WITHOUT SILENT DATA LOSS; to date I have only seen this on enterprise hardware RAID controllers + enterprise SAS disks.

LVM + filesystem -> no problem increasing storage. Here a filesystem that can grow without problems is mandatory, since you will want more disks; the LVM part is there to make working with the devices easy.

Any part that I forgot?

2013/2/14 Roy Sigurd Karlsbakk <roy@karlsbakk.net>:
> xfs+lvm+mdadm works, but I'm still back at my original question. I
> don't care about what filesystem is on top, only about the storage
> underneath. Read the original question again, please.
>
> [...]

--
Roberto Spadim
* Re: Best practice for large storage?
From: Adam Goryachev @ 2013-02-15 1:01 UTC
To: Roberto Spadim; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux RAID

On 15/02/13 10:42, Roberto Spadim wrote:
> Is the problem checksumming? I really don't understand why the
> filesystem is not the problem, and why storage is the 'key'.
>
> [...]

I assume the question can be reduced to:
1) You need a very large amount of space (requires a large number of disks)
2) You need to be able to expand that space over time
3) You want decent data redundancy
4) You will have a reasonable amount of concurrent access by multiple users and want decent performance

From my readings of the list, the suggestion would seem to be RAID6 + concatenation, with around 6 to 8 drives in each RAID6, using XFS with certain parameters to ensure it balances the directories across the multiple RAID6 groups. Basically, you want to put as many drives as possible into each RAID6 to reduce wasted space, but not too many, or else you will suffer a triple drive failure and lose the whole lot.

If you did not need to grow the space, you would use RAID60 and do striping, but I think you can't grow that -- although some pages I just read suggest it might be possible to grow a RAID0 by converting to RAID4 and back again.

Another option would be to use LVM to join multiple RAID6s together; see the sketch below.
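As a minimal sketch of that LVM approach - the volume and device names are invented, and the XFS alignment values assume Stan's 10+2 arrays with 512KiB chunks:

# Each md RAID6 array becomes a physical volume (names are examples):
pvcreate /dev/md1 /dev/md2
vgcreate vg_media /dev/md1 /dev/md2

# One linear LV spanning both PVs, with XFS on top:
lvcreate -l 100%FREE -n lv_media vg_media
mkfs.xfs -d su=512k,sw=10 /dev/vg_media/lv_media

# Growing later: add a new RAID6 as another PV, extend the LV, then
# grow XFS online:
pvcreate /dev/md3
vgextend vg_media /dev/md3
lvextend -l +100%FREE /dev/vg_media/lv_media
xfs_growfs /mnt/media

Don't know if this helps, but hopefully.

Regards,
Adam

--
Adam Goryachev
Website Managers
Ph:  +61 2 8304 0000   adam@websitemanagers.com.au
Fax: +61 2 8304 0001   www.websitemanagers.com.au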
* Re: Best practice for large storage?
From: Roberto Spadim @ 2013-02-15 1:13 UTC
To: Adam Goryachev; +Cc: Roy Sigurd Karlsbakk, Jeff Johnson, Linux RAID

The point on performance: will you measure it per user, or for the total system? In other words, will users each have a 'fixed' disk space, or will everybody use all the disks?

A per-user space could allow you to do a RAID1 (or another RAID level) for about 100 users?! In other words, 100 users would have a total of 320MB/s (SAS) of disk performance and 1TB of disk space? Do your users need a low-latency system? Did you consider a large cache, or an SSD cache?

If you tell me that everybody will use the whole system, how do you want to share the disks? Would they be striped or mirrored? Striped for better sequential reading, mirrored for better parallel reading. What does your system need?

2013/2/14 Adam Goryachev <adam@websitemanagers.com.au>:
> I assume the question can be reduced to:
> 1) You need a very large amount of space (requires a large number of
>    disks)
> 2) You need to be able to expand that space over time
> 3) You want decent data redundancy
> 4) You will have a reasonable amount of concurrent access by multiple
>    users and want decent performance
>
> [...]

--
Roberto Spadim
end of thread, other threads: [~2013-02-18 10:35 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-14 17:28 Best practice for large storage? Roy Sigurd Karlsbakk
2013-02-14 17:33 ` Jeff Johnson
2013-02-14 17:39 ` Roy Sigurd Karlsbakk
2013-02-14 17:48 ` Jeff Johnson
2013-02-15 9:51 ` Sebastian Riemer
2013-02-16 12:48 ` Roy Sigurd Karlsbakk
2013-02-18 10:35 ` Sebastian Riemer
2013-02-16 5:40 ` Stan Hoeppner
2013-02-15 2:18 ` Chris Murphy
2013-02-16 7:09 ` Stan Hoeppner
[not found] <CAH3kUhFbR3coJSwPvqqOGrqcsvoJpdAPgAAiAgc_th1ym33DzA@mail.gmail.com>
2013-02-14 18:27 ` Roy Sigurd Karlsbakk
[not found] <CAH3kUhGVt1iyn9tt=2-+f6H++obOGSK3x0pBBPZV8CFUXjp5yw@mail.gmail.com>
2013-02-14 23:23 ` Roy Sigurd Karlsbakk
2013-02-14 23:42 ` Roberto Spadim
2013-02-15 1:01 ` Adam Goryachev
2013-02-15 1:13 ` Roberto Spadim