* XFS use within multi-threaded apps
@ 2010-10-18 13:42 Angelo McComis
  2010-10-19  1:12 ` Dave Chinner
  2010-10-19  4:24 ` Stewart Smith
  0 siblings, 2 replies; 11+ messages in thread
From: Angelo McComis @ 2010-10-18 13:42 UTC (permalink / raw)
To: xfs

All:

Apologies, but I am new to this list and somewhat new to XFS.

I have a use case where I'd like to forward the use of XFS. This is for large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as what you'd see under a database's data file / tablespace.

My database vendor (who, coincidentally, markets their own filesystems and operating systems) says that there are certain problems under XFS, with specific mention of corruption issues (if the single root or the metadata becomes corrupted, the entire filesystem is gone) and of performance issues on multi-threaded workloads, caused by the single root filesystem for metadata becoming a bottleneck.

This feedback from the vendor is surely taken with a grain of salt, as they have marketing motivations of their own product to consider.

Surely, something like corruption and bottlenecks under heavy load / multi-threaded use would be a bug that would be addressed, right? And surely, something like a BTree structure, with a root node, journaled metadata, etc. would be inherent in other filesystem choices as well, right?

The vendor, in the end, did recommend ext4, but ext4 is not in my mainline Linux kernel as anything beyond "tech preview" at this point.

Thanks in advance for any/all feedback.

Angelo

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS use within multi-threaded apps
From: Dave Chinner @ 2010-10-19  1:12 UTC
To: Angelo McComis; +Cc: xfs

On Mon, Oct 18, 2010 at 09:42:04AM -0400, Angelo McComis wrote:
> All:
>
> Apologies but I am new to this list, and somewhat new to XFS.
>
> I have a use case where I'd like to forward the use of XFS. This is for
> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> what you'd see under a database's data file / tablespace.

Yup, perfect use case for XFS.

> My database vendor (who, coincidentally markets their own filesystems and
> operating systems) says that there are certain problems under XFS with
> specific mention of corruption issues, if a single root or the metadata
> become corrupted, the entire filesystem is gone,

Yes, they are right about detected metadata corruption causing a filesystem _shutdown_, but that does not mean that a metadata corruption event will cause your entire filesystem to disappear. Besides, the worst case for _any_ filesystem is that it gets corrupted beyond repair and you have to restore from backups, so you still have to plan for this eventuality when dealing with disaster recovery scenarios.

What they neglect to mention is that XFS has a lot of metadata corruption detection code, and shuts down at the first detection to prevent the filesystem from being further damaged before a repair process can be run. Apart from btrfs, XFS has the best run-time metadata corruption detection of any filesystem in Linux, and even so there are plans to improve that over the next year or so....

> and it has performance
> issues on a multi-threaded workload, caused by the single root filesystem
> for metadata becoming a bottleneck.

Single root design has nothing to do with performance on multithreaded workloads.
However, XFS really isn't a single-root design. While it has a single root for the _directory structure_, the allocation subsystem has a root per allocation group, and hence allocation operations can occur in parallel in XFS. Hence the only points of serialisation for most operations are either an individual directory being operated on or the journalling subsystem. Simultaneous directory modifications are not something that databases (or any application) do very often, so that point of serialisation is not something you're ever likely to hit. Besides, this serialisation is a limitation of the linux VFS, not something specific to XFS. Similarly, databases don't do a lot of metadata operations, so the journalling subsystem won't be a bottleneck, either.

Databases do large amounts of _data IO_ to and from files, and that is what XFS excels at. Especially if the database is using direct IO, because then XFS allows concurrent read and write access to the file, so the only limitations in throughput are the storage subsystem and the database itself...

And FWIW, I've done nothing but improve multithreaded throughput for metadata operations in XFS for the past few months, so the claims your vendor is making really have no basis in reality.

> This feedback from the vendor is surely taken with a grain of salt as they
> have marketing motivations of their own product to consider.
>
> Surely, something like corruption and bottlenecks under heavy load /
> multi-threaded use would be a bug that would be addressed, right?

Yes, absolutely. Please ask the vendor to raise bugs for any issues they have seen next time they say this to you.

> And surely, something like a BTree structure, with a root node, journaled
> metadata, etc. would be inherent in other filesystem choices as well, right?

Yes.

> The vendor, in the end, did recommend ext4, but ext4 is not in my mainline
> Linux kernel as anything beyond "tech preview" at this point.
Oh, man, I almost spat out my coffee all over my keyboard when I read that. I needed a good laugh this morning. :)

So what we have here is a classic case of FUD. Your vendor's recommendation to use ext4 instead of XFS directly contradicts their message not to use XFS. ext4 is exactly the same as XFS in regard to the single root/metadata corruption design issues, but ext4 does a much worse job of detecting corruption at runtime compared to XFS. ext4 is also immature, is pretty much untested in long-term production environments, and has developers that are already struggling to understand and maintain the code because of the way it has been implemented. IOWs, your vendor is recommending a filesystem that is _inferior to XFS_.

That's a classic sales technique - level FUD at a competitor, then recommend an inferior solution as the _better alternative_. The key to this technique is that the alternative needs to be something that the customer will recognise as not being viable for deployment in business critical systems. So now the customer doesn't want to use either, and they are ready for the "but we've got this really robust solution and it only costs $$$" sucker-punch.

My best guess at the reason for such a carefully targeted sales technique is that their database is just as robust and performs just as well on XFS as it does on their own solutions that cost mega-$$$. What other motivation is there for taking such an approach?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
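Dave's point about concurrent access under direct IO can be sketched with a small illustration: several threads issuing positioned writes to disjoint regions of one preallocated file, with no shared file offset to contend on. This is not code from the thread and the sizes are invented; a real database would additionally open the file with O_DIRECT and use aligned buffers, which is omitted here for portability.

```python
import os
import tempfile
import threading

CHUNK = 64 * 1024          # 64 KiB per writer, a stand-in for a DB page run
NWRITERS = 4

def writer(fd, idx):
    # Positioned write: no shared file offset, so no seek/write races.
    # With O_DIRECT on XFS such writers can proceed concurrently, rather
    # than being serialised the way buffered writers to one file can be.
    os.pwrite(fd, bytes([idx]) * CHUNK, idx * CHUNK)

fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, NWRITERS * CHUNK)   # preallocate the "data file"
    threads = [threading.Thread(target=writer, args=(fd, i))
               for i in range(NWRITERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    data = os.pread(fd, NWRITERS * CHUNK, 0)
    # Each region holds exactly the bytes its writer put there.
    assert all(data[i * CHUNK:(i + 1) * CHUNK] == bytes([i]) * CHUNK
               for i in range(NWRITERS))
    print("all regions intact")
finally:
    os.close(fd)
    os.unlink(path)
```

The design point is that each writer owns a distinct byte range, which is exactly the pattern a database uses when writing tablespace pages; the filesystem-level question is only whether those writes serialise against each other.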
* Re: XFS use within multi-threaded apps
From: Stewart Smith @ 2010-10-19  4:24 UTC
To: Angelo McComis, xfs

On Mon, 18 Oct 2010 09:42:04 -0400, Angelo McComis <angelo@mccomis.com> wrote:
> I have a use case where I'd like to forward the use of XFS. This is for
> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> what you'd see under a database's data file / tablespace.

The general advice - not only from those of us who hack on database systems for a living (and hobby), but from those who also run them in production on more systems than you'll ever be able to count - for database system performance tuning (i.e. after making your SQL not completely nuts) is this:

Step 1) Use XFS.

Nothing, and I do mean nothing, comes close in reliability and consistent performance.

We've seen various benchmarks where X was faster.... most of the time. Then suddenly your filesystem takes a mutex for 15 seconds and your database performance goes down the crapper.

> My database vendor (who, coincidentally markets their own filesystems and
> operating systems) says that there are certain problems under XFS with
> specific mention of corruption issues, if a single root or the metadata
> become corrupted, the entire filesystem is gone, and it has performance
> issues on a multi-threaded workload, caused by the single root filesystem
> for metadata becoming a bottleneck.

XFS has anything but performance problems on multithreaded workloads. It is *the* best of the Linux filesystems (actually... possibly any file system anywhere) for multithreaded IO. You can either benchmark it or go and read the source - check out the direct IO codepaths and what locks get taken (or rather, what locks aren't taken).
Generally speaking, most DBMSs don't do many filesystem metadata operations, the most common being extending the data file. So what you really care about is multithreaded direct IO performance, scalability and reliability.

> This feedback from the vendor is surely taken with a grain of salt as they
> have marketing motivations of their own product to consider.

If the vendor is who I suspect, and the filesystem being pushed is the one starting two letters down the alphabet from XFS... I wouldn't. While it is a great file system for a number of applications, it is nowhere near ready for big database IO loads - to the extent that, last I heard, it still wasn't being recommended for the various DBs I care about (at least by the DB support guys).

-- 
Stewart Smith
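Stewart's point that the main metadata operation a DBMS performs is extending its data file can be sketched as follows. The helper name and sizes are invented for illustration; `posix_fallocate` asks the filesystem to allocate real blocks up front (on an extent-based filesystem like XFS this is a cheap operation), with a plain truncate fallback for filesystems that don't support it.

```python
import errno
import os
import tempfile

def extend_datafile(fd, new_size):
    """Grow an open data file to new_size bytes with real block allocation."""
    cur = os.fstat(fd).st_size
    if new_size <= cur:
        return
    try:
        # Allocate blocks now so that later page writes do no allocation
        # (and hence almost no metadata work) at all.
        os.posix_fallocate(fd, cur, new_size - cur)
    except OSError as e:
        if e.errno not in (errno.EOPNOTSUPP, errno.EINVAL):
            raise
        # Fallback: sparse extension; blocks get allocated on first write.
        os.ftruncate(fd, new_size)

fd, path = tempfile.mkstemp()
try:
    extend_datafile(fd, 1 << 20)        # grow the "tablespace" to 1 MiB
    size = os.fstat(fd).st_size
    print(size)
finally:
    os.close(fd)
    os.unlink(path)
```

Preallocating in large steps like this is also how a database keeps its file extension rare, which is what makes the journalling subsystem a non-issue for this workload.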
* Re: XFS use within multi-threaded apps
From: Angelo McComis @ 2010-10-20 12:00 UTC
To: Stewart Smith; +Cc: xfs

Stewart, and others:

On Tue, Oct 19, 2010 at 12:24 AM, Stewart Smith <stewart@flamingspork.com> wrote:
> On Mon, 18 Oct 2010 09:42:04 -0400, Angelo McComis <angelo@mccomis.com> wrote:
> > I have a use case where I'd like to forward the use of XFS. This is for
> > large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> > what you'd see under a database's data file / tablespace.
>
> The general advice from not only those of us who hack on database
> systems for a living (and hobby), but those that also run it in
> production on more systems than you'll ever be able to count is this for
> database system performance tuning (i.e. after making your SQL not
> completely nuts):
>
> Step 1) Use XFS.
>
> Nothing, and I do mean nothing, comes close in reliability and consistent
> performance.
>
> We've seen various benchmarks where X was faster.... most of the
> time. Then suddenly your filesystem takes a mutex for 15 seconds and
> your database performance goes down the crapper.

I have been running iozone benchmarks, head-to-head ext3 versus XFS: single LUN, default mkfs options, etc. The short answer is that XFS wins hands down on writes and random writes. Ext3 wins a couple of the other tests, but not by nearly the margin that XFS wins on the other ones. That was the KB/sec tests. The IOPS test was even more telling, and showed XFS winning by orders of magnitude on a few tests, and being close or a tie on the ones that it didn't win.

I took two SAN LUNs and ran local FS versus SAN-presented (this is FC, attached to IBM DS8k storage), and ran the tests there.
When done with the head-to-head tests, I concatenated the LUNs to make a RAID 0 / simple 2-stripe set, and ran the tests some more. I can't say that the numbers make a lot of sense, since it's a 4Gbit FC connection.

> > My database vendor (who, coincidentally markets their own filesystems
> > and operating systems) says that there are certain problems under XFS with
> > specific mention of corruption issues, if a single root or the metadata
> > become corrupted, the entire filesystem is gone, and it has performance
> > issues on a multi-threaded workload, caused by the single root filesystem
> > for metadata becoming a bottleneck.
>
> XFS has anything but performance problems on multithreaded
> workloads. It is *the* best of the Linux filesystems
> (actually... possibly any file system anywhere) for multithreaded
> IO. You can either benchmark it or go and read the source - check out
> the direct IO codepaths and what locks get taken (or rather, what locks
> aren't taken).
>
> Generally speaking, most DBMSs don't do many filesystem metadata
> operations, the most common being extending the data file. So what you
> really care about is multithreaded direct IO performance, scalability
> and reliability.
>
> > This feedback from the vendor is surely taken with a grain of salt as
> > they have marketing motivations of their own product to consider.
>
> If the vendor is who I suspect, and the filesystem being pushed is
> the one starting two letters down the alphabet from XFS... I
> wouldn't. While a great file system for a number of applications, it is
> nowhere near ready for big database IO loads - to the extent that last I
> heard it still wasn't being recommended for the various DBs I care about
> (at least by the DB support guys).

Well - I mentioned it above. Their current recommendation for Linux is to stick with ext3... and for big file / big IO operations, switch to ext4.
And we had a meeting wherein we had a discussion that went like "well, ext3 has problems whenever the kernel journal thread wakes up to flush under heavy I/O, and ext4 is not available to us...", and we referenced Dave Chinner's comments in an earlier post regarding the maturity level of ext4 and its present status.

I have been able to schedule a meeting with the folks at my vendor, on the database software side. Aside from the questions I have, points to make, etc., I'm curious whether there's anything else, based on input from anyone here, that I should be asking them. This is a pretty grand opportunity to sit down and grill them.

Thanks to everyone who participates on this list - you are all a great resource and a perfect example of what the open source community is all about.

Regards,
Angelo
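As a rough cross-check on tools like iozone, a sequential-write micro-benchmark is easy to sketch. This is an invented illustration, not part of the thread's methodology: it measures buffered writes plus an fsync for a single file, which says nothing about the multithreaded direct IO behaviour the rest of the thread is about, but it is a quick way to sanity-check that two filesystems are in the same ballpark.

```python
import os
import tempfile
import time

def seq_write_mb_per_s(path, total_mb=16, record_kb=256):
    """Write total_mb MB in record_kb-sized records, fsync, return MB/s."""
    record = b"\0" * (record_kb * 1024)
    n = (total_mb * 1024) // record_kb
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        t0 = time.perf_counter()
        for _ in range(n):
            os.write(fd, record)
        os.fsync(fd)                 # include flush-to-disk in the timing
        elapsed = time.perf_counter() - t0
    finally:
        os.close(fd)
    return total_mb / elapsed

with tempfile.TemporaryDirectory() as d:
    rate = seq_write_mb_per_s(os.path.join(d, "testfile"))
    print(f"{rate:.1f} MB/s")
```

Run it on the same mount points being compared; as the thread notes, the only true benchmark remains the application's own IO pattern.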
* Re: XFS use within multi-threaded apps
From: Peter Grandi @ 2010-10-23 19:56 UTC
To: Linux XFS

>>> I have a use case where I'd like to forward the use of XFS. This is for
>>> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
>>> what you'd see under a database's data file / tablespace.

>> Step 1) Use XFS.
>> Nothing, and I do mean nothing comes close to reliability and consistent
>> performance.

> I have been running iozone benchmarks, [ ... ]

I think that it is exceptionally difficult to get useful results out of Iozone...

>>> My database vendor (who, coincidentally markets their own
>>> filesystems and operating systems) says that there are
>>> certain problems under XFS with specific mention of
>>> corruption issues, if a single root or the metadata become
>>> corrupted, the entire filesystem is gone,

If that's bad enough, it applies to any file system out there except FAT and Reiser, as they store some metadata with each block. ZFS and BTRFS may have something similar. But it is not an issue.

>>> and it has performance issues on a multi-threaded workload,
>>> caused by the single root filesystem for metadata becoming a
>>> bottleneck.

That's actually more of a problem with Lustre, in extreme cases.

>> XFS has anything but performance problems on multithreaded
>> workloads. It is *the* best of the Linux filesystems
>> (actually... possibly any file system anywhere) for
>> multithreaded IO.

That's actually multithreaded IO to the same file; for multithreaded IO to different files, JFS (and allegedly 'ext4') are also fairly good.

> Well - I mentioned it above. Their current recommendation for
> Linux is to stick with ext3... and for big file/big IO
> operations, switch to ext4.
That's just about because those are the file systems that are "qualified", and 'ext3' defaults give the lowest risks in case the application environment is misdesigned and relies on 'O_PONIES'.

> [ ... ] "well, ext3 has problems whenever the kernel journal
> thread wakes up to flush under heavy I/O,

That actually happens with every file system, and it is one of several naive misdesigns in the Linux IO subsystem. The default Linux page cache flusher parameters are often too "loose" by 1-2 orders of magnitude, and this can cause serious problems.

In any case, the Linux page cache itself is also a bit of a joke; (hopefully) a DBMS will not use it anyhow, but use direct IO, and XFS is targeted at direct IO, large-file, multistreaming loads.
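The flusher parameters Peter refers to are the vm.dirty_* sysctls. A small read-only sketch for inspecting them (the knob list is from stock Linux; the function returns None for any knob a given kernel does not expose):

```python
import os

DIRTY_KNOBS = (
    "dirty_ratio",               # % of memory dirty before writers must flush
    "dirty_background_ratio",    # % dirty before background writeback starts
    "dirty_expire_centisecs",    # age at which dirty data must be written out
    "dirty_writeback_centisecs", # flusher thread wakeup interval
)

def read_dirty_tunables():
    """Return {knob: int value, or None if missing} from /proc/sys/vm."""
    out = {}
    for knob in DIRTY_KNOBS:
        path = os.path.join("/proc/sys/vm", knob)
        try:
            with open(path) as f:
                out[knob] = int(f.read().strip())
        except (FileNotFoundError, PermissionError):
            out[knob] = None
    return out

if __name__ == "__main__":
    for knob, value in read_dirty_tunables().items():
        print(f"vm.{knob} = {value}")
```

On a box with lots of RAM, the default percentage-based ratios can translate to gigabytes of dirty data before writeback kicks in, which is the "too loose by 1-2 orders of magnitude" effect described above; a DBMS doing direct IO sidesteps all of this.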
* Re: XFS use within multi-threaded apps
From: Angelo McComis @ 2010-10-23 20:59 UTC
To: Peter Grandi; +Cc: Linux XFS

On Sat, Oct 23, 2010 at 3:56 PM, Peter Grandi <pg_xf2@xf2.for.sabi.co.uk> wrote:
> >>> I have a use case where I'd like to forward the use of XFS. This is for
> >>> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> >>> what you'd see under a database's data file / tablespace.
>
> >> Step 1) Use XFS.
> >> Nothing, and I do mean nothing comes close to reliability and consistent
> >> performance.
>
> > I have been running iozone benchmarks, [ ... ]
>
> I think that it is exceptionally difficult to get useful results
> out of Iozone...

True - the benchmarks themselves don't tell a complete story. Specific to iozone, I was basically comparing XFS to EXT3 and showing the results (various record sizes, various file sizes, and various worker thread counts)... The only true benchmark is to run the application in the way that is characteristic of how it will be used. Database benchmarks themselves would vary greatly between use cases: generic lookups of data (random reads), data warehouse analytics (sequential reads), ETL (sequential reads, sequential writes), etc.

> >>> My database vendor (who, coincidentally markets their own
> >>> filesystems and operating systems) says that there are
> >>> certain problems under XFS with specific mention of
> >>> corruption issues, if a single root or the metadata become
> >>> corrupted, the entire filesystem is gone,
>
> If that's bad enough it applies to any file system out there
> except FAT and Reiser, as they store some metadata with each
> block. ZFS and BTRFS may have something similar. But it is not
> an issue.
> >>> and it has performance issues on a multi-threaded workload,
> >>> caused by the single root filesystem for metadata becoming a
> >>> bottleneck.
>
> That's actually more of a problem with Lustre, in extreme cases.
>
> >> XFS has anything but performance problems on multithreaded
> >> workloads. It is *the* best of the Linux filesystems
> >> (actually... possibly any file system anywhere) for
> >> multithreaded IO.
>
> That's actually multithreaded IO to the same file; for
> multithreaded IO to different files, JFS (and allegedly 'ext4') are
> also fairly good.
>
> > Well - I mentioned it above. Their current recommendation for
> > Linux is to stick with ext3... and for big file/big IO
> > operations, switch to ext4.
>
> That's just about because those are the file systems that are
> "qualified", and 'ext3' defaults give the lowest risks in case the
> application environment is misdesigned and relies on 'O_PONIES'.
>
> > [ ... ] "well, ext3 has problems whenever the kernel journal
> > thread wakes up to flush under heavy I/O,
>
> That actually happens with every file system, and it is one of
> several naive misdesigns in the Linux IO subsystem. The default
> Linux page cache flusher parameters are often too "loose" by 1-2
> orders of magnitude, and this can cause serious problems.
>
> In any case, the Linux page cache itself is also a bit of a joke;
> (hopefully) a DBMS will not use it anyhow, but use direct IO, and
> XFS is targeted at direct IO, large-file, multistreaming loads.

Peter, and others:

Thanks for this great discussion. I appreciate the thought that went into all of the replies.

In the end, we had a sit-down discussion with our vendor. They admitted that they "support" XFS, but have very few customers using it (said they can count them on one hand), and when I pressed them on whether it's a technology limitation, they threw down the gauntlet and said "look, we're giving you our frank recommendation here:
EXT3."

They quoted as having 10+TB databases running OLTP transactions on XFS, with 4-5GB/sec sustained throughput to the disk system, and 20-30TB for data warehouse type operations. When pressed about the cache flush issue, they mentioned they use direct IO under ext3, and it's not an issue in that case.

In doing my research, I searched for references of other Fortune nn-sized companies who use this DB and use XFS underneath it. I came up empty-handed... I searched my network for large-ish companies using XFS, and how they were using it. I'm not sure if we're bordering on "secret sauce" type stuff here, but I had an extremely difficult time getting enterprise references to back up the research I've done.

For our use, we had to opt to follow the vendor recommendation, and it came down to not wanting to be one of those customers that they can count on one hand using XFS with their product.

I'm still confounded by why, when XFS is technically superior in these cases, it is so obscure. Are Enterprise Linux guys just not looking this deep under the covers to uncover performance enhancements like this? Is it because RedHat didn't include the XFS tools in the distro until recently, causing XFS to not be a choice as part of it? Are other Linux folks "next, next, finish..." people when it comes to how they install? I really don't get it.

Thanks for all the discussion, folks. I hope to put forth other use cases as they surface.
* Re: XFS use within multi-threaded apps
From: Angelo McComis @ 2010-10-23 21:01 UTC
To: Linux XFS

Correction:

> They quoted as having 10+TB databases running OLTP transactions on XFS,
> with 4-5GB/sec sustained throughput to the disk system. And 20-30TB for
> data warehouse type

They quoted having 10+TB databases running OLTP on EXT3 with 4-5GB/sec sustained throughput (not XFS).
* Re: XFS use within multi-threaded apps
From: Stan Hoeppner @ 2010-10-24 2:13 UTC
To: xfs

Angelo McComis put forth on 10/23/2010 4:01 PM:
> Correction:
>
>> They quoted as having 10+TB databases running OLTP transactions on XFS,
>> with 4-5GB/sec sustained throughput to the disk system. And 20-30TB for
>> data warehouse type
>
> They quoted having 10+TB databases running OLTP on EXT3 with 4-5GB/sec
> sustained throughput (not XFS).

Given the data rate above, this sounds like they're quoting a shared-nothing data model on a cluster, not a single-host setup. You are interested in a single-host setup, correct?

Did you ask for a contact at their customer's organization so you could at least attempt to verify these performance claims? Or did they state these numbers are from an internal test system? With no way for you to verify these claims, they are, literally, worthless, as this vendor is asking you to take their claims on faith. Faith is for religion, not a business transaction.

-- 
Stan
* Re: XFS use within multi-threaded apps
From: Michael Monnerie @ 2010-10-24 18:22 UTC
To: xfs; +Cc: Angelo McComis

On Saturday, 23 October 2010, Angelo McComis wrote:
> They quoted having 10+TB databases running OLTP on EXT3 with
> 4-5GB/sec sustained throughput (not XFS).

Which servers and storage are these? This is nothing you can do with "normal" storage. Using 8Gb/s Fibre Channel gives 1GB/s, if you can do full-speed I/O. So you'd need at least 5 parallel Fibre Channel storage arrays running without any overhead. Also, a single server can't do rates that high, so there must be several front-end servers. That again means their database must be especially organised for that type of load (shared nothing or so).

On the other hand, if they have these performance numbers on 100 servers, each one only needs to do 51MB/s of I/O to reach 5GB/s total throughput. So that is a number without a lot of meaning, as long as you don't know which hardware is used. And: how high would their throughput be when using XFS instead of EXT3? ;-)

One question comes to my mind: if they do direct I/O, would there still be a lot of difference between XFS and EXT3, performance-wise?

And how many companies run around telling which filesystem they use for their performance-critical business application? Normally they do this only for marketing, so they get paid or get special prices if they say "with this product we are sooo happy".

-- 
With kind regards,
Michael Monnerie, Ing.
BSc it-management
Internet Services: http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Radio interview on the topic of spam ******
http://www.it-podcast.at/archiv.html#podcast-100716

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
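Michael's bandwidth arithmetic can be checked directly. The figures below assume decimal storage units and roughly 800MB/s of payload per "8Gb/s" Fibre Channel link (8b/10b line coding costs about 20%), so the nominal 1GB/s per link in the post is a slight round-up; either way, several links or many servers are needed to sustain 4-5GB/s.

```python
import math

target_mb_s = 5000     # 5 GB/s sustained, the vendor's claimed throughput
link_mb_s = 800        # one "8Gb/s" FC link after 8b/10b coding overhead

# Single-host view: how many parallel FC links at full speed?
links = math.ceil(target_mb_s / link_mb_s)
print(links)           # at least 7 parallel links

# Cluster view: spread the same load over 100 front-end servers.
per_server = target_mb_s / 100
print(per_server)      # only 50 MB/s each - trivially achievable
```

Both views support the post's conclusion: the headline number is meaningless without knowing whether it came from one host or a large cluster.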
* Re: XFS use within multi-threaded apps
From: Dave Chinner @ 2010-10-24 23:08 UTC
To: Michael Monnerie; +Cc: Angelo McComis, xfs

On Sun, Oct 24, 2010 at 08:22:46PM +0200, Michael Monnerie wrote:
> On Samstag, 23. Oktober 2010 Angelo McComis wrote:
> > They quoted having 10+TB databases running OLTP on EXT3 with
> > 4-5GB/sec sustained throughput (not XFS).
>
> Which servers and storage are these? This is nothing you can do with
> "normal" storages. Using 8Gb/s Fibre Channel gives 1GB/s, if you can do
> full speed I/O. So you'd need at least 5 parallel Fibre Channel storages
> running without any overhead. Also, a single server can't do that high
> rates, so there must be several front-end servers. That again means
> their database must be especially organised for that type of load
> (shared nothing or so).

Have a look at IBM's TPC-C submission here on RHEL5.2:

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=108081902

That's got 8x4GB FC connections to 40 storage arrays with 1920 disks behind them. It uses 80x 24-disk raid0 luns, with each lun split into 12 data partitions on the outer edge of each lun. That gives 960 data partitions for the benchmark.

Now, this result uses raw devices for this specific benchmark, but it could easily use files in ext3 filesystems. With 960 ext3 filesystems, you could easily max out the 3.2GB/s of IO that sucker has, as it is <4MB/s per filesystem. So I'm pretty sure IBM are not quoting a single filesystem throughput result. While you could get that sort of result from a single filesystem with XFS, I think it's an order of magnitude higher than a single ext3 filesystem can achieve....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: XFS use within multi-threaded apps
From: Stan Hoeppner @ 2010-10-25 3:12 UTC
To: xfs

Dave Chinner put forth on 10/24/2010 6:08 PM:
> On Sun, Oct 24, 2010 at 08:22:46PM +0200, Michael Monnerie wrote:
>> On Samstag, 23. Oktober 2010 Angelo McComis wrote:
>>> They quoted having 10+TB databases running OLTP on EXT3 with
>>> 4-5GB/sec sustained throughput (not XFS).
>>
>> Which servers and storage are these? This is nothing you can do with
>> "normal" storages. Using 8Gb/s Fibre Channel gives 1GB/s, if you can do
>> full speed I/O. So you'd need at least 5 parallel Fibre Channel storages
>> running without any overhead. Also, a single server can't do that high
>> rates, so there must be several front-end servers. That again means
>> their database must be especially organised for that type of load
>> (shared nothing or so).
>
> Have a look at IBM's TPC-C submission here on RHEL5.2:
>
> http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=108081902
>
> That's got 8x4GB FC connections to 40 storage arrays with 1920 disks
> behind them. It uses 80x 24 disk raid0 luns, with each lun split
> into 12 data partitions on the outer edge of each lun. That gives
> 960 data partitions for the benchmark.

They're reporting 8 _dual port_ 4Gb FC cards, so that's 16 connections.

> Now, this result uses raw devices for this specific benchmark, but
> it could easily use files in ext3 filesystems. With 960 ext3
> filesystems, you could easily max out the 3.2GB/s of IO that sucker
> has as it is <4MB/s per filesystem.

So the max is 6.4GB/s. The resulting ~7MB/s per filesystem would still be a piece of cake.

Also, would anyone in their right mind have their DB write/read directly to raw partitions in a production environment? I'm not a DB expert, but this seems ill-advised, unless the DB is really designed well for this.
> So I'm pretty sure IBM are not quoting a single filesystem
> throughput result. While you could get that sort of result from a
> single filesystem with XFS, I think it's an order of magnitude
> higher than a single ext3 filesystem can achieve....

I figured they were quoting the OP a cluster result, as I mentioned previously. Thanks for pointing out that a single 8-way multicore x86 box can yield this kind of performance today - 2 million TPC-C. Actually, this result is two years old. Wow. I haven't paid attention to TPC results for a while.

Nonetheless, it's really interesting to see an 8-socket, 48-core x86 box churning out numbers almost double that of an HP Itanium 64 socket/core SuperDome from only 3 years prior. The cost of the 8-way x86 server is a fraction of the 64-way Itanium, but storage cost usually doesn't budge much:

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=105112801

Did anyone happen to see that SUN, under the Oracle cloak, has finally started publishing TPC results again? IIRC, SUN had quit publishing results many many years ago because their E25K with 72 UltraSparcs couldn't even keep up with an IBM Power box with 16 sockets. The current Oracle result for its UltraSparc T2 12-node cluster is pretty impressive, from a total score at least. The efficiency is pretty low, given the 384-core count, and considering the result is only 3.5x that of the 48-core Xeon IBM xSeries:

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=109110401

-- 
Stan
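The per-filesystem arithmetic in the posts above is easy to verify, assuming roughly 400MB/s of payload per 4Gb FC connection. Note that with the dual-port correction the per-filesystem figure comes out nearer 7MB/s than 8MB/s - still trivially easy for ext3 either way.

```python
partitions = 960                  # 80 LUNs x 12 data partitions each

# Dave's figure: 8 connections at ~400 MB/s of 4Gb FC payload each.
total_8 = 8 * 400                 # 3200 MB/s aggregate
print(round(total_8 / partitions, 1))    # per-filesystem load, < 4 MB/s

# With the dual-port correction (16 connections):
total_16 = 16 * 400               # 6400 MB/s aggregate
print(round(total_16 / partitions, 1))   # still only a few MB/s each
```

The point both posters converge on: spread across 960 partitions, even the full fabric bandwidth demands almost nothing per filesystem, so the result says nothing about single-filesystem throughput.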