public inbox for linux-xfs@vger.kernel.org
* XFS use within multi-threaded apps
@ 2010-10-18 13:42 Angelo McComis
  2010-10-19  1:12 ` Dave Chinner
  2010-10-19  4:24 ` Stewart Smith
  0 siblings, 2 replies; 11+ messages in thread
From: Angelo McComis @ 2010-10-18 13:42 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 1286 bytes --]

All:

Apologies, but I am new to this list and somewhat new to XFS.

I have a use case where I'd like to forward the use of XFS. This is for
large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
what you'd see under a database's data file / tablespace.

My database vendor (who, coincidentally, markets their own filesystems and
operating systems) says that there are certain problems under XFS, with
specific mention of corruption issues: if a single root or the metadata
becomes corrupted, the entire filesystem is gone. They also say it has
performance issues on multi-threaded workloads, caused by the single
filesystem root for metadata becoming a bottleneck.

This feedback from the vendor is, of course, taken with a grain of salt, as
they have their own product's marketing motivations to consider.

Surely, something like corruption and bottlenecks under heavy load /
multi-threaded use would be a bug that would be addressed, right?

And surely, something like a BTree structure, with a root node, journaled
metadata, etc. would be inherent in other filesystem choices as well, right?

The vendor, in the end, did recommend ext4, but ext4 is not in my mainline
Linux kernel as anything beyond "tech preview" at this point.

Thanks in advance for any/all feedback.

Angelo

[-- Attachment #1.2: Type: text/html, Size: 1487 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-18 13:42 XFS use within multi-threaded apps Angelo McComis
@ 2010-10-19  1:12 ` Dave Chinner
  2010-10-19  4:24 ` Stewart Smith
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2010-10-19  1:12 UTC (permalink / raw)
  To: Angelo McComis; +Cc: xfs

On Mon, Oct 18, 2010 at 09:42:04AM -0400, Angelo McComis wrote:
> All:
> 
> Apologies but I am new to this list, and somewhat new to XFS.
> 
> I have a use case where I'd like to forward the use of XFS. This is for
> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> what you'd see under a database's data file / tablespace.

Yup, perfect use case for XFS.

> My database vendor (who, coincidentally markets their own filesystems and
> operating systems) says that there are certain problems under XFS with
> specific mention of corruption issues, if a single root or the metadata
> become corrupted, the entire filesystem is gone,

Yes, they are right about detected metadata corruption causing a
filesystem _shutdown_, but that does not mean that a metadata
corruption event will cause your entire filesystem to disappear.
Besides, the worst case for _any_ filesystem is that it gets
corrupted beyond repair and you have to restore from backups,
so you still have to plan for this eventuality when dealing with
disaster recovery scenarios.

What they neglect to mention is that XFS has a lot of metadata
corruption detection code, and shuts down at the first detection to
prevent the filesystem from being further damaged before a repair
process can be run. Apart from btrfs, XFS has the best run-time
metadata corruption detection of any filesystem in Linux, and even
so there are plans to improve that over the next year or so....

> and it has performance
> issues on a multi-threaded workload, caused by the single root filesystem
> for metadata becoming a bottleneck.

Single root design has nothing to do with performance on
multithreaded workloads. However, XFS really isn't a single-root
design. While it has a single root for the _directory structure_,
the allocation subsystem has a root per allocation group and hence
allocation operations can occur in parallel in XFS.

Hence the only points of serialisation for most operations are either
an individual directory being operated on or the journalling subsystem.

Simultaneous directory modifications are not something that
databases (or any application) do very often, so that point of
serialisation is not something you're ever likely to hit. Besides,
this serialisation is a limitation of the linux VFS, not something
specific to XFS.  Similarly, databases don't do a lot of metadata
operations so the journalling subsystem won't be a bottleneck,
either.

Databases do large amounts of _data IO_ to and from files, and that
is what XFS excels at. Especially if the database is using direct
IO, because then XFS allows concurrent read and write access to the
file, so the only limitations on throughput are the storage subsystem
and the database itself...

And FWIW, I've done nothing but improve multithreaded throughput for
metadata operations in XFS for the past few months, so the claims
your vendor is making really have no basis in reality.

> This feedback from the vendor is surely taken with a grain of salt as they
> have marketing motivations of their own product to consider.
> 
> Surely, something like corruption and bottlenecks under heavy load /
> multi-threaded use would be a bug that would be addressed, right?

Yes, absolutely. Please ask the vendor to raise bugs for any issues
they have seen next time they say this to you.

> And surely, something like a BTree structure, with a root node, journaled
> metadata, etc. would be inherent in other filesystem choices as well, right?

Yes.

> The vendor, in the end, did recommend ext4, but ext4 is not in my mainline
> Linux kernel as anything beyond "tech preview" at this point.

Oh, man, I almost spat out my coffee all over my keyboard when I
read that. I needed a good laugh this morning. :)

So what we have here is a classic case of FUD.

Your vendor's recommendation to use ext4 instead of XFS directly
contradicts their message not to use XFS.  ext4 is exactly the same
as XFS in regard to the single root/metadata corruption design
issues, but ext4 does a much worse job of detecting corruption
at runtime compared to XFS.

ext4 is also immature, is pretty much untested in long-term
production environments and has developers that are already
struggling to understand and maintain the code because of the way it
has been implemented.

IOWs, your vendor is recommending a filesystem that is _inferior to XFS_.

That's a classic sales technique - level FUD at a competitor, then
recommend an inferior solution as the _better alternative_. The key
to this technique is that the alternative needs to be something that
the customer will recognise as not being viable for deployment in
business critical systems. So now the customer doesn't want to use
either, and they are ready for the "but we've got this really robust
solution and it only costs $$$" sucker-punch.

My best guess at the reason for such a carefully targeted sales
technique is that their database is just as robust and performs just
as well on XFS as it does on their own solutions that cost mega-$$$.
What other motivation is there for taking such an approach?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-18 13:42 XFS use within multi-threaded apps Angelo McComis
  2010-10-19  1:12 ` Dave Chinner
@ 2010-10-19  4:24 ` Stewart Smith
  2010-10-20 12:00   ` Angelo McComis
  1 sibling, 1 reply; 11+ messages in thread
From: Stewart Smith @ 2010-10-19  4:24 UTC (permalink / raw)
  To: Angelo McComis, xfs

On Mon, 18 Oct 2010 09:42:04 -0400, Angelo McComis <angelo@mccomis.com> wrote:
> I have a use case where I'd like to forward the use of XFS. This is for
> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
> what you'd see under a database's data file / tablespace.

The general advice for database system performance tuning (i.e. after
making your SQL not completely nuts), not only from those of us who hack
on database systems for a living (and hobby) but also from those who run
them in production on more systems than you'll ever be able to count, is
this:

Step 1) Use XFS.

Nothing, and I do mean nothing, comes close to it for reliability and
consistent performance.

We've seen various benchmarks where filesystem X was faster.... most of
the time. Then suddenly your filesystem takes a mutex for 15 seconds and
your database performance goes down the crapper.

> My database vendor (who, coincidentally markets their own filesystems and
> operating systems) says that there are certain problems under XFS with
> specific mention of corruption issues, if a single root or the metadata
> become corrupted, the entire filesystem is gone, and it has performance
> issues on a multi-threaded workload, caused by the single root filesystem
> for metadata becoming a bottleneck.

XFS has anything but performance problems on multithreaded
workloads. It is *the* best of the Linux filesystems
(actually... possibly any file system anywhere) for multithreaded
IO. You can either benchmark it or go and read the source - check out
the direct IO codepaths and what locks get taken (or rather, what locks
aren't taken).

Generally speaking, most DBMSs don't do many filesystem metadata
operations, the most common being extending the data file. So what you
really care about is multithreaded direct IO performance, scalability
and reliability.

> This feedback from the vendor is surely taken with a grain of salt as they
> have marketing motivations of their own product to consider.

If the vendor is who I suspect, and the filesystem being pushed is the
one starting two letters further down the alphabet than XFS... I
wouldn't. While a great file system for a number of applications, it is
nowhere near ready for big database IO loads - to the extent that, last I
heard, it still wasn't being recommended for the various DBs I care about
(at least by the DB support guys).
-- 
Stewart Smith


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-19  4:24 ` Stewart Smith
@ 2010-10-20 12:00   ` Angelo McComis
  2010-10-23 19:56     ` Peter Grandi
  0 siblings, 1 reply; 11+ messages in thread
From: Angelo McComis @ 2010-10-20 12:00 UTC (permalink / raw)
  To: Stewart Smith; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 4305 bytes --]

Stewart, and others:

On Tue, Oct 19, 2010 at 12:24 AM, Stewart Smith <stewart@flamingspork.com>wrote:

> On Mon, 18 Oct 2010 09:42:04 -0400, Angelo McComis <angelo@mccomis.com>
> wrote:
> > I have a use case where I'd like to forward the use of XFS. This is for
> > large (multi-GB, say anywhere from 5GB to 300GB) individual files, such
> as
> > what you'd see under a database's data file / tablespace.
>
> The general advice from not only those of us who hack on database
> systems for a living (and hobby), but those that also run it in
> production on more systems than you'll ever be able to count is this for
> database system performance tuning (i.e. after making your SQL not
> completely nuts)
>
> Step 1) Use XFS.
>
> Nothing, and I do mean nothing comes close to reliability and consistent
> performance.
>
> We've seen various benchmarks where X was faster.... most of the
> time. Then suddenly your filesystem takes a mutex for 15 seconds and
> you're database performance goes down the crapper.
>
>
I have been running iozone benchmarks, both head to head ext3 versus XFS,
single LUN, default mkfs options, etc.  And the short answer is XFS wins
hands down on writes and random writes.  Ext3 wins a couple of the other
tests, but not by nearly the margin by which XFS wins the others.  Those
were the KB/sec tests. The IOPS test was even more telling, and showed XFS
winning by orders of magnitude on a few tests, and being close or a tie on
the ones that it didn't win.  I took two SAN luns and ran local FS versus
SAN-presented (this is FC, attached to IBM DS8k storage), and ran the tests
there.  When done with the head to head tests, I concatenated the LUNS to
make a RAID 0 / simple 2-stripe set, and ran the tests some more.

I can't say that the numbers make a lot of sense, since it's a 4Gbit FC
connection.


>  > My database vendor (who, coincidentally markets their own filesystems
> and
> > operating systems) says that there are certain problems under XFS with
> > specific mention of corruption issues, if a single root or the metadata
> > become corrupted, the entire filesystem is gone, and it has performance
> > issues on a multi-threaded workload, caused by the single root filesystem
> > for metadata becoming a bottleneck.
>
> XFS has anything but performance problems on multithreaded
> workloads. It is *the* best of the Linux filesystems
> (actually... possibly any file system anywhere) for multithreaded
> IO. You can either benchmark it or go and read the source - check out
> the direct IO codepaths and what locks get taken (or rather, what locks
> aren't taken).
>
> Generally speaking, most DBMSs don't do much filesystem metadata
> operations, the most common being extending the data file. So what you
> really care about is multithreaded direct IO performance, scalability
> and reliability.
>
> > This feedback from the vendor is surely taken with a grain of salt as
> they
> > have marketing motivations of their own product to consider.
>
> If the vendor is who I suspect, and the filesystem being pushed is
> starting with two letters down the alphabet than XFS... I
> wouldn't. While a great file system for a number of applications, it is
> nowhere near ready for big database IO loads - to the extent that last I
> heard it still wasn't being recommended for the various DBs I care about
> (at least by the DB support guys).
>

Well - I mentioned it above. Their current recommendation for Linux is to
stick with ext3... and for big file/big IO operations, switch to ext4. And
we had a meeting with a discussion that went like "well, ext3 has
problems whenever the kernel journal thread wakes up to flush under heavy
I/O, and ext4 is not available to us..." - plus Dave Chinner's earlier post
regarding the maturity level of ext4 and its present status.

I have been able to schedule a meeting with the folks at my vendor, on the
database software side. Aside from the questions I have, points to make,
etc., I'm curious whether there's anything else, based on input from anyone
here, that I should be asking them. This is a pretty grand opportunity to
sit down and grill them.

Thanks to everyone who participates on this list - you are all a great
resource and a perfect example of what the open source community is all
about.

Regards,
Angelo

[-- Attachment #1.2: Type: text/html, Size: 5283 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-20 12:00   ` Angelo McComis
@ 2010-10-23 19:56     ` Peter Grandi
  2010-10-23 20:59       ` Angelo McComis
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Grandi @ 2010-10-23 19:56 UTC (permalink / raw)
  To: Linux XFS

>>> I have a use case where I'd like to forward the use of XFS. This is for
>>> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such as
>>> what you'd see under a database's data file / tablespace.

>> Step 1) Use XFS.
>> Nothing, and I do mean nothing comes close to reliability and consistent
>> performance.

> I have been running iozone benchmarks, [ ... ]

I think that it is exceptionally difficult to get useful results
out of Iozone...

>>> My database vendor (who, coincidentally markets their own
>>> filesystems and operating systems) says that there are
>>> certain problems under XFS with specific mention of
>>> corruption issues, if a single root or the metadata become
>>> corrupted, the entire filesystem is gone,

If that's bad enough, it applies to any file system out there
except FAT and Reiser, as they store some metadata with each
block. ZFS and BTRFS may have something similar. But it is not
an issue.

>>> and it has performance issues on a multi-threaded workload,
>>> caused by the single root filesystem for metadata becoming a
>>> bottleneck.

That's actually more of a problem with Lustre, in extreme cases.

>> XFS has anything but performance problems on multithreaded
>> workloads. It is *the* best of the Linux filesystems
>> (actually... possibly any file system anywhere) for
>> multithreaded IO.

That's actually for multithreaded IO to the same file; for
multithreaded IO to different files, JFS (and allegedly 'ext4') are
also fairly good.

> Well - I mentioned it above. Their current recommendation for
> Linux is to stick with ext3... and for big file/big IO
> operations, switch to ext4.

That's largely because those are the file systems that are
"qualified", and 'ext3' defaults give the lowest risks in case the
application environment is misdesigned and relies on 'O_PONIES'.

> [ ... ] "well, ext3 has problems whenever the kernel journal
> thread wakes up to flush under heavy I/O,

That actually happens with every file system, and it is one of
several naive misdesigns in the Linux IO subsystem. The default
Linux page cache flusher parameters are often too "loose" by 1-2
orders of magnitude, and this can cause serious problems. Never
mind that the page cache


In any case the Linux page cache itself is also a bit of a joke, and
(hopefully) a DBMS will not use it anyhow, but use direct IO, and
XFS is targeted at direct IO, large file, multistreaming loads.
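
As a hedged illustration of the "loose flusher parameters" point above:
the knobs in question are the vm.dirty_* sysctls. The values below are
illustrative only, not recommendations; the right numbers depend on RAM
size and storage speed, and changing them requires root.

```shell
# Inspect the current writeback thresholds (defaults on 2.6-era
# kernels allow a large fraction of RAM to sit dirty before flushing):
sysctl vm.dirty_ratio vm.dirty_background_ratio

# Tighten them by roughly an order of magnitude so background flushing
# starts earlier and write stalls are shorter (illustrative values):
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.dirty_ratio=5
```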


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-23 19:56     ` Peter Grandi
@ 2010-10-23 20:59       ` Angelo McComis
  2010-10-23 21:01         ` Angelo McComis
  0 siblings, 1 reply; 11+ messages in thread
From: Angelo McComis @ 2010-10-23 20:59 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux XFS


[-- Attachment #1.1: Type: text/plain, Size: 5056 bytes --]

On Sat, Oct 23, 2010 at 3:56 PM, Peter Grandi <pg_xf2@xf2.for.sabi.co.uk>wrote:

> >>> I have a use case where I'd like to forward the use of XFS. This is for
> >>> large (multi-GB, say anywhere from 5GB to 300GB) individual files, such
> as
> >>> what you'd see under a database's data file / tablespace.
>
> >> Step 1) Use XFS.
> >> Nothing, and I do mean nothing comes close to reliability and consistent
> >> performance.
>
> > I have been running iozone benchmarks, [ ... ]
>
> I think that it is exceptionally difficult to get useful results
> out of Iozone...
>
>
True - the benchmarks themselves don't tell a complete story. Specific to
iozone, I was basically comparing XFS to EXT3, and showing the results
(various record sizes, various file sizes, and various worker thread
counts)... The only true benchmark is to run the application in the way that
is characteristic of how it will be used. Database benchmarks themselves
would vary greatly between use cases: from generic data lookups (random
reads), to data warehouse analytics (sequential reads), to ETL (sequential
reads, sequential writes), etc.


> >>> My database vendor (who, coincidentally markets their own
> >>> filesystems and operating systems) says that there are
> >>> certain problems under XFS with specific mention of
> >>> corruption issues, if a single root or the metadata become
> >>> corrupted, the entire filesystem is gone,
>
> If that's bad enough it applies to any file system out there
> except FAT and Reiser, as they store some metadata with each
> block. ZFS and BTRFS may have something similar. But it is not
> an issue.
>
> >>> and it has performance issues on a multi-threaded workload,
> >>> caused by the single root filesystem for metadata becoming a
> >>> bottleneck.
>
> That's actually more of a problem with Lustre, in extreme cases.
>
> >> XFS has anything but performance problems on multithreaded
> >> workloads. It is *the* best of the Linux filesystems
> >> (actually... possibly any file system anywhere) for
> >> multithreaded IO.
>
> That's actually for multithreaded IO to the same file; for
> multithreaded IO to different files, JFS (and allegedly 'ext4') are
> also fairly good.
>
> > Well - I mentioned it above. Their current recommendation for
> > Linux is to stick with ext3... and for big file/big IO
> > operations, switch to ext4.
>
> That's largely because those are the file systems that are
> "qualified", and 'ext3' defaults give the lowest risks in case the
> application environment is misdesigned and relies on 'O_PONIES'.
>
> > [ ... ] "well, ext3 has problems whenever the kernel journal
> > thread wakes up to flush under heavy I/O,
>
> That actually happens with every file system, and it is one of
> several naive misdesigns in the Linux IO subsystem. The default
> Linux page cache flusher parameters are often too "loose" by 1-2
> orders of magnitude, and this can cause serious problems. Never
> mind that the page cache
>
>
> In any case the Linux page cache itself is also a bit of a joke, and
> (hopefully) a DBMS will not use it anyhow, but use direct IO, and
> XFS is targeted at direct IO, large file, multistreaming loads.
>
>
Peter, and others:

Thanks for this great discussion. I appreciate the thought that went into
all of the replies.

In the end, we had a sit down discussion with our vendor.  They admitted
that they "support" XFS, but have very few customers using it (said they can
count them on one hand), and when I pressed them on whether it's a technology
limitation, they threw down the gauntlet and said "look, we're giving you
our frank recommendation here. EXT3."  They quoted as having 10+TB databases
running OLTP transactions on XFS, with 4-5GB/sec sustained throughput to the
disk system.  And 20-30TB for data warehouse type operations.  When pressed
about the cache flush issue, they mentioned they use direct IO under ext3,
and it's not an issue in that case.

In doing my research, I searched for references of other Fortune nn-sized
companies who use this DB and use XFS underneath it. I came up empty
handed...  I searched my network for large-ish companies using XFS, and how
they were using it.  I'm not sure if we're bordering on "secret sauce" type
stuff here, but I had an extremely difficult time getting enterprise
references to back up the research I've done.

For our use, we had to opt to follow the vendor recommendation, and it came
down to not wanting to be one of those that they can count on one hand using
XFS with their product.

I'm still confounded by why - when XFS is technically superior in these
cases - it is so obscure.  Are Enterprise Linux guys just not looking this
deep under the covers to uncover performance enhancements like this? Is it
because RedHat didn't include the XFS tools in the distro until recently,
causing XFS not to be an option as part of it? Are other Linux folks "next,
next, finish..." people when it comes to how they install?  I really don't
get it.

Thanks for all the discussion folks. I hope to put forth other use cases as
they surface.

[-- Attachment #1.2: Type: text/html, Size: 6445 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-23 20:59       ` Angelo McComis
@ 2010-10-23 21:01         ` Angelo McComis
  2010-10-24  2:13           ` Stan Hoeppner
  2010-10-24 18:22           ` Michael Monnerie
  0 siblings, 2 replies; 11+ messages in thread
From: Angelo McComis @ 2010-10-23 21:01 UTC (permalink / raw)
  To: Linux XFS


[-- Attachment #1.1: Type: text/plain, Size: 291 bytes --]

Correction:


> They quoted as having 10+TB databases running OLTP transactions on XFS,
> with 4-5GB/sec sustained throughput to the disk system.  And 20-30TB for
> data warehouse type
>

They quoted having 10+TB databases running OLTP on EXT3 with 4-5GB/sec
sustained throughput (not XFS).

[-- Attachment #1.2: Type: text/html, Size: 558 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-23 21:01         ` Angelo McComis
@ 2010-10-24  2:13           ` Stan Hoeppner
  2010-10-24 18:22           ` Michael Monnerie
  1 sibling, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2010-10-24  2:13 UTC (permalink / raw)
  To: xfs

Angelo McComis put forth on 10/23/2010 4:01 PM:
> Correction:
> 
> 
>> They quoted as having 10+TB databases running OLTP transactions on XFS,
>> with 4-5GB/sec sustained throughput to the disk system.  And 20-30TB for
>> data warehouse type
>>
> 
> They quoted having 10+TB databases running OLTP on EXT3 with 4-5GB/sec
> sustained throughput (not XFS).

Given the data rate above, this sounds like they're quoting a shared
nothing data model on a cluster, not a single host setup.  You are
interested in a single host setup, correct?

Did you ask for a contact at their customer's organization so you could
at least attempt to verify these performance claims?  Or did they state
these numbers are from an internal test system?  With no way for you to
verify these claims, they are, literally, worthless, as this vendor is
asking you to take their claims on faith.  Faith is for religion, not a
business transaction.

-- 
Stan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-23 21:01         ` Angelo McComis
  2010-10-24  2:13           ` Stan Hoeppner
@ 2010-10-24 18:22           ` Michael Monnerie
  2010-10-24 23:08             ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Michael Monnerie @ 2010-10-24 18:22 UTC (permalink / raw)
  To: xfs; +Cc: Angelo McComis


[-- Attachment #1.1: Type: Text/Plain, Size: 1724 bytes --]

On Saturday, 23 October 2010, Angelo McComis wrote:
> They quoted having 10+TB databases running OLTP on EXT3 with
> 4-5GB/sec sustained throughput (not XFS).

Which servers and storage are these? This is nothing you can do with
"normal" storage arrays. Using 8Gb/s Fibre Channel gives 1GB/s, if you
can do full speed I/O. So you'd need at least 5 parallel Fibre Channel
arrays running without any overhead. Also, a single server can't do such
high rates, so there must be several front-end servers. That again means
their database must be specially organised for that type of load
(shared nothing or so).

On the other hand, if they have these performance numbers on 100 shared
servers, it only needs 51MB/s of I/O per server to get 5GB/s total
throughput. So that is a number without a lot of meaning, as long as you
don't know which hardware is used.

And: how high would their throughput be when using XFS instead of EXT3? ;-)

One question comes to my mind: if they do direct I/O, would there still
be a lot of difference between XFS and EXT3, performance-wise?

And how many companies run around telling which filesystem they use for 
their performance critical business application? Normally they do this 
only for marketing, so they get paid or special prices if they say "with 
this product we are sooo happy".

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Radio interview on the subject of spam ******
http://www.it-podcast.at/archiv.html#podcast-100716

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-24 18:22           ` Michael Monnerie
@ 2010-10-24 23:08             ` Dave Chinner
  2010-10-25  3:12               ` Stan Hoeppner
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2010-10-24 23:08 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: Angelo McComis, xfs

On Sun, Oct 24, 2010 at 08:22:46PM +0200, Michael Monnerie wrote:
> On Saturday, 23 October 2010, Angelo McComis wrote:
> > They quoted having 10+TB databases running OLTP on EXT3 with
> > 4-5GB/sec sustained throughput (not XFS).
> 
> Which servers and storage are these? This is nothing you can do with 
> "normal" storages. Using 8Gb/s Fibre Channel gives 1GB/s, if you can do 
> full speed I/O. So you'd need at least 5 parallel Fibre Channel storages 
> running without any overhead. Also, a single server can't do that high 
> rates, so there must be several front-end servers. That again means 
> their database must be especially organised for that type of load 
> (shared nothing or so).

Have a look at IBM's TPC-C submission here on RHEL5.2:

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=108081902

That's got 8x 4Gbit FC connections to 40 storage arrays with 1920 disks
behind them. It uses 80x 24-disk raid0 luns, with each lun split
into 12 data partitions on the outer edge of each lun. That gives
960 data partitions for the benchmark.

Now, this result uses raw devices for this specific benchmark, but
it could easily use files in ext3 filesystems. With 960 ext3
filesystems, you could easily max out the 3.2GB/s of IO that sucker
has, as that is <4MB/s per filesystem.

So I'm pretty sure IBM are not quoting a single filesystem
throughput result. While you could get that sort of result from a
single filesystem with XFS, I think it's an order of magnitude
higher than a single ext3 filesystem can achieve....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: XFS use within multi-threaded apps
  2010-10-24 23:08             ` Dave Chinner
@ 2010-10-25  3:12               ` Stan Hoeppner
  0 siblings, 0 replies; 11+ messages in thread
From: Stan Hoeppner @ 2010-10-25  3:12 UTC (permalink / raw)
  To: xfs

Dave Chinner put forth on 10/24/2010 6:08 PM:
> On Sun, Oct 24, 2010 at 08:22:46PM +0200, Michael Monnerie wrote:
>> On Saturday, 23 October 2010, Angelo McComis wrote:
>>> They quoted having 10+TB databases running OLTP on EXT3 with
>>> 4-5GB/sec sustained throughput (not XFS).
>>
>> Which servers and storage are these? This is nothing you can do with 
>> "normal" storages. Using 8Gb/s Fibre Channel gives 1GB/s, if you can do 
>> full speed I/O. So you'd need at least 5 parallel Fibre Channel storages 
>> running without any overhead. Also, a single server can't do that high 
>> rates, so there must be several front-end servers. That again means 
>> their database must be especially organised for that type of load 
>> (shared nothing or so).
> 
> Have a look at IBM's TPC-C submission here on RHEL5.2:
> 
> http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=108081902
> 
> That's got 8x4GB FC connections to 40 storage arrays with 1920 disks
> behind them. It uses 80x 24 disk raid0 luns, with each lun split
> into 12 data partitions on the outer edge of each lun. That gives
> 960 data partitions for the benchmark.

They're reporting 8 _dual port_ 4Gb FC cards, so that's 16 connections.

> Now, this result uses raw devices for this specific benchmark, but
> it could easily use files in ext3 filesystems. With 960 ext3
> filesystems, you could easily max out the 3.2GB/s of IO that sucker
> has as it is <4MB/s per filesystem.

So the max is 6.4GB/s.  The resulting ~6.7MB/s per filesystem would still
be a piece of cake.

Also, would anyone in their right mind have their DB write/read directly
to raw partitions in a production environment?  I'm not a DB expert, but
this seems ill advised, unless the DB is really designed well for this.

> So I'm pretty sure IBM are not quoting a single filesystem
> throughput result. While you could get that sort of result from a
> single filesystem with XFS, I think it's an order of magnitude
> higher than a single ext3 filesystem can achieve....

I figured they were quoting the OP a cluster result, as I mentioned
previously.  Thanks for pointing out that a single 8-way multicore x86
box can yield this kind of performance today--2 million TPC-C.  Actually
this result is two years old.  Wow.  I haven't paid attention to TPC
results for a while.

Nonetheless, it's really interesting to see an 8 socket 48 core x86 box
churning out numbers almost double that of an HP Itanium 64 socket/core
SuperDome from only 3 years prior.  The cost of the 8-way x86 server is
a fraction of the 64-way Itanium, but storage cost usually doesn't budge
much:

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=105112801

Did anyone happen to see that SUN, under the Oracle cloak, has finally
started publishing TPC results again?  IIRC SUN had quit publishing
results many many years ago because their E25K with 72 UltraSparcs
couldn't even keep up with an IBM Power box with 16 sockets.  The
current Oracle result for its USparc T2 12 node cluster is pretty
impressive, from a total score at least.  The efficiency is pretty low,
given the 384 core count, and considering the result is only 3.5x that
of the 48 core Xeon IBM xSeries:

http://www.tpc.org/tpcc/results/tpcc_result_detail.asp?id=109110401

-- 
Stan


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-10-25  3:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-18 13:42 XFS use within multi-threaded apps Angelo McComis
2010-10-19  1:12 ` Dave Chinner
2010-10-19  4:24 ` Stewart Smith
2010-10-20 12:00   ` Angelo McComis
2010-10-23 19:56     ` Peter Grandi
2010-10-23 20:59       ` Angelo McComis
2010-10-23 21:01         ` Angelo McComis
2010-10-24  2:13           ` Stan Hoeppner
2010-10-24 18:22           ` Michael Monnerie
2010-10-24 23:08             ` Dave Chinner
2010-10-25  3:12               ` Stan Hoeppner
