From: tytso@mit.edu
To: Justin Piszcz <jpiszcz@lucidpixels.com>
Cc: Eric Sandeen <sandeen@redhat.com>,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
Alan Piszcz <ap@solarrain.com>
Subject: Re: EXT4 is ~2X as slow as XFS (593MB/s vs 304MB/s) for writes?
Date: Sun, 28 Feb 2010 00:42:40 -0500
Message-ID: <20100228054240.GE14646@thunk.org>
In-Reply-To: <alpine.DEB.2.00.1002270634200.17433@p34.internal.lan>
On Sat, Feb 27, 2010 at 06:36:37AM -0500, Justin Piszcz wrote:
>
> I still would like to know however, why 350MiB/s seems to be the maximum
> performance I can get from two different md raids (that easily do 600MiB/s
> with XFS).
Can you run "filefrag -v <filename>" on the large file you created
using dd? Part of the problem may be the block allocator simply not
being well optimized for super large writes. To be honest, that's not
something we've tried (at all) to optimize, mainly because most
users of ext4 are more interested in much more reasonably sized
files, and we only have so many hours in a day to hack on ext4. :-)
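As a rough illustration of what to look for in the filefrag output, here is a minimal sketch that counts how often one extent fails to physically follow the previous one. It assumes the extents have already been pulled out of "filefrag -v" by hand or by script into (logical_start, physical_start, length) tuples measured in filesystem blocks; the function name and the sample numbers are hypothetical and are not real measurements.

```python
def count_discontiguities(extents):
    """Count places where an extent does not physically follow its predecessor.

    extents: list of (logical_start, physical_start, length) tuples in
    filesystem blocks, sorted by logical_start. (Assumed input shape; the
    exact filefrag output format differs between e2fsprogs versions.)
    """
    breaks = 0
    for prev, cur in zip(extents, extents[1:]):
        prev_physical_end = prev[1] + prev[2]
        if cur[1] != prev_physical_end:
            breaks += 1
    return breaks


# Hypothetical layout: the second extent continues the first on disk,
# the third one starts somewhere else entirely.
sample = [(0, 1000, 32768), (32768, 33768, 32768), (65536, 500000, 32768)]
print(count_discontiguities(sample))
```

A large file with only a handful of breaks is laid out well; a count that grows roughly with file size would point at the allocator.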
XFS in contrast has in the past had plenty of paying customers
interested in writing really large scientific data sets, so this is
something Irix *has* spent time optimizing.
As far as I know none of the ext4 developers' day jobs are currently
focused on really large files using ext4. Some of us do use ext4 to
support really large files, but it's via some kind of cluster or
parallel file system layered on top of ext4 (i.e., Sun/Clusterfs
Lustre File Systems, or Google's GFS) --- and so what gets actually
stored in ext4 isn't a single 10-20 gigabyte file.
I'm saying this not as an excuse, but as an explanation for why no
one has really noticed this performance problem until you brought it
up. I'd like to see ext4 be a good general purpose file system, which
includes handling the really big files stored in a single system. But
it's just not something we've tried optimizing yet.
So if you can gather some data, such as the filefrag information, that
would be a great first step. Something else that would be useful is
gathering blktrace information, so we can see how we are scheduling
the writes and whether we have something bad going on there. I
wouldn't be surprised if there is some stupidity going on in the
generic FS/MM writeback code which is throttling us, and which XFS has
worked around. Ext4 has worked around some writeback brain-damage
already, but I've been focused on much smaller files (files in the
tens or hundreds of megabytes) since that's what I tend to use much more
frequently.
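To make the blktrace suggestion a bit more concrete, here is a minimal sketch of the kind of summary that would be useful: how big the write requests are on average, and how many of them directly follow the previous one on disk. It assumes the write events have already been extracted from the blkparse output into (start_sector, sector_count) pairs in issue order, with 512-byte sectors; the function name and the sample values are hypothetical.

```python
SECTOR_BYTES = 512  # assumption: offsets and lengths are in 512-byte sectors


def summarize_writes(requests):
    """Summarize write requests given as (start_sector, sector_count) pairs.

    Returns the request count, the mean request size in KiB, and how many
    requests begin exactly where the previous request ended.
    """
    if not requests:
        return {"count": 0, "mean_kib": 0.0, "sequential": 0}
    total_sectors = sum(n for _, n in requests)
    sequential = sum(
        1
        for (s0, n0), (s1, _) in zip(requests, requests[1:])
        if s1 == s0 + n0
    )
    return {
        "count": len(requests),
        "mean_kib": total_sectors * SECTOR_BYTES / 1024 / len(requests),
        "sequential": sequential,
    }


# Hypothetical trace: two back-to-back 512 KiB writes, then a small stray one.
trace = [(0, 1024), (1024, 1024), (4096, 8)]
print(summarize_writes(trace))
```

Small mean request sizes or few sequential requests during a streaming dd would suggest the writeback path is chopping up the I/O before it reaches the md device.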
It's great to see that you're really interested in this; if you're
willing to do some investigative work, hopefully it's something we can
address.
Best Regards,
- Ted
P.S. I'm a bit unclear regarding your comment about "-o nodelalloc"
in one of your earlier threads. Does using nodelalloc actually speed
things up? There were a bunch of numbers being thrown around, and in
some configurations I thought you were getting around 300 MB/s without
using nodelalloc? Or am I misunderstanding your numbers and what
configurations you used with each test run?
If nodelalloc is actually speeding things up, then we almost certainly
have some kind of writeback problem. So filefrag and blktrace are
definitely the tools we need to look at to understand what is going
on.