From: Theodore Tso <tytso@mit.edu>
To: Xu CanHao <xucanhao@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Ext3 vs NTFS performance
Date: Sat, 5 May 2007 09:45:05 -0400
Message-ID: <20070505134504.GA21049@thunk.org>
In-Reply-To: <6ec7a4340705042013r4a78a705s43f07da97ec43569@mail.gmail.com>
On Sat, May 05, 2007 at 11:13:36AM +0800, Xu CanHao wrote:
> On 5 May, 10:20, Theodore Tso <t...@mit.edu> wrote:
> >
> >This is being worked on already. XFS has a per-filesystem ioctl, but
> >we want to create a filesystem-independent system call,
> >sys_fallocate(), that would be wired into the already existing
> >posix_fallocate() function exported by glibc.
>
> The story told us: an application must look to the file-systems, ext3
> is good at aaa, is not good at bbb; XFS is good at ccc, is not good at
> ddd; reiserfs is good at eee, is not good at fff........
>
> For this scenario, XFS is good at dealing with fragmentation while ext3 not.

That's true. XFS has the ability to do delayed allocations, so that
the blocks don't get allocated until they are written out. Hence, a
workload that issues random access writes in strides of 128k, and
then goes back to fill in the gaps, will result in fragmentation
given ext3's current block reservation allocation algorithm --- but,
as long as the system isn't under high memory pressure, XFS will do
better in this particular scenario.
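
To make the access pattern concrete, here is a minimal sketch in C of
the order in which such a workload would touch the file. The helper
name stride_write_order() and the chunk size are hypothetical, chosen
only for illustration; the real client's write sizes are not given in
this message.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: fill `out` with the file offset of each write,
 * in the order the workload issues them. Pass 1 touches the first
 * chunk of every stride; pass 2 goes back and fills in the rest.
 * Returns the number of offsets stored in `out`. */
size_t stride_write_order(off_t file_size, off_t stride, off_t chunk,
                          off_t *out, size_t out_len)
{
    assert(chunk > 0 && stride >= chunk && stride % chunk == 0);
    size_t n = 0;

    /* Pass 1: one chunk at the start of each stride. */
    for (off_t off = 0; off < file_size && n < out_len; off += stride)
        out[n++] = off;

    /* Pass 2: every chunk that pass 1 skipped. */
    for (off_t off = 0; off < file_size && n < out_len; off += chunk)
        if (off % stride != 0)
            out[n++] = off;

    return n;
}
```

A filesystem that allocates blocks at write time sees the pass 1
offsets first, so the blocks for the gaps end up allocated later and
possibly elsewhere on disk.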

Actually, ext3 does have a block reservation system, which will
prevent this scenario if the random access writes are within a range
of 32k or so --- which is enough to protect against the bad effects of
more common random access write patterns, such as those used when
writing out ELF object files, for example. Increasing
EXT3_DEFAULT_RESERVE_BLOCKS by a factor of 4 would adapt the ext3
block reservation system to this pathological workload, and we could
easily add a tunable mount option to change the reservation size used
by ext3. Unfortunately, this could make fragmentation worse for other
workloads. So adding delayed allocation to ext4 is a better solution.
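
The arithmetic behind the factor of 4 can be sketched as follows. Both
constants are assumptions and not taken from this message: a 4 KiB
block size, and a default reservation window of 8 blocks, which
together give the 32k range mentioned above. The names are
illustrative, not kernel identifiers.

```c
#include <assert.h>

/* Assumed values, for illustration only: 4 KiB filesystem blocks and
 * a default reservation window of 8 blocks. */
enum { BLOCK_SIZE_BYTES = 4096, DEFAULT_RESERVE_BLOCKS = 8 };

/* Bytes covered by a reservation window of `blocks` blocks. */
long reserve_window_bytes(long blocks)
{
    assert(blocks > 0);
    return blocks * BLOCK_SIZE_BYTES;
}
```

Under those assumptions, reserve_window_bytes(DEFAULT_RESERVE_BLOCKS)
is 32768 and reserve_window_bytes(4 * DEFAULT_RESERVE_BLOCKS) is
131072, i.e. one full 128k stride.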

But as has already been discussed on this thread, in situations where
the fileserver is under high memory pressure, any filesystem (XFS or
ext4) would still end up allocating blocks out of order, resulting in
fragmentation. Explicit preallocation, as opposed to delayed
allocation, is really the best long-term solution; and in order to do
that, Samba needs to detect this scenario and translate it into an
explicit preallocation request. As has been noted, there appears to
be no good reason for the Windows CIFS client (or any other
application) to be doing this, other than perhaps to deliberately
trigger a worst case allocation pattern in ext3.
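
For reference, an explicit preallocation request from userspace could
look roughly like the sketch below, using the existing
posix_fallocate() interface from glibc. The wrapper name
preallocate_file() is hypothetical, and this is not how Samba does or
would do it; it only shows the call an application would make once it
knows the final file size.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: reserve `len` bytes starting at offset 0
 * before any data is written. Returns 0 on success or an errno value
 * on failure (posix_fallocate reports errors through its return
 * value rather than through errno). */
int preallocate_file(const char *path, off_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    int err = posix_fallocate(fd, 0, len);
    close(fd);
    return err;
}
```

After a successful call the file is at least `len` bytes long, and
later writes within that range should not fail for lack of space.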
Regards,
- Ted