From: Fredrick <fjohnber@zoho.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ric Wheeler <rwheeler@redhat.com>,
linux-ext4@vger.kernel.org, Andreas Dilger <adilger@dilger.ca>,
wenqing.lz@taobao.com
Subject: Re: ext4_fallocate
Date: Tue, 26 Jun 2012 11:06:02 -0700
Message-ID: <4FE9FA0A.8010708@zoho.com>
In-Reply-To: <20120626173050.GA6745@thunk.org>
On 06/26/2012 10:30 AM, Theodore Ts'o wrote:
> On Tue, Jun 26, 2012 at 09:13:35AM -0400, Ric Wheeler wrote:
>>
>> Has anyone made progress digging into the performance impact of
>> running without this patch? We should definitely see if there is
>> some low hanging fruit there, especially given that XFS does not
>> seem to suffer such a huge hit.
>
> I just haven't had time, sorry. It's so much easier to run with the
> patch. :-)
>
> Part of the problem is certainly caused by the fact that ext4 is using
> physical block journaling instead of logical journaling. But we see
> the problem in no-journal mode as well. I think part of the problem
> is simply that many of the workloads where people are doing this, they
> also care about robustness after power failures, and if you are doing
> random writes into uninitialized space, with fsyncs in-between, you
> are basically guaranteed a 2x expansion in the number of writes you
> need to do to the system.
>
Our workload is the same as the one described above. Our programs write
a chunk and then call fsync for robustness. This happens repeatedly
on the file as the program pushes more data to disk.
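
For concreteness, here is a minimal sketch of that access pattern: preallocate
the file, then write a chunk at an arbitrary offset and fsync after each write.
It is an illustration only, not our actual program. The names CHUNK, FILE_SIZE,
preallocate and write_chunks are hypothetical, and the fallback to ftruncate is
an assumption added so the sketch also runs where preallocation is unsupported.

```python
import os

CHUNK = 64 * 1024        # hypothetical chunk size in bytes
FILE_SIZE = 16 * CHUNK   # hypothetical preallocated file size


def preallocate(fd, size):
    """Reserve `size` bytes for fd (on ext4 this yields uninitialized extents)."""
    try:
        os.posix_fallocate(fd, 0, size)
    except OSError:
        # Assumption: fall back to a sparse file if preallocation is unsupported.
        os.ftruncate(fd, size)


def write_chunks(path, offsets, chunk=CHUNK, size=FILE_SIZE):
    """Write one chunk at each offset and fsync after every chunk.

    Returns the file size reported by fstat once all writes are done.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        preallocate(fd, size)
        payload = b"\xab" * chunk
        for off in offsets:
            written = 0
            # pwrite may write fewer bytes than requested, so loop until done.
            while written < len(payload):
                written += os.pwrite(fd, payload[written:], off + written)
            # Force data and metadata to stable storage before the next chunk.
            os.fsync(fd)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)
```

Calling write_chunks(path, [3 * CHUNK, 0, 7 * CHUNK]) writes three chunks out
of order into the preallocated range, with one fsync per chunk. Each of those
fsyncs is the point where the extra metadata write discussed above would land.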
> One other thing which we *have* seen is that we need to do a better
> job with extent merging; if you run without this patch, and you run
> with fio in AIO mode where you are doing tons and tons of random
> writes into uninitialized space, you can end up fragmenting the extent
> tree very badly. So fixing this would certainly help.
>
>> Opening this security exposure is still something that is clearly a
>> hack and best avoided if we can fix the root cause :)
>
> See Linus's recent rant about how security arguments made by
> theoreticians very often end up getting trumped by practical matters.
> If you are running a daemon, whether it is a user-mode cluster file
> system, or a database server, where it is (a) fundamentally trusted,
> and (b) doing its own user-space checksumming and its own guarantees to
> never return uninitialized data, even if we fix all potential
> problems, we *still* can be reducing the number of random writes ---
> and on a fully loaded system, we're guaranteed to be seek-constrained,
> so each random write to update fs metadata means that you're burning
> 0.5% of your 200 seeks/second on your 3TB disk (where previously you
> had half a dozen 500gig disks each with 200 seeks/second).
>
I can see the performance degradation on SSDs too, though the percentage
is smaller than on SATA disks.
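
The seek arithmetic quoted above can be worked through in a few lines. This is
only a sketch of my reading of those numbers: one extra seek out of 200 per
second is 0.5%, and six 500 GB disks gave 1200 seeks/second for the same 3 TB
that a single disk now serves with 200. The function names seek_share and
seeks_per_tb are hypothetical.

```python
def seek_share(seeks_per_second, extra_seeks=1):
    """Percentage of the per-second seek budget consumed by `extra_seeks`."""
    return 100.0 * extra_seeks / seeks_per_second


def seeks_per_tb(num_disks, disk_tb, seeks_per_second):
    """Aggregate seeks/second available per terabyte of capacity."""
    total_seeks = num_disks * seeks_per_second
    total_tb = num_disks * disk_tb
    return total_seeks / total_tb


# One metadata seek against a budget of 200 seeks/second.
share = seek_share(200)

# Six 500 GB disks versus one 3 TB disk, each doing 200 seeks/second.
old_density = seeks_per_tb(6, 0.5, 200)
new_density = seeks_per_tb(1, 3.0, 200)
```

Under these assumptions the seek budget per terabyte drops by a factor of six
when the six small disks are replaced by one large one, which is why each
extra metadata write costs more on the bigger disk.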
> I agree with you that it would be nice to look into this further, and
> optimizing our extent merging is definitely on the hot list of
> performance improvements to look at. But people who are using ext4 as
> back-end database servers or cluster file system servers and who are
> interested in wringing out every last percentage of performance are
> going to be interested in this technique, no matter what we do. If
> you have Sagans and Sagans of servers all over the world, even a tenth
> of a percentage point performance improvement can easily translate
> into big dollars.
>
We are in the same boat. :)
> - Ted
>
-Fredrick