From: Fredrick <fjohnber@zoho.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Ric Wheeler <rwheeler@redhat.com>,
	linux-ext4@vger.kernel.org, Andreas Dilger <adilger@dilger.ca>,
	wenqing.lz@taobao.com
Subject: Re: ext4_fallocate
Date: Tue, 26 Jun 2012 11:06:02 -0700
Message-ID: <4FE9FA0A.8010708@zoho.com>
In-Reply-To: <20120626173050.GA6745@thunk.org>

On 06/26/2012 10:30 AM, Theodore Ts'o wrote:
> On Tue, Jun 26, 2012 at 09:13:35AM -0400, Ric Wheeler wrote:
>>
>> Has anyone made progress digging into the performance impact of
>> running without this patch? We should definitely see if there is
>> some low hanging fruit there, especially given that XFS does not
>> seem to suffer such a huge hit.
>
> I just haven't had time, sorry.  It's so much easier to run with the
> patch.  :-)
>
> Part of the problem is certainly caused by the fact that ext4 is using
> physical block journaling instead of logical journaling.  But we see
> the problem in no-journal mode as well.  I think part of the problem
> is simply that in many of the workloads where people are doing this,
> they also care about robustness after power failures, and if you are
> doing random writes into uninitialized space, with fsyncs in between,
> you are basically guaranteed a 2x expansion in the number of writes
> you need to do to the system.
>

Our workload is the same as the one described above. Our programs write
a chunk and then call fsync for robustness. This happens repeatedly on
the file as the program pushes more data to the disk.
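
A minimal sketch of that pattern, for illustration only and not the
actual program: the function name, chunk size, and offsets are
hypothetical, and Python stands in for whatever language the real
writer uses. It preallocates the file, then writes chunks at arbitrary
offsets with an fsync after each one. On ext4 the preallocated range is
marked uninitialized, so each such write also forces a metadata update,
which is the extra write Ted describes.

```python
import os

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for the example


def preallocate_and_write(path, total_size, offsets, chunk=b"x" * CHUNK_SIZE):
    """Preallocate total_size bytes, then write one chunk per offset,
    calling fsync after every write. Returns the final file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the space up front; no data blocks are written here.
        os.posix_fallocate(fd, 0, total_size)
        for off in offsets:
            # Write into the preallocated range at the given offset.
            os.pwrite(fd, chunk, off)
            # Force data and metadata to stable storage before continuing.
            os.fsync(fd)
    finally:
        os.close(fd)
    return os.stat(path).st_size
```

Calling it with offsets in non-sequential order, for example
preallocate_and_write("/tmp/test.dat", 1 << 20, [8192, 0, 409600]),
approximates the random-write case; os.posix_fallocate is available on
Unix only.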


> One other thing which we *have* seen is that we need to do a better
> job with extent merging; if you run without this patch, and you run
> with fio in AIO mode where you are doing tons and tons of random
> writes into uninitialized space, you can end up fragmenting the extent
> tree very badly.   So fixing this would certainly help.
>
>> Opening this security exposure is still something that is clearly a
>> hack and best avoided if we can fix the root cause :)
>
> See Linus's recent rant about how security arguments made by
> theoreticians very often end up getting trumped by practical matters.
> If you are running a daemon, whether it is a user-mode cluster file
> system, or a database server, where it is (a) fundamentally trusted,
> and (b) doing its own user-space checksumming and its own guarantees to
> never return uninitialized data, even if we fix all potential
> problems, we *still* can be reducing the number of random writes ---
> and on a fully loaded system, we're guaranteed to be seek-constrained,
> so each random write to update fs metadata means that you're burning
> 0.5% of your 200 seeks/second on your 3TB disk (where previously you
> had half a dozen 500gig disks each with 200 seeks/second).
>

I can see the performance degradation on SSDs too, though the percentage
is smaller than on SATA.
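
The seek arithmetic in Ted's paragraph above can be checked with a
short sketch. The 200 seeks/second figure and the disk sizes come from
his message; the function names are made up for the example.

```python
SEEKS_PER_SEC = 200  # per-disk figure quoted in Ted's message


def seek_share(extra_seeks, seeks_per_sec=SEEKS_PER_SEC):
    """Fraction of one disk's per-second seek budget used by extra_seeks."""
    return extra_seeks / seeks_per_sec


def seeks_per_tb(num_disks, tb_per_disk, seeks_per_sec=SEEKS_PER_SEC):
    """Aggregate seeks/second available per terabyte of capacity."""
    return (num_disks * seeks_per_sec) / (num_disks * tb_per_disk)
```

seek_share(1) is 0.005, i.e. the 0.5% per metadata write. Comparing
seeks_per_tb(6, 0.5) with seeks_per_tb(1, 3) shows that one 3TB disk
has one sixth of the seeks per terabyte that six 500GB disks had.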

> I agree with you that it would be nice to look into this further, and
> optimizing our extent merging is definitely on the hot list of
> performance improvements to look at.  But people who are using ext4 as
> back-end database servers or cluster file system servers and who are
> interested in wringing out every last percentage of performance are
> going to be interested in this technique, no matter what we do.  If
> you have Sagans and Sagans of servers all over the world, even a tenth
> of a percentage point performance improvement can easily translate
> into big dollars.
>

We are in the same boat. :)

> 						- Ted
>

-Fredrick

