linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Eric Sandeen <sandeen@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>, Ric Wheeler <rwheeler@redhat.com>,
	Zheng Liu <gnehzuil.liu@gmail.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-ext4@vger.kernel.org, Zheng Liu <wenqing.lz@taobao.com>
Subject: Re: [RFC][PATCH 0/3] add FALLOC_FL_NO_HIDE_STALE flag in fallocate
Date: Wed, 18 Apr 2012 13:02:08 +1000
Message-ID: <20120418030208.GO6734@dastard>
In-Reply-To: <4F8DBC20.5010401@redhat.com>

On Tue, Apr 17, 2012 at 01:53:20PM -0500, Eric Sandeen wrote:
> On 4/17/12 1:43 PM, Ted Ts'o wrote:
> > On Tue, Apr 17, 2012 at 01:59:37PM -0400, Ric Wheeler wrote:
> >>
> >> You could get both security and avoid the run time hit by fully
> >> writing the file or by having a variation that relied on "discard"
> >> (i.e., no need to zero data if we can discard or track it as
> >> unwritten).
> > 
> > It's certainly the case that if the device supports persistent
> > discard, something which we definitely *should* do is to send the
> > discard at fallocate time and then mark the space as initialized.
> > 
> > Unfortunately, not all devices support persistent discard; in
> > particular, no HDDs that I am aware of do.  And writing all zeros to
> > the file is in fact what a number of programs that I am aware of
> > (including an enterprise database) are doing, precisely because they
> > tend to write into the fallocated space in a somewhat random order,
> > and the extent conversion cost is in fact quite significant.  But
> > writing all zeros to the file before you can use it is quite costly;
> > at the very least it burns disk bandwidth.  One of the main
> > motivations of fallocate was to avoid needing to do a "write all
> > zeros" pass, and while it does solve the problem for some use cases
> > (such as DVRs), it's not a complete solution.
> 
> Can we please start with profiling the workload causing trouble, see why
> ext4 takes such a hit, and see if anything can be done there to fix
> it surgically, rather than just throwing this big hammer at it?
> 
> In my (admittedly quick, hacky) test, xfs suffered about a 1% perf
> degradation, ext4 about 8%.  Until we at least know why ext4 is so much
> worse, I'll signal a strong NAK for this change, for whatever that may
> or may not be worth.  :)

In actual fact, on my 12 disk RAID0 array, XFS is faster with
unwritten extents *enabled* than when hacked to turn them off. Yes,
you can turn off unwritten extent tracking in XFS if you know what
you are doing; we just don't provide any interface for users to do
so because of all the security problems it entails.

The results (using a 256MB preallocated file and 2000 sparse 4k block
writes, one run with O_SYNC, the other done async with a post-write
sync), averaged over 5 runs, are:

             O_SYNC     post-sync
unwritten    7.297s     5.734s
stale        7.641s     6.108s

These results are consistently repeatable, and only reinforce the
point that if ext4 is slow using unwritten extent tracking, then
it's an implementation problem and not an excuse to add an interface
to expose stale data....
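
For anyone wanting to reproduce the "unwritten" row on their own
hardware, here is a rough sketch of that kind of test. It is not the
program used for the numbers above. The function name, the random
choice of block offsets (one reading of "sparse"), and the fill byte
are all assumptions made for illustration. The "stale" row cannot be
reproduced this way, since it needs a filesystem hacked to skip
unwritten extent tracking.

```python
import os
import random
import time


def timed_sparse_writes(path, file_size, n_writes, block_size=4096,
                        use_osync=False, seed=0):
    """Preallocate file_size bytes at path, then time n_writes
    block-sized writes at distinct block-aligned offsets.

    Hypothetical helper: returns elapsed wall-clock seconds for the
    write phase (including the trailing fsync in the async case).
    """
    flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC
    if use_osync:
        flags |= os.O_SYNC          # every write is synchronous
    fd = os.open(path, flags, 0o600)
    try:
        # Reserve the space up front. On ext4/XFS this normally yields
        # unwritten extents; other filesystems may behave differently.
        os.posix_fallocate(fd, 0, file_size)

        # Assumption: "sparse" means scattered, non-repeating blocks.
        rng = random.Random(seed)
        n_blocks = file_size // block_size
        blocks = rng.sample(range(n_blocks), n_writes)

        buf = b"\xa5" * block_size
        start = time.perf_counter()
        for blk in blocks:
            os.pwrite(fd, buf, blk * block_size)
        if not use_osync:
            os.fsync(fd)            # the "post-write sync" variant
        return time.perf_counter() - start
    finally:
        os.close(fd)
```

Calling timed_sparse_writes(path, 256 << 20, 2000, use_osync=True)
and then again with use_osync=False, five times each, and averaging
the returned values would approximate the two columns. Absolute
timings will depend heavily on the storage underneath.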

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
