linux-ext4.vger.kernel.org archive mirror
From: Zheng Liu <gnehzuil.liu@gmail.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH 0/5 v2] add extent status tree caching
Date: Fri, 19 Jul 2013 09:03:15 +0800
Message-ID: <20130719010315.GB21615@gmail.com>
In-Reply-To: <20130719000738.GD17938@thunk.org>

On Thu, Jul 18, 2013 at 08:07:38PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 19, 2013 at 07:54:51AM +0800, Zheng Liu wrote:
> > 
> > I have talked with my colleague who is a MySQL contributor about whether
> > MySQL tries to preallocate some files or not.  As far as I know, at
> > least MySQL doesn't try to do it until now.  I don't have the source
> > code of Oracle or DB2, these giant databases might use preallocation I
> > guess.
> 
> Oracle and DB2 don't use preallocation, because they don't want the
> metadata update overhead.  So software packages that are critically
> worried about 99th percentile latency will generally either pre-zero
> the file ahead of time, so all of the extents are written, or use the
> out-of-tree nohidestale patch and mark all of the extents as written.
> (If you are doing A/B benchmark comparisons, using nohidestale means
> the setup overhead for each benchmark run can be measured in minutes
> instead of hours...)
> 
> On at least one of the enterprise databases which I'm familiar with,
> they don't pre-zero the entire database file, but they'll do it in
> chunks of N megabytes.  That means they don't have the huge time lag
> when the database is initially created, but then every so often
> the database will suddenly use most of the disk bandwidth to zero the
> next chunk of 16 or 32 or 64 megabytes.  (This tends to do a real
> number on your 99.9 percentile latency numbers, if you care about such
> things....)

Thanks for correcting me. :-)  Yes, MySQL behaves in a similar way.  The
difference is that MySQL doesn't try to zero any chunks directly.  It
just writes out dirty pages from its own buffer pool (MySQL manages this
pool by itself, and its units are also called pages), roughly 16 or 32
megabytes at a time, if I understand correctly.  So, in general, keeping
the ext4 file system's metadata in memory is always a win, at least for
database applications.

                                                - Zheng
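
The chunked pre-zeroing described in the quoted text above can be
sketched roughly as follows.  This is only an illustration under stated
assumptions, not how any particular database implements it: the function
name prezero_in_chunks, its parameters, and the 16 MiB default chunk size
are all hypothetical.  It writes real zeros (rather than preallocating)
so that the extents end up written, and syncs after each chunk.

```python
import os

def prezero_in_chunks(path, total_bytes, chunk_bytes=16 * 1024 * 1024):
    """Grow `path` to `total_bytes` by writing real zeros, chunk by chunk.

    Hypothetical helper for illustration only. Writing zeros (instead of
    preallocating) means the extents are already marked written when the
    application later overwrites them.
    """
    zeros = bytes(chunk_bytes)  # one reusable buffer of zero bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # Resume from the current end of file, so a later call can extend
        # the file by "the next chunk" instead of starting over.
        offset = os.fstat(fd).st_size
        while offset < total_bytes:
            n = min(chunk_bytes, total_bytes - offset)
            written = os.pwrite(fd, zeros[:n], offset)
            # Force the zeros to disk; this burst of I/O is the source of
            # the periodic latency spike mentioned in the quoted text.
            os.fsync(fd)
            offset += written
    finally:
        os.close(fd)
    return offset
```

Calling it once with the full file size corresponds to pre-zeroing
everything up front; calling it repeatedly with a growing total_bytes
corresponds to zeroing the next chunk on demand.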

Thread overview: 32+ messages
2013-07-16 15:17 [PATCH 0/5 v2] add extent status tree caching Theodore Ts'o
2013-07-16 15:17 ` [PATCH 1/5] ext4: refactor code to read the extent tree block Theodore Ts'o
2013-07-16 15:18 ` [PATCH 2/5] ext4: print the block number of invalid extent tree blocks Theodore Ts'o
2013-07-18  0:56   ` Zheng Liu
2013-07-16 15:18 ` [PATCH 3/5] ext4: use unsigned int for es_status values Theodore Ts'o
2013-07-16 15:18 ` [PATCH 4/5] ext4: cache all of an extent tree's leaf block upon reading Theodore Ts'o
2013-07-16 15:18 ` [PATCH 5/5] ext4: add new ioctl EXT4_IOC_PRECACHE_EXTENTS Theodore Ts'o
2013-07-18  1:19   ` Zheng Liu
2013-07-18  2:50     ` Theodore Ts'o
2013-07-18 13:06       ` Zheng Liu
2013-07-18 15:21         ` Theodore Ts'o
2013-07-18 18:35 ` [PATCH 0/5 v2] add extent status tree caching Eric Sandeen
2013-07-18 18:53   ` Theodore Ts'o
2013-07-19  0:56     ` Eric Sandeen
2013-07-19  2:59       ` Theodore Ts'o
2013-07-19  3:33         ` Dave Chinner
2013-07-19 14:22           ` Jeff Moyer
2013-07-19 16:19           ` Theodore Ts'o
2013-07-22  1:38             ` Dave Chinner
2013-07-22  2:17               ` Zheng Liu
2013-07-22 10:02                 ` Dave Chinner
2013-07-22 12:57                   ` Zheng Liu
2013-07-30  3:08                     ` Dave Chinner
2013-08-04  1:27                       ` Theodore Ts'o
2013-08-13  3:10                         ` Dave Chinner
2013-08-13  3:21                           ` Eric Sandeen
2013-08-13 13:04                             ` Theodore Ts'o
2013-08-16  3:21                               ` Dave Chinner
2013-08-16 14:39                                 ` Theodore Ts'o
2013-07-18 23:54   ` Zheng Liu
2013-07-19  0:07     ` Theodore Ts'o
2013-07-19  1:03       ` Zheng Liu [this message]
