From: Ted Ts'o <tytso@mit.edu>
To: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: bigalloc performance stats (was Re: [PATCH 00/23] New spin of the bigalloc patches)
Date: Fri, 8 Jul 2011 19:02:00 -0400
Message-ID: <20110708230200.GJ3331@thunk.org>
In-Reply-To: <1309970166-11770-1-git-send-email-tytso@mit.edu>

I have some initial benchmark figures that may help provide some
insight into why I am especially interested in getting bigalloc into
ext4.

The following statistics were collected on a Google file server.  As
Michael Rubin mentioned in his talks at LinuxCon this year, and at
the Kernel Summit two years ago, one of the things that we do on our
servers is to really pack a large number of jobs onto a single
machine, for cost and power efficiency.

As a result, we generally don't have machines which are *only* a file
server; that would leave wasted memory and CPU on the table.  I
believe the same will be true for people who are implementing cloud
computing using virtualization; the whole point is to do things
efficiently, which means a large number of guest OS's will be packed
onto a single physical machine, so memory and disk bandwidth will
often be at a premium.  This is the environment in which these
figures were captured.

I compared a stock ext4 file system against ext4 file systems using
bigalloc with 64k, 256k, and 1M clusters.  First, let's look at the
average time needed to execute the fallocate system call and the
inode truncation portion of the ftruncate and unlink system calls
(this data was gathered using tracepoints, so the overhead of
syscall entry and exit is not included in these numbers):

                 ext4                 64k                  256k                  1M
            time    meta max |   time    meta max |   time    meta max |   time    meta max
fallocate 14,262  1.1494  11 |    895  0.0417   2 |    318  0.0084   2 |    122 0.00077   1
truncate  12,944  0.8256  27 |   6911  0.4877   3 |   4541  0.2822   3 |   4558  0.2744   3

The time column is in microseconds (i.e., on this server, using stock
ext4, fallocate was taking 14.2 milliseconds on average); the "meta"
column indicates the average number of metadata reads that were
necessary to complete the operation, and the "max" column indicates
the maximum number of metadata reads needed to complete the
operation.

Note that the average time to execute the fallocate() system call
went down by over two orders of magnitude comparing stock ext4
against bigalloc with a 1M cluster size, using the same workload
(from 14.2 ms to 122 usec).  And even the 64k and 256k cluster sizes
did quite well (factors of 16 and 45, respectively) compared to
stock ext4.
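
For anyone who wants to reproduce a rough version of this
measurement, the operation under test is nothing more exotic than a
single fallocate(2) call.  A minimal userspace harness might look
like the sketch below; the 128MB preallocation size is made up for
illustration, and unlike the tracepoint numbers above, a harness
like this does include the syscall entry/exit overhead:

/* Rough sketch of timing one fallocate(2) call from userspace.
 * The 128MB size is arbitrary; timings here include syscall
 * entry/exit overhead, unlike the tracepoint numbers above. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct timespec start, end;
	long usec;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		exit(1);
	}
	fd = open(argv[1], O_CREAT | O_RDWR, 0644);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	clock_gettime(CLOCK_MONOTONIC, &start);
	if (fallocate(fd, 0, 0, 128 * 1024 * 1024) < 0) {
		perror("fallocate");
		exit(1);
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	usec = (end.tv_sec - start.tv_sec) * 1000000 +
	       (end.tv_nsec - start.tv_nsec) / 1000;
	printf("fallocate: %ld usec\n", usec);
	close(fd);
	return 0;
}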

Also of interest was the percentage of direct I/O reads and writes
that took over 100ms:

                      ext4    64k     256k     1M
DIO reads > 100ms:   0.498%  0.228%  0.257%  0.269%
DIO writes > 100ms:  0.202%  0.134%  0.109%  0.0582%

Since we don't need to read or write the block allocation bitmaps
when we do our DIO (since we fallocate the files in advance), this
improvement must be largely due to reduced fragmentation of the
files (we let the workload run for a couple of days on a set of
disks so we could get something closer to "steady state" as opposed
to "freshly formatted" results).  The reason the DIO reads improve
so much more is the need to read in the extent tree blocks; in the
write case those blocks would tend to be in memory already most of
the time, since the inode would have been freshly fallocated while
the DIO write was going on.
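
For reference, the DIO reads being measured here are ordinary
O_DIRECT reads with suitably aligned buffers, along the lines of the
sketch below; the 512-byte alignment and 1MB read size are
illustrative, not what the production workload actually uses:

/* Sketch of a single direct I/O read.  With O_DIRECT the buffer,
 * file offset, and length must all be suitably aligned; 512 bytes
 * is used here for illustration (the right value depends on the
 * underlying device). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	void *buf;
	ssize_t ret;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		exit(1);
	}
	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	if (posix_memalign(&buf, 512, 1024 * 1024)) {
		fprintf(stderr, "posix_memalign failed\n");
		exit(1);
	}
	ret = read(fd, buf, 1024 * 1024);	/* one 1MB direct read */
	if (ret < 0)
		perror("read");
	else
		printf("read %zd bytes\n", ret);
	close(fd);
	free(buf);
	return 0;
}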

These are only initial results, but they were gathered on a
production workload, and I hope they demonstrate why I consider
bigalloc to be especially interesting in environments where server
resources (especially memory) are constrained due to the desire to
use those resources as efficiently as possible.

					- Ted

Thread overview: 34+ messages
2011-07-06 16:35 [PATCH 00/23] New spin of the bigalloc patches Theodore Ts'o
2011-07-06 16:35 ` [PATCH 01/23] ext4: read-only support for bigalloc file systems Theodore Ts'o
2011-07-06 16:35 ` [PATCH 02/23] ext4: enforce bigalloc restrictions (e.g., no online resizing, etc.) Theodore Ts'o
2011-09-28 12:55   ` [02/23] " Ted Ts'o
2011-07-06 16:35 ` [PATCH 03/23] ext4: convert instances of EXT4_BLOCKS_PER_GROUP to EXT4_CLUSTERS_PER_GROUP Theodore Ts'o
2011-07-06 16:35 ` [PATCH 04/23] ext4: factor out block group accounting into functions Theodore Ts'o
2011-07-06 16:35 ` [PATCH 05/23] ext4: split out ext4_free_blocks_after_init() Theodore Ts'o
2011-07-06 16:35 ` [PATCH 06/23] ext4: bigalloc changes to block bitmap initialization functions Theodore Ts'o
2011-07-06 16:35 ` [PATCH 07/23] ext4: convert block group-relative offsets to use clusters Theodore Ts'o
2011-07-06 16:35 ` [PATCH 08/23] ext4: teach mballoc preallocation code about bigalloc clusters Theodore Ts'o
2011-07-06 16:35 ` [PATCH 09/23] ext4: teach ext4_free_blocks() about bigalloc and clusters Theodore Ts'o
2011-07-06 16:35 ` [PATCH 10/23] ext4: teach ext4_ext_map_blocks() about the bigalloc feature Theodore Ts'o
2011-07-06 16:35 ` [PATCH 11/23] ext4: teach ext4_ext_truncate() " Theodore Ts'o
2011-07-06 16:35 ` [PATCH 12/23] ext4: convert s_{dirty,free}blocks_counter to s_{dirty,free}clusters_counter Theodore Ts'o
2011-07-06 22:59   ` Andreas Dilger
2011-07-08 22:41     ` Ted Ts'o
2011-07-06 16:35 ` [PATCH 13/23] ext4: convert the free_blocks field in s_flex_groups to be free_clusters Theodore Ts'o
2011-07-06 16:35 ` [PATCH 14/23] ext4: teach ext4_statfs() to deal with clusters if bigalloc is enabled Theodore Ts'o
2011-07-06 22:58   ` Andreas Dilger
2011-07-08 22:40     ` Ted Ts'o
2011-07-06 16:35 ` [PATCH 15/23] ext4: tune mballoc's default group prealloc size for bigalloc file systems Theodore Ts'o
2011-07-06 16:35 ` [PATCH 16/23] ext4: Fix bigalloc quota accounting and i_blocks value Theodore Ts'o
2011-07-06 16:36 ` [PATCH 17/23] ext4: enable mounting bigalloc as read/write Theodore Ts'o
2011-07-06 16:36 ` [PATCH 18/23] ext4: Rename ext4_free_blks_{count,set}() to refer to clusters Theodore Ts'o
2011-07-06 23:06   ` Andreas Dilger
2011-07-08 22:42     ` Ted Ts'o
2011-07-06 16:36 ` [PATCH 19/23] ext4: rename ext4_count_free_blocks() to ext4_count_free_clusters() Theodore Ts'o
2011-07-06 16:36 ` [PATCH 20/23] ext4: rename ext4_free_blocks_after_init() to ext4_free_clusters_after_init() Theodore Ts'o
2011-07-06 16:36 ` [PATCH 21/23] ext4: rename ext4_claim_free_blocks() to ext4_claim_free_clusters() Theodore Ts'o
2011-07-06 16:36 ` [PATCH 22/23] ext4: rename ext4_has_free_blocks() to ext4_has_free_clusters() Theodore Ts'o
2011-07-06 16:36 ` [PATCH 23/23] ext4: add some tracepoints in ext4/extents.c Theodore Ts'o
2011-07-06 18:12   ` Eric Gouriou
2011-07-08 23:20     ` Ted Ts'o
2011-07-08 23:02 ` Ted Ts'o [this message]
