From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: t.gummerer@gmail.com, gitster@pobox.com, tr@thomasrast.ch,
	mhagger@alum.mit.edu, pclouds@gmail.com,
	robin.rosenberg@dewire.com, sunshine@sunshineco.com,
	ramsay@ramsay1.demon.co.uk
Subject: [PATCH v4 00/24] Index-v5
Date: Wed, 27 Nov 2013 13:00:35 +0100
Message-ID: <1385553659-9928-1-git-send-email-t.gummerer@gmail.com>

Hi,

previous rounds (without api) are at $gmane/202752, $gmane/202923,
$gmane/203088 and $gmane/203517; the previous rounds with api were at
$gmane/229732, $gmane/230210 and $gmane/232488.  Thanks to Duy for
reviewing the last round, and to Junio, Ramsay and Eric for additional
comments.

Since the last round I've added a POC for partial writing.  It gives the
following perf results; the main improvement is for update-index on a
v5 index (0003.22, down from 0.53s to 0.02s):

Test                                        1063432           HEAD
------------------------------------------------------------------------------------
0003.2: v[23]: update-index                 0.60(0.38+0.20)   0.76(0.36+0.17) +26.7%
0003.3: v[23]: grep nonexistent -- subdir   0.28(0.17+0.11)   0.28(0.18+0.09) +0.0%
0003.4: v[23]: ls-files -- subdir           0.26(0.15+0.10)   0.24(0.14+0.09) -7.7%
0003.7: v[23] update-index                  0.59(0.36+0.22)   0.58(0.36+0.20) -1.7%
0003.9: v4: update-index                    0.46(0.28+0.17)   0.45(0.30+0.11) -2.2%
0003.10: v4: grep nonexistent -- subdir     0.26(0.14+0.11)   0.21(0.14+0.07) -19.2%
0003.11: v4: ls-files -- subdir             0.24(0.14+0.10)   0.20(0.12+0.08) -16.7%
0003.14: v4 update-index                    0.49(0.31+0.18)   0.65(0.34+0.17) +32.7%
0003.16: v5: update-index                   0.53(0.30+0.22)   0.50(0.28+0.20) -5.7%
0003.17: v5: ls-files                       0.27(0.15+0.12)   0.27(0.17+0.10) +0.0%
0003.18: v5: grep nonexistent -- subdir     0.02(0.01+0.01)   0.03(0.01+0.01) +50.0%
0003.19: v5: ls-files -- subdir             0.02(0.00+0.02)   0.02(0.01+0.01) +0.0%
0003.22: v5 update-index                    0.53(0.29+0.23)   0.02(0.01+0.01) -96.2%

Given this, I don't think a complete change of the in-core format for
the cache-entries is necessary to take full advantage of the new index
file format.  Instead some changes to the current in-core format would
work well with the new on-disk format.

The current in-memory format fits the internal needs of git fairly well,
and the new file format can already be taken advantage of with it, so
changing the in-memory format just to suit the new index file format
would not make much sense.

This series doesn't use kb/fast-hashmap yet, but switching to it should
be fairly simple if the series is deemed a good change.  The update-index
performance tests require tg/perf-lib-test-perf-cleanup.

Other changes made following the review comments are:

documentation: add documentation of the index-v5 file format
  - Update the documentation to say that directory flags are now 32 bits
    wide, which makes aligned access simpler.
  - offset_to_offset is no longer included in the checksum for files.
    It's unnecessary.

read-cache: read index-v5
  - Fix reading when pathspecs of different directory levels are given
  - Use init_directory_entry to initialize all fields in a new
    directory entry
  - Use memset to simplify the create_new_conflict function
  - Add comments to explain -5 when reading directories and files
  - Add comments for the more complex functions
  - Add name flex_array to the end of ondisk_directory_entry for
    simplified reading
  - Add name flex_array to the end of ondisk_cache_entry for
    simplified reading
  - Move conflict reading functions to next patch
  - Mark file-local functions as static

read-cache: read resolve-undo data
  - Add comments for the more complex function
  - Read conflicts and resolve-undo data as an extension

read-cache: read cache-tree in index-v5
  - Add comments for the more complex function
  - Instead of sorting the directory entries, sort the cache-tree
    directly.  This also required changing the algorithms with which
    the cache entries are extracted from the directory tree.

read-cache: write index-v5
  - Free pointers allocated by super_directory
  - Rewrite condition as suggested by Duy
  - Don't check for CE_REMOVE'd entries in the writing code, they are
    already checked in the compile_directory_data code
  - Remove overly complicated directory size calculation, since flags
    are now 32 bits wide

read-cache: write resolve-undo data for index-v5
  - Free pointers allocated by super_directory
  - Write conflicts and resolve-undo data as an extension

introduce GIT_INDEX_VERSION environment variable
  - Add documentation for GIT_INDEX_VERSION

test-lib: allow setting the index format version

Removed commits:
  - read-cache: don't check uid, gid, ino
  - read-cache: use fixed width integer types (now carried independently in pu)
  - read-cache: clear version in discard_index()

Typos fixed as suggested by Eric Sunshine

Thomas Gummerer (22):
  read-cache: split index file version specific functionality
  read-cache: move index v2 specific functions to their own file
  read-cache: Re-read index if index file changed
  add documentation for the index api
  read-cache: add index reading api
  make sure partially read index is not changed
  grep.c: use index api
  ls-files.c: use index api
  documentation: add documentation of the index-v5 file format
  read-cache: make in-memory format aware of stat_crc
  read-cache: read index-v5
  read-cache: read resolve-undo data
  read-cache: read cache-tree in index-v5
  read-cache: write index-v5
  read-cache: write index-v5 cache-tree data
  read-cache: write resolve-undo data for index-v5
  update-index.c: rewrite index when index-version is given
  introduce GIT_INDEX_VERSION environment variable
  test-lib: allow setting the index format version
  t1600: add index v5 specific tests
  POC for partial writing
  perf: add partial writing test

Thomas Rast (1):
  p0003-index.sh: add perf test for the index formats

 Documentation/git.txt                            |    5 +
 Documentation/technical/api-in-core-index.txt    |   56 +-
 Documentation/technical/index-file-format-v5.txt |  294 +++++
 Makefile                                         |   10 +
 builtin/apply.c                                  |    2 +
 builtin/grep.c                                   |   69 +-
 builtin/ls-files.c                               |   36 +-
 builtin/update-index.c                           |   50 +-
 cache-tree.c                                     |   15 +-
 cache-tree.h                                     |    2 +
 cache.h                                          |  115 +-
 lockfile.c                                       |    2 +-
 read-cache-v2.c                                  |  561 +++++++++
 read-cache-v5.c                                  | 1406 ++++++++++++++++++++++
 read-cache.c                                     |  691 +++--------
 read-cache.h                                     |   67 ++
 resolve-undo.c                                   |    1 +
 t/perf/p0003-index.sh                            |   74 ++
 t/t1600-index-v5.sh                              |   25 +
 t/t2101-update-index-reupdate.sh                 |   12 +-
 t/test-lib-functions.sh                          |    5 +
 t/test-lib.sh                                    |    3 +
 test-index-version.c                             |    6 +
 unpack-trees.c                                   |    3 +-
 24 files changed, 2921 insertions(+), 589 deletions(-)
 create mode 100644 Documentation/technical/index-file-format-v5.txt
 create mode 100644 read-cache-v2.c
 create mode 100644 read-cache-v5.c
 create mode 100644 read-cache.h
 create mode 100755 t/perf/p0003-index.sh
 create mode 100755 t/t1600-index-v5.sh

-- 
1.8.4.2

Thread overview: 41+ messages
2013-11-27 12:00 Thomas Gummerer [this message]
2013-11-27 12:00 ` [PATCH v4 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 02/24] read-cache: split index file version specific functionality Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 03/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 04/24] read-cache: Re-read index if index file changed Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 05/24] add documentation for the index api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 06/24] read-cache: add index reading api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 07/24] make sure partially read index is not changed Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 08/24] grep.c: use index api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 09/24] ls-files.c: " Thomas Gummerer
2013-11-30  9:17   ` Duy Nguyen
2013-11-30 10:30     ` Thomas Gummerer
2013-11-30 15:39   ` Antoine Pelisse
2013-11-30 20:08     ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 10/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 11/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
2013-11-30  9:17   ` Duy Nguyen
2013-11-30 10:40     ` Thomas Gummerer
2013-11-30 12:19   ` Antoine Pelisse
2013-11-30 20:10     ` Thomas Gummerer
2013-11-30 15:26   ` Antoine Pelisse
2013-11-30 20:27     ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 13/24] read-cache: read resolve-undo data Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 14/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 15/24] read-cache: write index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 16/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 17/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 18/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 19/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
2013-11-27 21:57   ` Eric Sunshine
2013-11-27 22:08     ` Junio C Hamano
2013-11-28  9:57       ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 21/24] test-lib: allow setting the index format version Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 22/24] t1600: add index v5 specific tests Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 23/24] POC for partial writing Thomas Gummerer
2013-11-30  9:58   ` Duy Nguyen
2013-11-30 10:50     ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 24/24] perf: add partial writing test Thomas Gummerer
2013-12-09 10:14 ` [PATCH v4 00/24] Index-v5 Thomas Gummerer