From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: t.gummerer@gmail.com, gitster@pobox.com, tr@thomasrast.ch,
mhagger@alum.mit.edu, pclouds@gmail.com,
robin.rosenberg@dewire.com, sunshine@sunshineco.com,
ramsay@ramsay1.demon.co.uk
Subject: [PATCH v4 00/24] Index-v5
Date: Wed, 27 Nov 2013 13:00:35 +0100
Message-ID: <1385553659-9928-1-git-send-email-t.gummerer@gmail.com>
Hi,
previous rounds (without api) are at $gmane/202752, $gmane/202923,
$gmane/203088 and $gmane/203517; the previous rounds with api were at
$gmane/229732, $gmane/230210 and $gmane/232488. Thanks to Duy for
reviewing the last round and to Junio, Ramsay and Eric for additional
comments.
Since the last round I've added a POC for partial writing, resulting
in the following performance improvements for update-index:
Test                                       1063432           HEAD
------------------------------------------------------------------------------------
0003.2: v[23]: update-index                0.60(0.38+0.20)   0.76(0.36+0.17) +26.7%
0003.3: v[23]: grep nonexistent -- subdir  0.28(0.17+0.11)   0.28(0.18+0.09) +0.0%
0003.4: v[23]: ls-files -- subdir          0.26(0.15+0.10)   0.24(0.14+0.09) -7.7%
0003.7: v[23] update-index                 0.59(0.36+0.22)   0.58(0.36+0.20) -1.7%
0003.9: v4: update-index                   0.46(0.28+0.17)   0.45(0.30+0.11) -2.2%
0003.10: v4: grep nonexistent -- subdir    0.26(0.14+0.11)   0.21(0.14+0.07) -19.2%
0003.11: v4: ls-files -- subdir            0.24(0.14+0.10)   0.20(0.12+0.08) -16.7%
0003.14: v4 update-index                   0.49(0.31+0.18)   0.65(0.34+0.17) +32.7%
0003.16: v5: update-index                  0.53(0.30+0.22)   0.50(0.28+0.20) -5.7%
0003.17: v5: ls-files                      0.27(0.15+0.12)   0.27(0.17+0.10) +0.0%
0003.18: v5: grep nonexistent -- subdir    0.02(0.01+0.01)   0.03(0.01+0.01) +50.0%
0003.19: v5: ls-files -- subdir            0.02(0.00+0.02)   0.02(0.01+0.01) +0.0%
0003.22: v5 update-index                   0.53(0.29+0.23)   0.02(0.01+0.01) -96.2%
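The percentage column in the table above is consistent with the relative change of the HEAD timing against the baseline (1063432) timing. A minimal sketch of that arithmetic in C follows. The function names are made up for illustration, and this is not the code the perf framework itself uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Relative change of the HEAD timing against the baseline timing, in
 * percent. Negative values mean HEAD was faster.
 * (Hypothetical helper, named for this sketch only.)
 */
static double demo_relative_change(double baseline, double head)
{
	assert(baseline > 0.0);
	return 100.0 * (head - baseline) / baseline;
}

/* Format the change with an explicit sign and one decimal place. */
static void demo_format_change(char *out, size_t len,
			       double baseline, double head)
{
	snprintf(out, len, "%+.1f%%", demo_relative_change(baseline, head));
}
```

With the elapsed times of the 0003.22 row (0.53 against 0.02) this formats as -96.2%, and with those of the 0003.2 row (0.60 against 0.76) as +26.7%.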
Given these numbers, I don't think a complete change of the in-core
format for the cache-entries is necessary to take full advantage of the
new index file format. The current in-memory format fits the internal
needs of git fairly well, and some changes to it are enough to make it
work well with the new on-disk format.
This series doesn't use kb/fast-hashmap yet, but that should be fairly
simple to change if the series is deemed a good change. The
performance tests for update-index require
tg/perf-lib-test-perf-cleanup.
Other changes, made following the review comments, are:
documentation: add documentation of the index-v5 file format
- Update documentation that directory flags are now 32 bits. That
makes aligned access simpler.
- offset_to_offset is no longer included in the checksum for files.
It's unnecessary.
read-cache: read index-v5
- Add fix for reading with different level pathspecs given
- Use init_directory_entry to initialize all fields in a new
directory entry
- use memset to simplify the create_new_conflict function
- Add comments to explain -5 when reading directories and files
- Add comments for the more complex functions
- Add name flex_array to the end of ondisk_directory_entry for
simplified reading
- Add name flex_array to the end of ondisk_cache_entry for
simplified reading
- Move conflict reading functions to next patch
- mark functions as static when they are
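The two flex_array items in the list above concern structs whose last member is a variable-length name. A minimal sketch of the idea in C, assuming a made-up record layout: the struct, its fields and the helper names below are hypothetical and do not reproduce the real index-v5 on-disk structures. The caller is assumed to pass a record that is suitably aligned for the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record, NOT the real index-v5 layout. */
struct demo_ondisk_entry {
	uint32_t flags;
	uint32_t size;
	char name[];	/* flexible array member: NUL-terminated name follows */
};

/*
 * With the flexible array member in place, the name is reachable as a
 * regular struct member instead of via manual pointer arithmetic.
 */
static const char *demo_entry_name(const void *record)
{
	const struct demo_ondisk_entry *entry = record;
	return entry->name;
}

/* Total length of one record: fixed part, name, terminating NUL. */
static size_t demo_entry_length(const void *record)
{
	return offsetof(struct demo_ondisk_entry, name)
		+ strlen(demo_entry_name(record)) + 1;
}
```

The reader can then step from one record to the next by adding demo_entry_length() to the current position, provided the format stores records back to back, which is an assumption of this sketch.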
read-cache: read resolve-undo data
- Add comments for the more complex function
- Read conflicts + resolve undo data as extension
read-cache: read cache-tree in index-v5
- Add comments for the more complex function
- Instead of sorting the directory entries, sort the cache-tree
directly. This also required changing the algorithms with which
the cache entries are extracted from the directory tree.
read-cache: write index-v5
- Free pointers allocated by super_directory
- Rewrite condition as suggested by Duy
- Don't check for CE_REMOVE'd entries in the writing code, they are
already checked in the compile_directory_data code
- Remove overly complicated directory size calculation since flags
are now 32 bits
read-cache: write resolve-undo data for index-v5
- Free pointers allocated by super_directory
- Write conflicts + resolve undo data as extension
introduce GIT_INDEX_VERSION environment variable
- Add documentation for GIT_INDEX_VERSION
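As an illustration of how a program could honor such a variable, here is a minimal sketch in C. Only the variable name GIT_INDEX_VERSION comes from this series. The helper names, the fallback behaviour and the accepted range of 2 to 5 (the versions mentioned in the performance table) are assumptions made for the sketch, not the logic of the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed range of supported index versions, for this sketch only. */
#define DEMO_INDEX_VERSION_MIN 2
#define DEMO_INDEX_VERSION_MAX 5

/*
 * Parse a version string such as "5". Returns fallback if value is
 * NULL or empty, does not parse completely as a decimal integer, or
 * lies outside the assumed range.
 */
static int demo_parse_index_version(const char *value, int fallback)
{
	char *end;
	long version;

	if (!value || !*value)
		return fallback;
	errno = 0;
	version = strtol(value, &end, 10);
	if (errno || *end != '\0')
		return fallback;
	if (version < DEMO_INDEX_VERSION_MIN ||
	    version > DEMO_INDEX_VERSION_MAX)
		return fallback;
	return (int)version;
}

/* Look the value up in the environment. */
static int demo_index_version_from_env(int fallback)
{
	return demo_parse_index_version(getenv("GIT_INDEX_VERSION"), fallback);
}
```

Keeping the string parsing separate from the getenv() call makes the validation easy to exercise without touching the process environment.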
test-lib: allow setting the index format version
Removed commits:
- read-cache: don't check uid, gid, ino
- read-cache: use fixed width integer types (independently in pu)
- read-cache: clear version in discard_index()
Typos fixed as suggested by Eric Sunshine
Thomas Gummerer (22):
read-cache: split index file version specific functionality
read-cache: move index v2 specific functions to their own file
read-cache: Re-read index if index file changed
add documentation for the index api
read-cache: add index reading api
make sure partially read index is not changed
grep.c: use index api
ls-files.c: use index api
documentation: add documentation of the index-v5 file format
read-cache: make in-memory format aware of stat_crc
read-cache: read index-v5
read-cache: read resolve-undo data
read-cache: read cache-tree in index-v5
read-cache: write index-v5
read-cache: write index-v5 cache-tree data
read-cache: write resolve-undo data for index-v5
update-index.c: rewrite index when index-version is given
introduce GIT_INDEX_VERSION environment variable
test-lib: allow setting the index format version
t1600: add index v5 specific tests
POC for partial writing
perf: add partial writing test
Thomas Rast (1):
p0003-index.sh: add perf test for the index formats
 Documentation/git.txt                            |    5 +
 Documentation/technical/api-in-core-index.txt    |   56 +-
 Documentation/technical/index-file-format-v5.txt |  294 +++++
 Makefile                                         |   10 +
 builtin/apply.c                                  |    2 +
 builtin/grep.c                                   |   69 +-
 builtin/ls-files.c                               |   36 +-
 builtin/update-index.c                           |   50 +-
 cache-tree.c                                     |   15 +-
 cache-tree.h                                     |    2 +
 cache.h                                          |  115 +-
 lockfile.c                                       |    2 +-
 read-cache-v2.c                                  |  561 +++++++++
 read-cache-v5.c                                  | 1406 ++++++++++++++++++++++
 read-cache.c                                     |  691 +++--------
 read-cache.h                                     |   67 ++
 resolve-undo.c                                   |    1 +
 t/perf/p0003-index.sh                            |   74 ++
 t/t1600-index-v5.sh                              |   25 +
 t/t2101-update-index-reupdate.sh                 |   12 +-
 t/test-lib-functions.sh                          |    5 +
 t/test-lib.sh                                    |    3 +
 test-index-version.c                             |    6 +
 unpack-trees.c                                   |    3 +-
 24 files changed, 2921 insertions(+), 589 deletions(-)
create mode 100644 Documentation/technical/index-file-format-v5.txt
create mode 100644 read-cache-v2.c
create mode 100644 read-cache-v5.c
create mode 100644 read-cache.h
create mode 100755 t/perf/p0003-index.sh
create mode 100755 t/t1600-index-v5.sh
--
1.8.4.2
Thread overview (41+ messages):
2013-11-27 12:00 Thomas Gummerer [this message]
2013-11-27 12:00 ` [PATCH v4 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 02/24] read-cache: split index file version specific functionality Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 03/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 04/24] read-cache: Re-read index if index file changed Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 05/24] add documentation for the index api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 06/24] read-cache: add index reading api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 07/24] make sure partially read index is not changed Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 08/24] grep.c: use index api Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 09/24] ls-files.c: " Thomas Gummerer
2013-11-30 9:17 ` Duy Nguyen
2013-11-30 10:30 ` Thomas Gummerer
2013-11-30 15:39 ` Antoine Pelisse
2013-11-30 20:08 ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 10/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 11/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 12/24] read-cache: read index-v5 Thomas Gummerer
2013-11-30 9:17 ` Duy Nguyen
2013-11-30 10:40 ` Thomas Gummerer
2013-11-30 12:19 ` Antoine Pelisse
2013-11-30 20:10 ` Thomas Gummerer
2013-11-30 15:26 ` Antoine Pelisse
2013-11-30 20:27 ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 13/24] read-cache: read resolve-undo data Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 14/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 15/24] read-cache: write index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 16/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 17/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 18/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 19/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 20/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
2013-11-27 21:57 ` Eric Sunshine
2013-11-27 22:08 ` Junio C Hamano
2013-11-28 9:57 ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 21/24] test-lib: allow setting the index format version Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 22/24] t1600: add index v5 specific tests Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 23/24] POC for partial writing Thomas Gummerer
2013-11-30 9:58 ` Duy Nguyen
2013-11-30 10:50 ` Thomas Gummerer
2013-11-27 12:00 ` [PATCH v4 24/24] perf: add partial writing test Thomas Gummerer
2013-12-09 10:14 ` [PATCH v4 00/24] Index-v5 Thomas Gummerer