From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, tr@thomasrast.ch, mhagger@alum.mit.edu,
pclouds@gmail.com, robin.rosenberg@dewire.com,
sunshine@sunshineco.com, ramsay@ramsay1.demon.co.uk,
Antoine Pelisse <apelisse@gmail.com>
Subject: Re: [PATCH v4 00/24] Index-v5
Date: Mon, 09 Dec 2013 11:14:43 +0100
Message-ID: <87vbyyfi0c.fsf@gmail.com>
In-Reply-To: <1385553659-9928-1-git-send-email-t.gummerer@gmail.com>

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Hi,
>
> previous rounds (without api) are at $gmane/202752, $gmane/202923,
> $gmane/203088 and $gmane/203517, the previous rounds with api were at
> $gmane/229732, $gmane/230210 and $gmane/232488. Thanks to Duy for
> reviewing the last round and Junio, Ramsay and Eric for additional
> comments.
>
> Since the last round I've added a POC for partial writing, resulting
> in the following performance improvements for update-index:
>
> Test                                        1063432           HEAD
> ------------------------------------------------------------------------------------
> 0003.2: v[23]: update-index                 0.60(0.38+0.20)   0.76(0.36+0.17) +26.7%
> 0003.3: v[23]: grep nonexistent -- subdir   0.28(0.17+0.11)   0.28(0.18+0.09) +0.0%
> 0003.4: v[23]: ls-files -- subdir           0.26(0.15+0.10)   0.24(0.14+0.09) -7.7%
> 0003.7: v[23] update-index                  0.59(0.36+0.22)   0.58(0.36+0.20) -1.7%
> 0003.9: v4: update-index                    0.46(0.28+0.17)   0.45(0.30+0.11) -2.2%
> 0003.10: v4: grep nonexistent -- subdir     0.26(0.14+0.11)   0.21(0.14+0.07) -19.2%
> 0003.11: v4: ls-files -- subdir             0.24(0.14+0.10)   0.20(0.12+0.08) -16.7%
> 0003.14: v4 update-index                    0.49(0.31+0.18)   0.65(0.34+0.17) +32.7%
> 0003.16: v5: update-index                   0.53(0.30+0.22)   0.50(0.28+0.20) -5.7%
> 0003.17: v5: ls-files                       0.27(0.15+0.12)   0.27(0.17+0.10) +0.0%
> 0003.18: v5: grep nonexistent -- subdir     0.02(0.01+0.01)   0.03(0.01+0.01) +50.0%
> 0003.19: v5: ls-files -- subdir             0.02(0.00+0.02)   0.02(0.01+0.01) +0.0%
> 0003.22: v5 update-index                    0.53(0.29+0.23)   0.02(0.01+0.01) -96.2%
>
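
The last column of the timing table appears to be the relative change of the HEAD timing against the 1063432 timing, in percent, so a negative value means HEAD is faster. A minimal C sketch of that arithmetic follows. The names percent_change and print_change are made up for illustration and are not part of git's perf library.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Relative change of `head` against `base`, in percent.
 * Negative means `head` took less time than `base`.
 * (Illustrative helper, not taken from the series.)
 */
double percent_change(double base, double head)
{
	assert(base > 0.0); /* a zero baseline has no relative change */
	return (head - base) / base * 100.0;
}

/* Print the change with an explicit sign and one decimal, e.g. "+26.7%". */
void print_change(double base, double head)
{
	printf("%+.1f%%\n", percent_change(base, head));
}
```

For instance, percent_change(0.53, 0.02) is roughly -96.2, which agrees with the 0003.22 row, and percent_change(0.60, 0.76) is roughly +26.7, which agrees with the 0003.2 row.
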
> Given this, I don't think a complete change of the in-core format for
> the cache-entries is necessary to take full advantage of the new index
> file format. Instead some changes to the current in-core format would
> work well with the new on-disk format.
>
> The current in-memory format fits the internal needs of git fairly well,
> so I don't think changing it to fit a better index file format would
> make a lot of sense, given that we can take advantage of the new format
> with the existing in-memory format.

Any more opinions on this series? I've applied the changes suggested by
Duy, Antoine and Eric locally, but I wouldn't want to spam the list with
the whole series without a chance of this being applied. How do you
want me to proceed?

> This series doesn't use kb/fast-hashmap yet, but that should be fairly
> simple to change if the series is deemed a good change. The
> performance tests for update-index require
> tg/perf-lib-test-perf-cleanup.
>
> Other changes, made following the review comments, are:
>
> documentation: add documentation of the index-v5 file format
> - Update documentation that directory flags are now 32 bits wide,
> which makes aligned access simpler
> - offset_to_offset is no longer included in the checksum for files.
> It's unnecessary.
>
> read-cache: read index-v5
> - Add fix for reading with different level pathspecs given
> - Use init_directory_entry to initialize all fields in a new
> directory entry
> - use memset to simplify the create_new_conflict function
> - Add comments to explain -5 when reading directories and files
> - Add comments for the more complex functions
> - Add name flex_array to the end of ondisk_directory_entry for
> simplified reading
> - Add name flex_array to the end of ondisk_cache_entry for
> simplified reading
> - Move conflict reading functions to next patch
> - Mark functions as static where they are only used within the file
>
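
To illustrate why a trailing name flex_array simplifies reading, here is a minimal C sketch. The struct layout, its field names and the function read_entry_sketch are hypothetical and do not reproduce ondisk_directory_entry or ondisk_cache_entry from read-cache-v5.c. Byte-order conversion of the fixed fields is also left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL on-disk record, for illustration only. */
struct ondisk_entry_sketch {
	uint32_t flags; /* 32-bit flags keep the next field 4-byte aligned */
	uint32_t crc;
	char name[];    /* C99 flexible array member: NUL-terminated path */
};

/*
 * Copy one record out of `buf` (e.g. an mmap'ed index file).
 * Returns a malloc'ed entry and stores the bytes used in *consumed,
 * or returns NULL if the record is truncated or allocation fails.
 */
struct ondisk_entry_sketch *read_entry_sketch(const unsigned char *buf,
					      size_t len, size_t *consumed)
{
	const size_t header = offsetof(struct ondisk_entry_sketch, name);
	const unsigned char *nul;
	struct ondisk_entry_sketch *e;
	size_t total;

	assert(buf && consumed);
	if (len <= header)
		return NULL;
	/* The name starts right after the fixed fields and ends at the NUL. */
	nul = memchr(buf + header, '\0', len - header);
	if (!nul)
		return NULL;
	total = (size_t)(nul - buf) + 1;
	e = malloc(total);
	if (!e)
		return NULL;
	/* One memcpy covers fixed fields and name; no separate name pointer. */
	memcpy(e, buf, total);
	*consumed = total;
	return e;
}
```

With the name stored inline, the fixed fields and the path come out of the buffer in a single copy, and the offset of the next record is simply the current offset plus *consumed.
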
> read-cache: read resolve-undo data
> - Add comments for the more complex function
> - Read conflicts + resolve undo data as extension
>
> read-cache: read cache-tree in index-v5
> - Add comments for the more complex function
> - Instead of sorting the directory entries, sort the cache-tree
> directly. This also required changing the algorithms with which
> the cache entries are extracted from the directory tree.
>
> read-cache: write index-v5
> - Free pointers allocated by super_directory
> - Rewrite condition as suggested by Duy
> - Don't check for CE_REMOVE'd entries in the writing code, they are
> already checked in the compile_directory_data code
> - Remove overly complicated directory size calculation since flags
> are now 32 bits wide
>
> read-cache: write resolve-undo data for index-v5
> - Free pointers allocated by super_directory
> - Write conflicts + resolve undo data as extension
>
> introduce GIT_INDEX_VERSION environment variable
> - Add documentation for GIT_INDEX_VERSION
>
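
As a rough idea of how a GIT_INDEX_VERSION setting could be consumed, here is a minimal C sketch. It is not the code from the series: the function names, the accepted range of 2 to 5 and the silent fallback on bad input are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* ASSUMED range of index versions; chosen for this sketch only. */
#define SKETCH_INDEX_VERSION_LB 2
#define SKETCH_INDEX_VERSION_UB 5

/*
 * Parse a version string. Returns `fallback` when the string is
 * missing, empty, not a plain decimal number, or out of range.
 */
int index_version_from_string(const char *value, int fallback)
{
	char *end;
	long v;

	assert(fallback >= SKETCH_INDEX_VERSION_LB &&
	       fallback <= SKETCH_INDEX_VERSION_UB);
	if (!value || !*value)
		return fallback;
	errno = 0;
	v = strtol(value, &end, 10);
	if (errno || *end != '\0' ||
	    v < SKETCH_INDEX_VERSION_LB || v > SKETCH_INDEX_VERSION_UB)
		return fallback;
	return (int)v;
}

/* Look the value up in the environment (hypothetical wrapper). */
int index_version_from_env(int fallback)
{
	return index_version_from_string(getenv("GIT_INDEX_VERSION"), fallback);
}
```

A caller would use something like index_version_from_env(3) when creating a new index, so that an unset or malformed variable leaves the default untouched.
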
> test-lib: allow setting the index format version
>
> Removed commits:
> - read-cache: don't check uid, gid, ino
> - read-cache: use fixed width integer types (independently in pu)
> - read-cache: clear version in discard_index()
>
> Typos fixed as suggested by Eric Sunshine
>
> Thomas Gummerer (22):
> read-cache: split index file version specific functionality
> read-cache: move index v2 specific functions to their own file
> read-cache: Re-read index if index file changed
> add documentation for the index api
> read-cache: add index reading api
> make sure partially read index is not changed
> grep.c: use index api
> ls-files.c: use index api
> documentation: add documentation of the index-v5 file format
> read-cache: make in-memory format aware of stat_crc
> read-cache: read index-v5
> read-cache: read resolve-undo data
> read-cache: read cache-tree in index-v5
> read-cache: write index-v5
> read-cache: write index-v5 cache-tree data
> read-cache: write resolve-undo data for index-v5
> update-index.c: rewrite index when index-version is given
> introduce GIT_INDEX_VERSION environment variable
> test-lib: allow setting the index format version
> t1600: add index v5 specific tests
> POC for partial writing
> perf: add partial writing test
>
> Thomas Rast (1):
> p0003-index.sh: add perf test for the index formats
>
> Documentation/git.txt | 5 +
> Documentation/technical/api-in-core-index.txt | 56 +-
> Documentation/technical/index-file-format-v5.txt | 294 +++++
> Makefile | 10 +
> builtin/apply.c | 2 +
> builtin/grep.c | 69 +-
> builtin/ls-files.c | 36 +-
> builtin/update-index.c | 50 +-
> cache-tree.c | 15 +-
> cache-tree.h | 2 +
> cache.h | 115 +-
> lockfile.c | 2 +-
> read-cache-v2.c | 561 +++++++++
> read-cache-v5.c | 1406 ++++++++++++++++++++++
> read-cache.c | 691 +++--------
> read-cache.h | 67 ++
> resolve-undo.c | 1 +
> t/perf/p0003-index.sh | 74 ++
> t/t1600-index-v5.sh | 25 +
> t/t2101-update-index-reupdate.sh | 12 +-
> t/test-lib-functions.sh | 5 +
> t/test-lib.sh | 3 +
> test-index-version.c | 6 +
> unpack-trees.c | 3 +-
> 24 files changed, 2921 insertions(+), 589 deletions(-)
> create mode 100644 Documentation/technical/index-file-format-v5.txt
> create mode 100644 read-cache-v2.c
> create mode 100644 read-cache-v5.c
> create mode 100644 read-cache.h
> create mode 100755 t/perf/p0003-index.sh
> create mode 100755 t/t1600-index-v5.sh
>
> --
> 1.8.4.2
>
--
Thomas