From: Thomas Gummerer <t.gummerer@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, tr@thomasrast.ch, mhagger@alum.mit.edu,
	pclouds@gmail.com, robin.rosenberg@dewire.com,
	sunshine@sunshineco.com, ramsay@ramsay1.demon.co.uk,
	Antoine Pelisse <apelisse@gmail.com>
Subject: Re: [PATCH v4 00/24] Index-v5
Date: Mon, 09 Dec 2013 11:14:43 +0100
Message-ID: <87vbyyfi0c.fsf@gmail.com>
In-Reply-To: <1385553659-9928-1-git-send-email-t.gummerer@gmail.com>

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Hi,
>
> previous rounds (without api) are at $gmane/202752, $gmane/202923,
> $gmane/203088 and $gmane/203517, the previous rounds with api were at
> $gmane/229732, $gmane/230210 and $gmane/232488.  Thanks to Duy for
> reviewing the last round and Junio, Ramsay and Eric for additional
> comments.
>
> Since the last round I've added a POC for partial writing, resulting
> in the following performance improvements for update-index:
>
> Test                                        1063432           HEAD
> ------------------------------------------------------------------------------------
> 0003.2: v[23]: update-index                 0.60(0.38+0.20)   0.76(0.36+0.17) +26.7%
> 0003.3: v[23]: grep nonexistent -- subdir   0.28(0.17+0.11)   0.28(0.18+0.09) +0.0%
> 0003.4: v[23]: ls-files -- subdir           0.26(0.15+0.10)   0.24(0.14+0.09) -7.7%
> 0003.7: v[23] update-index                  0.59(0.36+0.22)   0.58(0.36+0.20) -1.7%
> 0003.9: v4: update-index                    0.46(0.28+0.17)   0.45(0.30+0.11) -2.2%
> 0003.10: v4: grep nonexistent -- subdir     0.26(0.14+0.11)   0.21(0.14+0.07) -19.2%
> 0003.11: v4: ls-files -- subdir             0.24(0.14+0.10)   0.20(0.12+0.08) -16.7%
> 0003.14: v4 update-index                    0.49(0.31+0.18)   0.65(0.34+0.17) +32.7%
> 0003.16: v5: update-index                   0.53(0.30+0.22)   0.50(0.28+0.20) -5.7%
> 0003.17: v5: ls-files                       0.27(0.15+0.12)   0.27(0.17+0.10) +0.0%
> 0003.18: v5: grep nonexistent -- subdir     0.02(0.01+0.01)   0.03(0.01+0.01) +50.0%
> 0003.19: v5: ls-files -- subdir             0.02(0.00+0.02)   0.02(0.01+0.01) +0.0%
> 0003.22: v5 update-index                    0.53(0.29+0.23)   0.02(0.01+0.01) -96.2%
>
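
For readers checking the table: the last column appears to be the relative
change in elapsed time from the first column (1063432) to the second (HEAD),
with each cell in the perf-lib form elapsed(user+sys).  A minimal sketch of
that arithmetic, using three rows copied from the table (the helper name
pct_change is only illustrative):

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# (test, elapsed seconds at 1063432, elapsed seconds at HEAD), from the table
rows = [
    ("0003.2  v[23]: update-index", 0.60, 0.76),
    ("0003.14 v4 update-index", 0.49, 0.65),
    ("0003.22 v5 update-index", 0.53, 0.02),
]

for name, base, new in rows:
    print(f"{name}: {pct_change(base, new):+.1f}%")
```

The three rows reproduce the +26.7%, +32.7% and -96.2% shown above, so the
-96.2% on test 0003.22 is a drop from 0.53s to 0.02s elapsed.
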
> Given this, I don't think a complete change of the in-core format for
> the cache-entries is necessary to take full advantage of the new index
> file format.  Instead some changes to the current in-core format would
> work well with the new on-disk format.
>
> The current in-memory format fits the internal needs of git fairly well,
> so I don't think changing it to fit a better index file format would
> make a lot of sense, given that we can take advantage of the new format
> with the existing in-memory format.
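
This message does not describe how the partial-writing POC works, so the
following is only a toy illustration of the general idea: when a single entry
changes, overwrite that entry in place instead of serializing the whole file
again.  The record layout, file name and function names are all assumptions
made for the sketch, not the index-v5 format, which would also have to keep
its checksums consistent.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: (entry id, mtime), both 32-bit big-endian.
RECORD = struct.Struct(">II")


def write_all(path, records):
    """Full rewrite: every record is serialized again."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def write_one(path, slot, record):
    """Partial write: overwrite only the record at position `slot`."""
    with open(path, "r+b") as f:
        f.seek(slot * RECORD.size)
        f.write(RECORD.pack(*record))


def read_all(path):
    with open(path, "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, off)
            for off in range(0, len(data), RECORD.size)]


path = os.path.join(tempfile.mkdtemp(), "toy-index")
write_all(path, [(i, 1000) for i in range(5)])
write_one(path, 3, (3, 2000))  # touches 8 bytes, not the whole file
print(read_all(path))
```

The cost of write_one does not depend on the number of entries, which is the
kind of effect that would explain the update-index numbers for v5 above.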

Any more opinions on this series?  I've applied the changes suggested by
Duy, Antoine and Eric locally, but I wouldn't want to spam the list with
the whole series without a chance of this being applied.  How do you
want me to proceed?

> This series doesn't use kb/fast-hashmap yet, but that should be fairly
> simple to change if the series is deemed a good change.  The
> performance tests for update-index require
> tg/perf-lib-test-perf-cleanup.
>
> Other changes, made following the review comments, are:
>
> documentation: add documentation of the index-v5 file format
>   - Update documentation that directory flags are now 32 bits.  That
>     makes aligned access simpler.
>   - offset_to_offset is no longer included in the checksum for files.
>     It's unnecessary.
>
> read-cache: read index-v5
>   - Add fix for reading with different level pathspecs given
>   - Use init_directory_entry to initialize all fields in a new
>     directory entry
>   - use memset to simplify the create_new_conflict function
>   - Add comments to explain -5 when reading directories and files
>   - Add comments for the more complex functions
>   - Add name flex_array to the end of ondisk_directory_entry for
>     simplified reading
>   - Add name flex_array to the end of ondisk_cache_entry for
>     simplified reading
>   - Move conflict reading functions to next patch
>   - mark functions as static when they are
>
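
The two flex_array items above concern on-disk entries that end in a
variable-length name.  In C, a flexible array member at the end of the struct
lets the reader address the name directly after the fixed fields.  The same
"fixed header, then name" reading pattern is sketched below.  The field
choice (two 32-bit integers) and the NUL terminator are assumptions for
illustration and are not taken from the index-v5 documentation.

```python
import struct

# Hypothetical layout: an 8-byte fixed header (flags, entry count) followed
# directly by a NUL-terminated name, the part a flexible array would cover.
HEADER = struct.Struct(">II")


def pack_entry(flags, count, name):
    return HEADER.pack(flags, count) + name.encode() + b"\0"


def read_entry(buf, offset=0):
    """Return (flags, count, name, offset of the next entry)."""
    flags, count = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = buf.index(b"\0", start)  # the name runs up to the terminating NUL
    return flags, count, buf[start:end].decode(), end + 1


buf = pack_entry(1, 2, "Documentation/") + pack_entry(0, 7, "t/perf/")
first = read_entry(buf)
second = read_entry(buf, first[3])
print(first[:3], second[:3])
```
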
> read-cache: read resolve-undo data
>   - Add comments for the more complex function
>   - Read conflicts + resolve undo data as extension
>
> read-cache: read cache-tree in index-v5
>   - Add comments for the more complex function
>   - Instead of sorting the directory entries, sort the cache-tree
>     directly.  This also required changing the algorithms with which
>     the cache entries are extracted from the directory tree.
>
> read-cache: write index-v5
>   - Free pointers allocated by super_directory
>   - Rewrite condition as suggested by Duy
>   - Don't check for CE_REMOVE'd entries in the writing code, they are
>     already checked in the compile_directory_data code
>   - Remove overly complicated directory size calculation since flags
>     are now 32 bits
>
> read-cache: write resolve-undo data for index-v5
>   - Free pointers allocated by super_directory
>   - Write conflicts + resolve undo data as extension
>
> introduce GIT_INDEX_VERSION environment variable
>   - Add documentation for GIT_INDEX_VERSION
>
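
As a rough illustration of how a script or test harness might consume
GIT_INDEX_VERSION: the sketch below reads the variable and validates it.  The
set of accepted versions (2 to 4 plus the v5 proposed by this series) and the
error handling are assumptions; how Git itself treats unset or invalid values
is not stated in this message.

```python
import os

# Assumption: index versions 2-4 plus the v5 proposed in this series.
SUPPORTED = {2, 3, 4, 5}


def index_version_from_env(env=os.environ, default=None):
    """Return the requested index version, or `default` if the variable is unset."""
    raw = env.get("GIT_INDEX_VERSION")
    if raw is None or raw == "":
        return default
    try:
        version = int(raw)
    except ValueError:
        raise ValueError(f"GIT_INDEX_VERSION is not a number: {raw!r}")
    if version not in SUPPORTED:
        raise ValueError(f"unsupported index version: {version}")
    return version


print(index_version_from_env({"GIT_INDEX_VERSION": "5"}))
```
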
> test-lib: allow setting the index format version
>
> Removed commits:
>   - read-cache: don't check uid, gid, ino
>   - read-cache: use fixed width integer types (independently in pu)
>   - read-cache: clear version in discard_index()
>
> Typos fixed as suggested by Eric Sunshine
>
> Thomas Gummerer (22):
>   read-cache: split index file version specific functionality
>   read-cache: move index v2 specific functions to their own file
>   read-cache: Re-read index if index file changed
>   add documentation for the index api
>   read-cache: add index reading api
>   make sure partially read index is not changed
>   grep.c: use index api
>   ls-files.c: use index api
>   documentation: add documentation of the index-v5 file format
>   read-cache: make in-memory format aware of stat_crc
>   read-cache: read index-v5
>   read-cache: read resolve-undo data
>   read-cache: read cache-tree in index-v5
>   read-cache: write index-v5
>   read-cache: write index-v5 cache-tree data
>   read-cache: write resolve-undo data for index-v5
>   update-index.c: rewrite index when index-version is given
>   introduce GIT_INDEX_VERSION environment variable
>   test-lib: allow setting the index format version
>   t1600: add index v5 specific tests
>   POC for partial writing
>   perf: add partial writing test
>
> Thomas Rast (1):
>   p0003-index.sh: add perf test for the index formats
>
>  Documentation/git.txt                            |    5 +
>  Documentation/technical/api-in-core-index.txt    |   56 +-
>  Documentation/technical/index-file-format-v5.txt |  294 +++++
>  Makefile                                         |   10 +
>  builtin/apply.c                                  |    2 +
>  builtin/grep.c                                   |   69 +-
>  builtin/ls-files.c                               |   36 +-
>  builtin/update-index.c                           |   50 +-
>  cache-tree.c                                     |   15 +-
>  cache-tree.h                                     |    2 +
>  cache.h                                          |  115 +-
>  lockfile.c                                       |    2 +-
>  read-cache-v2.c                                  |  561 +++++++++
>  read-cache-v5.c                                  | 1406 ++++++++++++++++++++++
>  read-cache.c                                     |  691 +++--------
>  read-cache.h                                     |   67 ++
>  resolve-undo.c                                   |    1 +
>  t/perf/p0003-index.sh                            |   74 ++
>  t/t1600-index-v5.sh                              |   25 +
>  t/t2101-update-index-reupdate.sh                 |   12 +-
>  t/test-lib-functions.sh                          |    5 +
>  t/test-lib.sh                                    |    3 +
>  test-index-version.c                             |    6 +
>  unpack-trees.c                                   |    3 +-
>  24 files changed, 2921 insertions(+), 589 deletions(-)
>  create mode 100644 Documentation/technical/index-file-format-v5.txt
>  create mode 100644 read-cache-v2.c
>  create mode 100644 read-cache-v5.c
>  create mode 100644 read-cache.h
>  create mode 100755 t/perf/p0003-index.sh
>  create mode 100755 t/t1600-index-v5.sh
>
> --
> 1.8.4.2
>

--
Thomas
