All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  johannes.schindelin@gmx.de,  peff@peff.net,
	ps@pks.im,  me@ttaylorr.com,  johncai86@gmail.com,
	 newren@gmail.com, Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH v2 5/6] p5313: add size comparison test
Date: Thu, 19 Sep 2024 14:58:27 -0700	[thread overview]
Message-ID: <xmqqcykz5mrg.fsf@gitster.g> (raw)
In-Reply-To: <999b1d094241b0ba8d6924ac6976eafc64c7d4a6.1726692382.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Wed, 18 Sep 2024 20:46:20 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> As custom options are added to 'git pack-objects' and 'git repack' to
> adjust how compression is done, use this new performance test script to
> demonstrate their effectiveness in performance and size.
>
> The recently-added --full-name-hash option swaps the default name-hash
> algorithm with one that attempts to uniformly distribute the hashes
> based on the full path name instead of the last 16 characters.
>
> This has a dramatic effect on full repacks for repositories with many
> versions of most paths. It can have a negative impact on cases such as
> pushing a single change.
>
> This can be seen by running pt5313 on the open source fluentui
> repository [1]. Most commits will have this kind of output for the thin
> and big pack cases, though certain commits (such as [2]) will have
> problematic thin pack size for other reasons.
>
> [1] https://github.com/microsoft/fluentui
> [2] a637a06df05360ce5ff21420803f64608226a875
>
> Checked out at the parent of [2], I see the following statistics:
>
> Test                                           this tree
> ------------------------------------------------------------------
> 5313.2: thin pack                              0.02(0.01+0.01)
> 5313.3: thin pack size                                    1.1K
> 5313.4: thin pack with --full-name-hash        0.02(0.01+0.00)
> 5313.5: thin pack size with --full-name-hash              3.0K
> 5313.6: big pack                               1.65(3.35+0.24)
> 5313.7: big pack size                                    58.0M
> 5313.8: big pack with --full-name-hash         1.53(2.52+0.18)
> 5313.9: big pack size with --full-name-hash              57.6M
> 5313.10: repack                                176.52(706.60+3.53)
> 5313.11: repack size                                    446.7K
> 5313.12: repack with --full-name-hash          37.47(134.18+3.06)
> 5313.13: repack size with --full-name-hash              183.1K
>
> Note that this demonstrates a 3x size _increase_ in the case that
> simulates a small "git push". The size change is neutral on the case of
> pushing the difference between HEAD and HEAD~1000.
>
> However, the full repack case is both faster and more efficient.

Nice.

> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
>  t/perf/p5313-pack-objects.sh | 71 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>  create mode 100755 t/perf/p5313-pack-objects.sh

"wc -c" -> "test_file_size" or "test-tool path-utils file-size"?

  reply	other threads:[~2024-09-19 21:58 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-09 13:56 [PATCH 0/4] pack-objects: create new name-hash algorithm Derrick Stolee via GitGitGadget
2024-09-09 13:56 ` [PATCH 1/4] pack-objects: add --full-name-hash option Derrick Stolee via GitGitGadget
2024-09-09 17:45   ` Junio C Hamano
2024-09-10  2:31     ` Derrick Stolee
2024-09-09 13:56 ` [PATCH 2/4] git-repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-09-09 17:48   ` Junio C Hamano
2024-09-09 13:56 ` [PATCH 3/4] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-09-09 13:56 ` [PATCH 4/4] p5314: add a size test for name-hash collisions Derrick Stolee via GitGitGadget
2024-09-09 16:07 ` [PATCH 0/4] pack-objects: create new name-hash algorithm Derrick Stolee
2024-09-09 18:06 ` Junio C Hamano
2024-09-10  2:37   ` Derrick Stolee
2024-09-10 14:56     ` Taylor Blau
2024-09-10 21:05       ` Derrick Stolee
2024-09-11  6:32         ` Jeff King
2024-09-10 16:07     ` Junio C Hamano
2024-09-10 20:36       ` Junio C Hamano
2024-09-10 21:09         ` Derrick Stolee
2024-09-18 20:46 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2024-09-18 20:46   ` [PATCH v2 1/6] pack-objects: add --full-name-hash option Derrick Stolee via GitGitGadget
2024-09-19 21:56     ` Junio C Hamano
2024-09-18 20:46   ` [PATCH v2 2/6] repack: test " Derrick Stolee via GitGitGadget
2024-09-19 22:11     ` Junio C Hamano
2024-09-18 20:46   ` [PATCH v2 3/6] pack-objects: add GIT_TEST_FULL_NAME_HASH Derrick Stolee via GitGitGadget
2024-09-19 22:22     ` Junio C Hamano
2024-09-23  1:39       ` Derrick Stolee
2024-09-18 20:46   ` [PATCH v2 4/6] git-repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-09-18 20:46   ` [PATCH v2 5/6] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-09-19 21:58     ` Junio C Hamano [this message]
2024-09-23  1:50       ` Derrick Stolee
2024-09-23 16:14         ` Junio C Hamano
2024-09-18 20:46   ` [PATCH v2 6/6] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-09-24  7:02     ` Patrick Steinhardt
2024-09-25 13:35       ` Derrick Stolee
2024-09-18 23:30   ` [PATCH v2 0/6] pack-objects: create new name-hash algorithm Derrick Stolee
2024-10-04 21:17   ` Junio C Hamano
2024-10-04 22:30     ` Taylor Blau
2024-10-04 22:35     ` Jeff King
2024-10-08 18:32   ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqcykz5mrg.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=johncai86@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.