From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, johannes.schindelin@gmx.de, peff@peff.net,
	ps@pks.im, me@ttaylorr.com, johncai86@gmail.com,
	newren@gmail.com, Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH 5/7] p5313: add size comparison test
Date: Tue, 05 Nov 2024 03:05:05 +0000
Message-ID: <c14ef6879e451401381ebbdb8f30d33c8f56c25b.1730775908.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1823.git.1730775907.gitgitgadget@gmail.com>

From: Derrick Stolee <stolee@gmail.com>

As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effectiveness in performance and size.

The recently-added --full-name-hash option swaps the default name-hash
algorithm with one that attempts to uniformly distribute the hashes
based on the full path name instead of the last 16 characters.
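
To make the "last 16 characters" behavior concrete, here is a minimal
shell sketch of the default name hash. It assumes the default hash
matches pack_name_hash() in git's pack-objects.h (skip whitespace, shift
the running value right by two bits, add the new byte in the top eight
bits of a 32-bit value) and that paths are plain ASCII. The helper name
name_hash and the example paths are made up for illustration.

```sh
# Sketch only: assumed to mirror the default (pre --full-name-hash) hash.
# Prints the 32-bit hash of the path in $1 as eight hex digits.
name_hash () {
	name=$1
	hash=0
	while [ -n "$name" ]
	do
		# Peel off the first character of the remaining path.
		rest=${name#?}
		ch=${name%"$rest"}
		name=$rest
		# Whitespace does not contribute to the hash.
		case "$ch" in
		[[:space:]]) continue ;;
		esac
		c=$(printf '%d' "'$ch")
		# Each new byte pushes older ones two bits to the right, so a
		# byte is shifted out entirely after 16 later characters.
		hash=$(( ((hash >> 2) + (c << 24)) & 0xFFFFFFFF ))
	done
	printf '%08x\n' "$hash"
}

# Two hypothetical paths that differ only before their last 16 characters
# print the same value, so they compete for the same delta window.
name_hash packages/alpha/src/components/index.ts
name_hash packages/beta/src/components/index.ts
```

Repositories where many directories contain identically named files hit
this collision constantly, which is the case a full-path hash is meant
to help with.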

This has a dramatic effect on full repacks for repositories with many
versions of most paths. It can have a negative impact on cases such as
pushing a single change.

This can be seen by running p5313 on the open source fluentui
repository [1]. Most commits will have this kind of output for the thin
and big pack cases, though certain commits (such as [2]) will have
problematic thin pack size for other reasons.

[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875
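
For anyone who wants to reproduce the numbers, the following is a rough
sketch of one way to run the script. It assumes a built git.git
worktree containing this patch and relies on GIT_PERF_LARGE_REPO, the
variable that test_perf_large_repo reads; the helper name run_p5313 and
all paths are placeholders, not part of the patch.

```sh
# Hypothetical helper: run p5313 against a large repository clone.
#   $1 = path to a built git.git worktree that contains this patch
#   $2 = path to the repository to measure
run_p5313 () {
	git_src=$1
	large_repo=$2
	(
		cd "$git_src/t/perf" &&
		GIT_PERF_LARGE_REPO="$large_repo" ./p5313-pack-objects.sh
	)
}

# Example (placeholder paths), measuring at the parent of [2]:
#   git -C ~/src/fluentui checkout a637a06df05360ce5ff21420803f64608226a875^
#   run_p5313 ~/src/git ~/src/fluentui
```

The subshell keeps the caller's working directory unchanged, and the
helper returns non-zero if the perf directory cannot be entered.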

Checked out at the parent of [2], I see the following statistics:

Test                                               HEAD
---------------------------------------------------------------------
5313.2: thin pack                                  0.37(0.43+0.02)
5313.3: thin pack size                                        1.2M
5313.4: thin pack with --full-name-hash            0.06(0.09+0.02)
5313.5: thin pack size with --full-name-hash                 20.4K
5313.6: big pack                                   2.01(7.73+0.23)
5313.7: big pack size                                        20.3M
5313.8: big pack with --full-name-hash             1.32(2.77+0.27)
5313.9: big pack size with --full-name-hash                  19.9M
5313.10: shallow fetch pack                        1.40(3.01+0.08)
5313.11: shallow pack size                                   34.4M
5313.12: shallow pack with --full-name-hash        1.08(1.25+0.14)
5313.13: shallow pack size with --full-name-hash             35.4M
5313.14: repack                                    90.70(672.88+2.46)
5313.15: repack size                                        439.6M
5313.16: repack with --full-name-hash              18.53(123.41+2.53)
5313.17: repack size with --full-name-hash                  169.7M

In this case, we see positive behaviors such as a significant shrink in
the size of the thin pack and full repack. The big pack is slightly
smaller with --full-name-hash than without. The shallow pack is slightly
larger with --full-name-hash.
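
To put numbers on "significant" and "slightly", the relative change can
be computed from the sizes in the table. This is only a convenience
sketch; the helper name pct_change is made up, and both arguments must
be given in the same unit.

```sh
# Print the percentage change from size $1 to size $2 (same unit).
pct_change () {
	awk -v a="$1" -v b="$2" \
		'BEGIN { printf "%+.1f%%\n", (b - a) / a * 100 }'
}

pct_change 439.6 169.7	# full repack:  prints -61.4%
pct_change 34.4 35.4	# shallow pack: prints +2.9%
```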

In the case of the Git repository, these numbers show some of the issues
with this approach:

Test                                               HEAD
--------------------------------------------------------------------
5313.2: thin pack                                  0.00(0.00+0.00)
5313.3: thin pack size                                         589
5313.4: thin pack with --full-name-hash            0.00(0.00+0.00)
5313.5: thin pack size with --full-name-hash                 14.9K
5313.6: big pack                                   2.07(3.57+0.17)
5313.7: big pack size                                        17.6M
5313.8: big pack with --full-name-hash             2.00(3.07+0.19)
5313.9: big pack size with --full-name-hash                  17.9M
5313.10: shallow fetch pack                        1.41(2.23+0.06)
5313.11: shallow pack size                                   12.1M
5313.12: shallow pack with --full-name-hash        1.22(1.66+0.04)
5313.13: shallow pack size with --full-name-hash             12.4M
5313.14: repack                                    15.75(89.29+1.54)
5313.15: repack size                                        126.4M
5313.16: repack with --full-name-hash              15.56(89.78+1.32)
5313.17: repack size with --full-name-hash                  126.0M

The thin pack that simulates a push is much worse with --full-name-hash
in this case. The name hash values are doing a lot to assist with delta
bases, it seems. The big pack and shallow clone cases are slightly worse
with the --full-name-hash option. Only the full repack gains some
benefits in size.

The results are similar with the nodejs/node repo, except that the full
repack is faster but also larger with --full-name-hash:

Test                                               HEAD
---------------------------------------------------------------------
5313.2: thin pack                                  0.01(0.01+0.00)
5313.3: thin pack size                                        1.6K
5313.4: thin pack with --full-name-hash            0.01(0.00+0.00)
5313.5: thin pack size with --full-name-hash                  3.1K
5313.6: big pack                                   4.26(8.03+0.24)
5313.7: big pack size                                        56.0M
5313.8: big pack with --full-name-hash             4.16(6.55+0.22)
5313.9: big pack size with --full-name-hash                  56.2M
5313.10: shallow fetch pack                        7.67(11.80+0.29)
5313.11: shallow pack size                                  104.6M
5313.12: shallow pack with --full-name-hash        7.52(9.65+0.23)
5313.13: shallow pack size with --full-name-hash            105.9M
5313.14: repack                                    71.22(317.61+3.95)
5313.15: repack size                                        739.9M
5313.16: repack with --full-name-hash              48.85(267.02+3.72)
5313.17: repack size with --full-name-hash                  793.5M

The Linux kernel repository was the initial target of the default name
hash value, and its naming conventions are practically built to take the
most advantage of the default name hash values:

Test                                               HEAD
-------------------------------------------------------------------------
5313.2: thin pack                                  0.15(0.01+0.03)
5313.3: thin pack size                                        4.6K
5313.4: thin pack with --full-name-hash            0.03(0.02+0.01)
5313.5: thin pack size with --full-name-hash                  6.8K
5313.6: big pack                                   18.51(33.74+0.95)
5313.7: big pack size                                       201.1M
5313.8: big pack with --full-name-hash             16.01(29.81+0.88)
5313.9: big pack size with --full-name-hash                 202.1M
5313.10: shallow fetch pack                        11.49(17.61+0.54)
5313.11: shallow pack size                                  269.2M
5313.12: shallow pack with --full-name-hash        11.24(15.25+0.56)
5313.13: shallow pack size with --full-name-hash            269.8M
5313.14: repack                                    1001.25(2271.06+38.86)
5313.15: repack size                                          2.5G
5313.16: repack with --full-name-hash              625.75(1941.96+36.09)
5313.17: repack size with --full-name-hash                    2.6G

Finally, an internal JavaScript repo of moderate size shows significant
gains when repacking with --full-name-hash because it has many name
hash collisions. Only the full repack case improves enough to be worth
it, but that improvement is significant: 6.4 GB to 862 MB.

Test                                               HEAD
--------------------------------------------------------------------------
5313.2: thin pack                                  0.03(0.02+0.00)
5313.3: thin pack size                                        1.2K
5313.4: thin pack with --full-name-hash            0.03(0.03+0.00)
5313.5: thin pack size with --full-name-hash                  2.6K
5313.6: big pack                                   2.20(3.23+0.30)
5313.7: big pack size                                       130.7M
5313.8: big pack with --full-name-hash             2.33(3.17+0.34)
5313.9: big pack size with --full-name-hash                 131.0M
5313.10: shallow fetch pack                        3.56(6.02+0.32)
5313.11: shallow pack size                                   44.5M
5313.12: shallow pack with --full-name-hash        2.94(3.94+0.32)
5313.13: shallow pack size with --full-name-hash             45.3M
5313.14: repack                                    2435.22(12523.11+23.53)
5313.15: repack size                                          6.4G
5313.16: repack with --full-name-hash              473.25(1805.11+17.22)
5313.17: repack size with --full-name-hash                  861.9M

These tests demonstrate that it is important to be careful about which
cases are best for using the --full-name-hash option.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 t/perf/p5313-pack-objects.sh | 94 ++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)
 create mode 100755 t/perf/p5313-pack-objects.sh

diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
new file mode 100755
index 00000000000..dfa29695315
--- /dev/null
+++ b/t/perf/p5313-pack-objects.sh
@@ -0,0 +1,94 @@
+#!/bin/sh
+
+test_description='Tests pack performance using bitmaps'
+. ./perf-lib.sh
+
+GIT_TEST_PASSING_SANITIZE_LEAK=0
+export GIT_TEST_PASSING_SANITIZE_LEAK
+
+test_perf_large_repo
+
+test_expect_success 'create rev input' '
+	cat >in-thin <<-EOF &&
+	$(git rev-parse HEAD)
+	^$(git rev-parse HEAD~1)
+	EOF
+
+	cat >in-big <<-EOF &&
+	$(git rev-parse HEAD)
+	^$(git rev-parse HEAD~1000)
+	EOF
+
+	cat >in-shallow <<-EOF
+	$(git rev-parse HEAD)
+	--shallow $(git rev-parse HEAD)
+	EOF
+'
+
+test_perf 'thin pack' '
+	git pack-objects --thin --stdout --revs --sparse  <in-thin >out
+'
+
+test_size 'thin pack size' '
+	test_file_size out
+'
+
+test_perf 'thin pack with --full-name-hash' '
+	git pack-objects --thin --stdout --revs --sparse --full-name-hash <in-thin >out
+'
+
+test_size 'thin pack size with --full-name-hash' '
+	test_file_size out
+'
+
+test_perf 'big pack' '
+	git pack-objects --stdout --revs --sparse  <in-big >out
+'
+
+test_size 'big pack size' '
+	test_file_size out
+'
+
+test_perf 'big pack with --full-name-hash' '
+	git pack-objects --stdout --revs --sparse --full-name-hash <in-big >out
+'
+
+test_size 'big pack size with --full-name-hash' '
+	test_file_size out
+'
+
+test_perf 'shallow fetch pack' '
+	git pack-objects --stdout --revs --sparse --shallow <in-shallow >out
+'
+
+test_size 'shallow pack size' '
+	test_file_size out
+'
+
+test_perf 'shallow pack with --full-name-hash' '
+	git pack-objects --stdout --revs --sparse --shallow --full-name-hash <in-shallow >out
+'
+
+test_size 'shallow pack size with --full-name-hash' '
+	test_file_size out
+'
+
+test_perf 'repack' '
+	git repack -adf
+'
+
+test_size 'repack size' '
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
+'
+
+test_perf 'repack with --full-name-hash' '
+	git repack -adf --full-name-hash
+'
+
+test_size 'repack size with --full-name-hash' '
+	pack=$(ls .git/objects/pack/pack-*.pack) &&
+	test_file_size "$pack"
+'
+
+test_done
-- 
gitgitgadget


