git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org,  johannes.schindelin@gmx.de,  peff@peff.net,
	ps@pks.im,  me@ttaylorr.com,  johncai86@gmail.com,
	 newren@gmail.com
Subject: Re: [PATCH 0/4] pack-objects: create new name-hash algorithm
Date: Tue, 10 Sep 2024 13:36:22 -0700	[thread overview]
Message-ID: <xmqqikv3dz5l.fsf@gitster.g> (raw)
In-Reply-To: <xmqq4j6nlcfy.fsf@gitster.g> (Junio C. Hamano's message of "Tue, 10 Sep 2024 09:07:29 -0700")

Junio C Hamano <gitster@pobox.com> writes:

> Derrick Stolee <stolee@gmail.com> writes:
>
>> The thing that surprised me is just how effective this is for the
>> creation of large pack-files that include many versions of most
>> files. The cross-path deltas have less of an effect here, and the
>> benefits of avoiding name-hash collisions can be overwhelming in
>> many cases.
>
> Yes, "make sure we notice a file F moving from directory A to B" is
> inherently optimized for short span of history, i.e. a smallish push
> rather than a whole history clone, where the definition of
> "smallish" is that even if you create optimal delta chains, the
> length of these delta chains will not exceed the "--depth" option.
>
> If the history you are pushing modified A/F twice, renamed it to B/F
> (with or without modification at the same time), then modified B/F
> twice more, you'd want to pack the 5-commit segment and having to
> artificially cut the delta chain that can contain all of these 5
> blobs into two at the renaming commit is a huge loss.

Which actually leads me to suspect that we probably do not even have
to expose the --full-name-hash option to the end users in "git repack".

If we are doing incremental that would fit within the depth setting,
it is likely that we would be better off without the full-name-hash
optimization, and if we are doing "repack -a" for the whole
repository, especially with "-f", it would make sense to do the
full-name-hash optimization.

If we can tell how large a chunk of history we are packing before we
actually start calling builtin/pack-objects.c:add_object_entry(), we
probably should be able to even select between with and without
full-name-hash automatically, but I do not think we know the object
count before we finish calling add_object_entry(), so unless we are
willing to compute and keep both while reading and pick between the
two after we finish reading the list of objects, or something, it
will require a major surgery to do so, I am afraid.


  reply	other threads:[~2024-09-10 20:36 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-09 13:56 [PATCH 0/4] pack-objects: create new name-hash algorithm Derrick Stolee via GitGitGadget
2024-09-09 13:56 ` [PATCH 1/4] pack-objects: add --full-name-hash option Derrick Stolee via GitGitGadget
2024-09-09 17:45   ` Junio C Hamano
2024-09-10  2:31     ` Derrick Stolee
2024-09-09 13:56 ` [PATCH 2/4] git-repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-09-09 17:48   ` Junio C Hamano
2024-09-09 13:56 ` [PATCH 3/4] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-09-09 13:56 ` [PATCH 4/4] p5314: add a size test for name-hash collisions Derrick Stolee via GitGitGadget
2024-09-09 16:07 ` [PATCH 0/4] pack-objects: create new name-hash algorithm Derrick Stolee
2024-09-09 18:06 ` Junio C Hamano
2024-09-10  2:37   ` Derrick Stolee
2024-09-10 14:56     ` Taylor Blau
2024-09-10 21:05       ` Derrick Stolee
2024-09-11  6:32         ` Jeff King
2024-09-10 16:07     ` Junio C Hamano
2024-09-10 20:36       ` Junio C Hamano [this message]
2024-09-10 21:09         ` Derrick Stolee
2024-09-18 20:46 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2024-09-18 20:46   ` [PATCH v2 1/6] pack-objects: add --full-name-hash option Derrick Stolee via GitGitGadget
2024-09-19 21:56     ` Junio C Hamano
2024-09-18 20:46   ` [PATCH v2 2/6] repack: test " Derrick Stolee via GitGitGadget
2024-09-19 22:11     ` Junio C Hamano
2024-09-18 20:46   ` [PATCH v2 3/6] pack-objects: add GIT_TEST_FULL_NAME_HASH Derrick Stolee via GitGitGadget
2024-09-19 22:22     ` Junio C Hamano
2024-09-23  1:39       ` Derrick Stolee
2024-09-18 20:46   ` [PATCH v2 4/6] git-repack: update usage to match docs Derrick Stolee via GitGitGadget
2024-09-18 20:46   ` [PATCH v2 5/6] p5313: add size comparison test Derrick Stolee via GitGitGadget
2024-09-19 21:58     ` Junio C Hamano
2024-09-23  1:50       ` Derrick Stolee
2024-09-23 16:14         ` Junio C Hamano
2024-09-18 20:46   ` [PATCH v2 6/6] test-tool: add helper for name-hash values Derrick Stolee via GitGitGadget
2024-09-24  7:02     ` Patrick Steinhardt
2024-09-25 13:35       ` Derrick Stolee
2024-09-18 23:30   ` [PATCH v2 0/6] pack-objects: create new name-hash algorithm Derrick Stolee
2024-10-04 21:17   ` Junio C Hamano
2024-10-04 22:30     ` Taylor Blau
2024-10-04 22:35     ` Jeff King
2024-10-08 18:32   ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqikv3dz5l.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=johncai86@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).