From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Derrick Stolee <stolee@gmail.com>, Taylor Blau <me@ttaylorr.com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, johannes.schindelin@gmx.de, ps@pks.im,
johncai86@gmail.com, newren@gmail.com,
christian.couder@gmail.com, kristofferhaugsbakk@fastmail.com,
jonathantanmy@google.com
Subject: Re: [PATCH 0/6] PATH WALK I: The path-walk API
Date: Tue, 12 Nov 2024 07:29:06 +0900
Message-ID: <xmqq1pzhbe19.fsf@gitster.g>
In-Reply-To: <20241111215502.GC5019@coredump.intra.peff.net> (Jeff King's message of "Mon, 11 Nov 2024 16:55:02 -0500")
Jeff King <peff@peff.net> writes:
>> Yes. Due to --depth limit, we need to break delta chains somewhere
>> anyway, and a rename boundary is just as good a place as any other in
>> a sufficiently long chain.
>
> We don't necessarily have to break the chains due to depth limits,
> because they are not always linear. They can end up as bushy trees,

True. And being able to pair blobs before and after a rename will
give us more candidates to place in a single bushy tree, so in that
sense, with a short segment of history, it is understandable that
the full-name hash fails to find as many candidates as the original
hash gives us. But with a sufficiently large number of similar blobs
at the same path (i.e. not "pushing a short segment of history", but
an initial clone), splitting what could be one delta family into two
at the rename boundary is not too bad, as long as both halves have
enough blobs to deltify against each other.

> I'm not sure in practice how often we find these kinds of deltas. If you
> look at, say, all the deltas for "Makefile" in git.git like this:
>
>   git rev-list --objects --all --reflog --full-history -- Makefile |
>   perl -lne 'print $1 if /(.*) Makefile/' |
>   git cat-file --batch-check='%(objectsize:disk) %(objectname) %(deltabase)'
>
> my repo has 33 full copies (you can see non-deltas by grepping for
> "0{40}$" in the output) out of 4473 total. So it's not like we never
> break chains. But we can use graphviz to visualize it by piping the
> above through:
>
>   perl -alne '
>     BEGIN { print "digraph {" }
>     print "node_$F[1] [label=$F[0]]";
>     print "node_$F[1] -> node_$F[2]" if $F[2] !~ /^0+$/;
>     END { print "}" }
>   '
>
> and then piping it through "dot" to make an svg, or using an interactive
> viewer like "xdot" (the labels are the on-disk size of each object). I
> see a lot of wide parts of the graph in the output.
>
> Of course this may all depend on packing patterns, too. I did my
> investigations after running "git repack -adf" to generate what should
> be a pretty reasonable pack. You might see something different from
> incremental repacking over time.

That is very true. I forgot that we do things to encourage bushy
delta-base selection. One thing I am also happy to see is the
effect of our "clever" delta-base selection, where the algorithm
does not blindly favor the delta-base that makes the resulting delta
absolutely minimal, but takes the depth of the delta-base into
account (i.e. a base at a much shallower depth is preferred over a
base near the depth limit, even if it results in a slightly larger
delta).
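
To illustrate that trade-off, here is a minimal Python sketch. It is
not the code in pack-objects: the `Candidate` type, `pick_base()` and
the scaling formula are all made up for illustration, assuming only
that a base sitting nearer the depth limit has to earn its place with a
proportionally smaller delta.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A hypothetical delta-base candidate for one target object."""
    name: str
    delta_size: int  # bytes needed to encode the target against this base
    depth: int       # how deep the base already sits in its own chain


def pick_base(candidates: Sequence[Candidate], max_depth: int) -> Optional[Candidate]:
    """Pick a base by delta size, penalizing bases close to the depth limit.

    The penalty (an assumption of this sketch) scales the delta size by
    max_depth / (max_depth - depth), so a base at depth 0 pays nothing
    extra and a base one step short of the limit pays a factor of
    max_depth.
    """
    best = None
    best_cost = None
    for cand in candidates:
        if cand.depth >= max_depth:
            # Using this base would push the target past the limit.
            continue
        cost = cand.delta_size * max_depth / (max_depth - cand.depth)
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best


if __name__ == "__main__":
    choices = [
        Candidate("shallow", delta_size=1100, depth=1),
        Candidate("deep", delta_size=1000, depth=45),
    ]
    # "deep" yields the smaller delta, but "shallow" wins once depth counts.
    print(pick_base(choices, max_depth=50).name)
```

With a rule of this shape the target attaches next to the root of the
family rather than at the end of a long chain, which is what makes
the resulting graph wide instead of tall.
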
> I'm not sure what any of this means for --path-walk, of course. ;)
> Ultimately we care about resulting size and time to compute, so if it
> can do better on those metrics then it doesn't matter what the graph
> looks like.

True, too. Another thing that we care about is the time to access
data, and favoring shallow delta chains, even with the help of the
in-core delta-base cache, has merit.
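
One way to put numbers on how deep and how bushy the chains are is to
summarize the "size objectname deltabase" lines produced by the
cat-file invocation quoted above. The following is only a sketch; the
function name `summarize()` and the keys it returns are my own, and
it treats a base that is not among the listed objects as the end of
the chain, so the depth it reports is a lower bound.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Summarize '<disk-size> <objectname> <deltabase>' lines."""
    base_of: Dict[str, Optional[str]] = {}
    fanout: Dict[str, int] = defaultdict(int)

    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _size, oid, base = fields
        if set(base) == {"0"}:
            base_of[oid] = None      # stored in full, not as a delta
        else:
            base_of[oid] = base
            fanout[base] += 1        # one more delta hangs off this base

    def depth(oid: str) -> int:
        # Follow delta bases until we reach a full copy or an
        # object that was not part of the input.
        steps = 0
        while base_of.get(oid) is not None:
            oid = base_of[oid]
            steps += 1
        return steps

    return {
        "total": len(base_of),
        "full_copies": sum(1 for b in base_of.values() if b is None),
        "max_depth": max((depth(o) for o in base_of), default=0),
        "max_fanout": max(fanout.values(), default=0),
    }
```

Calling `summarize(sys.stdin)` at the end of that pipeline gives the
number of full copies next to the deepest chain and the widest
fan-out, which is a rough stand-in for eyeballing the graph.
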
Thanks.