From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: "Toon Claes" <toon@iotcl.com>,
git@vger.kernel.org, "Patrick Steinhardt" <ps@pks.im>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH 0/8] Introduce git-blame-tree(1) command
Date: Tue, 1 Apr 2025 05:29:52 -0400 [thread overview]
Message-ID: <20250401092952.GD21089@coredump.intra.peff.net> (raw)
In-Reply-To: <Z+XJ+1L3PnC9Dyba@nand.local>
On Thu, Mar 27, 2025 at 05:58:19PM -0400, Taylor Blau wrote:
> On Thu, Mar 27, 2025 at 02:32:43AM -0400, Jeff King wrote:
> > The pathspec-trie stuff is, I think, still a reasonable idea for general
> > use. But IIRC, the rewritten blame-tree you guys worked on does not
> > benefit from it, because it ditches pathspecs entirely (both because
> > they're too slow without the tries, but also because it's important to
> > continually narrow the pathspec while traversing). That trie code was
> > never run in production, I think (and I see there is a patch to narrow
> > the pathspec while traversing; I suspect that likewise was never used).
>
> Yeah, the rewritten blame-tree code uses changed-path Bloom filters to
> narrow the set of revisions that we need to actually compute tree-diffs
> for.
>
> The general idea is that we have a set of paths that we have yet to
> blame, and those are the "interesting" ones. IOW, if a changed-path
> Bloom filter tells us that we are at some revision where there is maybe
> a change to one or more unblamed paths, we need to compute a tree-diff.
> But if the Bloom filter says "no", then we can skip the tree-diff at
> that layer entirely.
You'd still in theory benefit from the tree-diffs you _do_ run using a
continually narrowing pathspec. Skimming over the code from your
tb/blame-tree branch, it looks like it's just fed the original pathspec.
But that's probably good enough in practice. Especially for
non-recursive blame-trees, where pruning already-matched entries will
never save you from opening another tree anyway.
> > So yeah. I don't know if all of this is really a very good starting
> > point. Taylor, if you can share the current code that GitHub is running,
> > I think that would be beneficial for the community.
>
> Sure. You can fetch from the 'tb/blame-tree' branch from my tree (which
> is located at 'git@github.com:ttaylorr/git.git'). I owe a huge "thank
> you" to Victoria Dye, who split out the various topics from GitHub's
> fork into individual rebased branches.
Thanks. I don't have time to pick it up as a topic myself, but hopefully
it can be useful to Toon (or any others interested in the topic).
-Peff
next prev parent reply other threads:[~2025-04-01 9:29 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-26 20:18 [PATCH 0/8] Introduce git-blame-tree(1) command Toon Claes
2025-03-26 20:18 ` [PATCH 1/8] combine-diff: zero memory used for callback filepairs Toon Claes
2025-03-26 20:18 ` [PATCH 2/8] within_depth: fix return for empty path Toon Claes
2025-03-26 20:18 ` [PATCH 3/8] diff: teach tree-diff a max-depth parameter Toon Claes
2025-03-26 20:18 ` [PATCH 4/8] blame-tree: introduce new subcommand to blame files Toon Claes
2025-03-26 20:18 ` [PATCH 5/8] pathspec: add optional trie index Toon Claes
2025-03-26 20:18 ` [PATCH 6/8] pathspec: turn on tries when appropriate Toon Claes
2025-03-26 20:18 ` [PATCH 7/8] tree-diff: use pathspec tries Toon Claes
2025-03-26 20:18 ` [PATCH 8/8] blame-tree: narrow pathspec as we traverse Toon Claes
2025-03-26 20:38 ` [PATCH 0/8] Introduce git-blame-tree(1) command Taylor Blau
2025-03-27 6:32 ` Jeff King
2025-03-27 21:58 ` Taylor Blau
2025-03-28 13:27 ` Toon Claes
2025-04-01 9:29 ` Jeff King [this message]
2025-03-27 12:04 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250401092952.GD21089@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=ps@pks.im \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).