From: Taylor Blau <me@ttaylorr.com>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org,
Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
Derrick Stolee <stolee@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v5 3/6] last-modified: use Bloom filters when available
Date: Thu, 17 Jul 2025 20:16:27 -0400 [thread overview]
Message-ID: <aHmSW1pOUF0U3KFk@nand.local> (raw)
In-Reply-To: <20250716133518.1788126-3-toon@iotcl.com>
On Wed, Jul 16, 2025 at 03:35:15PM +0200, Toon Claes wrote:
> Our 'git last-modified' performs a revision walk, and computes a diff at
> each point in the walk to figure out whether a given revision changed
> any of the paths it considers interesting.
>
> When changed-path Bloom filters are available, we can avoid computing
> many such diffs. Before computing a diff, we first check if any of the
> remaining paths of interest were possibly changed at a given commit by
> consulting its Bloom filter. If any of them are, we are resigned to
> compute the diff.
>
> If none of those queries returned "maybe", we know that the given commit
> doesn't contain any changed paths which are interesting to us. So, we
> can avoid computing it in this case.
>
> Comparing the perf test results on git.git:
>
> Test HEAD~ HEAD
> ------------------------------------------------------------------------------------
> 8020.1: top-level last-modified 4.49(4.34+0.11) 2.22(2.05+0.09) -50.6%
> 8020.2: top-level recursive last-modified 5.64(5.45+0.11) 5.62(5.30+0.11) -0.4%
> 8020.3: subdir last-modified 0.11(0.06+0.04) 0.07(0.03+0.04) -36.4%
As an aside on 8020.3 (that I probably should have mentioned in the last
commit), I think that our "| head -n1" heuristic for picking a sub-tree
is skewing these results down. In git.git, the lexicographically
earliest sub-tree is ".github", which is awfully tiny. I wonder if we
should be grabbing the *last* sub-tree, or maybe the largest one by
count of entries?
> @@ -40,6 +43,12 @@ struct last_modified {
>
> static void last_modified_release(struct last_modified *lm)
> {
> + struct hashmap_iter iter;
> + struct last_modified_entry *ent;
> +
> + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent)
> + clear_bloom_key(&ent->key);
> +
I did a double-take to make sure that ent->key would always be
initialized here, but it is thanks to the FLEX_ALLOC_STR() call below.
> hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
> release_revisions(&lm->rev);
> }
> @@ -67,6 +76,9 @@ static void add_path_from_diff(struct diff_queue_struct *q,
>
> FLEX_ALLOC_STR(ent, path, path);
> oidcpy(&ent->oid, &p->two->oid);
> + if (lm->rev.bloom_filter_settings)
> + fill_bloom_key(path, strlen(path), &ent->key,
> + lm->rev.bloom_filter_settings);
> hashmap_entry_init(&ent->hashent, strhash(ent->path));
> hashmap_add(&lm->paths, &ent->hashent);
> }
> @@ -126,6 +138,7 @@ static void mark_path(const char *path, const struct object_id *oid,
> data->callback(path, data->commit, data->callback_data);
>
> hashmap_remove(data->paths, &ent->hashent, path);
> + clear_bloom_key(&ent->key);
OK, we're calling clear_bloom_key() here, too, but it uses
FREE_AND_NULL(), so calling it again in last_modified_release() may be a
noop, but won't ever be a double-free.
> @@ -238,6 +276,13 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
> return argc;
> }
>
> + /*
> + * We're not interested in generation numbers here,
> + * but calling this function to prepare the commit-graph.
> + */
> + (void)generation_numbers_enabled(lm->rev.repo);
> + lm->rev.bloom_filter_settings = get_bloom_filter_settings(lm->rev.repo);
Hmmph. I think when I originally wrote this I was using the side-effect
of calling generation_numbers_enabled() as a hack. But I think that it
may be worth making "prepare_commit_graph()" a non-static function and
calling that instead.
Thanks,
Taylor
next prev parent reply other threads:[~2025-07-18 0:16 UTC|newest]
Thread overview: 135+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19 ` Junio C Hamano
2025-05-07 13:13 ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` [PATCH RFC 3/5] blame-tree: use Bloom filters when available Toon Claes
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22 ` Toon Claes
2025-05-07 20:23 ` Marc Branchaud
2025-05-07 20:45 ` Junio C Hamano
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 14:26 ` Junio C Hamano
2025-05-08 15:12 ` Marc Branchaud
2025-05-14 14:42 ` Toon Claes
2025-05-14 19:29 ` Junio C Hamano
2025-05-14 21:15 ` Marc Branchaud
2025-05-15 13:29 ` Patrick Steinhardt
2025-05-15 16:39 ` Junio C Hamano
2025-05-15 17:39 ` Marc Branchaud
2025-05-15 19:30 ` Jeff King
2025-05-16 4:38 ` Patrick Steinhardt
2025-05-20 8:49 ` Toon Claes
2025-05-15 17:30 ` Marc Branchaud
2025-05-16 4:30 ` Patrick Steinhardt
2025-05-14 21:15 ` Marc Branchaud
2025-05-07 20:49 ` Kristoffer Haugsbakk
2025-05-08 13:20 ` D. Ben Knoble
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 13:18 ` D. Ben Knoble
2025-05-23 9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07 ` Justin Tobler
2025-06-05 8:32 ` Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-06-13 9:34 ` Toon Claes
2025-06-13 9:52 ` Kristoffer Haugsbakk
2025-05-23 9:33 ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40 ` Patrick Steinhardt
2025-06-13 11:05 ` Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-05-23 9:33 ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-07-01 20:35 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06 ` Junio C Hamano
2025-07-01 21:30 ` Kristoffer Haugsbakk
2025-07-02 13:00 ` Toon Claes
2025-07-09 15:53 ` Toon Claes
2025-07-09 17:00 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20 ` Kristoffer Haugsbakk
2025-07-02 11:51 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01 ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26 ` [PATCH v4 " Toon Claes
2025-07-09 21:57 ` Junio C Hamano
2025-07-10 18:37 ` Junio C Hamano
2025-07-16 13:32 ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35 ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18 0:02 ` Taylor Blau
2025-07-19 6:44 ` Jeff King
2025-07-22 15:50 ` Toon Claes
2025-08-01 9:09 ` Christian Couder
2025-08-01 16:59 ` Junio C Hamano
2025-07-16 13:35 ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18 0:08 ` Taylor Blau
2025-07-22 15:52 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18 0:16 ` Taylor Blau [this message]
2025-07-22 16:02 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-17 22:37 ` Junio C Hamano
2025-07-18 17:36 ` Junio C Hamano
2025-07-22 16:06 ` Toon Claes
2025-07-16 13:42 ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39 ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35 ` Toon Claes
2025-07-30 17:59 ` Toon Claes
2025-07-31 7:45 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40 ` Junio C Hamano
2025-07-31 23:57 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34 ` Patrick Steinhardt
2025-08-05 16:21 ` Junio C Hamano
2025-08-05 16:34 ` Junio C Hamano
2025-08-05 16:55 ` Toon Claes
2025-08-05 17:20 ` Jean-Noël AVILA
2025-08-05 21:46 ` Junio C Hamano
2025-08-06 12:01 ` Toon Claes
2025-08-06 15:38 ` Junio C Hamano
2025-08-28 22:44 ` Junio C Hamano
2025-08-05 18:28 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05 9:33 ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05 9:33 ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55 ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-08-01 16:22 ` Toon Claes
2025-08-01 17:09 ` Junio C Hamano
2025-08-04 6:34 ` Patrick Steinhardt
2025-08-04 17:14 ` Junio C Hamano
2025-08-05 5:35 ` Toon Claes
2025-08-01 20:34 ` Jean-Noël AVILA
2025-08-05 5:36 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-08-01 10:18 ` Christian Couder
2025-08-01 10:22 ` Patrick Steinhardt
2025-08-01 17:06 ` Junio C Hamano
2025-08-02 8:18 ` Christian Couder
2025-08-02 11:31 ` Christian Couder
2025-08-02 13:38 ` Christian Couder
2025-08-02 16:26 ` Junio C Hamano
2025-08-04 6:35 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55 ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31 6:43 ` Patrick Steinhardt
2025-08-01 16:23 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-07-09 15:26 ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26 ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26 ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35 ` [PATCH v5 6/6] fixup! " Toon Claes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aHmSW1pOUF0U3KFk@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=stolee@gmail.com \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).