From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH v6 4/4] last-modified: use Bloom filters when available
Date: Thu, 31 Jul 2025 08:43:09 +0200 [thread overview]
Message-ID: <aIsQfZyMq0ltbMtz@pks.im> (raw)
In-Reply-To: <20250730175510.987383-5-toon@iotcl.com>
On Wed, Jul 30, 2025 at 07:55:10PM +0200, Toon Claes wrote:
> Our 'git last-modified' performs a revision walk, and computes a diff at
> each point in the walk to figure out whether a given revision changed
> any of the paths it considers interesting.
>
> When changed-path Bloom filters are available, we can avoid computing
> many such diffs. Before computing a diff, we first check if any of the
> remaining paths of interest were possibly changed at a given commit by
> consulting its Bloom filter. If any of them are, we are resigned to
> compute the diff.
>
> If none of those queries returned "maybe", we know that the given commit
> doesn't contain any changed paths which are interesting to us. So, we
> can avoid computing it in this case.
>
> Comparing the perf test results on git.git:
>
> Test HEAD~ HEAD
> ------------------------------------------------------------------------------------
> 8020.1: top-level last-modified 4.49(4.34+0.11) 2.22(2.05+0.09) -50.6%
> 8020.2: top-level recursive last-modified 5.64(5.45+0.11) 5.62(5.30+0.11) -0.4%
> 8020.3: subdir last-modified 0.11(0.06+0.04) 0.07(0.03+0.04) -36.4%
Nice results.
> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index e4c73464c7..19bf25f8a5 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -179,6 +192,27 @@ static void last_modified_diff(struct diff_queue_struct *q,
> }
> }
>
> +static int maybe_changed_path(struct last_modified *lm, struct commit *origin)
> +{
> + struct bloom_filter *filter;
> + struct last_modified_entry *ent;
> + struct hashmap_iter iter;
> +
> + if (!lm->rev.bloom_filter_settings)
> + return 1;
> +
> + filter = get_bloom_filter(lm->rev.repo, origin);
> + if (!filter)
> + return 1;
> +
> + hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
> + if (bloom_filter_contains(filter, &ent->key,
> + lm->rev.bloom_filter_settings))
> + return 1;
> + }
> + return 0;
> +}
This function is basically the same as `maybe_changed_paths()` in
"blame.c", but that isn't a huge issue from my point of view. What makes
me wonder though is why we have an additional check over there for
whether or not the commit has a valid generation number.
> @@ -227,6 +264,9 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
> return argc;
> }
>
> + prepare_commit_graph(lm->rev.repo);
> + lm->rev.bloom_filter_settings = get_bloom_filter_settings(lm->rev.repo);
> +
So this here is why we export `prepare_commit_graph()`. How about we
instead expose `bloom_filters_enabled()` that mirrors what we do in
`generation_numbers_enabled()` and `corrected_commit_dates_enabled()`?
That would both be on a higher level and do exactly what we want to
achieve.
Patrick
next prev parent reply other threads:[~2025-07-31 6:43 UTC|newest]
Thread overview: 135+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19 ` Junio C Hamano
2025-05-07 13:13 ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` [PATCH RFC 3/5] blame-tree: use Bloom filters when available Toon Claes
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22 ` Toon Claes
2025-05-07 20:23 ` Marc Branchaud
2025-05-07 20:45 ` Junio C Hamano
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 14:26 ` Junio C Hamano
2025-05-08 15:12 ` Marc Branchaud
2025-05-14 14:42 ` Toon Claes
2025-05-14 19:29 ` Junio C Hamano
2025-05-14 21:15 ` Marc Branchaud
2025-05-15 13:29 ` Patrick Steinhardt
2025-05-15 16:39 ` Junio C Hamano
2025-05-15 17:39 ` Marc Branchaud
2025-05-15 19:30 ` Jeff King
2025-05-16 4:38 ` Patrick Steinhardt
2025-05-20 8:49 ` Toon Claes
2025-05-15 17:30 ` Marc Branchaud
2025-05-16 4:30 ` Patrick Steinhardt
2025-05-14 21:15 ` Marc Branchaud
2025-05-07 20:49 ` Kristoffer Haugsbakk
2025-05-08 13:20 ` D. Ben Knoble
2025-05-08 13:26 ` Marc Branchaud
2025-05-08 13:18 ` D. Ben Knoble
2025-05-23 9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07 ` Justin Tobler
2025-06-05 8:32 ` Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-06-13 9:34 ` Toon Claes
2025-06-13 9:52 ` Kristoffer Haugsbakk
2025-05-23 9:33 ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40 ` Patrick Steinhardt
2025-06-13 11:05 ` Toon Claes
2025-05-23 9:33 ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-05-23 9:33 ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39 ` Patrick Steinhardt
2025-07-01 20:35 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06 ` Junio C Hamano
2025-07-01 21:30 ` Kristoffer Haugsbakk
2025-07-02 13:00 ` Toon Claes
2025-07-09 15:53 ` Toon Claes
2025-07-09 17:00 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20 ` Kristoffer Haugsbakk
2025-07-02 11:51 ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49 ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01 ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26 ` [PATCH v4 " Toon Claes
2025-07-09 21:57 ` Junio C Hamano
2025-07-10 18:37 ` Junio C Hamano
2025-07-16 13:32 ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35 ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18 0:02 ` Taylor Blau
2025-07-19 6:44 ` Jeff King
2025-07-22 15:50 ` Toon Claes
2025-08-01 9:09 ` Christian Couder
2025-08-01 16:59 ` Junio C Hamano
2025-07-16 13:35 ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18 0:08 ` Taylor Blau
2025-07-22 15:52 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18 0:16 ` Taylor Blau
2025-07-22 16:02 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-16 13:35 ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09 ` Junio C Hamano
2025-07-17 16:31 ` Toon Claes
2025-07-17 22:37 ` Junio C Hamano
2025-07-18 17:36 ` Junio C Hamano
2025-07-22 16:06 ` Toon Claes
2025-07-16 13:42 ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39 ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35 ` Toon Claes
2025-07-30 17:59 ` Toon Claes
2025-07-31 7:45 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40 ` Junio C Hamano
2025-07-31 23:57 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34 ` Patrick Steinhardt
2025-08-05 16:21 ` Junio C Hamano
2025-08-05 16:34 ` Junio C Hamano
2025-08-05 16:55 ` Toon Claes
2025-08-05 17:20 ` Jean-Noël AVILA
2025-08-05 21:46 ` Junio C Hamano
2025-08-06 12:01 ` Toon Claes
2025-08-06 15:38 ` Junio C Hamano
2025-08-28 22:44 ` Junio C Hamano
2025-08-05 18:28 ` Junio C Hamano
2025-08-05 9:33 ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05 9:33 ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05 9:33 ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55 ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-08-01 16:22 ` Toon Claes
2025-08-01 17:09 ` Junio C Hamano
2025-08-04 6:34 ` Patrick Steinhardt
2025-08-04 17:14 ` Junio C Hamano
2025-08-05 5:35 ` Toon Claes
2025-08-01 20:34 ` Jean-Noël AVILA
2025-08-05 5:36 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-08-01 10:18 ` Christian Couder
2025-08-01 10:22 ` Patrick Steinhardt
2025-08-01 17:06 ` Junio C Hamano
2025-08-02 8:18 ` Christian Couder
2025-08-02 11:31 ` Christian Couder
2025-08-02 13:38 ` Christian Couder
2025-08-02 16:26 ` Junio C Hamano
2025-08-04 6:35 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55 ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31 6:42 ` Patrick Steinhardt
2025-07-30 17:55 ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31 6:43 ` Patrick Steinhardt [this message]
2025-08-01 16:23 ` Toon Claes
2025-08-04 6:33 ` Patrick Steinhardt
2025-07-09 15:26 ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26 ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26 ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35 ` [PATCH v5 6/6] fixup! " Toon Claes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aIsQfZyMq0ltbMtz@pks.im \
--to=ps@pks.im \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=me@ttaylorr.com \
--cc=stolee@gmail.com \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.