All of lore.kernel.org
 help / color / mirror / Atom feed
From: Toon Claes <toon@iotcl.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
	Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
	Christian Couder <christian.couder@gmail.com>
Subject: Re: [PATCH v6 4/4] last-modified: use Bloom filters when available
Date: Fri, 01 Aug 2025 18:23:08 +0200	[thread overview]
Message-ID: <87ldo3ui37.fsf@iotcl.com> (raw)
In-Reply-To: <aIsQfZyMq0ltbMtz@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> On Wed, Jul 30, 2025 at 07:55:10PM +0200, Toon Claes wrote:
>> Our 'git last-modified' performs a revision walk, and computes a diff at
>> each point in the walk to figure out whether a given revision changed
>> any of the paths it considers interesting.
>> 
>> When changed-path Bloom filters are available, we can avoid computing
>> many such diffs. Before computing a diff, we first check if any of the
>> remaining paths of interest were possibly changed at a given commit by
>> consulting its Bloom filter. If any of them are, we are resigned to
>> compute the diff.
>> 
>> If none of those queries returned "maybe", we know that the given commit
>> doesn't contain any changed paths which are interesting to us. So, we
>> can avoid computing it in this case.
>> 
>> Comparing the perf test results on git.git:
>> 
>>     Test                                        HEAD~             HEAD
>>     ------------------------------------------------------------------------------------
>>     8020.1: top-level last-modified             4.49(4.34+0.11)   2.22(2.05+0.09) -50.6%
>>     8020.2: top-level recursive last-modified   5.64(5.45+0.11)   5.62(5.30+0.11) -0.4%
>>     8020.3: subdir last-modified                0.11(0.06+0.04)   0.07(0.03+0.04) -36.4%
>
> Nice results.
>
>> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
>> index e4c73464c7..19bf25f8a5 100644
>> --- a/builtin/last-modified.c
>> +++ b/builtin/last-modified.c
>> @@ -179,6 +192,27 @@ static void last_modified_diff(struct diff_queue_struct *q,
>>  	}
>>  }
>>  
>> +static int maybe_changed_path(struct last_modified *lm, struct commit *origin)
>> +{
>> +	struct bloom_filter *filter;
>> +	struct last_modified_entry *ent;
>> +	struct hashmap_iter iter;
>> +
>> +	if (!lm->rev.bloom_filter_settings)
>> +		return 1;
>> +
>> +	filter = get_bloom_filter(lm->rev.repo, origin);
>> +	if (!filter)
>> +		return 1;
>> +
>> +	hashmap_for_each_entry(&lm->paths, &iter, ent, hashent) {
>> +		if (bloom_filter_contains(filter, &ent->key,
>> +					  lm->rev.bloom_filter_settings))
>> +			return 1;
>> +	}
>> +	return 0;
>> +}
>
> This function is basically the same as `maybe_changed_paths()` in
> "blame.c", but that isn't a huge issue from my point of view. What makes
> me wonder though is why we have an additional check over there for
> whether or not the commit has a valid generation number.

I've been asking me the same question. And I couldn't find a good reason
(neither from the commit history, or from my reasoning). This check was
in the version shared by Taylor, but because we were ignoring the return
value from generation_numbers_enabled() in that version, it didn't make
sense to me to do this check. That's why I removed it.

>> @@ -227,6 +264,9 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
>>  		return argc;
>>  	}
>>  
>> +	prepare_commit_graph(lm->rev.repo);
>> +	lm->rev.bloom_filter_settings = get_bloom_filter_settings(lm->rev.repo);
>> +
>
> So this here is why we export `prepare_commit_graph()`. How about we
> instead expose `bloom_filters_enabled()` that mirrors what we do in
> `generation_numbers_enabled()` and `corrected_commit_dates_enabled()`?
> That would both be on a higher level and do exactly what we want to
> achieve.

I've got another proposal, what if we let get_bloom_filter_settings()
call prepare_commit_graph()? Functions like
repo_find_commit_pos_in_graph() and lookup_commit_in_graph() do this
too. 

-- 
Cheers,
Toon

  reply	other threads:[~2025-08-01 16:23 UTC|newest]

Thread overview: 135+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19   ` Junio C Hamano
2025-05-07 13:13     ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` [PATCH RFC 3/5] blame-tree: use Bloom filters when available Toon Claes
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22   ` Toon Claes
2025-05-07 20:23     ` Marc Branchaud
2025-05-07 20:45       ` Junio C Hamano
2025-05-08 13:26         ` Marc Branchaud
2025-05-08 14:26           ` Junio C Hamano
2025-05-08 15:12             ` Marc Branchaud
2025-05-14 14:42               ` Toon Claes
2025-05-14 19:29                 ` Junio C Hamano
2025-05-14 21:15                   ` Marc Branchaud
2025-05-15 13:29                     ` Patrick Steinhardt
2025-05-15 16:39                       ` Junio C Hamano
2025-05-15 17:39                         ` Marc Branchaud
2025-05-15 19:30                           ` Jeff King
2025-05-16  4:38                             ` Patrick Steinhardt
2025-05-20  8:49                               ` Toon Claes
2025-05-15 17:30                       ` Marc Branchaud
2025-05-16  4:30                         ` Patrick Steinhardt
2025-05-14 21:15                 ` Marc Branchaud
2025-05-07 20:49       ` Kristoffer Haugsbakk
2025-05-08 13:20         ` D. Ben Knoble
2025-05-08 13:26         ` Marc Branchaud
2025-05-08 13:18       ` D. Ben Knoble
2025-05-23  9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07     ` Justin Tobler
2025-06-05  8:32       ` Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-06-13  9:34       ` Toon Claes
2025-06-13  9:52         ` Kristoffer Haugsbakk
2025-05-23  9:33   ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40     ` Patrick Steinhardt
2025-06-13 11:05       ` Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-05-23  9:33   ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-07-01 20:35   ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06     ` Junio C Hamano
2025-07-01 21:30       ` Kristoffer Haugsbakk
2025-07-02 13:00         ` Toon Claes
2025-07-09 15:53           ` Toon Claes
2025-07-09 17:00             ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49   ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20     ` Kristoffer Haugsbakk
2025-07-02 11:51     ` Junio C Hamano
2025-06-30 18:49   ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49   ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01   ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26   ` [PATCH v4 " Toon Claes
2025-07-09 21:57     ` Junio C Hamano
2025-07-10 18:37       ` Junio C Hamano
2025-07-16 13:32     ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35       ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18  0:02         ` Taylor Blau
2025-07-19  6:44           ` Jeff King
2025-07-22 15:50           ` Toon Claes
2025-08-01  9:09           ` Christian Couder
2025-08-01 16:59             ` Junio C Hamano
2025-07-16 13:35       ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18  0:08         ` Taylor Blau
2025-07-22 15:52           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18  0:16         ` Taylor Blau
2025-07-22 16:02           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50         ` Junio C Hamano
2025-07-17 16:31           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09         ` Junio C Hamano
2025-07-17 16:31           ` Toon Claes
2025-07-17 22:37         ` Junio C Hamano
2025-07-18 17:36           ` Junio C Hamano
2025-07-22 16:06             ` Toon Claes
2025-07-16 13:42       ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39       ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35         ` Toon Claes
2025-07-30 17:59           ` Toon Claes
2025-07-31  7:45             ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40         ` Junio C Hamano
2025-07-31 23:57           ` Junio C Hamano
2025-08-05  9:33         ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34           ` Patrick Steinhardt
2025-08-05 16:21             ` Junio C Hamano
2025-08-05 16:34           ` Junio C Hamano
2025-08-05 16:55             ` Toon Claes
2025-08-05 17:20               ` Jean-Noël AVILA
2025-08-05 21:46                 ` Junio C Hamano
2025-08-06 12:01                   ` Toon Claes
2025-08-06 15:38                     ` Junio C Hamano
2025-08-28 22:44                       ` Junio C Hamano
2025-08-05 18:28               ` Junio C Hamano
2025-08-05  9:33         ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05  9:33         ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05  9:33         ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55       ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31  6:42         ` Patrick Steinhardt
2025-08-01 16:22           ` Toon Claes
2025-08-01 17:09             ` Junio C Hamano
2025-08-04  6:34               ` Patrick Steinhardt
2025-08-04 17:14                 ` Junio C Hamano
2025-08-05  5:35                   ` Toon Claes
2025-08-01 20:34             ` Jean-Noël AVILA
2025-08-05  5:36               ` Toon Claes
2025-08-04  6:33             ` Patrick Steinhardt
2025-08-01 10:18         ` Christian Couder
2025-08-01 10:22           ` Patrick Steinhardt
2025-08-01 17:06             ` Junio C Hamano
2025-08-02  8:18               ` Christian Couder
2025-08-02 11:31                 ` Christian Couder
2025-08-02 13:38                   ` Christian Couder
2025-08-02 16:26                     ` Junio C Hamano
2025-08-04  6:35               ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55       ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31  6:42         ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31  6:43         ` Patrick Steinhardt
2025-08-01 16:23           ` Toon Claes [this message]
2025-08-04  6:33             ` Patrick Steinhardt
2025-07-09 15:26   ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26   ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26   ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35   ` [PATCH v5 6/6] fixup! " Toon Claes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87ldo3ui37.fsf@iotcl.com \
    --to=toon@iotcl.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kristofferhaugsbakk@fastmail.com \
    --cc=me@ttaylorr.com \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.