git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Toon Claes <toon@iotcl.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>,
	 Derrick Stolee <stolee@gmail.com>, Toon Claes <toon@iotcl.com>
Subject: [PATCH RFC 3/5] blame-tree: use Bloom filters when available
Date: Tue, 22 Apr 2025 19:46:26 +0200	[thread overview]
Message-ID: <20250422-toon-new-blame-tree-v1-3-fdb51b8a394a@iotcl.com> (raw)
In-Reply-To: <20250422-toon-new-blame-tree-v1-0-fdb51b8a394a@iotcl.com>

From: Taylor Blau <me@ttaylorr.com>

Our 'git blame-tree' performs a revision walk, and computes a diff at
each point in the walk to figure out whether a given revision changed
any of the paths it considers interesting.

When changed-path Bloom filters are available, we can avoid computing
many such diffs. Before computing a diff, we first check if any of the
remaining paths of interest were possibly changed at a given commit by
consulting its Bloom filter. If any of them are, we are resigned to
compute the diff.

If none of those queries returned "maybe", we know that the given commit
doesn't contain any changed paths which are interesting to us. So, we
can avoid computing it in this case.

This results in a substantial performance speed-up in common cases of
'git blame-tree'. In the kernel, here is the before and after (all times
computed with best-of-five):

With commit-graphs (but no Bloom filters):

    real	0m5.133s
    user	0m4.942s
    sys	0m0.180s

...and with Bloom filters:

    real	0m0.936s
    user	0m0.842s
    sys	0m0.092s

These times are with my development-version of Git, so it's compiled
without optimizations. Compiling instead with `-O3`, the results look
even better:

    real	0m0.754s
    user	0m0.661s
    sys	0m0.092s

Signed-off-by: Toon Claes <toon@iotcl.com>
---
 blame-tree.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/blame-tree.c b/blame-tree.c
index ce57db2cfc..47354557a7 100644
--- a/blame-tree.c
+++ b/blame-tree.c
@@ -7,11 +7,15 @@
 #include "revision.h"
 #include "repository.h"
 #include "log-tree.h"
+#include "dir.h"
+#include "commit-graph.h"
+#include "bloom.h"
 
 struct blame_tree_entry {
 	struct hashmap_entry hashent;
 	struct object_id oid;
 	struct commit *commit;
+	struct bloom_key key;
 	const char path[FLEX_ARRAY];
 };
 
@@ -28,6 +32,9 @@ static void add_from_diff(struct diff_queue_struct *q,
 
 		FLEX_ALLOC_STR(ent, path, path);
 		oidcpy(&ent->oid, &p->two->oid);
+		if (bt->rev.bloom_filter_settings)
+			fill_bloom_key(path, strlen(path), &ent->key,
+				       bt->rev.bloom_filter_settings);
 		hashmap_entry_init(&ent->hashent, strhash(ent->path));
 		hashmap_add(&bt->paths, &ent->hashent);
 	}
@@ -92,12 +99,21 @@ void blame_tree_init(struct blame_tree *bt,
 	if (setup_revisions(argc, argv, &bt->rev, NULL) > 1)
 		die(_("unknown blame-tree argument: %s"), argv[1]);
 
+	(void)generation_numbers_enabled(bt->rev.repo);
+	bt->rev.bloom_filter_settings = get_bloom_filter_settings(bt->rev.repo);
+
 	if (add_from_revs(bt) < 0)
 		die(_("unable to setup blame-tree"));
 }
 
 void blame_tree_release(struct blame_tree *bt)
 {
+	struct hashmap_iter iter;
+	struct blame_tree_entry *ent;
+
+	hashmap_for_each_entry(&bt->paths, &iter, ent, hashent) {
+		clear_bloom_key(&ent->key);
+	}
 	hashmap_clear_and_free(&bt->paths, struct blame_tree_entry, hashent);
 	release_revisions(&bt->rev);
 }
@@ -137,6 +153,7 @@ static void mark_path(const char *path, const struct object_id *oid,
 		data->callback(path, data->commit, data->callback_data);
 
 	hashmap_remove(data->paths, &ent->hashent, path);
+	clear_bloom_key(&ent->key);
 	free(ent);
 }
 
@@ -180,6 +197,30 @@ static void blame_diff(struct diff_queue_struct *q,
 	}
 }
 
+static int maybe_changed_path(struct blame_tree *bt, struct commit *origin)
+{
+	struct bloom_filter *filter;
+	struct blame_tree_entry *e;
+	struct hashmap_iter iter;
+
+	if (!bt->rev.bloom_filter_settings)
+		return 1;
+
+	if (commit_graph_generation(origin) == GENERATION_NUMBER_INFINITY)
+		return 1;
+
+	filter = get_bloom_filter(bt->rev.repo, origin);
+	if (!filter)
+		return 1;
+
+	hashmap_for_each_entry(&bt->paths, &iter, e, hashent) {
+		if (bloom_filter_contains(filter, &e->key,
+					  bt->rev.bloom_filter_settings))
+			return 1;
+	}
+	return 0;
+}
+
 int blame_tree_run(struct blame_tree *bt, blame_tree_callback cb, void *cbdata)
 {
 	struct blame_tree_callback_data data;
@@ -199,6 +240,9 @@ int blame_tree_run(struct blame_tree *bt, blame_tree_callback cb, void *cbdata)
 		if (!data.commit)
 			break;
 
+		if (!maybe_changed_path(bt, data.commit))
+			continue;
+
 		if (data.commit->object.flags & BOUNDARY) {
 			diff_tree_oid(bt->rev.repo->hash_algo->empty_tree,
 				       &data.commit->object.oid,

-- 
2.49.0.rc2


  parent reply	other threads:[~2025-04-22 17:46 UTC|newest]

Thread overview: 135+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-22 17:46 [PATCH RFC 0/5] Introduce git-blame-tree(1) command Toon Claes
2025-04-22 17:46 ` [PATCH RFC 1/5] blame-tree: introduce new subcommand to blame files Toon Claes
2025-04-24 16:19   ` Junio C Hamano
2025-05-07 13:13     ` Toon Claes
2025-04-22 17:46 ` [PATCH RFC 2/5] t/perf: add blame-tree perf script Toon Claes
2025-04-22 17:46 ` Toon Claes [this message]
2025-04-22 17:46 ` [PATCH RFC 4/5] blame-tree: implement faster algorithm Toon Claes
2025-04-22 17:46 ` [PATCH RFC 5/5] blame-tree.c: initialize revision machinery without walk Toon Claes
2025-04-23 13:26 ` [PATCH RFC 0/5] Introduce git-blame-tree(1) command Marc Branchaud
2025-05-07 14:22   ` Toon Claes
2025-05-07 20:23     ` Marc Branchaud
2025-05-07 20:45       ` Junio C Hamano
2025-05-08 13:26         ` Marc Branchaud
2025-05-08 14:26           ` Junio C Hamano
2025-05-08 15:12             ` Marc Branchaud
2025-05-14 14:42               ` Toon Claes
2025-05-14 19:29                 ` Junio C Hamano
2025-05-14 21:15                   ` Marc Branchaud
2025-05-15 13:29                     ` Patrick Steinhardt
2025-05-15 16:39                       ` Junio C Hamano
2025-05-15 17:39                         ` Marc Branchaud
2025-05-15 19:30                           ` Jeff King
2025-05-16  4:38                             ` Patrick Steinhardt
2025-05-20  8:49                               ` Toon Claes
2025-05-15 17:30                       ` Marc Branchaud
2025-05-16  4:30                         ` Patrick Steinhardt
2025-05-14 21:15                 ` Marc Branchaud
2025-05-07 20:49       ` Kristoffer Haugsbakk
2025-05-08 13:20         ` D. Ben Knoble
2025-05-08 13:26         ` Marc Branchaud
2025-05-08 13:18       ` D. Ben Knoble
2025-05-23  9:33 ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 1/5] last-modified: new subcommand to show when files were last modified Toon Claes
2025-05-25 20:07     ` Justin Tobler
2025-06-05  8:32       ` Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-06-13  9:34       ` Toon Claes
2025-06-13  9:52         ` Kristoffer Haugsbakk
2025-05-23  9:33   ` [PATCH RFC v2 2/5] t/perf: add last-modified perf script Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 3/5] last-modified: use Bloom filters when available Toon Claes
2025-05-27 10:40     ` Patrick Steinhardt
2025-06-13 11:05       ` Toon Claes
2025-05-23  9:33   ` [PATCH RFC v2 4/5] last-modified: implement faster algorithm Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-05-23  9:33   ` [PATCH RFC v2 5/5] last-modified: initialize revision machinery without walk Toon Claes
2025-05-27 10:39     ` Patrick Steinhardt
2025-07-01 20:35   ` [PATCH RFC v2 0/5] Introduce git-last-modified(1) command Kristoffer Haugsbakk
2025-07-01 21:06     ` Junio C Hamano
2025-07-01 21:30       ` Kristoffer Haugsbakk
2025-07-02 13:00         ` Toon Claes
2025-07-09 15:53           ` Toon Claes
2025-07-09 17:00             ` Junio C Hamano
2025-06-30 18:49 ` [PATCH RFC v3 0/3] " Toon Claes
2025-06-30 18:49   ` [PATCH RFC v3 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-01 20:20     ` Kristoffer Haugsbakk
2025-07-02 11:51     ` Junio C Hamano
2025-06-30 18:49   ` [PATCH RFC v3 2/3] t/perf: add last-modified perf script Toon Claes
2025-06-30 18:49   ` [PATCH RFC v3 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-01 23:01   ` [PATCH RFC v3 0/3] Introduce git-last-modified(1) command Junio C Hamano
2025-07-09 15:26   ` [PATCH v4 " Toon Claes
2025-07-09 21:57     ` Junio C Hamano
2025-07-10 18:37       ` Junio C Hamano
2025-07-16 13:32     ` [PATCH v5 0/6] " Toon Claes
2025-07-16 13:35       ` [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-18  0:02         ` Taylor Blau
2025-07-19  6:44           ` Jeff King
2025-07-22 15:50           ` Toon Claes
2025-08-01  9:09           ` Christian Couder
2025-08-01 16:59             ` Junio C Hamano
2025-07-16 13:35       ` [PATCH v5 2/6] t/perf: add last-modified perf script Toon Claes
2025-07-18  0:08         ` Taylor Blau
2025-07-22 15:52           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 3/6] last-modified: use Bloom filters when available Toon Claes
2025-07-18  0:16         ` Taylor Blau
2025-07-22 16:02           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 4/6] pretty: allow caller to disable indentation Toon Claes
2025-07-16 15:50         ` Junio C Hamano
2025-07-17 16:31           ` Toon Claes
2025-07-16 13:35       ` [PATCH v5 5/6] last-modified: support --extended format Toon Claes
2025-07-16 16:09         ` Junio C Hamano
2025-07-17 16:31           ` Toon Claes
2025-07-17 22:37         ` Junio C Hamano
2025-07-18 17:36           ` Junio C Hamano
2025-07-22 16:06             ` Toon Claes
2025-07-16 13:42       ` [PATCH v5 6/6] fixup! last-modified: use Bloom filters when available Toon Claes
2025-07-17 23:39       ` [PATCH v5 0/6] Introduce git-last-modified(1) command Taylor Blau
2025-07-22 15:35         ` Toon Claes
2025-07-30 17:59           ` Toon Claes
2025-07-31  7:45             ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 0/4] " Toon Claes
2025-07-31 18:40         ` Junio C Hamano
2025-07-31 23:57           ` Junio C Hamano
2025-08-05  9:33         ` [PATCH v7 0/3] " Toon Claes
2025-08-05 14:34           ` Patrick Steinhardt
2025-08-05 16:21             ` Junio C Hamano
2025-08-05 16:34           ` Junio C Hamano
2025-08-05 16:55             ` Toon Claes
2025-08-05 17:20               ` Jean-Noël AVILA
2025-08-05 21:46                 ` Junio C Hamano
2025-08-06 12:01                   ` Toon Claes
2025-08-06 15:38                     ` Junio C Hamano
2025-08-28 22:44                       ` Junio C Hamano
2025-08-05 18:28               ` Junio C Hamano
2025-08-05  9:33         ` [PATCH v7 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-08-05  9:33         ` [PATCH v7 2/3] t/perf: add last-modified perf script Toon Claes
2025-08-05  9:33         ` [PATCH v7 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-30 17:55       ` [PATCH v6 1/4] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-31  6:42         ` Patrick Steinhardt
2025-08-01 16:22           ` Toon Claes
2025-08-01 17:09             ` Junio C Hamano
2025-08-04  6:34               ` Patrick Steinhardt
2025-08-04 17:14                 ` Junio C Hamano
2025-08-05  5:35                   ` Toon Claes
2025-08-01 20:34             ` Jean-Noël AVILA
2025-08-05  5:36               ` Toon Claes
2025-08-04  6:33             ` Patrick Steinhardt
2025-08-01 10:18         ` Christian Couder
2025-08-01 10:22           ` Patrick Steinhardt
2025-08-01 17:06             ` Junio C Hamano
2025-08-02  8:18               ` Christian Couder
2025-08-02 11:31                 ` Christian Couder
2025-08-02 13:38                   ` Christian Couder
2025-08-02 16:26                     ` Junio C Hamano
2025-08-04  6:35               ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 2/4] t/perf: add last-modified perf script Toon Claes
2025-07-30 17:55       ` [PATCH v6 3/4] commit-graph: export prepare_commit_graph() Toon Claes
2025-07-31  6:42         ` Patrick Steinhardt
2025-07-30 17:55       ` [PATCH v6 4/4] last-modified: use Bloom filters when available Toon Claes
2025-07-31  6:43         ` Patrick Steinhardt
2025-08-01 16:23           ` Toon Claes
2025-08-04  6:33             ` Patrick Steinhardt
2025-07-09 15:26   ` [PATCH v4 1/3] last-modified: new subcommand to show when files were last modified Toon Claes
2025-07-09 15:26   ` [PATCH v4 2/3] t/perf: add last-modified perf script Toon Claes
2025-07-09 15:26   ` [PATCH v4 3/3] last-modified: use Bloom filters when available Toon Claes
2025-07-16 13:35   ` [PATCH v5 6/6] fixup! " Toon Claes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250422-toon-new-blame-tree-v1-3-fdb51b8a394a@iotcl.com \
    --to=toon@iotcl.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).