git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] Not computing changed path filter for root commits
@ 2023-09-11 22:31 Jonathan Tan
  2023-09-15 18:25 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jonathan Tan @ 2023-09-11 22:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, szeder.dev, me, derrickstolee

This is following the discussion about adding a new changed path filter
version due to the current implementation of murmur3 not matching the
algorithm. [1]

SZEDER Gábor suggested [2] that we change the revision walk to read
changed path filters also for root commits, but I don't think that's
possible - we have to tie reading changed path filters to when we read
trees, and right now, we don't seem to read trees when evaluating root
commits (rev_compare_tree() in revision.c is in the only code path that
uses changed path filters, and it itself is only called per-parent and
thus not called for root commits). The alternative is to not generate
changed path filters for root commits (or what I did in this patch,
which is to generate an all-1 filter), which seems reasonable to me.

Is this a good idea? If yes, there is the follow-up question of how to
report it in traces. I don't know off-hand if it's better to reuse the
"large" statistic, since what we output is the same (all 1s), or if we
should make a new one. Presumably what we need to weigh is the clarity
versus the migration costs, but I don't know how to weigh it.

If people are generally in agreement, I can send an updated patch that
does not goto into an "else" block :) and also with updated tests (t0095
needs to check a non-root commit, and t4216 needs to be updated with
whatever statistics we decide to use).

[1] https://lore.kernel.org/git/20230826150610.GA1928@szeder.dev/
[2] https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/
---
 bloom.c              |  3 ++-
 t/t0095-bloom.sh     |  2 +-
 t/t4216-log-bloom.sh | 12 +++---------
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/bloom.c b/bloom.c
index aef6b5fea2..b21130b236 100644
--- a/bloom.c
+++ b/bloom.c
@@ -226,7 +226,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	if (c->parents)
 		diff_tree_oid(&c->parents->item->object.oid, &c->object.oid, "", &diffopt);
 	else
-		diff_tree_oid(NULL, &c->object.oid, "", &diffopt);
+		goto large;
 	diffcore_std(&diffopt);
 
 	if (diff_queued_diff.nr <= settings->max_changed_paths) {
@@ -292,6 +292,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
+large:
 		init_truncated_large_filter(filter);
 
 		if (computed)
diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh
index b567383eb8..02a0b41026 100755
--- a/t/t0095-bloom.sh
+++ b/t/t0095-bloom.sh
@@ -74,7 +74,7 @@ test_expect_success !SANITIZE_LEAK 'get bloom filters for commit with no changes
 	git commit --allow-empty -m "c0" &&
 	cat >expect <<-\EOF &&
 	Filter_Length:1
-	Filter_Data:00|
+	Filter_Data:ff|
 	EOF
 	test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual &&
 	test_cmp expect actual
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index fa9d32facf..d14fe93fb1 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -283,7 +283,6 @@ test_expect_success 'correctly report changes over limit' '
 			git commit-graph write --reachable --changed-paths &&
 		test_max_changed_paths 11 trace-update &&
 		test_filter_computed 2 trace-update &&
-		test_filter_trunc_large 0 trace-update &&
 
 		for path in $(git ls-tree -r --name-only HEAD)
 		do
@@ -306,9 +305,7 @@ test_expect_success 'correctly report commits with no changed paths' '
 		GIT_TRACE2_EVENT="$(pwd)/trace.event" \
 			git commit-graph write --reachable --changed-paths &&
 		test_filter_computed 1 trace.event &&
-		test_filter_not_computed 0 trace.event &&
-		test_filter_trunc_empty 1 trace.event &&
-		test_filter_trunc_large 0 trace.event
+		test_filter_not_computed 0 trace.event
 	)
 '
 
@@ -363,8 +360,7 @@ test_expect_success '--max-new-filters overrides configuration' '
 				--max-new-filters=1 &&
 		test_filter_computed 1 trace.event &&
 		test_filter_not_computed 1 trace.event &&
-		test_filter_trunc_empty 0 trace.event &&
-		test_filter_trunc_large 0 trace.event
+		test_filter_trunc_empty 0 trace.event
 	)
 '
 
@@ -386,9 +382,7 @@ test_expect_success 'Bloom generation backfills empty commits' '
 				git commit-graph write --reachable \
 					--changed-paths --max-new-filters=2 &&
 			test_filter_computed 2 trace.event &&
-			test_filter_not_computed 4 trace.event &&
-			test_filter_trunc_empty 2 trace.event &&
-			test_filter_trunc_large 0 trace.event || return 1
+			test_filter_not_computed 4 trace.event || return 1
 		done &&
 
 		# Finally, make sure that once all commits have filters, that
-- 
2.42.0.283.g2d96d420d3-goog


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-10-10 19:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-11 22:31 [RFC PATCH] Not computing changed path filter for root commits Jonathan Tan
2023-09-15 18:25 ` Junio C Hamano
2023-09-15 20:29 ` SZEDER Gábor
2023-09-19 18:19   ` Taylor Blau
2023-10-02 22:55     ` Jonathan Tan
2023-10-09 17:20       ` Taylor Blau
2023-10-09 17:26         ` Taylor Blau
2023-10-09 20:59           ` Jonathan Tan
2023-10-10 19:47             ` Taylor Blau
2023-09-19 18:21 ` Taylor Blau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).