git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
@ 2025-08-07  5:12 Lidong Yan
  2025-08-07  6:49 ` Patrick Steinhardt
  2025-08-08  6:58 ` [PATCH v2] " Lidong Yan
  0 siblings, 2 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-07  5:12 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, ttaylorr, Lidong Yan

When traversing commits, a pathspec item can be used to limit the
traversal to commits that modify the specified paths. And the
commit-graph includes a Bloom filter to exclude commits that definitely
did not modify a given pathspec item. During commit traversal, the
Bloom filter can significantly improve performance. However, it is
disabled if the specified pathspec item contains wildcard characters
or magic signatures. Enable Bloom filter even if a pathspec item
contains wildcard characters by filter only the non-wildcard part of
the pathspec item. Also Enable Bloom filter if magic signature is not
"exclude" or "icase".

With this optimization, we get some improvements for pathspec with
wildcard and magic signature. First, in the Git repository we see these
modest results:

git log -100 -- "t/*"

Benchmark 1: new
  Time (mean ± σ):      20.7 ms ±   0.5 ms
  Range (min … max):    19.8 ms …  21.8 ms

Benchmark 2: old
  Time (mean ± σ):      25.4 ms ±   0.6 ms
  Range (min … max):    24.1 ms …  26.8 ms

git log -100 -- ":(top)t"

Benchmark 1: new
  Time (mean ± σ):      15.3 ms ±   0.3 ms
  Range (min … max):    14.5 ms …  16.1 ms

Benchmark 2: old
  Time (mean ± σ):      19.5 ms ±   0.5 ms
  Range (min … max):    18.7 ms …  20.7 ms

But in a larger repo, such as the LLVM project repo below, we get even
better results:

git log -100 -- "libc/*"

Benchmark 1: new
  Time (mean ± σ):      10.1 ms ±   0.7 ms
  Range (min … max):     8.7 ms …  11.5 ms

Benchmark 2: old
  Time (mean ± σ):      26.1 ms ±   0.7 ms
  Range (min … max):    24.6 ms …  27.4 ms

git log -100 -- ":(top)libc"

Benchmark 1: new
  Time (mean ± σ):      11.0 ms ±   0.8 ms
  Range (min … max):     9.6 ms …  13.9 ms

Benchmark 2: old
  Time (mean ± σ):      20.7 ms ±   0.8 ms
  Range (min … max):    18.8 ms …  21.8 ms

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
 revision.c           | 26 ++++++++++++++++++++------
 t/t4216-log-bloom.sh | 31 +++++++++++++++++++++++++++----
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 18f300d455..ef8c0b6eca 100644
--- a/revision.c
+++ b/revision.c
@@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
 
 static int forbid_bloom_filters(struct pathspec *spec)
 {
-	if (spec->has_wildcard)
-		return 1;
-	if (spec->magic & ~PATHSPEC_LITERAL)
+	int forbid_mask =
+		PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
+
+	if (spec->magic & forbid_mask)
 		return 1;
 	for (size_t nr = 0; nr < spec->nr; nr++)
-		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
+		if (spec->items[nr].magic & forbid_mask)
 			return 1;
 
 	return 0;
@@ -693,9 +694,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	size_t len;
 	int res = 0;
 
+	len = pi->nowildcard_len;
 	/* remove single trailing slash from path, if needed */
-	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
-		path_alloc = xmemdupz(pi->match, pi->len - 1);
+	if (len > 0 && pi->match[len - 1] == '/')
+		len--;
+	else if (len != pi->len) {
+		/*
+		 * for path like "/dir/file*", nowildcard part would be
+		 * "/dir/file", but only "/dir" should be used for the
+		 * bloom filter
+		 */
+		while (len > 0 && pi->match[len - 1] != '/')
+			len--;
+	}
+
+	if (len != pi->len) {
+		path_alloc = xmemdupz(pi->match, len);
 		path = path_alloc;
 	} else
 		path = pi->match;
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 639868ac56..d8200e4dcb 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
 	test_bloom_filters_used "-- file*"
 '
 
-test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
+test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
+	test_bloom_filters_used "-- A/\* file4" &&
+	test_bloom_filters_used "-- file4 A/\*" &&
+	test_bloom_filters_used "-- * A/\*"
+'
+
+test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
 	test_bloom_filters_not_used "-- file\*" &&
-	test_bloom_filters_not_used "-- A/\* file4" &&
-	test_bloom_filters_not_used "-- file4 A/\*" &&
-	test_bloom_filters_not_used "-- * A/\*"
+	test_bloom_filters_not_used "-- file\* A/\*" &&
+	test_bloom_filters_not_used "-- file\* *" &&
+	test_bloom_filters_not_used "-- \*"
+'
+
+test_expect_success 'git log with path contains various magic signatures' '
+	cd A &&
+	test_bloom_filters_used "-- \:\(top\)B" &&
+	cd .. &&
+
+	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
+	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
+	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
+
+	cat >.gitattributes <<-EOF &&
+		A/file1 text
+		A/B/file2 -text
+	EOF
+	test_bloom_filters_used "-- \:\(attr\:text\)A" &&
+	rm .gitattributes
 '
 
 test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-08-11 16:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-07  5:12 [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal Lidong Yan
2025-08-07  6:49 ` Patrick Steinhardt
2025-08-07  8:59   ` Lidong Yan
2025-08-07 16:15   ` Junio C Hamano
2025-08-08  6:40     ` Lidong Yan
2025-08-08  6:58 ` [PATCH v2] " Lidong Yan
2025-08-08 15:50   ` Junio C Hamano
2025-08-09  2:06     ` Lidong Yan
2025-08-09  2:16     ` [PATCH v3] " Lidong Yan
2025-08-09  4:22       ` [PATCH v4] " Lidong Yan
2025-08-09  7:40         ` Lidong Yan
2025-08-11  6:01         ` [PATCH v5] " Lidong Yan
2025-08-11 15:56           ` Junio C Hamano
2025-08-11 16:08             ` Lidong Yan
2025-08-09 23:51       ` [PATCH v3] " Junio C Hamano
2025-08-10  1:57         ` Lidong Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).