git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
@ 2025-08-07  5:12 Lidong Yan
  2025-08-07  6:49 ` Patrick Steinhardt
  2025-08-08  6:58 ` [PATCH v2] " Lidong Yan
  0 siblings, 2 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-07  5:12 UTC (permalink / raw)
  To: git; +Cc: stolee, gitster, ttaylorr, Lidong Yan

When traversing commits, a pathspec item can be used to limit the
traversal to commits that modify the specified paths. And the
commit-graph includes a Bloom filter to exclude commits that definitely
did not modify a given pathspec item. During commit traversal, the
Bloom filter can significantly improve performance. However, it is
disabled if the specified pathspec item contains wildcard characters
or magic signatures. Enable Bloom filter even if a pathspec item
contains wildcard characters by filter only the non-wildcard part of
the pathspec item. Also Enable Bloom filter if magic signature is not
"exclude" or "icase".

With this optimization, we get some improvements for pathspec with
wildcard and magic signature. First, in the Git repository we see these
modest results:

git log -100 -- "t/*"

Benchmark 1: new
  Time (mean ± σ):      20.7 ms ±   0.5 ms
  Range (min … max):    19.8 ms …  21.8 ms

Benchmark 2: old
  Time (mean ± σ):      25.4 ms ±   0.6 ms
  Range (min … max):    24.1 ms …  26.8 ms

git log -100 -- ":(top)t"

Benchmark 1: new
  Time (mean ± σ):      15.3 ms ±   0.3 ms
  Range (min … max):    14.5 ms …  16.1 ms

Benchmark 2: old
  Time (mean ± σ):      19.5 ms ±   0.5 ms
  Range (min … max):    18.7 ms …  20.7 ms

But in a larger repo, such as the LLVM project repo below, we get even
better results:

git log -100 -- "libc/*"

Benchmark 1: new
  Time (mean ± σ):      10.1 ms ±   0.7 ms
  Range (min … max):     8.7 ms …  11.5 ms

Benchmark 2: old
  Time (mean ± σ):      26.1 ms ±   0.7 ms
  Range (min … max):    24.6 ms …  27.4 ms

git log -100 -- ":(top)libc"

Benchmark 1: new
  Time (mean ± σ):      11.0 ms ±   0.8 ms
  Range (min … max):     9.6 ms …  13.9 ms

Benchmark 2: old
  Time (mean ± σ):      20.7 ms ±   0.8 ms
  Range (min … max):    18.8 ms …  21.8 ms

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
 revision.c           | 26 ++++++++++++++++++++------
 t/t4216-log-bloom.sh | 31 +++++++++++++++++++++++++++----
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 18f300d455..ef8c0b6eca 100644
--- a/revision.c
+++ b/revision.c
@@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
 
 static int forbid_bloom_filters(struct pathspec *spec)
 {
-	if (spec->has_wildcard)
-		return 1;
-	if (spec->magic & ~PATHSPEC_LITERAL)
+	int forbid_mask =
+		PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
+
+	if (spec->magic & forbid_mask)
 		return 1;
 	for (size_t nr = 0; nr < spec->nr; nr++)
-		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
+		if (spec->items[nr].magic & forbid_mask)
 			return 1;
 
 	return 0;
@@ -693,9 +694,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	size_t len;
 	int res = 0;
 
+	len = pi->nowildcard_len;
 	/* remove single trailing slash from path, if needed */
-	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
-		path_alloc = xmemdupz(pi->match, pi->len - 1);
+	if (len > 0 && pi->match[len - 1] == '/')
+		len--;
+	else if (len != pi->len) {
+		/*
+		 * for path like "/dir/file*", nowildcard part would be
+		 * "/dir/file", but only "/dir" should be used for the
+		 * bloom filter
+		 */
+		while (len > 0 && pi->match[len - 1] != '/')
+			len--;
+	}
+
+	if (len != pi->len) {
+		path_alloc = xmemdupz(pi->match, len);
 		path = path_alloc;
 	} else
 		path = pi->match;
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 639868ac56..d8200e4dcb 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
 	test_bloom_filters_used "-- file*"
 '
 
-test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
+test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
+	test_bloom_filters_used "-- A/\* file4" &&
+	test_bloom_filters_used "-- file4 A/\*" &&
+	test_bloom_filters_used "-- * A/\*"
+'
+
+test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
 	test_bloom_filters_not_used "-- file\*" &&
-	test_bloom_filters_not_used "-- A/\* file4" &&
-	test_bloom_filters_not_used "-- file4 A/\*" &&
-	test_bloom_filters_not_used "-- * A/\*"
+	test_bloom_filters_not_used "-- file\* A/\*" &&
+	test_bloom_filters_not_used "-- file\* *" &&
+	test_bloom_filters_not_used "-- \*"
+'
+
+test_expect_success 'git log with path contains various magic signatures' '
+	cd A &&
+	test_bloom_filters_used "-- \:\(top\)B" &&
+	cd .. &&
+
+	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
+	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
+	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
+
+	cat >.gitattributes <<-EOF &&
+		A/file1 text
+		A/B/file2 -text
+	EOF
+	test_bloom_filters_used "-- \:\(attr\:text\)A" &&
+	rm .gitattributes
 '
 
 test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-07  5:12 [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal Lidong Yan
@ 2025-08-07  6:49 ` Patrick Steinhardt
  2025-08-07  8:59   ` Lidong Yan
  2025-08-07 16:15   ` Junio C Hamano
  2025-08-08  6:58 ` [PATCH v2] " Lidong Yan
  1 sibling, 2 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-08-07  6:49 UTC (permalink / raw)
  To: Lidong Yan; +Cc: git, stolee, gitster, ttaylorr

On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote:

In the subject it should be s/enable bloom/enable Bloom.

> When traversing commits, a pathspec item can be used to limit the
> traversal to commits that modify the specified paths. And the
> commit-graph includes a Bloom filter to exclude commits that definitely
> did not modify a given pathspec item. During commit traversal, the
> Bloom filter can significantly improve performance. However, it is
> disabled if the specified pathspec item contains wildcard characters
> or magic signatures.

Let's add a paragraph here, as we now switch into the "what is being
done mode".

> Enable Bloom filter even if a pathspec item contains wildcard
> characters by filter only the non-wildcard part of the pathspec item.

s/by filter/by filtering/

> Also Enable Bloom filter if magic signature is not "exclude" or
> "icase".

This explains what is done, but not why this is safe to do.

> With this optimization, we get some improvements for pathspec with
> wildcard and magic signature. First, in the Git repository we see these

"for pathspecs with wildcards or magic signatures".

> diff --git a/revision.c b/revision.c
> index 18f300d455..ef8c0b6eca 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
>  
>  static int forbid_bloom_filters(struct pathspec *spec)
>  {
> -	if (spec->has_wildcard)
> -		return 1;
> -	if (spec->magic & ~PATHSPEC_LITERAL)
> +	int forbid_mask =

The mask should be `unsigned`.

> +		PATHSPEC_EXCLUDE | PATHSPEC_ICASE;

I think instead of a forbid-mask we should use an allow-mask. Otherwise
it can happen quite easily that we add new magic that isn't compatible
with Bloom filters but forget to update this part here. I'd rather be
slow but correct than fast but incorrect.

> +	if (spec->magic & forbid_mask)
>  		return 1;
>  	for (size_t nr = 0; nr < spec->nr; nr++)
> -		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
> +		if (spec->items[nr].magic & forbid_mask)
>  			return 1;
>  
>  	return 0;
> @@ -693,9 +694,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
>  	size_t len;
>  	int res = 0;
>  
> +	len = pi->nowildcard_len;
>  	/* remove single trailing slash from path, if needed */
> -	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
> -		path_alloc = xmemdupz(pi->match, pi->len - 1);
> +	if (len > 0 && pi->match[len - 1] == '/')
> +		len--;
> +	else if (len != pi->len) {
> +		/*
> +		 * for path like "/dir/file*", nowildcard part would be
> +		 * "/dir/file", but only "/dir" should be used for the
> +		 * bloom filter
> +		 */
> +		while (len > 0 && pi->match[len - 1] != '/')
> +			len--;
> +	}
> +
> +	if (len != pi->len) {
> +		path_alloc = xmemdupz(pi->match, len);
>  		path = path_alloc;
>  	} else
>  		path = pi->match;

Okay, this matches what I've expected: if we have a wildcard we cannot
match on the component that contains the wildcard itself. But what we
_can_ do is to match on all the components leading to that wildcard
component.

One thing I did wonder though: what happens if the first component
contains the wildcard? We cannot really make any use of the Bloom filter
in that case as the path we match against becomes empty. I expect that
we'll handle this just fine. But is it still more performant than not
even trying Bloom filters in the first place?

> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 639868ac56..d8200e4dcb 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
>  	test_bloom_filters_used "-- file*"
>  '
>  
> -test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
> +test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
> +	test_bloom_filters_used "-- A/\* file4" &&
> +	test_bloom_filters_used "-- file4 A/\*" &&
> +	test_bloom_filters_used "-- * A/\*"
> +'
> +
> +test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
>  	test_bloom_filters_not_used "-- file\*" &&
> -	test_bloom_filters_not_used "-- A/\* file4" &&
> -	test_bloom_filters_not_used "-- file4 A/\*" &&
> -	test_bloom_filters_not_used "-- * A/\*"
> +	test_bloom_filters_not_used "-- file\* A/\*" &&
> +	test_bloom_filters_not_used "-- file\* *" &&
> +	test_bloom_filters_not_used "-- \*"
> +'
> +
> +test_expect_success 'git log with path contains various magic signatures' '
> +	cd A &&
> +	test_bloom_filters_used "-- \:\(top\)B" &&
> +	cd .. &&
> +
> +	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
> +	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
> +	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
> +
> +	cat >.gitattributes <<-EOF &&
> +		A/file1 text
> +		A/B/file2 -text

We typically indent the heredoc text to the same level as the command.

> +	EOF
> +	test_bloom_filters_used "-- \:\(attr\:text\)A" &&
> +	rm .gitattributes

You can use `test_when_finished" instead to clean up after yourself even
in case the test fails.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-07  6:49 ` Patrick Steinhardt
@ 2025-08-07  8:59   ` Lidong Yan
  2025-08-07 16:15   ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-07  8:59 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, stolee, gitster, ttaylorr

Patrick Steinhardt <ps@pks.im> wrote:
> 
> On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote:
> 
> In the subject it should be s/enable bloom/enable Bloom.
> 
>> When traversing commits, a pathspec item can be used to limit the
>> traversal to commits that modify the specified paths. And the
>> commit-graph includes a Bloom filter to exclude commits that definitely
>> did not modify a given pathspec item. During commit traversal, the
>> Bloom filter can significantly improve performance. However, it is
>> disabled if the specified pathspec item contains wildcard characters
>> or magic signatures.
> 
> Let's add a paragraph here, as we now switch into the "what is being
> done mode".
> 
>> Enable Bloom filter even if a pathspec item contains wildcard
>> characters by filter only the non-wildcard part of the pathspec item.
> 
> s/by filter/by filtering/
> 
>> With this optimization, we get some improvements for pathspec with
>> wildcard and magic signature. First, in the Git repository we see these
> 
> "for pathspecs with wildcards or magic signatures".

Will fix all the grammatical errors in next version.


>> Also Enable Bloom filter if magic signature is not "exclude" or
>> "icase".
> 
> This explains what is done, but not why this is safe to do.

I forgot to mention this — I’ll make sure to include it in the next commit message. 

> 
>> diff --git a/revision.c b/revision.c
>> index 18f300d455..ef8c0b6eca 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
>> 
>> static int forbid_bloom_filters(struct pathspec *spec)
>> {
>> - if (spec->has_wildcard)
>> - return 1;
>> - if (spec->magic & ~PATHSPEC_LITERAL)
>> + int forbid_mask =
> 
> The mask should be `unsigned`.
> 
>> + PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
> 
> I think instead of a forbid-mask we should use an allow-mask. Otherwise
> it can happen quite easily that we add new magic that isn't compatible
> with Bloom filters but forget to update this part here. I'd rather be
> slow but correct than fast but incorrect.

Interesting and reasonable point — will fix.

> 
> One thing I did wonder though: what happens if the first component
> contains the wildcard? We cannot really make any use of the Bloom filter
> in that case as the path we match against becomes empty. I expect that
> we'll handle this just fine. But is it still more performant than not
> even trying Bloom filters in the first place?

If the first component contains the wildcard, convert_pathspec_to_bloom_keyvec()
failed and returns -1, which lead to prepare_to_use_bloom_filter() cleanup (free and NULLing)
all bloom filter related field. So the performant would be as same as bloom filter is
not even tried.

I actually find a bug in my code and I will add test and fix it, the code should be

	if (len != pi->len)
		while (len > 0 && pi->match[len - 1])
			len—;
	if (len > 0 && pi->match[len - 1] == ‘/‘)
		len—;

> We typically indent the heredoc text to the same level as the command.

Got it, will fix.

> 
>> + EOF
>> + test_bloom_filters_used "-- \:\(attr\:text\)A" &&
>> + rm .gitattributes
> 
> You can use `test_when_finished" instead to clean up after yourself even
> in case the test fails.

I see — I never knew how to deal with the issue where one failing test case
causes all subsequent tests to fail.

Thanks,
Lidong


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-07  6:49 ` Patrick Steinhardt
  2025-08-07  8:59   ` Lidong Yan
@ 2025-08-07 16:15   ` Junio C Hamano
  2025-08-08  6:40     ` Lidong Yan
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2025-08-07 16:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Lidong Yan, git, stolee, ttaylorr

Patrick Steinhardt <ps@pks.im> writes:

> On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote:
>
> In the subject it should be s/enable bloom/enable Bloom.
>
>> When traversing commits, a pathspec item can be used to limit the
>> traversal to commits that modify the specified paths. And the
>> commit-graph includes a Bloom filter to exclude commits that definitely
>> did not modify a given pathspec item. During commit traversal, the
>> Bloom filter can significantly improve performance. However, it is
>> disabled if the specified pathspec item contains wildcard characters
>> or magic signatures.
>
> Let's add a paragraph here, as we now switch into the "what is being
> done mode".

I agree, a paragraph break is very welcome here.  I was confused
when I read this first that you are suggesting to add a new
paragraph with unspecified contents.

>> Also Enable Bloom filter if magic signature is not "exclude" or
>> "icase".
>
> This explains what is done, but not why this is safe to do.

And for that, the enumeration needs to be done inclusively, e.g. "we
know :(literal) is OK because we can do X; we know :(glob) is OK
because we can do Y".

The numbers are impressive ;-)

>> diff --git a/revision.c b/revision.c
>> index 18f300d455..ef8c0b6eca 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
>>  
>>  static int forbid_bloom_filters(struct pathspec *spec)
>>  {
>> -	if (spec->has_wildcard)
>> -		return 1;
>> -	if (spec->magic & ~PATHSPEC_LITERAL)
>> +	int forbid_mask =
>
> The mask should be `unsigned`.
>
>> +		PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
>
> I think instead of a forbid-mask we should use an allow-mask. Otherwise
> it can happen quite easily that we add new magic that isn't compatible
> with Bloom filters but forget to update this part here. I'd rather be
> slow but correct than fast but incorrect.

Good suggestion.  From the use of &-, it is obvious it is a mask, but
"allow-mask" is not clear among what kind of things some are
allowed, so how about calling it:

    unsigned allowed_magic;

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-07 16:15   ` Junio C Hamano
@ 2025-08-08  6:40     ` Lidong Yan
  0 siblings, 0 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-08  6:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, stolee, ttaylorr

Junio C Hamano <gitster@pobox.com> writes:
> 
> The numbers are impressive ;-)

The bad news is that the Git versions I tested are 2.39 and 2.51, and I used
git commit-graph write --split to build the commit graph, which resulted in a
large value for filter_not_present in trace.perf.

The good news is that after rebuilding the commit-graph, I repeated the experiment
on HEAD and HEAD~1, and the results on HEAD were still better. Anyway, I will
update the experimental results in the next patch.

Thanks,
Lidong

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-07  5:12 [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal Lidong Yan
  2025-08-07  6:49 ` Patrick Steinhardt
@ 2025-08-08  6:58 ` Lidong Yan
  2025-08-08 15:50   ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Lidong Yan @ 2025-08-08  6:58 UTC (permalink / raw)
  To: yldhome2d2; +Cc: git, gitster, stolee, ttaylorr

When traversing commits, a pathspec item can be used to limit the
traversal to commits that modify the specified paths. And the
commit-graph includes a Bloom filter to exclude commits that definitely
did not modify a given pathspec item. During commit traversal, the
Bloom filter can significantly improve performance. However, it is
disabled if the specified pathspec item contains wildcard characters
or magic signatures.

For performance reason, enable Bloom filter even if a pathspec item
contains wildcard characters by filtering only the non-wildcard part of
the pathspec item.

The function of pathspec magic signature is generally to narrow down
the path specified by the pathspecs. So, enable Bloom filter when
the magic signature is "top", "glob", "attr", "--depth" or "literal".
"exclude" is used to select paths other than the specified path, rather
than serving as a filtering function, so it cannot be used together with
the Bloom filter. Since Bloom filter is not case insensitive even in
case insensitive system (e.g. MacOS), it cannot be used together with
"icase" magic.

With this optimization, we get some improvements for pathspecs with
wildcards or magic signatures. First, in the Git repository we see these
modest results:

git log -100 -- "t/*"

Benchmark 1: new
  Time (mean ± σ):      20.4 ms ±   0.6 ms
  Range (min … max):    19.3 ms …  24.4 ms

Benchmark 2: old
  Time (mean ± σ):      23.4 ms ±   0.5 ms
  Range (min … max):    22.5 ms …  24.7 ms

git log -100 -- ":(top)t"

Benchmark 1: new
  Time (mean ± σ):      16.2 ms ±   0.4 ms
  Range (min … max):    15.3 ms …  17.2 ms

Benchmark 2: old
  Time (mean ± σ):      18.6 ms ±   0.5 ms
  Range (min … max):    17.6 ms …  20.4 ms

But in a larger repo, such as the LLVM project repo below, we get even
better results:

git log -100 -- "libc/*"

Benchmark 1: new
  Time (mean ± σ):      16.0 ms ±   0.6 ms
  Range (min … max):    14.7 ms …  17.8 ms

Benchmark 2: old
  Time (mean ± σ):      26.7 ms ±   0.5 ms
  Range (min … max):    25.4 ms …  27.8 ms

git log -100 -- ":(top)libc"

Benchmark 1: new
  Time (mean ± σ):      15.6 ms ±   0.6 ms
  Range (min … max):    14.4 ms …  17.7 ms

Benchmark 2: old
  Time (mean ± σ):      19.6 ms ±   0.5 ms
  Range (min … max):    18.6 ms …  20.6 ms

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
 revision.c           | 30 ++++++++++++++++++++++++------
 t/t4216-log-bloom.sh | 31 +++++++++++++++++++++++++++----
 2 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/revision.c b/revision.c
index 18f300d455..2a5b98390e 100644
--- a/revision.c
+++ b/revision.c
@@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void)
 
 static int forbid_bloom_filters(struct pathspec *spec)
 {
-	if (spec->has_wildcard)
-		return 1;
-	if (spec->magic & ~PATHSPEC_LITERAL)
+	unsigned int allowed_magic =
+		PATHSPEC_FROMTOP |
+		PATHSPEC_MAXDEPTH |
+		PATHSPEC_LITERAL |
+		PATHSPEC_GLOB |
+		PATHSPEC_ATTR;
+
+	if (spec->magic & ~allowed_magic)
 		return 1;
 	for (size_t nr = 0; nr < spec->nr; nr++)
-		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
+		if (spec->items[nr].magic & ~allowed_magic)
 			return 1;
 
 	return 0;
@@ -693,9 +698,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	size_t len;
 	int res = 0;
 
+	len = pi->nowildcard_len;
+	if (len != pi->len) {
+		/*
+		 * for path like "/dir/file*", nowildcard part would be
+		 * "/dir/file", but only "/dir" should be used for the
+		 * bloom filter
+		 */
+		while (len > 0 && pi->match[len - 1] != '/')
+			len--;
+	}
 	/* remove single trailing slash from path, if needed */
-	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
-		path_alloc = xmemdupz(pi->match, pi->len - 1);
+	if (len > 0 && pi->match[len - 1] == '/')
+		len--;
+
+	if (len != pi->len) {
+		path_alloc = xmemdupz(pi->match, len);
 		path = path_alloc;
 	} else
 		path = pi->match;
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 639868ac56..1064990de3 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
 	test_bloom_filters_used "-- file*"
 '
 
-test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
+test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
+	test_bloom_filters_used "-- A/\* file4" &&
+	test_bloom_filters_used "-- A/file\*" &&
+	test_bloom_filters_used "-- * A/\*"
+'
+
+test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
 	test_bloom_filters_not_used "-- file\*" &&
-	test_bloom_filters_not_used "-- A/\* file4" &&
-	test_bloom_filters_not_used "-- file4 A/\*" &&
-	test_bloom_filters_not_used "-- * A/\*"
+	test_bloom_filters_not_used "-- file\* A/\*" &&
+	test_bloom_filters_not_used "-- file\* *" &&
+	test_bloom_filters_not_used "-- \*"
+'
+
+test_expect_success 'git log with path contains various magic signatures' '
+	cd A &&
+	test_bloom_filters_used "-- \:\(top\)B" &&
+	cd .. &&
+
+	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
+	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
+	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
+
+	test_when_finished "rm -f .gitattributes" &&
+	cat >.gitattributes <<-EOF &&
+	A/file1 text
+	A/B/file2 -text
+	EOF
+	test_bloom_filters_used "-- \:\(attr\:text\)A"
 '
 
 test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-08  6:58 ` [PATCH v2] " Lidong Yan
@ 2025-08-08 15:50   ` Junio C Hamano
  2025-08-09  2:06     ` Lidong Yan
  2025-08-09  2:16     ` [PATCH v3] " Lidong Yan
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2025-08-08 15:50 UTC (permalink / raw)
  To: Lidong Yan; +Cc: git, stolee, ttaylorr

Lidong Yan <yldhome2d2@gmail.com> writes:

>  static int forbid_bloom_filters(struct pathspec *spec)
>  {
> -	if (spec->has_wildcard)
> -		return 1;
> -	if (spec->magic & ~PATHSPEC_LITERAL)
> +	unsigned int allowed_magic =
> +		PATHSPEC_FROMTOP |
> +		PATHSPEC_MAXDEPTH |
> +		PATHSPEC_LITERAL |
> +		PATHSPEC_GLOB |
> +		PATHSPEC_ATTR;
> +
> +	if (spec->magic & ~allowed_magic)
>  		return 1;
>  	for (size_t nr = 0; nr < spec->nr; nr++)
> -		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
> +		if (spec->items[nr].magic & ~allowed_magic)
>  			return 1;

Quite straight-forward and easy to see that this is a simple
enhancement of the existing code.  Good.

> @@ -693,9 +698,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
>  	size_t len;
>  	int res = 0;
>  
> +	len = pi->nowildcard_len;
> +	if (len != pi->len) {
> +		/*
> +		 * for path like "/dir/file*", nowildcard part would be
> +		 * "/dir/file", but only "/dir" should be used for the

Leading "/" makes it look as if the pathspec element can begin with
a slash, but it can not, can it?

> +		 * bloom filter
> +		 */
> +		while (len > 0 && pi->match[len - 1] != '/')
> +			len--;

In a tree that has both "builtin/" directory and "builtin.h" file,
what pathspec_element do we get here when we run

	$ git log "builtin*"

We need to be able to catch commits that touch anything in
"builtin/" and also anything at the top-level that begins with
"builtin", like "builtin.h" and other things that might have existed
ever in the history.  I think len goes down to 0 and that is the
correct behaviour.

The code does deal with the case where len is reduced down to zero,
but a bit poorly.  It would allocate a zero-length string, then
realize it frees it, and returns -1.  As it must know how long the
resulting path is before it allocated, and by definition len is
pi->len if it did not have to allocate, it should be able to take
the "goto cleanup" code path before it even attempts to allocate,
and it should not have to do strlen().  But the current code is not
incorrect.

> +	}
>  	/* remove single trailing slash from path, if needed */
> -	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
> -		path_alloc = xmemdupz(pi->match, pi->len - 1);
> +	if (len > 0 && pi->match[len - 1] == '/')
> +		len--;
> +
> +	if (len != pi->len) {
> +		path_alloc = xmemdupz(pi->match, len);
>  		path = path_alloc;
>  	} else
>  		path = pi->match;

> +test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
> +	test_bloom_filters_used "-- A/\* file4" &&

We'd ask the Bloom filter: skip commits that you know that can never
be touching either "A/(anything)" or "file4".

> +	test_bloom_filters_used "-- A/file\*" &&

We'd ask the Bloom filter: skip commits that you know that can never
be touching either "A/".

> +	test_bloom_filters_used "-- * A/\*"

What do we ask?  The second one says that a commit that might touch
"A/(something)" is worth investigating, but what about the first
one?  

Ahhh, that one is not quoted, so the shell expands to existing files
(which presumably do not have any wildcard characters).  OK.  If the
lone * were quoted, we shouldn't be using the Bloom filter.

> +'
> +
> +test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
>  	test_bloom_filters_not_used "-- file\*" &&

OK, this is exactly the case I wondered about in the above wrt "builtin*"
and I agree with the expectation of this test.

> -	test_bloom_filters_not_used "-- A/\* file4" &&
> -	test_bloom_filters_not_used "-- file4 A/\*" &&
> -	test_bloom_filters_not_used "-- * A/\*"
> +	test_bloom_filters_not_used "-- file\* A/\*" &&

This one I understand.  The first one reduces len down to 0 in the
wildcard stripping loop, and makes us say "a commit that might touch
any path is worth investigating", which amounts to the same thing as
not using the Bloom filter at all.

> +	test_bloom_filters_not_used "-- file\* *" &&

Ditto.  Even if the unquoted * may expand to many concrete paths,
"file\*" that can match any path that begins with "file" that ever
have existed in the history would not help skipping any commit.

> +	test_bloom_filters_not_used "-- \*"

Ditto.

> +'
> +
> +test_expect_success 'git log with path contains various magic signatures' '
> +	cd A &&
> +	test_bloom_filters_used "-- \:\(top\)B" &&
> +	cd .. &&
> +
> +	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
> +	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
> +	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
> +
> +	test_when_finished "rm -f .gitattributes" &&
> +	cat >.gitattributes <<-EOF &&
> +	A/file1 text
> +	A/B/file2 -text
> +	EOF
> +	test_bloom_filters_used "-- \:\(attr\:text\)A"
>  '

OK.  Good to see negative test cases here, which we sometimes forget
to test when showing off our shiny new toy.

Taking what I suggested above, here is a possible improvement.

 revision.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git i/revision.c w/revision.c
index 2a5b98390e..2a92bdda84 100644
--- i/revision.c
+++ w/revision.c
@@ -696,14 +696,14 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	char *path_alloc = NULL;
 	const char *path;
 	size_t len;
-	int res = 0;
+	int res = -1; /* be pessimistic */
 
 	len = pi->nowildcard_len;
 	if (len != pi->len) {
 		/*
-		 * for path like "/dir/file*", nowildcard part would be
-		 * "/dir/file", but only "/dir" should be used for the
-		 * bloom filter
+		 * for path like "dir/file*", nowildcard part would be
+		 * "dir/file", but only "dir" should be used for the
+		 * bloom filter.
 		 */
 		while (len > 0 && pi->match[len - 1] != '/')
 			len--;
@@ -712,19 +712,17 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	if (len > 0 && pi->match[len - 1] == '/')
 		len--;
 
+	if (!len)
+		goto cleanup;
+
 	if (len != pi->len) {
 		path_alloc = xmemdupz(pi->match, len);
 		path = path_alloc;
 	} else
 		path = pi->match;
 
-	len = strlen(path);
-	if (!len) {
-		res = -1;
-		goto cleanup;
-	}
-
 	*out = bloom_keyvec_new(path, len, settings);
+	res = 0;
 
 cleanup:
 	free(path_alloc);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-08 15:50   ` Junio C Hamano
@ 2025-08-09  2:06     ` Lidong Yan
  2025-08-09  2:16     ` [PATCH v3] " Lidong Yan
  1 sibling, 0 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-09  2:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, stolee, ttaylorr

Junio C Hamano <gitster@pobox.com> writes:
> 
>> @@ -693,9 +698,22 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
>> size_t len;
>> int res = 0;
>> 
>> + len = pi->nowildcard_len;
>> + if (len != pi->len) {
>> + /*
>> + * for path like "/dir/file*", nowildcard part would be
>> + * "/dir/file", but only "/dir" should be used for the
> 
> Leading "/" makes it look as if the pathspec element can begin with
> a slash, but it can not, can it?

Yes, seems like if we pass a absolute path "/path/to/repository/dir/file”, git
will automatically move "/path/to/repository” (in setup.c abspath_part_inside_repo())
So I should remove leading slash in my comment.

> Taking what I suggested above, here is a possible improvement.
> 
> revision.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git i/revision.c w/revision.c
> index 2a5b98390e..2a92bdda84 100644
> --- i/revision.c
> +++ w/revision.c
> @@ -696,14 +696,14 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
> char *path_alloc = NULL;
> const char *path;
> size_t len;
> - int res = 0;
> + int res = -1; /* be pessimistic */
> 
> len = pi->nowildcard_len;
> if (len != pi->len) {
> /*
> - * for path like "/dir/file*", nowildcard part would be
> - * "/dir/file", but only "/dir" should be used for the
> - * bloom filter
> + * for path like "dir/file*", nowildcard part would be
> + * "dir/file", but only "dir" should be used for the
> + * bloom filter.
> */
> while (len > 0 && pi->match[len - 1] != '/')
> len--;
> @@ -712,19 +712,17 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
> if (len > 0 && pi->match[len - 1] == '/')
> len--;
> 
> + if (!len)
> + goto cleanup;
> +
> if (len != pi->len) {
> path_alloc = xmemdupz(pi->match, len);
> path = path_alloc;
> } else
> path = pi->match;
> 
> - len = strlen(path);
> - if (!len) {
> - res = -1;
> - goto cleanup;
> - }
> -
> *out = bloom_keyvec_new(path, len, settings);
> + res = 0;
> 
> cleanup:
> free(path_alloc);

Thanks, I will apply this and add your signed-off.
Lidong

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-08 15:50   ` Junio C Hamano
  2025-08-09  2:06     ` Lidong Yan
@ 2025-08-09  2:16     ` Lidong Yan
  2025-08-09  4:22       ` [PATCH v4] " Lidong Yan
  2025-08-09 23:51       ` [PATCH v3] " Junio C Hamano
  1 sibling, 2 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-09  2:16 UTC (permalink / raw)
  To: gitster; +Cc: git, stolee, ttaylorr, yldhome2d2

When traversing commits, a pathspec item can be used to limit the
traversal to commits that modify the specified paths. And the
commit-graph includes a Bloom filter to exclude commits that definitely
did not modify a given pathspec item. During commit traversal, the
Bloom filter can significantly improve performance. However, it is
disabled if the specified pathspec item contains wildcard characters
or magic signatures.

For performance reason, enable Bloom filter even if a pathspec item
contains wildcard characters by filtering only the non-wildcard part of
the pathspec item.

The function of pathspec magic signature is generally to narrow down
the path specified by the pathspecs. So, enable Bloom filter when
the magic signature is "top", "glob", "attr", "--depth" or "literal".
"exclude" is used to select paths other than the specified path, rather
than serving as a filtering function, so it cannot be used together with
the Bloom filter. Since Bloom filter is not case insensitive even in
case insensitive system (e.g. MacOS), it cannot be used together with
"icase" magic.

With this optimization, we get some improvements for pathspecs with
wildcards or magic signatures. First, in the Git repository we see these
modest results:

git log -100 -- "t/*"

Benchmark 1: new
  Time (mean ± σ):      20.4 ms ±   0.6 ms
  Range (min … max):    19.3 ms …  24.4 ms

Benchmark 2: old
  Time (mean ± σ):      23.4 ms ±   0.5 ms
  Range (min … max):    22.5 ms …  24.7 ms

git log -100 -- ":(top)t"

Benchmark 1: new
  Time (mean ± σ):      16.2 ms ±   0.4 ms
  Range (min … max):    15.3 ms …  17.2 ms

Benchmark 2: old
  Time (mean ± σ):      18.6 ms ±   0.5 ms
  Range (min … max):    17.6 ms …  20.4 ms

But in a larger repo, such as the LLVM project repo below, we get even
better results:

git log -100 -- "libc/*"

Benchmark 1: new
  Time (mean ± σ):      16.0 ms ±   0.6 ms
  Range (min … max):    14.7 ms …  17.8 ms

Benchmark 2: old
  Time (mean ± σ):      26.7 ms ±   0.5 ms
  Range (min … max):    25.4 ms …  27.8 ms

git log -100 -- ":(top)libc"

Benchmark 1: new
  Time (mean ± σ):      15.6 ms ±   0.6 ms
  Range (min … max):    14.4 ms …  17.7 ms

Benchmark 2: old
  Time (mean ± σ):      19.6 ms ±   0.5 ms
  Range (min … max):    18.6 ms …  20.6 ms

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
[jc: avoid allocating zero length path in
convert_pathspec_to_bloom_keyvec()]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 revision.c           | 37 +++++++++++++++++++++++++++----------
 t/t4216-log-bloom.sh | 31 +++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/revision.c b/revision.c
index 18f300d455..386c52aba1 100644
--- a/revision.c
+++ b/revision.c
@@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void)
 
 static int forbid_bloom_filters(struct pathspec *spec)
 {
-	if (spec->has_wildcard)
-		return 1;
-	if (spec->magic & ~PATHSPEC_LITERAL)
+	unsigned int allowed_magic =
+		PATHSPEC_FROMTOP |
+		PATHSPEC_MAXDEPTH |
+		PATHSPEC_LITERAL |
+		PATHSPEC_GLOB |
+		PATHSPEC_ATTR;
+
+	if (spec->magic & ~allowed_magic)
 		return 1;
 	for (size_t nr = 0; nr < spec->nr; nr++)
-		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
+		if (spec->items[nr].magic & ~allowed_magic)
 			return 1;
 
 	return 0;
@@ -693,19 +698,31 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	size_t len;
 	int res = 0;
 
+	len = pi->nowildcard_len;
+	if (len != pi->len) {
+		/*
+		 * for path like "dir/file*", nowildcard part would be
+		 * "dir/file", but only "dir" should be used for the
+		 * bloom filter
+		 */
+		while (len > 0 && pi->match[len - 1] != '/')
+			len--;
+	}
 	/* remove single trailing slash from path, if needed */
-	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
-		path_alloc = xmemdupz(pi->match, pi->len - 1);
-		path = path_alloc;
-	} else
-		path = pi->match;
+	if (len > 0 && pi->match[len - 1] == '/')
+		len--;
 
-	len = strlen(path);
 	if (!len) {
 		res = -1;
 		goto cleanup;
 	}
 
+	if (len != pi->len) {
+		path_alloc = xmemdupz(pi->match, len);
+		path = path_alloc;
+	} else
+		path = pi->match;
+
 	*out = bloom_keyvec_new(path, len, settings);
 
 cleanup:
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 639868ac56..1064990de3 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
 	test_bloom_filters_used "-- file*"
 '
 
-test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
+test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
+	test_bloom_filters_used "-- A/\* file4" &&
+	test_bloom_filters_used "-- A/file\*" &&
+	test_bloom_filters_used "-- * A/\*"
+'
+
+test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
 	test_bloom_filters_not_used "-- file\*" &&
-	test_bloom_filters_not_used "-- A/\* file4" &&
-	test_bloom_filters_not_used "-- file4 A/\*" &&
-	test_bloom_filters_not_used "-- * A/\*"
+	test_bloom_filters_not_used "-- file\* A/\*" &&
+	test_bloom_filters_not_used "-- file\* *" &&
+	test_bloom_filters_not_used "-- \*"
+'
+
+test_expect_success 'git log with path contains various magic signatures' '
+	cd A &&
+	test_bloom_filters_used "-- \:\(top\)B" &&
+	cd .. &&
+
+	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
+	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
+	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
+
+	test_when_finished "rm -f .gitattributes" &&
+	cat >.gitattributes <<-EOF &&
+	A/file1 text
+	A/B/file2 -text
+	EOF
+	test_bloom_filters_used "-- \:\(attr\:text\)A"
 '
 
 test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '
-- 
2.51.0.rc0.49.gea0dd4b6b4.dirty


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v4] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-09  2:16     ` [PATCH v3] " Lidong Yan
@ 2025-08-09  4:22       ` Lidong Yan
  2025-08-09  7:40         ` Lidong Yan
  2025-08-11  6:01         ` [PATCH v5] " Lidong Yan
  2025-08-09 23:51       ` [PATCH v3] " Junio C Hamano
  1 sibling, 2 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-09  4:22 UTC (permalink / raw)
  To: yldhome2d2; +Cc: git, gitster, stolee, ttaylorr

When traversing commits, a pathspec item can be used to limit the
traversal to commits that modify the specified paths. And the
commit-graph includes a Bloom filter to exclude commits that definitely
did not modify a given pathspec item. During commit traversal, the
Bloom filter can significantly improve performance. However, it is
disabled if the specified pathspec item contains wildcard characters
or magic signatures.

For performance reason, enable Bloom filter even if a pathspec item
contains wildcard characters by filtering only the non-wildcard part of
the pathspec item.

The function of pathspec magic signature is generally to narrow down
the path specified by the pathspecs. So, enable Bloom filter when
the magic signature is "top", "glob", "attr", "--depth" or "literal".
"exclude" is used to select paths other than the specified path, rather
than serving as a filtering function, so it cannot be used together with
the Bloom filter. Since Bloom filter is not case insensitive even in
case insensitive system (e.g. MacOS), it cannot be used together with
"icase" magic.

With this optimization, we get some improvements for pathspecs with
wildcards or magic signatures. First, in the Git repository we see these
modest results:

git log -100 -- "t/*"

Benchmark 1: new
  Time (mean ± σ):      20.4 ms ±   0.6 ms
  Range (min … max):    19.3 ms …  24.4 ms

Benchmark 2: old
  Time (mean ± σ):      23.4 ms ±   0.5 ms
  Range (min … max):    22.5 ms …  24.7 ms

git log -100 -- ":(top)t"

Benchmark 1: new
  Time (mean ± σ):      16.2 ms ±   0.4 ms
  Range (min … max):    15.3 ms …  17.2 ms

Benchmark 2: old
  Time (mean ± σ):      18.6 ms ±   0.5 ms
  Range (min … max):    17.6 ms …  20.4 ms

But in a larger repo, such as the LLVM project repo below, we get even
better results:

git log -100 -- "libc/*"

Benchmark 1: new
  Time (mean ± σ):      16.0 ms ±   0.6 ms
  Range (min … max):    14.7 ms …  17.8 ms

Benchmark 2: old
  Time (mean ± σ):      26.7 ms ±   0.5 ms
  Range (min … max):    25.4 ms …  27.8 ms

git log -100 -- ":(top)libc"

Benchmark 1: new
  Time (mean ± σ):      15.6 ms ±   0.6 ms
  Range (min … max):    14.4 ms …  17.7 ms

Benchmark 2: old
  Time (mean ± σ):      19.6 ms ±   0.5 ms
  Range (min … max):    18.6 ms …  20.6 ms

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
[jc: avoid allocating zero length path in
convert_pathspec_to_bloom_keyvec()]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 revision.c           | 45 +++++++++++++++++++++++++++-----------------
 t/t4216-log-bloom.sh | 31 ++++++++++++++++++++++++++----
 2 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/revision.c b/revision.c
index 18f300d455..79372fd483 100644
--- a/revision.c
+++ b/revision.c
@@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void)
 
 static int forbid_bloom_filters(struct pathspec *spec)
 {
-	if (spec->has_wildcard)
-		return 1;
-	if (spec->magic & ~PATHSPEC_LITERAL)
+	unsigned int allowed_magic =
+		PATHSPEC_FROMTOP |
+		PATHSPEC_MAXDEPTH |
+		PATHSPEC_LITERAL |
+		PATHSPEC_GLOB |
+		PATHSPEC_ATTR;
+
+	if (spec->magic & ~allowed_magic)
 		return 1;
 	for (size_t nr = 0; nr < spec->nr; nr++)
-		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
+		if (spec->items[nr].magic & ~allowed_magic)
 			return 1;
 
 	return 0;
@@ -691,26 +696,32 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	char *path_alloc = NULL;
 	const char *path;
 	size_t len;
-	int res = 0;
 
+	len = pi->nowildcard_len;
+	if (len != pi->len) {
+		/*
+		 * for path like "dir/file*", nowildcard part would be
+		 * "dir/file", but only "dir" should be used for the
+		 * bloom filter
+		 */
+		while (len > 0 && pi->match[len - 1] != '/')
+			len--;
+	}
 	/* remove single trailing slash from path, if needed */
-	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
-		path_alloc = xmemdupz(pi->match, pi->len - 1);
+	if (len > 0 && pi->match[len - 1] == '/')
+		len--;
+
+	if (!len)
+		return -1;
+
+	if (len != pi->len) {
+		path_alloc = xmemdupz(pi->match, len);
 		path = path_alloc;
 	} else
 		path = pi->match;
 
-	len = strlen(path);
-	if (!len) {
-		res = -1;
-		goto cleanup;
-	}
-
 	*out = bloom_keyvec_new(path, len, settings);
-
-cleanup:
-	free(path_alloc);
-	return res;
+	return 0;
 }
 
 static void prepare_to_use_bloom_filter(struct rev_info *revs)
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 639868ac56..1064990de3 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
 	test_bloom_filters_used "-- file*"
 '
 
-test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
+test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
+	test_bloom_filters_used "-- A/\* file4" &&
+	test_bloom_filters_used "-- A/file\*" &&
+	test_bloom_filters_used "-- * A/\*"
+'
+
+test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
 	test_bloom_filters_not_used "-- file\*" &&
-	test_bloom_filters_not_used "-- A/\* file4" &&
-	test_bloom_filters_not_used "-- file4 A/\*" &&
-	test_bloom_filters_not_used "-- * A/\*"
+	test_bloom_filters_not_used "-- file\* A/\*" &&
+	test_bloom_filters_not_used "-- file\* *" &&
+	test_bloom_filters_not_used "-- \*"
+'
+
+test_expect_success 'git log with path contains various magic signatures' '
+	cd A &&
+	test_bloom_filters_used "-- \:\(top\)B" &&
+	cd .. &&
+
+	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
+	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
+	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
+
+	test_when_finished "rm -f .gitattributes" &&
+	cat >.gitattributes <<-EOF &&
+	A/file1 text
+	A/B/file2 -text
+	EOF
+	test_bloom_filters_used "-- \:\(attr\:text\)A"
 '
 
 test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v4] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-09  4:22       ` [PATCH v4] " Lidong Yan
@ 2025-08-09  7:40         ` Lidong Yan
  2025-08-11  6:01         ` [PATCH v5] " Lidong Yan
  1 sibling, 0 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-09  7:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, gitster, stolee, ttaylorr

Lidong Yan <yldhome2d2@gmail.com> writes:
> 
> When traversing commits, a pathspec item can be used to limit the
> traversal to commits that modify the specified paths. And the
> commit-graph includes a Bloom filter to exclude commits that definitely
> did not modify a given pathspec item. During commit traversal, the
> Bloom filter can significantly improve performance. However, it is
> disabled if the specified pathspec item contains wildcard characters
> or magic signatures.
> 
> For performance reason, enable Bloom filter even if a pathspec item
> contains wildcard characters by filtering only the non-wildcard part of
> the pathspec item.
> 
> The function of pathspec magic signature is generally to narrow down
> the path specified by the pathspecs. So, enable Bloom filter when
> the magic signature is "top", "glob", "attr", "--depth" or "literal".
> "exclude" is used to select paths other than the specified path, rather
> than serving as a filtering function, so it cannot be used together with
> the Bloom filter. Since Bloom filter is not case insensitive even in
> case insensitive system (e.g. MacOS), it cannot be used together with
> "icase" magic.
> 
> With this optimization, we get some improvements for pathspecs with
> wildcards or magic signatures. First, in the Git repository we see these
> modest results:
> 
> git log -100 -- "t/*"
> 
> Benchmark 1: new
>  Time (mean ± σ):      20.4 ms ±   0.6 ms
>  Range (min … max):    19.3 ms …  24.4 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      23.4 ms ±   0.5 ms
>  Range (min … max):    22.5 ms …  24.7 ms
> 
> git log -100 -- ":(top)t"
> 
> Benchmark 1: new
>  Time (mean ± σ):      16.2 ms ±   0.4 ms
>  Range (min … max):    15.3 ms …  17.2 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      18.6 ms ±   0.5 ms
>  Range (min … max):    17.6 ms …  20.4 ms
> 
> But in a larger repo, such as the LLVM project repo below, we get even
> better results:
> 
> git log -100 -- "libc/*"
> 
> Benchmark 1: new
>  Time (mean ± σ):      16.0 ms ±   0.6 ms
>  Range (min … max):    14.7 ms …  17.8 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      26.7 ms ±   0.5 ms
>  Range (min … max):    25.4 ms …  27.8 ms
> 
> git log -100 -- ":(top)libc"
> 
> Benchmark 1: new
>  Time (mean ± σ):      15.6 ms ±   0.6 ms
>  Range (min … max):    14.4 ms …  17.7 ms
> 
> Benchmark 2: old
>  Time (mean ± σ):      19.6 ms ±   0.5 ms
>  Range (min … max):    18.6 ms …  20.6 ms
> 
> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
> [jc: avoid allocating zero length path in
> convert_pathspec_to_bloom_keyvec()]
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> revision.c           | 45 +++++++++++++++++++++++++++-----------------
> t/t4216-log-bloom.sh | 31 ++++++++++++++++++++++++++----
> 2 files changed, 55 insertions(+), 21 deletions(-)
> 
> diff --git a/revision.c b/revision.c
> index 18f300d455..79372fd483 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void)
> 
> static int forbid_bloom_filters(struct pathspec *spec)
> {
> - if (spec->has_wildcard)
> - return 1;
> - if (spec->magic & ~PATHSPEC_LITERAL)
> + unsigned int allowed_magic =
> + PATHSPEC_FROMTOP |
> + PATHSPEC_MAXDEPTH |
> + PATHSPEC_LITERAL |
> + PATHSPEC_GLOB |
> + PATHSPEC_ATTR;
> +
> + if (spec->magic & ~allowed_magic)
> return 1;
> for (size_t nr = 0; nr < spec->nr; nr++)
> - if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
> + if (spec->items[nr].magic & ~allowed_magic)
> return 1;
> 
> return 0;
> @@ -691,26 +696,32 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
> char *path_alloc = NULL;
> const char *path;
> size_t len;
> - int res = 0;
> 
> + len = pi->nowildcard_len;
> + if (len != pi->len) {
> + /*
> + * for path like "dir/file*", nowildcard part would be
> + * "dir/file", but only "dir" should be used for the
> + * bloom filter
> + */
> + while (len > 0 && pi->match[len - 1] != '/')
> + len--;
> + }
> /* remove single trailing slash from path, if needed */
> - if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
> - path_alloc = xmemdupz(pi->match, pi->len - 1);
> + if (len > 0 && pi->match[len - 1] == '/')
> + len--;
> +
> + if (!len)
> + return -1;
> +
> + if (len != pi->len) {
> + path_alloc = xmemdupz(pi->match, len);
> path = path_alloc;
> } else
> path = pi->match;
> 
> - len = strlen(path);
> - if (!len) {
> - res = -1;
> - goto cleanup;
> - }
> -
> *out = bloom_keyvec_new(path, len, settings);
> -
> -cleanup:
> - free(path_alloc);
> - return res;
> + return 0;
> }

I realized that I shouldn’t delete free(path_alloc) part in the patch. I wonder if
Junio could help remove that part from the patch.

Thanks,
Lidong

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-09  2:16     ` [PATCH v3] " Lidong Yan
  2025-08-09  4:22       ` [PATCH v4] " Lidong Yan
@ 2025-08-09 23:51       ` Junio C Hamano
  2025-08-10  1:57         ` Lidong Yan
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2025-08-09 23:51 UTC (permalink / raw)
  To: Lidong Yan; +Cc: git, stolee, ttaylorr

Lidong Yan <yldhome2d2@gmail.com> writes:

> [jc: avoid allocating zero length path in
> convert_pathspec_to_bloom_keyvec()]

This is different from what I did, though.

> @@ -693,19 +698,31 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
>  	size_t len;
>  	int res = 0;
>  
> +	len = pi->nowildcard_len;
> +	if (len != pi->len) {
> +		/*
> +		 * for path like "dir/file*", nowildcard part would be
> +		 * "dir/file", but only "dir" should be used for the
> +		 * bloom filter
> +		 */

A missing full-stop.

> +		while (len > 0 && pi->match[len - 1] != '/')
> +			len--;
> +	}
>  	/* remove single trailing slash from path, if needed */
> -	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
> -		path_alloc = xmemdupz(pi->match, pi->len - 1);
> -		path = path_alloc;
> -	} else
> -		path = pi->match;
> +	if (len > 0 && pi->match[len - 1] == '/')
> +		len--;
>  
> -	len = strlen(path);
>  	if (!len) {
>  		res = -1;
>  		goto cleanup;
>  	}
>  
> +	if (len != pi->len) {
> +		path_alloc = xmemdupz(pi->match, len);
> +		path = path_alloc;
> +	} else
> +		path = pi->match;
> +
>  	*out = bloom_keyvec_new(path, len, settings);
>  
>  cleanup:

Two comments.

 * For a function that finds an error condition in the middle and
   jumps to the "cleanup:" label at the end, it is more future-proof
   to start pessimistic (i.e. initialize 'res' to error(-1)) and
   flip 'res' to success(0) at the very end when everything went
   well.  It would simplify the change necessary when we need to add
   _more_ early error return code paths to the function in the
   future.

   But this flip from "assume success" to "assume failure" is
   something that should be not be done as part of this patch;
   perhaps doing it a separate preliminary clean-up patch is a
   better way to do so.

 * I think the change from v3 (this one) to v4 makes the function
   worse; we found that it is a good practice to have a single place
   to release any resources we temporarily acquired and arrange
   exception handling code to just jump there during the course of
   this project.

   The current implementation may happen to have only one such early
   return (i.e. "len has become 0; we realize that we cannot use the
   Bloom filter"), but adding a new early return in the future would
   be easier if you kept the original arrangement.  The new early
   return condition may have to be computed after we have acquired
   resources we need to release, so it may need more than a simple
   "return -1".

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-09 23:51       ` [PATCH v3] " Junio C Hamano
@ 2025-08-10  1:57         ` Lidong Yan
  0 siblings, 0 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-10  1:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, stolee, ttaylorr

Junio C Hamano <gitster@pobox.com> writes:
> 
> Lidong Yan <yldhome2d2@gmail.com> writes:
> 
>> [jc: avoid allocating zero length path in
>> convert_pathspec_to_bloom_keyvec()]
> 
> This is different from what I did, though.

Sorry, I don’t fully understand what you mean — should I remove
this line or rewrite it?

> 
>> @@ -693,19 +698,31 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
>> size_t len;
>> int res = 0;
>> 
>> + len = pi->nowildcard_len;
>> + if (len != pi->len) {
>> + /*
>> + * for path like "dir/file*", nowildcard part would be
>> + * "dir/file", but only "dir" should be used for the
>> + * bloom filter
>> + */
> 
> A missing full-stop.

Will fix.

> 
>> + while (len > 0 && pi->match[len - 1] != '/')
>> + len--;
>> + }
>> /* remove single trailing slash from path, if needed */
>> - if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
>> - path_alloc = xmemdupz(pi->match, pi->len - 1);
>> - path = path_alloc;
>> - } else
>> - path = pi->match;
>> + if (len > 0 && pi->match[len - 1] == '/')
>> + len--;
>> 
>> - len = strlen(path);
>> if (!len) {
>> res = -1;
>> goto cleanup;
>> }
>> 
>> + if (len != pi->len) {
>> + path_alloc = xmemdupz(pi->match, len);
>> + path = path_alloc;
>> + } else
>> + path = pi->match;
>> +
>> *out = bloom_keyvec_new(path, len, settings);
>> 
>> cleanup:
> 
> Two comments.
> 
> * For a function that finds an error condition in the middle and
>   jumps to the "cleanup:" label at the end, it is more future-proof
>   to start pessimistic (i.e. initialize 'res' to error(-1)) and
>   flip 'res' to success(0) at the very end when everything went
>   well.  It would simplify the change necessary when we need to add
>   _more_ early error return code paths to the function in the
>   future.
> 
>   But this flip from "assume success" to "assume failure" is
>   something that should be not be done as part of this patch;
>   perhaps doing it a separate preliminary clean-up patch is a
>   better way to do so.

Ah, I’ve always thought that `ret = -1; goto cleanup;` was a kind of the
everybody-should-use pattern. So when I saw you write `int ret = -1;` first,
I was a bit puzzled. Now I understand what you mean, and I’ll add a
cleanup patch.

> 
> * I think the change from v3 (this one) to v4 makes the function
>   worse; we found that it is a good practice to have a single place
>   to release any resources we temporarily acquired and arrange
>   exception handling code to just jump there during the course of
>   this project.
> 
>   The current implementation may happen to have only one such early
>   return (i.e. "len has become 0; we realize that we cannot use the
>   Bloom filter"), but adding a new early return in the future would
>   be easier if you kept the original arrangement.  The new early
>   return condition may have to be computed after we have acquired
>   resources we need to release, so it may need more than a simple
>   "return -1”.

Understand. I was thinking about less code is better. I will add the cleanup
part back.

Thanks,
Lidong

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-09  4:22       ` [PATCH v4] " Lidong Yan
  2025-08-09  7:40         ` Lidong Yan
@ 2025-08-11  6:01         ` Lidong Yan
  2025-08-11 15:56           ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Lidong Yan @ 2025-08-11  6:01 UTC (permalink / raw)
  To: yldhome2d2; +Cc: git, gitster, stolee, ttaylorr

When traversing commits, a pathspec item can be used to limit the
traversal to commits that modify the specified paths. And the
commit-graph includes a Bloom filter to exclude commits that definitely
did not modify a given pathspec item. During commit traversal, the
Bloom filter can significantly improve performance. However, it is
disabled if the specified pathspec item contains wildcard characters
or magic signatures.

For performance reason, enable Bloom filter even if a pathspec item
contains wildcard characters by filtering only the non-wildcard part of
the pathspec item.

The function of pathspec magic signature is generally to narrow down
the path specified by the pathspecs. So, enable Bloom filter when
the magic signature is "top", "glob", "attr", "--depth" or "literal".
"exclude" is used to select paths other than the specified path, rather
than serving as a filtering function, so it cannot be used together with
the Bloom filter. Since Bloom filter is not case insensitive even in
case insensitive system (e.g. MacOS), it cannot be used together with
"icase" magic.

With this optimization, we get some improvements for pathspecs with
wildcards or magic signatures. First, in the Git repository we see these
modest results:

git log -100 -- "t/*"

Benchmark 1: new
  Time (mean ± σ):      20.4 ms ±   0.6 ms
  Range (min … max):    19.3 ms …  24.4 ms

Benchmark 2: old
  Time (mean ± σ):      23.4 ms ±   0.5 ms
  Range (min … max):    22.5 ms …  24.7 ms

git log -100 -- ":(top)t"

Benchmark 1: new
  Time (mean ± σ):      16.2 ms ±   0.4 ms
  Range (min … max):    15.3 ms …  17.2 ms

Benchmark 2: old
  Time (mean ± σ):      18.6 ms ±   0.5 ms
  Range (min … max):    17.6 ms …  20.4 ms

But in a larger repo, such as the LLVM project repo below, we get even
better results:

git log -100 -- "libc/*"

Benchmark 1: new
  Time (mean ± σ):      16.0 ms ±   0.6 ms
  Range (min … max):    14.7 ms …  17.8 ms

Benchmark 2: old
  Time (mean ± σ):      26.7 ms ±   0.5 ms
  Range (min … max):    25.4 ms …  27.8 ms

git log -100 -- ":(top)libc"

Benchmark 1: new
  Time (mean ± σ):      15.6 ms ±   0.6 ms
  Range (min … max):    14.4 ms …  17.7 ms

Benchmark 2: old
  Time (mean ± σ):      19.6 ms ±   0.5 ms
  Range (min … max):    18.6 ms …  20.6 ms

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
[jc: avoid allocating zero length path in
convert_pathspec_to_bloom_keyvec()]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 revision.c           | 42 +++++++++++++++++++++++++++++-------------
 t/t4216-log-bloom.sh | 31 +++++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 17 deletions(-)

diff --git a/revision.c b/revision.c
index 18f300d455..7449064def 100644
--- a/revision.c
+++ b/revision.c
@@ -671,12 +671,17 @@ static void trace2_bloom_filter_statistics_atexit(void)
 
 static int forbid_bloom_filters(struct pathspec *spec)
 {
-	if (spec->has_wildcard)
-		return 1;
-	if (spec->magic & ~PATHSPEC_LITERAL)
+	unsigned int allowed_magic =
+		PATHSPEC_FROMTOP |
+		PATHSPEC_MAXDEPTH |
+		PATHSPEC_LITERAL |
+		PATHSPEC_GLOB |
+		PATHSPEC_ATTR;
+
+	if (spec->magic & ~allowed_magic)
 		return 1;
 	for (size_t nr = 0; nr < spec->nr; nr++)
-		if (spec->items[nr].magic & ~PATHSPEC_LITERAL)
+		if (spec->items[nr].magic & ~allowed_magic)
 			return 1;
 
 	return 0;
@@ -691,23 +696,34 @@ static int convert_pathspec_to_bloom_keyvec(struct bloom_keyvec **out,
 	char *path_alloc = NULL;
 	const char *path;
 	size_t len;
-	int res = 0;
+	int res = -1;
 
+	len = pi->nowildcard_len;
+	if (len != pi->len) {
+		/*
+		 * for path like "dir/file*", nowildcard part would be
+		 * "dir/file", but only "dir" should be used for the
+		 * bloom filter.
+		 */
+		while (len > 0 && pi->match[len - 1] != '/')
+			len--;
+	}
 	/* remove single trailing slash from path, if needed */
-	if (pi->len > 0 && pi->match[pi->len - 1] == '/') {
-		path_alloc = xmemdupz(pi->match, pi->len - 1);
+	if (len > 0 && pi->match[len - 1] == '/')
+		len--;
+
+	if (!len)
+		goto cleanup;
+
+	if (len != pi->len) {
+		path_alloc = xmemdupz(pi->match, len);
 		path = path_alloc;
 	} else
 		path = pi->match;
 
-	len = strlen(path);
-	if (!len) {
-		res = -1;
-		goto cleanup;
-	}
-
 	*out = bloom_keyvec_new(path, len, settings);
 
+	res = 0;
 cleanup:
 	free(path_alloc);
 	return res;
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 639868ac56..1064990de3 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -154,11 +154,34 @@ test_expect_success 'git log with multiple literal paths uses Bloom filter' '
 	test_bloom_filters_used "-- file*"
 '
 
-test_expect_success 'git log with path contains a wildcard does not use Bloom filter' '
+test_expect_success 'git log with paths all contain non-wildcard part uses Bloom filter' '
+	test_bloom_filters_used "-- A/\* file4" &&
+	test_bloom_filters_used "-- A/file\*" &&
+	test_bloom_filters_used "-- * A/\*"
+'
+
+test_expect_success 'git log with path only contains wildcard part does not use Bloom filter' '
 	test_bloom_filters_not_used "-- file\*" &&
-	test_bloom_filters_not_used "-- A/\* file4" &&
-	test_bloom_filters_not_used "-- file4 A/\*" &&
-	test_bloom_filters_not_used "-- * A/\*"
+	test_bloom_filters_not_used "-- file\* A/\*" &&
+	test_bloom_filters_not_used "-- file\* *" &&
+	test_bloom_filters_not_used "-- \*"
+'
+
+test_expect_success 'git log with path contains various magic signatures' '
+	cd A &&
+	test_bloom_filters_used "-- \:\(top\)B" &&
+	cd .. &&
+
+	test_bloom_filters_used "-- \:\(glob\)A/\*\*/C" &&
+	test_bloom_filters_not_used "-- \:\(icase\)FILE4" &&
+	test_bloom_filters_not_used "-- \:\(exclude\)A/B/C" &&
+
+	test_when_finished "rm -f .gitattributes" &&
+	cat >.gitattributes <<-EOF &&
+	A/file1 text
+	A/B/file2 -text
+	EOF
+	test_bloom_filters_used "-- \:\(attr\:text\)A"
 '
 
 test_expect_success 'setup - add commit-graph to the chain without Bloom filters' '
-- 
2.39.5 (Apple Git-154)


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v5] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-11  6:01         ` [PATCH v5] " Lidong Yan
@ 2025-08-11 15:56           ` Junio C Hamano
  2025-08-11 16:08             ` Lidong Yan
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2025-08-11 15:56 UTC (permalink / raw)
  To: Lidong Yan; +Cc: git, stolee, ttaylorr

Lidong Yan <yldhome2d2@gmail.com> writes:

> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
> [jc: avoid allocating zero length path in
> convert_pathspec_to_bloom_keyvec()]
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Instead just do

        Helped-by: Junio C Hamano <gitster@pobox.com>
        Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>

here.  [who: comment] followed by a sign-off from that person is
done by the person who is signing off the tweak, not by the original
author.

No need to resend; I'll fix it up locally.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5] bloom: enable bloom filter with wildcard pathspec in revision traversal
  2025-08-11 15:56           ` Junio C Hamano
@ 2025-08-11 16:08             ` Lidong Yan
  0 siblings, 0 replies; 16+ messages in thread
From: Lidong Yan @ 2025-08-11 16:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, stolee, ttaylorr

Junio C Hamano <gitster@pobox.com> writes:
> 
> Lidong Yan <yldhome2d2@gmail.com> writes:
> 
>> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
>> [jc: avoid allocating zero length path in
>> convert_pathspec_to_bloom_keyvec()]
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> 
> Instead just do
> 
>        Helped-by: Junio C Hamano <gitster@pobox.com>
>        Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
> 
> here.  [who: comment] followed by a sign-off from that person is
> done by the person who is signing off the tweak, not by the original
> author.

I see — when someone makes additions to another person’s
commit, they’ll also modify the log message in the process.

Thanks,
Lidong


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-08-11 16:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-07  5:12 [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal Lidong Yan
2025-08-07  6:49 ` Patrick Steinhardt
2025-08-07  8:59   ` Lidong Yan
2025-08-07 16:15   ` Junio C Hamano
2025-08-08  6:40     ` Lidong Yan
2025-08-08  6:58 ` [PATCH v2] " Lidong Yan
2025-08-08 15:50   ` Junio C Hamano
2025-08-09  2:06     ` Lidong Yan
2025-08-09  2:16     ` [PATCH v3] " Lidong Yan
2025-08-09  4:22       ` [PATCH v4] " Lidong Yan
2025-08-09  7:40         ` Lidong Yan
2025-08-11  6:01         ` [PATCH v5] " Lidong Yan
2025-08-11 15:56           ` Junio C Hamano
2025-08-11 16:08             ` Lidong Yan
2025-08-09 23:51       ` [PATCH v3] " Junio C Hamano
2025-08-10  1:57         ` Lidong Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).