From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: Lidong Yan <yldhome2d2@gmail.com>,
git@vger.kernel.org, stolee@gmail.com, ttaylorr@github.com
Subject: Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
Date: Thu, 07 Aug 2025 09:15:01 -0700 [thread overview]
Message-ID: <xmqqa54brtve.fsf@gitster.g> (raw)
In-Reply-To: <aJRMaYfMd3PlRtoz@pks.im> (Patrick Steinhardt's message of "Thu, 7 Aug 2025 08:49:13 +0200")
Patrick Steinhardt <ps@pks.im> writes:
> On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote:
>
> In the subject it should be s/enable bloom/enable Bloom.
>
>> When traversing commits, a pathspec item can be used to limit the
>> traversal to commits that modify the specified paths. And the
>> commit-graph includes a Bloom filter to exclude commits that definitely
>> did not modify a given pathspec item. During commit traversal, the
>> Bloom filter can significantly improve performance. However, it is
>> disabled if the specified pathspec item contains wildcard characters
>> or magic signatures.
>
> Let's add a paragraph here, as we now switch into the "what is being
> done mode".
I agree, a paragraph break is very welcome here. I was confused
when I read this first that you are suggesting to add a new
paragraph with unspecified contents.
>> Also Enable Bloom filter if magic signature is not "exclude" or
>> "icase".
>
> This explains what is done, but not why this is safe to do.
And for that, the enumeration needs to be done inclusively, e.g. "we
know :(literal) is OK because we can do X; we know :(glob) is OK
because we can do Y".
The numbers are impressive ;-)
>> diff --git a/revision.c b/revision.c
>> index 18f300d455..ef8c0b6eca 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
>>
>> static int forbid_bloom_filters(struct pathspec *spec)
>> {
>> - if (spec->has_wildcard)
>> - return 1;
>> - if (spec->magic & ~PATHSPEC_LITERAL)
>> + int forbid_mask =
>
> The mask should be `unsigned`.
>
>> + PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
>
> I think instead of a forbid-mask we should use an allow-mask. Otherwise
> it can happen quite easily that we add new magic that isn't compatible
> with Bloom filters but forget to update this part here. I'd rather be
> slow but correct than fast but incorrect.
Good suggestion. From the use of &-, it is obvious it is a mask, but
"allow-mask" is not clear among what kind of things some are
allowed, so how about calling it:
unsigned allowed_magic;
Thanks.
next prev parent reply other threads:[~2025-08-07 16:15 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-07 5:12 [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal Lidong Yan
2025-08-07 6:49 ` Patrick Steinhardt
2025-08-07 8:59 ` Lidong Yan
2025-08-07 16:15 ` Junio C Hamano [this message]
2025-08-08 6:40 ` Lidong Yan
2025-08-08 6:58 ` [PATCH v2] " Lidong Yan
2025-08-08 15:50 ` Junio C Hamano
2025-08-09 2:06 ` Lidong Yan
2025-08-09 2:16 ` [PATCH v3] " Lidong Yan
2025-08-09 4:22 ` [PATCH v4] " Lidong Yan
2025-08-09 7:40 ` Lidong Yan
2025-08-11 6:01 ` [PATCH v5] " Lidong Yan
2025-08-11 15:56 ` Junio C Hamano
2025-08-11 16:08 ` Lidong Yan
2025-08-09 23:51 ` [PATCH v3] " Junio C Hamano
2025-08-10 1:57 ` Lidong Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqa54brtve.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
--cc=ttaylorr@github.com \
--cc=yldhome2d2@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).