From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: Lidong Yan <yldhome2d2@gmail.com>,
git@vger.kernel.org, stolee@gmail.com, ttaylorr@github.com
Subject: Re: [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal
Date: Thu, 07 Aug 2025 09:15:01 -0700 [thread overview]
Message-ID: <xmqqa54brtve.fsf@gitster.g> (raw)
In-Reply-To: <aJRMaYfMd3PlRtoz@pks.im> (Patrick Steinhardt's message of "Thu, 7 Aug 2025 08:49:13 +0200")
Patrick Steinhardt <ps@pks.im> writes:
> On Thu, Aug 07, 2025 at 01:12:43PM +0800, Lidong Yan wrote:
>
> In the subject it should be s/enable bloom/enable Bloom.
>
>> When traversing commits, a pathspec item can be used to limit the
>> traversal to commits that modify the specified paths. And the
>> commit-graph includes a Bloom filter to exclude commits that definitely
>> did not modify a given pathspec item. During commit traversal, the
>> Bloom filter can significantly improve performance. However, it is
>> disabled if the specified pathspec item contains wildcard characters
>> or magic signatures.
>
> Let's add a paragraph here, as we now switch into the "what is being
> done mode".
I agree, a paragraph break is very welcome here. I was confused
when I read this first that you are suggesting to add a new
paragraph with unspecified contents.
>> Also Enable Bloom filter if magic signature is not "exclude" or
>> "icase".
>
> This explains what is done, but not why this is safe to do.
And for that, the enumeration needs to be done inclusively, e.g. "we
know :(literal) is OK because we can do X; we know :(glob) is OK
because we can do Y".
The numbers are impressive ;-)
>> diff --git a/revision.c b/revision.c
>> index 18f300d455..ef8c0b6eca 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -671,12 +671,13 @@ static void trace2_bloom_filter_statistics_atexit(void)
>>
>> static int forbid_bloom_filters(struct pathspec *spec)
>> {
>> - if (spec->has_wildcard)
>> - return 1;
>> - if (spec->magic & ~PATHSPEC_LITERAL)
>> + int forbid_mask =
>
> The mask should be `unsigned`.
>
>> + PATHSPEC_EXCLUDE | PATHSPEC_ICASE;
>
> I think instead of a forbid-mask we should use an allow-mask. Otherwise
> it can happen quite easily that we add new magic that isn't compatible
> with Bloom filters but forget to update this part here. I'd rather be
> slow but correct than fast but incorrect.
Good suggestion. From the use of &-, it is obvious it is a mask, but
"allow-mask" is not clear among what kind of things some are
allowed, so how about calling it:
unsigned allowed_magic;
Thanks.
next prev parent reply other threads:[~2025-08-07 16:15 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-07 5:12 [PATCH] bloom: enable bloom filter with wildcard pathspec in revision traversal Lidong Yan
2025-08-07 6:49 ` Patrick Steinhardt
2025-08-07 8:59 ` Lidong Yan
2025-08-07 16:15 ` Junio C Hamano [this message]
2025-08-08 6:40 ` Lidong Yan
2025-08-08 6:58 ` [PATCH v2] " Lidong Yan
2025-08-08 15:50 ` Junio C Hamano
2025-08-09 2:06 ` Lidong Yan
2025-08-09 2:16 ` [PATCH v3] " Lidong Yan
2025-08-09 4:22 ` [PATCH v4] " Lidong Yan
2025-08-09 7:40 ` Lidong Yan
2025-08-11 6:01 ` [PATCH v5] " Lidong Yan
2025-08-11 15:56 ` Junio C Hamano
2025-08-11 16:08 ` Lidong Yan
2025-08-09 23:51 ` [PATCH v3] " Junio C Hamano
2025-08-10 1:57 ` Lidong Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqa54brtve.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
--cc=ttaylorr@github.com \
--cc=yldhome2d2@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.