From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, derrickstolee@github.com
Subject: Re: Changed path filter hash differs from murmur3 if char is signed
Date: Thu, 11 May 2023 15:51:16 -0700 [thread overview]
Message-ID: <xmqqjzxetrbv.fsf@gitster.g> (raw)
In-Reply-To: <20230511224101.972442-1-jonathantanmy@google.com> (Jonathan Tan's message of "Thu, 11 May 2023 15:40:59 -0700")
Jonathan Tan <jonathantanmy@google.com> writes:
> So...how do we proceed? I can see at least 2 ways:
>
> - Decide that we're going to stick with the details of the existing
> implementation and declare that "data" is always interpreted as signed.
> In that case, I would put "signed" wherever necessary, rename the
> function to something that is not "murmur3", and change the names of
> byte1 etc. to indicate that they are not constrained to be less than or
> equal to 0xff.
>
> - Bump the version number to 2 and correct the implementation to
> match murmur3 (so, "data" is unsigned). Then we would have to think of
> a transition plan. One possible one might be to always reject version
> 1 bloom filters, which I'm personally OK with, but it may seem too
> heavy a cost to some since in the perhaps typical case where a repo has
> filenames restricted to 0x7f and below, the existing bloom filters are
> still correct.
If path filter hashing were merely advisory, in the sense that if a
matching data is found, great, the processing goes faster, but if
not, we would get correct results albeit not so quickly, a third
option would be to just update the implementation without updating
the version number. But we may not be so lucky---you must have seen
a wrong result returned quickly, which is not what we want to see.
But if I recall correctly we made the file format in such a way that
bumping the version number is cheap in that transition can appear
seamless. An updated implementation can just be told to _ignore_
old and possibly incorrect Bloom filters until it gets told to
recompute, at which time it can write a correct one with a new
version number. So I would prefer your "Bump the version number and
ignore the old and possibly wrong data".
Thanks.
next prev parent reply other threads:[~2023-05-11 22:51 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-11 22:40 Changed path filter hash differs from murmur3 if char is signed Jonathan Tan
2023-05-11 22:51 ` Junio C Hamano [this message]
2023-05-11 23:10 ` Taylor Blau
2023-05-12 17:33 ` Jonathan Tan
2023-05-12 19:42 ` Junio C Hamano
2023-05-12 20:54 ` Jonathan Tan
2023-05-12 21:27 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqjzxetrbv.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.