From: Junio C Hamano <gitster@pobox.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, derrickstolee@github.com
Subject: Re: Changed path filter hash differs from murmur3 if char is signed
Date: Fri, 12 May 2023 12:42:10 -0700
Message-ID: <xmqqbkippca5.fsf@gitster.g>
In-Reply-To: <20230512173330.1072880-1-jonathantanmy@google.com> (Jonathan Tan's message of "Fri, 12 May 2023 10:33:29 -0700")

Jonathan Tan <jonathantanmy@google.com> writes:

> Yes - if the bloom filter contained junk data (in our example, created
> using a different hash function on filenames that have characters that
> exceed 0x7f), the bloom filter would report "no, this commit does not
> contain a change in such-and-such path" and then we would skip the
> commit, even if the commit did have a change in that path.

Just to help my understanding (read: I am not suggesting this as one
of the holes to exploit to help a smooth transition), does the above
mean that, as long as the path we are asking about does not have a
byte with the high-bit set, we would be OK, even if the Bloom filter
were constructed with a bad function and there were other paths that
had such a byte?

> I don't have statistics on this, but if the majority of repos have
> only <=0x7f filenames (which seems reasonable to me), this might save
> sufficient work that we can proceed with bumping the version number and
> ignoring old data.
>
>> Better yet, we should be able to reuse existing Bloom filter data for
>> paths that have all characters <=0xff, and only recompute them where

"ff" -> "7f" I presume?

>> necessary. That makes much more sense than the previous paragraph.
