From: Taylor Blau <me@ttaylorr.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: Derrick Stolee <stolee@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
	Garima Singh <garimasigit@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 1/3] revision: complicated pathspecs disable filters
Date: Wed, 15 Apr 2020 18:52:41 -0600
Message-ID: <20200416005241.GB36156@syl.local>
In-Reply-To: <CANQwDwdcM-hc+Nyc0nBnp79W0BuH2PfdgM6WLqET3sPn1GYc-A@mail.gmail.com>

On Thu, Apr 16, 2020 at 12:18:33AM +0200, Jakub Narębski wrote:
> On Wed, 15 Apr 2020 at 20:37, Derrick Stolee <stolee@gmail.com> wrote:
> [...]
> > -->8--
> > From 89beb9598daabb19e3c896bbceeb0fc1b9ccc6ca Mon Sep 17 00:00:00 2001
> > From: Derrick Stolee <dstolee@microsoft.com>
> > Date: Wed, 15 Apr 2020 18:04:25 +0000
> > Subject: [PATCH] bloom: compute all Bloom hashes from lowercase
> >
> > The changed-path Bloom filters currently hash path strings using
> > the exact string for the path. This makes it difficult* to use the
> > filters when restricting to case-insensitive pathspecs.
> >
> > * I say "difficult" because it is possible to generate all 2^n
> >   options for the case of a path and test them all, but this is
> >   a bad idea and should not be done. "Impossible" is an appropriate
> >   alternative.
> >
> > THIS IS A BREAKING CHANGE. Commit-graph files with changed-path
> > Bloom filters computed by a previous commit will not be compatible
> > with the filters computed in this commit, nor will we get correct
> > results when testing across these incompatible versions. Normally,
> > this would be a completely unacceptable change, but the filters
> > have not been released and hence are still possible to update
> > before release.
> >
> > TODO: If we decide to move in this direction, then the following
> > steps should be done (and some of them should be done anyway):
> >
> > * We need to document the Bloom filter format to specify exactly
> >   how we compute the filter data. The details should be careful
> >   enough that someone can reproduce the exact file format without
> >   looking at the C code.
> >
> > * That document would include the tolower() transformation that is
> >   being done here.
>
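
The trade-off described in the commit message above can be sketched in a
few lines of Python. This is a toy filter, not Git's implementation: the
SHA-256-based bloom_bits(), the parameter values, and every function name
here are assumptions made up for illustration. It only shows that folding
case at hash time lets one lookup cover every spelling, whereas exact-string
hashing would need one lookup per case variant (2^n for n cased letters).

```python
import hashlib
from itertools import product


def bloom_bits(key, num_hashes=7, num_bits=640):
    """Bit positions for `key` in a toy Bloom filter (illustrative hashing only)."""
    positions = set()
    for seed in range(num_hashes):
        digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
        positions.add(int.from_bytes(digest[:4], "big") % num_bits)
    return positions


def build_filter(paths, fold=str.lower):
    """Union of the bit positions of every path, after applying `fold`."""
    bits = set()
    for path in paths:
        bits |= bloom_bits(fold(path))
    return bits


def maybe_changed(filter_bits, path, fold=str.lower):
    """True if `path` may be in the filter; False means definitely absent."""
    return bloom_bits(fold(path)) <= filter_bits


def case_variants(path):
    """Every upper/lower spelling of `path`: 2^n strings for n cased letters."""
    choices = [
        (c.lower(), c.upper()) if c.lower() != c.upper() else (c,)
        for c in path
    ]
    return {"".join(combo) for combo in product(*choices)}


changed = build_filter(["Documentation/README.md"])
# One folded lookup answers a case-insensitive query.
print(maybe_changed(changed, "documentation/readme.MD"))
# Exact-string hashing would need this many lookups for the basename alone.
print(len(case_variants("README.md")))
```

The same fold must be applied when writing and when querying, which is why
filters written with and without it cannot be mixed.
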
> Why not modify the BDAT chunk to include the version of the
> case-folding transformation or other collation algorithm
> (other transformation) that is done prior to computing
> the Bloom filter key? Though that might be unnecessary
> flexibility...

If this ends up being something that we want to do, I agree with
Stolee's reasoning that this should be a breaking change. If we were,
say, several months into having Bloom filters in a release and decided
at that point to make the change, then: sure, supporting both by writing
a bit in the BDAT chunk makes sense.

But we're many months away from that state yet, and so I don't think the
cost of rebuilding what few commit-graphs exist with Bloom filters in
them today is high enough to justify supporting both ordinary and
lower-cased paths in the filter.

Anyway, I'm still not sold on this idea in general (nor, as I understand
it, are others), so I'll respond in more detail in another part of the
thread...

> For example the value of 0x00 in such field of BDAT
> chunk header would mean no transformation, while
> the value of 0x01 would mean per-character tolower()
> or Unicode equivalent of it.
>
> Best,
> --
> Jakub Narębski
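
For reference, the field proposed above could look roughly like the
following Python sketch. Everything in it is hypothetical: the one-field
header layout, the constant names, and the helper functions are assumptions
for illustration and do not describe the actual BDAT chunk format. The only
values taken from the proposal are 0x00 (no transformation) and 0x01
(per-character tolower(), approximated here by ASCII-only bytes.lower()).

```python
import struct

# Values from the proposal; the names are hypothetical.
TRANSFORM_NONE = 0x00
TRANSFORM_TOLOWER = 0x01

_TRANSFORMS = {
    TRANSFORM_NONE: lambda path: path,
    # bytes.lower() only touches ASCII letters, like tolower() in the C locale.
    TRANSFORM_TOLOWER: lambda path: path.lower(),
}

# Hypothetical header: a single big-endian 32-bit transformation id.
HEADER = struct.Struct(">I")


def write_header(transform_id):
    """Serialize the (hypothetical) header for a known transformation."""
    if transform_id not in _TRANSFORMS:
        raise ValueError(f"unknown path transform {transform_id:#04x}")
    return HEADER.pack(transform_id)


def key_transform(header):
    """Return the function a reader must apply to a path before hashing it."""
    (transform_id,) = HEADER.unpack_from(header)
    if transform_id not in _TRANSFORMS:
        # A reader that does not know the value cannot trust the filters.
        raise ValueError(f"unknown path transform {transform_id:#04x}")
    return _TRANSFORMS[transform_id]


fold = key_transform(write_header(TRANSFORM_TOLOWER))
print(fold(b"Documentation/README.md"))
```

A reader that sees an unknown value would have to ignore the filters and
fall back to the slow path, which is the compatibility cost of the extra
flexibility.
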

Thanks,
Taylor
