git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3
Date: Wed, 30 Aug 2023 22:02:18 +0200	[thread overview]
Message-ID: <20230830200218.GA5147@szeder.dev> (raw)
In-Reply-To: <20230829163124.363157-1-jonathantanmy@google.com>

On Tue, Aug 29, 2023 at 09:31:23AM -0700, Jonathan Tan wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
 
> > > +test_expect_success 'version 2 changed-path used when version 2 requested' '
> > > +	(
> > > +		cd highbit2 &&
> > > +		test_bloom_filters_used "-- $CENT"
> > 
> > This test_bloom_filter_used helper runs two pathspec-limited 'git log'
> > invocations, one with disabled and the other with enabled
> > commit-graph, and thus with disabled and enabled modified path Bloom
> > filters, and compares their output.
> > 
> > One of the flaws of the current modified path Bloom filters
> > implementation is that it doesn't check Bloom filters for root
> > commits.
> > 
> > In several of the above test cases test_bloom_filters_used is invoked
> > in a repository with only a root commit, so they don't check that
> > the output is the same with and without Bloom filters.
> 
> Ah...you are right. Indeed, when I flip one of the tests from
> test_bloom_filters_used to _not_, the test still passes. I'll change
> the tests.

I'd prefer to leave the test cases unchanged, and make the revision
walking machinery look at Bloom filters even for root commits, because
this is an annoying and recurring testing issue.  I remember it
annoyed me back then, when I wanted to reproduce a couple of bugs that
I knew were there, but my initial test cases didn't fail because the
Bloom filter of the root commit was ignored; Derrick overlooked this
in b16a8277644, you overlooked it now, and none of the reviewers then
and now caught it, either.

> > The string "split" occurs twice in this patch series, but only in
> > patch hunk contexts, and it doesn't occur at all in the previous long
> > thread about the original patch series.
> > 
> > Unfortunately, split commit-graphs weren't really considered in the
> > design of the modified path Bloom filters feature, and layers with
> > different Bloom filter settings weren't considered at all.  I've
> > reported it back then, but the fixes so far were incomplete, and e.g.
> > the test cases shown in
> > 
> >   https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/
> > 
> > still fail.
> > 
> > Since the interaction of different versions and split commit-graphs
> > was neither mentioned in any of the commit messages nor discussed
> > during the previous rounds, and there isn't any test case excercising
> > it, and since the Bloom filter version information is stored in the
> > same 'g->bloom_filter_settings' structure as the number of hashes, I'm
> > afraid (though haven't actually checked) that handling commit-graph
> > layers with different Bloom filter versions is prone to the same
> > issues as well.
> 
> My original design (up to patch 7 in this patch set) defends against
> this by taking the very first version detected and rejecting every
> other version, and Taylor's subsequent design reads every version, but
> annotates filters with its version. So I think we're covered.

In the meantime I adapted the test cases from the above linked message
to write commit-graph layers with different Bloom filter versions, and
it does fail, because commits from the bottom commit-graph layer are
omitted from the revision walk's output.  And the test case doesn't
even need a middle layer without modified path Bloom filters to "hide"
the different version in the bottom layer.  Merging the layers seems
to work, though.

Besides fixing this issue, I think that the interaction of different
Bloom filter versions and split commit-graphs needs to be thoroughly
covered with test cases and discussed in the commit messages before
this series could be considered good for merging.


diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 48f8109a66..55f67e5110 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -586,4 +586,40 @@ test_expect_success 'when writing commit graph, reuse changed-path of another ve
 	test_filter_upgraded 1 trace2.txt
 '
 
+test_expect_success 'split commit graph vs changed paths breakage - setup' '
+	git init split-breakage &&
+	(
+		cd split-breakage &&
+		git commit --allow-empty -m "Bloom filters are written but still ignored for root commits :(" &&
+		for i in 1 2 3
+		do
+			echo $i >$CENT &&
+			git add $CENT &&
+			git commit -m "$i" || return 1
+		done &&
+		git log --oneline -- $CENT >expect
+	)
+'
+
+test_expect_failure 'split commit graph vs changed paths breakage - split' '
+	(
+		cd split-breakage &&
+
+		# Compute v1 Bloom filters for the commits at the bottom.
+		git rev-parse HEAD^ | git commit-graph write --stdin-commits --changed-paths --split &&
+		# Compute v2 Bloom filters for the rest of the commits at the top.
+		git rev-parse HEAD | git -c commitgraph.changedPathsVersion=2 commit-graph write --stdin-commits --changed-paths --split=no-merge &&
+
+		# Just to make sure that there are as many graph layers as I
+		# think there should be.
+		test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain &&
+
+		# This checks Bloom filters using version information in the
+		# top layer, thus misses commits modifying the file in the
+		# bottom commit-graph layer.
+		git log --oneline -- $CENT >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done

It fails with:

  + cd split-breakage
  + git rev-parse HEAD^
  + git commit-graph write --stdin-commits --changed-paths --split
  + git rev-parse HEAD
  + git -c commitgraph.changedPathsVersion=2 commit-graph write --stdin-commits --changed-paths --split=no-merge
  + test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain
  + git log --oneline -- ¢
  + test_cmp expect actual
  --- expect      2023-08-30 19:07:59.882066827 +0000
  +++ actual      2023-08-30 19:07:59.894067148 +0000
  @@ -1,3 +1 @@
   1db2248 3
  -cfcc97f 2
  -bd9c2c8 1
  error: last command exited with $?=1


  reply	other threads:[~2023-08-30 20:18 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-21 21:43 [PATCH 00/15] bloom: changed-path Bloom filters v2 Taylor Blau
2023-08-21 21:43 ` [PATCH 01/15] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2023-08-21 21:44 ` [PATCH 02/15] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2023-08-21 21:44 ` [PATCH 03/15] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2023-08-21 21:44 ` [PATCH 04/15] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2023-08-21 21:44 ` [PATCH 05/15] t4216: test changed path filters with high bit paths Taylor Blau
2023-08-21 21:44 ` [PATCH 06/15] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2023-08-21 21:44 ` [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3 Taylor Blau
2023-08-26 15:06   ` SZEDER Gábor
2023-08-29 16:31     ` Jonathan Tan
2023-08-30 20:02       ` SZEDER Gábor [this message]
2023-09-01 20:56         ` Jonathan Tan
2023-09-25 23:03           ` Taylor Blau
2023-10-08 14:35             ` SZEDER Gábor
2023-10-09 18:17               ` Taylor Blau
2023-10-09 19:31                 ` Taylor Blau
2023-10-09 19:52                   ` Junio C Hamano
2023-10-10 20:34                     ` Taylor Blau
2023-08-21 21:44 ` [PATCH 08/15] bloom: annotate filters with hash version Taylor Blau
2023-08-21 21:44 ` [PATCH 09/15] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-21 21:44 ` [PATCH 10/15] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-21 21:44 ` [PATCH 11/15] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-21 21:44 ` [PATCH 12/15] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2023-08-21 21:44 ` [PATCH 13/15] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-21 21:44 ` [PATCH 14/15] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-21 21:44 ` [PATCH 15/15] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2023-08-24 22:22 ` [PATCH 00/15] bloom: changed-path Bloom filters v2 Jonathan Tan
2023-08-25 17:06   ` Jonathan Tan
2023-08-29 22:18     ` Jonathan Tan
2023-08-29 23:16       ` Junio C Hamano
2023-08-30 16:43 ` [PATCH v2 " Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 01/15] gitformat-commit-graph: describe version 2 of BDAT Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 02/15] t/helper/test-read-graph.c: extract `dump_graph_info()` Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 03/15] bloom.h: make `load_bloom_filter_from_graph()` public Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 04/15] t/helper/test-read-graph: implement `bloom-filters` mode Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 05/15] t4216: test changed path filters with high bit paths Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 06/15] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 07/15] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 08/15] bloom: annotate filters with hash version Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 09/15] bloom: prepare to discard incompatible Bloom filters Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 10/15] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 11/15] commit-graph.c: unconditionally load Bloom filters Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 12/15] commit-graph: drop unnecessary `graph_read_bloom_data_context` Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 13/15] object.h: fix mis-aligned flag bits table Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 14/15] commit-graph: reuse existing Bloom filters where possible Jonathan Tan
2023-08-30 16:43   ` [PATCH v2 15/15] bloom: introduce `deinit_bloom_filters()` Jonathan Tan
2023-08-30 19:38   ` [PATCH v2 00/15] bloom: changed-path Bloom filters v2 Junio C Hamano
2023-10-10 20:33 ` [PATCH v3 00/17] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2023-10-10 20:33   ` [PATCH v3 01/17] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-10-10 20:33   ` [PATCH v3 02/17] revision.c: consult Bloom filters for root commits Taylor Blau
2023-10-10 20:33   ` [PATCH v3 03/17] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2023-10-17  8:45     ` Patrick Steinhardt
2023-10-10 20:33   ` [PATCH v3 04/17] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2023-10-10 20:33   ` [PATCH v3 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2023-10-17  8:45     ` Patrick Steinhardt
2023-10-18 17:37       ` Taylor Blau
2023-10-18 23:56         ` Junio C Hamano
2023-10-10 20:33   ` [PATCH v3 06/17] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2023-10-10 20:33   ` [PATCH v3 07/17] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2023-10-10 20:33   ` [PATCH v3 08/17] t4216: test changed path filters with high bit paths Taylor Blau
2023-10-17  8:45     ` Patrick Steinhardt
2023-10-18 17:41       ` Taylor Blau
2023-10-10 20:33   ` [PATCH v3 09/17] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2023-10-10 20:33   ` [PATCH v3 10/17] commit-graph: new filter ver. that fixes murmur3 Taylor Blau
2023-10-17  8:45     ` Patrick Steinhardt
2023-10-18 17:46       ` Taylor Blau
2023-10-10 20:33   ` [PATCH v3 11/17] bloom: annotate filters with hash version Taylor Blau
2023-10-10 20:33   ` [PATCH v3 12/17] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-10-10 20:33   ` [PATCH v3 13/17] commit-graph.c: unconditionally load " Taylor Blau
2023-10-17  8:45     ` Patrick Steinhardt
2023-10-10 20:34   ` [PATCH v3 14/17] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2023-10-10 20:34   ` [PATCH v3 15/17] object.h: fix mis-aligned flag bits table Taylor Blau
2023-10-10 20:34   ` [PATCH v3 16/17] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-10-10 20:34   ` [PATCH v3 17/17] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2023-10-17  8:45   ` [PATCH v3 00/17] bloom: changed-path Bloom filters v2 (& sundries) Patrick Steinhardt
2023-10-18 17:47     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230830200218.GA5147@szeder.dev \
    --to=szeder.dev@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).