From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: me@ttaylorr.com, jnareb@gmail.com, garimasigit@gmail.com,
Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v2 0/4] Integrate changed-path Bloom filters with 'git blame'
Date: Mon, 13 Apr 2020 14:45:22 +0000 [thread overview]
Message-ID: <pull.609.v2.git.1586789126.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.609.git.1586566981.gitgitgadget@gmail.com>
If the changed-path Bloom filters are relatively stable, then I propose
trying to build upon them as a way to discover any deficiencies. Also, it's
good to use them when we can.
The goal I set out to achieve was to use Bloom filters as much as possible
in git blame. I think I have achieved most of that, except I did not
integrate it with the -C mode. In that case, the blob-diff computation takes
a majority of the computation time, so short-circuiting the tree diff using
Bloom filters. Also, it's just really complicated. If someone else thinks
there is an easy win there, then please go ahead and give it a try with the
extra logic presented here in PATCH 3.
While I was playing around with Bloom filters and git blame, I purposefully
got it working with some scenarios but not all. Then I tried to trigger a
failing build in the blame tests using GIT_TEST_COMMIT_GRAPH=1 and
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1. But the tests all succeeded!
Examining the data, I see that the commit-graph didn't have the Bloom filter
chunks at all. This is because we are not setting the flag to write them in
the right spot. The GIT_TEST_COMMIT_GRAPH=1 variable triggers a commit-graph
write during git commit, so we need to update the code there instead of just
inspecting the variable in git commit-graph write. (This is PATCH 2.)
By updating this variable, I saw some test failures in other tests regarding
non-exact pathspecs. I fixed these in PATCH 1 so we keep clean builds.
I based this change on [1] but it would apply cleanly (and logically) on
gs/commit-graph-path-filter
Updates in v2:
* Added PATCH 3 to write commit-graph files during 'git merge' if
GIT_TEST_COMMIT_GRAPH is enabled.
* Updated PATCH 1 with the simplification recommended by Taylor.
* Fixed the lower-case "bloom" in the commit message.
Thanks, -Stolee
[1]
https://lore.kernel.org/git/pull.601.v2.git.1586437211842.gitgitgadget@gmail.com/
Derrick Stolee (4):
revision: complicated pathspecs disable filters
commit: write commit-graph with Bloom filters
commit-graph: write commit-graph in more tests
blame: use changed-path Bloom filters
blame.c | 139 ++++++++++++++++++++++++++++++++++++++++++++---
blame.h | 6 ++
builtin/blame.c | 10 ++++
builtin/commit.c | 4 +-
builtin/merge.c | 7 ++-
commit-graph.c | 9 +++
commit-graph.h | 2 +
revision.c | 19 ++++++-
8 files changed, 181 insertions(+), 15 deletions(-)
base-commit: f4df00a0dd448edce0e854a97f63598fefe27d27
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-609%2Fderrickstolee%2Fbloom-blame-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-609/derrickstolee/bloom-blame-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/609
Range-diff vs v1:
1: 9cc31c289aa ! 1: adc03eee4ac revision: complicated pathspecs disable filters
@@ Commit message
path Bloom filters in the test suite. That will be fixed in the
next change.
+ Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
## revision.c ##
+@@ revision.c: static void trace2_bloom_filter_statistics_atexit(void)
+ jw_release(&jw);
+ }
+
++static int forbid_bloom_filters(struct pathspec *spec)
++{
++ if (spec->has_wildcard)
++ return 1;
++ if (spec->nr > 1)
++ return 1;
++ if (spec->magic & ~PATHSPEC_LITERAL)
++ return 1;
++ if (spec->nr && (spec->items[0].magic & ~PATHSPEC_LITERAL))
++ return 1;
++
++ return 0;
++}
++
+ static void prepare_to_use_bloom_filter(struct rev_info *revs)
+ {
+ struct pathspec_item *pi;
@@ revision.c: static void prepare_to_use_bloom_filter(struct rev_info *revs)
- if (!revs->commits)
- return;
+ int len;
-+ if (revs->prune_data.has_wildcard)
-+ return;
-+ if (revs->prune_data.nr > 1)
-+ return;
-+ if (revs->prune_data.magic ||
-+ (revs->prune_data.nr &&
-+ revs->prune_data.items[0].magic))
+ if (!revs->commits)
+- return;
+ return;
+
++ if (forbid_bloom_filters(&revs->prune_data))
++ return;
+
repo_parse_commit(revs->repo, revs->commits->item);
- if (!revs->repo->objects->commit_graph)
2: bb5ce39d028 < -: ----------- commit: write commit-graph with bloom filters
-: ----------- > 2: 7e8f1aed113 commit: write commit-graph with Bloom filters
-: ----------- > 3: 824f8ad067b commit-graph: write commit-graph in more tests
3: 431fde68031 = 4: 4ae196d6355 blame: use changed-path Bloom filters
--
gitgitgadget
next prev parent reply other threads:[~2020-04-13 14:45 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-11 1:02 [PATCH 0/3] Integrate changed-path Bloom filters with 'git blame' Derrick Stolee via GitGitGadget
2020-04-11 1:02 ` [PATCH 1/3] revision: complicated pathspecs disable filters Derrick Stolee via GitGitGadget
2020-04-11 21:40 ` Junio C Hamano
2020-04-13 11:49 ` Derrick Stolee
2020-04-14 18:25 ` Junio C Hamano
2020-04-15 13:27 ` Derrick Stolee
2020-04-15 18:37 ` Derrick Stolee
2020-04-15 19:32 ` Junio C Hamano
2020-04-15 19:39 ` Junio C Hamano
2020-04-15 21:25 ` Junio C Hamano
2020-04-16 0:56 ` Taylor Blau
2020-04-15 22:18 ` Jakub Narębski
2020-04-16 0:52 ` Taylor Blau
2020-04-16 13:26 ` Derrick Stolee
2020-04-16 16:33 ` Taylor Blau
2020-04-16 18:02 ` Junio C Hamano
2020-04-12 22:22 ` Taylor Blau
2020-04-12 22:30 ` Junio C Hamano
2020-04-13 0:07 ` Taylor Blau
2020-04-13 11:54 ` Derrick Stolee
2020-04-11 1:03 ` [PATCH 2/3] commit: write commit-graph with bloom filters Derrick Stolee via GitGitGadget
2020-04-11 21:57 ` Junio C Hamano
2020-04-12 20:51 ` Taylor Blau
2020-04-13 12:08 ` Derrick Stolee
2020-04-13 22:11 ` Junio C Hamano
2020-04-11 1:03 ` [PATCH 3/3] blame: use changed-path Bloom filters Derrick Stolee via GitGitGadget
2020-04-11 22:03 ` Junio C Hamano
2020-04-12 7:39 ` Eric Sunshine
2020-04-11 21:30 ` [PATCH 0/3] Integrate changed-path Bloom filters with 'git blame' Junio C Hamano
2020-04-13 14:45 ` Derrick Stolee via GitGitGadget [this message]
2020-04-13 14:45 ` [PATCH v2 1/4] revision: complicated pathspecs disable filters Derrick Stolee via GitGitGadget
2020-04-13 16:09 ` Taylor Blau
2020-04-13 22:18 ` Junio C Hamano
2020-04-13 14:45 ` [PATCH v2 2/4] commit: write commit-graph with Bloom filters Derrick Stolee via GitGitGadget
2020-04-13 16:12 ` Taylor Blau
2020-04-13 22:21 ` Junio C Hamano
2020-04-14 15:04 ` Derrick Stolee
2020-04-14 17:26 ` Junio C Hamano
2020-04-14 17:40 ` Derrick Stolee
2020-04-15 0:17 ` Taylor Blau
2020-04-13 14:45 ` [PATCH v2 3/4] commit-graph: write commit-graph in more tests Derrick Stolee via GitGitGadget
2020-04-13 14:45 ` [PATCH v2 4/4] blame: use changed-path Bloom filters Derrick Stolee via GitGitGadget
2020-04-13 16:21 ` [PATCH v2 0/4] Integrate changed-path Bloom filters with 'git blame' Taylor Blau
2020-04-16 20:14 ` [PATCH v3 0/3] " Derrick Stolee via GitGitGadget
2020-04-16 20:14 ` [PATCH v3 1/3] revision: complicated pathspecs disable filters Derrick Stolee via GitGitGadget
2020-06-07 20:33 ` SZEDER Gábor
2020-04-16 20:14 ` [PATCH v3 2/3] tests: write commit-graph with Bloom filters Derrick Stolee via GitGitGadget
2020-04-16 20:14 ` [PATCH v3 3/3] blame: use changed-path " Derrick Stolee via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.609.v2.git.1586789126.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=dstolee@microsoft.com \
--cc=garimasigit@gmail.com \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.