From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>
Subject: Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.
Date: Fri, 04 May 2018 22:36:07 +0200 [thread overview]
Message-ID: <87h8nnxio8.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <86zi1fus3t.fsf@gmail.com>
On Fri, May 04 2018, Jakub Narebski wrote:
(Just off-the cuff here and I'm surely about to be corrected by
Derrick...)
> * What to do about merge commits, and octopus merges in particular?
> Should Bloom filter be stored for each of the parents? How to ensure
> fast access then (fixed-width records) - use large edge list?
You could still store it fixed with, you'd just say that if you
encounter a merge with N parents the filter wouldn't store files changed
in that commit, but rather whether any of the N (including the merge)
had changes to files as of the their common merge-base.
Then if they did you'd need to walk all sides of the merge where each
commit would also have the filter to figure out where the change(s)
was/were, but if they didn't you could skip straight to the merge base
and keep walking.
Which, on the topic of what else a commit graph could store: A mapping
from merge commits of N parents to the merge-base of those commits.
You could also store nothing for merges (or only files the merge itself
changed v.s. its parents). Derrick talked about how the bloom filter
implementation has a value that's "Didn't compute (for whatever reason),
look at it manually".
> * Then there is problem of rename and copying detection - I think we can
> simply ignore it: unless someone has an idea about how to handle it?
>
> Though this means that "git log --follow <file>" wouldn't get any
> speedup, and neither the half of "git gui blame" that runs "git blame
> --incremental -C -C -w" -- the one that allows code copying and
> movement detection.
Couldn't the bloom filter also speed up --follow if you did two passes
through the history? The first to figure out all files that ever changed
names, and then say you did `--follow sha1-name.c` on git.git. The
filter would have had all the bits for both sha1_name.c and sha1-name.c
set on all commits that touched either for all of the history.
Of course this would only work with a given default value of -M<n>, but
on the assumption that most users left it at the default, and
furthermore that renames weren't so common as to make the filter useless
with too many false-positives as a result, it might be worth it. If you
next prev parent reply other threads:[~2018-05-04 20:36 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-04 19:40 [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc Jakub Narebski
2018-05-04 20:07 ` Ævar Arnfjörð Bjarmason
2018-05-04 20:36 ` Ævar Arnfjörð Bjarmason [this message]
2018-05-05 13:28 ` Jakub Narebski
2018-05-06 23:55 ` [RFC] Other chunks for commit-graph, part 2 - reachability indexes Jakub Narebski
2018-05-07 14:26 ` [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc Derrick Stolee
2018-05-12 14:00 ` Jakub Narebski
2018-05-14 13:20 ` Derrick Stolee
2018-05-14 20:58 ` Jakub Narebski
2018-05-15 10:01 ` Johannes Schindelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87h8nnxio8.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=jnareb@gmail.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).