From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>
Subject: Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.
Date: Fri, 04 May 2018 22:36:07 +0200
Message-ID: <87h8nnxio8.fsf@evledraar.gmail.com>
In-Reply-To: <86zi1fus3t.fsf@gmail.com>
On Fri, May 04 2018, Jakub Narebski wrote:
(Just off the cuff here, and I'm surely about to be corrected by
Derrick...)
> * What to do about merge commits, and octopus merges in particular?
> Should Bloom filter be stored for each of the parents? How to ensure
> fast access then (fixed-width records) - use large edge list?
You could still store it fixed-width: you'd just say that if you
encounter a merge with N parents, the filter wouldn't store the files
changed in that commit, but rather whether any of the N sides (including
the merge itself) changed files since their common merge-base.
If they did, you'd need to walk all sides of the merge, where each
commit also has its own filter, to figure out where the change(s)
happened; if they didn't, you could skip straight to the merge-base
and keep walking.
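To make the skipping idea concrete, here's a rough Python sketch. Everything
in it is made up for illustration: the toy BloomFilter class, the dict-based
graph, and the "merge_base" field are assumptions, not anything in git. It
also assumes each merge has a single merge-base, which criss-cross merges
would break.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter kept in one integer (sizes are arbitrary)."""

    def __init__(self, paths=(), num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0
        for path in paths:
            self.add(path)

    def _positions(self, path):
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{path}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, path):
        for pos in self._positions(path):
            self.bits |= 1 << pos

    def might_contain(self, path):
        # False means "definitely not changed"; True means "maybe".
        return all((self.bits >> pos) & 1 for pos in self._positions(path))


def candidates(graph, tip, path):
    """Return (commits that need a real diff, commits visited).

    graph maps a commit id to a dict with "parents", "filter" and, for
    merges only, a hypothetical precomputed "merge_base".
    """
    todo, seen, hits = [tip], set(), []
    while todo:
        cid = todo.pop()
        if cid in seen:
            continue
        seen.add(cid)
        commit = graph[cid]
        parents = commit["parents"]
        maybe = commit["filter"].might_contain(path)
        if len(parents) > 1:
            if maybe:
                todo.extend(parents)               # walk every side
            else:
                todo.append(commit["merge_base"])  # skip all sides
        else:
            if maybe:
                hits.append(cid)
            todo.extend(parents)
    return hits, seen
```

With a merge whose filter says "no", none of the commits between the
merge-base and the merge are ever loaded. A "yes" costs nothing extra
compared to today, since each side is then walked with its own
per-commit filter.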
Which brings up something else a commit graph could store: a mapping
from merge commits with N parents to the merge-base of those parents.
You could also store nothing for merges (or only the files the merge
itself changed vs. its parents). Derrick talked about how the Bloom
filter implementation has a value that means "Didn't compute (for
whatever reason), look at it manually".
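As a sketch of how such a "didn't compute" value could coexist with
fixed-width records, something like the following might work. The 64-bit
record size and the all-ones sentinel are my assumptions here, not how
Derrick's implementation actually encodes it. An all-ones filter would
answer "maybe" to every path anyway, so reserving it loses nothing useful.

```python
import struct

RECORD = struct.Struct(">Q")   # assumed: one 64-bit filter per commit
NOT_COMPUTED = (1 << 64) - 1   # hypothetical sentinel: every bit set


def pack_filters(filters):
    """Serialize per-commit filters; None means "not computed"."""
    return b"".join(
        RECORD.pack(NOT_COMPUTED if bits is None else bits)
        for bits in filters
    )


def lookup(chunk, index):
    """Fetch the record at a commit's position; None means diff manually."""
    (bits,) = RECORD.unpack_from(chunk, index * RECORD.size)
    return None if bits == NOT_COMPUTED else bits
```

Since every record has the same size, looking up commit number i is a
single offset computation, merges included.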
> * Then there is problem of rename and copying detection - I think we can
> simply ignore it: unless someone has an idea about how to handle it?
>
> Though this means that "git log --follow <file>" wouldn't get any
> speedup, and neither the half of "git gui blame" that runs "git blame
> --incremental -C -C -w" -- the one that allows code copying and
> movement detection.
Couldn't the Bloom filter also speed up --follow if you did two passes
through the history? The first pass would figure out all files that ever
changed names. Then, say you did `--follow sha1-name.c` on git.git: the
filter would have the bits for both sha1_name.c and sha1-name.c set on
every commit that touched either, throughout the history.
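Roughly, the two passes could look like the Python sketch below. The
history format (dicts with "id", "changed" and "renames") and the filter
parameters are invented for illustration; a real implementation would get
the rename pairs from diffcore at whatever -M<n> threshold it picked.

```python
import hashlib

NUM_BITS = 64    # arbitrary toy sizes
NUM_HASHES = 3


def bit_positions(path):
    for i in range(NUM_HASHES):
        digest = hashlib.sha1(f"{i}:{path}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def make_filter(paths):
    bits = 0
    for path in paths:
        for pos in bit_positions(path):
            bits |= 1 << pos
    return bits


def might_touch(bits, path):
    return all((bits >> pos) & 1 for pos in bit_positions(path))


def alias_groups(history):
    """Pass 1: group together every name a file has ever had."""
    group = {}
    for commit in history:
        for old, new in commit["renames"]:
            merged = group.get(old, {old}) | group.get(new, {new})
            for name in merged:
                group[name] = merged
    return group


def build_filters(history):
    """Pass 2: set the bits for each changed path and all of its aliases."""
    group = alias_groups(history)
    filters = {}
    for commit in history:
        names = set()
        for path in commit["changed"]:
            names |= group.get(path, {path})
        filters[commit["id"]] = make_filter(names)
    return filters
```

A query for sha1-name.c then gets a "maybe" on commits that only ever
saw sha1_name.c, so --follow doesn't need to know the old name up front.
The cost is that every rename adds bits to every commit touching any
name in the group, which is where the extra false positives come from.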
Of course this would only work with a given default value of -M<n>, but
on the assumption that most users leave it at the default, and
furthermore that renames aren't so common as to make the filter useless
with too many false positives as a result, it might be worth it. If you