git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>, Jeff King <peff@peff.net>
Subject: Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.
Date: Fri, 04 May 2018 22:36:07 +0200	[thread overview]
Message-ID: <87h8nnxio8.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <86zi1fus3t.fsf@gmail.com>


On Fri, May 04 2018, Jakub Narebski wrote:

(Just off-the cuff here and I'm surely about to be corrected by
Derrick...)

> * What to do about merge commits, and octopus merges in particular?
>   Should Bloom filter be stored for each of the parents?  How to ensure
>   fast access then (fixed-width records) - use large edge list?

You could still store it fixed with, you'd just say that if you
encounter a merge with N parents the filter wouldn't store files changed
in that commit, but rather whether any of the N (including the merge)
had changes to files as of the their common merge-base.

Then if they did you'd need to walk all sides of the merge where each
commit would also have the filter to figure out where the change(s)
was/were, but if they didn't you could skip straight to the merge base
and keep walking.

Which, on the topic of what else a commit graph could store: A mapping
from merge commits of N parents to the merge-base of those commits.

You could also store nothing for merges (or only files the merge itself
changed v.s. its parents). Derrick talked about how the bloom filter
implementation has a value that's "Didn't compute (for whatever reason),
look at it manually".

> * Then there is problem of rename and copying detection - I think we can
>   simply ignore it: unless someone has an idea about how to handle it?
>
>   Though this means that "git log --follow <file>" wouldn't get any
>   speedup, and neither the half of "git gui blame" that runs "git blame
>   --incremental -C -C -w" -- the one that allows code copying and
>   movement detection.

Couldn't the bloom filter also speed up --follow if you did two passes
through the history? The first to figure out all files that ever changed
names, and then say you did `--follow sha1-name.c` on git.git. The
filter would have had all the bits for both sha1_name.c and sha1-name.c
set on all commits that touched either for all of the history.

Of course this would only work with a given default value of -M<n>, but
on the assumption that most users left it at the default, and
furthermore that renames weren't so common as to make the filter useless
with too many false-positives as a result, it might be worth it. If you

  parent reply	other threads:[~2018-05-04 20:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-04 19:40 [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc Jakub Narebski
2018-05-04 20:07 ` Ævar Arnfjörð Bjarmason
2018-05-04 20:36 ` Ævar Arnfjörð Bjarmason [this message]
2018-05-05 13:28   ` Jakub Narebski
2018-05-06 23:55 ` [RFC] Other chunks for commit-graph, part 2 - reachability indexes Jakub Narebski
2018-05-07 14:26 ` [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc Derrick Stolee
2018-05-12 14:00   ` Jakub Narebski
2018-05-14 13:20     ` Derrick Stolee
2018-05-14 20:58       ` Jakub Narebski
2018-05-15 10:01         ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h8nnxio8.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jnareb@gmail.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).