git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tommy Wang <subscription@august8.net>
To: git <git@vger.kernel.org>
Subject: Default history simplification
Date: Thu, 19 Nov 2009 17:30:43 -0600	[thread overview]
Message-ID: <ae09c2a40911191530y626dd035q90de0212e0b4b6d8@mail.gmail.com> (raw)

Hi all,

I'm working on a small perl script that needs to reproduce the results
of the default git history simplification used by log & rev-list when
a list of paths is given.  It is important that I reproduce its
results exactly.  I would love to simply use the rev-list built-in to
do my work for me; but I fear that I may have much too many path
limiters than the linux command-line can handle (which if I'm correct,
can only take so many arguments).

I can understand whether a commit is included or not rather easily: we
basically only include a commit iff the commit introduces a change to
the given path (ie, it is !TREESAME to any of its parents).

For merges, I get a little confused.  In the simple case of a 2-parent
merge, if we find that we are TREESAME with one parent -- it follows
that the interesting change must have come from that parent.  So we
follow it, and can safely assume that the other parent is
uninteresting.

If we find that we are TREESAME with both parents, it follows that
neither parent nor the merge commit is interesting.  Therefore, we
randomly pick the first parent and move up the graph looking for an
interesting commit.  Looking at this more closely, the parents we
ignored will fall under three scenarios:

1. It finds it way all the way up to a root commit without finding an
interesting commit -- which can happen if your repository has multiple
root commits.
2. It will move up the graph and eventually find a common ancestor
with the first parent.
3. It will find an interesting commit that has an identical/equivalent
commit somewhere in the first parent's ancestry.

In all 3 cases, it is safe to follow just the first TREESAME parent.
It is interesting to note that if are working with a repository with
multiple root commits (say, you imported a project without history,
and later merged in its upstream history), you could potentially lose
that history with this algorithm if you happened to merge it as the
second parent.  This is not necessarily a flaw (in fact, it may be a
feature!), since you have still fully explained the state of your HEAD
with respect to the first parent (which is probably your mainline).

In the case that neither parent was TREESAME, we can have 2 scenarios:

1. The merge commit itself made the interesting change.
2. The interesting changes came from both parents, in which we would
rightfully follow both parents.

My first question is this:

If the merge commit itself (D) made the interesting change, we could
potentially follow an uninteresting parent -- most likely all the way
up until we find a common ancestor (A).  This would produce a graph
that looks like:

A---B---D
 \-------/ (C pruned)

Or, if both parents were uninteresting:

A---D (both B & C pruned)
 \---/

I assume that the simplification takes care of this by removing
duplicate parents as well as searching for common ancestors?  (It is
not mentioned in the documentation).

My second question is then:

Given that we perform an extra simplification pass to remove common
ancestors and duplicate parents, what is the purpose of selectively
following parents?  Is it just for speed?  If we always followed all
parents and applied our duplicate simplification and common ancestor
simplification, we would not always arrive at the same result?  In
which case, if my application is not concerned with speed (ie, I don't
mind following all parents), would I always arrive at the same graph?

Thanks,

Tommy

             reply	other threads:[~2009-11-19 23:30 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-19 23:30 Tommy Wang [this message]
2009-11-20  8:55 ` Default history simplification Jan Krüger
2009-11-21  6:37   ` Junio C Hamano
2009-11-20 14:41 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae09c2a40911191530y626dd035q90de0212e0b4b6d8@mail.gmail.com \
    --to=subscription@august8.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).