Default history simplification

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Default history simplification
@ 2009-11-19 23:30 Tommy Wang
  2009-11-20  8:55 ` Jan Krüger
  2009-11-20 14:41 ` Jeff King
  0 siblings, 2 replies; 4+ messages in thread
From: Tommy Wang @ 2009-11-19 23:30 UTC (permalink / raw)
  To: git

Hi all,

I'm working on a small perl script that needs to reproduce the results
of the default git history simplification used by log & rev-list when
a list of paths is given.  It is important that I reproduce its
results exactly.  I would love to simply use the rev-list built-in to
do my work for me; but I fear that I may have much too many path
limiters than the linux command-line can handle (which if I'm correct,
can only take so many arguments).

I can understand whether a commit is included or not rather easily: we
basically only include a commit iff the commit introduces a change to
the given path (ie, it is !TREESAME to any of its parents).

For merges, I get a little confused.  In the simple case of a 2-parent
merge, if we find that we are TREESAME with one parent -- it follows
that the interesting change must have come from that parent.  So we
follow it, and can safely assume that the other parent is
uninteresting.

If we find that we are TREESAME with both parents, it follows that
neither parent nor the merge commit is interesting.  Therefore, we
randomly pick the first parent and move up the graph looking for an
interesting commit.  Looking at this more closely, the parents we
ignored will fall under three scenarios:

1. It finds it way all the way up to a root commit without finding an
interesting commit -- which can happen if your repository has multiple
root commits.
2. It will move up the graph and eventually find a common ancestor
with the first parent.
3. It will find an interesting commit that has an identical/equivalent
commit somewhere in the first parent's ancestry.

In all 3 cases, it is safe to follow just the first TREESAME parent.
It is interesting to note that if are working with a repository with
multiple root commits (say, you imported a project without history,
and later merged in its upstream history), you could potentially lose
that history with this algorithm if you happened to merge it as the
second parent.  This is not necessarily a flaw (in fact, it may be a
feature!), since you have still fully explained the state of your HEAD
with respect to the first parent (which is probably your mainline).

In the case that neither parent was TREESAME, we can have 2 scenarios:

1. The merge commit itself made the interesting change.
2. The interesting changes came from both parents, in which we would
rightfully follow both parents.

My first question is this:

If the merge commit itself (D) made the interesting change, we could
potentially follow an uninteresting parent -- most likely all the way
up until we find a common ancestor (A).  This would produce a graph
that looks like:

A---B---D
 \-------/ (C pruned)

Or, if both parents were uninteresting:

A---D (both B & C pruned)
 \---/

I assume that the simplification takes care of this by removing
duplicate parents as well as searching for common ancestors?  (It is
not mentioned in the documentation).

My second question is then:

Given that we perform an extra simplification pass to remove common
ancestors and duplicate parents, what is the purpose of selectively
following parents?  Is it just for speed?  If we always followed all
parents and applied our duplicate simplification and common ancestor
simplification, we would not always arrive at the same result?  In
which case, if my application is not concerned with speed (ie, I don't
mind following all parents), would I always arrive at the same graph?

Thanks,

Tommy

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Default history simplification
  2009-11-19 23:30 Default history simplification Tommy Wang
@ 2009-11-20  8:55 ` Jan Krüger
  2009-11-21  6:37   ` Junio C Hamano
  2009-11-20 14:41 ` Jeff King
  1 sibling, 1 reply; 4+ messages in thread
From: Jan Krüger @ 2009-11-20  8:55 UTC (permalink / raw)
  To: Tommy Wang; +Cc: git

Hi,

Tommy Wang <subscription@august8.net> wrote:
> [...] I would love to simply use the rev-list built-in to
> do my work for me; but I fear that I may have much too many path
> limiters than the linux command-line can handle (which if I'm correct,
> can only take so many arguments).

On my system, "only so many arguments" means about two megabytes worth
of command line. On several operating systems, "getconf ARG_MAX" can
tell you the approximate limit (which I think includes the space for
environment variables).

I don't really have any comments about the other things you said.

Jan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Default history simplification
  2009-11-19 23:30 Default history simplification Tommy Wang
  2009-11-20  8:55 ` Jan Krüger
@ 2009-11-20 14:41 ` Jeff King
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff King @ 2009-11-20 14:41 UTC (permalink / raw)
  To: Tommy Wang; +Cc: git

On Thu, Nov 19, 2009 at 05:30:43PM -0600, Tommy Wang wrote:

> I'm working on a small perl script that needs to reproduce the results
> of the default git history simplification used by log & rev-list when
> a list of paths is given.  It is important that I reproduce its
> results exactly.  I would love to simply use the rev-list built-in to
> do my work for me; but I fear that I may have much too many path
> limiters than the linux command-line can handle (which if I'm correct,
> can only take so many arguments).

FYI, Junio just posted a patch for handling pathspec limiting over
--stdin, which would solve this problem for you.

  http://article.gmane.org/gmane.comp.version-control.git/133333

-Peff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Default history simplification
  2009-11-20  8:55 ` Jan Krüger
@ 2009-11-21  6:37   ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2009-11-21  6:37 UTC (permalink / raw)
  To: Jan Krüger; +Cc: Tommy Wang, git

Jan Krüger <jk@jk.gs> writes:

> Tommy Wang <subscription@august8.net> wrote:
>> [...] I would love to simply use the rev-list built-in to
>> do my work for me; but I fear that I may have much too many path
>> limiters than the linux command-line can handle (which if I'm correct,
>> can only take so many arguments).
>
> On my system, "only so many arguments" means about two megabytes worth
> of command line.

Good point.

The command line limitation, even if it is very small, shouldn't be a
problem if Tommy is interested in recreating exactly what rev-list does.

If his tool that drives rev-list or log from the command line cannot work
because his request overflows the command line, the same request will
overflow the command line for rev-list or log, so they wouldn't work
either.

For a more serious answer, see the other thread ;-).

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-11-21  6:37 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-19 23:30 Default history simplification Tommy Wang
2009-11-20  8:55 ` Jan Krüger
2009-11-21  6:37   ` Junio C Hamano
2009-11-20 14:41 ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).