git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Antonio Russo <antonio.e.russo@gmail.com>, git-ml <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2] Teach git-rev-list --simplify-forks
Date: Fri, 1 May 2020 11:44:43 -0400	[thread overview]
Message-ID: <7d85d901-714e-12c7-69fd-38f49c686440@gmail.com> (raw)
In-Reply-To: <2a668dd2-35c0-7510-fc4d-6c44e8407a07@gmail.com>

On 5/1/2020 10:13 AM, Antonio Russo wrote:
> On 4/29/20 7:12 AM, Derrick Stolee wrote:
>> On 4/29/2020 4:01 AM, Antonio Russo wrote:
>>> When used with --graph, instead of displaying the full graph, display a
>>> spanning subgraph produced by a depth-first search of the graph visiting
>>> parents from left to right.  Edges to already visited commits are
>>> discarded.  This process is repeated for every commit to be displayed.
>>>
>>> This is valuable to reduce visual clutter when there are many merges
>>> that were not rebased onto each other and the user is primarily
>>> interested in the state of the branch being merged into.
>>
>> My earlier comment to recommend how to fix the test failures intended
>> to demonstrate that this area of code requires a bit of precision. I
>> took some time to review this patch, but I find it needs some significant
>> updates.
>>
>> tl;dr: The way you manipulate the commit parents seems incorrect to me,
>> but perhaps there is enough prior art in the way we simplify parents to
>> make that acceptable. Someone else will need to chime in on that part.
> 
> First, thank you for taking the time look at this.  I understand your
> hesitation about the "amputation" of the history, but in some sense
> that's the point of this option.  I really want to be ignorant of the
> details of when the fork branched off.  I would like the reported
> history to be appear nearly equivalent to a rebase-and-fastforward only
> merge style, which results in a much simpler git log --graph.
> 
>> It may be possible to do this "drop edges" entirely inside of graph.c
>> (the graph rendering code) instead of revision.c, which would automatically
>> work with the performance gains from the newer topo-walk algorithm.
> 
> Non-local information about the commit graph needs to be used to do this
> amputation of the history.  We cannot know how many parents we want to
> display until we've completely explored all the parents of that node.
> Unfortunately, that means that the whole graph needs to be loaded, and I
> cannot really see how there would be any gain by doing this in graph.c.
> 
> Caveat: there are semi-streaming DFS implementations (i.e., O(n log n)
> space) that we might be able to use to get the first line out the door
> quicker. I would, however, like to leave that to another patch.
> 
> I will also add that, for the tests I've done, all performance penalties
> have been insignificant (less than ~5% for showing the first commit),
> while there are significant performance _improvements_, e.g., ~40% for
> displaying the full tree.
> 
> A notable exception is --all, which can be ~50x faster for the full
> output, but is often dramatically slower to show anything (i.e., the
> first line).
> 
>> There are enough deviations from code and test style that this will
>> need a significant revision regardless.
> 
> (Please see forthcoming revision 3).
> 
>>> This second revision of the patch sets revs->limited.  This forces the
>>> graph of commits to be loaded, and simplfiy_forks() therefore reliably
>>> traverses it.  This addresses the test failures mentioned before (see [1]).
>>
>> This will have a significant performance impact on the option, as you will
>> not see even the first result until the DFS has completed.
> 
> First of all, short of using some other, more sophisticated streaming
> version of this algorithm, the full DFS must finish before the first
> commit having two (or more) parents can be shown.
> 
> That said, the performance is not significantly affected:
> 
> I ran the following test (2.26.2, with my patch on top of it):
> (git lg = git log --graph --pretty=oneline)
> 
>  % time git lg -n1 --ignore-merge-bases e896a286df > /dev/null
>  0.73s user 0.02s system 99% cpu 0.746 total
> 
>  % time git lg -n1 e896a286df > /dev/null
>  0.72s user 0.01s system 99% cpu 0.731 total
> 
>  For the linux git repo:
> 
>  % time git lg -n1 --ignore-merge-bases v5.7-rc3 >/dev/null
>  9.25s user 0.39s system 99% cpu 9.646 total
> 
>  % time git lg -n1 v5.7-rc3 >/dev/null
>  9.02s user 0.35s system 99% cpu 9.378 total
> 
> So the performance seems basically unaffected for very limited graphs.
> It's also about 40% faster for complicated ones (as mentioned in my
> first email):
> 
>  % time git lg --ignore-merge-bases e870325ee8 > /dev/null
>  0.83s user 0.06s system 99% cpu 0.886 total
> 
>  % time git lg e870325ee8 > /dev/null
>  1.41s user 0.03s system 99% cpu 1.443 total
> 
>  For the linux git repo:
> 
>  % time git lg --ignore-merge-bases v5.7-rc3 >/dev/null
>  11.86s user 0.62s system 99% cpu 12.489 total
> 
>  % time git lg v5.7-rc3 >/dev/null
>  21.56s user 0.55s system 99% cpu 22.108 total

First, run `git commit-graph write --reachable` to create a commit-graph
file which has generation numbers. In this case, I get the following:

$ time git log --oneline --graph v5.7-rc3 >/dev/null

real    0m13.548s
user    0m13.348s
sys     0m0.200s

$ time git log --oneline --graph -n 1 v5.7-rc3 >/dev/null

real    0m0.007s
user    0m0.004s
sys     0m0.004s

Notice exactly how much better this gets for the first result with that
file.

> This is because the amputated graph is much simpler, and the rest of the
> code needs to do much less work.
> 
> Passing --all is another beast, and does indeed suffer:
> 
>  % time git lg --ignore-merge-bases --all >/dev/null
>  4.06s user 0.04s system 99% cpu 4.105 total
> 
>  % time git lg --all >/dev/null
>  189.59s user 0.04s system 99% cpu 3:09.65 total
> 
>  (and for the first line)
> 
>  % time git lg -n1 --ignore-merge-bases --all >/dev/null
>  3.86s user 0.02s system 99% cpu 3.874 total
> 
>  % time git lg -n1 --all >/dev/null
>  0.83s user 0.02s system 99% cpu 0.848 total
> 
> (If you need to use --all for the Linux git repo, you should not use
> --ignore-merge-bases).

I think this is a deficiency in your implementation, not a hard rule
about how these options would need to interact.

Thanks,
-Stolee


  reply	other threads:[~2020-05-01 15:44 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-27  5:07 [PATCH] Teach git-rev-list --simplify-forks Antonio Russo
2020-04-27 10:55 ` Derrick Stolee
2020-04-28 12:49   ` Antonio Russo
2020-04-28 14:11     ` Derrick Stolee
2020-04-29  8:01 ` [PATCH v2] " Antonio Russo
2020-04-29 10:08   ` Philip Oakley
2020-04-29 13:04     ` Antonio Russo
2020-04-29 13:12   ` Derrick Stolee
2020-05-01 14:13     ` Antonio Russo
2020-05-01 15:44       ` Derrick Stolee [this message]
2020-05-01 14:21   ` [PATCH 1/3 v3] Clean up t6016-rev-list-graph-simplify-history Antonio Russo
2020-05-01 17:10     ` Junio C Hamano
2020-05-02 15:27       ` Antonio Russo
2020-05-01 14:22   ` [PATCH 2/3 v3] Teach git-rev-list --ignore-merge-bases Antonio Russo
2020-05-02 13:25     ` Johannes Sixt
2020-05-02 14:21       ` Antonio Russo
2020-05-01 14:23   ` [PATCH 3/3 v3] Add new tests of ignore-merge-bases Antonio Russo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7d85d901-714e-12c7-69fd-38f49c686440@gmail.com \
    --to=stolee@gmail.com \
    --cc=antonio.e.russo@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).