From: Jakub Narebski <jnareb@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v4 6/7] revision.c: generation-based topo-order algorithm
Date: Fri, 26 Oct 2018 18:55:22 +0200 [thread overview]
Message-ID: <86bm7g641x.fsf@gmail.com> (raw)
In-Reply-To: <6501930f-4097-1b81-6d0d-be54f050a5a4@gmail.com> (Derrick Stolee's message of "Tue, 23 Oct 2018 09:54:30 -0400")
Derrick Stolee <stolee@gmail.com> writes:
> On 10/22/2018 9:37 AM, Jakub Narebski wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
[...]
>> Sidenote: the new algorithm looks a bit like Unix pipeline, where each
>> step of pipeline does not output much more than next step needs /
>> requests.
>
> That's essentially the idea.
Some of the newer languages have built-in support for similar kind of
pipeline for connecting processes, be it channels in Go, supplies and
suppliers in Perl6. I wonder if there exists some library implementing
this kind of construct in C.
That aside, I wonder if when there would be support for more
reachability indices than generation numbers, if it wouldn't be better
to pass a commit as a limiter (up to this commit), than specific indices
like current passing of generation number. Just food for thoughs...
[...]
>>> In my local testing, I used the following Git commands on the
>>> Linux repository in three modes: HEAD~1 with no commit-graph,
>>> HEAD~1 with a commit-graph, and HEAD with a commit-graph. This
>>> allows comparing the benefits we get from parsing commits from
>>> the commit-graph and then again the benefits we get by
>>> restricting the set of commits we walk.
>>>
>>> Test: git rev-list --topo-order -100 HEAD
>>> HEAD~1, no commit-graph: 6.80 s
>>> HEAD~1, w/ commit-graph: 0.77 s
>>> HEAD, w/ commit-graph: 0.02 s
>>>
>>> Test: git rev-list --topo-order -100 HEAD -- tools
>>> HEAD~1, no commit-graph: 9.63 s
>>> HEAD~1, w/ commit-graph: 6.06 s
>>> HEAD, w/ commit-graph: 0.06 s
>>>
>>> This speedup is due to a few things. First, the new generation-
>>> number-enabled algorithm walks commits on order of the number of
>>> results output (subject to some branching structure expectations).
>>> Since we limit to 100 results, we are running a query similar to
>>> filling a single page of results. Second, when specifying a path,
>>> we must parse the root tree object for each commit we walk. The
>>> previous benefits from the commit-graph are entirely from reading
>>> the commit-graph instead of parsing commits. Since we need to
>>> parse trees for the same number of commits as before, we slow
>>> down significantly from the non-path-based query.
>>>
>>> For the test above, I specifically selected a path that is changed
>>> frequently, including by merge commits. A less-frequently-changed
>>> path (such as 'README') has similar end-to-end time since we need
>>> to walk the same number of commits (before determining we do not
>>> have 100 hits). However, get the benefit that the output is
>>> presented to the user as it is discovered, much the same as a
>>> normal 'git log' command (no '--topo-order'). This is an improved
>>> user experience, even if the command has the same runtime.
>>>
>> First, do I understand it correctly that in first case the gains from
>> new algorithms are so slim because with commit-graph file and no path
>> limiting we don't hit repository anyway; we walk less commits, but
>> reading commit data from commit-graph file is fast/
>
> If you mean 0.77s to 0.02s is "slim" then yes, it is because the
> commit-graph command already made a full walk of the commit history
> faster. (I'm only poking at this because the _relative_ improvement is
> significant, even if the command was already sub-second.)
First, you didn't provide us with percentages, i.e. relative improvement
(and I am lazy). Second, 0.02s can be within the margin of error,
depending on how it is measured, and how stable this measurement is.
>> Second, I wonder if there is some easy way to perform automatic latency
>> tests, i.e. how fast does Git show the first page of output...
>
> I have talked with Jeff Hostetler about this, to see if we can have a
> "time to first page" traced with trace2, but we don't seem to have
> access to that information within Git. We would need to insert it into
> the pager. The "-100" is used instead.
Perhaps another solution to the problem of "first page of output"
latency tests could be feasible, namely create a helper test-pager-1p
"pager" program that would automatically quit after first page of
output; or perhaps even one that benchmarks each page of output
automatically.
There exists 'pv' (pipe viewer) program for pipes, so I think it would
be possible to do equivalent, but as a pager.
[...]
>>> +static inline void test_flag_and_insert(struct prio_queue *q, struct commit *c, int flag)
>>> +{
>>> + if (c->object.flags & flag)
>>> + return;
>>> +
>>> + c->object.flags |= flag;
>>> + prio_queue_put(q, c);
>>> +}
>>
>> This is an independent change, though I see that it is quite specific
>> (as opposed to quite generic prio_queue_peek() operation added earlier
>> in this series), so it does not make much sense as standalone change.
>>
>> It inserts commit into priority queue only if it didn't have flags set,
>> and sets the flag (so we won't add it to the queue again, not without
>> unsetting the flag), am I correct?
>
> Yes, this pattern of using a flag to avoid duplicate entries in the
> priority queue appears in multiple walks. It wasn't needed before. We
> call it four times in the code below.
>> I guess that we use test_flag_and_insert() instead of prio_queue_put()
>> to avoid duplicate entries in the queue. I think the queue is initially
>> populated with the starting commits, but those need not to be
>> unreachable from each other, and walking down parents we can encounter
>> starting commit already in the queue. Am I correct?
>
> We can also reach commits in multiple ways, so the initial conditions
> are not the only ways to insert duplicates.
Right.
[...]
>>> + if (c->object.flags & UNINTERESTING)
>>> + mark_parents_uninteresting(c);
>>> +
>>> + for (p = c->parents; p; p = p->next)
>>> + test_flag_and_insert(&info->explore_queue, p->item, TOPO_WALK_EXPLORED);
>>
>> Do we need to insert parents to the queue even if they were marked
>> UNINTERESTING?
>
> We need to propagate the UNINTERESTING flag to our parents. That
> propagation happens in process_parents().
I think I understand. We need to propagate UNINTERESTING flag down the
chain, isn't it?
[...]
>>> static void init_topo_walk(struct rev_info *revs)
>>> {
>>> struct topo_walk_info *info;
>>> + struct commit_list *list;
>>> revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
>>> info = revs->topo_walk_info;
>>> memset(info, 0, sizeof(struct topo_walk_info));
>>> - limit_list(revs);
>>> - sort_in_topological_order(&revs->commits, revs->sort_order);
>>> + init_indegree_slab(&info->indegree);
>>> + memset(&info->explore_queue, '\0', sizeof(info->explore_queue));
>>> + memset(&info->indegree_queue, '\0', sizeof(info->indegree_queue));
>>> + memset(&info->topo_queue, '\0', sizeof(info->topo_queue));
>>
>> Why this memset uses '\0' as a filler value and not 0? The queues are
>> not strings [and you use 0 in other places].
I think you missed answering about this issue.
[...]
>> Looks all right.
>
> Thanks for taking the time on this huge patch!
You are welcome.
--
Jakub Narębski
next prev parent reply other threads:[~2018-10-26 16:55 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02 ` SZEDER Gábor
2018-09-19 19:31 ` Junio C Hamano
2018-09-19 19:38 ` Junio C Hamano
2018-09-20 21:18 ` Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 5:51 ` Ævar Arnfjörð Bjarmason
2018-09-18 6:05 ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47 ` Derrick Stolee
2018-09-21 17:39 ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15 ` Derrick Stolee
2018-10-11 13:54 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58 ` Jeff King
2018-10-12 4:34 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06 ` Jeff King
2018-10-12 6:33 ` Junio C Hamano
2018-10-12 12:32 ` Derrick Stolee
2018-10-12 16:15 ` Johannes Sixt
2018-10-13 8:05 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57 ` Derrick Stolee
2018-10-06 16:56 ` Jakub Narebski
2018-10-11 15:35 ` Jeff King
2018-10-11 16:21 ` Derrick Stolee
2018-10-25 9:43 ` Jeff King
2018-10-25 13:00 ` Derrick Stolee
2018-10-11 22:32 ` Stefan Beller
2018-09-21 21:22 ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36 ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21 ` Jakub Narebski
2018-10-21 15:28 ` Derrick Stolee
2018-10-16 22:36 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55 ` Jakub Narebski
2018-10-22 1:12 ` Junio C Hamano
2018-10-22 1:51 ` Derrick Stolee
2018-10-22 1:55 ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25 8:28 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37 ` Jakub Narebski
2018-10-23 13:54 ` Derrick Stolee
2018-10-26 16:55 ` Jakub Narebski [this message]
2018-10-16 22:36 ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48 ` Jakub Narebski
2018-10-21 12:57 ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01 5:21 ` Junio C Hamano
2018-11-01 13:49 ` Derrick Stolee
2018-11-01 23:54 ` Junio C Hamano
2018-11-01 13:46 ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48 ` SZEDER Gábor
2018-11-01 16:12 ` Derrick Stolee
2019-11-08 2:50 ` Mike Hommey
2019-11-11 1:07 ` Derrick Stolee
2019-11-18 23:04 ` SZEDER Gábor
2018-11-01 13:46 ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86bm7g641x.fsf@gmail.com \
--to=jnareb@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.