From: Jakub Narebski <jnareb@gmail.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic
Date: Sun, 21 Oct 2018 17:55:23 +0200 [thread overview]
Message-ID: <86h8hfguqc.fsf@gmail.com> (raw)
In-Reply-To: <cd9eef96885a811196ab0b851a98de4455b950ab.1539729393.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Tue, 16 Oct 2018 15:36:42 -0700 (PDT)")
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> When running 'git rev-list --topo-order' and its kin, the topo_order
> setting in struct rev_info implies the limited setting. This means
> that the following things happen during prepare_revision_walk():
>
> * revs->limited implies we run limit_list() to walk the entire
> reachable set. There are some short-cuts here, such as if we
> perform a range query like 'git rev-list COMPARE..HEAD' and we
> can stop limit_list() when all queued commits are uninteresting.
And if revs->topo_order is set, then (with current implementation) we
need limit_list() to run to generate commit_list with commits to be
topologically sorted, which is done by setting revs->limited.
In short, with current code revs->topo_order implies revs->limited.
>
> * revs->topo_order implies we run sort_in_topological_order(). See
> the implementation of that method in commit.c. It implies that
> the full set of commits to order is in the given commit_list.
So the current code uses "generate list of commits, then sort it"
approach...
>
> These two methods imply that a 'git rev-list --topo-order HEAD'
> command must walk the entire reachable set of commits _twice_ before
> returning a single result.
>
> If we have a commit-graph file with generation numbers computed, then
> there is a better way.
...instead of generating commits in topological order as you go.
> This patch introduces some necessary logic
> redirection when we are in this situation.
O.K., this should make main commit smaller. All right.
> In v2.18.0, the commit-graph file contains zero-valued bytes in the
> positions where the generation number is stored in v2.19.0 and later.
> Thus, we use generation_numbers_enabled() to check if the commit-graph
> is available and has non-zero generation numbers.
>
> When setting revs->limited only because revs->topo_order is true,
> only do so if generation numbers are not available. There is no
> reason to use the new logic as it will behave similarly when all
> generation numbers are INFINITY or ZERO.
O.K. we will be using new algorithm only when there actually are some
generation numbers.
> In prepare_revision_walk(), if we have revs->topo_order but not
> revs->limited, then we trigger the new logic. It breaks the logic
> into three pieces, to fit with the existing framework:
So if revs->limited is set (but not because revs->topo_order is set),
which means A..B queries, we will be still using the old algorithm.
All right, though I wonder if it could be improved in the future
(perhaps with the help of other graph labelling / indices than
generation numbers, maybe a positive-cut index).
Do you have an idea why there is no improvement with the new code in
this case?
> 1. init_topo_walk() fills a new struct topo_walk_info in the rev_info
> struct. We use the presence of this struct as a signal to use the
> new methods during our walk. In this patch, this method simply
> calls limit_list() and sort_in_topological_order(). In the future,
> this method will set up a new data structure to perform that logic
> in-line.
>
> 2. next_topo_commit() provides get_revision_1() with the next topo-
> ordered commit in the list. Currently, this simply pops the commit
> from revs->commits.
>
> 3. expand_topo_walk() provides get_revision_1() with a way to signal
> walking beyond the latest commit. Currently, this calls
> add_parents_to_list() exactly like the old logic.
So all three new functions should perform exactly like the old logic,
isn't it?
> While this commit presents method redirection for performing the
> exact same logic as before, it allows the next commit to focus only
> on the new logic.
All right, it's logical.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> revision.c | 42 ++++++++++++++++++++++++++++++++++++++----
> revision.h | 4 ++++
> 2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/revision.c b/revision.c
> index e18bd530e4..2dcde8a8ac 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -25,6 +25,7 @@
> #include "worktree.h"
> #include "argv-array.h"
> #include "commit-reach.h"
> +#include "commit-graph.h"
>
> volatile show_early_output_fn_t show_early_output;
>
> @@ -2454,7 +2455,7 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s
> if (revs->diffopt.objfind)
> revs->simplify_history = 0;
>
> - if (revs->topo_order)
> + if (revs->topo_order && !generation_numbers_enabled(the_repository))
> revs->limited = 1;
All right, with --topo-order and existing generation numbers don't force
the revs->limited code (i.e. explicit not wrapped use of limit_list()).
So with --topo-order and A..B, we have revs->limited set, with
--topo-order and no generation numbers we have revs->limited set.
>
> if (revs->prune_data.nr) {
> @@ -2892,6 +2893,33 @@ static int mark_uninteresting(const struct object_id *oid,
> return 0;
> }
>
> +struct topo_walk_info {};
Nice trick with using NULL-ness of the pointer to the currently empty
struct as a boolean flag denoting whether to use new generation number
using algorithm for topological sorting.
> +
> +static void init_topo_walk(struct rev_info *revs)
> +{
> + struct topo_walk_info *info;
I guess this helper variables is here for next revisions, as we could
have made without it...
> + revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
> + info = revs->topo_walk_info;
> + memset(info, 0, sizeof(struct topo_walk_info));
...by using
+ memset(revs->topo_walk_info, 0, sizeof(struct topo_walk_info));
> +
> + limit_list(revs);
> + sort_in_topological_order(&revs->commits, revs->sort_order);
This is not exactly identical to the old code, which has
if (limit_list(revs) < 0)
return -1;
if (revs->topo_order)
sort_in_topological_order(&revs->commits, revs->sort_order);
We know that init_topo_walk() would be invoked, as the name implies,
only when revs->topo_order is set, but do we know that limit_list()
would not return an error?
> +}
> +
> +static struct commit *next_topo_commit(struct rev_info *revs)
> +{
> + return pop_commit(&revs->commits);
> +}
All right, identical to the old code.
> +
> +static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
> +{
> + if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
> + if (!revs->ignore_missing_links)
> + die("Failed to traverse parents of commit %s",
> + oid_to_hex(&commit->object.oid));
> + }
> +}
All right, identical to the old code.
While at it, should this message be marked up for translation, or is it
something so low-level (and rare) that should be kept untranslated? But
this would be better left for separate commit series, to not entangle
this one with spurious changes.
> +
> int prepare_revision_walk(struct rev_info *revs)
> {
> int i;
> @@ -2928,11 +2956,13 @@ int prepare_revision_walk(struct rev_info *revs)
> commit_list_sort_by_date(&revs->commits);
> if (revs->no_walk)
> return 0;
> - if (revs->limited)
> + if (revs->limited) {
> if (limit_list(revs) < 0)
> return -1;
> - if (revs->topo_order)
> - sort_in_topological_order(&revs->commits, revs->sort_order);
> + if (revs->topo_order)
> + sort_in_topological_order(&revs->commits, revs->sort_order);
> + } else if (revs->topo_order)
> + init_topo_walk(revs);
Previously when revs->topo_order was set, Git called
sort_in_topological_order(), because revs->limited got always set to
truthy value if revs->topo_order was true.
Now running sort_in_topological_order() is done only if revs->limited is
set (because of A..B); if it is not, init_topo_walk() is called.
All right, identical to the old code, up to checking the return value of
limit_list(), see previous comments.
> if (revs->line_level_traverse)
> line_log_filter(revs);
> if (revs->simplify_merges)
> @@ -3257,6 +3287,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
>
> if (revs->reflog_info)
> commit = next_reflog_entry(revs->reflog_info);
> + else if (revs->topo_walk_info)
> + commit = next_topo_commit(revs);
> else
> commit = pop_commit(&revs->commits);
All right, identical to the old code.
> @@ -3278,6 +3310,8 @@ static struct commit *get_revision_1(struct rev_info *revs)
>
> if (revs->reflog_info)
> try_to_simplify_commit(revs, commit);
> + else if (revs->topo_walk_info)
> + expand_topo_walk(revs, commit);
> else if (add_parents_to_list(revs, commit, &revs->commits, NULL) < 0) {
> if (!revs->ignore_missing_links)
> die("Failed to traverse parents of commit %s",
All right, identical to the old code.
> diff --git a/revision.h b/revision.h
> index 2b30ac270d..fd4154ff75 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -56,6 +56,8 @@ struct rev_cmdline_info {
> #define REVISION_WALK_NO_WALK_SORTED 1
> #define REVISION_WALK_NO_WALK_UNSORTED 2
>
> +struct topo_walk_info;
> +
> struct rev_info {
> /* Starting list */
> struct commit_list *commits;
> @@ -245,6 +247,8 @@ struct rev_info {
> const char *break_bar;
>
> struct revision_sources *sources;
> +
> + struct topo_walk_info *topo_walk_info;
> };
>
> int ref_excluded(struct string_list *, const char *path);
next prev parent reply other threads:[~2018-10-21 15:55 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-08-27 20:41 [PATCH 0/6] Use generation numbers for --topo-order Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-08-27 20:41 ` [PATCH 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-08-27 21:23 ` [PATCH 0/6] Use generation numbers for --topo-order Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 " Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 1/6] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 2/6] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-09-18 18:02 ` SZEDER Gábor
2018-09-19 19:31 ` Junio C Hamano
2018-09-19 19:38 ` Junio C Hamano
2018-09-20 21:18 ` Junio C Hamano
2018-09-18 4:08 ` [PATCH v2 3/6] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 4/6] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 5/6] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-09-18 4:08 ` [PATCH v2 6/6] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-18 5:51 ` Ævar Arnfjörð Bjarmason
2018-09-18 6:05 ` [PATCH v2 0/6] Use generation numbers for --topo-order Ævar Arnfjörð Bjarmason
2018-09-21 15:47 ` Derrick Stolee
2018-09-21 17:39 ` [PATCH v3 0/7] " Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-09-26 19:15 ` Derrick Stolee
2018-10-11 13:54 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-11 13:57 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-11 13:58 ` Jeff King
2018-10-12 4:34 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-11 14:06 ` Jeff King
2018-10-12 6:33 ` Junio C Hamano
2018-10-12 12:32 ` Derrick Stolee
2018-10-12 16:15 ` Johannes Sixt
2018-10-13 8:05 ` Junio C Hamano
2018-09-21 17:39 ` [PATCH v3 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-11 14:21 ` Jeff King
2018-09-21 17:39 ` [PATCH v3 6/7] revision.h: add whitespace in flag definitions Derrick Stolee via GitGitGadget
2018-09-21 17:39 ` [PATCH v3 7/7] revision.c: refactor basic topo-order logic Derrick Stolee via GitGitGadget
2018-09-27 17:57 ` Derrick Stolee
2018-10-06 16:56 ` Jakub Narebski
2018-10-11 15:35 ` Jeff King
2018-10-11 16:21 ` Derrick Stolee
2018-10-25 9:43 ` Jeff King
2018-10-25 13:00 ` Derrick Stolee
2018-10-11 22:32 ` Stefan Beller
2018-09-21 21:22 ` [PATCH v3 0/7] Use generation numbers for --topo-order Junio C Hamano
2018-10-16 22:36 ` [PATCH v4 " Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 1/7] prio-queue: add 'peek' operation Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 2/7] test-reach: add run_three_modes method Derrick Stolee via GitGitGadget
2018-10-16 22:36 ` [PATCH v4 3/7] test-reach: add rev-list tests Derrick Stolee via GitGitGadget
2018-10-21 10:21 ` Jakub Narebski
2018-10-21 15:28 ` Derrick Stolee
2018-10-16 22:36 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee via GitGitGadget
2018-10-21 15:55 ` Jakub Narebski [this message]
2018-10-22 1:12 ` Junio C Hamano
2018-10-22 1:51 ` Derrick Stolee
2018-10-22 1:55 ` [RFC PATCH] revision.c: use new algorithm in A..B case Derrick Stolee
2018-10-25 8:28 ` [PATCH v4 4/7] revision.c: begin refactoring --topo-order logic Junio C Hamano
2018-10-26 20:56 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee via GitGitGadget
2018-10-21 21:17 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee via GitGitGadget
2018-10-22 13:37 ` Jakub Narebski
2018-10-23 13:54 ` Derrick Stolee
2018-10-26 16:55 ` Jakub Narebski
2018-10-16 22:36 ` [PATCH v4 7/7] t6012: make rev-list tests more interesting Derrick Stolee via GitGitGadget
2018-10-23 15:48 ` Jakub Narebski
2018-10-21 12:57 ` [PATCH v4 0/7] Use generation numbers for --topo-order Jakub Narebski
2018-11-01 5:21 ` Junio C Hamano
2018-11-01 13:49 ` Derrick Stolee
2018-11-01 23:54 ` Junio C Hamano
2018-11-01 13:46 ` [PATCH v5 " Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 1/7] prio-queue: add 'peek' operation Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 2/7] test-reach: add run_three_modes method Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 3/7] test-reach: add rev-list tests Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 4/7] revision.c: begin refactoring --topo-order logic Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 5/7] commit/revisions: bookkeeping before refactoring Derrick Stolee
2018-11-01 13:46 ` [PATCH v5 6/7] revision.c: generation-based topo-order algorithm Derrick Stolee
2018-11-01 15:48 ` SZEDER Gábor
2018-11-01 16:12 ` Derrick Stolee
2019-11-08 2:50 ` Mike Hommey
2019-11-11 1:07 ` Derrick Stolee
2019-11-18 23:04 ` SZEDER Gábor
2018-11-01 13:46 ` [PATCH v5 7/7] t6012: make rev-list tests more interesting Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86h8hfguqc.fsf@gmail.com \
--to=jnareb@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.