From: Derrick Stolee <stolee@gmail.com>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Thomas Rast <tr@thomasrast.ch>
Subject: Re: [PATCH 2/2] line-log: avoid unnecessary full tree diffs
Date: Thu, 22 Aug 2019 10:53:13 -0400
Message-ID: <5261a122-cdcb-dab5-dffa-75976c607017@gmail.com>
In-Reply-To: <20190822084158.GC20404@szeder.dev>

On 8/22/2019 4:41 AM, SZEDER Gábor wrote:
> On Wed, Aug 21, 2019 at 07:35:15PM +0200, SZEDER Gábor wrote:
>> So line-level log clearly computes a lot less diffs than
>> '--full-history', though still about 50% more than a regular
>> pathspec-limited history traversal. Looking at the commit-parent
>> pairs in the output, it appears that the difference comes mostly from
>> merge commits, because line-level log compares a merge commit with all
>> of its parents.
>
>> It seems there is still more room for improvements by avoiding
>> commit-non_first_parent diffs when the first parent is TREESAME, and
>> doing so could hopefully avoid triggering rename detection in those
>> subtree merges or in case of your evil path.
>
> Well, that fruit hung much lower than I thought, just look at the size
> of the WIP patch below. I just hope that there are no unexpected
> surprises, but FWIW it produces the exact same output for all files up
> to 't/t5515' in v2.23.0 as the previous patch.
>
> Can't wait to see how it fares with that evil Windows path :)

Thanks for this! With this patch, we finally have the time down to ~20s.
This is a HUGE improvement, especially considering there is only one result
for the particular section, so the entire history is explored in that time.

> --- >8 ---
>
> Subject: [PATCH 3/2] WIP line-log: stop diff-ing after first TREESAME merge parent
>
> # git.git, ~25% of all commits are merges
> $ time git --no-pager log -L:read_alternate_refs:sha1-file.c v2.23.0
>
> Before:
>
> real 0m2.516s
> user 0m2.456s
> sys 0m0.060s
>
> After:
>
> real 0m1.132s
> user 0m1.096s
> sys 0m0.036s
>
> # linux.git, ~7% of all commits are merges
> $ time ~/src/git/git --no-pager log \
> -L:build_restore_work_registers:arch/mips/mm/tlbex.c v5.2
>
> Before:
>
> real 0m2.599s
> user 0m2.466s
> sys 0m0.157s
>
> After:
>
> real 0m1.976s
> user 0m1.856s
> sys 0m0.121s
>
> [TODO: get rid of unnecessary arrays, tests?, write commit message...]
> ---
> line-log.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/line-log.c b/line-log.c
> index 9010e00950..a4b032f83a 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -1184,13 +1184,11 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  
>  	p = commit->parents;
>  	for (i = 0; i < nparents; i++) {
> +		int changed;
>  		parents[i] = p->item;
>  		p = p->next;
>  		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
> -	}
>  
> -	for (i = 0; i < nparents; i++) {
> -		int changed;
>  		cand[i] = NULL;
>  		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
>  		if (!changed) {

Interesting. The old logic computed ALL the diffs and only then started
inspecting them. By inspecting each parent's diff before computing the next
one, we now avoid the diff (and its rename logic) on the SECOND parent
whenever the first parent is TREESAME, and there will be a lot of second
parents that do not touch the file (depending on the number of parallel
topics being merged independently). That is probably why git.git (~25%
merges) sees a bigger speedup than linux.git (~7% merges).

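
To make the difference concrete, here is a minimal sketch in plain C. It is
not the line-log.c code: `struct parent_sim`, `diffs_eager()` and
`diffs_lazy()` are hypothetical names made up for illustration, and each
counter increment merely stands in for one queue_diffs() call. It only
counts how many per-parent diffs each loop shape would compute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one merge parent; not a git data structure. */
struct parent_sim {
	int changes_range; /* nonzero if the diff against this parent touches the tracked range */
};

/* Old shape: diff against every parent first, then inspect the results. */
size_t diffs_eager(const struct parent_sim *parents, size_t nparents)
{
	size_t computed = 0;
	size_t i;

	assert(parents != NULL || nparents == 0);

	for (i = 0; i < nparents; i++)
		computed++; /* stands in for queue_diffs() */

	for (i = 0; i < nparents; i++)
		if (!parents[i].changes_range)
			break; /* TREESAME parent found, but every diff is already paid for */

	return computed;
}

/* New shape: diff and inspect one parent at a time, stop at the first TREESAME one. */
size_t diffs_lazy(const struct parent_sim *parents, size_t nparents)
{
	size_t computed = 0;
	size_t i;

	assert(parents != NULL || nparents == 0);

	for (i = 0; i < nparents; i++) {
		computed++; /* stands in for queue_diffs() */
		if (!parents[i].changes_range)
			break; /* later parents are never diffed */
	}

	return computed;
}
```

For a two-parent merge whose first parent leaves the range untouched, the
eager loop counts two diffs and the lazy loop counts one. When every parent
changes the range, both loops count the same number.
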
> @@ -1203,7 +1201,7 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  		commit_list_append(parents[i], &commit->parents);
>  	free(parents);
>  	free(cand);
> -	free_diffqueues(nparents, diffqueues);
> +	free_diffqueues(i, diffqueues);

Good point here, as we haven't initialized all of the queues.

Thanks,
-Stolee