git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits
Date: Sun, 24 Aug 2025 21:06:41 +0200	[thread overview]
Message-ID: <20250824190644.2573279-2-szeder.dev@gmail.com> (raw)
In-Reply-To: <20250824190644.2573279-1-szeder.dev@gmail.com>

In process_ranges_merge_commit(), the line-level log first creates an
array of diff queues by iterating over all parents of a merge commit
and computing a tree diff for each.  Then in a second loop it iterates
over those diff queues, and if it finds that none of the interesting
paths were modified in one of them, then it will return early.  This
means that when none of the interesting paths were modified between a
merge and its first parent, then the tree diff between the merge and
its second (Nth...) parent was computed in vain.

Unify these two loops, so when it iterates over all parents of a merge
commit, then it first computes the tree diff between the merge and
that particular parent and then processes the resulting diff queue
right away.  This way we can spare some tree diff computing, thereby
speeding up line-level log in repositories with mergy history:

  # git.git, 25.8% of commits are merges:
  Benchmark 1: ./git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
    Time (mean ± σ):      1.001 s ±  0.009 s    [User: 0.906 s, System: 0.095 s]
    Range (min … max):    0.991 s …  1.023 s    10 runs

  Benchmark 2: ./git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
    Time (mean ± σ):     445.5 ms ±   3.4 ms    [User: 358.8 ms, System: 84.3 ms]
    Range (min … max):   440.1 ms … 450.3 ms    10 runs

  Summary
    './git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' ran
      2.25 ± 0.03 times faster than './git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0'

  # linux.git, 7.5% of commits are merges:
  Benchmark 1: ./git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
    Time (mean ± σ):      3.246 s ±  0.007 s    [User: 2.835 s, System: 0.409 s]
    Range (min … max):    3.232 s …  3.255 s    10 runs

  Benchmark 2: ./git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
    Time (mean ± σ):      2.467 s ±  0.014 s    [User: 2.113 s, System: 0.353 s]
    Range (min … max):    2.455 s …  2.505 s    10 runs

  Summary
    './git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' ran
      1.32 ± 0.01 times faster than './git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16'

And since now each iteration computes a tree diff and processes its
result, there is no reason to store the diff queues for each merge
parent anymore, so replace that diff queue array with a loop-local
diff queue variable.  With this change the static free_diffqueues()
helper function in 'line-log.c' has no more callers left, remove it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 line-log.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/line-log.c b/line-log.c
index 07f2154e84..cf30915c94 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1087,13 +1087,6 @@ static struct diff_filepair *diff_filepair_dup(struct diff_filepair *pair)
 	return new_filepair;
 }
 
-static void free_diffqueues(int n, struct diff_queue_struct *dq)
-{
-	for (int i = 0; i < n; i++)
-		diff_queue_clear(&dq[i]);
-	free(dq);
-}
-
 static int process_all_files(struct line_log_data **range_out,
 			     struct rev_info *rev,
 			     struct diff_queue_struct *queue,
@@ -1209,7 +1202,6 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
 static int process_ranges_merge_commit(struct rev_info *rev, struct commit *commit,
 				       struct line_log_data *range)
 {
-	struct diff_queue_struct *diffqueues;
 	struct line_log_data **cand;
 	struct commit **parents;
 	struct commit_list *p;
@@ -1220,20 +1212,19 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 	if (nparents > 1 && rev->first_parent_only)
 		nparents = 1;
 
-	ALLOC_ARRAY(diffqueues, nparents);
 	CALLOC_ARRAY(cand, nparents);
 	ALLOC_ARRAY(parents, nparents);
 
 	p = commit->parents;
 	for (i = 0; i < nparents; i++) {
+		struct diff_queue_struct diffqueue = DIFF_QUEUE_INIT;
+		int changed;
 		parents[i] = p->item;
 		p = p->next;
-		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
-	}
+		queue_diffs(range, &rev->diffopt, &diffqueue, commit, parents[i]);
 
-	for (i = 0; i < nparents; i++) {
-		int changed;
-		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
+		changed = process_all_files(&cand[i], rev, &diffqueue, range);
+		diff_queue_clear(&diffqueue);
 		if (!changed) {
 			/*
 			 * This parent can take all the blame, so we
@@ -1267,7 +1258,6 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 		free(cand[i]);
 	}
 	free(cand);
-	free_diffqueues(nparents, diffqueues);
 	return ret;
 
 	/* NEEDSWORK evil merge detection stuff */
-- 
2.51.0.433.g1a66b3fb12


  reply	other threads:[~2025-08-24 19:06 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-24 19:06 [PATCH 0/4] line-log: optimize merge commit processing SZEDER Gábor
2025-08-24 19:06 ` SZEDER Gábor [this message]
2025-08-25 14:13   ` [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits Derrick Stolee
2025-08-25 15:35   ` Junio C Hamano
2025-08-28 20:27     ` SZEDER Gábor
2025-08-24 19:06 ` [PATCH 2/4] line-log: get rid of the parents array in process_ranges_merge_commit() SZEDER Gábor
2025-08-24 19:06 ` [PATCH 3/4] line-log: initialize diff queue in process_ranges_ordinary_commit() SZEDER Gábor
2025-08-24 19:06 ` [PATCH 4/4] line-log: simplify condition checking for merge commits SZEDER Gábor
2025-08-25 20:57   ` Junio C Hamano
2025-08-25 21:43     ` Derrick Stolee
2025-08-25 21:57       ` Junio C Hamano
2025-08-25 14:16 ` [PATCH 0/4] line-log: optimize merge commit processing Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250824190644.2573279-2-szeder.dev@gmail.com \
    --to=szeder.dev@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).