git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] line-log: optimize merge commit processing
@ 2025-08-24 19:06 SZEDER Gábor
  2025-08-24 19:06 ` [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits SZEDER Gábor
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: SZEDER Gábor @ 2025-08-24 19:06 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor

The first patch is an optimization of the line-level log machinery.
The rest are cleanups in the area that I subjectively consider slight
improvements.

SZEDER Gábor (4):
  line-log: avoid unnecessary tree diffs when processing merge commits
  line-log: get rid of the parents array in
    process_ranges_merge_commit()
  line-log: initialize diff queue in process_ranges_ordinary_commit()
  line-log: simplify condition checking for merge commits

 line-log.c | 50 ++++++++++++++++++++------------------------------
 1 file changed, 20 insertions(+), 30 deletions(-)

-- 
2.51.0.433.g1a66b3fb12


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits
  2025-08-24 19:06 [PATCH 0/4] line-log: optimize merge commit processing SZEDER Gábor
@ 2025-08-24 19:06 ` SZEDER Gábor
  2025-08-25 14:13   ` Derrick Stolee
  2025-08-25 15:35   ` Junio C Hamano
  2025-08-24 19:06 ` [PATCH 2/4] line-log: get rid of the parents array in process_ranges_merge_commit() SZEDER Gábor
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: SZEDER Gábor @ 2025-08-24 19:06 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor

In process_ranges_merge_commit(), the line-level log first creates an
array of diff queues by iterating over all parents of a merge commit
and computing a tree diff for each.  Then in a second loop it iterates
over those diff queues, and if it finds that none of the interesting
paths were modified in one of them, then it will return early.  This
means that when none of the interesting paths were modified between a
merge and its first parent, then the tree diff between the merge and
its second (Nth...) parent was computed in vain.

Unify these two loops, so when it iterates over all parents of a merge
commit, then it first computes the tree diff between the merge and
that particular parent and then processes the resulting diff queue
right away.  This way we can spare some tree diff computing, thereby
speeding up line-level log in repositories with mergy history:

  # git.git, 25.8% of commits are merges:
  Benchmark 1: ./git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
    Time (mean ± σ):      1.001 s ±  0.009 s    [User: 0.906 s, System: 0.095 s]
    Range (min … max):    0.991 s …  1.023 s    10 runs

  Benchmark 2: ./git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
    Time (mean ± σ):     445.5 ms ±   3.4 ms    [User: 358.8 ms, System: 84.3 ms]
    Range (min … max):   440.1 ms … 450.3 ms    10 runs

  Summary
    './git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' ran
      2.25 ± 0.03 times faster than './git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0'

  # linux.git, 7.5% of commits are merges:
  Benchmark 1: ./git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
    Time (mean ± σ):      3.246 s ±  0.007 s    [User: 2.835 s, System: 0.409 s]
    Range (min … max):    3.232 s …  3.255 s    10 runs

  Benchmark 2: ./git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
    Time (mean ± σ):      2.467 s ±  0.014 s    [User: 2.113 s, System: 0.353 s]
    Range (min … max):    2.455 s …  2.505 s    10 runs

  Summary
    './git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' ran
      1.32 ± 0.01 times faster than './git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16'

And since now each iteration computes a tree diff and processes its
result, there is no reason to store the diff queues for each merge
parent anymore, so replace that diff queue array with a loop-local
diff queue variable.  With this change the static free_diffqueues()
helper function in 'line-log.c' has no more callers left, remove it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 line-log.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/line-log.c b/line-log.c
index 07f2154e84..cf30915c94 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1087,13 +1087,6 @@ static struct diff_filepair *diff_filepair_dup(struct diff_filepair *pair)
 	return new_filepair;
 }
 
-static void free_diffqueues(int n, struct diff_queue_struct *dq)
-{
-	for (int i = 0; i < n; i++)
-		diff_queue_clear(&dq[i]);
-	free(dq);
-}
-
 static int process_all_files(struct line_log_data **range_out,
 			     struct rev_info *rev,
 			     struct diff_queue_struct *queue,
@@ -1209,7 +1202,6 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
 static int process_ranges_merge_commit(struct rev_info *rev, struct commit *commit,
 				       struct line_log_data *range)
 {
-	struct diff_queue_struct *diffqueues;
 	struct line_log_data **cand;
 	struct commit **parents;
 	struct commit_list *p;
@@ -1220,20 +1212,19 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 	if (nparents > 1 && rev->first_parent_only)
 		nparents = 1;
 
-	ALLOC_ARRAY(diffqueues, nparents);
 	CALLOC_ARRAY(cand, nparents);
 	ALLOC_ARRAY(parents, nparents);
 
 	p = commit->parents;
 	for (i = 0; i < nparents; i++) {
+		struct diff_queue_struct diffqueue = DIFF_QUEUE_INIT;
+		int changed;
 		parents[i] = p->item;
 		p = p->next;
-		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
-	}
+		queue_diffs(range, &rev->diffopt, &diffqueue, commit, parents[i]);
 
-	for (i = 0; i < nparents; i++) {
-		int changed;
-		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
+		changed = process_all_files(&cand[i], rev, &diffqueue, range);
+		diff_queue_clear(&diffqueue);
 		if (!changed) {
 			/*
 			 * This parent can take all the blame, so we
@@ -1267,7 +1258,6 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 		free(cand[i]);
 	}
 	free(cand);
-	free_diffqueues(nparents, diffqueues);
 	return ret;
 
 	/* NEEDSWORK evil merge detection stuff */
-- 
2.51.0.433.g1a66b3fb12


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/4] line-log: get rid of the parents array in process_ranges_merge_commit()
  2025-08-24 19:06 [PATCH 0/4] line-log: optimize merge commit processing SZEDER Gábor
  2025-08-24 19:06 ` [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits SZEDER Gábor
@ 2025-08-24 19:06 ` SZEDER Gábor
  2025-08-24 19:06 ` [PATCH 3/4] line-log: initialize diff queue in process_ranges_ordinary_commit() SZEDER Gábor
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: SZEDER Gábor @ 2025-08-24 19:06 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor

We can easily iterate through the parents of a merge commit without
turning the list of parents into a dynamically allocated array of
parents, so let's do so.  This way we can avoid a memory allocation
for each processed merge commit, though its effect on runtime seems to
be unmeasurable.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 line-log.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/line-log.c b/line-log.c
index cf30915c94..b2a31ae956 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1203,7 +1203,6 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 				       struct line_log_data *range)
 {
 	struct line_log_data **cand;
-	struct commit **parents;
 	struct commit_list *p;
 	int i;
 	int nparents = commit_list_count(commit->parents);
@@ -1213,15 +1212,15 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 		nparents = 1;
 
 	CALLOC_ARRAY(cand, nparents);
-	ALLOC_ARRAY(parents, nparents);
 
-	p = commit->parents;
-	for (i = 0; i < nparents; i++) {
+	for (p = commit->parents, i = 0;
+	     p && i < nparents;
+	     p = p->next, i++) {
+		struct commit *parent = p->item;
 		struct diff_queue_struct diffqueue = DIFF_QUEUE_INIT;
 		int changed;
-		parents[i] = p->item;
-		p = p->next;
-		queue_diffs(range, &rev->diffopt, &diffqueue, commit, parents[i]);
+
+		queue_diffs(range, &rev->diffopt, &diffqueue, commit, parent);
 
 		changed = process_all_files(&cand[i], rev, &diffqueue, range);
 		diff_queue_clear(&diffqueue);
@@ -1230,9 +1229,9 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 			 * This parent can take all the blame, so we
 			 * don't follow any other path in history
 			 */
-			add_line_range(rev, parents[i], cand[i]);
+			add_line_range(rev, parent, cand[i]);
 			free_commit_list(commit->parents);
-			commit_list_append(parents[i], &commit->parents);
+			commit_list_append(parent, &commit->parents);
 
 			ret = 0;
 			goto out;
@@ -1243,14 +1242,15 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 	 * No single parent took the blame.  We add the candidates
 	 * from the above loop to the parents.
 	 */
-	for (i = 0; i < nparents; i++)
-		add_line_range(rev, parents[i], cand[i]);
+	for (p = commit->parents, i = 0;
+	     p && i < nparents;
+	     p = p->next, i++)
+		add_line_range(rev, p->item, cand[i]);
 
 	ret = 1;
 
 out:
 	clear_commit_line_range(rev, commit);
-	free(parents);
 	for (i = 0; i < nparents; i++) {
 		if (!cand[i])
 			continue;
-- 
2.51.0.433.g1a66b3fb12


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] line-log: initialize diff queue in process_ranges_ordinary_commit()
  2025-08-24 19:06 [PATCH 0/4] line-log: optimize merge commit processing SZEDER Gábor
  2025-08-24 19:06 ` [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits SZEDER Gábor
  2025-08-24 19:06 ` [PATCH 2/4] line-log: get rid of the parents array in process_ranges_merge_commit() SZEDER Gábor
@ 2025-08-24 19:06 ` SZEDER Gábor
  2025-08-24 19:06 ` [PATCH 4/4] line-log: simplify condition checking for merge commits SZEDER Gábor
  2025-08-25 14:16 ` [PATCH 0/4] line-log: optimize merge commit processing Derrick Stolee
  4 siblings, 0 replies; 12+ messages in thread
From: SZEDER Gábor @ 2025-08-24 19:06 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor

process_ranges_ordinary_commit() uses a local diff queue variable,
which it leaves uninitialized before passing its address to
queue_diffs().  This is not an issue, because at the end of that
function the contents of an other diff queue is moved into it by
simply overwriting whatever is in there, i.e. without reading any
uninitialized memory.

Still, seeing the uninitialized diff queue being passed around scared
me more than once, so out of caution let's make sure that it's
initialized.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 line-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/line-log.c b/line-log.c
index b2a31ae956..71fa857ee8 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1182,7 +1182,7 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
 					  struct line_log_data *range)
 {
 	struct commit *parent = NULL;
-	struct diff_queue_struct queue;
+	struct diff_queue_struct queue = DIFF_QUEUE_INIT;
 	struct line_log_data *parent_range;
 	int changed;
 
-- 
2.51.0.433.g1a66b3fb12


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] line-log: simplify condition checking for merge commits
  2025-08-24 19:06 [PATCH 0/4] line-log: optimize merge commit processing SZEDER Gábor
                   ` (2 preceding siblings ...)
  2025-08-24 19:06 ` [PATCH 3/4] line-log: initialize diff queue in process_ranges_ordinary_commit() SZEDER Gábor
@ 2025-08-24 19:06 ` SZEDER Gábor
  2025-08-25 20:57   ` Junio C Hamano
  2025-08-25 14:16 ` [PATCH 0/4] line-log: optimize merge commit processing Derrick Stolee
  4 siblings, 1 reply; 12+ messages in thread
From: SZEDER Gábor @ 2025-08-24 19:06 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor

In process_ranges_arbitrary_commit() the condition deciding whether
the given commit is not a merge, i.e. that it doesn't have more than
one parent, is head-scratchingly backwards, flip it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 line-log.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/line-log.c b/line-log.c
index 71fa857ee8..188d387d40 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1273,10 +1273,10 @@ int line_log_process_ranges_arbitrary_commit(struct rev_info *rev, struct commit
 			struct line_log_data *prange = line_log_data_copy(range);
 			add_line_range(rev, commit->parents->item, prange);
 			clear_commit_line_range(rev, commit);
-		} else if (!commit->parents || !commit->parents->next)
-			changed = process_ranges_ordinary_commit(rev, commit, range);
-		else
+		} else if (commit->parents && commit->parents->next)
 			changed = process_ranges_merge_commit(rev, commit, range);
+		else
+			changed = process_ranges_ordinary_commit(rev, commit, range);
 	}
 
 	if (!changed)
-- 
2.51.0.433.g1a66b3fb12


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits
  2025-08-24 19:06 ` [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits SZEDER Gábor
@ 2025-08-25 14:13   ` Derrick Stolee
  2025-08-25 15:35   ` Junio C Hamano
  1 sibling, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2025-08-25 14:13 UTC (permalink / raw)
  To: SZEDER Gábor, git

On 8/24/2025 3:06 PM, SZEDER Gábor wrote:
> In process_ranges_merge_commit(), the line-level log first creates an
> array of diff queues by iterating over all parents of a merge commit
> and computing a tree diff for each.  Then in a second loop it iterates
> over those diff queues, and if it finds that none of the interesting
> paths were modified in one of them, then it will return early.  This
> means that when none of the interesting paths were modified between a
> merge and its first parent, then the tree diff between the merge and
> its second (Nth...) parent was computed in vain.

Great find! This goes all the way back to the original implementation
in 12da1d1f6f (Implement line-history search (git log -L), 2013-03-28)
where this detail could easily be missed in the rest of the scaffolding
to implement the feature.

This is an understandable mistake to make as it can take some time to
understand Git's simplified history computation and how it short-
circuits these merge diffs in most cases.
>   Summary
>     './git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' ran
>       2.25 ± 0.03 times faster than './git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0'

>   Summary
>     './git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' ran
>       1.32 ± 0.01 times faster than './git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16'

Great stats!
> And since now each iteration computes a tree diff and processes its
> result, there is no reason to store the diff queues for each merge
> parent anymore, so replace that diff queue array with a loop-local
> diff queue variable.  With this change the static free_diffqueues()
> helper function in 'line-log.c' has no more callers left, remove it.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  line-log.c | 20 +++++---------------
>  1 file changed, 5 insertions(+), 15 deletions(-)
> 
> diff --git a/line-log.c b/line-log.c
> index 07f2154e84..cf30915c94 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -1087,13 +1087,6 @@ static struct diff_filepair *diff_filepair_dup(struct diff_filepair *pair)
>  	return new_filepair;
>  }
>  
> -static void free_diffqueues(int n, struct diff_queue_struct *dq)
> -{
> -	for (int i = 0; i < n; i++)
> -		diff_queue_clear(&dq[i]);
> -	free(dq);
> -}
> -
>  static int process_all_files(struct line_log_data **range_out,
>  			     struct rev_info *rev,
>  			     struct diff_queue_struct *queue,
> @@ -1209,7 +1202,6 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
>  static int process_ranges_merge_commit(struct rev_info *rev, struct commit *commit,
>  				       struct line_log_data *range)
>  {
> -	struct diff_queue_struct *diffqueues;
>  	struct line_log_data **cand;
>  	struct commit **parents;
>  	struct commit_list *p;
> @@ -1220,20 +1212,19 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  	if (nparents > 1 && rev->first_parent_only)
>  		nparents = 1;
>  
> -	ALLOC_ARRAY(diffqueues, nparents);
>  	CALLOC_ARRAY(cand, nparents);
>  	ALLOC_ARRAY(parents, nparents);
>  
>  	p = commit->parents;
>  	for (i = 0; i < nparents; i++) {
> +		struct diff_queue_struct diffqueue = DIFF_QUEUE_INIT;
> +		int changed;
>  		parents[i] = p->item;
>  		p = p->next;
> -		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
> -	}
> +		queue_diffs(range, &rev->diffopt, &diffqueue, commit, parents[i]);
>  
> -	for (i = 0; i < nparents; i++) {
> -		int changed;
> -		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
> +		changed = process_all_files(&cand[i], rev, &diffqueue, range);
> +		diff_queue_clear(&diffqueue);
>  		if (!changed) {
>  			/*
>  			 * This parent can take all the blame, so we
> @@ -1267,7 +1258,6 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  		free(cand[i]);
>  	}
>  	free(cand);
> -	free_diffqueues(nparents, diffqueues);
>  	return ret;
>  
>  	/* NEEDSWORK evil merge detection stuff */

This diff is a lot cleaner than I expected it to be. Excellent!

I applied this patch locally and tested it on a few repos I have
to give extra confidence to your patch.

For an internal monorepo, I was able to measure these results:

Benchmark 1: old
  Time (mean ± σ):     19.709 s ±  0.014 s    [User: 18.846 s, System: 0.862 s]
  Range (min … max):   19.681 s … 19.725 s    10 runs
 
Benchmark 2: new
  Time (mean ± σ):      9.061 s ±  0.015 s    [User: 8.487 s, System: 0.574 s]
  Range (min … max):    9.042 s …  9.089 s    10 runs
 
Summary
  'new' ran
    2.18 ± 0.00 times faster than 'old'

I did also want to check to see the impact of f32dde8c12 (line-log:
integrate with changed-path Bloom filters, 2020-05-11), and having
computed filters diminished the size of your impact somewhat:

Your Git example:

Benchmark 1: old
  Time (mean ± σ):     279.2 ms ±   2.2 ms    [User: 231.0 ms, System: 47.9 ms]
  Range (min … max):   275.5 ms … 282.6 ms    10 runs
 
Benchmark 2: new 
  Time (mean ± σ):     242.4 ms ±   3.6 ms    [User: 191.8 ms, System: 50.4 ms]
  Range (min … max):   237.3 ms … 249.9 ms    12 runs
 
Summary
  'new ' ran
    1.15 ± 0.02 times faster than 'old'

Your Linux example:

Benchmark 1: old
  Time (mean ± σ):      1.694 s ±  0.008 s    [User: 1.524 s, System: 0.169 s]
  Range (min … max):    1.688 s …  1.714 s    10 runs
 
Benchmark 2: new 
  Time (mean ± σ):      1.644 s ±  0.008 s    [User: 1.482 s, System: 0.161 s]
  Range (min … max):    1.636 s …  1.663 s    10 runs
 
Summary
  'new ' ran
    1.03 ± 0.01 times faster than 'old'

My internal monorepo example:

Benchmark 1: old
  Time (mean ± σ):      3.749 s ±  0.007 s    [User: 3.188 s, System: 0.559 s]
  Range (min … max):    3.736 s …  3.759 s    10 runs
 
Benchmark 2: new
  Time (mean ± σ):      2.713 s ±  0.005 s    [User: 2.318 s, System: 0.394 s]
  Range (min … max):    2.706 s …  2.723 s    10 runs
 
Summary
  'new' ran
    1.38 ± 0.00 times faster than 'old'

Rerunning with "-c commitGraph.readChangedPaths=false" resulted
in numbers closer to your examples (940ms->420ms for Git and
2.6->2.1s for Linux). I expect most users to be in the situation
where there are no changed-path Bloom filters, so this is very
good to deliver that value.

At first, I found this to be concerning: we only store filters
for the diff between a commit and its first parent, so this cost
of visiting the later parents should be _much worse_ in those
cases. However, it turns out that the way that the filters are
handled in line_log_process_ranges_arbitrary_commit() avoids a
call to process_ranges_merge_commit() if the filter says that the
first parent is TREESAME on the given path.

This means that a good chunk of the performance benefits in
f32dde8c12 (line-log: integrate with changed-path Bloom filters,
2020-05-11) are _actually_ due to avoiding this extra work for
multiple parents. Thanks for digging in and bringing this benefit
to all users!

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] line-log: optimize merge commit processing
  2025-08-24 19:06 [PATCH 0/4] line-log: optimize merge commit processing SZEDER Gábor
                   ` (3 preceding siblings ...)
  2025-08-24 19:06 ` [PATCH 4/4] line-log: simplify condition checking for merge commits SZEDER Gábor
@ 2025-08-25 14:16 ` Derrick Stolee
  4 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2025-08-25 14:16 UTC (permalink / raw)
  To: SZEDER Gábor, git

On 8/24/2025 3:06 PM, SZEDER Gábor wrote:
> The first patch is an optimization of the line-level log machinery.
> The rest are cleanups in the area that I subjectively consider slight
> improvements.
> 
> SZEDER Gábor (4):
>   line-log: avoid unnecessary tree diffs when processing merge commits

I gave a detailed double-check of your perf numbers in a direct reply
to that patch.

>   line-log: get rid of the parents array in
>     process_ranges_merge_commit()
>   line-log: initialize diff queue in process_ranges_ordinary_commit()
>   line-log: simplify condition checking for merge commits

These cleanups are very welcome. The whole series is excellent.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits
  2025-08-24 19:06 ` [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits SZEDER Gábor
  2025-08-25 14:13   ` Derrick Stolee
@ 2025-08-25 15:35   ` Junio C Hamano
  2025-08-28 20:27     ` SZEDER Gábor
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2025-08-25 15:35 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

SZEDER Gábor <szeder.dev@gmail.com> writes:

> @@ -1209,7 +1202,6 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
>  static int process_ranges_merge_commit(struct rev_info *rev, struct commit *commit,
>  				       struct line_log_data *range)
>  {
> -	struct diff_queue_struct *diffqueues;
>  	struct line_log_data **cand;
>  	struct commit **parents;
>  	struct commit_list *p;
> @@ -1220,20 +1212,19 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  	if (nparents > 1 && rev->first_parent_only)
>  		nparents = 1;
>  
> -	ALLOC_ARRAY(diffqueues, nparents);
>  	CALLOC_ARRAY(cand, nparents);
>  	ALLOC_ARRAY(parents, nparents);
>  
>  	p = commit->parents;
>  	for (i = 0; i < nparents; i++) {
> +		struct diff_queue_struct diffqueue = DIFF_QUEUE_INIT;
> +		int changed;
>  		parents[i] = p->item;
>  		p = p->next;
> -		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
> -	}
> +		queue_diffs(range, &rev->diffopt, &diffqueue, commit, parents[i]);
>  
> -	for (i = 0; i < nparents; i++) {
> -		int changed;
> -		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
> +		changed = process_all_files(&cand[i], rev, &diffqueue, range);
> +		diff_queue_clear(&diffqueue);
>  		if (!changed) {
>  			/*
>  			 * This parent can take all the blame, so we

This is surprisingly small change that eliminates quite a lot of
waste.  Nicely done.

> @@ -1267,7 +1258,6 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
>  		free(cand[i]);
>  	}
>  	free(cand);
> -	free_diffqueues(nparents, diffqueues);
>  	return ret;
>  
>  	/* NEEDSWORK evil merge detection stuff */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] line-log: simplify condition checking for merge commits
  2025-08-24 19:06 ` [PATCH 4/4] line-log: simplify condition checking for merge commits SZEDER Gábor
@ 2025-08-25 20:57   ` Junio C Hamano
  2025-08-25 21:43     ` Derrick Stolee
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2025-08-25 20:57 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: git

SZEDER Gábor <szeder.dev@gmail.com> writes:

> In process_ranges_arbitrary_commit() the condition deciding whether
> the given commit is not a merge, i.e. that it doesn't have more than
> one parent, is head-scratchingly backwards, flip it.

Hmph, the condition is about "is it a root commit?  or is it a
single-parent commit?", which does not sound overly complicated to
me.

> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> ---
>  line-log.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/line-log.c b/line-log.c
> index 71fa857ee8..188d387d40 100644
> --- a/line-log.c
> +++ b/line-log.c
> @@ -1273,10 +1273,10 @@ int line_log_process_ranges_arbitrary_commit(struct rev_info *rev, struct commit
>  			struct line_log_data *prange = line_log_data_copy(range);
>  			add_line_range(rev, commit->parents->item, prange);
>  			clear_commit_line_range(rev, commit);
> -		} else if (!commit->parents || !commit->parents->next)
> -			changed = process_ranges_ordinary_commit(rev, commit, range);
> -		else
> +		} else if (commit->parents && commit->parents->next)
>  			changed = process_ranges_merge_commit(rev, commit, range);
> +		else
> +			changed = process_ranges_ordinary_commit(rev, commit, range);
>  	}
>  
>  	if (!changed)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] line-log: simplify condition checking for merge commits
  2025-08-25 20:57   ` Junio C Hamano
@ 2025-08-25 21:43     ` Derrick Stolee
  2025-08-25 21:57       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2025-08-25 21:43 UTC (permalink / raw)
  To: Junio C Hamano, SZEDER Gábor; +Cc: git

On 8/25/2025 4:57 PM, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
>> In process_ranges_arbitrary_commit() the condition deciding whether
>> the given commit is not a merge, i.e. that it doesn't have more than
>> one parent, is head-scratchingly backwards, flip it.
> 
> Hmph, the condition is about "is it a root commit?  or is it a
> single-parent commit?", which does not sound overly complicated to
> me.

It is something that one can interpret carefully by thinking about
it, but the negation and OR condition made me need to pause and
think about it, while the positive of "does it have a parent and
a second parent?" was something that flowed naturally when I read
it.

Definitely a taste thing, so I could see you wanting to skip this
one on a pure "don't touch what's not broken" policy.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] line-log: simplify condition checking for merge commits
  2025-08-25 21:43     ` Derrick Stolee
@ 2025-08-25 21:57       ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2025-08-25 21:57 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: SZEDER Gábor, git

Derrick Stolee <stolee@gmail.com> writes:

> ... but the negation and OR condition made me need to pause and
> think about it, while the positive of "does it have a parent and
> a second parent?" was something that flowed naturally when I read
> it.

Yeah, that I 100% agree with.  If it were

	if (!(c->parent && c->parent->next))
		handle_ordinary_commit();
	else
		handle_merge_commit();

that would have been very easy to grok.  I do not have a strong
preference between that and

	if (c->parent && c->parent->next)
		handle_merge_commit();
	else
		handle_ordinary_commit();

myself, but I always felt that handling ordinary commits was the
primary thing in this code path, which made me react to the swapping
of orders of these two calls.

> Definitely a taste thing, so I could see you wanting to skip this
> one on a pure "don't touch what's not broken" policy.

True, too, but the code that fails to be in a readable shape too
falls into the "broken" category, so in that sense I do not mind
queuing the patch, either (and indeed tonight's 'seen' will include
this step in the topic).

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits
  2025-08-25 15:35   ` Junio C Hamano
@ 2025-08-28 20:27     ` SZEDER Gábor
  0 siblings, 0 replies; 12+ messages in thread
From: SZEDER Gábor @ 2025-08-28 20:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git

On Mon, Aug 25, 2025 at 08:35:53AM -0700, Junio C Hamano wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
> 
> > @@ -1209,7 +1202,6 @@ static int process_ranges_ordinary_commit(struct rev_info *rev, struct commit *c
> >  static int process_ranges_merge_commit(struct rev_info *rev, struct commit *commit,
> >  				       struct line_log_data *range)
> >  {
> > -	struct diff_queue_struct *diffqueues;
> >  	struct line_log_data **cand;
> >  	struct commit **parents;
> >  	struct commit_list *p;
> > @@ -1220,20 +1212,19 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
> >  	if (nparents > 1 && rev->first_parent_only)
> >  		nparents = 1;
> >  
> > -	ALLOC_ARRAY(diffqueues, nparents);
> >  	CALLOC_ARRAY(cand, nparents);
> >  	ALLOC_ARRAY(parents, nparents);
> >  
> >  	p = commit->parents;
> >  	for (i = 0; i < nparents; i++) {
> > +		struct diff_queue_struct diffqueue = DIFF_QUEUE_INIT;
> > +		int changed;
> >  		parents[i] = p->item;
> >  		p = p->next;
> > -		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
> > -	}
> > +		queue_diffs(range, &rev->diffopt, &diffqueue, commit, parents[i]);
> >  
> > -	for (i = 0; i < nparents; i++) {
> > -		int changed;
> > -		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
> > +		changed = process_all_files(&cand[i], rev, &diffqueue, range);
> > +		diff_queue_clear(&diffqueue);
> >  		if (!changed) {
> >  			/*
> >  			 * This parent can take all the blame, so we
> 
> This is surprisingly small change that eliminates quite a lot of
> waste.  Nicely done.

It's funny you say that...

This patch series just turned 6 years old this weekend, and up until
Sunday morning this first patch was actually two, because the
optimization and the removal of the now unnecessary diffqueues array
were two separate patches that I finally decided to squash together.

Here is the diff of that optimization-only patch :)

  ---- >8 ----

 line-log.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/line-log.c b/line-log.c
index 07f2154e84..b3766c67ea 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1220,19 +1220,17 @@ static int process_ranges_merge_commit(struct rev_info *rev, struct commit *comm
 	if (nparents > 1 && rev->first_parent_only)
 		nparents = 1;
 
-	ALLOC_ARRAY(diffqueues, nparents);
+	CALLOC_ARRAY(diffqueues, nparents);
 	CALLOC_ARRAY(cand, nparents);
 	ALLOC_ARRAY(parents, nparents);
 
 	p = commit->parents;
 	for (i = 0; i < nparents; i++) {
+		int changed;
 		parents[i] = p->item;
 		p = p->next;
 		queue_diffs(range, &rev->diffopt, &diffqueues[i], commit, parents[i]);
-	}
 
-	for (i = 0; i < nparents; i++) {
-		int changed;
 		changed = process_all_files(&cand[i], rev, &diffqueues[i], range);
 		if (!changed) {
 			/*

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-08-28 20:27 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-24 19:06 [PATCH 0/4] line-log: optimize merge commit processing SZEDER Gábor
2025-08-24 19:06 ` [PATCH 1/4] line-log: avoid unnecessary tree diffs when processing merge commits SZEDER Gábor
2025-08-25 14:13   ` Derrick Stolee
2025-08-25 15:35   ` Junio C Hamano
2025-08-28 20:27     ` SZEDER Gábor
2025-08-24 19:06 ` [PATCH 2/4] line-log: get rid of the parents array in process_ranges_merge_commit() SZEDER Gábor
2025-08-24 19:06 ` [PATCH 3/4] line-log: initialize diff queue in process_ranges_ordinary_commit() SZEDER Gábor
2025-08-24 19:06 ` [PATCH 4/4] line-log: simplify condition checking for merge commits SZEDER Gábor
2025-08-25 20:57   ` Junio C Hamano
2025-08-25 21:43     ` Derrick Stolee
2025-08-25 21:57       ` Junio C Hamano
2025-08-25 14:16 ` [PATCH 0/4] line-log: optimize merge commit processing Derrick Stolee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).