* [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
@ 2025-09-05 13:06 Toon Claes
2025-09-05 15:27 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Toon Claes @ 2025-09-05 13:06 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Christian Couder, Toon Claes
Function diff_tree_combined() copies the 'struct diff_options' from the
input 'struct rev_info' to override some flags. One flag was
'recursive', which is set to 1. This has been the case since the
inception of this function in af3feefa1d (diff-tree -c: show a merge
commit a bit more sensibly., 2006-01-24). From that commit there's no
clear indication why recursive is overridden. But this breaks the
recently introduced subcommand git-last-modified(1) in some cases
(i.e. when criss-cross merges are involved). It then would exit with the
error:
BUG: paths remaining beyond boundary in last-modified
The last-modified machinery uses a hashmap for all the files it wants to
get the last-modified commit for. Through log_tree_commit() the callback
mark_path() is called. Here the incoming path is looked up in the
hashmap, but because the diff-tree machinery internally switched to
recursive (even if the last-modified machinery wasn't) the entry for
'dir' was never expelled from the hashmap, and the BUG() statement was
hit.
Remove overriding 'diffopt.flags.recursive' in diff_tree_combined() and
move setting this flag to the callers which relied on the recursive
behavior.
Because internally diff-tree no longer runs recursive, this results in a
nice speedup when running `git last-modified` on git.git:
Benchmark 1: next
Time (mean ± σ): 6.068 s ± 0.038 s [User: 5.268 s, System: 0.187 s]
Range (min … max): 6.007 s … 6.123 s 10 runs
Benchmark 2: HEAD
Time (mean ± σ): 3.057 s ± 0.009 s [User: 2.996 s, System: 0.055 s]
Range (min … max): 3.047 s … 3.070 s 10 runs
Summary
HEAD ran
1.98 ± 0.01 times faster than next
Signed-off-by: Toon Claes <toon@iotcl.com>
---
git-last-modified(1) was recently merged into 'next', but upon use
I've (well actually Patrick) noticed it has a bug:
$ git last-modified
3f3e11118993fe0500b3957ab798e39caaa3952f .gitignore
3f3e11118993fe0500b3957ab798e39caaa3952f Makefile
3f3e11118993fe0500b3957ab798e39caaa3952f builtin.h
3f3e11118993fe0500b3957ab798e39caaa3952f command-list.txt
3f3e11118993fe0500b3957ab798e39caaa3952f commit-graph.c
3f3e11118993fe0500b3957ab798e39caaa3952f git.c
3f3e11118993fe0500b3957ab798e39caaa3952f meson.build
56072ff0384da5d874fc378d36e089a18f28f1e3 fetch-pack.c
457534d0417d047b943f76a849f256b739894ce9 progress.c
0d8f4ccfe3b13bb5eb95f030dc5fe76efb255397 for-each-ref.h
109c3df14ccf372c2438a470bdfb566265399f0a combine-diff.c
109c3df14ccf372c2438a470bdfb566265399f0a diff.c
109c3df14ccf372c2438a470bdfb566265399f0a diff.h
109c3df14ccf372c2438a470bdfb566265399f0a dir.c
[snip]
9e5878fbede57c0499133adf73844261849cd7b2 git-web--browse.sh
b2fb3911eab730a08168c7f85a7935ad5a330b53 config.mak.in
36268b762c4aa6a0d4831f69852b20ab545aff4d LGPL-2.1
1e58dba142c673c59fbb9d10aeecf62217d4fc9c aclocal.m4
9517e6b84357252e1882091343661c34d978771e levenshtein.h
703601d6780c32d33dadf19b2b367f2f79e1e34c COPYING
BUG: ../builtin/last-modified.c:236: paths remaining beyond boundary in last-modified
zsh: IOT instruction (core dumped) git last-modified
I didn't see this before but the BUG() statement was only moved up in
the last version of the patch series, and it only comes into trouble
with merge commits.
This series fixes that issue.
This patch is based on 'next' at 1ba7204a04 (Merge branch
'kh/doc-markup-fixes' into next, 2025-09-03).
---
Changes in v2:
- Do not change behavioral change in `git diff-tree -c`.
- Small change in the test to make it pass on Windows.
- Link to v1: https://lore.kernel.org/r/20250904-toon-fix-last-modified-v1-1-91bf87ddf62b@iotcl.com
---
builtin/diff-tree.c | 13 ++++++++++++-
combine-diff.c | 1 -
submodule.c | 1 +
t/t8020-last-modified.sh | 16 ++++++++++++++++
4 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 49dd4d00ebf1bcc644383ee99df3a9e05502b89b..6ef0438d36dd1091d2806b46306f2b5ee7274bc0 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -16,10 +16,21 @@ static struct rev_info log_tree_opt;
static int diff_tree_commit_oid(const struct object_id *oid)
{
+ struct rev_info rev_info = log_tree_opt;
struct commit *commit = lookup_commit_reference(the_repository, oid);
+
if (!commit)
return -1;
- return log_tree_commit(&log_tree_opt, commit);
+
+ /*
+ * log_tree_commit() calls into diff_tree_combined() which historically
+ * used to enable diffopt.flags.recursive when option '-c' is given.
+ * To not break backward compatibility, set 'resursive' here.
+ */
+ if (rev_info.combine_merges)
+ rev_info.diffopt.flags.recursive = 1;
+
+ return log_tree_commit(&rev_info, commit);
}
/* Diff one or more commits. */
diff --git a/combine-diff.c b/combine-diff.c
index 3878faabe7bb2f7c80cffbf3add6123f17960627..305414efdf436d53fee8d79aa4219f6a4dd3445e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1515,7 +1515,6 @@ void diff_tree_combined(const struct object_id *oid,
diffopts = *opt;
copy_pathspec(&diffopts.pathspec, &opt->pathspec);
- diffopts.flags.recursive = 1;
diffopts.flags.allow_external = 0;
/* find set of paths that everybody touches
diff --git a/submodule.c b/submodule.c
index fff3c755703163da423d5978b8bdf0d36c6f8ea9..1c273ec87fa865b635ffb9343cfd28744524c737 100644
--- a/submodule.c
+++ b/submodule.c
@@ -916,6 +916,7 @@ static void collect_changed_submodules(struct repository *r,
diff_rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
diff_rev.diffopt.format_callback = collect_changed_submodules_cb;
diff_rev.diffopt.format_callback_data = &data;
+ diff_rev.diffopt.flags.recursive = 1;
diff_rev.dense_combined_merges = 1;
diff_tree_combined_merge(commit, &diff_rev);
release_revisions(&diff_rev);
diff --git a/t/t8020-last-modified.sh b/t/t8020-last-modified.sh
index 5eb4cef0359212eeddc68578697ca432760e0be3..e13aad14398dd9055ff45ab57700329e7aab37c4 100755
--- a/t/t8020-last-modified.sh
+++ b/t/t8020-last-modified.sh
@@ -128,6 +128,22 @@ test_expect_success 'only last-modified files in the current tree' '
EOF
'
+test_expect_success 'last-modified with subdir and criss-cross merge' '
+ git checkout -b branch-k1 1 &&
+ mkdir -p a k &&
+ test_commit k1 a/file2 &&
+ git checkout -b branch-k2 &&
+ test_commit k2 k/file2 &&
+ git checkout branch-k1 &&
+ test_merge km2 branch-k2 &&
+ test_merge km3 3 &&
+ check_last_modified <<-\EOF
+ km3 a
+ k2 k
+ 1 file
+ EOF
+'
+
test_expect_success 'cross merge boundaries in blaming' '
git checkout HEAD^0 &&
git rm -rf . &&
---
base-commit: 1ba7204a041bf9fa3af3ad21a018399fff66f7b9
change-id: 20250902-toon-fix-last-modified-865172060265
Best regards,
--
Toon Claes <toon@iotcl.com>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
2025-09-05 13:06 [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined() Toon Claes
@ 2025-09-05 15:27 ` Junio C Hamano
2025-09-05 15:37 ` Kristoffer Haugsbakk
2025-09-08 12:19 ` Toon Claes
2025-09-16 7:11 ` Patrick Steinhardt
2025-09-18 8:00 ` [PATCH v3] last-modified: fix bug when some paths remain unhandled Toon Claes
2 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2025-09-05 15:27 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Patrick Steinhardt, Christian Couder
Toon Claes <toon@iotcl.com> writes:
> This patch is based on 'next' at 1ba7204a04 (Merge branch
> 'kh/doc-markup-fixes' into next, 2025-09-03).
Can't you be a bit more specific? We usually say "do not build on
'next'", but what we really mean by that statement are
* Your topic may interact with some topics already in 'next', but
it is unlikely that you depend on _all_ of them. If you are
willing to depend on a few selected topics (meaning: you accept
that you have to adjust your topic when they get updated, and you
accept that you cannot graduate before all of them graduate),
identify them and build on the result of a merge of these topics
into 'master'. State how you constructed your base in your cover
letter.
* If you are truly depending on _everything_ in 'next', then stop.
Wait until all of them graduates, and then submit your topic
after that.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
2025-09-05 15:27 ` Junio C Hamano
@ 2025-09-05 15:37 ` Kristoffer Haugsbakk
2025-09-08 12:19 ` Toon Claes
1 sibling, 0 replies; 9+ messages in thread
From: Kristoffer Haugsbakk @ 2025-09-05 15:37 UTC (permalink / raw)
To: Junio C Hamano, Toon Claes; +Cc: git, Patrick Steinhardt, Christian Couder
On Fri, Sep 5, 2025, at 17:27, Junio C Hamano wrote:
> Toon Claes <toon@iotcl.com> writes:
>
>> This patch is based on 'next' at 1ba7204a04 (Merge branch
>> 'kh/doc-markup-fixes' into next, 2025-09-03).
>
> Can't you be a bit more specific? We usually say "do not build on
> 'next'", but what we really mean by that statement are
>
> * Your topic may interact with some topics already in 'next', but
> it is unlikely that you depend on _all_ of them. If you are
> willing to depend on a few selected topics (meaning: you accept
> that you have to adjust your topic when they get updated, and you
> accept that you cannot graduate before all of them graduate),
> identify them and build on the result of a merge of these topics
> into 'master'. State how you constructed your base in your cover
> letter.
>
> * If you are truly depending on _everything_ in 'next', then stop.
> Wait until all of them graduates, and then submit your topic
> after that.
It seemed like Toon was applying the part in SubmittingPatches about
applying fixup patches on top of the series once the series is in
`next`.[1] But then used `next` as the base instead of the series.
Of course that simple topic named kh/doc-markup-fixes is irrelevant to
what he is fixing up. ;)
† 1: “After the patches are merged to the 'next' branch,”
--
Kristoffer Haugsbakk
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
2025-09-05 15:27 ` Junio C Hamano
2025-09-05 15:37 ` Kristoffer Haugsbakk
@ 2025-09-08 12:19 ` Toon Claes
1 sibling, 0 replies; 9+ messages in thread
From: Toon Claes @ 2025-09-08 12:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Christian Couder
Junio C Hamano <gitster@pobox.com> writes:
> Toon Claes <toon@iotcl.com> writes:
>
>> This patch is based on 'next' at 1ba7204a04 (Merge branch
>> 'kh/doc-markup-fixes' into next, 2025-09-03).
>
> Can't you be a bit more specific? We usually say "do not build on
> 'next'", but what we really mean by that statement are
Sorry, I didn't realize I didn't follow proper process here.
Anyhow, this patch is based on 'master' at 2462961280 (The sixth batch,
2025-09-02) with the following branch merged into: 'tc/last-modified' at
8d9a7cdfda (last-modified: use Bloom filters when available, 2025-08-05)
--
Toon
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
2025-09-05 13:06 [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined() Toon Claes
2025-09-05 15:27 ` Junio C Hamano
@ 2025-09-16 7:11 ` Patrick Steinhardt
2025-09-16 15:22 ` Toon Claes
2025-09-18 8:00 ` [PATCH v3] last-modified: fix bug when some paths remain unhandled Toon Claes
2 siblings, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2025-09-16 7:11 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Christian Couder
On Fri, Sep 05, 2025 at 03:06:31PM +0200, Toon Claes wrote:
> Function diff_tree_combined() copies the 'struct diff_options' from the
> input 'struct rev_info' to override some flags. One flag was
> 'recursive', which is set to 1. This has been the case since the
> inception of this function in af3feefa1d (diff-tree -c: show a merge
> commit a bit more sensibly., 2006-01-24). From that commit there's no
> clear indication why recursive is overridden. But this breaks the
> recently introduced subcommand git-last-modified(1) in some cases
> (i.e. when criss-cross merges are involved). It then would exit with the
> error:
>
> BUG: paths remaining beyond boundary in last-modified
>
> The last-modified machinery uses a hashmap for all the files it wants to
> get the last-modified commit for. Through log_tree_commit() the callback
> mark_path() is called. Here the incoming path is looked up in the
> hashmap, but because the diff-tree machinery internally switched to
> recursive (even if the last-modified machinery wasn't) the entry for
> 'dir' was never expelled from the hashmap, and the BUG() statement was
> hit.
Okay. So if I understand correctly, the issue here is that once we do a
recursive diff we skip directory entries and only print blobs. And
because of that we never manage to blame the directory to any specific
commit.
What this doesn't explain though is how this is related to criss-cross
merges. Why does the error not happen with "normal" merges?
> Remove overriding 'diffopt.flags.recursive' in diff_tree_combined() and
> move setting this flag to the callers which relied on the recursive
> behavior.
>
> Because internally diff-tree no longer runs recursive, this results in a
> nice speedup when running `git last-modified` on git.git:
I assume this is only the case for `git last-modified --no-recursive`,
right? Do we know to set `diffopt.flags.recursive` in that case already,
or do we handle recursion manually in that case?
> diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
> index 49dd4d00ebf1bcc644383ee99df3a9e05502b89b..6ef0438d36dd1091d2806b46306f2b5ee7274bc0 100644
> --- a/builtin/diff-tree.c
> +++ b/builtin/diff-tree.c
> @@ -16,10 +16,21 @@ static struct rev_info log_tree_opt;
>
> static int diff_tree_commit_oid(const struct object_id *oid)
> {
> + struct rev_info rev_info = log_tree_opt;
> struct commit *commit = lookup_commit_reference(the_repository, oid);
> +
> if (!commit)
> return -1;
> - return log_tree_commit(&log_tree_opt, commit);
> +
> + /*
> + * log_tree_commit() calls into diff_tree_combined() which historically
> + * used to enable diffopt.flags.recursive when option '-c' is given.
> + * To not break backward compatibility, set 'resursive' here.
> + */
> + if (rev_info.combine_merges)
> + rev_info.diffopt.flags.recursive = 1;
> +
> + return log_tree_commit(&rev_info, commit);
> }
>
> /* Diff one or more commits. */
Don't we have to do the same dance for `stdin_diff_commit()`? There's
also callsites in git-am(1), git-log(1) and others. Why don't we have to
adjust any of those?
> diff --git a/combine-diff.c b/combine-diff.c
> index 3878faabe7bb2f7c80cffbf3add6123f17960627..305414efdf436d53fee8d79aa4219f6a4dd3445e 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -1515,7 +1515,6 @@ void diff_tree_combined(const struct object_id *oid,
>
> diffopts = *opt;
> copy_pathspec(&diffopts.pathspec, &opt->pathspec);
> - diffopts.flags.recursive = 1;
> diffopts.flags.allow_external = 0;
>
> /* find set of paths that everybody touches
With the above I'm a bit worried that we're changing behaviour for
direct or indirect callers without noticing. Our tests may not detect
any regressions, but that doesn't really prove that there is none.
An alternative approach could be to introduce a new flag that causes us
to not override the `recursive` flag. It's quite an ugly workaround and
makes the infra even weirder. But at least we could be sure that we
don't alter behaviour inadvertently.
Other than that I don't really have much of a better idea.
Patrick
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
2025-09-16 7:11 ` Patrick Steinhardt
@ 2025-09-16 15:22 ` Toon Claes
2025-09-18 5:49 ` Patrick Steinhardt
0 siblings, 1 reply; 9+ messages in thread
From: Toon Claes @ 2025-09-16 15:22 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Christian Couder
Patrick Steinhardt <ps@pks.im> writes:
> On Fri, Sep 05, 2025 at 03:06:31PM +0200, Toon Claes wrote:
>> Function diff_tree_combined() copies the 'struct diff_options' from the
>> input 'struct rev_info' to override some flags. One flag was
>> 'recursive', which is set to 1. This has been the case since the
>> inception of this function in af3feefa1d (diff-tree -c: show a merge
>> commit a bit more sensibly., 2006-01-24). From that commit there's no
>> clear indication why recursive is overridden. But this breaks the
>> recently introduced subcommand git-last-modified(1) in some cases
>> (i.e. when criss-cross merges are involved). It then would exit with the
>> error:
>>
>> BUG: paths remaining beyond boundary in last-modified
>>
>> The last-modified machinery uses a hashmap for all the files it wants to
>> get the last-modified commit for. Through log_tree_commit() the callback
>> mark_path() is called. Here the incoming path is looked up in the
>> hashmap, but because the diff-tree machinery internally switched to
>> recursive (even if the last-modified machinery wasn't) the entry for
>> 'dir' was never expelled from the hashmap, and the BUG() statement was
>> hit.
>
> Okay. So if I understand correctly, the issue here is that once we do a
> recursive diff we skip directory entries and only print blobs. And
> because of that we never manage to blame the directory to any specific
> commit.
>
> What this doesn't explain though is how this is related to criss-cross
> merges. Why does the error not happen with "normal" merges?
To be honest, I don't exactly understand what's happening. Depending on
conditions(??) it determines a change is attributed to a merge commit.
When this is the cases, we run into a different code path in
log_tree_diff() in log-tree.c (see `is_merge`). One scenario I managed
to reproduce is condition is when using criss-cross merges. It seems to
me the diff machinery tries to find the commit that actually introduced
the change, ignoring the commit that merged that commit. In case of
criss-cross merges, it's hard to find the commit that made the change,
so it attributes the change to a merge commit. And then we run into the
code path that was causing this error.
>> Remove overriding 'diffopt.flags.recursive' in diff_tree_combined() and
>> move setting this flag to the callers which relied on the recursive
>> behavior.
>>
>> Because internally diff-tree no longer runs recursive, this results in a
>> nice speedup when running `git last-modified` on git.git:
>
> I assume this is only the case for `git last-modified --no-recursive`,
> right?
Yes, that's the default.
> Do we know to set `diffopt.flags.recursive` in that case already,
> or do we handle recursion manually in that case?
git-last-modified sets that flag when `-r` or `--recursive` is given.
That flag is passed down to diff_tree_combined(), which reuses that
value (and no longer overrided it). Does that answer your question?
>> diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
>> index 49dd4d00ebf1bcc644383ee99df3a9e05502b89b..6ef0438d36dd1091d2806b46306f2b5ee7274bc0 100644
>> --- a/builtin/diff-tree.c
>> +++ b/builtin/diff-tree.c
>> @@ -16,10 +16,21 @@ static struct rev_info log_tree_opt;
>>
>> static int diff_tree_commit_oid(const struct object_id *oid)
>> {
>> + struct rev_info rev_info = log_tree_opt;
>> struct commit *commit = lookup_commit_reference(the_repository, oid);
>> +
>> if (!commit)
>> return -1;
>> - return log_tree_commit(&log_tree_opt, commit);
>> +
>> + /*
>> + * log_tree_commit() calls into diff_tree_combined() which historically
>> + * used to enable diffopt.flags.recursive when option '-c' is given.
>> + * To not break backward compatibility, set 'resursive' here.
>> + */
>> + if (rev_info.combine_merges)
>> + rev_info.diffopt.flags.recursive = 1;
>> +
>> + return log_tree_commit(&rev_info, commit);
>> }
>>
>> /* Diff one or more commits. */
>
> Don't we have to do the same dance for `stdin_diff_commit()`? There's
> also callsites in git-am(1), git-log(1) and others. Why don't we have to
> adjust any of those?
Good question. I can address these in the next version.
>> diff --git a/combine-diff.c b/combine-diff.c
>> index 3878faabe7bb2f7c80cffbf3add6123f17960627..305414efdf436d53fee8d79aa4219f6a4dd3445e 100644
>> --- a/combine-diff.c
>> +++ b/combine-diff.c
>> @@ -1515,7 +1515,6 @@ void diff_tree_combined(const struct object_id *oid,
>>
>> diffopts = *opt;
>> copy_pathspec(&diffopts.pathspec, &opt->pathspec);
>> - diffopts.flags.recursive = 1;
>> diffopts.flags.allow_external = 0;
>>
>> /* find set of paths that everybody touches
>
> With the above I'm a bit worried that we're changing behaviour for
> direct or indirect callers without noticing. Our tests may not detect
> any regressions, but that doesn't really prove that there is none.
As I've been trying to explain in the first version[1] of my patch, I
would say my change fixes a bug. Agreed, it's an ancient old bug, I
still consider it a bug. I think in practice no one will notice this
change in behaviour, because otherwise they would have complained about
the buggy behaviour.
For the callsites we didn't address, I'm doubtful anyone will notice a
change. Also, as I'm laying out above, we only see a bug in rare edge
cases. That's also why we don't see any test failures.
So yes, there might be a change in behaviour, but it will be correct and
barely anyone will notice.
Unfortunately I don't have any data to prove this.
> An alternative approach could be to introduce a new flag that causes us
> to not override the `recursive` flag. It's quite an ugly workaround and
> makes the infra even weirder. But at least we could be sure that we
> don't alter behaviour inadvertently.
Personally, I would rather not do this. It will create more tech debt,
on top of the tech debt that is the odd-ish behaviour.
> Other than that I don't really have much of a better idea.
Yeah, I've been trying some things, but I also have no other idea.
[1]: https://lore.kernel.org/git/20250904-toon-fix-last-modified-v1-1-91bf87ddf62b@iotcl.com/
--
Cheers,
Toon
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
2025-09-16 15:22 ` Toon Claes
@ 2025-09-18 5:49 ` Patrick Steinhardt
0 siblings, 0 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2025-09-18 5:49 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Christian Couder
On Tue, Sep 16, 2025 at 05:22:21PM +0200, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > Do we know to set `diffopt.flags.recursive` in that case already,
> > or do we handle recursion manually in that case?
>
> git-last-modified sets that flag when `-r` or `--recursive` is given.
> That flag is passed down to diff_tree_combined(), which reuses that
> value (and no longer overrided it). Does that answer your question?
Yup, it does.
> >> diff --git a/combine-diff.c b/combine-diff.c
> >> index 3878faabe7bb2f7c80cffbf3add6123f17960627..305414efdf436d53fee8d79aa4219f6a4dd3445e 100644
> >> --- a/combine-diff.c
> >> +++ b/combine-diff.c
> >> @@ -1515,7 +1515,6 @@ void diff_tree_combined(const struct object_id *oid,
> >>
> >> diffopts = *opt;
> >> copy_pathspec(&diffopts.pathspec, &opt->pathspec);
> >> - diffopts.flags.recursive = 1;
> >> diffopts.flags.allow_external = 0;
> >>
> >> /* find set of paths that everybody touches
> >
> > With the above I'm a bit worried that we're changing behaviour for
> > direct or indirect callers without noticing. Our tests may not detect
> > any regressions, but that doesn't really prove that there is none.
>
> As I've been trying to explain in the first version[1] of my patch, I
> would say my change fixes a bug. Agreed, it's an ancient old bug, I
> still consider it a bug. I think in practice no one will notice this
> change in behaviour, because otherwise they would have complained about
> the buggy behaviour.
>
> For the callsites we didn't address, I'm doubtful anyone will notice a
> change. Also, as I'm laying out above, we only see a bug in rare edge
> cases. That's also why we don't see any test failures.
>
> So yes, there might be a change in behaviour, but it will be correct and
> barely anyone will notice.
>
> Unfortunately I don't have any data to prove this.
The old code certainly feels fishy, that much I can say. Other than that
I'm not knowledgeable enough in this code area to have an informed
opinion. The old mailing list thread [1] doesn't surface any useful
info, either.
Patrick
[1]: https://lore.kernel.org/git/7vwtgqas0y.fsf@assigned-by-dhcp.cox.net/
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v3] last-modified: fix bug when some paths remain unhandled
2025-09-05 13:06 [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined() Toon Claes
2025-09-05 15:27 ` Junio C Hamano
2025-09-16 7:11 ` Patrick Steinhardt
@ 2025-09-18 8:00 ` Toon Claes
2025-09-18 15:18 ` Junio C Hamano
2 siblings, 1 reply; 9+ messages in thread
From: Toon Claes @ 2025-09-18 8:00 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt, Christian Couder, Toon Claes
The recently introduced new subcommand git-last-modified(1) runs into an
error in some scenarios. It then would exit with the message:
BUG: paths remaining beyond boundary in last-modified
This seems to happens for example when criss-cross merges are involved.
In that scenario, the function diff_tree_combined() gets called.
The function diff_tree_combined() copies the `struct diff_options` from
the input `struct rev_info` to override some flags. One flag is
`recursive`, which is always set to 1. This has been the case since the
inception of this function in af3feefa1d (diff-tree -c: show a merge
commit a bit more sensibly., 2006-01-24).
This behavior is incompatible with git-last-modified(1), when called
non-recursive (which is the default).
The last-modified machinery uses a hashmap for all the paths it wants to
get the last-modified commit for. Through log_tree_commit() the callback
mark_path() is called. The diff machinery uses diff_tree_combined()
internally, and due to it's recursive behavior the callback receives
entries inside subtrees, but not the subtree entries themselves. So a
directory is never expelled from the hashmap, and the BUG() statement
gets hit.
Because there are many callers calling into diff_tree_combined(), both
directly and indirectly, we cannot simply change it's behavior.
Instead, add a flag `no_recursive_diff_tree_combined` which supresses
the behavior of diff_tree_combined() to override `recursive` and set
this flag in builtin/last-modified.c.
Signed-off-by: Toon Claes <toon@iotcl.com>
---
git-last-modified(1) was recently merged into 'master', but upon use
I've (well actually Patrick) noticed it has a bug:
$ git last-modified next
3f3e11118993fe0500b3957ab798e39caaa3952f .gitignore
3f3e11118993fe0500b3957ab798e39caaa3952f Makefile
3f3e11118993fe0500b3957ab798e39caaa3952f builtin.h
3f3e11118993fe0500b3957ab798e39caaa3952f command-list.txt
3f3e11118993fe0500b3957ab798e39caaa3952f commit-graph.c
3f3e11118993fe0500b3957ab798e39caaa3952f git.c
3f3e11118993fe0500b3957ab798e39caaa3952f meson.build
56072ff0384da5d874fc378d36e089a18f28f1e3 fetch-pack.c
457534d0417d047b943f76a849f256b739894ce9 progress.c
0d8f4ccfe3b13bb5eb95f030dc5fe76efb255397 for-each-ref.h
109c3df14ccf372c2438a470bdfb566265399f0a combine-diff.c
109c3df14ccf372c2438a470bdfb566265399f0a diff.c
109c3df14ccf372c2438a470bdfb566265399f0a diff.h
109c3df14ccf372c2438a470bdfb566265399f0a dir.c
[snip]
9e5878fbede57c0499133adf73844261849cd7b2 git-web--browse.sh
b2fb3911eab730a08168c7f85a7935ad5a330b53 config.mak.in
36268b762c4aa6a0d4831f69852b20ab545aff4d LGPL-2.1
1e58dba142c673c59fbb9d10aeecf62217d4fc9c aclocal.m4
9517e6b84357252e1882091343661c34d978771e levenshtein.h
703601d6780c32d33dadf19b2b367f2f79e1e34c COPYING
BUG: ../builtin/last-modified.c:236: paths remaining beyond boundary in last-modified
zsh: IOT instruction (core dumped) git last-modified
This series fixes that issue.
---
Changes in v3:
- Instead of changing diff_tree_combined()'s default behavior, add a
flag to modify it's behavior.
- Overhaul the commit message to match the change in strategy.
- Link to v2: https://lore.kernel.org/r/20250905-toon-fix-last-modified-v2-1-d859eeed408e@iotcl.com
Changes in v2:
- Do not change behavioral change in `git diff-tree -c`.
- Small change in the test to make it pass on Windows.
- Link to v1: https://lore.kernel.org/r/20250904-toon-fix-last-modified-v1-1-91bf87ddf62b@iotcl.com
---
builtin/last-modified.c | 1 +
combine-diff.c | 3 ++-
diff.h | 7 +++++++
t/t8020-last-modified.sh | 16 ++++++++++++++++
4 files changed, 26 insertions(+), 1 deletion(-)
diff --git a/builtin/last-modified.c b/builtin/last-modified.c
index 886ba12cb5..ae8b36a2c3 100644
--- a/builtin/last-modified.c
+++ b/builtin/last-modified.c
@@ -265,6 +265,7 @@ static int last_modified_init(struct last_modified *lm, struct repository *r,
lm->rev.boundary = 1;
lm->rev.no_commit_id = 1;
lm->rev.diff = 1;
+ lm->rev.diffopt.flags.no_recursive_diff_tree_combined = 1;
lm->rev.diffopt.flags.recursive = lm->recursive;
lm->rev.diffopt.flags.tree_in_recursive = lm->show_trees;
diff --git a/combine-diff.c b/combine-diff.c
index 3878faabe7..e779b86e0b 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1515,8 +1515,9 @@ void diff_tree_combined(const struct object_id *oid,
diffopts = *opt;
copy_pathspec(&diffopts.pathspec, &opt->pathspec);
- diffopts.flags.recursive = 1;
diffopts.flags.allow_external = 0;
+ if (!opt->flags.no_recursive_diff_tree_combined)
+ diffopts.flags.recursive = 1;
/* find set of paths that everybody touches
*
diff --git a/diff.h b/diff.h
index 9bb939a4f1..df8f7643b0 100644
--- a/diff.h
+++ b/diff.h
@@ -126,6 +126,13 @@ struct diff_flags {
unsigned recursive;
unsigned tree_in_recursive;
+ /*
+ * Historically diff_tree_combined() overrides recursive to 1. To
+ * suppress this behavior, set the flag below.
+ * It has no effect if recursive is already set to 1.
+ */
+ unsigned no_recursive_diff_tree_combined;
+
/* Affects the way how a file that is seemingly binary is treated. */
unsigned binary;
unsigned text;
diff --git a/t/t8020-last-modified.sh b/t/t8020-last-modified.sh
index 5eb4cef035..e13aad1439 100755
--- a/t/t8020-last-modified.sh
+++ b/t/t8020-last-modified.sh
@@ -128,6 +128,22 @@ test_expect_success 'only last-modified files in the current tree' '
EOF
'
+test_expect_success 'last-modified with subdir and criss-cross merge' '
+ git checkout -b branch-k1 1 &&
+ mkdir -p a k &&
+ test_commit k1 a/file2 &&
+ git checkout -b branch-k2 &&
+ test_commit k2 k/file2 &&
+ git checkout branch-k1 &&
+ test_merge km2 branch-k2 &&
+ test_merge km3 3 &&
+ check_last_modified <<-\EOF
+ km3 a
+ k2 k
+ 1 file
+ EOF
+'
+
test_expect_success 'cross merge boundaries in blaming' '
git checkout HEAD^0 &&
git rm -rf . &&
---
base-commit: 215033b3ac599432a17d58f18a92b356d98354a9
change-id: 20250902-toon-fix-last-modified-865172060265
Best regards,
--
Toon Claes <toon@iotcl.com>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v3] last-modified: fix bug when some paths remain unhandled
2025-09-18 8:00 ` [PATCH v3] last-modified: fix bug when some paths remain unhandled Toon Claes
@ 2025-09-18 15:18 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2025-09-18 15:18 UTC (permalink / raw)
To: Toon Claes; +Cc: git, Patrick Steinhardt, Christian Couder
Toon Claes <toon@iotcl.com> writes:
> The recently introduced new subcommand git-last-modified(1) runs into an
> error in some scenarios. It then would exit with the message:
>
> BUG: paths remaining beyond boundary in last-modified
>
> This seems to happens for example when criss-cross merges are involved.
"seems to happens for example when" -> "seems to happen, for
example, when".
"In some scenarios" followed by "seems to" is not particularly
convincing that we know what the cause of the problem, though.
> In that scenario, the function diff_tree_combined() gets called.
and in a simpler history that does not exhibit the problem, we
traverse without calling diff_tree_combined()? What do we use
instead?
> The function diff_tree_combined() copies the `struct diff_options` from
> the input `struct rev_info` to override some flags. One flag is
> `recursive`, which is always set to 1. This has been the case since the
> inception of this function in af3feefa1d (diff-tree -c: show a merge
> commit a bit more sensibly., 2006-01-24).
>
> This behavior is incompatible with git-last-modified(1), when called
> non-recursive (which is the default).
"non-recursively"?
> Changes in v3:
> - Instead of changing diff_tree_combined()'s default behavior, add a
> flag to modify it's behavior.
> - Overhaul the commit message to match the change in strategy.
> - Link to v2: https://lore.kernel.org/r/20250905-toon-fix-last-modified-v2-1-d859eeed408e@iotcl.com
I agree that this surgical approach may be the safest without
risking regression to all other callers.
Will queue on top of recent 'master' that already has the
tc/last-modified topic.
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-09-18 15:18 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-05 13:06 [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined() Toon Claes
2025-09-05 15:27 ` Junio C Hamano
2025-09-05 15:37 ` Kristoffer Haugsbakk
2025-09-08 12:19 ` Toon Claes
2025-09-16 7:11 ` Patrick Steinhardt
2025-09-16 15:22 ` Toon Claes
2025-09-18 5:49 ` Patrick Steinhardt
2025-09-18 8:00 ` [PATCH v3] last-modified: fix bug when some paths remain unhandled Toon Claes
2025-09-18 15:18 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).