From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined()
Date: Tue, 16 Sep 2025 09:11:04 +0200 [thread overview]
Message-ID: <aMkNiC_s-Dt4iRPp@pks.im> (raw)
In-Reply-To: <20250905-toon-fix-last-modified-v2-1-d859eeed408e@iotcl.com>
On Fri, Sep 05, 2025 at 03:06:31PM +0200, Toon Claes wrote:
> Function diff_tree_combined() copies the 'struct diff_options' from the
> input 'struct rev_info' to override some flags. One flag was
> 'recursive', which is set to 1. This has been the case since the
> inception of this function in af3feefa1d (diff-tree -c: show a merge
> commit a bit more sensibly., 2006-01-24). From that commit there's no
> clear indication why recursive is overridden. But this breaks the
> recently introduced subcommand git-last-modified(1) in some cases
> (i.e. when criss-cross merges are involved). It then would exit with the
> error:
>
> BUG: paths remaining beyond boundary in last-modified
>
> The last-modified machinery uses a hashmap for all the files it wants to
> get the last-modified commit for. Through log_tree_commit() the callback
> mark_path() is called. Here the incoming path is looked up in the
> hashmap, but because the diff-tree machinery internally switched to
> recursive (even if the last-modified machinery wasn't) the entry for
> 'dir' was never expelled from the hashmap, and the BUG() statement was
> hit.
Okay. So if I understand correctly, the issue here is that once we do a
recursive diff we skip directory entries and only print blobs. And
because of that we never manage to blame the directory to any specific
commit.
What this doesn't explain though is how this is related to criss-cross
merges. Why does the error not happen with "normal" merges?
> Remove overriding 'diffopt.flags.recursive' in diff_tree_combined() and
> move setting this flag to the callers which relied on the recursive
> behavior.
>
> Because internally diff-tree no longer runs recursive, this results in a
> nice speedup when running `git last-modified` on git.git:
I assume this is only the case for `git last-modified --no-recursive`,
right? Do we know to set `diffopt.flags.recursive` in that case already,
or do we handle recursion manually in that case?
> diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
> index 49dd4d00ebf1bcc644383ee99df3a9e05502b89b..6ef0438d36dd1091d2806b46306f2b5ee7274bc0 100644
> --- a/builtin/diff-tree.c
> +++ b/builtin/diff-tree.c
> @@ -16,10 +16,21 @@ static struct rev_info log_tree_opt;
>
> static int diff_tree_commit_oid(const struct object_id *oid)
> {
> + struct rev_info rev_info = log_tree_opt;
> struct commit *commit = lookup_commit_reference(the_repository, oid);
> +
> if (!commit)
> return -1;
> - return log_tree_commit(&log_tree_opt, commit);
> +
> + /*
> + * log_tree_commit() calls into diff_tree_combined() which historically
> + * used to enable diffopt.flags.recursive when option '-c' is given.
> + * To not break backward compatibility, set 'resursive' here.
> + */
> + if (rev_info.combine_merges)
> + rev_info.diffopt.flags.recursive = 1;
> +
> + return log_tree_commit(&rev_info, commit);
> }
>
> /* Diff one or more commits. */
Don't we have to do the same dance for `stdin_diff_commit()`? There's
also callsites in git-am(1), git-log(1) and others. Why don't we have to
adjust any of those?
> diff --git a/combine-diff.c b/combine-diff.c
> index 3878faabe7bb2f7c80cffbf3add6123f17960627..305414efdf436d53fee8d79aa4219f6a4dd3445e 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -1515,7 +1515,6 @@ void diff_tree_combined(const struct object_id *oid,
>
> diffopts = *opt;
> copy_pathspec(&diffopts.pathspec, &opt->pathspec);
> - diffopts.flags.recursive = 1;
> diffopts.flags.allow_external = 0;
>
> /* find set of paths that everybody touches
With the above I'm a bit worried that we're changing behaviour for
direct or indirect callers without noticing. Our tests may not detect
any regressions, but that doesn't really prove that there is none.
An alternative approach could be to introduce a new flag that causes us
to not override the `recursive` flag. It's quite an ugly workaround and
makes the infra even weirder. But at least we could be sure that we
don't alter behaviour inadvertently.
Other than that I don't really have much of a better idea.
Patrick
next prev parent reply other threads:[~2025-09-16 7:11 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-05 13:06 [PATCH v2] combine-diff: don't override recursive flag in diff_tree_combined() Toon Claes
2025-09-05 15:27 ` Junio C Hamano
2025-09-05 15:37 ` Kristoffer Haugsbakk
2025-09-08 12:19 ` Toon Claes
2025-09-16 7:11 ` Patrick Steinhardt [this message]
2025-09-16 15:22 ` Toon Claes
2025-09-18 5:49 ` Patrick Steinhardt
2025-09-18 8:00 ` [PATCH v3] last-modified: fix bug when some paths remain unhandled Toon Claes
2025-09-18 15:18 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aMkNiC_s-Dt4iRPp@pks.im \
--to=ps@pks.im \
--cc=chriscool@tuxfamily.org \
--cc=git@vger.kernel.org \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).