From: Junio C Hamano <gitster@pobox.com>
To: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries
Date: Mon, 01 Jun 2026 21:23:12 +0900 [thread overview]
Message-ID: <xmqqtsrm4fgv.fsf@gitster.g> (raw)
In-Reply-To: <0d81c027aafcb386398836ffc73b058b7ea4c702.1776731171.git.gitgitgadget@gmail.com> (Elijah Newren via GitGitGadget's message of "Tue, 21 Apr 2026 00:26:10 +0000")
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> Trees with duplicate entries are malformed; fsck reports "contains
> duplicate file entries" for them. merge-ort has from the beginning
> assumed that we would never hit such trees. It was written with the
> assumption that traverse_trees() calls collect_merge_info_callback() at
> most once per path. The "sanity checks" in that callback (added in
> d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
> 2020-12-13)) verify properties of each individual call but not that
> invariant. The strmap_put() in setup_path_info() silently overwrites
> the entry from any prior call for the same path, because it assumed
> there would be no other path. Unfortunately, supplemental data
> structures for various optimizations could still be tweaked before the
> extra paths were overwritten, and those data structures not matching
> expected state could trip various assertions.
>
> Change the return type of setup_path_info() from void to int to allow us
> to detect this case, and abort the merge with a clear error message when
> it occurs.
OK.
> @@ -1081,9 +1081,11 @@ static void setup_path_info(struct merge_options *opt,
> */
> mi->is_null = 1;
> }
> - strmap_put(&opt->priv->paths, fullpath, mi);
> + if (strmap_put(&opt->priv->paths, fullpath, mi))
> + return error(_("tree has duplicate entries for '%s'"), fullpath);
OK. I was wondering what _other_ kind of malformed trees would the
updated code by this change is prepared to handle (most notably,
tree entries must be sorted, and one way to detect duplicate is to
remember one single path that we saw earlier, which would work as
long as the entries are sorted). This "ah, we saw that path already"
approach is much more robust in that it does not have to depend on a
sorted tree.
Makes sense.
next prev parent reply other threads:[~2026-06-01 12:23 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-01 12:13 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano [this message]
2026-04-21 0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-01 12:33 ` Junio C Hamano
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
2026-06-01 13:54 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqtsrm4fgv.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox