Git development
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 0/5] Duplicate entry hardening
Date: Sun, 14 Jun 2026 06:37:21 +0000	[thread overview]
Message-ID: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

We had some corrupt trees with duplicate entries in real world repositories,
which triggered an assertion failure in merge-ort. Further, the corrupt tree
creation in the third party tool would have been avoided had verify_cache()
correctly checked for D/F conflicts. Provide fixes for both issues,
including 3 preparatory changes for the merge-ort fix.

Changes since v1:

 * Add some more detail to the commit message of Patch 1
 * Split some code out in Patch 5 into a helper function.

Elijah Newren (5):
  merge-ort: propagate callback errors from traverse_trees_wrapper()
  merge-ort: drop unnecessary show_all_errors from collect_merge_info()
  merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
  merge-ort: abort merge when trees have duplicate entries
  cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts

 cache-tree.c                         | 64 +++++++++++++++++++----
 merge-ort.c                          | 78 ++++++++++++++++------------
 t/meson.build                        |  1 +
 t/t0093-direct-index-write.pl        | 38 ++++++++++++++
 t/t0093-verify-cache-df-gap.sh       | 59 +++++++++++++++++++++
 t/t6422-merge-rename-corner-cases.sh | 54 +++++++++++++++++++
 6 files changed, 249 insertions(+), 45 deletions(-)
 create mode 100644 t/t0093-direct-index-write.pl
 create mode 100755 t/t0093-verify-cache-df-gap.sh


base-commit: e8955061076952cc5eab0300424fc48b601fe12d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2096%2Fnewren%2Fduplicate-entry-hardening-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2096/newren/duplicate-entry-hardening-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2096

Range-diff vs v1:

 1:  282f906d1b ! 1:  dc0f596f31 merge-ort: propagate callback errors from traverse_trees_wrapper()
     @@ Commit message
          traverse_trees_wrapper() saves entries from a first pass through
          traverse_trees() and then replays them through the real callback
          (collect_merge_info_callback).  However, the replay loop silently
     -    discards the callback return value.  This means any error reported by
     -    the callback during replay -- including a future check for malformed
     -    trees -- would be ignored, allowing the merge to proceed with corrupt
     -    state.
     +    discards the callback return value.  This is not a deferred error;
     +    it is an ignored error.
      
     -    Capture the return value, stop the loop on negative (error) returns,
     -    and propagate the error to the caller.  Note that the callback returns
     -    a positive mask value on success, so we normalize non-negative returns
     -    to 0 for the caller.
     +    Today the only originator of a negative return in this entire call
     +    graph is traverse_trees()'s "exceeded maximum allowed tree depth"
     +    check; everything else (collect_merge_info_callback,
     +    traverse_trees_wrapper, the inner traverse_trees recursion) only
     +    relays that.  So in current Git, the visible effect of dropping the
     +    replay callback's return value is narrow but bad: a tree nested past
     +    core.maxTreeDepth has its -1 swallowed, the subtree below the limit
     +    is silently pruned, and the merge completes as if that were the
     +    correct result.
     +
     +    A later patch in this series will teach collect_merge_info_callback()
     +    to return -1 on an additional path -- detecting duplicate
     +    entries in malformed trees -- which is similarly handled today by
     +    just ignoring the problem (resulting in mostly a "last one wins" rule,
     +    though the non-last entry can mutate various state flags).
     +
     +    Capture the return value, stop the loop on negative returns, and
     +    propagate the error to the caller.  The callback returns a positive mask
     +    value on success, so normalize non-negative returns to
     +    0 for the caller.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
 2:  949b5d8e3f = 2:  b4ff725a77 merge-ort: drop unnecessary show_all_errors from collect_merge_info()
 3:  d422f73e12 = 3:  673fbea13f merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
 4:  0d81c027aa = 4:  b5421244d4 merge-ort: abort merge when trees have duplicate entries
 5:  a87bbaa84f ! 5:  cf50f1aabc cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## cache-tree.c ##
     +@@ cache-tree.c: void cache_tree_invalidate_path(struct index_state *istate, const char *path)
     + 		istate->cache_changed |= CACHE_TREE_CHANGED;
     + }
     + 
     ++/*
     ++ * Check whether this_ce and the next entry in the index form a D/F
     ++ * conflict ("path" vs "path/file").  Returns the conflicting "path/..."
     ++ * name when one is found, or NULL otherwise.
     ++ *
     ++ * The cache is sorted, so "path/file" sorts after "path" and the
     ++ * conflict is usually visible as adjacent entries.  But other entries
     ++ * can sort between them -- e.g. "path-internal" sits between "path"
     ++ * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when
     ++ * the immediately following entry shares our prefix but starts with a
     ++ * character that sorts before '/', binary search for "path/" instead.
     ++ */
     ++static const char *find_df_conflict(struct index_state *istate,
     ++				    const struct cache_entry *this_ce,
     ++				    const struct cache_entry *next_ce)
     ++{
     ++	const char *this_name = this_ce->name;
     ++	const char *next_name = next_ce->name;
     ++	int this_len = ce_namelen(this_ce);
     ++	const struct cache_entry *other;
     ++	struct strbuf probe = STRBUF_INIT;
     ++	int pos;
     ++
     ++	if (this_len >= ce_namelen(next_ce) ||
     ++	    next_name[this_len] > '/' ||
     ++	    strncmp(this_name, next_name, this_len))
     ++		return NULL;
     ++
     ++	if (next_name[this_len] == '/')
     ++		return next_name;
     ++
     ++	strbuf_add(&probe, this_name, this_len);
     ++	strbuf_addch(&probe, '/');
     ++	pos = index_name_pos_sparse(istate, probe.buf, probe.len);
     ++	strbuf_release(&probe);
     ++
     ++	if (pos < 0)
     ++		pos = -pos - 1;
     ++	if (pos >= (int)istate->cache_nr)
     ++		return NULL;
     ++	other = istate->cache[pos];
     ++	if (ce_namelen(other) > this_len &&
     ++	    other->name[this_len] == '/' &&
     ++	    !strncmp(this_name, other->name, this_len))
     ++		return other->name;
     ++	return NULL;
     ++}
     ++
     + static int verify_cache(struct index_state *istate, int flags)
     + {
     + 	unsigned i, funny;
      @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
     + 	 */
     + 	funny = 0;
       	for (i = 0; i + 1 < istate->cache_nr; i++) {
     - 		/* path/file always comes after path because of the way
     - 		 * the cache is sorted.  Also path can appear only once,
     +-		/* path/file always comes after path because of the way
     +-		 * the cache is sorted.  Also path can appear only once,
      -		 * which means conflicting one would immediately follow.
     -+		 * so path/file is likely the immediately following path
     -+		 * but might be separated if there is e.g. a
     -+		 * path-internal/... file.
     - 		 */
     +-		 */
       		const struct cache_entry *this_ce = istate->cache[i];
       		const struct cache_entry *next_ce = istate->cache[i + 1];
     - 		const char *this_name = this_ce->name;
     - 		const char *next_name = next_ce->name;
     - 		int this_len = ce_namelen(this_ce);
     -+		const char *conflict_name = NULL;
     -+
     - 		if (this_len < ce_namelen(next_ce) &&
     +-		const char *this_name = this_ce->name;
     +-		const char *next_name = next_ce->name;
     +-		int this_len = ce_namelen(this_ce);
     +-		if (this_len < ce_namelen(next_ce) &&
      -		    next_name[this_len] == '/' &&
     -+		    next_name[this_len] <= '/' &&
     - 		    strncmp(this_name, next_name, this_len) == 0) {
     -+			if (next_name[this_len] == '/') {
     -+				conflict_name = next_name;
     -+			} else if (next_name[this_len] < '/') {
     -+				/*
     -+				 * The immediately next entry shares our
     -+				 * prefix but sorts before "path/" (e.g.,
     -+				 * "path-internal" between "path" and
     -+				 * "path/file", since '-' (0x2D) < '/'
     -+				 * (0x2F)).  Binary search to find where
     -+				 * "path/" would be and check for a D/F
     -+				 * conflict there.
     -+				 */
     -+				struct cache_entry *other;
     -+				struct strbuf probe = STRBUF_INIT;
     -+				int pos;
     -+
     -+				strbuf_add(&probe, this_name, this_len);
     -+				strbuf_addch(&probe, '/');
     -+				pos = index_name_pos_sparse(istate,
     -+							    probe.buf,
     -+							    probe.len);
     -+				strbuf_release(&probe);
     -+
     -+				if (pos < 0)
     -+					pos = -pos - 1;
     -+				if (pos >= (int)istate->cache_nr)
     -+					continue;
     -+				other = istate->cache[pos];
     -+				if (ce_namelen(other) > this_len &&
     -+				    other->name[this_len] == '/' &&
     -+				    !strncmp(this_name, other->name, this_len))
     -+					conflict_name = other->name;
     -+			}
     -+		}
     +-		    strncmp(this_name, next_name, this_len) == 0) {
     ++		const char *conflict_name;
      +
     ++		conflict_name = find_df_conflict(istate, this_ce, next_ce);
      +		if (conflict_name) {
       			if (10 < ++funny) {
       				fprintf(stderr, "...\n");
     @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
       			}
       			fprintf(stderr, "You have both %s and %s\n",
      -				this_name, next_name);
     -+				this_name, conflict_name);
     ++				this_ce->name, conflict_name);
       		}
       	}
       	if (funny)

-- 
gitgitgadget

  parent reply	other threads:[~2026-06-14  6:37 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21  0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-01 12:13   ` Junio C Hamano
2026-06-14  3:16     ` Elijah Newren
2026-04-21  0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-01 12:23   ` Junio C Hamano
2026-04-21  0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-04-21  0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-01 12:23   ` Junio C Hamano
2026-04-21  0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-01 12:33   ` Junio C Hamano
2026-06-14  3:16     ` Elijah Newren
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
2026-06-01 13:54   ` Patrick Steinhardt
2026-06-12 13:29     ` Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening) Christian Couder
2026-06-12 19:32       ` Junio C Hamano
2026-06-14  6:37 ` Elijah Newren via GitGitGadget [this message]
2026-06-14  6:37   ` [PATCH v2 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.2096.v2.git.1781419047.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox