All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>,
	Christian Couder <christian.couder@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 0/5] Duplicate entry hardening
Date: Sun, 14 Jun 2026 06:37:21 +0000	[thread overview]
Message-ID: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2096.git.1776731171.gitgitgadget@gmail.com>

We had some corrupt trees with duplicate entries in real world repositories,
which triggered an assertion failure in merge-ort. Further, the corrupt tree
creation in the third party tool would have been avoided had verify_cache()
correctly checked for D/F conflicts. Provide fixes for both issues,
including 3 preparatory changes for the merge-ort fix.

Changes since v1:

 * Add some more detail to the commit message of Patch 1
 * Split some code out in Patch 5 into a helper function.

Elijah Newren (5):
  merge-ort: propagate callback errors from traverse_trees_wrapper()
  merge-ort: drop unnecessary show_all_errors from collect_merge_info()
  merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
  merge-ort: abort merge when trees have duplicate entries
  cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts

 cache-tree.c                         | 64 +++++++++++++++++++----
 merge-ort.c                          | 78 ++++++++++++++++------------
 t/meson.build                        |  1 +
 t/t0093-direct-index-write.pl        | 38 ++++++++++++++
 t/t0093-verify-cache-df-gap.sh       | 59 +++++++++++++++++++++
 t/t6422-merge-rename-corner-cases.sh | 54 +++++++++++++++++++
 6 files changed, 249 insertions(+), 45 deletions(-)
 create mode 100644 t/t0093-direct-index-write.pl
 create mode 100755 t/t0093-verify-cache-df-gap.sh


base-commit: e8955061076952cc5eab0300424fc48b601fe12d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2096%2Fnewren%2Fduplicate-entry-hardening-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2096/newren/duplicate-entry-hardening-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2096

Range-diff vs v1:

 1:  282f906d1b ! 1:  dc0f596f31 merge-ort: propagate callback errors from traverse_trees_wrapper()
     @@ Commit message
          traverse_trees_wrapper() saves entries from a first pass through
          traverse_trees() and then replays them through the real callback
          (collect_merge_info_callback).  However, the replay loop silently
     -    discards the callback return value.  This means any error reported by
     -    the callback during replay -- including a future check for malformed
     -    trees -- would be ignored, allowing the merge to proceed with corrupt
     -    state.
     +    discards the callback return value.  This is not a deferred error;
     +    it is an ignored error.
      
     -    Capture the return value, stop the loop on negative (error) returns,
     -    and propagate the error to the caller.  Note that the callback returns
     -    a positive mask value on success, so we normalize non-negative returns
     -    to 0 for the caller.
     +    Today the only originator of a negative return in this entire call
     +    graph is traverse_trees()'s "exceeded maximum allowed tree depth"
     +    check; everything else (collect_merge_info_callback,
     +    traverse_trees_wrapper, the inner traverse_trees recursion) only
     +    relays that.  So in current Git, the visible effect of dropping the
     +    replay callback's return value is narrow but bad: a tree nested past
     +    core.maxTreeDepth has its -1 swallowed, the subtree below the limit
     +    is silently pruned, and the merge completes as if that were the
     +    correct result.
     +
     +    A later patch in this series will teach collect_merge_info_callback()
     +    to return -1 on an additional path -- detecting duplicate
     +    entries in malformed trees -- which is similarly handled today by
     +    just ignoring the problem (resulting in mostly a "last one wins" rule,
     +    though the non-last entry can mutate various state flags).
     +
     +    Capture the return value, stop the loop on negative returns, and
     +    propagate the error to the caller.  The callback returns a positive mask
     +    value on success, so normalize non-negative returns to
     +    0 for the caller.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
 2:  949b5d8e3f = 2:  b4ff725a77 merge-ort: drop unnecessary show_all_errors from collect_merge_info()
 3:  d422f73e12 = 3:  673fbea13f merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
 4:  0d81c027aa = 4:  b5421244d4 merge-ort: abort merge when trees have duplicate entries
 5:  a87bbaa84f ! 5:  cf50f1aabc cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## cache-tree.c ##
     +@@ cache-tree.c: void cache_tree_invalidate_path(struct index_state *istate, const char *path)
     + 		istate->cache_changed |= CACHE_TREE_CHANGED;
     + }
     + 
     ++/*
     ++ * Check whether this_ce and the next entry in the index form a D/F
     ++ * conflict ("path" vs "path/file").  Returns the conflicting "path/..."
     ++ * name when one is found, or NULL otherwise.
     ++ *
     ++ * The cache is sorted, so "path/file" sorts after "path" and the
     ++ * conflict is usually visible as adjacent entries.  But other entries
     ++ * can sort between them -- e.g. "path-internal" sits between "path"
     ++ * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when
     ++ * the immediately following entry shares our prefix but starts with a
     ++ * character that sorts before '/', binary search for "path/" instead.
     ++ */
     ++static const char *find_df_conflict(struct index_state *istate,
     ++				    const struct cache_entry *this_ce,
     ++				    const struct cache_entry *next_ce)
     ++{
     ++	const char *this_name = this_ce->name;
     ++	const char *next_name = next_ce->name;
     ++	int this_len = ce_namelen(this_ce);
     ++	const struct cache_entry *other;
     ++	struct strbuf probe = STRBUF_INIT;
     ++	int pos;
     ++
     ++	if (this_len >= ce_namelen(next_ce) ||
     ++	    next_name[this_len] > '/' ||
     ++	    strncmp(this_name, next_name, this_len))
     ++		return NULL;
     ++
     ++	if (next_name[this_len] == '/')
     ++		return next_name;
     ++
     ++	strbuf_add(&probe, this_name, this_len);
     ++	strbuf_addch(&probe, '/');
     ++	pos = index_name_pos_sparse(istate, probe.buf, probe.len);
     ++	strbuf_release(&probe);
     ++
     ++	if (pos < 0)
     ++		pos = -pos - 1;
     ++	if (pos >= (int)istate->cache_nr)
     ++		return NULL;
     ++	other = istate->cache[pos];
     ++	if (ce_namelen(other) > this_len &&
     ++	    other->name[this_len] == '/' &&
     ++	    !strncmp(this_name, other->name, this_len))
     ++		return other->name;
     ++	return NULL;
     ++}
     ++
     + static int verify_cache(struct index_state *istate, int flags)
     + {
     + 	unsigned i, funny;
      @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
     + 	 */
     + 	funny = 0;
       	for (i = 0; i + 1 < istate->cache_nr; i++) {
     - 		/* path/file always comes after path because of the way
     - 		 * the cache is sorted.  Also path can appear only once,
     +-		/* path/file always comes after path because of the way
     +-		 * the cache is sorted.  Also path can appear only once,
      -		 * which means conflicting one would immediately follow.
     -+		 * so path/file is likely the immediately following path
     -+		 * but might be separated if there is e.g. a
     -+		 * path-internal/... file.
     - 		 */
     +-		 */
       		const struct cache_entry *this_ce = istate->cache[i];
       		const struct cache_entry *next_ce = istate->cache[i + 1];
     - 		const char *this_name = this_ce->name;
     - 		const char *next_name = next_ce->name;
     - 		int this_len = ce_namelen(this_ce);
     -+		const char *conflict_name = NULL;
     -+
     - 		if (this_len < ce_namelen(next_ce) &&
     +-		const char *this_name = this_ce->name;
     +-		const char *next_name = next_ce->name;
     +-		int this_len = ce_namelen(this_ce);
     +-		if (this_len < ce_namelen(next_ce) &&
      -		    next_name[this_len] == '/' &&
     -+		    next_name[this_len] <= '/' &&
     - 		    strncmp(this_name, next_name, this_len) == 0) {
     -+			if (next_name[this_len] == '/') {
     -+				conflict_name = next_name;
     -+			} else if (next_name[this_len] < '/') {
     -+				/*
     -+				 * The immediately next entry shares our
     -+				 * prefix but sorts before "path/" (e.g.,
     -+				 * "path-internal" between "path" and
     -+				 * "path/file", since '-' (0x2D) < '/'
     -+				 * (0x2F)).  Binary search to find where
     -+				 * "path/" would be and check for a D/F
     -+				 * conflict there.
     -+				 */
     -+				struct cache_entry *other;
     -+				struct strbuf probe = STRBUF_INIT;
     -+				int pos;
     -+
     -+				strbuf_add(&probe, this_name, this_len);
     -+				strbuf_addch(&probe, '/');
     -+				pos = index_name_pos_sparse(istate,
     -+							    probe.buf,
     -+							    probe.len);
     -+				strbuf_release(&probe);
     -+
     -+				if (pos < 0)
     -+					pos = -pos - 1;
     -+				if (pos >= (int)istate->cache_nr)
     -+					continue;
     -+				other = istate->cache[pos];
     -+				if (ce_namelen(other) > this_len &&
     -+				    other->name[this_len] == '/' &&
     -+				    !strncmp(this_name, other->name, this_len))
     -+					conflict_name = other->name;
     -+			}
     -+		}
     +-		    strncmp(this_name, next_name, this_len) == 0) {
     ++		const char *conflict_name;
      +
     ++		conflict_name = find_df_conflict(istate, this_ce, next_ce);
      +		if (conflict_name) {
       			if (10 < ++funny) {
       				fprintf(stderr, "...\n");
     @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
       			}
       			fprintf(stderr, "You have both %s and %s\n",
      -				this_name, next_name);
     -+				this_name, conflict_name);
     ++				this_ce->name, conflict_name);
       		}
       	}
       	if (funny)

-- 
gitgitgadget

  parent reply	other threads:[~2026-06-14  6:37 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21  0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-01 12:13   ` Junio C Hamano
2026-06-14  3:16     ` Elijah Newren
2026-04-21  0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-01 12:23   ` Junio C Hamano
2026-04-21  0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-04-21  0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-01 12:23   ` Junio C Hamano
2026-04-21  0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-01 12:33   ` Junio C Hamano
2026-06-14  3:16     ` Elijah Newren
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
2026-06-01 13:54   ` Patrick Steinhardt
2026-06-12 13:29     ` Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening) Christian Couder
2026-06-12 19:32       ` Junio C Hamano
2026-06-14  6:37 ` Elijah Newren via GitGitGadget [this message]
2026-06-14  6:37   ` [PATCH v2 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.2096.v2.git.1781419047.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.