From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>,
Christian Couder <christian.couder@gmail.com>,
Elijah Newren <newren@gmail.com>,
Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 0/5] Duplicate entry hardening
Date: Sun, 14 Jun 2026 06:37:21 +0000 [thread overview]
Message-ID: <pull.2096.v2.git.1781419047.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2096.git.1776731171.gitgitgadget@gmail.com>
We had some corrupt trees with duplicate entries in real world repositories,
which triggered an assertion failure in merge-ort. Further, the corrupt tree
creation in the third party tool would have been avoided had verify_cache()
correctly checked for D/F conflicts. Provide fixes for both issues,
including 3 preparatory changes for the merge-ort fix.
Changes since v1:
* Add some more detail to the commit message of Patch 1
* Split some code out in Patch 5 into a helper function.
Elijah Newren (5):
merge-ort: propagate callback errors from traverse_trees_wrapper()
merge-ort: drop unnecessary show_all_errors from collect_merge_info()
merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
merge-ort: abort merge when trees have duplicate entries
cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
cache-tree.c | 64 +++++++++++++++++++----
merge-ort.c | 78 ++++++++++++++++------------
t/meson.build | 1 +
t/t0093-direct-index-write.pl | 38 ++++++++++++++
t/t0093-verify-cache-df-gap.sh | 59 +++++++++++++++++++++
t/t6422-merge-rename-corner-cases.sh | 54 +++++++++++++++++++
6 files changed, 249 insertions(+), 45 deletions(-)
create mode 100644 t/t0093-direct-index-write.pl
create mode 100755 t/t0093-verify-cache-df-gap.sh
base-commit: e8955061076952cc5eab0300424fc48b601fe12d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2096%2Fnewren%2Fduplicate-entry-hardening-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2096/newren/duplicate-entry-hardening-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2096
Range-diff vs v1:
1: 282f906d1b ! 1: dc0f596f31 merge-ort: propagate callback errors from traverse_trees_wrapper()
@@ Commit message
traverse_trees_wrapper() saves entries from a first pass through
traverse_trees() and then replays them through the real callback
(collect_merge_info_callback). However, the replay loop silently
- discards the callback return value. This means any error reported by
- the callback during replay -- including a future check for malformed
- trees -- would be ignored, allowing the merge to proceed with corrupt
- state.
+ discards the callback return value. This is not a deferred error;
+ it is an ignored error.
- Capture the return value, stop the loop on negative (error) returns,
- and propagate the error to the caller. Note that the callback returns
- a positive mask value on success, so we normalize non-negative returns
- to 0 for the caller.
+ Today the only originator of a negative return in this entire call
+ graph is traverse_trees()'s "exceeded maximum allowed tree depth"
+ check; everything else (collect_merge_info_callback,
+ traverse_trees_wrapper, the inner traverse_trees recursion) only
+ relays that. So in current Git, the visible effect of dropping the
+ replay callback's return value is narrow but bad: a tree nested past
+ core.maxTreeDepth has its -1 swallowed, the subtree below the limit
+ is silently pruned, and the merge completes as if that were the
+ correct result.
+
+ A later patch in this series will teach collect_merge_info_callback()
+ to return -1 on an additional path -- detecting duplicate
+ entries in malformed trees -- which is similarly handled today by
+ just ignoring the problem (resulting in mostly a "last one wins" rule,
+ though the non-last entry can mutate various state flags).
+
+ Capture the return value, stop the loop on negative returns, and
+ propagate the error to the caller. The callback returns a positive mask
+ value on success, so normalize non-negative returns to
+ 0 for the caller.
Signed-off-by: Elijah Newren <newren@gmail.com>
2: 949b5d8e3f = 2: b4ff725a77 merge-ort: drop unnecessary show_all_errors from collect_merge_info()
3: d422f73e12 = 3: 673fbea13f merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
4: 0d81c027aa = 4: b5421244d4 merge-ort: abort merge when trees have duplicate entries
5: a87bbaa84f ! 5: cf50f1aabc cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
@@ Commit message
Signed-off-by: Elijah Newren <newren@gmail.com>
## cache-tree.c ##
+@@ cache-tree.c: void cache_tree_invalidate_path(struct index_state *istate, const char *path)
+ istate->cache_changed |= CACHE_TREE_CHANGED;
+ }
+
++/*
++ * Check whether this_ce and the next entry in the index form a D/F
++ * conflict ("path" vs "path/file"). Returns the conflicting "path/..."
++ * name when one is found, or NULL otherwise.
++ *
++ * The cache is sorted, so "path/file" sorts after "path" and the
++ * conflict is usually visible as adjacent entries. But other entries
++ * can sort between them -- e.g. "path-internal" sits between "path"
++ * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when
++ * the immediately following entry shares our prefix but starts with a
++ * character that sorts before '/', binary search for "path/" instead.
++ */
++static const char *find_df_conflict(struct index_state *istate,
++ const struct cache_entry *this_ce,
++ const struct cache_entry *next_ce)
++{
++ const char *this_name = this_ce->name;
++ const char *next_name = next_ce->name;
++ int this_len = ce_namelen(this_ce);
++ const struct cache_entry *other;
++ struct strbuf probe = STRBUF_INIT;
++ int pos;
++
++ if (this_len >= ce_namelen(next_ce) ||
++ next_name[this_len] > '/' ||
++ strncmp(this_name, next_name, this_len))
++ return NULL;
++
++ if (next_name[this_len] == '/')
++ return next_name;
++
++ strbuf_add(&probe, this_name, this_len);
++ strbuf_addch(&probe, '/');
++ pos = index_name_pos_sparse(istate, probe.buf, probe.len);
++ strbuf_release(&probe);
++
++ if (pos < 0)
++ pos = -pos - 1;
++ if (pos >= (int)istate->cache_nr)
++ return NULL;
++ other = istate->cache[pos];
++ if (ce_namelen(other) > this_len &&
++ other->name[this_len] == '/' &&
++ !strncmp(this_name, other->name, this_len))
++ return other->name;
++ return NULL;
++}
++
+ static int verify_cache(struct index_state *istate, int flags)
+ {
+ unsigned i, funny;
@@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
+ */
+ funny = 0;
for (i = 0; i + 1 < istate->cache_nr; i++) {
- /* path/file always comes after path because of the way
- * the cache is sorted. Also path can appear only once,
+- /* path/file always comes after path because of the way
+- * the cache is sorted. Also path can appear only once,
- * which means conflicting one would immediately follow.
-+ * so path/file is likely the immediately following path
-+ * but might be separated if there is e.g. a
-+ * path-internal/... file.
- */
+- */
const struct cache_entry *this_ce = istate->cache[i];
const struct cache_entry *next_ce = istate->cache[i + 1];
- const char *this_name = this_ce->name;
- const char *next_name = next_ce->name;
- int this_len = ce_namelen(this_ce);
-+ const char *conflict_name = NULL;
-+
- if (this_len < ce_namelen(next_ce) &&
+- const char *this_name = this_ce->name;
+- const char *next_name = next_ce->name;
+- int this_len = ce_namelen(this_ce);
+- if (this_len < ce_namelen(next_ce) &&
- next_name[this_len] == '/' &&
-+ next_name[this_len] <= '/' &&
- strncmp(this_name, next_name, this_len) == 0) {
-+ if (next_name[this_len] == '/') {
-+ conflict_name = next_name;
-+ } else if (next_name[this_len] < '/') {
-+ /*
-+ * The immediately next entry shares our
-+ * prefix but sorts before "path/" (e.g.,
-+ * "path-internal" between "path" and
-+ * "path/file", since '-' (0x2D) < '/'
-+ * (0x2F)). Binary search to find where
-+ * "path/" would be and check for a D/F
-+ * conflict there.
-+ */
-+ struct cache_entry *other;
-+ struct strbuf probe = STRBUF_INIT;
-+ int pos;
-+
-+ strbuf_add(&probe, this_name, this_len);
-+ strbuf_addch(&probe, '/');
-+ pos = index_name_pos_sparse(istate,
-+ probe.buf,
-+ probe.len);
-+ strbuf_release(&probe);
-+
-+ if (pos < 0)
-+ pos = -pos - 1;
-+ if (pos >= (int)istate->cache_nr)
-+ continue;
-+ other = istate->cache[pos];
-+ if (ce_namelen(other) > this_len &&
-+ other->name[this_len] == '/' &&
-+ !strncmp(this_name, other->name, this_len))
-+ conflict_name = other->name;
-+ }
-+ }
+- strncmp(this_name, next_name, this_len) == 0) {
++ const char *conflict_name;
+
++ conflict_name = find_df_conflict(istate, this_ce, next_ce);
+ if (conflict_name) {
if (10 < ++funny) {
fprintf(stderr, "...\n");
@@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
}
fprintf(stderr, "You have both %s and %s\n",
- this_name, next_name);
-+ this_name, conflict_name);
++ this_ce->name, conflict_name);
}
}
if (funny)
--
gitgitgadget
next prev parent reply other threads:[~2026-06-14 6:37 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-01 12:13 ` Junio C Hamano
2026-06-14 3:16 ` Elijah Newren
2026-04-21 0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-01 12:33 ` Junio C Hamano
2026-06-14 3:16 ` Elijah Newren
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
2026-06-01 13:54 ` Patrick Steinhardt
2026-06-12 13:29 ` Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening) Christian Couder
2026-06-12 19:32 ` Junio C Hamano
2026-06-14 6:37 ` Elijah Newren via GitGitGadget [this message]
2026-06-14 6:37 ` [PATCH v2 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-14 6:37 ` [PATCH v2 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-14 6:37 ` [PATCH v2 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-06-14 6:37 ` [PATCH v2 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-14 6:37 ` [PATCH v2 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.2096.v2.git.1781419047.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=newren@gmail.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox