* [PATCH 0/5] Duplicate entry hardening
@ 2026-04-21 0:26 Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-04-21 0:26 UTC (permalink / raw)
To: git; +Cc: Elijah Newren
We had some corrupt trees with duplicate entries in real world repositories,
which triggered an assertion failure in merge-ort. Further, the corrupt tree
creation in the third party tool would have been avoided had verify_cache()
correctly checked for D/F conflicts. Provide fixes for both issues,
including 3 preparatory changes for the merge-ort fix.
Elijah Newren (5):
merge-ort: propagate callback errors from traverse_trees_wrapper()
merge-ort: drop unnecessary show_all_errors from collect_merge_info()
merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
merge-ort: abort merge when trees have duplicate entries
cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
cache-tree.c | 46 ++++++++++++++--
merge-ort.c | 78 ++++++++++++++++------------
t/meson.build | 1 +
t/t0093-direct-index-write.pl | 38 ++++++++++++++
t/t0093-verify-cache-df-gap.sh | 59 +++++++++++++++++++++
t/t6422-merge-rename-corner-cases.sh | 54 +++++++++++++++++++
6 files changed, 239 insertions(+), 37 deletions(-)
create mode 100644 t/t0093-direct-index-write.pl
create mode 100755 t/t0093-verify-cache-df-gap.sh
base-commit: e8955061076952cc5eab0300424fc48b601fe12d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2096%2Fnewren%2Fduplicate-entry-hardening-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2096/newren/duplicate-entry-hardening-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2096
--
gitgitgadget
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper()
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
@ 2026-04-21 0:26 ` Elijah Newren via GitGitGadget
2026-06-01 12:13 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-04-21 0:26 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
traverse_trees_wrapper() saves entries from a first pass through
traverse_trees() and then replays them through the real callback
(collect_merge_info_callback). However, the replay loop silently
discards the callback return value. This means any error reported by
the callback during replay -- including a future check for malformed
trees -- would be ignored, allowing the merge to proceed with corrupt
state.
Capture the return value, stop the loop on negative (error) returns,
and propagate the error to the caller. Note that the callback returns
a positive mask value on success, so we normalize non-negative returns
to 0 for the caller.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
merge-ort.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/merge-ort.c b/merge-ort.c
index 00923ce3cd..4b8e32209d 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1008,18 +1008,20 @@ static int traverse_trees_wrapper(struct index_state *istate,
info->traverse_path = renames->callback_data_traverse_path;
info->fn = old_fn;
for (i = old_offset; i < renames->callback_data_nr; ++i) {
- info->fn(n,
- renames->callback_data[i].mask,
- renames->callback_data[i].dirmask,
- renames->callback_data[i].names,
- info);
+ ret = info->fn(n,
+ renames->callback_data[i].mask,
+ renames->callback_data[i].dirmask,
+ renames->callback_data[i].names,
+ info);
+ if (ret < 0)
+ break;
}
renames->callback_data_nr = old_offset;
free(renames->callback_data_traverse_path);
renames->callback_data_traverse_path = old_callback_data_traverse_path;
info->traverse_path = NULL;
- return 0;
+ return ret < 0 ? ret : 0;
}
static void setup_path_info(struct merge_options *opt,
--
gitgitgadget
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info()
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
@ 2026-04-21 0:26 ` Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-04-21 0:26 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
collect_merge_info() has set info.show_all_errors = 1 since
d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
2020-12-13). This setting was copied from unpack-trees.c where it
controls batching of error messages for porcelain display, but
merge-ort has no such error-batching logic and never needed it.
With show_all_errors set, traverse_trees() captures a negative callback
return but continues processing remaining entries rather than stopping
immediately. Removing the setting restores the default behavior where
a negative return from collect_merge_info_callback() breaks out of the
traversal loop right away, allowing a future commit to exit early when
a corrupt tree is detected.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
merge-ort.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/merge-ort.c b/merge-ort.c
index 4b8e32209d..74e9636020 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1740,7 +1740,6 @@ static int collect_merge_info(struct merge_options *opt,
setup_traverse_info(&info, opt->priv->toplevel_dir);
info.fn = collect_merge_info_callback;
info.data = opt;
- info.show_all_errors = 1;
if (repo_parse_tree(opt->repo, merge_base) < 0 ||
repo_parse_tree(opt->repo, side1) < 0 ||
--
gitgitgadget
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
@ 2026-04-21 0:26 ` Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
` (2 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-04-21 0:26 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
clear_or_reinit_internal_opts() is responsible for cleaning up the
various data structures in merge_options_internal. It already handles
many renames-related structures (dirs_removed, dir_renames,
relevant_sources, cached_pairs, deferred, etc.) but does not free
renames->pairs[].queue.
In the normal code path, resolve_and_process_renames() frees
pairs[s].queue and reinitializes it with diff_queue_init() before
clear_or_reinit_internal_opts() runs, so the omission is harmless.
However, if collect_merge_info() encounters an error and returns early
(before resolve_and_process_renames() is ever called), any diff pairs
already queued by collect_rename_info()/add_pair() will have their
backing array leaked.
Fix this by freeing renames->pairs[].queue in the cleanup function.
In the normal path the pointer is already NULL (from the earlier
diff_queue_init() in resolve_and_process_renames()), so free(NULL) is
a safe no-op.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
merge-ort.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/merge-ort.c b/merge-ort.c
index 74e9636020..8f911cb639 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -728,6 +728,8 @@ static void clear_or_reinit_internal_opts(struct merge_options_internal *opti,
strintmap_clear_func(&renames->deferred[i].possible_trivial_merges);
strset_clear_func(&renames->deferred[i].target_dirs);
renames->deferred[i].trivial_merges_okay = 1; /* 1 == maybe */
+ free(renames->pairs[i].queue);
+ diff_queue_init(&renames->pairs[i]);
}
renames->cached_pairs_valid_side = 0;
renames->dir_rename_mask = 0;
--
gitgitgadget
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
` (2 preceding siblings ...)
2026-04-21 0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
@ 2026-04-21 0:26 ` Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
5 siblings, 1 reply; 14+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-04-21 0:26 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
Trees with duplicate entries are malformed; fsck reports "contains
duplicate file entries" for them. merge-ort has from the beginning
assumed that we would never hit such trees. It was written with the
assumption that traverse_trees() calls collect_merge_info_callback() at
most once per path. The "sanity checks" in that callback (added in
d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
2020-12-13)) verify properties of each individual call but not that
invariant. The strmap_put() in setup_path_info() silently overwrites
the entry from any prior call for the same path, because it assumed
there would be no other path. Unfortunately, supplemental data
structures for various optimizations could still be tweaked before the
extra paths were overwritten, and those data structures not matching
expected state could trip various assertions.
Change the return type of setup_path_info() from void to int to allow us
to detect this case, and abort the merge with a clear error message when
it occurs.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
merge-ort.c | 61 ++++++++++++++++------------
t/t6422-merge-rename-corner-cases.sh | 54 ++++++++++++++++++++++++
2 files changed, 88 insertions(+), 27 deletions(-)
diff --git a/merge-ort.c b/merge-ort.c
index 8f911cb639..be0829bbb7 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1026,18 +1026,18 @@ static int traverse_trees_wrapper(struct index_state *istate,
return ret < 0 ? ret : 0;
}
-static void setup_path_info(struct merge_options *opt,
- struct string_list_item *result,
- const char *current_dir_name,
- int current_dir_name_len,
- char *fullpath, /* we'll take over ownership */
- struct name_entry *names,
- struct name_entry *merged_version,
- unsigned is_null, /* boolean */
- unsigned df_conflict, /* boolean */
- unsigned filemask,
- unsigned dirmask,
- int resolved /* boolean */)
+static int setup_path_info(struct merge_options *opt,
+ struct string_list_item *result,
+ const char *current_dir_name,
+ int current_dir_name_len,
+ char *fullpath, /* we'll take over ownership */
+ struct name_entry *names,
+ struct name_entry *merged_version,
+ unsigned is_null, /* boolean */
+ unsigned df_conflict, /* boolean */
+ unsigned filemask,
+ unsigned dirmask,
+ int resolved /* boolean */)
{
/* result->util is void*, so mi is a convenience typed variable */
struct merged_info *mi;
@@ -1081,9 +1081,11 @@ static void setup_path_info(struct merge_options *opt,
*/
mi->is_null = 1;
}
- strmap_put(&opt->priv->paths, fullpath, mi);
+ if (strmap_put(&opt->priv->paths, fullpath, mi))
+ return error(_("tree has duplicate entries for '%s'"), fullpath);
result->string = fullpath;
result->util = mi;
+ return 0;
}
static void add_pair(struct merge_options *opt,
@@ -1350,9 +1352,10 @@ static int collect_merge_info_callback(int n,
*/
if (side1_matches_mbase && side2_matches_mbase) {
/* mbase, side1, & side2 all match; use mbase as resolution */
- setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
- names, names+0, mbase_null, 0 /* df_conflict */,
- filemask, dirmask, 1 /* resolved */);
+ if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+ names, names+0, mbase_null, 0 /* df_conflict */,
+ filemask, dirmask, 1 /* resolved */))
+ return -1; /* Quit traversing */
return mask;
}
@@ -1364,9 +1367,10 @@ static int collect_merge_info_callback(int n,
*/
if (sides_match && filemask == 0x07) {
/* use side1 (== side2) version as resolution */
- setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
- names, names+1, side1_null, 0,
- filemask, dirmask, 1);
+ if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+ names, names+1, side1_null, 0,
+ filemask, dirmask, 1))
+ return -1; /* Quit traversing */
return mask;
}
@@ -1378,18 +1382,20 @@ static int collect_merge_info_callback(int n,
*/
if (side1_matches_mbase && filemask == 0x07) {
/* use side2 version as resolution */
- setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
- names, names+2, side2_null, 0,
- filemask, dirmask, 1);
+ if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+ names, names+2, side2_null, 0,
+ filemask, dirmask, 1))
+ return -1; /* Quit traversing */
return mask;
}
/* Similar to above but swapping sides 1 and 2 */
if (side2_matches_mbase && filemask == 0x07) {
/* use side1 version as resolution */
- setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
- names, names+1, side1_null, 0,
- filemask, dirmask, 1);
+ if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+ names, names+1, side1_null, 0,
+ filemask, dirmask, 1))
+ return -1; /* Quit traversing */
return mask;
}
@@ -1413,8 +1419,9 @@ static int collect_merge_info_callback(int n,
* unconflict some more cases, but that comes later so all we can
* do now is record the different non-null file hashes.)
*/
- setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
- names, NULL, 0, df_conflict, filemask, dirmask, 0);
+ if (setup_path_info(opt, &pi, dirname, info->pathlen, fullpath,
+ names, NULL, 0, df_conflict, filemask, dirmask, 0))
+ return -1; /* Quit traversing */
ci = pi.util;
VERIFY_CI(ci);
diff --git a/t/t6422-merge-rename-corner-cases.sh b/t/t6422-merge-rename-corner-cases.sh
index e18d5a227d..81b645bb3b 100755
--- a/t/t6422-merge-rename-corner-cases.sh
+++ b/t/t6422-merge-rename-corner-cases.sh
@@ -1525,4 +1525,58 @@ test_expect_success 'submodule/directory preliminary conflict' '
)
'
+# Testcase: submodule/directory conflict with duplicate tree entries
+# One side has a path as a gitlink (submodule). The other side replaces
+# the gitlink with a directory. A third-party tool creates a tree on the
+# submodule side that has *both* a gitlink and a tree entry for the same
+# path (adding a file inside the submodule path ignoring that there's a
+# gitlink there). collect_merge_info_callback() should detect the
+# duplicate and abort rather than silently corrupting its bookkeeping.
+
+test_expect_success 'duplicate tree entries trigger an error' '
+ test_when_finished "rm -rf duplicate-entry" &&
+ git init duplicate-entry &&
+ (
+ cd duplicate-entry &&
+
+ # Base commit: "docs" is a gitlink (submodule)
+ empty_tree=$(git mktree </dev/null) &&
+ fake_commit=$(git commit-tree $empty_tree </dev/null) &&
+ git update-index --add --cacheinfo 160000,$fake_commit,docs &&
+ echo base >file.txt &&
+ git add file.txt &&
+ git commit -m base &&
+
+ # side1: remove the gitlink, replace with a directory
+ git checkout -b side1 &&
+ git rm --cached docs &&
+ mkdir -p docs &&
+ echo hello >docs/requirements.txt &&
+ git add docs/requirements.txt &&
+ git commit -m "side1: submodule to directory" &&
+
+ # side2: keep the gitlink but craft a tree that also
+ # contains a tree entry for "docs" (simulating a tool
+ # that adds files inside a submodule path without
+ # removing the gitlink first).
+ git checkout main &&
+ git checkout -b side2 &&
+ blob_oid=$(echo world | git hash-object -w --stdin) &&
+ docs_tree=$(printf "100644 blob %s\trequirements.txt\n" \
+ "$blob_oid" | git mktree) &&
+ cur_tree=$(git rev-parse HEAD^{tree}) &&
+ git cat-file -p $cur_tree >tree-listing &&
+ printf "040000 tree %s\tdocs\n" "$docs_tree" >>tree-listing &&
+ new_tree=$(git mktree <tree-listing) &&
+ side2_commit=$(git commit-tree $new_tree -p HEAD \
+ -m "side2: add file alongside submodule") &&
+ git update-ref refs/heads/side2 $side2_commit &&
+
+ # Merging must detect the duplicate and abort
+ git checkout side1 &&
+ test_must_fail git merge side2 2>err &&
+ test_grep "duplicate entries" err
+ )
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
` (3 preceding siblings ...)
2026-04-21 0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
@ 2026-04-21 0:26 ` Elijah Newren via GitGitGadget
2026-06-01 12:33 ` Junio C Hamano
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
5 siblings, 1 reply; 14+ messages in thread
From: Elijah Newren via GitGitGadget @ 2026-04-21 0:26 UTC (permalink / raw)
To: git; +Cc: Elijah Newren, Elijah Newren
From: Elijah Newren <newren@gmail.com>
verify_cache() checks that the index does not contain both "path" and
"path/file" before writing a tree. It does this by comparing only
adjacent entries, relying on the assumption that "path/file" would
immediately follow "path" in sorted order. Unfortunately, this
assumption does not always hold. For example:
docs <-- submodule entry
docs-internal/README.md <-- intervening entry
docs/requirements.txt <-- D/F conflict, NOT adjacent to "docs"
When this happens, verify_cache() silently misses the D/F conflict and
write-tree produces a corrupt tree object containing duplicate entries
(one for the submodule "docs" and one for the tree "docs").
I could not find any caller in current git that both allows the index to
get into this state and then tries to write it out without doing other
checks beyond the verify_cache() call in cache_tree_update(), but
verify_cache() is documented as a safety net for preventing corrupt
trees and should actually provide that guarantee. A downstream consumer
that relied solely on cache_tree_update()'s internal checking via
verify_cache() to prevent duplicate tree entries was bitten by the gap.
Add a test that constructs a corrupt index directly (bypassing the D/F
checks in add_index_entry) and verifies that write-tree now rejects it.
Signed-off-by: Elijah Newren <newren@gmail.com>
---
cache-tree.c | 46 ++++++++++++++++++++++++--
t/meson.build | 1 +
t/t0093-direct-index-write.pl | 38 ++++++++++++++++++++++
t/t0093-verify-cache-df-gap.sh | 59 ++++++++++++++++++++++++++++++++++
4 files changed, 141 insertions(+), 3 deletions(-)
create mode 100644 t/t0093-direct-index-write.pl
create mode 100755 t/t0093-verify-cache-df-gap.sh
diff --git a/cache-tree.c b/cache-tree.c
index 7881b42aa2..f11844fe72 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -192,22 +192,62 @@ static int verify_cache(struct index_state *istate, int flags)
for (i = 0; i + 1 < istate->cache_nr; i++) {
/* path/file always comes after path because of the way
* the cache is sorted. Also path can appear only once,
- * which means conflicting one would immediately follow.
+ * so path/file is likely the immediately following path
+ * but might be separated if there is e.g. a
+ * path-internal/... file.
*/
const struct cache_entry *this_ce = istate->cache[i];
const struct cache_entry *next_ce = istate->cache[i + 1];
const char *this_name = this_ce->name;
const char *next_name = next_ce->name;
int this_len = ce_namelen(this_ce);
+ const char *conflict_name = NULL;
+
if (this_len < ce_namelen(next_ce) &&
- next_name[this_len] == '/' &&
+ next_name[this_len] <= '/' &&
strncmp(this_name, next_name, this_len) == 0) {
+ if (next_name[this_len] == '/') {
+ conflict_name = next_name;
+ } else if (next_name[this_len] < '/') {
+ /*
+ * The immediately next entry shares our
+ * prefix but sorts before "path/" (e.g.,
+ * "path-internal" between "path" and
+ * "path/file", since '-' (0x2D) < '/'
+ * (0x2F)). Binary search to find where
+ * "path/" would be and check for a D/F
+ * conflict there.
+ */
+ struct cache_entry *other;
+ struct strbuf probe = STRBUF_INIT;
+ int pos;
+
+ strbuf_add(&probe, this_name, this_len);
+ strbuf_addch(&probe, '/');
+ pos = index_name_pos_sparse(istate,
+ probe.buf,
+ probe.len);
+ strbuf_release(&probe);
+
+ if (pos < 0)
+ pos = -pos - 1;
+ if (pos >= (int)istate->cache_nr)
+ continue;
+ other = istate->cache[pos];
+ if (ce_namelen(other) > this_len &&
+ other->name[this_len] == '/' &&
+ !strncmp(this_name, other->name, this_len))
+ conflict_name = other->name;
+ }
+ }
+
+ if (conflict_name) {
if (10 < ++funny) {
fprintf(stderr, "...\n");
break;
}
fprintf(stderr, "You have both %s and %s\n",
- this_name, next_name);
+ this_name, conflict_name);
}
}
if (funny)
diff --git a/t/meson.build b/t/meson.build
index 7528e5cda5..362177999b 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -124,6 +124,7 @@ integration_tests = [
't0090-cache-tree.sh',
't0091-bugreport.sh',
't0092-diagnose.sh',
+ 't0093-verify-cache-df-gap.sh',
't0095-bloom.sh',
't0100-previous.sh',
't0101-at-syntax.sh',
diff --git a/t/t0093-direct-index-write.pl b/t/t0093-direct-index-write.pl
new file mode 100644
index 0000000000..2881a3ebb2
--- /dev/null
+++ b/t/t0093-direct-index-write.pl
@@ -0,0 +1,38 @@
+#!/usr/bin/perl
+#
+# Build a v2 index file from entries listed on stdin.
+# Each line: "octalmode hex-oid name"
+# Output: binary index written to stdout.
+#
+# This bypasses all D/F safety checks in add_index_entry(), simulating
+# what happens when code uses ADD_CACHE_JUST_APPEND to bulk-load entries.
+use strict;
+use warnings;
+use Digest::SHA qw(sha1 sha256);
+
+my $hash_algo = $ENV{'GIT_DEFAULT_HASH'} || 'sha1';
+my $hash_func = $hash_algo eq 'sha256' ? \&sha256 : \&sha1;
+
+my @entries;
+while (my $line = <STDIN>) {
+ chomp $line;
+ my ($mode, $oid_hex, $name) = split(/ /, $line, 3);
+ push @entries, [$mode, $oid_hex, $name];
+}
+
+my $body = "DIRC" . pack("NN", 2, scalar @entries);
+
+for my $ent (@entries) {
+ my ($mode, $oid_hex, $name) = @{$ent};
+ # 10 x 32-bit stat fields (zeroed), with mode in position 7
+ my $stat = pack("N10", 0, 0, 0, 0, 0, 0, oct($mode), 0, 0, 0);
+ my $oid = pack("H*", $oid_hex);
+ my $flags = pack("n", length($name) & 0xFFF);
+ my $entry = $stat . $oid . $flags . $name . "\0";
+ # Pad to 8-byte boundary
+ while (length($entry) % 8) { $entry .= "\0"; }
+ $body .= $entry;
+}
+
+binmode STDOUT;
+print $body . $hash_func->($body);
diff --git a/t/t0093-verify-cache-df-gap.sh b/t/t0093-verify-cache-df-gap.sh
new file mode 100755
index 0000000000..0b6829d805
--- /dev/null
+++ b/t/t0093-verify-cache-df-gap.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description='verify_cache() must catch non-adjacent D/F conflicts
+
+Ensure that verify_cache() can complain about bad entries like:
+
+ docs <-- submodule
+ docs-internal/... <-- sorts here because "-" < "/"
+ docs/... <-- D/F conflict with "docs" above, not adjacent
+
+In order to test verify_cache, we directly construct a corrupt index
+(bypassing the D/F safety checks in add_index_entry) and verify that
+write-tree rejects it.
+'
+
+. ./test-lib.sh
+
+if ! test_have_prereq PERL
+then
+ skip_all='skipping verify_cache D/F tests; Perl not available'
+ test_done
+fi
+
+# Build a v2 index from entries on stdin, bypassing D/F checks.
+# Each line: "octalmode hex-oid name" (entries must be pre-sorted).
+build_corrupt_index () {
+ perl "$TEST_DIRECTORY/t0093-direct-index-write.pl" >"$1"
+}
+
+test_expect_success 'setup objects' '
+ test_commit base &&
+ BLOB=$(git rev-parse HEAD:base.t) &&
+ SUB_COMMIT=$(git rev-parse HEAD)
+'
+
+test_expect_success 'adjacent D/F conflict is caught by verify_cache' '
+ cat >index-entries <<-EOF &&
+ 0160000 $SUB_COMMIT docs
+ 0100644 $BLOB docs/requirements.txt
+ EOF
+ build_corrupt_index .git/index <index-entries &&
+
+ test_must_fail git write-tree 2>err &&
+ test_grep "You have both docs and docs/requirements.txt" err
+'
+
+test_expect_success 'non-adjacent D/F conflict is caught by verify_cache' '
+ cat >index-entries <<-EOF &&
+ 0160000 $SUB_COMMIT docs
+ 0100644 $BLOB docs-internal/README.md
+ 0100644 $BLOB docs/requirements.txt
+ EOF
+ build_corrupt_index .git/index <index-entries &&
+
+ test_must_fail git write-tree 2>err &&
+ test_grep "You have both docs and docs/requirements.txt" err
+'
+
+test_done
--
gitgitgadget
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper()
2026-04-21 0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
@ 2026-06-01 12:13 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2026-06-01 12:13 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> traverse_trees_wrapper() saves entries from a first pass through
> traverse_trees() and then replays them through the real callback
> (collect_merge_info_callback). However, the replay loop silently
> discards the callback return value. This means any error reported by
> the callback during replay -- including a future check for malformed
> trees -- would be ignored, allowing the merge to proceed with corrupt
> state.
>
> Capture the return value, stop the loop on negative (error) returns,
> and propagate the error to the caller. Note that the callback returns
> a positive mask value on success, so we normalize non-negative returns
> to 0 for the caller.
All makes perfect sense.
How would the externally visible behaviour change at this step?
Upon an error from the callback, we used to keep going and processed
other callback data in the renames structure. We now leave the rest
unprocessed.
The caller of this helper would never have seen a failure, but now
they will. Both callers, collect_merge_info_callback() and
handle_deferred_entries(), are reacting to a negative "error" return
well (perhaps because they sometimes call traverse_trees() in the
same control flow, which does return an error already), so
presumably there is no downside caused by aborting the innermost
process upon the first error return.
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> merge-ort.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 00923ce3cd..4b8e32209d 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -1008,18 +1008,20 @@ static int traverse_trees_wrapper(struct index_state *istate,
> info->traverse_path = renames->callback_data_traverse_path;
> info->fn = old_fn;
> for (i = old_offset; i < renames->callback_data_nr; ++i) {
> - info->fn(n,
> - renames->callback_data[i].mask,
> - renames->callback_data[i].dirmask,
> - renames->callback_data[i].names,
> - info);
> + ret = info->fn(n,
> + renames->callback_data[i].mask,
> + renames->callback_data[i].dirmask,
> + renames->callback_data[i].names,
> + info);
> + if (ret < 0)
> + break;
> }
>
> renames->callback_data_nr = old_offset;
> free(renames->callback_data_traverse_path);
> renames->callback_data_traverse_path = old_callback_data_traverse_path;
> info->traverse_path = NULL;
> - return 0;
> + return ret < 0 ? ret : 0;
> }
>
> static void setup_path_info(struct merge_options *opt,
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info()
2026-04-21 0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
@ 2026-06-01 12:23 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2026-06-01 12:23 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> collect_merge_info() has set info.show_all_errors = 1 since
> d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
> 2020-12-13). This setting was copied from unpack-trees.c where it
> controls batching of error messages for porcelain display, but
> merge-ort has no such error-batching logic and never needed it.
>
> With show_all_errors set, traverse_trees() captures a negative callback
> return but continues processing remaining entries rather than stopping
> immediately. Removing the setting restores the default behavior where
> a negative return from collect_merge_info_callback() breaks out of the
> traversal loop right away, allowing a future commit to exit early when
> a corrupt tree is detected.
Nice spotting. As the error handling eventually is to die without
making any further damange, returning early without seeing "more
errors" is a good change.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
> merge-ort.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 4b8e32209d..74e9636020 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -1740,7 +1740,6 @@ static int collect_merge_info(struct merge_options *opt,
> setup_traverse_info(&info, opt->priv->toplevel_dir);
> info.fn = collect_merge_info_callback;
> info.data = opt;
> - info.show_all_errors = 1;
>
> if (repo_parse_tree(opt->repo, merge_base) < 0 ||
> repo_parse_tree(opt->repo, side1) < 0 ||
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries
2026-04-21 0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
@ 2026-06-01 12:23 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2026-06-01 12:23 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Elijah Newren <newren@gmail.com>
>
> Trees with duplicate entries are malformed; fsck reports "contains
> duplicate file entries" for them. merge-ort has from the beginning
> assumed that we would never hit such trees. It was written with the
> assumption that traverse_trees() calls collect_merge_info_callback() at
> most once per path. The "sanity checks" in that callback (added in
> d2bc1994f363 (merge-ort: implement a very basic collect_merge_info(),
> 2020-12-13)) verify properties of each individual call but not that
> invariant. The strmap_put() in setup_path_info() silently overwrites
> the entry from any prior call for the same path, because it assumed
> there would be no other path. Unfortunately, supplemental data
> structures for various optimizations could still be tweaked before the
> extra paths were overwritten, and those data structures not matching
> expected state could trip various assertions.
>
> Change the return type of setup_path_info() from void to int to allow us
> to detect this case, and abort the merge with a clear error message when
> it occurs.
OK.
> @@ -1081,9 +1081,11 @@ static void setup_path_info(struct merge_options *opt,
> */
> mi->is_null = 1;
> }
> - strmap_put(&opt->priv->paths, fullpath, mi);
> + if (strmap_put(&opt->priv->paths, fullpath, mi))
> + return error(_("tree has duplicate entries for '%s'"), fullpath);
OK. I was wondering what _other_ kind of malformed trees would the
updated code by this change is prepared to handle (most notably,
tree entries must be sorted, and one way to detect duplicate is to
remember one single path that we saw earlier, which would work as
long as the entries are sorted). This "ah, we saw that path already"
approach is much more robust in that it does not have to depend on a
sorted tree.
Makes sense.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Duplicate entry hardening
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
` (4 preceding siblings ...)
2026-04-21 0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
@ 2026-06-01 12:33 ` Junio C Hamano
2026-06-01 13:54 ` Patrick Steinhardt
5 siblings, 1 reply; 14+ messages in thread
From: Junio C Hamano @ 2026-06-01 12:33 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> We had some corrupt trees with duplicate entries in real world repositories,
> which triggered an assertion failure in merge-ort. Further, the corrupt tree
> creation in the third party tool would have been avoided had verify_cache()
> correctly checked for D/F conflicts. Provide fixes for both issues,
> including 3 preparatory changes for the merge-ort fix.
>
> Elijah Newren (5):
> merge-ort: propagate callback errors from traverse_trees_wrapper()
> merge-ort: drop unnecessary show_all_errors from collect_merge_info()
> merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
> merge-ort: abort merge when trees have duplicate entries
> cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
This is a fix to an important corner of our system, but somehow left
in "Needs review" state for much longer than I would have liked, so
even though I am officially on vacation ;-), I took some time to
read these through (by the way it was a pleasant read, thank you).
I wonder if we create a rule like
Those of you who have more than 30 commits in our project are
expected to review one topic (or more) from other contributors
for every three patches you send and ask for reviews by others.
it would help balance the patch vs review ratio, perhaps?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
2026-04-21 0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
@ 2026-06-01 12:33 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2026-06-01 12:33 UTC (permalink / raw)
To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> I could not find any caller in current git that both allows the index to
> get into this state and then tries to write it out without doing other
> checks beyond the verify_cache() call in cache_tree_update(), but
> verify_cache() is documented as a safety net for preventing corrupt
> trees and should actually provide that guarantee.
Oh, absolutely. This kind of tightening is very much appreciated.
> diff --git a/cache-tree.c b/cache-tree.c
> index 7881b42aa2..f11844fe72 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -192,22 +192,62 @@ static int verify_cache(struct index_state *istate, int flags)
> for (i = 0; i + 1 < istate->cache_nr; i++) {
> /* path/file always comes after path because of the way
> * the cache is sorted. Also path can appear only once,
> - * which means conflicting one would immediately follow.
> + * so path/file is likely the immediately following path
> + * but might be separated if there is e.g. a
> + * path-internal/... file.
> */
> const struct cache_entry *this_ce = istate->cache[i];
> const struct cache_entry *next_ce = istate->cache[i + 1];
> const char *this_name = this_ce->name;
> const char *next_name = next_ce->name;
> int this_len = ce_namelen(this_ce);
> + const char *conflict_name = NULL;
> +
> if (this_len < ce_namelen(next_ce) &&
> - next_name[this_len] == '/' &&
> + next_name[this_len] <= '/' &&
> strncmp(this_name, next_name, this_len) == 0) {
> + if (next_name[this_len] == '/') {
> + conflict_name = next_name;
> + } else if (next_name[this_len] < '/') {
> + /*
> + * The immediately next entry shares our
> + * prefix but sorts before "path/" (e.g.,
> + * "path-internal" between "path" and
> + * "path/file", since '-' (0x2D) < '/'
> + * (0x2F)). Binary search to find where
> + * "path/" would be and check for a D/F
> + * conflict there.
> + */
> + struct cache_entry *other;
> + struct strbuf probe = STRBUF_INIT;
> + int pos;
> +
> + strbuf_add(&probe, this_name, this_len);
> + strbuf_addch(&probe, '/');
> + pos = index_name_pos_sparse(istate,
> + probe.buf,
> + probe.len);
> + strbuf_release(&probe);
> +
> + if (pos < 0)
> + pos = -pos - 1;
> + if (pos >= (int)istate->cache_nr)
> + continue;
> + other = istate->cache[pos];
> + if (ce_namelen(other) > this_len &&
> + other->name[this_len] == '/' &&
> + !strncmp(this_name, other->name, this_len))
> + conflict_name = other->name;
> + }
> + }
The narrow and tall comment block is a sign that this loop is
getting too deeply nested. I wonder if it makes it easier to follow
if we extract this new logic into a small helper function on its
own?
What the code checks and how it does so both make sense to me, though.
Thanks.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] Duplicate entry hardening
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
@ 2026-06-01 13:54 ` Patrick Steinhardt
2026-06-12 13:29 ` Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening) Christian Couder
0 siblings, 1 reply; 14+ messages in thread
From: Patrick Steinhardt @ 2026-06-01 13:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
On Mon, Jun 01, 2026 at 09:33:10PM +0900, Junio C Hamano wrote:
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > We had some corrupt trees with duplicate entries in real world repositories,
> > which triggered an assertion failure in merge-ort. Further, the corrupt tree
> > creation in the third party tool would have been avoided had verify_cache()
> > correctly checked for D/F conflicts. Provide fixes for both issues,
> > including 3 preparatory changes for the merge-ort fix.
> >
> > Elijah Newren (5):
> > merge-ort: propagate callback errors from traverse_trees_wrapper()
> > merge-ort: drop unnecessary show_all_errors from collect_merge_info()
> > merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
> > merge-ort: abort merge when trees have duplicate entries
> > cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
>
> This is a fix to an important corner of our system, but somehow left
> in "Needs review" state for much longer than I would have liked, so
> even though I am officially on vacation ;-), I took some time to
> read these through (by the way it was a pleasant read, thank you).
Honestly, I always shy away from the merge-related subsystems. It has a
lot of subtleties that I don't have any experience with, so I never
really consider my input to be helpful here.
> I wonder if we create a rule like
>
> Those of you who have more than 30 commits in our project are
> expected to review one topic (or more) from other contributors
> for every three patches you send and ask for reviews by others.
Heh, that would make me condense patch series into fewer patches ;)
> it would help balance the patch vs review ratio, perhaps?
It's a good question. I typically try to aim for reviewing series on the
mailing list at least every second day, and I always encourage other
folks in my team to do the same. But recently I (well, rather we)
haven't really been able to due to the current situation at GitLab,
which forces us to put almost all of our focus towards a different
project for a while.
Overall I agree that everyone who is a core contributor should also make
reviews part of their regular worflow. At least for corporate
contributors that might also make it easier to communicate this to their
respective employers. Regardless of that, my expectation is that there
will be times where it works well, and other times where it works less
well.
Patrick
^ permalink raw reply [flat|nested] 14+ messages in thread
* Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening)
2026-06-01 13:54 ` Patrick Steinhardt
@ 2026-06-12 13:29 ` Christian Couder
2026-06-12 19:32 ` Junio C Hamano
0 siblings, 1 reply; 14+ messages in thread
From: Christian Couder @ 2026-06-12 13:29 UTC (permalink / raw)
To: Patrick Steinhardt
Cc: Junio C Hamano, Elijah Newren via GitGitGadget, git,
Elijah Newren, Konstantin Ryabitsev, Taylor Blau
On Tue, Jun 2, 2026 at 8:16 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Mon, Jun 01, 2026 at 09:33:10PM +0900, Junio C Hamano wrote:
> > This is a fix to an important corner of our system, but somehow left
> > in "Needs review" state for much longer than I would have liked, so
> > even though I am officially on vacation ;-), I took some time to
> > read these through (by the way it was a pleasant read, thank you).
>
> Honestly, I always shy away from the merge-related subsystems. It has a
> lot of subtleties that I don't have any experience with, so I never
> really consider my input to be helpful here.
>
> > I wonder if we create a rule like
> >
> > Those of you who have more than 30 commits in our project are
> > expected to review one topic (or more) from other contributors
> > for every three patches you send and ask for reviews by others.
>
> Heh, that would make me condense patch series into fewer patches ;)
>
> > it would help balance the patch vs review ratio, perhaps?
>
> It's a good question. I typically try to aim for reviewing series on the
> mailing list at least every second day, and I always encourage other
> folks in my team to do the same. But recently I (well, rather we)
> haven't really been able to due to the current situation at GitLab,
> which forces us to put almost all of our focus towards a different
> project for a while.
>
> Overall I agree that everyone who is a core contributor should also make
> reviews part of their regular worflow. At least for corporate
> contributors that might also make it easier to communicate this to their
> respective employers. Regardless of that, my expectation is that there
> will be times where it works well, and other times where it works less
> well.
Sashiko (https://github.com/sashiko-dev/sashiko) is used these days by
Linux kernel developers and seems to work well for them.
At GitLab and probably in other companies, some of us also use AI to
review our work before sending it to the mailing list. And yeah, it
helps find issues before our patches reach the mailing list.
In the same way as we require that patches must pass CI, do we want to
require that patches "pass" an AI review before they get accepted?
The benefit would be that it would hopefully catch a lot of trivial
things like indentation, typos/grammos, etc, and a lot of things a bit
more difficult to spot like memory issues. Perhaps with some amount of
prompting/configuration (for example pointing it at our
CodingGuidelines and SubmittingPatches) it could also catch issues
like style issues, commits that do too many things, refactoring
opportunities, etc.
We would likely still require at least one human review (by someone
who is not the maintainer) to validate architectural decisions, to
make sure it goes in the same direction as other efforts, and perhaps
also to make sure that AI suggestions were properly handled by the
patch author.
If we decide to require it, then there are a lot of questions that we
will have to answer.
Do we want to have our own system somehow managed by us or would we be
happy to use existing systems already in place in some companies as
long as we can still tweak them in some ways, like the current CI
systems we use?
If we use existing systems likely at GitLab and GitHub, it might be
more difficult to get coherent results as they might use different
LLMs, but maybe it could help tighten our docs to make sure everyone
is aligned, and we could get better reviews by using multiple systems
because an LLM might find an issue that the other LLM missed.
Do we want an AI review right after a patch is posted or only if there
is no human review in the next X days?
Also what if the AI makes a long concrete suggestion to improve on the
patches? Could that be incompatible with our AI policy to apply it?
Should we try to prevent the AI from making such a suggestion in the
first place?
I haven't looked at how Sashiko is used for the kernel, but maybe
there will need to be some kinds of restrictions/authentications to
avoid potential abuse.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening)
2026-06-12 13:29 ` Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening) Christian Couder
@ 2026-06-12 19:32 ` Junio C Hamano
0 siblings, 0 replies; 14+ messages in thread
From: Junio C Hamano @ 2026-06-12 19:32 UTC (permalink / raw)
To: Christian Couder
Cc: Patrick Steinhardt, Elijah Newren via GitGitGadget, git,
Elijah Newren, Konstantin Ryabitsev, Taylor Blau
Christian Couder <christian.couder@gmail.com> writes:
> On Tue, Jun 2, 2026 at 8:16 AM Patrick Steinhardt <ps@pks.im> wrote:
>>
>> Overall I agree that everyone who is a core contributor should also make
>> reviews part of their regular worflow. At least for corporate
>> contributors that might also make it easier to communicate this to their
>> respective employers. Regardless of that, my expectation is that there
>> will be times where it works well, and other times where it works less
>> well.
>
> Sashiko (https://github.com/sashiko-dev/sashiko) is used these days by
> Linux kernel developers and seems to work well for them.
>
> At GitLab and probably in other companies, some of us also use AI to
> review our work before sending it to the mailing list. And yeah, it
> helps find issues before our patches reach the mailing list.
>
> In the same way as we require that patches must pass CI, do we want to
> require that patches "pass" an AI review before they get accepted?
I do not think so. You (figuratively, not limited to Christian
Couder) are welcome to use whatever tool available to you to help
you polish your submission, and the higher quality your patches are
(e.g., fewer typos and jumps in logic flow that interferes the
thought process of human reviewers), the more helpful you are being
to the community. The use of GitHub PR initiated CI run falls into
the same category, I think, in that we do not require you to have an
account and trigger the CI there, but you are doing a good service
if you made sure you caught breakages on macOS you do not have
access to otherwise before sending your patches to the list.
But I do not think we should require you to bring your own token
budget to be able to contribute.
> The benefit would be that it would hopefully catch a lot of trivial
> things like indentation, typos/grammos, etc, and a lot of things a bit
> more difficult to spot like memory issues. Perhaps with some amount of
> prompting/configuration (for example pointing it at our
> CodingGuidelines and SubmittingPatches) it could also catch issues
> like style issues, commits that do too many things, refactoring
> opportunities, etc.
Yes.
Similarly, you are welcome to use tools including AI tools to help
you review others' patches, or help sanity check your reviews of
others' patches before you send them out. The reason why such an
effort is valuable to the community is the same.
But I personally consider that the use of the tools (not limited to
AI tools) is up to each developer. What counts a lot more is the
quality of the output. Just like PR driven CI at GitHub is offered
to everybody who wants to participate and is willing to have an
account there, it may help those aspiring developers if automated
review services are made easily available, but it is a different
story to _require_ use of such service.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-06-12 19:32 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-01 12:13 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-04-21 0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-01 12:23 ` Junio C Hamano
2026-04-21 0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-01 12:33 ` Junio C Hamano
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
2026-06-01 13:54 ` Patrick Steinhardt
2026-06-12 13:29 ` Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening) Christian Couder
2026-06-12 19:32 ` Junio C Hamano
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.