Git development
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Patrick Steinhardt <ps@pks.im>,
	 Christian Couder <christian.couder@gmail.com>,
	 Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH v2 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
Date: Sun, 14 Jun 2026 08:22:38 -0700	[thread overview]
Message-ID: <xmqq5x3ldu4h.fsf@gitster.g> (raw)
In-Reply-To: <cf50f1aabcf84aa755808318756c233305cc008d.1781419047.git.gitgitgadget@gmail.com> (Elijah Newren via GitGitGadget's message of "Sun, 14 Jun 2026 06:37:26 +0000")

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/cache-tree.c b/cache-tree.c
> index 7881b42aa2..4d2669b312 100644
> --- a/cache-tree.c
> +++ b/cache-tree.c
> @@ -161,6 +161,54 @@ void cache_tree_invalidate_path(struct index_state *istate, const char *path)
>  		istate->cache_changed |= CACHE_TREE_CHANGED;
>  }
>  
> +/*
> + * Check whether this_ce and the next entry in the index form a D/F
> + * conflict ("path" vs "path/file").  Returns the conflicting "path/..."
> + * name when one is found, or NULL otherwise.
> + *
> + * The cache is sorted, so "path/file" sorts after "path" and the
> + * conflict is usually visible as adjacent entries.  But other entries
> + * can sort between them -- e.g. "path-internal" sits between "path"
> + * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when
> + * the immediately following entry shares our prefix but starts with a
> + * character that sorts before '/', binary search for "path/" instead.
> + */
> +static const char *find_df_conflict(struct index_state *istate,
> +				    const struct cache_entry *this_ce,
> +				    const struct cache_entry *next_ce)
> +{
> +	const char *this_name = this_ce->name;
> +	const char *next_name = next_ce->name;
> +	int this_len = ce_namelen(this_ce);
> +	const struct cache_entry *other;
> +	struct strbuf probe = STRBUF_INIT;
> +	int pos;

> +	if (this_len >= ce_namelen(next_ce) ||
> +	    next_name[this_len] > '/' ||
> +	    strncmp(this_name, next_name, this_len))
> +		return NULL;

First we reject an abvious "cannot conflict" case. If the current
one is not strictly shorter than the next one, it cannot be an
overlapping with one of the leading directories in the next one and
the next one cannot be "hiding" an overlapping one in the "docs,
docs-i, docs/r" relationship described in the log message (where
next=="docs-i' hides the fact that this=="docs" conflicts with
"docs/r").  Of course, the current one cannot be such a confliciting
entry, if an earlier part of the next name does not entirely match
the current name.  In addition, if the byte after the matching
leading part in the next name is larger than '/', any entry that
comes later than the next name cannot have '/' at that byte position.

Makes sense.

> +	if (next_name[this_len] == '/')
> +		return next_name;

And if we see '/' there, we definitely have conflict right there.

So these trivial cases after us, we need to see if an entry that
comes after next has a name whose leading part is identical to this
name, plus '/'.  How would we compute it?

> +	strbuf_add(&probe, this_name, this_len);
> +	strbuf_addch(&probe, '/');
> +	pos = index_name_pos_sparse(istate, probe.buf, probe.len);
> +	strbuf_release(&probe);

We see where the first of such a name (i.e. this name followed by '/')
may appear.  Normal cache entries in the index would never have a
trailing slash in their names, so I expect that this would either
find an directory that is sparsed out and returns a non-negative
index into the .cache[] array, or a negative index that indicates
where a name like this + '/' + anything would appear.

> +
> +	if (pos < 0)
> +		pos = -pos - 1;

So in order to check both cases, we do the usual index flipping.  We
do not need to to treat "it exists---that's sparse dir" case and "it
is expanded and we are trying to find the first entry in that index"
case differently below, which simplifies things a bit.

> +	if (pos >= (int)istate->cache_nr)
> +		return NULL;

OK, such an entry whose name is this followed by '/' does not exist
so we are safe.

> +	other = istate->cache[pos];
> +	if (ce_namelen(other) > this_len &&
> +	    other->name[this_len] == '/' &&
> +	    !strncmp(this_name, other->name, this_len))
> +		return other->name;
> +	return NULL;

This looks very similar to the inverse of the earlier one but
slightly different.  It might be easier to spot where it differs, if
we flipped the polarity, like so, perhaps?

	other = istate->cache[pos];
	if (this_len >= ce_namelen(other) ||
	    other->name[this_len] != '/' ||
	    strncmp(this_name, other->name, this_len))
		return NULL;

	return other->name;

Or perhaps not.  I do not care very strongly, but I only spotted the
subtle difference while attempting to flip the polarity in my head,
so it might help ther readers the same way.  I dunno.

Thanks, looking very good.


      reply	other threads:[~2026-06-14 15:22 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21  0:26 [PATCH 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-04-21  0:26 ` [PATCH 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-01 12:13   ` Junio C Hamano
2026-06-14  3:16     ` Elijah Newren
2026-04-21  0:26 ` [PATCH 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-01 12:23   ` Junio C Hamano
2026-04-21  0:26 ` [PATCH 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-04-21  0:26 ` [PATCH 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-01 12:23   ` Junio C Hamano
2026-04-21  0:26 ` [PATCH 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-01 12:33   ` Junio C Hamano
2026-06-14  3:16     ` Elijah Newren
2026-06-01 12:33 ` [PATCH 0/5] Duplicate entry hardening Junio C Hamano
2026-06-01 13:54   ` Patrick Steinhardt
2026-06-12 13:29     ` Automated reviews by AI (was Re: [PATCH 0/5] Duplicate entry hardening) Christian Couder
2026-06-12 19:32       ` Junio C Hamano
2026-06-14  6:37 ` [PATCH v2 0/5] Duplicate entry hardening Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 1/5] merge-ort: propagate callback errors from traverse_trees_wrapper() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 2/5] merge-ort: drop unnecessary show_all_errors from collect_merge_info() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 3/5] merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 4/5] merge-ort: abort merge when trees have duplicate entries Elijah Newren via GitGitGadget
2026-06-14  6:37   ` [PATCH v2 5/5] cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts Elijah Newren via GitGitGadget
2026-06-14 15:22     ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq5x3ldu4h.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox