From: Elijah Newren <newren@gmail.com>
To: Philip Oakley <philipoakley@iee.email>
Cc: Jeremy Pridmore <jpridmore@rdt.co.uk>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	 Paul Baumgartner <pbaumgartner@rdt.co.uk>
Subject: Re: Git Rename Detection Bug
Date: Sat, 11 Nov 2023 07:13:28 -0800
Message-ID: <CABPp-BEtva2WTGQG3Qs4EbZLK_RJC9vuA-2OYxkTPExgowwvqQ@mail.gmail.com>
In-Reply-To: <9baca4af-a570-4b7a-a1ee-de91b809e79c@iee.email>

Hi,

On Sat, Nov 11, 2023 at 3:08 AM Philip Oakley <philipoakley@iee.email> wrote:
>
> Hi all,
>
> On 11/11/2023 05:46, Elijah Newren wrote:
> > The fact that you were trying to "undo" renames and "redo the correct
> > ones" suggested there's something you still didn't understand about
> > rename detection, though.
>
>
> Could I suggest that we are missing a piece of terminology, to wit,
> BLOBSAME. It's a compatriot to TREESAME, as used in `git log` for
> history simplification (based on a tree's pathspec, most commonly a
> commit's top level path).

We could add it, but I'm not sure how it helps.  We already have 'exact
rename', which seems to fit the bill as well, and 'blob' is a term that
someone new to Git is unlikely to know.

Perhaps it's useful in some other context, though?

> File rename, at its most basic, is when the blob associated with that
> changed path is identical, i.e. BLOBSAME. There is no need to 'record'
> the action of renaming, moving or whatever, the content sameness is
> right there, in plain sight, as an identical blob name.   After that
> (files with slight variations) it is a load of heuristics, but starting
> with BLOBSAME we see how easy the basic rename detection is, and why
> renames (and de-dup) don't need recording.

This is incorrect.  Let's say you have a file foo:
   * base version: foo has hash A
   * our version: foo has been renamed to bar, but bar still has hash A
   * their version: foo has been modified; it now has hash B

The foo->bar is an exact rename (or they are BLOBSAME if you prefer),
but the renaming/moving/whatever is a critical piece of information
because the changes to foo in 'their' version need to be applied to
bar to get the correct end results.
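
As a rough illustration of the scenario above, here is a toy model in
Python.  It is only a sketch: trees are plain dicts mapping path to
blob hash, and the names exact_renames and merge_with_exact_rename are
made up for this example.  It is not how Git's merge machinery is
implemented.

```python
def exact_renames(base, side):
    """Map each path deleted on `side` to the added paths with the same blob hash."""
    deleted = {p: h for p, h in base.items() if p not in side}
    added = {p: h for p, h in side.items() if p not in base}
    return {
        old: sorted(new for new, h in added.items() if h == old_hash)
        for old, old_hash in deleted.items()
    }


def merge_with_exact_rename(base, ours, theirs):
    """Toy three-way merge: carry their modifications across our exact renames."""
    renames = exact_renames(base, ours)
    result = dict(ours)
    for path, base_hash in base.items():
        their_hash = theirs.get(path)
        if their_hash is None or their_hash == base_hash:
            continue  # 'theirs' did not modify this path
        if ours.get(path) == base_hash:
            result[path] = their_hash  # unchanged on our side: take theirs
            continue
        targets = renames.get(path, [])
        if len(targets) == 1:
            # Unambiguous exact rename: their change belongs at the new path.
            result[targets[0]] = their_hash
    return result


base = {"foo": "A"}
ours = {"bar": "A"}    # foo renamed to bar, content unchanged
theirs = {"foo": "B"}  # foo modified in place
merged = merge_with_exact_rename(base, ours, theirs)
print(merged)  # {'bar': 'B'}
```

Without the foo->bar pairing, the toy merge would leave bar at hash A
and lose their change, which is why the rename itself matters even
though the blobs are identical.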

I do not know if in Jeremy's case foo has been modified on the
unrenamed side.  But the following hypothetical is exactly the type of
problem Jeremy is hitting: what should happen when 'our' version has
both a new 'bar' and a new 'baz' file that each have hash A?  In that
case, to which one was foo renamed?  It's inherently ambiguous.
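
The ambiguity can be seen in the same kind of toy model (again only a
sketch: dicts of path to blob hash, and rename_candidates is a
hypothetical helper, not a Git function):

```python
def rename_candidates(base, side, old_path):
    """Return the paths added on `side` whose blob hash equals base[old_path]."""
    old_hash = base[old_path]
    return sorted(p for p, h in side.items() if p not in base and h == old_hash)


base = {"foo": "A"}
ours = {"bar": "A", "baz": "A"}  # foo is gone; two new files share its hash

candidates = rename_candidates(base, ours, "foo")
print(candidates)  # ['bar', 'baz']
```

Both bar and baz are equally good exact matches for foo, so hash
equality alone cannot say which one is the rename target.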

> The heuristics of 'rename with small change' is trickier, but for a
> basic understanding, starting at BLOBSAME (and TREESAME for directory
> renames) should make it easier to grasp the concepts.

Interesting; TREESAME isn't currently used within directory rename
detection.  It is only used when two (or three) trees with the same
name are TREESAME, in order to potentially avoid recursing into the
tree.  But even then, having two trees with the same name be TREESAME
isn't enough on its own to avoid recursing into that tree, because the
other side could have added files within the same-named tree, and we
need to know about those added files because they could be part of
renames involving other files outside that tree.  There would probably
be similar challenges in applying the concept of TREESAME to directory
rename detection between two trees with different names, but it's at
least an interesting idea.  Hmm....
