git rename/moved status unreliable in ruby

Git development

* git rename/moved status unreliable in ruby
@ 2026-05-01  5:05 sebastien.stettler
  2026-05-01 15:30 ` Phillip Wood
  2026-05-02  8:06 ` Chris Torek
  0 siblings, 2 replies; 9+ messages in thread
From: sebastien.stettler @ 2026-05-01  5:05 UTC (permalink / raw)
  To: git@vger.kernel.org

1. What did you do before the bug happened? (Steps to reproduce your issue)

When moving Ruby classes between namespaces, the files are marked as new files and the old
ones are marked as deleted.

If I only change the class name, the file is marked as renamed.

2. What did you expect to happen? (Expected behavior)

In the namespace case I would have expected the file to be marked as moved, since
nothing has fundamentally changed.

3. What happened instead? (Actual behavior)

The file was marked as a new file and the old file was marked as deleted.

What's different between what you expected and what actually happened

4. Anything else you want to add:

I have demonstrated the behavior here https://github.com/billybonks/git-rename

Mostly I would like to understand what the expectation is from Git's point of view for these mutations.
If this is considered something that can be improved I am happy to build out more test cases and help with implementation.

If not, understanding the reasoning would be great.

Thank you.



[System Info]
git version:
git version 2.47.1
cpu: arm64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
libcurl: 8.7.1
zlib: 1.2.12
uname: Darwin 25.3.0 Darwin Kernel Version 25.3.0: Wed Jan 28 20:51:28 PST 2026; root:xnu-12377.91.3~2/RELEASE_ARM64_T6041 arm64
compiler info: clang: 16.0.0 (clang-1600.0.26.4)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/zsh


[Enabled Hooks]




* Re: git rename/moved status unreliable in ruby
  2026-05-01  5:05 git rename/moved status unreliable in ruby sebastien.stettler
@ 2026-05-01 15:30 ` Phillip Wood
  2026-05-02  7:25   ` Johannes Sixt
  2026-05-03 21:59   ` Junio C Hamano
  2026-05-02  8:06 ` Chris Torek
  1 sibling, 2 replies; 9+ messages in thread
From: Phillip Wood @ 2026-05-01 15:30 UTC (permalink / raw)
  To: sebastien.stettler, git@vger.kernel.org

Hi Sebastien

On 01/05/2026 06:05, sebastien.stettler wrote:
> 1. What did you do before the bug happened? (Steps to reproduce your issue)
> 
> when moving ruby classes between namespaces they are marked as new files and the old
> ones are marked as deleted
> 
> if i only change the class name it will mark it as renamed

Rename detection is based on how similar the two files are. Looking at 
the example you linked to below you're changing a file that looks like

module Math
   class Calculator
     def add(a, b)
       a + b
       ...
     end
   end
end

to

module Math
   module Calculators
     class Calculator
       def add(a, b)
         a + b
         ...
       end
     end
   end
end

Which means that git sees that every line has changed because the 
indentation has changed. If you want git to realize that the file has 
been renamed you could move it in one commit and then modify it in 
the next commit.
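
A rough way to see the effect of re-indentation is to score two versions by the bytes of lines that survive verbatim. The sketch below is a toy model in Python, not Git's actual similarity algorithm; `line_similarity` and `nest_in_module` are hypothetical helpers, and the Ruby source is a shortened version of the example above.

```python
from collections import Counter


def line_similarity(old: str, new: str, ignore_indent: bool = False) -> float:
    """Toy score: bytes of old lines found verbatim in new, over the larger size."""
    def prepare(text):
        lines = text.splitlines()
        return [line.lstrip(" \t") for line in lines] if ignore_indent else lines

    old_lines, new_lines = prepare(old), prepare(new)
    available = Counter(new_lines)
    matched = 0
    for line in old_lines:
        if available[line] > 0:
            available[line] -= 1
            matched += len(line) + 1  # +1 accounts for the newline
    old_size = sum(len(line) + 1 for line in old_lines)
    new_size = sum(len(line) + 1 for line in new_lines)
    return matched / max(old_size, new_size, 1)


def nest_in_module(source: str, name: str) -> str:
    """Wrap everything inside the outer module in a new, indented module."""
    lines = source.splitlines()
    inner = ["  " + line if line else line for line in lines[1:-1]]
    wrapped = [lines[0], "  module " + name, *inner, "  end", lines[-1]]
    return "\n".join(wrapped) + "\n"


OLD = """\
module Math
  class Calculator
    def add(a, b)
      a + b
    end
  end
end
"""
NEW = nest_in_module(OLD, "Calculators")

raw_score = line_similarity(OLD, NEW)
indent_blind_score = line_similarity(OLD, NEW, ignore_indent=True)
print(f"verbatim lines:       {raw_score:.2f}")
print(f"ignoring indentation: {indent_blind_score:.2f}")
```

In this toy model the verbatim score lands below one half while the indentation-blind score stays above it, which mirrors why the re-indented file is reported as a delete plus an add.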

Thanks

Phillip




* Re: git rename/moved status unreliable in ruby
  2026-05-01 15:30 ` Phillip Wood
@ 2026-05-02  7:25   ` Johannes Sixt
  2026-05-03 21:59   ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Sixt @ 2026-05-02  7:25 UTC (permalink / raw)
  To: phillip.wood; +Cc: Git Mailing List, sebastien.stettler

On 01.05.26 at 17:30, Phillip Wood wrote:
> Rename detection is based on how similar the two files are. Looking at
> the example you linked to below you're changing a file that looks like
> 
> ...
> 
> Which means that git sees that every line has changed because the
> indentation has changed.

That's correct, of course, but...

> If you want git to realize that the file has
> been renamed you could move it in one commit and then modify it in
> the next commit.

... this is a fallacy. Splitting into two commits helps only certain
cases, in particular, when the commit that moves the files is compared
to an earlier commit, such as `git log` does. However, if a commit after
the change is compared to a commit before the move, the rename is still
not detected.
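
The difference between a step-wise walk and an endpoint comparison can be sketched with a toy history. This is only an illustration under simplifying assumptions: snapshots are plain dicts of path to content, `follow` is a hypothetical helper, and it follows exact-content renames only (real Git also considers similar content).

```python
def follow(history, path):
    """Trace `path` from the newest snapshot back to the oldest.

    `history` is a list of snapshots (dicts of path -> content), oldest first.
    Only exact-content renames are followed in this toy version.
    Returns {snapshot index: name the file had there}.
    """
    names = {len(history) - 1: path}
    for i in range(len(history) - 1, 0, -1):
        child, parent = history[i], history[i - 1]
        if path not in parent:
            # The file first appears in `child`; look for a file that
            # vanished from `parent` with exactly the same content.
            sources = [p for p in sorted(parent)
                       if p not in child and parent[p] == child[path]]
            if not sources:
                break  # the trail goes cold: it looks like a brand-new file
            path = sources[0]
        names[i - 1] = path
    return names


FLAT = "class Calculator\n  def add(a, b)\n    a + b\n  end\nend\n"
NESTED = "module Lib\n" + "".join("  " + l for l in FLAT.splitlines(True)) + "end\n"

# Move in one commit, re-indent in the next.
two_step = [
    {"calculator.rb": FLAT},
    {"lib/calculator.rb": FLAT},
    {"lib/calculator.rb": NESTED},
]
# The same two-step history, but looking only at its first and last snapshot.
endpoints = [two_step[0], two_step[-1]]

print(follow(two_step, "lib/calculator.rb"))
print(follow(endpoints, "lib/calculator.rb"))
```

Walking commit by commit reaches the old name through the pure-move snapshot, while the endpoint comparison never sees that intermediate state and loses the trail.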

-- Hannes



* Re: git rename/moved status unreliable in ruby
  2026-05-01 15:30 ` Phillip Wood
  2026-05-02  7:25   ` Johannes Sixt
@ 2026-05-03 21:59   ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2026-05-03 21:59 UTC (permalink / raw)
  To: Phillip Wood; +Cc: sebastien.stettler, git@vger.kernel.org

Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Sebastien
>
> On 01/05/2026 06:05, sebastien.stettler wrote:
>> 1. What did you do before the bug happened? (Steps to reproduce your issue)
>> 
>> when moving ruby classes between namespaces they are marked as new files and the old
>> ones are marked as deleted
>> 
>> if i only change the class name it will mark it as renamed
>
> Rename detection is based on how similar the two files are. Looking at 
> the example you linked to below you're changing a file that looks like
> ...
> Which means that git sees that every line has changed because the 
> indentation has changed. If you want git to realize that the file has 
> been renamed you could move it in one commit and then modify it in 
> the next commit.

I've seen this repeated many times, but it is misleading to give it
without qualifying when that "works" and when it does not.  Such a
"stick to pure rename and make huge changes elsewhere" strategy
would help your "git log -M [--follow]" and possibly "git rebase",
but it would not help all that much if you are doing "git diff" or
"git merge".


    


* Re: git rename/moved status unreliable in ruby
  2026-05-01  5:05 git rename/moved status unreliable in ruby sebastien.stettler
  2026-05-01 15:30 ` Phillip Wood
@ 2026-05-02  8:06 ` Chris Torek
  2026-05-02  9:34   ` sebastien.stettler
  2026-05-05  0:09   ` Junio C Hamano
  1 sibling, 2 replies; 9+ messages in thread
From: Chris Torek @ 2026-05-02  8:06 UTC (permalink / raw)
  To: sebastien.stettler; +Cc: git@vger.kernel.org

On Thu, Apr 30, 2026 at 10:06 PM sebastien.stettler
<sebastien.stettler@proton.me> wrote:
> ... understanding the reasoning would be great

In my opinion, the key to understanding here is this:

Git Stores Snapshots.

What this means is that every commit is a full snapshot of all of the
files for that commit. There are no "changes" at all, there is only a
full snapshot, every time.

Now, internally, the storage format is more complicated (and
compressive, ultimately using the concept of changes as well, though
not exactly the way one might expect). But from the "what things look
like" point of view, and how you should think about what Git sees,
each commit is simply a full and complete snapshot of every file. So
if you have one commit where `foo.rb` exists, and `bar.rb` does not,
that snapshot has a `foo.rb` but no `bar.rb`. If you make a second
snapshot, in which `foo.rb` no longer exists but `bar.rb` does now,
that second snapshot, well, has those files.

The tricky part is that you normally ask Git to *compare* two
snapshots (at least for "what changed" purposes). When you do that,
Git extracts both snapshots and, well, compares them. If `foo.rb` has
been removed and `bar.rb` has been added, Git then goes on to compare
the *contents* of those two files.
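
As a mental model only (this is not how Git stores objects), a snapshot can be pictured as a mapping from path to content, and a comparison as a set operation on two such mappings. `compare_snapshots` and the file contents are made up for illustration.

```python
def compare_snapshots(old, new):
    """Classify paths by comparing two full snapshots (dicts of path -> content)."""
    return {
        "deleted": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "modified": sorted(p for p in set(old) & set(new) if old[p] != new[p]),
    }


first = {"foo.rb": "puts 'hello'\n", "README": "demo\n"}
second = {"bar.rb": "puts 'hello'\n", "README": "demo\n"}

print(compare_snapshots(first, second))
```

Nothing in either snapshot records that `foo.rb` became `bar.rb`; the comparison only sees one path missing and another one present, and any "rename" has to be inferred from their contents afterwards.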

If the contents match exactly, and you've asked Git to "find renames",
Git will always say that the file that vanished from the first commit,
only to be created identically under a new name in the second, was
"renamed", rather than the one file being deleted and the second
added.

If the contents match "fuzzily" (for some value and algorithm of
fuzz-factor), Git may also say "renamed". You can control this with
`--find-renames=<value>`. The key idea here is that Git is *finding*
renames: either exact-same-contents, or "sufficiently similar"
contents, based on remove-and-add pairs.

Since Git only *stores* snapshots, you can get two different results
from comparing the same two commits. All you have to do to get this is
to adjust whether Git checks for renames at all, and if so, to what
extent.
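
A small sketch of that idea, assuming a toy similarity measure (shared lines over total lines) in place of Git's real one: the same pair of snapshots produces different reports depending on the threshold. The names `similarity`, `compare` and `find_renames` are hypothetical and only loosely echo `--find-renames=<value>`.

```python
from collections import Counter


def similarity(a: str, b: str) -> float:
    """Toy measure: twice the number of shared lines over the total line count."""
    lines_a, lines_b = Counter(a.splitlines()), Counter(b.splitlines())
    shared = sum((lines_a & lines_b).values())
    total = sum(lines_a.values()) + sum(lines_b.values())
    return 2 * shared / total if total else 1.0


def compare(old, new, find_renames=None):
    """Report deletes, adds and renames; `find_renames` is a 0..1 threshold.

    With find_renames=None no remove-and-add pairs are matched up at all.
    """
    deleted = {p for p in old if p not in new}
    added = {p for p in new if p not in old}
    report = []
    if find_renames is not None:
        for src in sorted(deleted):
            scored = [(similarity(old[src], new[dst]), dst) for dst in sorted(added)]
            if not scored:
                continue
            score, dst = max(scored)
            if score >= find_renames:
                report.append(f"R {src} -> {dst}")
                deleted.discard(src)
                added.discard(dst)
    report += [f"D {p}" for p in sorted(deleted)]
    report += [f"A {p}" for p in sorted(added)]
    return report


ORIGINAL = "class Calculator\n  def add(a, b)\n    a + b\n  end\nend\n"
EDITED = ORIGINAL.replace("a + b", "b + a")  # one of five lines differs

first = {"foo.rb": ORIGINAL}
second = {"bar.rb": EDITED}

for threshold in (None, 0.5, 0.9):
    print(threshold, compare(first, second, find_renames=threshold))
```

The two snapshots never change between the three calls; only the question asked of them does.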

These rules apply to `git show`, `git diff`, `git merge`, and even the
diffstat that `git commit` optionally shows after a commit. For this
reason, the "compare some commits" commands take this
`--find-renames=<value>` option (`git merge` accepts it as the strategy
option `-X find-renames=<value>`). Detection of
renames can be countermanded entirely with `--no-renames`.

This is why -- and when -- making two separate commits, one with
"exact same content for deleted-file-D vs added-file-A", followed by
later changes to new file A, helps: if you compare the commit that has
file D to the middle commit, the two files match exactly, and any
rename detection you have turned on finds that rename. If you then
compare the middle commit to the final commit, file A exists in both,
so Git shows changes to file A. But as soon as you compare the
original file-D-containing commit to the final file-A-updated commit,
you run into the original issue again: to detect this as a rename, you
may need to allow rather generous rename detection.
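
The three comparisons described here can be played through with a toy model. The sketch assumes snapshots are dicts of path to content and uses a simple shared-lines measure instead of Git's real similarity score; `describe` and its `threshold` parameter are hypothetical.

```python
from collections import Counter


def similarity(a: str, b: str) -> float:
    """Toy measure: twice the number of shared lines over the total line count."""
    lines_a, lines_b = Counter(a.splitlines()), Counter(b.splitlines())
    shared = sum((lines_a & lines_b).values())
    total = sum(lines_a.values()) + sum(lines_b.values())
    return 2 * shared / total if total else 1.0


def describe(old, new, threshold=0.5):
    """Summarise how `new` differs from `old`, pairing deletes with adds."""
    out = [f"M {p}" for p in sorted(set(old) & set(new)) if old[p] != new[p]]
    deleted = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    for src in deleted[:]:
        for dst in added:
            if similarity(old[src], new[dst]) >= threshold:
                out.append(f"R {src} -> {dst}")
                deleted.remove(src)
                added.remove(dst)
                break
    return out + [f"D {p}" for p in deleted] + [f"A {p}" for p in added]


FLAT = "class Calculator\n  def add(a, b)\n    a + b\n  end\nend\n"
NESTED = ("module Lib\n  class Calculator\n    def add(a, b)\n"
          "      a + b\n    end\n  end\nend\n")

original = {"calculator.rb": FLAT}        # file D
middle = {"lib/calculator.rb": FLAT}      # pure move to file A
final = {"lib/calculator.rb": NESTED}     # file A re-indented

print(describe(original, middle))
print(describe(middle, final))
print(describe(original, final))
print(describe(original, final, threshold=0.3))
```

Only the last call, with a deliberately generous threshold, reports the endpoint comparison as a rename in this model.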

If, in the future, Git gets fancier rename detection, comparing the
original commit directly against the final one could find the rename
automatically. So:

> If this is considered something that can be improved ...

It *could* be improved. Doing so in a way that works for more than
just some special cases -- e.g., in a way that works for ordinary
text, or graphical images, for instance, rather than just for Ruby
sources (or just C sources, or C++, or Swift, or Python, or whatever)
-- seems particularly tricky. Some degree of ignoring white-space
changes would probably help multiple cases, though.

Chris


* Re: git rename/moved status unreliable in ruby
  2026-05-02  8:06 ` Chris Torek
@ 2026-05-02  9:34   ` sebastien.stettler
  2026-05-04 10:00     ` Jeff King
  2026-05-05  0:09   ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: sebastien.stettler @ 2026-05-02  9:34 UTC (permalink / raw)
  To: phillip.wood@dunelm.org.uk, chris.torek@gmail.com
  Cc: git@vger.kernel.org, j6t@kdbg.org

Thanks for all the responses and thoughts thus far.

> Which means that git sees that every line has changed because the
> indentation has changed. If you want git to realize that the file has
> been renamed you could move it in one commit and then modify it in
> the next commit.

This is the minimal solution to the problem, but as Johannes illustrated,
there are cases where the "move commit" doesn't solve the problem:

> Splitting into two commits helps only certain
> cases, in particular, when the commit that moves the files is compared
> to an earlier commit, such as `git log` does. However, if a commit after
> the change is compared to a commit before the move, the rename is still
> not detected.


Further examples:

The move commit helps if I want to blame a file with the -w option, but it doesn't
help with git log, since the changes are now considered modifications.

git blame -w ruby-example/lib/calculator.rb
(shows previous files changes and not the indentation)
```
303f25f5 ruby-example/lib/calculator.rb (nothing    2026-05-02 16:40:51 +0800  1) module Lib
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  2)   class Calculator
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  3)     def add(a, b)
3d22d303 ruby-example/calculator.rb     (billybonks 2026-05-02 16:34:14 +0800  4)       a + b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  5)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  6)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  7)     def subtract(a, b)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  8)       a - b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  9)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 10)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 11)     def multiply(a, b)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 12)       a * b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 13)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 14)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 15)     def divide(a, b)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 16)       raise ZeroDivisionError, 'Cannot divide by zero' if b.zero?
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 17)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 18)       a / b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 19)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 20)   end
303f25f5 ruby-example/lib/calculator.rb (nothing    2026-05-02 16:40:51 +0800 21) end
```


git log --follow --diff-filter=ra -- ruby-example/lib/calculator.rb
( it does not show the move commit )
```
commit 303f25f50cde83c46f83bc3c337cd52b87b63d52 (HEAD -> example-2)
Author: nothing <nothing@contributed.com>
Date:   Sat May 2 16:40:51 2026 +0800

    update lib and require

commit 3d22d30304d836c07b3274689e5e2c536b29e1bb (example-3)
Author: billybonks <sebastienstettler@gmail.com>
Date:   Sat May 2 16:34:14 2026 +0800

    fix: sum was doing subtraction instead of addition
```
Using this approach does put a heavy burden on the user, since most tools that do renaming etc.
will move and rename, and that cost does not buy a complete solution, as there are still many things
that don't work ideally.


As mentioned by Chris

> It *could* be improved. Doing so in a way that works for more than
> just some special cases -- e.g., in a way that works for ordinary
> text, or graphical images, for instance, rather than just for Ruby
> sources (or just C sources, or C++, or Swift, or Python, or whatever)
> -- seems particularly tricky. Some degree of ignoring white-space
> changes would probably help multiple cases, though.


Has there been any exploration of ignoring whitespace in the similarity checker? I would
assume that the majority of whitespace changes across many languages would result in a
semantically similar document in most cases.
 
- Sebastien









* Re: git rename/moved status unreliable in ruby
  2026-05-02  9:34   ` sebastien.stettler
@ 2026-05-04 10:00     ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2026-05-04 10:00 UTC (permalink / raw)
  To: sebastien.stettler
  Cc: phillip.wood@dunelm.org.uk, chris.torek@gmail.com,
	git@vger.kernel.org, j6t@kdbg.org

On Sat, May 02, 2026 at 09:34:18AM +0000, sebastien.stettler wrote:

> Has there been explorations of ignoring white space for the similarity checker, i would 
> assume that majority of white space movements across many languages would result in a 
> semantically similar document in most cases.

I don't think anybody has ever looked into it. We do have "-w" and
friends for diffs, and it makes sense that there might be some mode to
soften renames in the same way (especially if you are doing a "-w"
diff, or a merge that ignores whitespace).

The line you need to touch is probably this:

diff --git a/diffcore-delta.c b/diffcore-delta.c
index 2b7db39983..379f6010d3 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -147,6 +147,8 @@ static struct spanhash_top *hash_chars(struct repository *r,
 		/* Ignore CR in CRLF sequence if text */
 		if (is_text && c == '\r' && sz && *buf == '\n')
 			continue;
+		if (is_text && (c == ' ' || c == '\t'))
+			continue;
 
 		accum1 = (accum1 << 7) ^ (accum2 >> 25);
 		accum2 = (accum2 << 7) ^ (old_1 >> 25);

but:

  1. The option to ignore whitespace would need to be plumbed through
     the rest of the diffcore code.

  2. This concept probably throws off some other rename heuristics.
     E.g., I think we do a rough check that the sizes of the objects are
     not too far apart before even looking at the content. So you could
     construct a pathological case where the line "a\n" was changed to
     have a million spaces, and the files would look like they couldn't
     possibly be similar, even though they are identical when ignoring
     whitespace. I think in practice you could just ignore this, as sane
     cases would tend to have a reasonable ratio of content to
     whitespace changes.
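
The pathological case in point 2 can be reproduced in a few lines. This is a sketch under stated assumptions: `drop_blanks` mimics only the effect of the two added lines in the patch (skipping spaces and tabs), and `passes_size_check` is a guessed form of the rough size comparison, not a copy of Git's code.

```python
def drop_blanks(data: bytes) -> bytes:
    """Remove spaces and tabs, the two bytes the patch above skips."""
    return data.translate(None, b" \t")


def passes_size_check(old_size: int, new_size: int, min_score: float = 0.5) -> bool:
    """Assumed shape of the size pre-check (hypothetical formula).

    Reject a pair when the size gap alone would make it impossible
    to reach `min_score`.
    """
    larger = max(old_size, new_size)
    gap = larger - min(old_size, new_size)
    return gap <= larger * (1 - min_score)


old = b"a\n"
new = b"a" + b" " * 1_000_000 + b"\n"

same_without_blanks = drop_blanks(old) == drop_blanks(new)
raw_sizes_ok = passes_size_check(len(old), len(new))
stripped_sizes_ok = passes_size_check(len(drop_blanks(old)), len(drop_blanks(new)))
print(same_without_blanks, raw_sizes_ok, stripped_sizes_ok)
```

The two blobs are identical once blanks are dropped, yet a check on raw sizes would discard the pair before any content is examined.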

-Peff


* Re: git rename/moved status unreliable in ruby
  2026-05-02  8:06 ` Chris Torek
  2026-05-02  9:34   ` sebastien.stettler
@ 2026-05-05  0:09   ` Junio C Hamano
  2026-05-05  0:46     ` Chris Torek
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2026-05-05  0:09 UTC (permalink / raw)
  To: Chris Torek; +Cc: sebastien.stettler, git@vger.kernel.org

Chris Torek <chris.torek@gmail.com> writes:

> This is why -- and when -- making two separate commits, one with
> "exact same content for deleted-file-D vs added-file-A", followed by
> later changes to new file A, helps: if you compare the commit that has

"helps" -> "sometimes helps".  It may help only when comparison is done
step-wise (e.g., "git log -M/--follow" and "git rebase"), but in
general, when only the comparison between two endpoints matters (e.g.,
"git diff" and "git merge"), such an artificial breaking of a
logically single change into two does not help.

>> If this is considered something that can be improved ...
>
> It *could* be improved. Doing so in a way that works for more than
> just some special cases -- e.g., in a way that works for ordinary
> text, or graphical images, for instance, rather than just for Ruby
> sources (or just C sources, or C++, or Swift, or Python, or whatever)
> -- seems particularly tricky. Some degree of ignoring white-space
> changes would probably help multiple cases, though.

You could tie it with the attributes system to allow logic
specialized for the nature of the contents.  The beauty of the
design decision to store "snapshots" is that these heuristics can be
improved without having to change anything in the history that is
cast in stone.
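
Purely as an illustration of what "logic specialized for the nature of the contents" might look like, here is a sketch that picks a content normalizer by path pattern before any similarity is computed. Everything in it is invented: the rule table, the function names, and the idea that patterns would be matched this way. No such attribute is described in this thread.

```python
from fnmatch import fnmatch


def keep_as_is(text: str) -> str:
    """Default: compare contents unchanged."""
    return text


def drop_indentation(text: str) -> str:
    """Strip leading blanks from every line before comparing."""
    return "\n".join(line.lstrip(" \t") for line in text.split("\n"))


# Hypothetical rule table, loosely modelled on path patterns in an
# attributes file; the first matching pattern wins.
RULES = [
    ("*.rb", drop_indentation),
    ("*", keep_as_is),
]


def normalizer_for(path: str):
    """Return the normalizer registered for the first pattern matching `path`."""
    for pattern, func in RULES:
        if fnmatch(path, pattern):
            return func
    return keep_as_is


flat = "class Calculator\n  def add(a, b)\n  end\nend\n"
nested = "  class Calculator\n    def add(a, b)\n    end\n  end\n"

normalize = normalizer_for("lib/calculator.rb")
print(normalize(flat) == normalize(nested))
```

Because the stored snapshots are untouched, a rule like this could be added or changed later and would apply to old history as well.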


* Re: git rename/moved status unreliable in ruby
  2026-05-05  0:09   ` Junio C Hamano
@ 2026-05-05  0:46     ` Chris Torek
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Torek @ 2026-05-05  0:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: sebastien.stettler, git@vger.kernel.org

On Mon, May 4, 2026 at 5:09 PM Junio C Hamano <gitster@pobox.com> wrote:
> Chris Torek <chris.torek@gmail.com> writes:
>
> > This is why -- and when -- making two separate commits ... helps
>
> "helps" -> "sometimes helps".  Only when comparison is done step-wise
> (e.g., "git log -M/--follow" and "git rebase"), it may help,

That's why I said "and when". :-)

> > ... Some degree of ignoring white-space
> > changes would probably help multiple cases, though.
>
> You could tie it with the attributes system to allow logic
> specialized for the nature of the contents.  The beauty of the
> design decision to store "snapshots" is that these heuristics can be
> improved without having to change anything in the history that are
> cast in stone.

Indeed. Something like Peff's suggestion might work, although I
see some danger in ignoring white space completely. It would
probably be better to compress "all leading but non-empty white
space" to either nothing or a single space, eliminate all trailing
white space, and compress other white space to a single blank.
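
The three rules in this paragraph can be written down as a per-line normalizer. This is a sketch of one possible reading, not proposed Git code; `normalize_line` and its `leading` parameter (the "nothing or a single space" choice) are hypothetical.

```python
import re


def normalize_line(line: str, leading: str = " ") -> str:
    """Normalize white space in one line.

    - non-empty leading white space becomes `leading` ("" or " ")
    - trailing white space (including the line ending) is removed
    - every other run of blanks becomes a single space
    """
    body = line.rstrip(" \t\r\n")
    stripped = body.lstrip(" \t")
    prefix = leading if len(stripped) != len(body) else ""
    return prefix + re.sub(r"[ \t]+", " ", stripped)


print(repr(normalize_line("      a  +\tb   \n")))
print(repr(normalize_line("class Calculator")))
print(repr(normalize_line("  class Calculator", leading="")))
```

With `leading=" "` an indented line still differs from an unindented one, so moving a class into a new namespace (indent going from zero to two spaces) would only match when leading white space is compressed to nothing; deeper lines match under either setting.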

(Though at the same time, when we're dealing with slugs extracted
from very long single lines, this is probably wrong, so perhaps this
should only be done for "intact single line" slugs. Then again it
might not matter at this point.)

Doing this on binary files and programs written in Whitespace[1]
would be wrong, of course. ;-)

Chris

[1]: https://en.wikipedia.org/wiki/Whitespace_(programming_language)
