* git rename/moved status unreliable in ruby
From: sebastien.stettler @ 2026-05-01 5:05 UTC
  To: git@vger.kernel.org

1. What did you do before the bug happened? (Steps to reproduce your issue)

When moving Ruby classes between namespaces, they are marked as new files
and the old ones are marked as deleted.

If I only change the class name, it will be marked as renamed.

2. What did you expect to happen? (Expected behavior)

In the namespace case, I would have expected it to be marked as moved,
since nothing has fundamentally changed.

3. What happened instead? (Actual behavior)

The file was marked as a new file and the old file was marked as deleted.

What's different between what you expected and what actually happened

4. Anything else you want to add:

I have demonstrated the behavior here: https://github.com/billybonks/git-rename

Mostly, I would like to understand what Git's expectation is for these
mutations. If this is considered something that can be improved, I am happy
to build out more test cases and help with the implementation.

If not, understanding the reasoning would be great.

Thank you.


[System Info]
git version:
git version 2.47.1
cpu: arm64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
libcurl: 8.7.1
zlib: 1.2.12
uname: Darwin 25.3.0 Darwin Kernel Version 25.3.0: Wed Jan 28 20:51:28 PST 2026; root:xnu-12377.91.3~2/RELEASE_ARM64_T6041 arm64
compiler info: clang: 16.0.0 (clang-1600.0.26.4)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/zsh


[Enabled Hooks]

* Re: git rename/moved status unreliable in ruby
From: Phillip Wood @ 2026-05-01 15:30 UTC
  To: sebastien.stettler, git@vger.kernel.org

Hi Sebastien

On 01/05/2026 06:05, sebastien.stettler wrote:
> 1. What did you do before the bug happened? (Steps to reproduce your issue)
>
> When moving Ruby classes between namespaces, they are marked as new files
> and the old ones are marked as deleted.
>
> If I only change the class name, it will be marked as renamed.

Rename detection is based on how similar the two files are. Looking at
the example you linked to, you're changing a file that looks like

module Math
  class Calculator
    def add(a, b)
      a + b
    ...
    end
  end
end

to

module Math
  module Calculators
    class Calculator
      def add(a, b)
        a + b
      ...
      end
    end
  end
end

This means that Git sees every line as changed, because the indentation
has changed. If you want Git to realize that the file has been renamed,
you could move it in one commit and then modify it in the next commit.

Thanks

Phillip

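Phillip's point, that re-indentation makes lines look changed, can be checked with a minimal C sketch. It is a simplification, not Git's algorithm: Git's similarity estimate in `diffcore-delta.c` hashes chunks of up to 64 bytes rather than comparing whole lines, and the helper names `has_line` and `count_surviving_lines` are hypothetical. The sketch counts how many lines of the old file still appear byte-for-byte in the new one.

```c
#include <string.h>

/* Hypothetical helper: returns nonzero if the len-byte `line` (without
 * its '\n') appears as a complete line somewhere in `text`. */
int has_line(const char *text, const char *line, size_t len)
{
	const char *p = text;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		if (n == len && memcmp(p, line, len) == 0)
			return 1;
		if (!eol)
			break;
		p = eol + 1;
	}
	return 0;
}

/* Counts the non-empty lines of old_text that still appear verbatim in
 * new_text. The match is exact, so a line whose only change is its
 * leading indentation counts as changed. */
int count_surviving_lines(const char *old_text, const char *new_text)
{
	int survived = 0;
	const char *p = old_text;

	while (*p) {
		const char *eol = strchr(p, '\n');
		size_t n = eol ? (size_t)(eol - p) : strlen(p);

		if (n > 0 && has_line(new_text, p, n))
			survived++;
		if (!eol)
			break;
		p = eol + 1;
	}
	return survived;
}
```

For the `Math::Calculator` example (reduced to the `add` method), only 4 of the 7 old lines survive: `module Math` plus three bare `end` lines that happen to land at a matching depth. Every line that carries actual code differs only by its indentation and is therefore counted as changed.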
* Re: git rename/moved status unreliable in ruby
From: Johannes Sixt @ 2026-05-02 7:25 UTC
  To: phillip.wood; +Cc: Git Mailing List, sebastien.stettler

On 01.05.26 at 17:30, Phillip Wood wrote:
> Rename detection is based on how similar the two files are. Looking at
> the example you linked to, you're changing a file that looks like
>
> ...
>
> This means that Git sees every line as changed, because the indentation
> has changed.

That's correct, of course, but...

> If you want Git to realize that the file has been renamed,
> you could move it in one commit and then modify it in the next commit.

... this is a fallacy. Splitting into two commits helps only in certain
cases, in particular when the commit that moves the files is compared
to an earlier commit, as `git log` does. However, if a commit after
the change is compared to a commit before the move, the rename is still
not detected.

-- Hannes

* Re: git rename/moved status unreliable in ruby
From: Junio C Hamano @ 2026-05-03 21:59 UTC
  To: Phillip Wood; +Cc: sebastien.stettler, git@vger.kernel.org

Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Sebastien
>
> On 01/05/2026 06:05, sebastien.stettler wrote:
>> 1. What did you do before the bug happened? (Steps to reproduce your issue)
>>
>> When moving Ruby classes between namespaces, they are marked as new files
>> and the old ones are marked as deleted.
>>
>> If I only change the class name, it will be marked as renamed.
>
> Rename detection is based on how similar the two files are. Looking at
> the example you linked to, you're changing a file that looks like
> ...
> This means that Git sees every line as changed, because the indentation
> has changed. If you want Git to realize that the file has been renamed,
> you could move it in one commit and then modify it in the next commit.

I've seen this advice repeated many times, but it is misleading to give
it without qualifying when it "works" and when it does not.

Such a "stick to a pure rename and make the huge changes elsewhere"
strategy would help your "git log -M [--follow]" and possibly "git
rebase", but it would not help all that much if you are doing "git diff"
or "git merge".

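The step-wise versus endpoint distinction can be illustrated with a toy model in C. It assumes single-file snapshots and exact-content rename detection only (real Git also finds inexact renames above a similarity threshold); `struct snapshot`, `enum change`, and `compare_snapshots` are hypothetical names, not Git APIs. A step-wise walk (old to middle, then middle to final) sees a rename followed by a modification, while a direct endpoint comparison (old to final), which is what `git diff` and `git merge` effectively perform, sees only a deletion plus an addition.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy snapshot holding a single file. */
struct snapshot {
	const char *path;
	const char *content;
};

enum change { UNCHANGED, MODIFIED, RENAMED, DELETED_AND_ADDED };

/* Compare two toy snapshots the way an exact-rename pass would:
 * same path: unchanged or modified;
 * different path: renamed only if the content is byte-for-byte
 * identical, otherwise a deletion plus an addition. */
enum change compare_snapshots(const struct snapshot *from,
			      const struct snapshot *to)
{
	bool same_path = strcmp(from->path, to->path) == 0;
	bool same_content = strcmp(from->content, to->content) == 0;

	if (same_path)
		return same_content ? UNCHANGED : MODIFIED;
	return same_content ? RENAMED : DELETED_AND_ADDED;
}
```

With `old` holding `calculator.rb`, `mid` holding the same content at `lib/calculator.rb`, and `final` holding that file wrapped in `module Lib`, the comparisons old-to-mid, mid-to-final, and old-to-final return `RENAMED`, `MODIFIED`, and `DELETED_AND_ADDED` respectively.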
* Re: git rename/moved status unreliable in ruby
From: Chris Torek @ 2026-05-02 8:06 UTC
  To: sebastien.stettler; +Cc: git@vger.kernel.org

On Thu, Apr 30, 2026 at 10:06 PM sebastien.stettler
<sebastien.stettler@proton.me> wrote:
> ... understanding the reasoning would be great

In my opinion, the key to understanding here is this: Git Stores Snapshots.

What this means is that every commit is a full snapshot of all of the
files for that commit. There are no "changes" at all, there is only a
full snapshot, every time.

Now, internally, the storage format is more complicated (and compressed,
ultimately using the concept of changes as well, though not exactly the
way one might expect). But from the "what things look like" point of
view, and how you should think about what Git sees, each commit is simply
a full and complete snapshot of every file.

So if you have one commit where `foo.rb` exists, and `bar.rb` does not,
that snapshot has a `foo.rb` but no `bar.rb`. If you make a second
snapshot, in which `foo.rb` no longer exists but `bar.rb` now does, that
second snapshot, well, has those files.

The tricky part is that you normally ask Git to *compare* two snapshots
(at least for "what changed" purposes). When you do that, Git extracts
both snapshots and, well, compares them. If `foo.rb` has been removed and
`bar.rb` has been added, Git then goes on to compare the *contents* of
those two files. If the contents match exactly, and you've asked Git to
"find renames", Git will always say that the file that vanished from the
first commit, only to be created identically under a new name in the
second, was "renamed", rather than the one file being deleted and the
second added.
If the contents match "fuzzily" (for some value and algorithm of
fuzz-factor), Git may also say "renamed". You can control this with
`--find-renames=<value>`.

The key idea here is that Git is *finding* renames: either exact-same
contents, or "sufficiently similar" contents, based on remove-and-add
pairs. Since Git only *stores* snapshots, you can get two different
results from comparing the same two commits. All you have to do to get
this is to adjust whether Git checks for renames at all, and if so, to
what extent.

These rules apply to `git show`, `git diff`, `git merge`, and even the
diffstat that `git commit` optionally shows after a commit. For this
reason, all the "compare some commits" commands -- including `git merge`
-- take this option (`git merge` spells it as the strategy option
`-X find-renames=<value>`). Detection of renames can be countermanded
entirely with `--no-renames` (`-X no-renames` for `git merge`).

This is why -- and when -- making two separate commits, one with "exact
same content for deleted-file-D vs added-file-A", followed by later
changes to new file A, helps: if you compare the commit that has file D
to the middle commit, the two files match exactly, and any rename
detection you have turned on finds that rename. If you then compare the
middle commit to the final commit, file A exists in both, so Git shows
changes to file A.

But as soon as you compare the original file-D-containing commit to the
final file-A-updated commit, you run into the original issue again: to
detect this as a rename, you may need to allow rather generous rename
detection.

If, in the future, Git gets fancier rename detection, comparing the
original commit directly against the final one could find the rename
automatically.

So:

> If this is considered something that can be improved ...

It *could* be improved. Doing so in a way that works for more than just
some special cases -- e.g., in a way that works for ordinary text, or
graphical images, rather than just for Ruby sources (or just C sources,
or C++, or Swift, or Python, or whatever) -- seems particularly tricky.
Some degree of ignoring white-space changes would probably help multiple
cases, though.

Chris

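The idea of *finding* renames from remove-and-add pairs, and of getting different answers from the same two snapshots by changing the threshold, can be sketched in C. This is a simplified model, not Git's `diffcore-rename.c`: similarity scores (0 to 100) for every deleted/added pair are taken as precomputed input, and the best remaining pair at or above the threshold is accepted greedily until none is left. The threshold plays the role of `--find-renames=<n>`, whose documented default is 50%. The names `find_renames` and `MAX_FILES` are hypothetical.

```c
#include <stdbool.h>

#define MAX_FILES 8 /* hypothetical fixed bound to keep the sketch simple */

/*
 * score[d][a] is an assumed, precomputed similarity (0..100) between
 * deleted file d and added file a. On return, rename_of[d] holds the
 * added file paired with d, or -1 if d stays a plain deletion.
 * Returns the number of renames found.
 */
int find_renames(int n_del, int n_add, int score[MAX_FILES][MAX_FILES],
		 int min_score, int rename_of[MAX_FILES])
{
	bool add_used[MAX_FILES] = { false };
	int found = 0;

	for (int d = 0; d < n_del; d++)
		rename_of[d] = -1;

	/* Greedy: repeatedly accept the best unused pair above threshold. */
	for (;;) {
		int best = -1, best_d = -1, best_a = -1;

		for (int d = 0; d < n_del; d++) {
			if (rename_of[d] != -1)
				continue;
			for (int a = 0; a < n_add; a++) {
				if (add_used[a] || score[d][a] < min_score)
					continue;
				if (score[d][a] > best) {
					best = score[d][a];
					best_d = d;
					best_a = a;
				}
			}
		}
		if (best_d < 0)
			break; /* no remaining pair reaches the threshold */
		rename_of[best_d] = best_a;
		add_used[best_a] = true;
		found++;
	}
	return found;
}
```

With scores `{ { 100, 10 }, { 5, 35 } }`, a threshold of 50 finds one rename (the exact match) and leaves the second pair as a deletion plus an addition; lowering the threshold to 30 turns that second pair into a rename as well, with no change to the stored data.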
* Re: git rename/moved status unreliable in ruby
From: sebastien.stettler @ 2026-05-02 9:34 UTC
  To: phillip.wood@dunelm.org.uk, chris.torek@gmail.com
  Cc: git@vger.kernel.org, j6t@kdbg.org

Thanks for all the responses and thoughts thus far.

> This means that Git sees every line as changed, because the indentation
> has changed. If you want Git to realize that the file has been renamed,
> you could move it in one commit and then modify it in the next commit.

This is the minimal solution to the problem, but as Johannes illustrated,
there are cases where the "move commit" doesn't solve the problem.

> Splitting into two commits helps only in certain
> cases, in particular when the commit that moves the files is compared
> to an earlier commit, as `git log` does. However, if a commit after
> the change is compared to a commit before the move, the rename is still
> not detected.

Further examples:

The move commit helps if I want to blame a file with the -w option, but it
doesn't help with git log, since the changes are now considered
modifications.
git blame -w ruby-example/lib/calculator.rb (shows the previous file's changes, not the indentation)

```
303f25f5 ruby-example/lib/calculator.rb (nothing    2026-05-02 16:40:51 +0800  1) module Lib
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  2)   class Calculator
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  3)     def add(a, b)
3d22d303 ruby-example/calculator.rb     (billybonks 2026-05-02 16:34:14 +0800  4)       a + b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  5)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  6)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  7)     def subtract(a, b)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  8)       a - b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800  9)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 10)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 11)     def multiply(a, b)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 12)       a * b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 13)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 14)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 15)     def divide(a, b)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 16)       raise ZeroDivisionError, 'Cannot divide by zero' if b.zero?
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 17)
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 18)       a / b
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 19)     end
^1cf274f ruby-example/calculator.rb     (billybonks 2026-05-01 12:36:47 +0800 20)   end
303f25f5 ruby-example/lib/calculator.rb (nothing    2026-05-02 16:40:51 +0800 21) end
```

git log --follow --diff-filter=ra -- ruby-example/lib/calculator.rb (does not show the move commit)

```
commit 303f25f50cde83c46f83bc3c337cd52b87b63d52 (HEAD -> example-2)
Author: nothing <nothing@contributed.com>
Date:   Sat May 2 16:40:51 2026 +0800

    update lib and require

commit 3d22d30304d836c07b3274689e5e2c536b29e1bb (example-3)
Author: billybonks <sebastienstettler@gmail.com>
Date:   Sat May 2 16:34:14 2026 +0800

    fix: sum was doing subtraction instead of addition
```

This approach does impose a heavy burden on the user, since most tools
that perform renames will both move and rename, and that cost does not
buy a complete solution, as there are still many things that don't work
in the most ideal sense.

As mentioned by Chris:

> It *could* be improved. Doing so in a way that works for more than
> just some special cases -- e.g., in a way that works for ordinary
> text, or graphical images, rather than just for Ruby
> sources (or just C sources, or C++, or Swift, or Python, or whatever)
> -- seems particularly tricky. Some degree of ignoring white-space
> changes would probably help multiple cases, though.

Have there been explorations of ignoring white space in the similarity
checker? I would assume that, across most languages, the majority of
white-space movements result in a semantically similar document.

- Sebastien

* Re: git rename/moved status unreliable in ruby
From: Jeff King @ 2026-05-04 10:00 UTC
  To: sebastien.stettler
  Cc: phillip.wood@dunelm.org.uk, chris.torek@gmail.com, git@vger.kernel.org, j6t@kdbg.org

On Sat, May 02, 2026 at 09:34:18AM +0000, sebastien.stettler wrote:

> Have there been explorations of ignoring white space in the similarity
> checker? I would assume that, across most languages, the majority of
> white-space movements result in a semantically similar document.

I don't think anybody has ever looked into it. We do have "-w" and
friends for diffs, and it makes sense that there might be some mode to
soften renames in the same way (especially if you are doing a "-w" diff,
or a merge that ignores whitespace).

The line you need to touch is probably this:

diff --git a/diffcore-delta.c b/diffcore-delta.c
index 2b7db39983..379f6010d3 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -147,6 +147,8 @@ static struct spanhash_top *hash_chars(struct repository *r,
 		/* Ignore CR in CRLF sequence if text */
 		if (is_text && c == '\r' && sz && *buf == '\n')
 			continue;
+		if (is_text && (c == ' ' || c == '\t'))
+			continue;
 
 		accum1 = (accum1 << 7) ^ (accum2 >> 25);
 		accum2 = (accum2 << 7) ^ (old_1 >> 25);

but:

  1. The option to ignore whitespace would need to be plumbed through
     the rest of the diffcore code.

  2. This concept probably throws off some other rename heuristics.
     E.g., I think we do a rough check that the sizes of the objects are
     not too far apart before even looking at the content. So you could
     construct a pathological case where the line "a\n" was changed to
     have a million spaces, and the files would look like they couldn't
     possibly be similar, even though they are identical when ignoring
     whitespace.

     I think in practice you could just ignore this, as sane cases would
     tend to have a reasonable ratio of content to whitespace changes.

-Peff

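Both points above, that skipping blanks while hashing makes a re-indented file look like its original, and that a size pre-check can reject a pair before content is compared, can be illustrated with the simplified C model below. It is a sketch under stated assumptions, not a port of `diffcore-delta.c`: it hashes whole lines with FNV-1a (Git hashes chunks of up to 64 bytes), scores matched bytes as a percentage of the larger file, and uses a size filter written only in the spirit of the check Peff describes. The names `line_hash`, `split_lines`, `similarity`, `sizes_compatible`, and `MAX_LINES` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINES 64 /* hypothetical bound to keep the sketch simple */

/* FNV-1a hash of one line. With ignore_ws, ' ' and '\t' are skipped,
 * like the two added lines in Peff's patch. *counted receives the
 * number of bytes that took part in the hash. */
static uint32_t line_hash(const char *s, size_t n, bool ignore_ws,
			  size_t *counted)
{
	uint32_t h = 2166136261u;

	*counted = 0;
	for (size_t i = 0; i < n; i++) {
		unsigned char c = (unsigned char)s[i];
		if (ignore_ws && (c == ' ' || c == '\t'))
			continue;
		h = (h ^ c) * 16777619u;
		(*counted)++;
	}
	return h;
}

struct line_set {
	uint32_t hash[MAX_LINES];
	size_t bytes[MAX_LINES]; /* counted bytes + 1 for the newline */
	int n;
	size_t total;
};

static void split_lines(const char *text, bool ignore_ws, struct line_set *set)
{
	set->n = 0;
	set->total = 0;
	while (*text && set->n < MAX_LINES) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);
		size_t counted;

		set->hash[set->n] = line_hash(text, len, ignore_ws, &counted);
		set->bytes[set->n] = counted + 1;
		set->total += counted + 1;
		set->n++;
		if (!eol)
			break;
		text = eol + 1;
	}
}

/* Score 0..100: bytes of src lines that find an unused line with the
 * same hash in dst (collisions ignored), as a percentage of the larger
 * counted size. */
int similarity(const char *src, const char *dst, bool ignore_ws)
{
	struct line_set a, b;
	bool used[MAX_LINES] = { false };
	size_t matched = 0, larger;

	split_lines(src, ignore_ws, &a);
	split_lines(dst, ignore_ws, &b);
	for (int i = 0; i < a.n; i++) {
		for (int j = 0; j < b.n; j++) {
			if (!used[j] && b.hash[j] == a.hash[i]) {
				used[j] = true;
				matched += a.bytes[i];
				break;
			}
		}
	}
	larger = a.total > b.total ? a.total : b.total;
	return larger ? (int)(matched * 100 / larger) : 100;
}

/* Rough size filter on raw sizes: reject a pair up front when the size
 * difference alone already rules out reaching min_score. */
bool sizes_compatible(size_t src_size, size_t dst_size, int min_score)
{
	size_t big = src_size > dst_size ? src_size : dst_size;
	size_t small = src_size > dst_size ? dst_size : src_size;

	return (big - small) * 100 <= big * (size_t)(100 - min_score);
}
```

On Phillip's `Math::Calculator` example, this model scores 25 with exact hashing and 71 when blanks are skipped, below and above Git's default 50% threshold respectively. The size filter accepts that pair (79 versus 116 bytes) but rejects Peff's pathological `"a\n"` versus a line padded with a million spaces, even though the two are identical once blanks are ignored.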
* Re: git rename/moved status unreliable in ruby
From: Junio C Hamano @ 2026-05-05 0:09 UTC
  To: Chris Torek; +Cc: sebastien.stettler, git@vger.kernel.org

Chris Torek <chris.torek@gmail.com> writes:

> This is why -- and when -- making two separate commits, one with
> "exact same content for deleted-file-D vs added-file-A", followed by
> later changes to new file A, helps: if you compare the commit that has

"helps" -> "sometimes helps". Only when comparison is done step-wise
(e.g., "git log -M/--follow" and "git rebase") may it help; in general,
when only the comparison between the two endpoints matters (e.g., "git
diff" and "git merge"), such an artificial breaking of a logically single
change into two does not help.

>> If this is considered something that can be improved ...
>
> It *could* be improved. Doing so in a way that works for more than
> just some special cases -- e.g., in a way that works for ordinary
> text, or graphical images, rather than just for Ruby
> sources (or just C sources, or C++, or Swift, or Python, or whatever)
> -- seems particularly tricky. Some degree of ignoring white-space
> changes would probably help multiple cases, though.

You could tie it to the attributes system to allow logic specialized
for the nature of the contents. The beauty of the design decision to
store "snapshots" is that these heuristics can be improved without
having to change anything in the history, which is cast in stone.

* Re: git rename/moved status unreliable in ruby
From: Chris Torek @ 2026-05-05 0:46 UTC
  To: Junio C Hamano; +Cc: sebastien.stettler, git@vger.kernel.org

On Mon, May 4, 2026 at 5:09 PM Junio C Hamano <gitster@pobox.com> wrote:
> Chris Torek <chris.torek@gmail.com> writes:
>
> > This is why -- and when -- making two separate commits ... helps
>
> "helps" -> "sometimes helps". Only when comparison is done step-wise
> (e.g., "git log -M/--follow" and "git rebase") may it help;

That's why I said "and when". :-)

> > ... Some degree of ignoring white-space
> > changes would probably help multiple cases, though.
>
> You could tie it to the attributes system to allow logic
> specialized for the nature of the contents. The beauty of the
> design decision to store "snapshots" is that these heuristics can be
> improved without having to change anything in the history, which is
> cast in stone.

Indeed. Something like Peff's suggestion might work, although I see some
danger in ignoring white space completely. It would probably be better
to compress "all leading but non-empty white space" to either nothing or
a single space, eliminate all trailing white space, and compress other
white space to a single blank.

(Though at the same time, when we're dealing with slugs extracted from
very long single lines, this is probably wrong, so perhaps this should
only be done for "intact single line" slugs. Then again, it might not
matter at this point.)

Doing this on binary files and programs written in Whitespace[1] would
be wrong, of course. ;-)

Chris

[1]: https://en.wikipedia.org/wiki/Whitespace_(programming_language)

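Chris's normalization rules can be sketched as a per-line C function. This is an illustration under assumptions, not code from Git: "blank" is taken to mean `' '` or `'\t'`, input is a single line without its `'\n'`, and the name `normalize_ws` and its `leading_to_nothing` flag (choosing between "nothing" and "a single space" for the leading run) are hypothetical. It deliberately ignores Chris's caveat about slugs cut from very long lines.

```c
#include <stdbool.h>
#include <string.h>

static bool is_blank(char c)
{
	return c == ' ' || c == '\t';
}

/*
 * Normalize one line (no '\n'):
 *   - a non-empty leading run of blanks becomes one space,
 *     or nothing if leading_to_nothing is set;
 *   - trailing blanks are dropped;
 *   - every other run of blanks becomes one space.
 * `out` needs room for strlen(in) + 1 bytes; the result never grows.
 */
void normalize_ws(const char *in, char *out, bool leading_to_nothing)
{
	size_t i = 0, o = 0, len = strlen(in);

	/* Drop trailing blanks by shrinking the effective length. */
	while (len > 0 && is_blank(in[len - 1]))
		len--;

	/* Collapse the leading run, if there is one. */
	if (i < len && is_blank(in[i])) {
		while (i < len && is_blank(in[i]))
			i++;
		if (!leading_to_nothing)
			out[o++] = ' ';
	}

	/* Copy the rest, squeezing interior runs to a single space. */
	while (i < len) {
		if (is_blank(in[i])) {
			while (i < len && is_blank(in[i]))
				i++;
			out[o++] = ' ';
		} else {
			out[o++] = in[i++];
		}
	}
	out[o] = '\0';
}
```

With the single-space choice, `      a + b` and `        a +  b  ` both normalize to ` a + b`, so any two indentation depths compare equal while an unindented `a + b` stays distinct; setting `leading_to_nothing` removes that last distinction too. Unlike skipping every blank, interior spacing still separates tokens, which keeps the comparison closer to the original text.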