From: "SZEDER Gábor" <szeder@ira.uka.de>
To: Bill Okara <billokara@gmail.com>
Cc: "SZEDER Gábor" <szeder@ira.uka.de>,
"Karsten Blees" <karsten.blees@gmail.com>,
"Stefan Beller" <sbeller@google.com>,
"Kevin Daudt" <me@ikke.info>,
git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>
Subject: Re: git mv messed up file mapping if folders contain identical files
Date: Fri, 26 Feb 2016 12:50:00 +0100
Message-ID: <1456487400-31174-1-git-send-email-szeder@ira.uka.de>
In-Reply-To: <CADsr5c9j1ne5K4TKZGMvoFeaNWbQxDs253Y29bfb9BsA+7A0aA@mail.gmail.com>
Hi,
Please don't top-post on this list.
> I guess a bigger concern of this issue is the mess up of history. That
> is, even if not doing an merge/update, just doing the 'git mv' will
> messed up the file history, as shown in following:
>
>
> // Add a new resources/qa/content.txt files with a new commit message:
>
> > mkdir gitmvtest/resources/qa
> > cp gitmvtest/resources/demo/content.txt gitmvtest/resources/qa/.
> > git add .
> >git commit -m "Add a new QA context.txt"
> [master caba387] Add a new QA context.txt
> 1 file changed, 2 insertions(+)
>
> // Do the git mv
>
> > git checkout -b branch5
> > git mv gitmvtest/resources gitmvtest/src/main/.
> > git commit -m "Move resources to src/main/resources"
> [branch5 dd44309] Move resources to src/main/resources
> 4 files changed, 0 insertions(+), 0 deletions(-)
> rename gitmvtest/{resources/qa =>
> src/main/resources/demo}/content.txt (100%) <-- WRONG
> rename gitmvtest/{resources/prod =>
> src/main/resources/dev}/content.txt (100%) <-- WRONG
> rename gitmvtest/{resources/dev =>
> src/main/resources/prod}/content.txt (100%) <-- WRONG
> rename gitmvtest/{resources/demo =>
> src/main/resources/qa}/content.txt (100%) <-- WRONG
>
> // WRONG HISTORY
> > git log --follow --oneline gitmvtest/src/main/resources/demo/content.txt <== demo/content.txt points to the new QA history
> dd44309 Move resources to src/main/resources
> caba387 Add a new QA context.txt <== WRONG HISTORY
Git doesn't track copies and renames.
Git only tracks content and infers copies and renames from content
changes. For example, if a commit removes path 'A' and adds path 'B'
then Git checks whether they both have identical (or very similar)
content, and reports this change as a rename if they do. This is not
recorded anywhere in the repository, but 'git log --follow <path>'
performs this check upon seeing that the path in question doesn't
exist in the previous commit.
Anyway, diffcore used to handle your case better, and the patch below
restores the original behavior.
---- >8 ----
Subject: [PATCH] diffcore: fix iteration order of identical files during rename detection
If the two paths 'dir/A/file' and 'dir/B/file' have identical content
and the parent directory is renamed, e.g. 'git mv dir other-dir', then
diffcore reports the following exact renames:
    renamed:    dir/B/file -> other-dir/A/file
    renamed:    dir/A/file -> other-dir/B/file
While technically not wrong, this is confusing not only for the user,
but also for git commands that make decisions based on rename
information, e.g. 'git log --follow'.
This behavior is a side effect of commit v2.0.0-rc4~8^2~14
(diffcore-rename.c: simplify finding exact renames, 2013-11-14): the
hashmap storing sources returns entries from the same bucket, i.e.
sources matching the current destination, in LIFO order. Thus the
iteration first examines 'other-dir/A/file' and 'dir/B/file' and, upon
finding identical content and basename, reports an exact rename.
Restore the original behavior by reversing the order of filling the
hashmap with source entries.
Reported-by: Bill Okara <billokara@gmail.com>
Signed-off-by: SZEDER Gábor <szeder@ira.uka.de>
---
diffcore-rename.c | 6 ++++--
t/t4001-diff-rename.sh | 11 +++++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index af1fe08861e6..69fcf77be02d 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -340,9 +340,11 @@ static int find_exact_renames(struct diff_options *options)
 	int i, renames = 0;
 	struct hashmap file_table;
 
-	/* Add all sources to the hash table */
+	/* Add all sources to the hash table in reverse order, because
+	 * later on they will be retrieved in LIFO order.
+	 */
 	hashmap_init(&file_table, NULL, rename_src_nr);
-	for (i = 0; i < rename_src_nr; i++)
+	for (i = rename_src_nr-1; i >= 0; i--)
 		insert_file_table(&file_table, i, rename_src[i].p->one);
 
 	/* Walk the destinations and find best source match */
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
index 2f327b749588..ed90c6c6f984 100755
--- a/t/t4001-diff-rename.sh
+++ b/t/t4001-diff-rename.sh
@@ -77,6 +77,17 @@ test_expect_success 'favour same basenames even with minor differences' '
 	git show HEAD:path1 | sed "s/15/16/" > subdir/path1 &&
 	git status | test_i18ngrep "renamed: .*path1 -> subdir/path1"'
 
+test_expect_success 'two files with same basename and same content' '
+	git reset --hard &&
+	mkdir -p dir/A dir/B &&
+	cp path1 dir/A/file &&
+	cp path1 dir/B/file &&
+	git add dir &&
+	git commit -m 2 &&
+	git mv dir other-dir &&
+	git status | test_i18ngrep "renamed: .*dir/A/file -> other-dir/A/file"
+'
+
 test_expect_success 'setup for many rename source candidates' '
 	git reset --hard &&
 	for i in 0 1 2 3 4 5 6 7 8 9;
--
2.7.2.410.g92cb358