* whitespace ignoring during diff -M
@ 2009-05-31 20:28 Daniel Mierswa
2009-06-28 20:02 ` Jeff King
0 siblings, 1 reply; 2+ messages in thread
From: Daniel Mierswa @ 2009-05-31 20:28 UTC (permalink / raw)
To: git
Hi list,
I was told to try it here after visiting #git/Freenode
I want git to think that the diff of two branches where filenames and
whitespace amount differ are the same.
The following is a snippet from my terminal with output, is there a
chance to make git think that those are equal?
impulze@istari ~/gittest $ git init
Initialized empty Git repository in /home/impulze/gittest/.git/
impulze@istari ~/gittest $ touch initial
impulze@istari ~/gittest $ git add initial && git commit initial -m 'initial commit'
[master (root-commit) 7b67dcd] initial commit
0 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 initial
impulze@istari ~/gittest $ git checkout -b another
Switched to a new branch 'another'
impulze@istari ~/gittest $ echo -e " abcdef \n ghijkl " > file.cc
impulze@istari ~/gittest $ unix2dos -a -u file.cc
impulze@istari ~/gittest $ git add file.cc && git commit file.cc -m 'another commit'
[another 37826f4] another commit
1 files changed, 2 insertions(+), 0 deletions(-)
create mode 100644 file.cc
impulze@istari ~/gittest $ git checkout master
Switched to branch 'master'
impulze@istari ~/gittest $ echo -e "\t\tabcdef\t\n\tghijkl\t" > file.c
impulze@istari ~/gittest $ git add file.c && git commit file.c -m 'master commit'
[master f9f0ac5] master commit
1 files changed, 2 insertions(+), 0 deletions(-)
create mode 100644 file.c
impulze@istari ~/gittest $ git --no-pager diff another -M -w
diff --git a/file.c b/file.c
new file mode 100644
index 0000000..18364be
--- /dev/null
+++ b/file.c
@@ -0,0 +1,2 @@
+ abcdef
+ ghijkl
diff --git a/file.cc b/file.cc
deleted file mode 100644
index 1a303ea..0000000
--- a/file.cc
+++ /dev/null
@@ -1,2 +0,0 @@
- abcdef
- ghijkl
--
Mierswa, Daniel
If you still don't like it, that's ok: that's why I'm boss. I simply
know better than you do.
--- Linus Torvalds, comp.os.linux.advocacy, 1996/07/22
* Re: whitespace ignoring during diff -M
From: Jeff King @ 2009-06-28 20:02 UTC (permalink / raw)
To: Daniel Mierswa; +Cc: git
[this is a bit of an old message, but I am way behind on git mail,
and nobody else seems to have responded, so...]
On Sun, May 31, 2009 at 10:28:50PM +0200, Daniel Mierswa wrote:
> I was told to try it here after visiting #git/Freenode
> I want git to think that the diff of two branches where filenames and
> whitespace amount differ are the same.
> The following is a snippet from my terminal with output, is there a
> chance to make git think that those are equal?
Rename detection in git does not respect the "-w" option at all. It
hashes each line of a text file, and then compares the hashes to see how
"similar" the files are.
It already makes some effort to ignore the CR in a CRLF sequence when
calculating the hash. So just running "unix2dos" (or vice versa) on a
file should still allow it to find renames.
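As a rough illustration of the line-hashing idea described above, here is a minimal C sketch. It is not git's code: `hash_line`, `hash_lines`, `similarity_percent`, the djb2-style hash, and the `MAX_LINES` cap are all assumptions made up for this example, and git's real implementation differs in detail. The sketch only shows why dropping the CR of a CRLF pair before hashing lets a unix2dos-converted file still match, while added spaces or tabs do not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 64 /* illustrative cap for this sketch, not a git limit */

/* Hash one line; the CR of a CRLF pair is skipped so it cannot change the hash. */
static unsigned long hash_line(const char *p, size_t len)
{
	unsigned long h = 5381; /* djb2-style stand-in for git's real hash */
	size_t i;

	for (i = 0; i < len; i++) {
		if (p[i] == '\r' && i + 1 < len && p[i + 1] == '\n')
			continue;
		h = h * 33 + (unsigned char)p[i];
	}
	return h;
}

/* Split a NUL-terminated buffer into lines and store one hash per line. */
static size_t hash_lines(const char *buf, unsigned long *out, size_t max)
{
	size_t len = strlen(buf), start = 0, n = 0, i;

	for (i = 0; i < len && n < max; i++) {
		if (buf[i] == '\n' || i + 1 == len) {
			out[n++] = hash_line(buf + start, i + 1 - start);
			start = i + 1;
		}
	}
	return n;
}

/* Percentage of line hashes shared by a and b, relative to the longer file. */
static int similarity_percent(const char *a, const char *b)
{
	unsigned long ha[MAX_LINES], hb[MAX_LINES];
	int used[MAX_LINES] = { 0 };
	size_t na, nb, most, matched = 0, i, j;

	assert(a && b);
	na = hash_lines(a, ha, MAX_LINES);
	nb = hash_lines(b, hb, MAX_LINES);
	for (i = 0; i < na; i++) {
		for (j = 0; j < nb; j++) {
			if (!used[j] && ha[i] == hb[j]) {
				used[j] = 1; /* each line of b may be matched once */
				matched++;
				break;
			}
		}
	}
	most = na > nb ? na : nb;
	return most ? (int)(100 * matched / most) : 100;
}
```

In this model, a file and its CRLF-converted copy compare as 100% similar, while a copy whose lines are padded with tabs compares as 0%, because every line hash changes.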
This could probably be extended fairly trivially to ignore arbitrary
whitespace when generating the hash. I'm not sure whether the feature
should be triggered by "-w". That makes sense to me, but there may be
cases where people want diff generation to follow different rules than
rename detection. We might even want rename detection to ignore
whitespace _always_, as it already does for the CR in CRLF. Somebody
would need to check the results of the two approaches against a number
of cases.
If you are interested, the relevant code is in hash_chars in
diffcore-delta.c. A trivial implementation would probably look something
like the patch below. I tested it with:
git init
cp /usr/share/dict/words words && git add words && git commit -m one
sed 's/^/ /' <words >munged
git add munged && git rm words
git diff --cached --summary
which curiously reports 82% similarity. So maybe there is more
investigation to be done. Anyway, patch below.
---
diff --git a/diffcore-delta.c b/diffcore-delta.c
index e670f85..63704da 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -145,6 +145,8 @@ static struct spanhash_top *hash_chars(struct diff_filespec *one)
 		/* Ignore CR in CRLF sequence if text */
 		if (is_text && c == '\r' && sz && *buf == '\n')
 			continue;
+		if (is_text && (c == ' ' || c == '\t'))
+			continue;
 
 		accum1 = (accum1 << 7) ^ (accum2 >> 25);
 		accum2 = (accum2 << 7) ^ (old_1 >> 25);
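One possible reason the score above stops short of 100%, offered as an assumption rather than something verified against git: if the hash skips blanks but the final score is still divided by the raw size of the larger file, the skipped bytes can never be counted as matched. The C sketch below models that. `hash_lines_no_blanks`, `similarity_percent`, `struct line_hash`, the djb2-style hash, and `MAX_LINES` are hypothetical names for this example, not git's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 64 /* illustrative cap for this sketch, not a git limit */

struct line_hash {
	unsigned long hash; /* hash of the line with blanks removed */
	size_t bytes;       /* bytes that actually went into the hash */
};

/* Hash each line, skipping spaces, tabs, and the CR of a CRLF pair. */
static size_t hash_lines_no_blanks(const char *buf, struct line_hash *out,
				   size_t max)
{
	unsigned long h = 5381; /* djb2-style stand-in for git's real hash */
	size_t counted = 0, n = 0;
	const char *p;

	for (p = buf; *p && n < max; p++) {
		if (*p == '\r' && p[1] == '\n')
			continue;
		if (*p == ' ' || *p == '\t')
			continue;
		h = h * 33 + (unsigned char)*p;
		counted++;
		if (*p == '\n') {
			out[n].hash = h;
			out[n].bytes = counted;
			n++;
			h = 5381;
			counted = 0;
		}
	}
	if (counted && n < max) { /* final line without a trailing newline */
		out[n].hash = h;
		out[n].bytes = counted;
		n++;
	}
	return n;
}

/* Matched hashed bytes as a percentage of the larger file's raw size. */
static int similarity_percent(const char *src, const char *dst)
{
	struct line_hash hs[MAX_LINES], hd[MAX_LINES];
	int used[MAX_LINES] = { 0 };
	size_t ns, nd, i, j, copied = 0;
	size_t src_size, dst_size, max_size;

	assert(src && dst);
	ns = hash_lines_no_blanks(src, hs, MAX_LINES);
	nd = hash_lines_no_blanks(dst, hd, MAX_LINES);
	for (i = 0; i < ns; i++) {
		for (j = 0; j < nd; j++) {
			if (!used[j] && hs[i].hash == hd[j].hash) {
				used[j] = 1; /* each line of dst may be matched once */
				copied += hs[i].bytes;
				break;
			}
		}
	}
	src_size = strlen(src);
	dst_size = strlen(dst);
	max_size = src_size > dst_size ? src_size : dst_size;
	return max_size ? (int)(100 * copied / max_size) : 100;
}
```

For a 14-byte file and a tab-padded 19-byte copy of it, this model yields 73% even though every hashed line matches. Whether the same effect accounts for the 82% figure would need checking against git's actual scoring code.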