* [PATCH RFC] Rename detection and whitespace
@ 2011-04-19 20:13 Ciaran
2011-04-20 6:03 ` Jeff King
0 siblings, 1 reply; 2+ messages in thread
From: Ciaran @ 2011-04-19 20:13 UTC (permalink / raw)
To: git
Hi,
I recently had an issue with a merge where there was a branch of some
code made to do some format standardisation e.g. re-indenting,
normalising of whitespace etc. etc. and another branch to do with
tidying up the codebase (moving some files around etc.)
When we brought the two branches back together git struggled (failed)
to apply the formatting changes to the files that were moved (-s
recursive -X ignore-all-space). Now in terms of best practice
obviously what we did was stupid, and we should never have
parallelised the two pieces of work and I imagine that this may well
be considered 'known behaviour / won't fix' but I wanted to raise it
and get told that rather than assume it :)
After a minimal amount of investigation I found that with a small
patch I could make my local git instance perform the merge flawlessly
(in my case).
For me I tracked the behaviour down to the blob similarity calculation
that takes place in the diffcore-delta.c#hash_chars method. In our
case the problem was we were adjusting the whitespace at the front of
each line which meant that the 64 byte segment hashes were
different/mis-aligned between the 2 equivalent files. This code
already 'normalises' out CRLF/LF differences by skipping any CR
characters when followed by LF so my question is that would it be
considered wrong/evil to ignore *all* whitespace characters when -X
ignore-all-space has been passed.
The behaviour is trivial to reproduce without a merge:
i) Create a text file with a few lines in it.
ii) Add the file and commit it.
iii) Insert a space at the front of each line and rename the file
iv) git add -A .
v) git status will now show a 'new file' and a 'deleted file'
With the patch inlined below (not suitable for inclusion, naive in so
many ways (what is whitespace, and should be optional/configurable to
name just two!) git status will report a rename instead. This
behaviour carries across to merging.
Presumably I'm missing something important, but the improved rename
detection for us was outstanding!
---
diffcore-delta.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/diffcore-delta.c b/diffcore-delta.c
index 7cf431d..5429b8d 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -146,6 +146,9 @@ static struct spanhash_top *hash_chars(struct
diff_filespec *one)
if (is_text && c == '\r' && sz && *buf == '\n')
continue;
+ if (is_text && ( c == '\t' || c ==' ' ) )
+ continue;
+
accum1 = (accum1 << 7) ^ (accum2 >> 25);
accum2 = (accum2 << 7) ^ (old_1 >> 25);
accum1 += c;
--
1.7.4.1
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH RFC] Rename detection and whitespace
2011-04-19 20:13 [PATCH RFC] Rename detection and whitespace Ciaran
@ 2011-04-20 6:03 ` Jeff King
0 siblings, 0 replies; 2+ messages in thread
From: Jeff King @ 2011-04-20 6:03 UTC (permalink / raw)
To: Ciaran; +Cc: git
On Tue, Apr 19, 2011 at 09:13:19PM +0100, Ciaran wrote:
> For me I tracked the behaviour down to the blob similarity calculation
> that takes place in the diffcore-delta.c#hash_chars method. In our
> case the problem was we were adjusting the whitespace at the front of
> each line which meant that the 64 byte segment hashes were
> different/mis-aligned between the 2 equivalent files. This code
> already 'normalises' out CRLF/LF differences by skipping any CR
> characters when followed by LF so my question is that would it be
> considered wrong/evil to ignore *all* whitespace characters when -X
> ignore-all-space has been passed.
I think what you are proposing makes sense. If the diff which spawned
the rename detection doesn't care about whitespace, then probably it
should be ignored in the rename, too. That would go for merging with "-X
ignore-all-space", but also for "git diff -w" (and probably "-b").
I do think it should be conditional on those, though. We _know_ that
CRLF versus LF in a text file is irrelevant to the file's content. But
whether whitespace is irrelevant varies from file to file.
So your patch is in the right direction, in my opinion.
-Peff
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2011-04-20 6:03 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-04-19 20:13 [PATCH RFC] Rename detection and whitespace Ciaran
2011-04-20 6:03 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).