* git-blame not tracking copies
@ 2007-05-02 19:33 Andy Parkins
2007-05-02 23:42 ` Junio C Hamano
2007-05-06 6:02 ` Junio C Hamano
0 siblings, 2 replies; 4+ messages in thread
From: Andy Parkins @ 2007-05-02 19:33 UTC (permalink / raw)
To: git
Hello,
During the discussion on the ffmpeg list about potential migration to
git the following came up. It seems like a bug to me, so I said I
would raise it here.
This is the output of a test script (which I can supply if wanted, but
you can guess the content from the output.
Initialized empty Git repository in .git/
----- echo ABC to commit 1
Created initial commit beb7140: 1
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 newtest
----- echo DEF to commit 2
Created commit 207f5a3: 2
1 files changed, 1 insertions(+), 0 deletions(-)
----- echo ghijk to commit 3
Created commit 14abf8c: 3
1 files changed, 1 insertions(+), 0 deletions(-)
----- Blame 1...
^beb7140 (Andy Parkins 2007-05-02 20:25:27 +0100 1) ABC
207f5a35 (Andy Parkins 2007-05-02 20:25:27 +0100 2) DEF
14abf8ce (Andy Parkins 2007-05-02 20:25:27 +0100 3) ghijk
----- Copy newtest to newtest2, commit 4
Created commit 48861ce: 4
1 files changed, 3 insertions(+), 0 deletions(-)
create mode 100644 newtest2
----- Blame 2...
48861ced (Andy Parkins 2007-05-02 20:25:27 +0100 1) ABC
48861ced (Andy Parkins 2007-05-02 20:25:27 +0100 2) DEF
48861ced (Andy Parkins 2007-05-02 20:25:27 +0100 3) ghijk
----- Edit newtest2, commit 5
Created commit 2d2ec0f: 5
1 files changed, 1 insertions(+), 0 deletions(-)
----- Blame 3...
^beb7140 newtest (Andy Parkins 2007-05-02 20:25:27 +0100 1) ABC
207f5a35 newtest (Andy Parkins 2007-05-02 20:25:27 +0100 2) DEF
2d2ec0f0 newtest2 (Andy Parkins 2007-05-02 20:25:27 +0100 3) XXXX
48861ced newtest2 (Andy Parkins 2007-05-02 20:25:27 +0100 4) ghijk
All git-blame commands are "git-blame -C1 -C1"
The issues are
- Blame2 says all the lines come from commit 4, when actually they
come from commits 1, 2 and 3. It was pointed out that this is
particularly annoying because the file is an exact copy and so the
copy has the same hash as the original so should be easy to spot
- The output isn't stable, even if blame2 had a good reason for not
assigning lines 1 and 2 to their correct commits, why isn't the same
true in blame3?
- Blame3 incorrectly ascribes line 4 to commit 4, when it should have
remained as it was in blame1 - to commit 3.
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: git-blame not tracking copies
2007-05-02 19:33 git-blame not tracking copies Andy Parkins
@ 2007-05-02 23:42 ` Junio C Hamano
2007-05-03 7:29 ` Andy Parkins
2007-05-06 6:02 ` Junio C Hamano
1 sibling, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2007-05-02 23:42 UTC (permalink / raw)
To: Andy Parkins; +Cc: git
Andy Parkins <andyparkins@gmail.com> writes:
> All git-blame commands are "git-blame -C1 -C1"
I am sick and not functioning well today, so will not be able to
review what is happening with your example deeply, but here are
some comments to get you started digging.
There is a built-in sanity valve in git-blame that refuses to
pass down the blame via -M/-C for really trivial hunks. Without
such safety, all the empty lines in the latest revision would be
attributed to a random empty line in a random file in the root
commit ;-).
By default, the sanity valve is set to refuse a hunk that has 20
or 40 alnum characters. These values seem to be appropriate for
real life projects, but obviously the real-world case would be
very different from a made-up "each commit changes one-line"
test case and it would be understandable that the command would
behave differently.
Also with real-life projects, probably depending on the coding
style and merge patterns, I would not be surprised if there are
rooms to tweak the heuristics.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: git-blame not tracking copies
2007-05-02 23:42 ` Junio C Hamano
@ 2007-05-03 7:29 ` Andy Parkins
0 siblings, 0 replies; 4+ messages in thread
From: Andy Parkins @ 2007-05-03 7:29 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
On Thursday 2007 May 03, Junio C Hamano wrote:
> I am sick and not functioning well today, so will not be able to
I'm sorry to hear that. I hope you feel better soon.
> review what is happening with your example deeply, but here are
> some comments to get you started digging.
>
> There is a built-in sanity valve in git-blame that refuses to
> pass down the blame via -M/-C for really trivial hunks. Without
> such safety, all the empty lines in the latest revision would be
> attributed to a random empty line in a random file in the root
> commit ;-).
That sounds like an excellent feature. However - in the case of a full
file-to-file copy, surely that valve may be safely disabled for that one
step?
The copy is easily detectable because the hash is the same, so can't git-blame
just continue having flipped itself on to the original filename? I suppose
in essence a move is the same, the only difference between a move and a copy
is that in one case the copy is spatial in the other it is temporal.
Please excuse the fact that in the above I've completely minimised the fact
the job git-blame does is flippin' hard; I don't mean to imply that "any fule
could fix it" - because I can't :-)
Andy
--
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: git-blame not tracking copies
2007-05-02 19:33 git-blame not tracking copies Andy Parkins
2007-05-02 23:42 ` Junio C Hamano
@ 2007-05-06 6:02 ` Junio C Hamano
1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2007-05-06 6:02 UTC (permalink / raw)
To: Andy Parkins; +Cc: git
Andy Parkins <andyparkins@gmail.com> writes:
> The issues are
>
> - Blame2 says all the lines come from commit 4, when actually they
> come from commits 1, 2 and 3. It was pointed out that this is
> particularly annoying because the file is an exact copy and so the
> copy has the same hash as the original so should be easy to spot
>
> - The output isn't stable, even if blame2 had a good reason for not
> assigning lines 1 and 2 to their correct commits, why isn't the same
> true in blame3?
>
> - Blame3 incorrectly ascribes line 4 to commit 4, when it should have
> remained as it was in blame1 - to commit 3.
This turns out to be a simple and stupid boundary error.
The algorithm passes the blame by (ab)using diff to find the
common section. We run an equivalent of "diff -u0", notice the
lines that come out of it -- the lines that do not appear are
common ones. Simply put, if we have this hunk at the beginning:
@@ -4,1 +4,2 @@
-a
+A
+B
the only information we are interested in this hunk are (1) that
the hunk begins at line 4 in the postimage, so lines 1,2,3 in
the postimage are the same as the preimage; and (2) that the
different part ends just before line 6 (the hunk has two lines
in the postimage which we know do not match the preimage). So
if the next hunk from the diff look like this...
@@ -6,1 +8,1 @@
-e
+E
then we know lines 6 and 7 in the postimage are the same as the
preimage.
We need to use the information (2) from the last hunk in the
patch and blame all the remaining lines to the parent.
When we are assigning blame from one file in the "current"
commit down to one (possibly different) file in its parent
commit in pass_blame_to_parent() function, we grab patch between
the two, and we have the final "the rest are the same" call to
blame_chunk(). However, the code to assign blame for only part
of a file to unrelated file in its parent (i.e. -M/-C), the
corresponding code in find_copy_in_blob() to use the diff output
was missing this "the rest are the same" handling.
A 3-patch series follows.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2007-05-06 6:02 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-02 19:33 git-blame not tracking copies Andy Parkins
2007-05-02 23:42 ` Junio C Hamano
2007-05-03 7:29 ` Andy Parkins
2007-05-06 6:02 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).