* [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option
@ 2010-04-08 6:51 Bo Yang
2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
0 siblings, 1 reply; 9+ messages in thread
From: Bo Yang @ 2010-04-08 6:51 UTC (permalink / raw)
To: git; +Cc: trast, gitster
The current 'git-blame' manual state that '-M' detect code move and '-C' detect code copy. After read some code of 'git-blame' and some expriments with it. I found that both '-M' and '-C' detect code move and copy. So, I think the document should be updated.
Thanks Thomas Rast a lot for his help on this patch.
Bo Yang (2):
Add a basic idea section for git-blame.
Change the description of '-M' and '-C' option.
Documentation/blame-options.txt | 41 +++++++++++++++++++++++---------------
Documentation/git-blame.txt | 32 ++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 16 deletions(-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] Add a basic idea section for git-blame.
2010-04-08 6:51 [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option Bo Yang
@ 2010-04-08 6:51 ` Bo Yang
2010-04-08 6:51 ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang
2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-08 6:51 UTC (permalink / raw)
To: git; +Cc: trast, gitster
Explain the basic idea about blame shifting with
'-M' or '-C' given.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
---
Documentation/git-blame.txt | 32 ++++++++++++++++++++++++++++++++
1 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index a27f439..e31b87f 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -36,6 +36,38 @@ $ git log --pretty=oneline -S'blame_usage'
ea4c7f9bf69e781dd0cd88d2bccb2bf5cc15c9a7 git-blame: Make the output
-----------------------------------------------------------------------------
+
+BASIC IDEA
+----------
+
+This section briefly explains the basic idea behind 'git-blame'. You
+do not have to understand it to use git-blame, but it helps in
+understanding the `-M` and `-C` options. For the sake of simplicity,
+we assume that history is linear.
+
+A call to `git-blame <rev> -- <file>` works as follows:
+
+- Look at `git diff <rev>^ <rev>` to see what the commit did.
+
+- Discard all `-` lines in the diff, since they are no longer part of
+ `<file>`.
+
+- Take blame for all `+` lines; i.e., the newly added lines according
+ to the diff are attributed to `<rev>`.
+
+- Shift blame for all unchanged lines to the parent; i.e., these lines
+ are attributed as in a call to `git blame <rev>^ -- <file>`.
+
+Note that this does not expend any effort to shift blame of `+`
+lines. This can be changed with the `-M` and `-C` options.
+
+With `-M`, this command detects same lines of the current blaming code
+inside the current file. And it will shift the blame to the author of
+the original lines instead of author of current blaming code. It does
+the same for `-C` except that it will search across file boundary and
+multiple commits.
+
+
OPTIONS
-------
include::blame-options.txt[]
--
1.6.0.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] Change the description of '-M' and '-C' option.
2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
@ 2010-04-08 6:51 ` Bo Yang
2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
1 sibling, 0 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-08 6:51 UTC (permalink / raw)
To: git; +Cc: trast, gitster
Both '-M' and '-C' option detect code moving and copying.
The difference between the two options is whether they
search across file boundary.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
---
Documentation/blame-options.txt | 41 +++++++++++++++++++++++---------------
1 files changed, 25 insertions(+), 16 deletions(-)
diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt
index 4833cac..aca2a1c 100644
--- a/Documentation/blame-options.txt
+++ b/Documentation/blame-options.txt
@@ -79,14 +79,15 @@ of lines before or after the line given by <start>.
of the --date option at linkgit:git-log[1].
-M|<num>|::
- Detect moving lines in the file as well. When a commit
- moves a block of lines in a file (e.g. the original file
- has A and then B, and the commit changes it to B and
- then A), the traditional 'blame' algorithm typically blames
- the lines that were moved up (i.e. B) to the parent and
- assigns blame to the lines that were moved down (i.e. A)
- to the child commit. With this option, both groups of lines
- are blamed on the parent.
+ Detect moving/copying lines in the file as well. Instead of
+ taking blame for all added lines, attempt to find the same
+ lines 'in the same file' in the parent commit. If such a
+ match was found, shift the blame to these lines in the
+ parent. (This expends extra effort on the order of the size
+ of the file for every change.)
++
+The net effect is that if code is moved or copied within the file, the
+lines are attributed to the original instead of the move/copy.
+
<num> is optional but it is the lower bound on the number of
alphanumeric characters that git must detect as moving
@@ -94,14 +95,22 @@ within a file for it to associate those lines with the parent
commit.
-C|<num>|::
- In addition to `-M`, detect lines copied from other
- files that were modified in the same commit. This is
- useful when you reorganize your program and move code
- around across files. When this option is given twice,
- the command additionally looks for copies from other
- files in the commit that creates the file. When this
- option is given three times, the command additionally
- looks for copies from other files in any commit.
+ Like `-M`, detect moving/copying lines between files as well.
+ Instead of taking blame for all added lines, attempt to find the same
+ lines across file boundary according to the number of given `-C`.
+ This is useful when you reorganize your program and move/copy
+ code around across files. When this option is given once, detect
+ lines moved/copied from other files that were modified in the
+ same commit. (This expends extra effort on the order of
+ <number of modified files>*<file size> for every change.) When this
+ option is given twice, the command additionally looks for moves/copies
+ from other files in the commit that creates this file. (This expends
+ extra effort on the order of <number of files in the commit>*<file size>
+ for every change.) When this option is given three times, the
+ command additionally looks for moves/copies from other files
+ in any commit. (This expends extra effort on the order of
+ <commit number>*<number of files in one commit>*<file size>
+ for every change.)
+
<num> is optional but it is the lower bound on the number of
alphanumeric characters that git must detect as moving
--
1.6.0.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame.
2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
2010-04-08 6:51 ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang
@ 2010-04-08 8:00 ` Junio C Hamano
2010-04-08 16:27 ` Thomas Rast
2010-04-09 5:11 ` Bo Yang
1 sibling, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2010-04-08 8:00 UTC (permalink / raw)
To: Bo Yang; +Cc: git, trast
Bo Yang <struggleyb.nku@gmail.com> writes:
> +A call to `git-blame <rev> -- <file>` works as follows:
> +
> +- Look at `git diff <rev>^ <rev>` to see what the commit did.
> +
> +- Discard all `-` lines in the diff, since they are no longer part of
> + `<file>`.
> +
> +- Take blame for all `+` lines; i.e., the newly added lines according
> + to the diff are attributed to `<rev>`.
I hate to say it, but this is horrifying; it works quite the other way
around.
The name of the game is not "take blame by looking at the diff to see what
we introduced"; it is "avoid taking blame at all cost, by looking at the
diff to see what we might have inherited from our parents".
A description closer to the truth would be:
- Look at output of "git diff <rev>^$n <rev>" for each parent ($n runs
from 1 to number of parents);
- Ignore all +/- lines. The context ' ' lines and lines outside of diff
hunks are known to have been inherited from $n-th parent and they are
not our fault. We can happily pass blame for these lines to our
parent.
- Do the above for all the parents. We (grudgingly) take blame for lines
that we failed to find corresponding lines in our parents.
The -M option affects what happens between the second and the third step.
We try to see if the lines that we did not pass blame to <rev>^$n might
have come from <rev>^$n by running diff between the remainder and the blob
in <rev>^$n *again*. This lets us catch code movement within the blob,
hence the name of the option -M.
The -C option affects the choice of the blob in <rev>^$n. Usually, we
internally run an equivalent of "git diff -M <rev>^$n <rev>" to notice
that the file F that we are analyzing used to be called F' in $n-th
parent, and run the diff against it, but -C allows us to check with paths
other than that, and additional -C enlarges the search space. This is to
notice code movement (with a single -C) or copies (with more -Cs) across
paths.
For illustrated description, read the classic:
http://thread.gmane.org/gmane.comp.version-control.git/28826
Almost all the hits in the first page from this query would help, too:
http://search.gmane.org/?query=On+blame%2Fpickaxe&author=Junio+C+Hamano&group=gmane.comp.version-control.git&sort=revdate&DEFAULTOP=and&[=2&xP=blame%09pickaxe&xFILTERS=Gcomp.version-control.git-Ac-Ahamano-Ajunio---A
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] Change the description of '-M' and '-C' option.
@ 2010-04-08 8:07 Yann Dirson
2010-04-09 5:15 ` Bo Yang
0 siblings, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-04-08 8:07 UTC (permalink / raw)
To: git; +Cc: ydirson
Very interesting addition, indeed - thanks much.
Some information may still be missing, though:
- the default value of the numeric argument to -M/-C
- the meaning of numeric arguments to the various flags when several
are given (eg. what does "blame -M10 -C5 -C15" mean ?)
- the numeric arguments are not visible in the summary
On another aspect, given the nature of repetitive -C we may want to warn
the user of bomb out if more than 3 occurences of the flag happen ?
--
Yann
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame.
2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
@ 2010-04-08 16:27 ` Thomas Rast
2010-04-08 16:51 ` Junio C Hamano
2010-04-09 5:11 ` Bo Yang
1 sibling, 1 reply; 9+ messages in thread
From: Thomas Rast @ 2010-04-08 16:27 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Bo Yang, git
Junio C Hamano wrote:
> Bo Yang <struggleyb.nku@gmail.com> writes:
>
> > +A call to `git-blame <rev> -- <file>` works as follows:
> > +
> > +- Look at `git diff <rev>^ <rev>` to see what the commit did.
> > +
> > +- Discard all `-` lines in the diff, since they are no longer part of
> > + `<file>`.
> > +
> > +- Take blame for all `+` lines; i.e., the newly added lines according
> > + to the diff are attributed to `<rev>`.
>
> I hate to say it, but this is horrifying; it works quite the other way
> around.
>
> The name of the game is not "take blame by looking at the diff to see what
> we introduced"; it is "avoid taking blame at all cost, by looking at the
> diff to see what we might have inherited from our parents".
"Horrifying" seems a fairly harsh word for another way to describe the
process *in linear history* and with default settings, doesn't it?
I now see that "assume linear history" is oversimplifying the process,
but I have to take blame^W^W^W^Wwill fail to shift blame for it since
I suggested that.
> For illustrated description, read the classic:
>
> http://thread.gmane.org/gmane.comp.version-control.git/28826
>
> Almost all the hits in the first page from this query would help, too:
>
> http://search.gmane.org/?query=On+blame%2Fpickaxe&author=Junio+C+Hamano&group=gmane.comp.version-control.git&sort=revdate&DEFAULTOP=and&[=2&xP=blame%09pickaxe&xFILTERS=Gcomp.version-control.git-Ac-Ahamano-Ajunio---A
Thanks. I trust this will help Bo make a revised description for the
manpage.
[PS for the list record: while part of the patches is indeed mine, I
never signed off on them.]
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame.
2010-04-08 16:27 ` Thomas Rast
@ 2010-04-08 16:51 ` Junio C Hamano
0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2010-04-08 16:51 UTC (permalink / raw)
To: Thomas Rast; +Cc: Bo Yang, git
Thomas Rast <trast@student.ethz.ch> writes:
> "Horrifying" seems a fairly harsh word for another way to describe the
> process *in linear history* and with default settings, doesn't it?
No, even in a linear setting, if you start from "We discard "-" and take
blame for +", you can never get to -M/-C.
Suppose you had "a/b/c/" in the parent and "1/b/c/2/a/3/4/a/5/" in the
child ("/" stands for LF). Your "b/c/" would match and you say "I added
1/, 2/a/3/4/a/5/". How would you fix that so that you can say that "a/"
after "2/" was actually moved from the first line, and possibly the second
"a/" was also copied from the same line, if you discard all "-"?
You can say "in linear history and no -M nor -C, you could annotate things
this way", but then you are not describing "blame" anymore; you are
describing something else that may work very well in much simpler setting
and it might even be how some other SCMs would do it.
But it _is_ horrifying to see that in a section that begins with "blame
works as follows".
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame.
2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
2010-04-08 16:27 ` Thomas Rast
@ 2010-04-09 5:11 ` Bo Yang
1 sibling, 0 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-09 5:11 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, trast
Hi Junio,
On Thu, Apr 8, 2010 at 1:00 AM, Junio C Hamano <gitster@pobox.com> wrote:
> The name of the game is not "take blame by looking at the diff to see what
> we introduced"; it is "avoid taking blame at all cost, by looking at the
> diff to see what we might have inherited from our parents".
>
> A description closer to the truth would be:
>
> - Look at output of "git diff <rev>^$n <rev>" for each parent ($n runs
> from 1 to number of parents);
>
> - Ignore all +/- lines. The context ' ' lines and lines outside of diff
> hunks are known to have been inherited from $n-th parent and they are
> not our fault. We can happily pass blame for these lines to our
> parent.
>
> - Do the above for all the parents. We (grudgingly) take blame for lines
> that we failed to find corresponding lines in our parents.
>
> The -M option affects what happens between the second and the third step.
> We try to see if the lines that we did not pass blame to <rev>^$n might
> have come from <rev>^$n by running diff between the remainder and the blob
> in <rev>^$n *again*. This lets us catch code movement within the blob,
> hence the name of the option -M.
I think here, not only the code movement, but also the copies will be found too.
Please correct me if I am wrong here. :-)
> The -C option affects the choice of the blob in <rev>^$n. Usually, we
> internally run an equivalent of "git diff -M <rev>^$n <rev>" to notice
> that the file F that we are analyzing used to be called F' in $n-th
> parent, and run the diff against it, but -C allows us to check with paths
> other than that, and additional -C enlarges the search space. This is to
> notice code movement (with a single -C) or copies (with more -Cs) across
> paths.
>
> For illustrated description, read the classic:
>
> http://thread.gmane.org/gmane.comp.version-control.git/28826
Thanks a lot for providing this reference. It really helps me too much
to understand how blame works. I will try to re-read the whole code
according the description here. ;-)
Regards!
Bo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] Change the description of '-M' and '-C' option.
2010-04-08 8:07 [PATCH 2/2] Change the description of '-M' and '-C' option Yann Dirson
@ 2010-04-09 5:15 ` Bo Yang
0 siblings, 0 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-09 5:15 UTC (permalink / raw)
To: Yann Dirson; +Cc: git, ydirson
Hi Yann,
On Thu, Apr 8, 2010 at 1:07 AM, Yann Dirson <dirson@bertin.fr> wrote:
> Very interesting addition, indeed - thanks much.
>
> Some information may still be missing, though:
>
> - the default value of the numeric argument to -M/-C
> - the meaning of numeric arguments to the various flags when several
> are given (eg. what does "blame -M10 -C5 -C15" mean ?)
> - the numeric arguments are not visible in the summary
Ah, thanks a lot for these advice, I will add them on next version of
this patch.
> On another aspect, given the nature of repetitive -C we may want to warn
> the user of bomb out if more than 3 occurences of the flag happen ?
En, sound rationale, I think. I will try another patch to add this. Thanks!
Regards!
Bo
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2010-04-09 5:15 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-08 6:51 [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option Bo Yang
2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
2010-04-08 6:51 ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang
2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
2010-04-08 16:27 ` Thomas Rast
2010-04-08 16:51 ` Junio C Hamano
2010-04-09 5:11 ` Bo Yang
-- strict thread matches above, loose matches on Subject: below --
2010-04-08 8:07 [PATCH 2/2] Change the description of '-M' and '-C' option Yann Dirson
2010-04-09 5:15 ` Bo Yang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).