* [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option @ 2010-04-08 6:51 Bo Yang 2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang 0 siblings, 1 reply; 7+ messages in thread From: Bo Yang @ 2010-04-08 6:51 UTC (permalink / raw) To: git; +Cc: trast, gitster The current 'git-blame' manual state that '-M' detect code move and '-C' detect code copy. After read some code of 'git-blame' and some expriments with it. I found that both '-M' and '-C' detect code move and copy. So, I think the document should be updated. Thanks Thomas Rast a lot for his help on this patch. Bo Yang (2): Add a basic idea section for git-blame. Change the description of '-M' and '-C' option. Documentation/blame-options.txt | 41 +++++++++++++++++++++++--------------- Documentation/git-blame.txt | 32 ++++++++++++++++++++++++++++++ 2 files changed, 57 insertions(+), 16 deletions(-) ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] Add a basic idea section for git-blame. 2010-04-08 6:51 [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option Bo Yang @ 2010-04-08 6:51 ` Bo Yang 2010-04-08 6:51 ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang 2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano 0 siblings, 2 replies; 7+ messages in thread From: Bo Yang @ 2010-04-08 6:51 UTC (permalink / raw) To: git; +Cc: trast, gitster Explain the basic idea about blame shifting with '-M' or '-C' given. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> --- Documentation/git-blame.txt | 32 ++++++++++++++++++++++++++++++++ 1 files changed, 32 insertions(+), 0 deletions(-) diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt index a27f439..e31b87f 100644 --- a/Documentation/git-blame.txt +++ b/Documentation/git-blame.txt @@ -36,6 +36,38 @@ $ git log --pretty=oneline -S'blame_usage' ea4c7f9bf69e781dd0cd88d2bccb2bf5cc15c9a7 git-blame: Make the output ----------------------------------------------------------------------------- + +BASIC IDEA +---------- + +This section briefly explains the basic idea behind 'git-blame'. You +do not have to understand it to use git-blame, but it helps in +understanding the `-M` and `-C` options. For the sake of simplicity, +we assume that history is linear. + +A call to `git-blame <rev> -- <file>` works as follows: + +- Look at `git diff <rev>^ <rev>` to see what the commit did. + +- Discard all `-` lines in the diff, since they are no longer part of + `<file>`. + +- Take blame for all `+` lines; i.e., the newly added lines according + to the diff are attributed to `<rev>`. + +- Shift blame for all unchanged lines to the parent; i.e., these lines + are attributed as in a call to `git blame <rev>^ -- <file>`. + +Note that this does not expend any effort to shift blame of `+` +lines. This can be changed with the `-M` and `-C` options. + +With `-M`, this command detects same lines of the current blaming code +inside the current file. And it will shift the blame to the author of +the original lines instead of author of current blaming code. It does +the same for `-C` except that it will search across file boundary and +multiple commits. + + OPTIONS ------- include::blame-options.txt[] -- 1.6.0.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] Change the description of '-M' and '-C' option. 2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang @ 2010-04-08 6:51 ` Bo Yang 2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano 1 sibling, 0 replies; 7+ messages in thread From: Bo Yang @ 2010-04-08 6:51 UTC (permalink / raw) To: git; +Cc: trast, gitster Both '-M' and '-C' option detect code moving and copying. The difference between the two options is whether they search across file boundary. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> --- Documentation/blame-options.txt | 41 +++++++++++++++++++++++--------------- 1 files changed, 25 insertions(+), 16 deletions(-) diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt index 4833cac..aca2a1c 100644 --- a/Documentation/blame-options.txt +++ b/Documentation/blame-options.txt @@ -79,14 +79,15 @@ of lines before or after the line given by <start>. of the --date option at linkgit:git-log[1]. -M|<num>|:: - Detect moving lines in the file as well. When a commit - moves a block of lines in a file (e.g. the original file - has A and then B, and the commit changes it to B and - then A), the traditional 'blame' algorithm typically blames - the lines that were moved up (i.e. B) to the parent and - assigns blame to the lines that were moved down (i.e. A) - to the child commit. With this option, both groups of lines - are blamed on the parent. + Detect moving/copying lines in the file as well. Instead of + taking blame for all added lines, attempt to find the same + lines 'in the same file' in the parent commit. If such a + match was found, shift the blame to these lines in the + parent. (This expends extra effort on the order of the size + of the file for every change.) ++ +The net effect is that if code is moved or copied within the file, the +lines are attributed to the original instead of the move/copy. + <num> is optional but it is the lower bound on the number of alphanumeric characters that git must detect as moving @@ -94,14 +95,22 @@ within a file for it to associate those lines with the parent commit. -C|<num>|:: - In addition to `-M`, detect lines copied from other - files that were modified in the same commit. This is - useful when you reorganize your program and move code - around across files. When this option is given twice, - the command additionally looks for copies from other - files in the commit that creates the file. When this - option is given three times, the command additionally - looks for copies from other files in any commit. + Like `-M`, detect moving/copying lines between files as well. + Instead of taking blame for all added lines, attempt to find the same + lines across file boundary according to the number of given `-C`. + This is useful when you reorganize your program and move/copy + code around across files. When this option is given once, detect + lines moved/copied from other files that were modified in the + same commit. (This expends extra effort on the order of + <number of modified files>*<file size> for every change.) When this + option is given twice, the command additionally looks for moves/copies + from other files in the commit that creates this file. (This expends + extra effort on the order of <number of files in the commit>*<file size> + for every change.) When this option is given three times, the + command additionally looks for moves/copies from other files + in any commit. (This expends extra effort on the order of + <commit number>*<number of files in one commit>*<file size> + for every change.) + <num> is optional but it is the lower bound on the number of alphanumeric characters that git must detect as moving -- 1.6.0.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame. 2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang 2010-04-08 6:51 ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang @ 2010-04-08 8:00 ` Junio C Hamano 2010-04-08 16:27 ` Thomas Rast 2010-04-09 5:11 ` Bo Yang 1 sibling, 2 replies; 7+ messages in thread From: Junio C Hamano @ 2010-04-08 8:00 UTC (permalink / raw) To: Bo Yang; +Cc: git, trast Bo Yang <struggleyb.nku@gmail.com> writes: > +A call to `git-blame <rev> -- <file>` works as follows: > + > +- Look at `git diff <rev>^ <rev>` to see what the commit did. > + > +- Discard all `-` lines in the diff, since they are no longer part of > + `<file>`. > + > +- Take blame for all `+` lines; i.e., the newly added lines according > + to the diff are attributed to `<rev>`. I hate to say it, but this is horrifying; it works quite the other way around. The name of the game is not "take blame by looking at the diff to see what we introduced"; it is "avoid taking blame at all cost, by looking at the diff to see what we might have inherited from our parents". A description closer to the truth would be: - Look at output of "git diff <rev>^$n <rev>" for each parent ($n runs from 1 to number of parents); - Ignore all +/- lines. The context ' ' lines and lines outside of diff hunks are known to have been inherited from $n-th parent and they are not our fault. We can happily pass blame for these lines to our parent. - Do the above for all the parents. We (grudgingly) take blame for lines that we failed to find corresponding lines in our parents. The -M option affects what happens between the second and the third step. We try to see if the lines that we did not pass blame to <rev>^$n might have come from <rev>^$n by running diff between the remainder and the blob in <rev>^$n *again*. This lets us catch code movement within the blob, hence the name of the option -M. The -C option affects the choice of the blob in <rev>^$n. Usually, we internally run an equivalent of "git diff -M <rev>^$n <rev>" to notice that the file F that we are analyzing used to be called F' in $n-th parent, and run the diff against it, but -C allows us to check with paths other than that, and additional -C enlarges the search space. This is to notice code movement (with a single -C) or copies (with more -Cs) across paths. For illustrated description, read the classic: http://thread.gmane.org/gmane.comp.version-control.git/28826 Almost all the hits in the first page from this query would help, too: http://search.gmane.org/?query=On+blame%2Fpickaxe&author=Junio+C+Hamano&group=gmane.comp.version-control.git&sort=revdate&DEFAULTOP=and&[=2&xP=blame%09pickaxe&xFILTERS=Gcomp.version-control.git-Ac-Ahamano-Ajunio---A ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame. 2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano @ 2010-04-08 16:27 ` Thomas Rast 2010-04-08 16:51 ` Junio C Hamano 2010-04-09 5:11 ` Bo Yang 1 sibling, 1 reply; 7+ messages in thread From: Thomas Rast @ 2010-04-08 16:27 UTC (permalink / raw) To: Junio C Hamano; +Cc: Bo Yang, git Junio C Hamano wrote: > Bo Yang <struggleyb.nku@gmail.com> writes: > > > +A call to `git-blame <rev> -- <file>` works as follows: > > + > > +- Look at `git diff <rev>^ <rev>` to see what the commit did. > > + > > +- Discard all `-` lines in the diff, since they are no longer part of > > + `<file>`. > > + > > +- Take blame for all `+` lines; i.e., the newly added lines according > > + to the diff are attributed to `<rev>`. > > I hate to say it, but this is horrifying; it works quite the other way > around. > > The name of the game is not "take blame by looking at the diff to see what > we introduced"; it is "avoid taking blame at all cost, by looking at the > diff to see what we might have inherited from our parents". "Horrifying" seems a fairly harsh word for another way to describe the process *in linear history* and with default settings, doesn't it? I now see that "assume linear history" is oversimplifying the process, but I have to take blame^W^W^W^Wwill fail to shift blame for it since I suggested that. > For illustrated description, read the classic: > > http://thread.gmane.org/gmane.comp.version-control.git/28826 > > Almost all the hits in the first page from this query would help, too: > > http://search.gmane.org/?query=On+blame%2Fpickaxe&author=Junio+C+Hamano&group=gmane.comp.version-control.git&sort=revdate&DEFAULTOP=and&[=2&xP=blame%09pickaxe&xFILTERS=Gcomp.version-control.git-Ac-Ahamano-Ajunio---A Thanks. I trust this will help Bo make a revised description for the manpage. [PS for the list record: while part of the patches is indeed mine, I never signed off on them.] -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame. 2010-04-08 16:27 ` Thomas Rast @ 2010-04-08 16:51 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2010-04-08 16:51 UTC (permalink / raw) To: Thomas Rast; +Cc: Bo Yang, git Thomas Rast <trast@student.ethz.ch> writes: > "Horrifying" seems a fairly harsh word for another way to describe the > process *in linear history* and with default settings, doesn't it? No, even in a linear setting, if you start from "We discard "-" and take blame for +", you can never get to -M/-C. Suppose you had "a/b/c/" in the parent and "1/b/c/2/a/3/4/a/5/" in the child ("/" stands for LF). Your "b/c/" would match and you say "I added 1/, 2/a/3/4/a/5/". How would you fix that so that you can say that "a/" after "2/" was actually moved from the first line, and possibly the second "a/" was also copied from the same line, if you discard all "-"? You can say "in linear history and no -M nor -C, you could annotate things this way", but then you are not describing "blame" anymore; you are describing something else that may work very well in much simpler setting and it might even be how some other SCMs would do it. But it _is_ horrifying to see that in a section that begins with "blame works as follows". ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] Add a basic idea section for git-blame. 2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano 2010-04-08 16:27 ` Thomas Rast @ 2010-04-09 5:11 ` Bo Yang 1 sibling, 0 replies; 7+ messages in thread From: Bo Yang @ 2010-04-09 5:11 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, trast Hi Junio, On Thu, Apr 8, 2010 at 1:00 AM, Junio C Hamano <gitster@pobox.com> wrote: > The name of the game is not "take blame by looking at the diff to see what > we introduced"; it is "avoid taking blame at all cost, by looking at the > diff to see what we might have inherited from our parents". > > A description closer to the truth would be: > > - Look at output of "git diff <rev>^$n <rev>" for each parent ($n runs > from 1 to number of parents); > > - Ignore all +/- lines. The context ' ' lines and lines outside of diff > hunks are known to have been inherited from $n-th parent and they are > not our fault. We can happily pass blame for these lines to our > parent. > > - Do the above for all the parents. We (grudgingly) take blame for lines > that we failed to find corresponding lines in our parents. > > The -M option affects what happens between the second and the third step. > We try to see if the lines that we did not pass blame to <rev>^$n might > have come from <rev>^$n by running diff between the remainder and the blob > in <rev>^$n *again*. This lets us catch code movement within the blob, > hence the name of the option -M. I think here, not only the code movement, but also the copies will be found too. Please correct me if I am wrong here. :-) > The -C option affects the choice of the blob in <rev>^$n. Usually, we > internally run an equivalent of "git diff -M <rev>^$n <rev>" to notice > that the file F that we are analyzing used to be called F' in $n-th > parent, and run the diff against it, but -C allows us to check with paths > other than that, and additional -C enlarges the search space. This is to > notice code movement (with a single -C) or copies (with more -Cs) across > paths. > > For illustrated description, read the classic: > > http://thread.gmane.org/gmane.comp.version-control.git/28826 Thanks a lot for providing this reference. It really helps me too much to understand how blame works. I will try to re-read the whole code according the description here. ;-) Regards! Bo ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-04-09 5:12 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-04-08 6:51 [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option Bo Yang 2010-04-08 6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang 2010-04-08 6:51 ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang 2010-04-08 8:00 ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano 2010-04-08 16:27 ` Thomas Rast 2010-04-08 16:51 ` Junio C Hamano 2010-04-09 5:11 ` Bo Yang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).