git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option
@ 2010-04-08  6:51 Bo Yang
  2010-04-08  6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
  0 siblings, 1 reply; 9+ messages in thread
From: Bo Yang @ 2010-04-08  6:51 UTC (permalink / raw)
  To: git; +Cc: trast, gitster

The current 'git-blame' manual state that '-M' detect code move and '-C' detect code copy. After read some code of 'git-blame' and some expriments with it. I found that both '-M' and '-C' detect code move and copy. So, I think the document should be updated.
Thanks Thomas Rast a lot for his help on this patch.

Bo Yang (2):
  Add a basic idea section for git-blame.
  Change the description of '-M' and '-C' option.

 Documentation/blame-options.txt |   41 +++++++++++++++++++++++---------------
 Documentation/git-blame.txt     |   32 ++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 16 deletions(-)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] Add a basic idea section for git-blame.
  2010-04-08  6:51 [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option Bo Yang
@ 2010-04-08  6:51 ` Bo Yang
  2010-04-08  6:51   ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang
  2010-04-08  8:00   ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-08  6:51 UTC (permalink / raw)
  To: git; +Cc: trast, gitster

Explain the basic idea about blame shifting with
'-M' or '-C' given.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
---
 Documentation/git-blame.txt |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index a27f439..e31b87f 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -36,6 +36,38 @@ $ git log --pretty=oneline -S'blame_usage'
 ea4c7f9bf69e781dd0cd88d2bccb2bf5cc15c9a7 git-blame: Make the output
 -----------------------------------------------------------------------------
 
+
+BASIC IDEA
+----------
+
+This section briefly explains the basic idea behind 'git-blame'.  You
+do not have to understand it to use git-blame, but it helps in
+understanding the `-M` and `-C` options.  For the sake of simplicity,
+we assume that history is linear.
+
+A call to `git-blame <rev> -- <file>` works as follows:
+
+- Look at `git diff <rev>^ <rev>` to see what the commit did.
+
+- Discard all `-` lines in the diff, since they are no longer part of
+  `<file>`.
+
+- Take blame for all `+` lines; i.e., the newly added lines according
+  to the diff are attributed to `<rev>`.
+
+- Shift blame for all unchanged lines to the parent; i.e., these lines
+  are attributed as in a call to `git blame <rev>^ -- <file>`.
+
+Note that this does not expend any effort to shift blame of `+`
+lines.  This can be changed with the `-M` and `-C` options.
+
+With `-M`, this command detects same lines of the current blaming code
+inside the current file. And it will shift the blame to the author of
+the original lines instead of author of current blaming code. It does
+the same for `-C` except that it will search across file boundary and
+multiple commits.
+
+
 OPTIONS
 -------
 include::blame-options.txt[]
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] Change the description of '-M' and '-C' option.
  2010-04-08  6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
@ 2010-04-08  6:51   ` Bo Yang
  2010-04-08  8:00   ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-08  6:51 UTC (permalink / raw)
  To: git; +Cc: trast, gitster

Both '-M' and '-C' option detect code moving and copying.
The difference between the two options is whether they
search across file boundary.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
---
 Documentation/blame-options.txt |   41 +++++++++++++++++++++++---------------
 1 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/Documentation/blame-options.txt b/Documentation/blame-options.txt
index 4833cac..aca2a1c 100644
--- a/Documentation/blame-options.txt
+++ b/Documentation/blame-options.txt
@@ -79,14 +79,15 @@ of lines before or after the line given by <start>.
 	of the --date option at linkgit:git-log[1].
 
 -M|<num>|::
-	Detect moving lines in the file as well.  When a commit
-	moves a block of lines in a file (e.g. the original file
-	has A and then B, and the commit changes it to B and
-	then A), the traditional 'blame' algorithm typically blames
-	the lines that were moved up (i.e. B) to the parent and
-	assigns blame to the lines that were moved down (i.e. A)
-	to the child commit.  With this option, both groups of lines
-	are blamed on the parent.
+	Detect moving/copying lines in the file as well.  Instead of
+	taking blame for all added lines, attempt to find the same
+	lines 'in the same file' in the parent commit.  If such a
+	match was found, shift the blame to these lines in the
+	parent. (This expends extra effort on the order of the size
+	of the file for every change.)
++
+The net effect is that if code is moved or copied within the file, the
+lines are attributed to the original instead of the move/copy.
 +
 <num> is optional but it is the lower bound on the number of
 alphanumeric characters that git must detect as moving
@@ -94,14 +95,22 @@ within a file for it to associate those lines with the parent
 commit.
 
 -C|<num>|::
-	In addition to `-M`, detect lines copied from other
-	files that were modified in the same commit.  This is
-	useful when you reorganize your program and move code
-	around across files.  When this option is given twice,
-	the command additionally looks for copies from other
-	files in the commit that creates the file. When this
-	option is given three times, the command additionally
-	looks for copies from other files in any commit.
+	Like `-M`, detect moving/copying lines between files as well.
+	Instead of taking blame for all added lines, attempt to find the same
+	lines across file boundary according to the number of given `-C`.
+	This is useful when you reorganize your program and move/copy
+	code around across files. When this option is given once, detect
+	lines moved/copied from other files that were modified in the
+	same commit. (This expends extra effort on the order of
+	<number of modified files>*<file size> for every change.) When this
+	option is given twice, the command additionally looks for moves/copies
+	from other files in the commit that creates this file. (This expends
+	extra effort on the order of <number of files in the commit>*<file size>
+	for every change.) When this option is given three times, the
+	command additionally looks for moves/copies from other files
+	in any commit. (This expends extra effort on the order of
+	<commit number>*<number of files in one commit>*<file size>
+	for every change.)
 +
 <num> is optional but it is the lower bound on the number of
 alphanumeric characters that git must detect as moving
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add a basic idea section for git-blame.
  2010-04-08  6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
  2010-04-08  6:51   ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang
@ 2010-04-08  8:00   ` Junio C Hamano
  2010-04-08 16:27     ` Thomas Rast
  2010-04-09  5:11     ` Bo Yang
  1 sibling, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2010-04-08  8:00 UTC (permalink / raw)
  To: Bo Yang; +Cc: git, trast

Bo Yang <struggleyb.nku@gmail.com> writes:

> +A call to `git-blame <rev> -- <file>` works as follows:
> +
> +- Look at `git diff <rev>^ <rev>` to see what the commit did.
> +
> +- Discard all `-` lines in the diff, since they are no longer part of
> +  `<file>`.
> +
> +- Take blame for all `+` lines; i.e., the newly added lines according
> +  to the diff are attributed to `<rev>`.

I hate to say it, but this is horrifying; it works quite the other way
around.

The name of the game is not "take blame by looking at the diff to see what
we introduced"; it is "avoid taking blame at all cost, by looking at the
diff to see what we might have inherited from our parents".

A description closer to the truth would be:

 - Look at output of "git diff <rev>^$n <rev>" for each parent ($n runs
   from 1 to number of parents);

 - Ignore all +/- lines.  The context ' ' lines and lines outside of diff
   hunks are known to have been inherited from $n-th parent and they are
   not our fault.  We can happily pass blame for these lines to our
   parent.

 - Do the above for all the parents.  We (grudgingly) take blame for lines
   that we failed to find corresponding lines in our parents.

The -M option affects what happens between the second and the third step.
We try to see if the lines that we did not pass blame to <rev>^$n might
have come from <rev>^$n by running diff between the remainder and the blob
in <rev>^$n *again*.  This lets us catch code movement within the blob,
hence the name of the option -M.

The -C option affects the choice of the blob in <rev>^$n.  Usually, we
internally run an equivalent of "git diff -M <rev>^$n <rev>" to notice
that the file F that we are analyzing used to be called F' in $n-th
parent, and run the diff against it, but -C allows us to check with paths
other than that, and additional -C enlarges the search space.  This is to
notice code movement (with a single -C) or copies (with more -Cs) across
paths.

For illustrated description, read the classic:
 
    http://thread.gmane.org/gmane.comp.version-control.git/28826

Almost all the hits in the first page from this query would help, too:

http://search.gmane.org/?query=On+blame%2Fpickaxe&author=Junio+C+Hamano&group=gmane.comp.version-control.git&sort=revdate&DEFAULTOP=and&[=2&xP=blame%09pickaxe&xFILTERS=Gcomp.version-control.git-Ac-Ahamano-Ajunio---A

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Change the description of '-M' and '-C' option.
@ 2010-04-08  8:07 Yann Dirson
  2010-04-09  5:15 ` Bo Yang
  0 siblings, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-04-08  8:07 UTC (permalink / raw)
  To: git; +Cc: ydirson

Very interesting addition, indeed - thanks much.

Some information may still be missing, though:

- the default value of the numeric argument to -M/-C
- the meaning of numeric arguments to the various flags when several
  are given (eg. what does "blame -M10 -C5 -C15" mean ?)
- the numeric arguments are not visible in the summary

On another aspect, given the nature of repetitive -C we may want to warn
the user of bomb out if more than 3 occurences of the flag happen ?

-- 
Yann

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add a basic idea section for git-blame.
  2010-04-08  8:00   ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
@ 2010-04-08 16:27     ` Thomas Rast
  2010-04-08 16:51       ` Junio C Hamano
  2010-04-09  5:11     ` Bo Yang
  1 sibling, 1 reply; 9+ messages in thread
From: Thomas Rast @ 2010-04-08 16:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Bo Yang, git

Junio C Hamano wrote:
> Bo Yang <struggleyb.nku@gmail.com> writes:
> 
> > +A call to `git-blame <rev> -- <file>` works as follows:
> > +
> > +- Look at `git diff <rev>^ <rev>` to see what the commit did.
> > +
> > +- Discard all `-` lines in the diff, since they are no longer part of
> > +  `<file>`.
> > +
> > +- Take blame for all `+` lines; i.e., the newly added lines according
> > +  to the diff are attributed to `<rev>`.
> 
> I hate to say it, but this is horrifying; it works quite the other way
> around.
> 
> The name of the game is not "take blame by looking at the diff to see what
> we introduced"; it is "avoid taking blame at all cost, by looking at the
> diff to see what we might have inherited from our parents".

"Horrifying" seems a fairly harsh word for another way to describe the
process *in linear history* and with default settings, doesn't it?

I now see that "assume linear history" is oversimplifying the process,
but I have to take blame^W^W^W^Wwill fail to shift blame for it since
I suggested that.

> For illustrated description, read the classic:
>  
>     http://thread.gmane.org/gmane.comp.version-control.git/28826
> 
> Almost all the hits in the first page from this query would help, too:
> 
> http://search.gmane.org/?query=On+blame%2Fpickaxe&author=Junio+C+Hamano&group=gmane.comp.version-control.git&sort=revdate&DEFAULTOP=and&[=2&xP=blame%09pickaxe&xFILTERS=Gcomp.version-control.git-Ac-Ahamano-Ajunio---A

Thanks.  I trust this will help Bo make a revised description for the
manpage.



[PS for the list record: while part of the patches is indeed mine, I
never signed off on them.]

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add a basic idea section for git-blame.
  2010-04-08 16:27     ` Thomas Rast
@ 2010-04-08 16:51       ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2010-04-08 16:51 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Bo Yang, git

Thomas Rast <trast@student.ethz.ch> writes:

> "Horrifying" seems a fairly harsh word for another way to describe the
> process *in linear history* and with default settings, doesn't it?

No, even in a linear setting, if you start from "We discard "-" and take
blame for +", you can never get to -M/-C.

Suppose you had "a/b/c/" in the parent and "1/b/c/2/a/3/4/a/5/" in the
child ("/" stands for LF).  Your "b/c/" would match and you say "I added
1/, 2/a/3/4/a/5/".  How would you fix that so that you can say that "a/"
after "2/" was actually moved from the first line, and possibly the second
"a/" was also copied from the same line, if you discard all "-"?

You can say "in linear history and no -M nor -C, you could annotate things
this way", but then you are not describing "blame" anymore; you are
describing something else that may work very well in much simpler setting
and it might even be how some other SCMs would do it.

But it _is_ horrifying to see that in a section that begins with "blame
works as follows".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add a basic idea section for git-blame.
  2010-04-08  8:00   ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
  2010-04-08 16:27     ` Thomas Rast
@ 2010-04-09  5:11     ` Bo Yang
  1 sibling, 0 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-09  5:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, trast

Hi Junio,
On Thu, Apr 8, 2010 at 1:00 AM, Junio C Hamano <gitster@pobox.com> wrote:
> The name of the game is not "take blame by looking at the diff to see what
> we introduced"; it is "avoid taking blame at all cost, by looking at the
> diff to see what we might have inherited from our parents".
>
> A description closer to the truth would be:
>
>  - Look at output of "git diff <rev>^$n <rev>" for each parent ($n runs
>   from 1 to number of parents);
>
>  - Ignore all +/- lines.  The context ' ' lines and lines outside of diff
>   hunks are known to have been inherited from $n-th parent and they are
>   not our fault.  We can happily pass blame for these lines to our
>   parent.
>
>  - Do the above for all the parents.  We (grudgingly) take blame for lines
>   that we failed to find corresponding lines in our parents.
>
> The -M option affects what happens between the second and the third step.
> We try to see if the lines that we did not pass blame to <rev>^$n might
> have come from <rev>^$n by running diff between the remainder and the blob
> in <rev>^$n *again*.  This lets us catch code movement within the blob,
> hence the name of the option -M.

I think here, not only the code movement, but also the copies will be found too.
Please correct me if I am wrong here. :-)

> The -C option affects the choice of the blob in <rev>^$n.  Usually, we
> internally run an equivalent of "git diff -M <rev>^$n <rev>" to notice
> that the file F that we are analyzing used to be called F' in $n-th
> parent, and run the diff against it, but -C allows us to check with paths
> other than that, and additional -C enlarges the search space.  This is to
> notice code movement (with a single -C) or copies (with more -Cs) across
> paths.
>
> For illustrated description, read the classic:
>
>    http://thread.gmane.org/gmane.comp.version-control.git/28826

Thanks a lot for providing this reference. It really helps me too much
to understand how blame works. I will try to re-read the whole code
according the description here. ;-)

Regards!
Bo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] Change the description of '-M' and '-C' option.
  2010-04-08  8:07 [PATCH 2/2] Change the description of '-M' and '-C' option Yann Dirson
@ 2010-04-09  5:15 ` Bo Yang
  0 siblings, 0 replies; 9+ messages in thread
From: Bo Yang @ 2010-04-09  5:15 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git, ydirson

Hi Yann,
On Thu, Apr 8, 2010 at 1:07 AM, Yann Dirson <dirson@bertin.fr> wrote:
> Very interesting addition, indeed - thanks much.
>
> Some information may still be missing, though:
>
> - the default value of the numeric argument to -M/-C
> - the meaning of numeric arguments to the various flags when several
>  are given (eg. what does "blame -M10 -C5 -C15" mean ?)
> - the numeric arguments are not visible in the summary

Ah, thanks a lot for these advice, I will add them on next version of
this patch.

> On another aspect, given the nature of repetitive -C we may want to warn
> the user of bomb out if more than 3 occurences of the flag happen ?

En, sound rationale, I think. I will try another patch to add this. Thanks!

Regards!
Bo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-04-09  5:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-08  6:51 [PATCH 0/2] Document update for 'git-blame' '-M' and '-C' option Bo Yang
2010-04-08  6:51 ` [PATCH 1/2] Add a basic idea section for git-blame Bo Yang
2010-04-08  6:51   ` [PATCH 2/2] Change the description of '-M' and '-C' option Bo Yang
2010-04-08  8:00   ` [PATCH 1/2] Add a basic idea section for git-blame Junio C Hamano
2010-04-08 16:27     ` Thomas Rast
2010-04-08 16:51       ` Junio C Hamano
2010-04-09  5:11     ` Bo Yang
  -- strict thread matches above, loose matches on Subject: below --
2010-04-08  8:07 [PATCH 2/2] Change the description of '-M' and '-C' option Yann Dirson
2010-04-09  5:15 ` Bo Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).