git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git diff -B: splitting up complete renames
@ 2008-01-05 17:18 Jakub Narebski
  2008-01-05 17:37 ` Linus Torvalds
  2008-01-05 19:18 ` Junio C Hamano
  0 siblings, 2 replies; 3+ messages in thread
From: Jakub Narebski @ 2008-01-05 17:18 UTC (permalink / raw)
  To: git

First: in Documentation/diffcore.txt there is an example of broken pair

  diffcore-break: For Splitting Up "Complete Rewrites"
  ----------------------------------------------------

  The second transformation in the chain is diffcore-break, and is
  controlled by the -B option to the git-diff-* commands.  This is
  used to detect a filepair that represents "complete rewrite" and
  break such filepair into two filepairs that represent delete and
  create.  E.g.  If the input contained this filepair:

  ------------------------------------------------
  :100644 100644 bcd1234... 0123456... M file0
  ------------------------------------------------

  and if it detects that the file "file0" is completely rewritten,
  it changes it to:

  ------------------------------------------------
  :100644 000000 bcd1234... 0000000... D file0
  :000000 100644 0000000... 0123456... A file0
  ------------------------------------------------

Shouldn't the last block read for the modern git read:

  ------------------------------------------------
  :100644 000000 bcd1234... 0123456... M99 file0
  ------------------------------------------------

or is it only for '-B' togehter with '-M' or '-C'?


Second: do you have per chance good examples of broken (and perhaps 
merged together) pair, such that it is affected by --diff-filter=B;
with -B<num> or -B<num1>/<num2> parameters needed, if required. COPYING 
and Makefile are one good example, but do you have literal files with 
similar behavior? 

Third: Do "git diff --no-index" (filesystem diff) can show breaking / 
use dissimilarity? I couldn't make it work...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: git diff -B: splitting up complete renames
  2008-01-05 17:18 git diff -B: splitting up complete renames Jakub Narebski
@ 2008-01-05 17:37 ` Linus Torvalds
  2008-01-05 19:18 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Linus Torvalds @ 2008-01-05 17:37 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git



On Sat, 5 Jan 2008, Jakub Narebski wrote:
> 
> Shouldn't the last block read for the modern git read:
> 
>   ------------------------------------------------
>   :100644 000000 bcd1234... 0123456... M99 file0
>   ------------------------------------------------
> 
> or is it only for '-B' togehter with '-M' or '-C'?

Without -M/C, git re-joins them, so yeah, either the documentation (or 
that re-joining logic) is wrong.

> Second: do you have per chance good examples of broken (and perhaps 
> merged together) pair, such that it is affected by --diff-filter=B;
> with -B<num> or -B<num1>/<num2> parameters needed, if required. COPYING 
> and Makefile are one good example, but do you have literal files with 
> similar behavior? 

Hmm. A kernel example that shows *something* is

	git show -B --raw -r acbbbe9f5ab52da90a8edec02ec973e7f44dae81

which shows 

	:100644 100644 eb14b18... 1f2d6d5... M093       include/asm-x86/byteorder.h

even though that 093 is the *dissimilarity* index, so according to the 
documentation you quote you'd expect it to be

	:100644 000000 eb14b18... 0000000... D	include/asm-x86/byteorder.h
	:000000 100644 0000000... 1f2d6d5... A	include/asm-x86/byteorder.h

instead of a M. Whether it's the docs or the behavior that should be 
changed, I dunno.

Especially as using -B -M makes it all sane again, because then it shows 
it as the real rename:

	:100644 100644 435074a... 26b0dcd... M  include/asm-x86/Kbuild
	:100644 100644 a45470a... 1f2d6d5... R061       include/asm-x86/byteorder_32.h  include/asm-x86/byteorde
	:100644 000000 5e86c86... 0000000... D  include/asm-x86/byteorder_64.h

and it's all good.

My personal opinion is that -B doesn't make much sense without M (or C), 
but at least you can see an example of the breaking behaviour in the 
kernel for a real case.

			Linus

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: git diff -B: splitting up complete renames
  2008-01-05 17:18 git diff -B: splitting up complete renames Jakub Narebski
  2008-01-05 17:37 ` Linus Torvalds
@ 2008-01-05 19:18 ` Junio C Hamano
  1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2008-01-05 19:18 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> ...
>   and if it detects that the file "file0" is completely rewritten,
>   it changes it to:
>
>   ------------------------------------------------
>   :100644 000000 bcd1234... 0000000... D file0
>   :000000 100644 0000000... 0123456... A file0
>   ------------------------------------------------
>
> Shouldn't the last block read for the modern git read:
>
>   ------------------------------------------------
>   :100644 000000 bcd1234... 0123456... M99 file0
>   ------------------------------------------------

Correct.  The description is based on an earlier round of the
code before 366175ef8c3b1e145f4ba846e63a1dea3ec3cacc (Rework -B
output).  "git show" of that patch has description in its
changes to Documentation/diffcore.txt.

The original motivation of -B was not about helping three way
renames it allows you when used with -M/-C.  It was about
showing textual diff more readably for quite dissimilar
filepairs.

> Third: Do "git diff --no-index" (filesystem diff) can show breaking / 
> use dissimilarity? I couldn't make it work...

It doesn't, for two reasons.

 * Look at diffcore_break().  It breaks filepairs with identical
   paths and nothing else.

   The reason for this check is that the caller can call
   diffcore_rename() _before_ diffcore_break() [*1*] and a
   filepair that was detected as renamed pair ought to be
   already found similar enough.  These days, I think you can
   replace the check with p->renamed_pair.

 * Look at diffcore_merge_broken(), which is responsible for
   re-merging a broken pair.  Because we match them by looking
   at their names (iow, we do not use pointers to link halves of
   a broken pair pointing at each other).

   This was correct before "no-index" stuff because "git diff"
   was about comparing pairs of blobs found at corresponding
   paths in two tree-like things (i.e. "a/one" and "b/one"
   corresponds to each other when you do "git diff -- one").
   The modification to introduce no-index forgot to update this
   logic.

If you wanted to fix this, you can change p->broken_pair from a
boolean to a pointer that points at the other half of the broken
pair (and record that when we break a pair in diffcore_break()),
and look at that pointer to decide which one to match up with
which other one in diffcore_merge_broken().  Together with a
change to look at p->renamed_pair instead of paths I mentioned,
I think it would work more like the regular git diff for
"no-index" case.

I consider that "no-index" frontend more or less a bolted on
half-baked hack that covers only minimally necessary cases to
serve as non-git "diff" replacement, without sharing enough with
the real "git diff" internals; I would not be surprised at all
if there are more corner cases like this that does not work.
But I do not think it is fundamentally unfixable.  The change to
add "no-index" support just needed to be more careful, and it is
not too late to make it so.


[Footnote]

*1* diffcore was designed as a generic library to allow
experimenting more combinations of transformations than what
then-current git used.  All of our existing callers ended up
calling the transformations in the same order (i.e. the order
diffcore_std() calls them), but individual transformations were
written not to assume too much about the order they would be
called.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-01-05 19:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-05 17:18 git diff -B: splitting up complete renames Jakub Narebski
2008-01-05 17:37 ` Linus Torvalds
2008-01-05 19:18 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).