* git diff -B: splitting up complete renames
@ 2008-01-05 17:18 Jakub Narebski
2008-01-05 17:37 ` Linus Torvalds
2008-01-05 19:18 ` Junio C Hamano
0 siblings, 2 replies; 3+ messages in thread
From: Jakub Narebski @ 2008-01-05 17:18 UTC (permalink / raw)
To: git
First: in Documentation/diffcore.txt there is an example of broken pair
diffcore-break: For Splitting Up "Complete Rewrites"
----------------------------------------------------
The second transformation in the chain is diffcore-break, and is
controlled by the -B option to the git-diff-* commands. This is
used to detect a filepair that represents "complete rewrite" and
break such filepair into two filepairs that represent delete and
create. E.g. If the input contained this filepair:
------------------------------------------------
:100644 100644 bcd1234... 0123456... M file0
------------------------------------------------
and if it detects that the file "file0" is completely rewritten,
it changes it to:
------------------------------------------------
:100644 000000 bcd1234... 0000000... D file0
:000000 100644 0000000... 0123456... A file0
------------------------------------------------
Shouldn't the last block read for the modern git read:
------------------------------------------------
:100644 000000 bcd1234... 0123456... M99 file0
------------------------------------------------
or is it only for '-B' togehter with '-M' or '-C'?
Second: do you have per chance good examples of broken (and perhaps
merged together) pair, such that it is affected by --diff-filter=B;
with -B<num> or -B<num1>/<num2> parameters needed, if required. COPYING
and Makefile are one good example, but do you have literal files with
similar behavior?
Third: Do "git diff --no-index" (filesystem diff) can show breaking /
use dissimilarity? I couldn't make it work...
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: git diff -B: splitting up complete renames
2008-01-05 17:18 git diff -B: splitting up complete renames Jakub Narebski
@ 2008-01-05 17:37 ` Linus Torvalds
2008-01-05 19:18 ` Junio C Hamano
1 sibling, 0 replies; 3+ messages in thread
From: Linus Torvalds @ 2008-01-05 17:37 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
On Sat, 5 Jan 2008, Jakub Narebski wrote:
>
> Shouldn't the last block read for the modern git read:
>
> ------------------------------------------------
> :100644 000000 bcd1234... 0123456... M99 file0
> ------------------------------------------------
>
> or is it only for '-B' togehter with '-M' or '-C'?
Without -M/C, git re-joins them, so yeah, either the documentation (or
that re-joining logic) is wrong.
> Second: do you have per chance good examples of broken (and perhaps
> merged together) pair, such that it is affected by --diff-filter=B;
> with -B<num> or -B<num1>/<num2> parameters needed, if required. COPYING
> and Makefile are one good example, but do you have literal files with
> similar behavior?
Hmm. A kernel example that shows *something* is
git show -B --raw -r acbbbe9f5ab52da90a8edec02ec973e7f44dae81
which shows
:100644 100644 eb14b18... 1f2d6d5... M093 include/asm-x86/byteorder.h
even though that 093 is the *dissimilarity* index, so according to the
documentation you quote you'd expect it to be
:100644 000000 eb14b18... 0000000... D include/asm-x86/byteorder.h
:000000 100644 0000000... 1f2d6d5... A include/asm-x86/byteorder.h
instead of a M. Whether it's the docs or the behavior that should be
changed, I dunno.
Especially as using -B -M makes it all sane again, because then it shows
it as the real rename:
:100644 100644 435074a... 26b0dcd... M include/asm-x86/Kbuild
:100644 100644 a45470a... 1f2d6d5... R061 include/asm-x86/byteorder_32.h include/asm-x86/byteorde
:100644 000000 5e86c86... 0000000... D include/asm-x86/byteorder_64.h
and it's all good.
My personal opinion is that -B doesn't make much sense without M (or C),
but at least you can see an example of the breaking behaviour in the
kernel for a real case.
Linus
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: git diff -B: splitting up complete renames
2008-01-05 17:18 git diff -B: splitting up complete renames Jakub Narebski
2008-01-05 17:37 ` Linus Torvalds
@ 2008-01-05 19:18 ` Junio C Hamano
1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2008-01-05 19:18 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> ...
> and if it detects that the file "file0" is completely rewritten,
> it changes it to:
>
> ------------------------------------------------
> :100644 000000 bcd1234... 0000000... D file0
> :000000 100644 0000000... 0123456... A file0
> ------------------------------------------------
>
> Shouldn't the last block read for the modern git read:
>
> ------------------------------------------------
> :100644 000000 bcd1234... 0123456... M99 file0
> ------------------------------------------------
Correct. The description is based on an earlier round of the
code before 366175ef8c3b1e145f4ba846e63a1dea3ec3cacc (Rework -B
output). "git show" of that patch has description in its
changes to Documentation/diffcore.txt.
The original motivation of -B was not about helping three way
renames it allows you when used with -M/-C. It was about
showing textual diff more readably for quite dissimilar
filepairs.
> Third: Do "git diff --no-index" (filesystem diff) can show breaking /
> use dissimilarity? I couldn't make it work...
It doesn't, for two reasons.
* Look at diffcore_break(). It breaks filepairs with identical
paths and nothing else.
The reason for this check is that the caller can call
diffcore_rename() _before_ diffcore_break() [*1*] and a
filepair that was detected as renamed pair ought to be
already found similar enough. These days, I think you can
replace the check with p->renamed_pair.
* Look at diffcore_merge_broken(), which is responsible for
re-merging a broken pair. Because we match them by looking
at their names (iow, we do not use pointers to link halves of
a broken pair pointing at each other).
This was correct before "no-index" stuff because "git diff"
was about comparing pairs of blobs found at corresponding
paths in two tree-like things (i.e. "a/one" and "b/one"
corresponds to each other when you do "git diff -- one").
The modification to introduce no-index forgot to update this
logic.
If you wanted to fix this, you can change p->broken_pair from a
boolean to a pointer that points at the other half of the broken
pair (and record that when we break a pair in diffcore_break()),
and look at that pointer to decide which one to match up with
which other one in diffcore_merge_broken(). Together with a
change to look at p->renamed_pair instead of paths I mentioned,
I think it would work more like the regular git diff for
"no-index" case.
I consider that "no-index" frontend more or less a bolted on
half-baked hack that covers only minimally necessary cases to
serve as non-git "diff" replacement, without sharing enough with
the real "git diff" internals; I would not be surprised at all
if there are more corner cases like this that does not work.
But I do not think it is fundamentally unfixable. The change to
add "no-index" support just needed to be more careful, and it is
not too late to make it so.
[Footnote]
*1* diffcore was designed as a generic library to allow
experimenting more combinations of transformations than what
then-current git used. All of our existing callers ended up
calling the transformations in the same order (i.e. the order
diffcore_std() calls them), but individual transformations were
written not to assume too much about the order they would be
called.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2008-01-05 19:19 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-05 17:18 git diff -B: splitting up complete renames Jakub Narebski
2008-01-05 17:37 ` Linus Torvalds
2008-01-05 19:18 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).