* Diff output from a rewrite of a function
@ 2007-03-08 19:04 Ilpo Järvinen
2007-03-08 22:01 ` Robin Rosenberg
0 siblings, 1 reply; 5+ messages in thread
From: Ilpo Järvinen @ 2007-03-08 19:04 UTC (permalink / raw)
To: git
Hi,
I have a C source file in which couple of functions are fully rewritten
(only a part of the file), almost no real similarities (block closing
parents will obviously match still and couple of other keyword lines here
and there). I wonder if there is way to produce diff that does not get
confused by the empty lines / identical lines that are present in both
original and the modified version. Default diff output is very bad looking
(IMHO) because these identical lines cause a "synchronization point" to
occur, that is, each identical line of the original is not considered as
+/- but left as is. Thus I have something like 3-6 add+del blocks per
function with a part of the change rather than e.g., one block per
function or so... I found -B from man git-diff-files, but I guess
"complete rewrite changes" means whole files as it did do anything.
Any ideas?
I'm currently using 1.4.2, if that's significant.
--
i.
ps. Please cc me when replying.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Diff output from a rewrite of a function
2007-03-08 19:04 Diff output from a rewrite of a function Ilpo Järvinen
@ 2007-03-08 22:01 ` Robin Rosenberg
2007-03-08 23:24 ` Junio C Hamano
0 siblings, 1 reply; 5+ messages in thread
From: Robin Rosenberg @ 2007-03-08 22:01 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: git
torsdag 08 mars 2007 20:04 skrev Ilpo Järvinen:
> Hi,
>
> I have a C source file in which couple of functions are fully rewritten
> (only a part of the file), almost no real similarities (block closing
> parents will obviously match still and couple of other keyword lines here
> and there). I wonder if there is way to produce diff that does not get
> confused by the empty lines / identical lines that are present in both
> original and the modified version. Default diff output is very bad looking
> (IMHO) because these identical lines cause a "synchronization point" to
> occur, that is, each identical line of the original is not considered as
> +/- but left as is. Thus I have something like 3-6 add+del blocks per
> function with a part of the change rather than e.g., one block per
> function or so... I found -B from man git-diff-files, but I guess
> "complete rewrite changes" means whole files as it did do anything.
>
> Any ideas?
Increase the context size from the default three lines. Something like
diff -U 7 old new will require larger chunks of unchanged code for diff
break up a hunk. With git you can do
GIT_DIFF_OPTS=-u7 git-diff-....
>
> I'm currently using 1.4.2, if that's significant.
>
That was 1.5, but I think that part didn't change.
-- robin
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Diff output from a rewrite of a function
2007-03-08 22:01 ` Robin Rosenberg
@ 2007-03-08 23:24 ` Junio C Hamano
2007-03-09 8:39 ` Ilpo Järvinen
2007-03-09 19:47 ` Robin Rosenberg
0 siblings, 2 replies; 5+ messages in thread
From: Junio C Hamano @ 2007-03-08 23:24 UTC (permalink / raw)
To: Robin Rosenberg; +Cc: Ilpo Järvinen, git
Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:
> Increase the context size from the default three lines. Something like
> diff -U 7 old new will require larger chunks of unchanged code for diff
> break up a hunk. With git you can do
>
> GIT_DIFF_OPTS=-u7 git-diff-....
I think you can say "git diff -U7" to do what you are
describing, but I do not think that does what you want. I think
what you want is a "negative context", which says "do not
consider group of lines less than N lines as matching between
preimage and postimage". What Ilpo wants is to see something
like this:
- Deleted 1
- Deleted 2
- Deleted 3
...
- Deleted 400
+ Added 1
+ Added 2
+ Added 3
...
+ Added 500
/* happens to match because this was left intact,too */
- Deleted 401
- Deleted 402
- Deleted 403
+ Added 501
as if the small context lines that happen to match are also part
of the change, like this:
- Deleted 1
- Deleted 2
- Deleted 3
...
- Deleted 400
- /* happens to match because this was left intact,too */
- Deleted 401
- Deleted 402
- Deleted 403
+ Added 1
+ Added 2
+ Added 3
...
+ Added 500
+ /* happens to match because this was left intact,too */
+ Added 501
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Diff output from a rewrite of a function
2007-03-08 23:24 ` Junio C Hamano
@ 2007-03-09 8:39 ` Ilpo Järvinen
2007-03-09 19:47 ` Robin Rosenberg
1 sibling, 0 replies; 5+ messages in thread
From: Ilpo Järvinen @ 2007-03-09 8:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Robin Rosenberg, git
On Thu, 8 Mar 2007, Junio C Hamano wrote:
> Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:
>
> > Increase the context size from the default three lines. Something like
> > diff -U 7 old new will require larger chunks of unchanged code for diff
> > break up a hunk. With git you can do
> >
> > GIT_DIFF_OPTS=-u7 git-diff-....
>
> I think you can say "git diff -U7" to do what you are
> describing, but I do not think that does what you want.
Tested both and as you predicted. It didn't do the trick... Tested even
with 70... :-)
> I think what you want is a "negative context", which says "do not
> consider group of lines less than N lines as matching between
> preimage and postimage". What Ilpo wants is to see something
> like this:
>
> - Deleted 1
> - Deleted 2
> - Deleted 3
> ...
> - Deleted 400
> + Added 1
> + Added 2
> + Added 3
> ...
> + Added 500
> /* happens to match because this was left intact,too */
> - Deleted 401
> - Deleted 402
> - Deleted 403
> + Added 501
>
> as if the small context lines that happen to match are also part
> of the change, like this:
>
> - Deleted 1
> - Deleted 2
> - Deleted 3
> ...
> - Deleted 400
> - /* happens to match because this was left intact,too */
> - Deleted 401
> - Deleted 402
> - Deleted 403
> + Added 1
> + Added 2
> + Added 3
> ...
> + Added 500
> + /* happens to match because this was left intact,too */
> + Added 501
Yes, you are showing here exactly what I meant. IMHO the latter would be
easier for everyone, especially for the review in mailinglists. It's very
hard to find the correct part to comment from the messed up output, and in
the worst case that could be split to two blocks anyway and then there
will be deleted lines between the addition of the lines. Besides, the
latter would describe the nature of the change much better rather than
trying to be "too intelligent"... :-)
Though I guess that when the change is included to any repository (don't
know enough about git internals to know for sure though), it will again
perplex everyone looking it from the history because the diff tries again
to be that intelligent :-)? So in the simplest format this is kind of
helper for review only but that's the most important thing as commit
message could then include a note or so that anyone who really wants, can
create a better diff manually with correct options or whatever is
required for that.
Main problem seems to be empty lines which you basically have in almost
any code (or at least ought to have some of them in any non-trivial code)
and they're always identical unless bad spacing exists. As I noted earlier
also block closing lines and keywords seem the create identical lines
easily even when you don't have any relation between the code in the
different versions. It is quite common to say in any code:
}
}
or:
break;
or:
} else {
etc.
Four lines seem to be the largest I have but that's in the end of a
function so it wouldn't be a big deal (as there are no deleted lines after
that point). Here is a distribution of the synchronization (match) point
sizes for a real change (the previous but quite similar version of the
diff is available on netdev of linux kernel if somebody is that interested
about it):
ijjarvin@glomgold-39:~$ sort tmp/syncpoints | uniq -c
11 1
5 2
1 3
1 4
ijjarvin@glomgold-39:~$ git-diff --stat HEAD^^ HEAD^ net/ipv4/tcp_input.c
net/ipv4/tcp_input.c | 218 +++++++++++++++++++++++++++++++++++---------------
1 files changed, 153 insertions(+), 65 deletions(-)
ijjarvin@kivilampi-30:~/work/src/submit$
Two of the syncpoints with length of two (and the four lines closing
thingie) might be considered as ok. The intention of the new and old code
is to do the same thing but using a totally different algorithm, and those
two twoline blocks actually do logically the same thing, others do not,
but due to language keywords and structure, they match. Though, I would
not mind if a better diff output couldn't then keep those two lines as
from original.
I could have tried to move the relevant functions to another place in the
file but that's sort of hackish approach to a problem I hope my tools
would be able to do if I ask it to (not sure if moving them would have
worked anyway)...
Or alternatively doing a script that creates garbage after each line in
the modified functions and then removes the garbage from the diff output
or so... ;-)
I assume it is not possible currently since you didn't suggest anything?
--
i.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Diff output from a rewrite of a function
2007-03-08 23:24 ` Junio C Hamano
2007-03-09 8:39 ` Ilpo Järvinen
@ 2007-03-09 19:47 ` Robin Rosenberg
1 sibling, 0 replies; 5+ messages in thread
From: Robin Rosenberg @ 2007-03-09 19:47 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ilpo Järvinen, git
fredag 09 mars 2007 00:24 skrev Junio C Hamano:
> Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:
>
> > Increase the context size from the default three lines. Something like
> > diff -U 7 old new will require larger chunks of unchanged code for diff
> > break up a hunk. With git you can do
> >
> > GIT_DIFF_OPTS=-u7 git-diff-....
>
> I think you can say "git diff -U7" to do what you are
> describing, but I do not think that does what you want.
True, it's not exactly what I want, but it is easier to read than a diff split
up in many small hunks with misleading context. With larger hunks I can
ignore the misleading matching lines in the middle without thinking too much,
so it kind-o alleviates my problems a little.
-- robin
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2007-03-09 21:10 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-08 19:04 Diff output from a rewrite of a function Ilpo Järvinen
2007-03-08 22:01 ` Robin Rosenberg
2007-03-08 23:24 ` Junio C Hamano
2007-03-09 8:39 ` Ilpo Järvinen
2007-03-09 19:47 ` Robin Rosenberg
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).