* diff --stat
@ 2012-02-14 0:11 Junio C Hamano
2012-02-14 19:50 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-02-14 0:11 UTC (permalink / raw)
To: git
Hrm, what is wrong with this picture?
$ git -c diff.color.old=red show --format='%s' --stat
Merge branch 'jk/diff-highlight' into pu
contrib/diff-highlight/README | 109 ++++++++++++++++++++++++++++++--
contrib/diff-highlight/diff-highlight | 109 ++++++++++++++++++++++++---------
2 files changed, 181 insertions(+), 37 deletions(-)
They both have 109 lines changed but the end of the graph lines do not
coincide...
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat
2012-02-14 0:11 diff --stat Junio C Hamano
@ 2012-02-14 19:50 ` Jeff King
2012-02-14 20:07 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2012-02-14 19:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, Feb 13, 2012 at 04:11:46PM -0800, Junio C Hamano wrote:
> Hrm, what is wrong with this picture?
>
> $ git -c diff.color.old=red show --format='%s' --stat
> Merge branch 'jk/diff-highlight' into pu
>
> contrib/diff-highlight/README | 109 ++++++++++++++++++++++++++++++--
> contrib/diff-highlight/diff-highlight | 109 ++++++++++++++++++++++++---------
> 2 files changed, 181 insertions(+), 37 deletions(-)
>
> They both have 109 lines changed but the end of the graph lines do not
> coincide...
I think it is rounding error. The first one is +102/-7, and the second
one is +79/-30. When we get to scale_linear, we try to scale 109 change
markers into a 33-character width. So the right scaling factor is ~.303.
So our "true" scaled widths should be:
README, added: 30.9
README, deleted: 2.1
highlight, added: 23.9
highlight, deleted: 9.1
However, we're dealing with integer numbers of characters, so we need to
round. In this case, it seems that our rounding produces (30, 2) in the
first instance and (24, 9) in the second. Which is odd. You'd think
we'd either always round to the nearest integer, or always round down.
But we end up rounding 30.9 down and 23.9 up. So it may be a subtle
loss-of-precision error in scale_linear.
Hmm. Looking at scale_linear, the formula is:
return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1);
I don't see how that can be accurate, since the magnitude of the "-1"
tweak will vary based on the value of "it". This code is due to
3ed74e6, but I don't quite follow the logic in the commit message.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat
2012-02-14 19:50 ` Jeff King
@ 2012-02-14 20:07 ` Junio C Hamano
2012-02-14 20:29 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-02-14 20:07 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
Jeff King <peff@peff.net> writes:
> Hmm. Looking at scale_linear, the formula is:
>
> return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1);
>
> I don't see how that can be accurate, since the magnitude of the "-1"
> tweak will vary based on the value of "it". This code is due to
> 3ed74e6, but I don't quite follow the logic in the commit message.
Doesn't it need +1 at the end, I wonder? We want to map:
- zero to zero
- any number to at least 1
so scaling a non-zero "it" so that maximum maps to (width-1) and then
adding 1 would be the right way for the latter case.
Of course, an easy way out without worrying about the correct math is to
scale the total and the smaller one and then declare that the scaled
larger one is the difference between the two. That way, both of these two
files have 109 in total so the length of the entire graph would be the
same ;-).
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat
2012-02-14 20:07 ` Junio C Hamano
@ 2012-02-14 20:29 ` Jeff King
2012-02-14 21:49 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2012-02-14 20:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Tue, Feb 14, 2012 at 12:07:31PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Hmm. Looking at scale_linear, the formula is:
> >
> > return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1);
> >
> > I don't see how that can be accurate, since the magnitude of the "-1"
> > tweak will vary based on the value of "it". This code is due to
> > 3ed74e6, but I don't quite follow the logic in the commit message.
>
> Doesn't it need +1 at the end, I wonder? We want to map:
>
> - zero to zero
> - any number to at least 1
>
> so scaling a non-zero "it" so that maximum maps to (width-1) and then
> adding 1 would be the right way for the latter case.
Yeah, that makes more sense to me. I think you could also get by with
simply rounding the above properly to the nearest integer (so a little
bit of error that makes 23.9 into 24.0 would be OK, because we would end
up rounding 30.9 to 31.0, too). This seems like the most obvious
solution to me:
static int divide_and_round(int a, int b)
{
return (2 * a + b) / (2 * b);
}
/*
* We want non-zero changes to have at least 1 marker, so special-case
* zero, then scale to width-1, and add back in 1.
*/
static int scale_linear(int it, int width, int max_change)
{
if (!it)
return 0;
return 1 + divide_and_round(it * (width-1), max_change);
}
Any "must show at least 1" scheme is going to have some error in the
scaling (because we are rounding up all of the low end). I have a
feeling that the scheme from 3ed74e6 was trying to distribute that error
more evenly throughout the scale, and the scheme above is lumping all of
it at the start (i.e., the difference between what constitutes one
marker and two markers is much greater than the difference between two
and three).
But that's just a vague feeling. For some reason my brain is not doing
well with math today, so I'll forego writing a proof.
> Of course, an easy way out without worrying about the correct math is to
> scale the total and the smaller one and then declare that the scaled
> larger one is the difference between the two. That way, both of these two
> files have 109 in total so the length of the entire graph would be the
> same ;-).
It looks like we actually did that, pre-3ed74e6. I think it's a valid
strategy. It is just pushing the error around, but I don't know that
people are counting the +/- markers on the line and comparing them
exactly. A little error there is less of a big deal than error between
two lines which are supposed to have the same number of changes (you'll
note that we don't give the per-file added/deleted numbers exactly, so
they are a good place to hide error. :) ).
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat
2012-02-14 20:29 ` Jeff King
@ 2012-02-14 21:49 ` Junio C Hamano
2012-02-14 21:53 ` Jeff King
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-02-14 21:49 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Tue, Feb 14, 2012 at 12:07:31PM -0800, Junio C Hamano wrote:
>
>> Of course, an easy way out without worrying about the correct math is to
>> scale the total and the smaller one and then declare that the scaled
>> larger one is the difference between the two. That way, both of these two
>> files have 109 in total so the length of the entire graph would be the
>> same ;-).
>
> It looks like we actually did that, pre-3ed74e6. I think it's a valid
> strategy. It is just pushing the error around,...
Yes, that is exactly why I suggested that approach. We have to deal with
rounding error somewhere no matter what we do, and the balance between
add/del is much less noticeable than the change with the same total not
lining up.
Anyway, here is an obvious patch to fix this. We did not have any test
that failed with this change, as all our test vectors fit comfortably on
the default 80-column output.
-- >8 --
Subject: diff --stat: resurrect "same for same" heuristic
When commit 3ed74e6 (diff --stat: ensure at least one '-' for deletions,
and one '+' for additions, 2006-09-28) improved the output for files with
tiny modifications, we accidentally broke rounding logic to ensure that
two equal sized changes are shown with the bars of the same length.
This updates the logic to compute the length of the graph bars, using the
same "non-zero changes is shown with at least one column" scaling logic,
but by scaling the sum of additions and deletions to come up with the
total length of the bar (this ensures that two equal sized changes result
in bars of the same length), and then scaling the smaller of the additions
or deletions. The other side is computed as the difference between the
two. This makes the apportioning between additions and deletions less
accurate due to rounding errors, but it is much less noticeable than two
files with the same amount of change showing bars of different length.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
diff.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/diff.c b/diff.c
index 3550c18..76d4724 100644
--- a/diff.c
+++ b/diff.c
@@ -1273,13 +1273,17 @@ const char mime_boundary_leader[] = "------------";
static int scale_linear(int it, int width, int max_change)
{
+ if (!it)
+ return 0;
/*
- * make sure that at least one '-' is printed if there were deletions,
- * and likewise for '+'.
+ * make sure that at least one '-' or '+' is printed if
+ * there is any change to this path. The easiest way is to
+ * scale linearly as if all the quantities were one smaller
+ * than they actually are, and then add one to the result.
*/
if (max_change < 2)
- return it;
- return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1);
+ return 1;
+ return 1 + ((it - 1) * (width - 1) / (max_change - 1));
}
static void show_name(FILE *file,
@@ -1495,8 +1499,19 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
dels += del;
if (width <= max_change) {
- add = scale_linear(add, width, max_change);
- del = scale_linear(del, width, max_change);
+ int total = add + del;
+
+ total = scale_linear(add + del, width, max_change);
+ if (total < 2 && add && del)
+ /* width >= 2 due to the sanity check */
+ total = 2;
+ if (add < del) {
+ add = scale_linear(add, width, max_change);
+ del = total - add;
+ } else {
+ del = scale_linear(del, width, max_change);
+ add = total - del;
+ }
}
fprintf(options->file, "%s", line_prefix);
show_name(options->file, prefix, name, len);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: diff --stat
2012-02-14 21:49 ` Junio C Hamano
@ 2012-02-14 21:53 ` Jeff King
2012-02-14 22:06 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2012-02-14 21:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Tue, Feb 14, 2012 at 01:49:11PM -0800, Junio C Hamano wrote:
> static int scale_linear(int it, int width, int max_change)
> {
> + if (!it)
> + return 0;
> /*
> - * make sure that at least one '-' is printed if there were deletions,
> - * and likewise for '+'.
> + * make sure that at least one '-' or '+' is printed if
> + * there is any change to this path. The easiest way is to
> + * scale linearly as if all the quantities were one smaller
> + * than they actually are, and then add one to the result.
> */
> if (max_change < 2)
> - return it;
> - return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1);
> + return 1;
> + return 1 + ((it - 1) * (width - 1) / (max_change - 1));
I'm not sure I understand why the "it - 1" and "max_change - 1" bits are
still there, or what they are doing in the first place. I.e., why is it
not simply "scale linearly to width-1, then add 1" as I posted earlier?
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat
2012-02-14 21:53 ` Jeff King
@ 2012-02-14 22:06 ` Junio C Hamano
0 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2012-02-14 22:06 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> I'm not sure I understand why the "it - 1" and "max_change - 1" bits are
> still there,...
Simple; I wasn't thinking straight.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-02-14 22:06 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-14 0:11 diff --stat Junio C Hamano
2012-02-14 19:50 ` Jeff King
2012-02-14 20:07 ` Junio C Hamano
2012-02-14 20:29 ` Jeff King
2012-02-14 21:49 ` Junio C Hamano
2012-02-14 21:53 ` Jeff King
2012-02-14 22:06 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).