* diff --stat @ 2012-02-14 0:11 Junio C Hamano 2012-02-14 19:50 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2012-02-14 0:11 UTC (permalink / raw) To: git Hrm, what is wrong with this picture? $ git -c diff.color.old=red show --format='%s' --stat Merge branch 'jk/diff-highlight' into pu contrib/diff-highlight/README | 109 ++++++++++++++++++++++++++++++-- contrib/diff-highlight/diff-highlight | 109 ++++++++++++++++++++++++--------- 2 files changed, 181 insertions(+), 37 deletions(-) They both have 109 lines changed but the end of the graph lines do not coincide... ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat 2012-02-14 0:11 diff --stat Junio C Hamano @ 2012-02-14 19:50 ` Jeff King 2012-02-14 20:07 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2012-02-14 19:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, Feb 13, 2012 at 04:11:46PM -0800, Junio C Hamano wrote: > Hrm, what is wrong with this picture? > > $ git -c diff.color.old=red show --format='%s' --stat > Merge branch 'jk/diff-highlight' into pu > > contrib/diff-highlight/README | 109 ++++++++++++++++++++++++++++++-- > contrib/diff-highlight/diff-highlight | 109 ++++++++++++++++++++++++--------- > 2 files changed, 181 insertions(+), 37 deletions(-) > > They both have 109 lines changed but the end of the graph lines do not > coincide... I think it is rounding error. The first one is +102/-7, and the second one is +79/-30. When we get to scale_linear, we try to scale 109 change markers into a 33-character width. So the right scaling factor is ~.303. So our "true" scaled widths should be: README, added: 30.9 README, deleted: 2.1 highlight, added: 23.9 highlight, deleted: 9.1 However, we're dealing with integer numbers of characters, so we need to round. In this case, it seems that our rounding produces (30, 2) in the first instance and (24, 9) in the second. Which is odd. You'd think we'd either always round to the nearest integer, or always round down. But we end up rounding 30.9 down and 23.9 up. So it may be a subtle loss-of-precision error in scale_linear. Hmm. Looking at scale_linear, the formula is: return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1); I don't see how that can be accurate, since the magnitude of the "-1" tweak will vary based on the value of "it". This code is due to 3ed74e6, but I don't quite follow the logic in the commit message. -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat 2012-02-14 19:50 ` Jeff King @ 2012-02-14 20:07 ` Junio C Hamano 2012-02-14 20:29 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2012-02-14 20:07 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git Jeff King <peff@peff.net> writes: > Hmm. Looking at scale_linear, the formula is: > > return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1); > > I don't see how that can be accurate, since the magnitude of the "-1" > tweak will vary based on the value of "it". This code is due to > 3ed74e6, but I don't quite follow the logic in the commit message. Doesn't it need +1 at the end, I wonder? We want to map: - zero to zero - any number to at least 1 so scaling a non-zero "it" so that maximum maps to (width-1) and then adding 1 would be the right way for the latter case. Of course, an easy way out without worrying about the correct math is to scale the total and the smaller one and then declare that the scaled larger one is the difference between the two. That way, both of these two files have 109 in total so the length of the entire graph would be the same ;-). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat 2012-02-14 20:07 ` Junio C Hamano @ 2012-02-14 20:29 ` Jeff King 2012-02-14 21:49 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2012-02-14 20:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, Feb 14, 2012 at 12:07:31PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Hmm. Looking at scale_linear, the formula is: > > > > return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1); > > > > I don't see how that can be accurate, since the magnitude of the "-1" > > tweak will vary based on the value of "it". This code is due to > > 3ed74e6, but I don't quite follow the logic in the commit message. > > Doesn't it need +1 at the end, I wonder? We want to map: > > - zero to zero > - any number to at least 1 > > so scaling a non-zero "it" so that maximum maps to (width-1) and then > adding 1 would be the right way for the latter case. Yeah, that makes more sense to me. I think you could also get by with simply rounding the above properly to the nearest integer (so a little bit of error that makes 23.9 into 24.0 would be OK, because we would end up rounding 30.9 to 31.0, too). This seems like the most obvious solution to me: static int divide_and_round(int a, int b) { return (2 * a + b) / (2 * b); } /* * We want non-zero changes to have at least 1 marker, so special-case * zero, then scale to width-1, and add back in 1. */ static int scale_linear(int it, int width, int max_change) { if (!it) return 0; return 1 + divide_and_round(it * (width-1), max_change); } Any "must show at least 1" scheme is going to have some error in the scaling (because we are rounding up all of the low end). I have a feeling that the scheme from 3ed74e6 was trying to distribute that error more evenly throughout the scale, and the scheme above is lumping all of it at the start (i.e., the difference between what constitutes one marker and two markers is much greater than the difference between two and three). But that's just a vague feeling. For some reason my brain is not doing well with math today, so I'll forego writing a proof. > Of course, an easy way out without worrying about the correct math is to > scale the total and the smaller one and then declare that the scaled > larger one is the difference between the two. That way, both of these two > files have 109 in total so the length of the entire graph would be the > same ;-). It looks like we actually did that, pre-3ed74e6. I think it's a valid strategy. It is just pushing the error around, but I don't know that people are counting the +/- markers on the line and comparing them exactly. A little error there is less of a big deal than error between two lines which are supposed to have the same number of changes (you'll note that we don't give the per-file added/deleted numbers exactly, so they are a good place to hide error. :) ). -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat 2012-02-14 20:29 ` Jeff King @ 2012-02-14 21:49 ` Junio C Hamano 2012-02-14 21:53 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2012-02-14 21:49 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > On Tue, Feb 14, 2012 at 12:07:31PM -0800, Junio C Hamano wrote: > >> Of course, an easy way out without worrying about the correct math is to >> scale the total and the smaller one and then declare that the scaled >> larger one is the difference between the two. That way, both of these two >> files have 109 in total so the length of the entire graph would be the >> same ;-). > > It looks like we actually did that, pre-3ed74e6. I think it's a valid > strategy. It is just pushing the error around,... Yes, that is exactly why I suggested that approach. We have to deal with rounding error somewhere no matter what we do, and the balance between add/del is much less noticeable than the change with the same total not lining up. Anyway, here is an obvious patch to fix this. We did not have any test that failed with this change, as all our test vectors fit comfortably on the default 80-column output. -- >8 -- Subject: diff --stat: resurrect "same for same" heuristic When commit 3ed74e6 (diff --stat: ensure at least one '-' for deletions, and one '+' for additions, 2006-09-28) improved the output for files with tiny modifications, we accidentally broke rounding logic to ensure that two equal sized changes are shown with the bars of the same length. This updates the logic to compute the length of the graph bars, using the same "non-zero changes is shown with at least one column" scaling logic, but by scaling the sum of additions and deletions to come up with the total length of the bar (this ensures that two equal sized changes result in bars of the same length), and then scaling the smaller of the additions or deletions. The other side is computed as the difference between the two. This makes the apportioning between additions and deletions less accurate due to rounding errors, but it is much less noticeable than two files with the same amount of change showing bars of different length. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- diff.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/diff.c b/diff.c index 3550c18..76d4724 100644 --- a/diff.c +++ b/diff.c @@ -1273,13 +1273,17 @@ const char mime_boundary_leader[] = "------------"; static int scale_linear(int it, int width, int max_change) { + if (!it) + return 0; /* - * make sure that at least one '-' is printed if there were deletions, - * and likewise for '+'. + * make sure that at least one '-' or '+' is printed if + * there is any change to this path. The easiest way is to + * scale linearly as if all the quantities were one smaller + * than they actually are, and then add one to the result. */ if (max_change < 2) - return it; - return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1); + return 1; + return 1 + ((it - 1) * (width - 1) / (max_change - 1)); } static void show_name(FILE *file, @@ -1495,8 +1499,19 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) dels += del; if (width <= max_change) { - add = scale_linear(add, width, max_change); - del = scale_linear(del, width, max_change); + int total = add + del; + + total = scale_linear(add + del, width, max_change); + if (total < 2 && add && del) + /* width >= 2 due to the sanity check */ + total = 2; + if (add < del) { + add = scale_linear(add, width, max_change); + del = total - add; + } else { + del = scale_linear(del, width, max_change); + add = total - del; + } } fprintf(options->file, "%s", line_prefix); show_name(options->file, prefix, name, len); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: diff --stat 2012-02-14 21:49 ` Junio C Hamano @ 2012-02-14 21:53 ` Jeff King 2012-02-14 22:06 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2012-02-14 21:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Tue, Feb 14, 2012 at 01:49:11PM -0800, Junio C Hamano wrote: > static int scale_linear(int it, int width, int max_change) > { > + if (!it) > + return 0; > /* > - * make sure that at least one '-' is printed if there were deletions, > - * and likewise for '+'. > + * make sure that at least one '-' or '+' is printed if > + * there is any change to this path. The easiest way is to > + * scale linearly as if all the quantities were one smaller > + * than they actually are, and then add one to the result. > */ > if (max_change < 2) > - return it; > - return ((it - 1) * (width - 1) + max_change - 1) / (max_change - 1); > + return 1; > + return 1 + ((it - 1) * (width - 1) / (max_change - 1)); I'm not sure I understand why the "it - 1" and "max_change - 1" bits are still there, or what they are doing in the first place. I.e., why is it not simply "scale linearly to width-1, then add 1" as I posted earlier? -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: diff --stat 2012-02-14 21:53 ` Jeff King @ 2012-02-14 22:06 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2012-02-14 22:06 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > I'm not sure I understand why the "it - 1" and "max_change - 1" bits are > still there,... Simple; I wasn't thinking straight. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-02-14 22:06 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-02-14 0:11 diff --stat Junio C Hamano 2012-02-14 19:50 ` Jeff King 2012-02-14 20:07 ` Junio C Hamano 2012-02-14 20:29 ` Jeff King 2012-02-14 21:49 ` Junio C Hamano 2012-02-14 21:53 ` Jeff King 2012-02-14 22:06 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).