From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael J Gruber Subject: Re: [PATCHv3 2/3] diff: introduce --stat-lines to limit the stat lines Date: Mon, 30 May 2011 08:40:48 +0200 Message-ID: <4DE33BF0.7060607@drmicha.warpmail.net> References: <4DC0FD3D.9010004@drmicha.warpmail.net> <5da631c64438ec3f669f0c2b7456bcfbc371e2f5.1306499600.git.git@drmicha.warpmail.net> <7v39jzqvny.fsf@alter.siamese.dyndns.org> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: git@vger.kernel.org To: Junio C Hamano X-From: git-owner@vger.kernel.org Mon May 30 08:40:59 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1QQw9p-00053z-IY for gcvg-git-2@lo.gmane.org; Mon, 30 May 2011 08:40:58 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752942Ab1E3Gkw (ORCPT ); Mon, 30 May 2011 02:40:52 -0400 Received: from out3.smtp.messagingengine.com ([66.111.4.27]:46010 "EHLO out3.smtp.messagingengine.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751026Ab1E3Gkv (ORCPT ); Mon, 30 May 2011 02:40:51 -0400 Received: from compute5.internal (compute5.nyi.mail.srv.osa [10.202.2.45]) by gateway1.messagingengine.com (Postfix) with ESMTP id CAA4520B0E; Mon, 30 May 2011 02:40:50 -0400 (EDT) Received: from frontend1.messagingengine.com ([10.202.2.160]) by compute5.internal (MEProxy); Mon, 30 May 2011 02:40:50 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=messagingengine.com; h=message-id:date:from:mime-version:to:cc:subject:references:in-reply-to:content-type:content-transfer-encoding; s=smtpout; bh=6t0Z+iIni0GkyJ3+R/g0nFtT1Es=; b=nz/bm6vs3zgPuSgGeLIrNoIq/aVZ1Qn+bgIwGKoCeToENPOrLYiCJZK8yg01ocDKbo3yk685/p7nc9Q9Vja1flDiVMgUXNQeAkkvrS4NZ1RN1khH65LMR4+Znm9I1nPrSyftGwupdVLmxMfxLPm2vXDkquUwYuFXODCum8kjCdc= X-Sasl-enc: vitB2KW3pkLlE5hoc0455Z6TL/KFdSEXefganDMiuiHu 1306737650 Received: from localhost.localdomain (whitehead.math.tu-clausthal.de [139.174.44.62]) by mail.messagingengine.com (Postfix) with ESMTPSA id 3C84D4079F4; Mon, 30 May 2011 02:40:50 -0400 (EDT) User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc15 Lightning/1.0b3pre Thunderbird/3.1.10 In-Reply-To: <7v39jzqvny.fsf@alter.siamese.dyndns.org> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Junio C Hamano venit, vidit, dixit 28.05.2011 06:46: > Michael J Gruber writes: > >> diff --git a/diff.c b/diff.c >> index 4541939..6a6cbca 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -1259,6 +1259,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) >> >> width = options->stat_width ? options->stat_width : 80; >> name_width = options->stat_name_width ? options->stat_name_width : 50; >> + count = options->stat_count ? options->stat_count : data->nr; > > It is a tad hard to follow the logic of "count" here. It could be the > ceiling the user gave you, or we may not be constrained at all if the full > diff is smaller. But then > >> @@ -1275,11 +1276,12 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) >> add_c = diff_get_color_opt(options, DIFF_FILE_NEW); >> del_c = diff_get_color_opt(options, DIFF_FILE_OLD); >> >> - for (i = 0; i < data->nr; i++) { >> + for (i = 0; (i < count) && (i < data->nr); i++) { > > Here we compare "i" which does not have much to do with how many we > actually have to show (as we may be skipping in the loop) with "count" and > also with data->nr. In the loop, you compensate for the skipping > (i.e. increment of "i" that should not affect the outcome) by incrementing > "count", so (i < count) part will correctly count towards the limit the > user gave. It may be a correct math, but it is tricky and hard to read. > Especially if we started with the options->stat_count that is larger than > the data->nr, of course "i < count" will always hold true with or without > skipping, which also is correct but even trickier. Maybe I was trying too hard to be efficient. We could have count be stat_count if non-zero or data->nr and limit all loops by the double conditional (i < count) && (i < data->nr), resp. use an additional counter like shown or skipped and compare with that. Given the short discussion for the initial patch, I really did not expect how much it would still take... (My bad, of course.) > >> struct diffstat_file *file = data->files[i]; >> uintmax_t change = file->added + file->deleted; >> if (!data->files[i]->is_renamed && >> (change == 0)) { >> + count++; /* not shown == room for one more */ >> continue; >> } >> fill_print_name(file); >> @@ -1292,6 +1294,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) >> if (max_change < change) >> max_change = change; >> } >> + count = i; /* min(count, data->nr) */ > > This value is mechanically min(count, data->nr) but it is confusing to > explain what it really means. It is the number of elements you have to > look at in data->files[] array, still skipping irrelevant entries in the > loop, until you show options->stat_count number of entries. As you explain further down, we may have to look at even more in order to be able to say "..." or not. > >> @@ -1306,7 +1309,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) >> else >> width = max_change; >> >> - for (i = 0; i < data->nr; i++) { >> + for (i = 0; i < count; i++) { > > Hence this loop is correct. You ignore everything beyond count in the > array, as their lengths are immaterial to the outcome. > I think I really should not worry too much about efficiency here and more about simple code. >> @@ -1373,6 +1376,19 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) >> show_graph(options->file, '-', del, del_c, reset); >> fprintf(options->file, "\n"); >> } >> + if (count < data->nr) >> + fprintf(options->file, "%s ...\n", line_prefix); > > At this point, count is the sum of the number of items you are going to > show and the number of items you are going to skip. If that is smaller > than the total in the original array, there are items that you are not > showing nor skipping. But you terminated the first loop early before > reaching the end of data->file[] array, so you do not know enough to say > "there _are_ more to show but I am not showing", I think. Sigh, you're right. We would have to look until we find the first non-show non-skipped in order to say that. > > If > > - data->nr is 3; > - the data->file[0] is to be shown but other entries in data->file[] are > not to be shown; and > - opt->stat_count is 1 > > then your first loop fills data->file[0]->name, increments i to 1, the > condition (i < count) no longer holds, and stops. At this point count is > 1, data->nr is 3, but data->file[1] and [2] are not to be shown anyway, so > this "there are more but I am not showing them" is not quite right, is it? > >> + for (i = count; i < data->nr; i++) { >> + uintmax_t added = data->files[i]->added; >> + uintmax_t deleted = data->files[i]->deleted; >> + if (!data->files[i]->is_renamed && >> + (added + deleted == 0)) { >> + total_files--; >> + continue; >> + } >> + adds += added; >> + dels += deleted; >> + } > > If this loop runs "adds += added" even once, then you have "there are more > but I am not showing" situation, and if all the entries beyond count are > skipped, then you are not limiting the output. > > By the way, in this round, you are trying to show only the first N items, > and if there are more to show, then showing "..." after them. I kind of > liked the idea that came up during the discussion in earlier rounds to > show N-1 then "..." then the last one item, but I do not feel too > strongly about it. I thought fmt-merge-msg was the example to follow. It does exactly what I'm trying here, i.e. show count items (if there are count or more), and show "..." if there are more. > diff.c | 6 ++++-- > t/t4049-diff-stat-count.sh | 25 +++++++++++++++++++++++++ > 2 files changed, 29 insertions(+), 2 deletions(-) > > diff --git a/diff.c b/diff.c > index 82789e3..542ff42 100644 > --- a/diff.c > +++ b/diff.c > @@ -1247,6 +1247,7 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) > int width, name_width, count; > const char *reset, *add_c, *del_c; > const char *line_prefix = ""; > + int extra_shown = 0; > struct strbuf *msg = NULL; > > if (data->nr == 0) > @@ -1376,8 +1377,6 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) > show_graph(options->file, '-', del, del_c, reset); > fprintf(options->file, "\n"); > } > - if (count < data->nr) > - fprintf(options->file, "%s ...\n", line_prefix); > for (i = count; i < data->nr; i++) { > uintmax_t added = data->files[i]->added; > uintmax_t deleted = data->files[i]->deleted; > @@ -1388,6 +1387,9 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options) > } > adds += added; > dels += deleted; > + if (!extra_shown) > + fprintf(options->file, "%s ...\n", line_prefix); > + extra_shown = 1; > } > fprintf(options->file, "%s", line_prefix); > fprintf(options->file, > diff --git a/t/t4049-diff-stat-count.sh b/t/t4049-diff-stat-count.sh > new file mode 100755 > index 0000000..641e70d > --- /dev/null > +++ b/t/t4049-diff-stat-count.sh > @@ -0,0 +1,25 @@ > +#!/bin/sh > +# Copyright (c) 2011, Google Inc. > + > +test_description='diff --stat-count' > +. ./test-lib.sh > + > +test_expect_success setup ' > + >a && > + >b && > + >c && > + >d && > + git add a b c d && > + chmod +x c d && > + echo a >a && > + echo b >b && > + cat >expect <<-\EOF > + a | 1 + > + b | 1 + > + 2 files changed, 2 insertions(+), 0 deletions(-) > + EOF > + git diff --stat --stat-count=2 >actual && > + test_cmp expect actual > +' > + > +test_done That works, thanks also for the test. Should I squash this in or go for a clearer use of "count"? (Also, I may have to take into account your notes about the workflow.) Michael