* [PATCH] diff: Make numstat machine friendly also for renames (and copies) @ 2007-12-10 22:32 Jakub Narebski 2007-12-10 22:55 ` [PATCH (amend)] " Jakub Narebski 2007-12-10 23:00 ` [PATCH] " Junio C Hamano 0 siblings, 2 replies; 12+ messages in thread From: Jakub Narebski @ 2007-12-10 22:32 UTC (permalink / raw) To: git "git diff --numstat" used the same format as "git diff --stat" for renamed (and copied) files, except that filenames were not shortened when they didn't fit in the column width. This format is suitable for human consumption, but it cannot be unambiguously parsed. Instead of that always use final file name ("to" name) for numstat. It is possible to find name before rename when name after is known. This required to use pprint_rename (pretty print rename) during output (in the show_stats function) and not during parsing (in diffstat_add function). Adding from_name field to struct diffstat_t makes is_renamed bitfield redundant; nevertheless for the sake of clarity, readability and making this patch minimal (and because it would not reduce memory footprint) it was not removed, and its used not replaced by checking from_name field. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This would be useful for gitweb, later. I hope I have made it in time before feature freeze... diff.c | 15 ++++++++++++--- 1 files changed, 12 insertions(+), 3 deletions(-) diff --git a/diff.c b/diff.c index f780e3e..38b9367 100644 --- a/diff.c +++ b/diff.c @@ -735,6 +735,7 @@ struct diffstat_t { int alloc; struct diffstat_file { char *name; + char *from_name; unsigned is_unmerged:1; unsigned is_binary:1; unsigned is_renamed:1; @@ -755,11 +756,14 @@ static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat, } diffstat->files[diffstat->nr++] = x; if (name_b) { - x->name = pprint_rename(name_a, name_b); + x->from_name = xstrdup(name_a); + x->name = xstrdup(name_b); x->is_renamed = 1; } - else + else { + x->from_name = NULL; x->name = xstrdup(name_a); + } return x; } @@ -837,7 +841,7 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) struct diffstat_file *file = data->files[i]; int change = file->added + file->deleted; - if (!file->is_renamed) { /* renames are already quoted by pprint_rename */ + if (!file->is_renamed) { /* renames will be quoted by pprint_rename */ struct strbuf buf; strbuf_init(&buf, 0); if (quote_c_style(file->name, &buf, NULL, 0)) { @@ -846,6 +850,11 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) } else { strbuf_release(&buf); } + } else { + char *qname = pprint_rename(file->from_name, file->name); + free(file->name); + free(file->from_name); + file->name = qname; } len = strlen(file->name); -- 1.5.3.7 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-10 22:32 [PATCH] diff: Make numstat machine friendly also for renames (and copies) Jakub Narebski @ 2007-12-10 22:55 ` Jakub Narebski 2007-12-11 1:06 ` Junio C Hamano 2007-12-10 23:00 ` [PATCH] " Junio C Hamano 1 sibling, 1 reply; 12+ messages in thread From: Jakub Narebski @ 2007-12-10 22:55 UTC (permalink / raw) To: git "git diff --numstat" used the same format as "git diff --stat" for renamed (and copied) files, except that filenames were not shortened when they didn't fit in the column width. This format is suitable for human consumption, but it cannot be unambiguously parsed. Instead of that always use final file name ("to" name) for numstat. It is possible to find name before rename when name after is known. This required to use pprint_rename (pretty print rename) during output (in the show_stats function) and not during parsing (in diffstat_add function). Adding from_name field to struct diffstat_t makes is_renamed bitfield redundant; nevertheless for the sake of clarity, readability and making this patch minimal (and because it would not reduce memory footprint) it was not removed, and its used not replaced by checking from_name field. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- Sorry for mistake: I have tested this commit, corrected it... and forgot to update patch to send. Previous version of this patch (from 7 May 2007) used instead of current only "to_name" format similar to git-diff-tree raw format for renames: added deleted TAB path for "src" TAB path for "dst" LF The problem was when -z option was used: how to separate end of record from end of from_name and start of to_name. For git-diff we have status to distinguish those; no such thing for numstat output. Previous version of patch used (or was to use actually, because of error in the code) added deleted TAB path for "src" NUL NUL path for "dst" NUL when -z option was used. This is left now for future --numstat-extended option... diff.c | 22 +++++++++++++--------- 1 files changed, 13 insertions(+), 9 deletions(-) diff --git a/diff.c b/diff.c index f780e3e..8039ac7 100644 --- a/diff.c +++ b/diff.c @@ -735,6 +735,7 @@ struct diffstat_t { int alloc; struct diffstat_file { char *name; + char *from_name; unsigned is_unmerged:1; unsigned is_binary:1; unsigned is_renamed:1; @@ -755,11 +756,14 @@ static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat, } diffstat->files[diffstat->nr++] = x; if (name_b) { - x->name = pprint_rename(name_a, name_b); + x->from_name = xstrdup(name_a); + x->name = xstrdup(name_b); x->is_renamed = 1; } - else + else { + x->from_name = NULL; x->name = xstrdup(name_a); + } return x; } @@ -837,7 +841,7 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) struct diffstat_file *file = data->files[i]; int change = file->added + file->deleted; - if (!file->is_renamed) { /* renames are already quoted by pprint_rename */ + if (!file->is_renamed) { /* renames will be quoted by pprint_rename */ struct strbuf buf; strbuf_init(&buf, 0); if (quote_c_style(file->name, &buf, NULL, 0)) { @@ -846,6 +850,11 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) } else { strbuf_release(&buf); } + } else { + char *qname = pprint_rename(file->from_name, file->name); + free(file->name); + free(file->from_name); + file->name = qname; } len = strlen(file->name); @@ -982,12 +991,7 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options) printf("-\t-\t"); else printf("%d\t%d\t", file->added, file->deleted); - if (!file->is_renamed) { - write_name_quoted(file->name, stdout, options->line_termination); - } else { - fputs(file->name, stdout); - putchar(options->line_termination); - } + write_name_quoted(file->name, stdout, options->line_termination); } } -- 1.5.3.7 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-10 22:55 ` [PATCH (amend)] " Jakub Narebski @ 2007-12-11 1:06 ` Junio C Hamano 2007-12-11 1:26 ` Jakub Narebski 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2007-12-11 1:06 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > Previous version of this patch (from 7 May 2007) used instead of current > only "to_name" format similar to git-diff-tree raw format for renames: > > added deleted TAB path for "src" TAB path for "dst" LF > > The problem was when -z option was used: how to separate end of record > from end of from_name and start of to_name. For git-diff we have status > to distinguish those; no such thing for numstat output. Previous version > of patch used (or was to use actually, because of error in the code) > > added deleted TAB path for "src" NUL NUL path for "dst" NUL > > when -z option was used. I think the cleanest at this point is to have --numstat-enhanced that shows <added> <deleted> <status> <path1> <added> <deleted> <status> <path1> <path2> Anything else would be a regression. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-11 1:06 ` Junio C Hamano @ 2007-12-11 1:26 ` Jakub Narebski 2007-12-11 2:50 ` Junio C Hamano 0 siblings, 1 reply; 12+ messages in thread From: Jakub Narebski @ 2007-12-11 1:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > > > Previous version of this patch (from 7 May 2007) used instead of current > > only "to_name" format similar to git-diff-tree raw format for renames: > > > > added deleted TAB path for "src" TAB path for "dst" LF > > > > The problem was when -z option was used: how to separate end of record > > from end of from_name and start of to_name. For git-diff we have status > > to distinguish those; no such thing for numstat output. Previous version > > of patch used (or was to use actually, because of error in the code) > > > > added deleted TAB path for "src" NUL NUL path for "dst" NUL > > > > when -z option was used. > > I think the cleanest at this point is to have --numstat-enhanced that > shows > > <added> <deleted> <status> <path1> > <added> <deleted> <status> <path1> <path2> > > Anything else would be a regression. That is the plan[*1*]. Nevertheless always using destination filename for "ordinary" numstat is a step in good direction. I don't think that would break _any_ scripts (as previous version was not good to be parsed by a machine); I think it is even more probable that old version _broke_ scripts if -M / -C was provided. [*1*] When I (or somebody else) find time for that. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-11 1:26 ` Jakub Narebski @ 2007-12-11 2:50 ` Junio C Hamano 2007-12-11 23:09 ` Jakub Narebski 0 siblings, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2007-12-11 2:50 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > Junio C Hamano wrote: >> Jakub Narebski <jnareb@gmail.com> writes: >> >> > Previous version of this patch (from 7 May 2007) used instead of current >> > only "to_name" format similar to git-diff-tree raw format for renames: >> > >> > added deleted TAB path for "src" TAB path for "dst" LF That's perfectly fine (that is without -z). >> > ... Previous version >> > of patch used (or was to use actually, because of error in the code) >> > >> > added deleted TAB path for "src" NUL NUL path for "dst" NUL >> > >> > when -z option was used. I think your "previous" one: <added> <deleted> <src> NUL (no rename) <added> <deleted> <src> NUL NUL <dst> NUL (with rename) would be unambiguously parsable, but it would be cleaner and easier for the parser if the latter were like this instead: <added> <deleted> NUL <src> NUL <dst> NUL (with rename) The reader can expect to find a NUL terminated src, finds the length is zero and notices it cannot be src and then reads two paths from that point. Without having the status letter, we cannot do "optional two paths" without breaking existing --numstat -z readers that do not care about renames and copies, and I agree that existing --numstat -z readers that pass -M or -C are already broken, so I think a format extension along the above line (your "previous" and the above clean-up proposal have the same power of expressiveness, I just think the latter is syntactically cleaner) would be reasonable. And at that point we probably would not need --numstat-enhanced at all ;-) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-11 2:50 ` Junio C Hamano @ 2007-12-11 23:09 ` Jakub Narebski 2007-12-12 7:46 ` Junio C Hamano 2007-12-12 10:31 ` Jakub Narebski 0 siblings, 2 replies; 12+ messages in thread From: Jakub Narebski @ 2007-12-11 23:09 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Wed, 11 Dec 2007, Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: >> Junio C Hamano wrote: >>> Jakub Narebski <jnareb@gmail.com> writes: >>> >>>> Previous version of this patch (from 7 May 2007) used instead of current >>>> only "to_name" format similar to git-diff-tree raw format for renames: >>>> >>>> added deleted TAB path for "src" TAB path for "dst" LF > > That's perfectly fine (that is without -z). > >>>> ... Previous version >>>> of patch used (or was to use actually, because of error in the code) >>>> >>>> added deleted TAB path for "src" NUL NUL path for "dst" NUL >>>> >>>> when -z option was used. > > I think your "previous" one: > > <added> <deleted> <src> NUL (no rename) > <added> <deleted> <src> NUL NUL <dst> NUL (with rename) > > would be unambiguously parsable, but it would be cleaner and easier for > the parser if the latter were like this instead: > > <added> <deleted> NUL <src> NUL <dst> NUL (with rename) > > The reader can expect to find a NUL terminated src, finds the length is > zero and notices it cannot be src and then reads two paths from that > point. Very good idea. > Without having the status letter, we cannot do "optional two paths" > without breaking existing --numstat -z readers that do not care about > renames and copies, and I agree that existing --numstat -z readers that > pass -M or -C are already broken, so I think a format extension along > the above line (your "previous" and the above clean-up proposal have the > same power of expressiveness, I just think the latter is syntactically > cleaner) would be reasonable. And at that point we probably would not > need --numstat-enhanced at all ;-) The previous patch is a fix (usability fix) for numstat output in the presence of rename detection, making it truly machine friendly. Moreover it should not break any scripts which parsed numstat output not expecting to deal with renames (for example if repository has diff.renames config option set), and might even fix them. The proposed - and implemented in patch below - change could break some scripts not expecting to deal with new numstat output. Adding status letter (with similarity/dissimilarity percentage info) would unfortunately require greater surgery... We can either allow scripts (do there exist any?) to break, or make the full rename info shown only for --numstat-extended; but I'd like to have this status letter for --numstat-extended / --numstat-enhanced. So possible solution are: (a) Accept both patches, and accept remote possibility that some scripts might break (b) Add --numstat-extended option whoich would show rename info as implemented in this patch (and of course accept previous) (c) Implement --numstat-extended with status letter, as suggested, accept first patch but not this one. Note that gitweb would require --numstat with proper rename detection support if we want to e.g. provide graphical diffstat in 'commit' view for merge commits. P.S. The numstat output format for renames should be probably described in documentation, and not only in commit message, but I was not sure where to put it (and even how it should be written). -- >8 -- From: Jakub Narebski <jnareb@gmail.com> Date: Tue, 11 Dec 2007 23:52:18 +0100 Subject: [RFC/PATCH] diff: Show rename info in numstat output, correctly Make numstat output of git-diff show both source and destination filename for renames (and copies) in easily parseable format, similar to raw diff output format. The numstat format for rename (or copy) is now <added> <deleted> TAB <path for "src"> TAB <path for "dst"> LF or if -z option is used <added> <deleted> TAB NULL <path for "src"> NULL <path for "dst"> NULL where <added> and <deleted> is number of added/deleted lines (or '-' for binary file). When -z option is not used, TAB, LF, and backslash characters in pathnames are represented as \t, \n, and \\, respectively. Idea for -z numstat rename format by Junio C Hamano. Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- diff.c | 9 ++++++++- 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/diff.c b/diff.c index 8039ac7..56cbf28 100644 --- a/diff.c +++ b/diff.c @@ -991,7 +991,14 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options) printf("-\t-\t"); else printf("%d\t%d\t", file->added, file->deleted); - write_name_quoted(file->name, stdout, options->line_termination); + if (file->is_renamed) { + char sep = options->line_termination ? '\t' : '\0'; + if (!options->line_termination) + putchar(options->line_termination); + write_name_quoted(file->from_name, stdout, sep); + write_name_quoted(file->name, stdout, options->line_termination); + } else + write_name_quoted(file->name, stdout, options->line_termination); } } -- 1.5.3.7 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-11 23:09 ` Jakub Narebski @ 2007-12-12 7:46 ` Junio C Hamano 2007-12-12 10:21 ` Jakub Narebski 2007-12-12 10:31 ` Jakub Narebski 1 sibling, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2007-12-12 7:46 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > The previous patch is a fix (usability fix) for numstat output in the > presence of rename detection, making it truly machine friendly. Moreover > it should not break any scripts which parsed numstat output not > expecting to deal with renames (for example if repository has > diff.renames config option set), and might even fix them. > > The proposed - and implemented in patch below - change could break some > scripts not expecting to deal with new numstat output. I do not quite follow. If a script wanted -M and/or -C output in the older "common/{a => b}/common" format, either change would break it equally badly, either with or without -z. And I do not think adding status letter is necessary for --numstat; if the caller wants that information, it can ask for it explicitly with both options. So I do not think there is anything more to solve, than the patch you just sent. The patch looks good. Thanks. > P.S. The numstat output format for renames should be probably described > in documentation, and not only in commit message, but I was not sure > where to put it (and even how it should be written). I started writing this, and then reviewed the result of squashing your two patches. Although everybody complains that most of git is run-once-and-exit, I wrote the original diff part fairly conservatively to make it leak-free, because of a single command, "diff-tree --stdin", that can be told to run millions of diffs inside a single process. But the diffstat part is horribly leaky, especially after your patch, and it has ugly workarounds such as refusing to show both diffstat and shortstat at the same time, not because it does not make much sense (admittedly it doesn't), but because show_stats() discards necessary information when it is done, making it impossible for shortstat to run. So I ended up restructuring the name allocation/free code a bit while at it. -- >8 -- diff --numstat -z: make it machine readable The "-z" format is all about machine parsability, but showing renamed paths as "common/{a => b}/suffix" makes it impossible. The scripts would never have successfully parsed "--numstat -z -M" in the old format. This fixes the output format in a (hopefully minimally) backward incompatible way. * The output without -z is not changed. This has given a good way for humans to view added and deleted lines separately, and showing the path in combined, shorter way would preserve readability. * The output with -z is unchanged for paths that do not involve renames. Existing scripts that do not pass -M/-C are not affected at all. * The output with -z for a renamed path is shown in a format that can easily be distinguished from an unrenamed path. This is based on Jakub Narebski's patch. Bugs and documentation typos are mine. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/diff-format.txt | 61 +++++++++++++++++++++++++ diff.c | 100 ++++++++++++++++++++++++++++------------- 2 files changed, 129 insertions(+), 32 deletions(-) diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt index 2c3a4c4..400cbb3 100644 --- a/Documentation/diff-format.txt +++ b/Documentation/diff-format.txt @@ -84,3 +84,64 @@ all parents. include::diff-generate-patch.txt[] + + +other diff formats +------------------ + +The `--summary` option describes newly added, deleted, renamed and +copied files. The `--stat` option adds diffstat(1) graph to the +output. These options can be combined with other options, such as +`-p`, and are meant for human consumption. + +When showing a change that involves a rename or a copy, `--stat` output +formats the pathnames compactly by combining common prefix and suffix of +the pathnames. For example, a change that moves `arch/i386/Makefile` to +`arch/x86/Makefile` while modifying 4 lines will be shown like this: + +------------------------------------ +arch/{i386 => x86}/Makefile | 4 +-- +------------------------------------ + +The `--numstat` option gives the diffstat(1) information but is designed +for easier machine consumption. An entry in `--numstat` output looks +like this: + +---------------------------------------- +1 2 README +3 1 arch/{i386 => x86}/Makefile +---------------------------------------- + +That is, from left to right: + +. the number of added lines; +. a tab; +. the number of deleted lines; +. a tab; +. pathname (possibly with rename/copy information); +. a newline. + +When `-z` output option is in effect, the output is formatted this way: + +---------------------------------------- +1 2 README NUL +3 1 NUL arch/i386/Makefile NUL arch/x86/Makefile NUL +---------------------------------------- + +That is: + +. the number of added lines; +. a tab; +. the number of deleted lines; +. a tab; +. a NUL (only exists if renamed/copied); +. pathname in preimage; +. a NUL (only exists if renamed/copied); +. pathname in postimage (only exists if renamed/copied); +. a NUL. + +The extra `NUL` before the preimage path in renamed case is to allow +scripts that read the output to tell if the current record being read is +a single-path record or a rename/copy record without reading ahead. +After reading added and deleted lines, reading up to `NUL` would yield +the pathname, but if that is `NUL`, the record will show two paths. diff --git a/diff.c b/diff.c index f780e3e..fc51506 100644 --- a/diff.c +++ b/diff.c @@ -734,7 +734,9 @@ struct diffstat_t { int nr; int alloc; struct diffstat_file { + char *from_name; char *name; + char *print_name; unsigned is_unmerged:1; unsigned is_binary:1; unsigned is_renamed:1; @@ -755,11 +757,14 @@ static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat, } diffstat->files[diffstat->nr++] = x; if (name_b) { - x->name = pprint_rename(name_a, name_b); + x->from_name = xstrdup(name_a); + x->name = xstrdup(name_b); x->is_renamed = 1; } - else + else { + x->from_name = NULL; x->name = xstrdup(name_a); + } return x; } @@ -803,6 +808,28 @@ static void show_graph(char ch, int cnt, const char *set, const char *reset) printf("%s", reset); } +static void fill_print_name(struct diffstat_file *file) +{ + char *pname; + + if (file->print_name) + return; + + if (!file->is_renamed) { + struct strbuf buf; + strbuf_init(&buf, 0); + if (quote_c_style(file->name, &buf, NULL, 0)) { + pname = strbuf_detach(&buf, NULL); + } else { + pname = file->name; + strbuf_release(&buf); + } + } else { + pname = pprint_rename(file->from_name, file->name); + } + file->print_name = pname; +} + static void show_stats(struct diffstat_t* data, struct diff_options *options) { int i, len, add, del, total, adds = 0, dels = 0; @@ -836,19 +863,8 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) for (i = 0; i < data->nr; i++) { struct diffstat_file *file = data->files[i]; int change = file->added + file->deleted; - - if (!file->is_renamed) { /* renames are already quoted by pprint_rename */ - struct strbuf buf; - strbuf_init(&buf, 0); - if (quote_c_style(file->name, &buf, NULL, 0)) { - free(file->name); - file->name = strbuf_detach(&buf, NULL); - } else { - strbuf_release(&buf); - } - } - - len = strlen(file->name); + fill_print_name(file); + len = strlen(file->print_name); if (max_len < len) max_len = len; @@ -873,7 +889,7 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) for (i = 0; i < data->nr; i++) { const char *prefix = ""; - char *name = data->files[i]->name; + char *name = data->files[i]->print_name; int added = data->files[i]->added; int deleted = data->files[i]->deleted; int name_len; @@ -901,17 +917,17 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) printf("%s%d%s", add_c, added, reset); printf(" bytes"); printf("\n"); - goto free_diffstat_file; + continue; } else if (data->files[i]->is_unmerged) { show_name(prefix, name, len, reset, set); printf(" Unmerged\n"); - goto free_diffstat_file; + continue; } else if (!data->files[i]->is_renamed && (added + deleted == 0)) { total_files--; - goto free_diffstat_file; + continue; } /* @@ -933,11 +949,7 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options) show_graph('+', add, add_c, reset); show_graph('-', del, del_c, reset); putchar('\n'); - free_diffstat_file: - free(data->files[i]->name); - free(data->files[i]); } - free(data->files); printf("%s %d files changed, %d insertions(+), %d deletions(-)%s\n", set, total_files, adds, dels, reset); } @@ -962,11 +974,7 @@ static void show_shortstats(struct diffstat_t* data) dels += deleted; } } - free(data->files[i]->name); - free(data->files[i]); } - free(data->files); - printf(" %d files changed, %d insertions(+), %d deletions(-)\n", total_files, adds, dels); } @@ -975,6 +983,9 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options) { int i; + if (data->nr == 0) + return; + for (i = 0; i < data->nr; i++) { struct diffstat_file *file = data->files[i]; @@ -982,15 +993,39 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options) printf("-\t-\t"); else printf("%d\t%d\t", file->added, file->deleted); - if (!file->is_renamed) { - write_name_quoted(file->name, stdout, options->line_termination); + if (options->line_termination) { + fill_print_name(file); + if (!file->is_renamed) + write_name_quoted(file->name, stdout, + options->line_termination); + else { + fputs(file->print_name, stdout); + putchar(options->line_termination); + } } else { - fputs(file->name, stdout); - putchar(options->line_termination); + if (file->is_renamed) { + putchar('\0'); + write_name_quoted(file->from_name, stdout, '\0'); + } + write_name_quoted(file->name, stdout, '\0'); } } } +static void free_diffstat_info(struct diffstat_t *diffstat) +{ + int i; + for (i = 0; i < diffstat->nr; i++) { + struct diffstat_file *f = diffstat->files[i]; + if (f->name != f->print_name) + free(f->print_name); + free(f->name); + free(f->from_name); + free(f); + } + free(diffstat->files); +} + struct checkdiff_t { struct xdiff_emit_state xm; const char *filename; @@ -2943,8 +2978,9 @@ void diff_flush(struct diff_options *options) show_numstat(&diffstat, options); if (output_format & DIFF_FORMAT_DIFFSTAT) show_stats(&diffstat, options); - else if (output_format & DIFF_FORMAT_SHORTSTAT) + if (output_format & DIFF_FORMAT_SHORTSTAT) show_shortstats(&diffstat); + free_diffstat_info(&diffstat); separator++; } ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-12 7:46 ` Junio C Hamano @ 2007-12-12 10:21 ` Jakub Narebski 0 siblings, 0 replies; 12+ messages in thread From: Jakub Narebski @ 2007-12-12 10:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > > > The previous patch is a fix (usability fix) for numstat output in the > > presence of rename detection, making it truly machine friendly. Moreover > > it should not break any scripts which parsed numstat output not > > expecting to deal with renames (for example if repository has > > diff.renames config option set), and might even fix them. > > > > The proposed - and implemented in patch below - change could break some > > scripts not expecting to deal with new numstat output. > > I do not quite follow. If a script wanted -M and/or -C output in the > older "common/{a => b}/common" format, either change would break it > equally badly, either with or without -z. And I do not think adding > status letter is necessary for --numstat; if the caller wants that > information, it can ask for it explicitly with both options. I meant it there that if an [older] script parsing numstat output didn't expect to deal with renames (but had to for example because of diff.renames set), and expected the numstat output format to be either <add> TAB <del> TAB <filename_q> LF or (if running with -z option) <add> TAB <del> TAB <filename> NUL After first patch, but before second, those expectations were met, that I meant by "and might even fix them" (the scripts). BTW. beside the fact that for renames it was not filename (pathname), it could be half-quoted; see also below. > So I do not think there is anything more to solve, than the patch you > just sent. > > The patch looks good. Thanks. > > > P.S. The numstat output format for renames should be probably described > > in documentation, and not only in commit message, but I was not sure > > where to put it (and even how it should be written). > > I started writing this, and then reviewed the result of squashing your > two patches. And because of the above I wanted that you'd rather _not_ squash those patches, but apply them separately... > Although everybody complains that most of git is run-once-and-exit, I > wrote the original diff part fairly conservatively to make it leak-free, > because of a single command, "diff-tree --stdin", that can be told to > run millions of diffs inside a single process. > > But the diffstat part is horribly leaky, especially after your patch, > and it has ugly workarounds such as refusing to show both diffstat and > shortstat at the same time, not because it does not make much sense > (admittedly it doesn't), but because show_stats() discards necessary > information when it is done, making it impossible for shortstat to run. > > So I ended up restructuring the name allocation/free code a bit while at > it. ...but as you did more surgery on diffstat, it is reasonable to do it in one commit, not two. > -- >8 -- > diff --numstat -z: make it machine readable > > The "-z" format is all about machine parsability, but showing renamed > paths as "common/{a => b}/suffix" makes it impossible. The scripts would > never have successfully parsed "--numstat -z -M" in the old format. > > This fixes the output format in a (hopefully minimally) backward > incompatible way. > > * The output without -z is not changed. This has given a good way for > humans to view added and deleted lines separately, and showing the > path in combined, shorter way would preserve readability. > > * The output with -z is unchanged for paths that do not involve renames. > Existing scripts that do not pass -M/-C are not affected at all. > > * The output with -z for a renamed path is shown in a format that can > easily be distinguished from an unrenamed path. This changes invariant what we had, that output with and without -z differ only in [filename] quoting and record separator. Now those two have different semantic: --numstat without -z is for easier reading, --numstat with -z is for machine consumption. But as output with and without -z has to differ (in the form) already for renames, why not... [...] > include::diff-generate-patch.txt[] > + > + > +other diff formats > +------------------ > + > +The `--summary` option describes newly added, deleted, renamed and > +copied files. The `--stat` option adds diffstat(1) graph to the > +output. These options can be combined with other options, such as > +`-p`, and are meant for human consumption. > + > +When showing a change that involves a rename or a copy, `--stat` output > +formats the pathnames compactly by combining common prefix and suffix of > +the pathnames. For example, a change that moves `arch/i386/Makefile` to > +`arch/x86/Makefile` while modifying 4 lines will be shown like this: Perhaps it could be added there word about "table-like format", and about column widths (total width, filename [column] width). > + > +------------------------------------ > +arch/{i386 => x86}/Makefile | 4 +-- I would add there example of shortened filename there, for example +.../SubmittingPatches | 2 + Do I understand correctly that code first tries to strip directories, and only if the filename is too long it does shorten filename? By the way the combining common prefix and suffix (pprint_rename) doesn't play well with quoting: when only one of the source and destination names are quoted it wouldn't of course find any common prefix/suffix. I think that if one of filenames has to be quoted, both should be. And shortening of filenames together with either combining common prefix, and/or with filename quoting produces output with might be a bit strange for some. But I don't have idea how (and if) it could be corrected. > +------------------------------------ > + > +The `--numstat` option gives the diffstat(1) information but is designed > +for easier machine consumption. An entry in `--numstat` output looks > +like this: > + > +---------------------------------------- > +1 2 README > +3 1 arch/{i386 => x86}/Makefile > +---------------------------------------- Note that filename is not shortened in --numstat (as it can be in --stat). > + > +When `-z` output option is in effect, the output is formatted this way: > + > +---------------------------------------- > +1 2 README NUL > +3 1 NUL arch/i386/Makefile NUL arch/x86/Makefile NUL > +---------------------------------------- I wanted to say that it would make it harder on "line"-based parsers... but I have realized that it would not, as one can simply read next two "lines" (records) for pre-image and post-image filename if 0-length filename is detected. [...] > +static void fill_print_name(struct diffstat_file *file) Nice idea to separate this into function. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-11 23:09 ` Jakub Narebski 2007-12-12 7:46 ` Junio C Hamano @ 2007-12-12 10:31 ` Jakub Narebski 2007-12-12 19:07 ` Junio C Hamano 1 sibling, 1 reply; 12+ messages in thread From: Jakub Narebski @ 2007-12-12 10:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: git BTW. I have noticed something strange with current (after my two patches) diffstat code: 3606:[gitweb/web@git]# ./git diff-tree -C -C -r --stat gitweb/test~8 0456a2ba58efb0e1d5f7421d5a8a2278e3b15ebc .../test/file with spaces\tand\ttabs" | 4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) 3607:[gitweb/web@git]# ./git diff-tree -C -C -r --numstat gitweb/test~8 0456a2ba58efb0e1d5f7421d5a8a2278e3b15ebc 3 1 gitweb/test/file with spaces "gitweb/test/file with spaces\tand\ttabs" but: 3603:[gitweb/web@git]# ./git diff-tree -C -C -r --stat gitweb/test~6 cfb9ef9ec73b37f44e9370c4f5d91e01d46ec6e4 gitweb/test/Documentation-symlink1 | 1 + gitweb/test/Documentation-symlink2 | 1 + .../{git-shortlog.8.html => git-shortlog.html} | 0 3 files changed, 2 insertions(+), 0 deletions(-) So rename detection works. Strange. P.S. gitweb/test branch can be found on http://repo.or.cz/w/git/jnareb-git.git -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies) 2007-12-12 10:31 ` Jakub Narebski @ 2007-12-12 19:07 ` Junio C Hamano 0 siblings, 0 replies; 12+ messages in thread From: Junio C Hamano @ 2007-12-12 19:07 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > BTW. I have noticed something strange with current (after my two > patches) diffstat code: > > 3606:[gitweb/web@git]# ./git diff-tree -C -C -r --stat gitweb/test~8 > 0456a2ba58efb0e1d5f7421d5a8a2278e3b15ebc > .../test/file with spaces\tand\ttabs" | 4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > 3607:[gitweb/web@git]# ./git diff-tree -C -C -r --numstat gitweb/test~8 > 0456a2ba58efb0e1d5f7421d5a8a2278e3b15ebc > 3 1 gitweb/test/file with spaces "gitweb/test/file with spaces\tand\ttabs" If you are wondering about the lack of => in --stat when --numstat is showing rename, there is nothing strange going on. "filename scaling" done in --stat happens way later than pprint_rename() in the current code structure (essentially, it chomps at the last slash to keep the long "path" fit within the given space). --stat (not --numstat) is for human consumption and showing longer part of the name of postimage is more important if we do not have enough room, and most tools that enable rename use --summary with --stat anyway, so the rename information for such an oddball long path can be found out elsewhere in the output if needed. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] diff: Make numstat machine friendly also for renames (and copies) 2007-12-10 22:32 [PATCH] diff: Make numstat machine friendly also for renames (and copies) Jakub Narebski 2007-12-10 22:55 ` [PATCH (amend)] " Jakub Narebski @ 2007-12-10 23:00 ` Junio C Hamano 2007-12-10 23:14 ` Jakub Narebski 1 sibling, 1 reply; 12+ messages in thread From: Junio C Hamano @ 2007-12-10 23:00 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > "git diff --numstat" used the same format as "git diff --stat" for > renamed (and copied) files, except that filenames were not shortened > when they didn't fit in the column width. This format is suitable for > human consumption, but it cannot be unambiguously parsed. Agreed about the (un)parsability, and --numstat is all about parsability so I would not object. A fix is really needed there. I do not have time to look at the patch right now, but if the changed output is in line with what --name-status would show, that would be great. I'd call that "the format that should have been from day one". I.e. no '=>' rename marker, but show two names c-quoted (unless -z is used) and separated with inter_name_termination). IIRC, that is how rename/copy is shown with --name-status. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] diff: Make numstat machine friendly also for renames (and copies) 2007-12-10 23:00 ` [PATCH] " Junio C Hamano @ 2007-12-10 23:14 ` Jakub Narebski 0 siblings, 0 replies; 12+ messages in thread From: Jakub Narebski @ 2007-12-10 23:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > > > "git diff --numstat" used the same format as "git diff --stat" for > > renamed (and copied) files, except that filenames were not shortened > > when they didn't fit in the column width. This format is suitable for > > human consumption, but it cannot be unambiguously parsed. > > Agreed about the (un)parsability, and --numstat is all about parsability > so I would not object. A fix is really needed there. > > I do not have time to look at the patch right now, but if the changed > output is in line with what --name-status would show, that would be > great. I'd call that "the format that should have been from day one". > > I.e. no '=>' rename marker, but show two names c-quoted (unless -z is > used) and separated with inter_name_termination). IIRC, that is how > rename/copy is shown with --name-status. Unfortunately this is not possible, at least if we want to retain the assertion that -z output looks like normal output, only without quoting. diff --name-status has _status_ field which can be used to distinguish if the NUL (for -z output) is the end of source filename, or the end of record. The patch send changes --numstat to use only _destination_ name. What you want I'd left for futore --numstat-extended (basically --numstat, but with status field. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-12-12 19:09 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-12-10 22:32 [PATCH] diff: Make numstat machine friendly also for renames (and copies) Jakub Narebski 2007-12-10 22:55 ` [PATCH (amend)] " Jakub Narebski 2007-12-11 1:06 ` Junio C Hamano 2007-12-11 1:26 ` Jakub Narebski 2007-12-11 2:50 ` Junio C Hamano 2007-12-11 23:09 ` Jakub Narebski 2007-12-12 7:46 ` Junio C Hamano 2007-12-12 10:21 ` Jakub Narebski 2007-12-12 10:31 ` Jakub Narebski 2007-12-12 19:07 ` Junio C Hamano 2007-12-10 23:00 ` [PATCH] " Junio C Hamano 2007-12-10 23:14 ` Jakub Narebski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).