* [PATCH/RFC] diff: Make numstat machine friendly also for renames
@ 2007-05-08 0:43 Jakub Narebski
2007-05-08 1:10 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narebski @ 2007-05-08 0:43 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski
Instead of saving human readable rename information in the 'name'
field when diffstat info is generated, do it when writing --stat
output. Change --numstat output to be machine friendly.
This makes result of git-diff --numstat more suitable for machines
also when renames are involved, by using format similar to the one for
renames in the raw diff format, instead of the format more suited for
humans.
The numstat format for rename is now
added deleted TAB path for "src" TAB path for "dst" LF
or if -z option is used
added deleted TAB path for "src" NUL NUL path for "dst" NUL
When -z option is not used, ", TAB, LF, and backslash characters in
pathnames are represented as \", \t, \n, and \\, respectively. If any
character needs to be quoted then pathnames are enclosed in double
quotes.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This change increases memory footprint a bit, as struct diffstat_file
is wider by sizeof(char *) wide field, which is NULL except for
renames.
I have thought about storing it using one pointer to fragment of
memory of the form "dst name \0 src name \0", but this trades a little
memory for CPU time.
The goal of this change is to make it possible to generate HTML
diffstat against first parent for merge commits in gitweb. The current
notation for renames, which looks for example like below:
t/{t6030-bisect-run.sh => t6030-bisect-porcelain.sh}
is not easy to parse by machines (note that filename may contain
"=>"), but easy to understand _usually_ by humans.
P.S. By the way, what is the difference between quote_one and
quote_c_style, i.e. when one calls one and when the other?
diff.c | 22 ++++++++++++++++++----
1 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/diff.c b/diff.c
index 7bbe759..568c59b 100644
--- a/diff.c
+++ b/diff.c
@@ -703,6 +703,7 @@ struct diffstat_t {
int alloc;
struct diffstat_file {
char *name;
+ char *from_name;
unsigned is_unmerged:1;
unsigned is_binary:1;
unsigned is_renamed:1;
@@ -722,12 +723,13 @@ static struct diffstat_file *diffstat_add(struct diffstat_t *diffstat,
diffstat->alloc * sizeof(x));
}
diffstat->files[diffstat->nr++] = x;
+ x->name = xstrdup(name_a);
if (name_b) {
- x->name = pprint_rename(name_a, name_b);
+ x->from_name = xstrdup(name_b);
x->is_renamed = 1;
}
else
- x->name = xstrdup(name_a);
+ x->from_name = NULL;
return x;
}
@@ -805,7 +807,7 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options)
struct diffstat_file *file = data->files[i];
int change = file->added + file->deleted;
- if (!file->is_renamed) { /* renames are already quoted by pprint_rename */
+ if (!file->is_renamed) { /* renames are quoted by pprint_rename */
len = quote_c_style(file->name, NULL, NULL, 0);
if (len) {
char *qname = xmalloc(len + 1);
@@ -813,6 +815,10 @@ static void show_stats(struct diffstat_t* data, struct diff_options *options)
free(file->name);
file->name = qname;
}
+ } else {
+ char *qname = pprint_rename(file->name, file->from_name);
+ free(file->name);
+ file->name = qname;
}
len = strlen(file->name);
@@ -949,11 +955,19 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options)
printf("-\t-\t");
else
printf("%d\t%d\t", file->added, file->deleted);
- if (options->line_termination && !file->is_renamed &&
+ if (options->line_termination &&
quote_c_style(file->name, NULL, NULL, 0))
quote_c_style(file->name, NULL, stdout, 0);
else
fputs(file->name, stdout);
+ if (file->is_renamed) {
+ printf("%s", options->line_termination ? "\t" : "\0\0");
+ if (options->line_termination &&
+ quote_c_style(file->from_name, NULL, NULL, 0))
+ quote_c_style(file->from_name, NULL, stdout, 0);
+ else
+ fputs(file->from_name, stdout);
+ }
putchar(options->line_termination);
}
}
--
1.5.1.3
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH/RFC] diff: Make numstat machine friendly also for renames
2007-05-08 0:43 [PATCH/RFC] diff: Make numstat machine friendly also for renames Jakub Narebski
@ 2007-05-08 1:10 ` Junio C Hamano
2007-05-08 1:45 ` Jakub Narebski
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2007-05-08 1:10 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
> Instead of saving human readable rename information in the 'name'
> field when diffstat info is generated, do it when writing --stat
> output. Change --numstat output to be machine friendly.
>
> This makes result of git-diff --numstat more suitable for machines
> also when renames are involved, by using format similar to the one for
> renames in the raw diff format, instead of the format more suited for
> humans.
>
> The numstat format for rename is now
>
> added deleted TAB path for "src" TAB path for "dst" LF
>
> or if -z option is used
>
> added deleted TAB path for "src" NUL NUL path for "dst" NUL
Why two NULs?
There are already a handful in-tree users of --numstat, and also
a few tests scripts. I think you would need to adjust them.
> The goal of this change is to make it possible to generate HTML
> diffstat against first parent for merge commits in gitweb. The current
> notation for renames, which looks for example like below:
>
> t/{t6030-bisect-run.sh => t6030-bisect-porcelain.sh}
I do not have much objection against teaching --numstat to show
the preimage pathnames. I do not disagree with "the goal" of
showing "git diff --stat -M $commit^1 $commit" even for merge
commit.
But I do not see the connection between the two. Why aren't you
parsing --summary?
Have you actually _tested_ your patch?
> @@ -949,11 +955,19 @@ static void show_numstat(struct diffstat_t* data,
> printf("-\t-\t");
> else
> printf("%d\t%d\t", file->added, file->deleted);
> - if (options->line_termination && !file->is_renamed &&
> + if (options->line_termination &&
> quote_c_style(file->name, NULL, NULL, 0))
> quote_c_style(file->name, NULL, stdout, 0);
> else
> fputs(file->name, stdout);
> + if (file->is_renamed) {
> + printf("%s", options->line_termination ? "\t" : "\0\0");
I know you marked it as RFC; but it is impolite to request
comments from other people on a patch that does not do what you
intended to do, without marking "this is untested". It would
waste people's time.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH/RFC] diff: Make numstat machine friendly also for renames
2007-05-08 1:10 ` Junio C Hamano
@ 2007-05-08 1:45 ` Jakub Narebski
2007-05-08 1:58 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narebski @ 2007-05-08 1:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>> Instead of saving human readable rename information in the 'name'
>> field when diffstat info is generated, do it when writing --stat
>> output. Change --numstat output to be machine friendly.
>>
>> This makes result of git-diff --numstat more suitable for machines
>> also when renames are involved, by using format similar to the one for
>> renames in the raw diff format, instead of the format more suited for
>> humans.
>>
>> The numstat format for rename is now
>>
>> added deleted TAB path for "src" TAB path for "dst" LF
>>
>> or if -z option is used
>>
>> added deleted TAB path for "src" NUL NUL path for "dst" NUL
>
> Why two NULs?
That was the only way I could think of to separate pre-image name
from posi-image name for renames. Note that file name might look like
(part of) diffstat line, and there is no 'status' field in the
numstat to mark rename (as there is in "git diff-tree --raw" output).
But it doesn't mean that this is the only way, that is why it is RFC.
Well that and the fact that this patch increases slightly memory
footprint.
> There are already a handful in-tree users of --numstat, and also
> a few tests scripts. I think you would need to adjust them.
Right, I forgot to run "make test" to check which tests scripts
would need adjusting.
But result of "git grep -e -M --and -e --numstat -- t/" is empty,
so I don't think that any script test --numstat with rename detection.
>> The goal of this change is to make it possible to generate HTML
>> diffstat against first parent for merge commits in gitweb. The current
>> notation for renames, which looks for example like below:
>>
>> t/{t6030-bisect-run.sh => t6030-bisect-porcelain.sh}
>
> I do not have much objection against teaching --numstat to show
> the preimage pathnames. I do not disagree with "the goal" of
> showing "git diff --stat -M $commit^1 $commit" even for merge
> commit.
>
> But I do not see the connection between the two. Why aren't you
> parsing --summary?
Did you mean --stat here? Because
--summary::
Output a condensed summary of extended header information
such as creations, renames and mode changes.
I'd like to have diffstat for merge, similar to what "git pull <repo>"
does when doing true merge, not what "git commit" does.
And --stat is meant for human consumption, not for machine consumption.
File name may contain " => " inside. And there is no way to differentiate
between " => " in file name, and " => " separating "src" path name from
"dst" path name.
> Have you actually _tested_ your patch?
Compiled, but forgot to run "make test". But I have checked that it passes
"make test", which probably mean that we don't have enough coverage ;-)
>> @@ -949,11 +955,19 @@ static void show_numstat(struct diffstat_t* data,
>> printf("-\t-\t");
>> else
>> printf("%d\t%d\t", file->added, file->deleted);
>> - if (options->line_termination && !file->is_renamed &&
>> + if (options->line_termination &&
>> quote_c_style(file->name, NULL, NULL, 0))
>> quote_c_style(file->name, NULL, stdout, 0);
>> else
>> fputs(file->name, stdout);
>> + if (file->is_renamed) {
>> + printf("%s", options->line_termination ? "\t" : "\0\0");
>
> I know you marked it as RFC; but it is impolite to request
> comments from other people on a patch that does not do what you
> intended to do, without marking "this is untested". It would
> waste people's time.
It passes "make test". I should perhaps mark more strongly that some
of _ideas_ are untested...
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH/RFC] diff: Make numstat machine friendly also for renames
2007-05-08 1:45 ` Jakub Narebski
@ 2007-05-08 1:58 ` Junio C Hamano
2007-05-08 12:33 ` Jakub Narebski
0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2007-05-08 1:58 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
>>> The numstat format for rename is now
>>>
>>> added deleted TAB path for "src" TAB path for "dst" LF
>>>
>>> or if -z option is used
>>>
>>> added deleted TAB path for "src" NUL NUL path for "dst" NUL
>>
>> Why two NULs?
>
> That was the only way I could think of to separate pre-image name
> from posi-image name for renames. Note that file name might look like
> (part of) diffstat line, and there is no 'status' field in the
> numstat to mark rename (as there is in "git diff-tree --raw" output).
The --stat format is for human consumption, and --numstat (be it
with -z or without) is for machines, so I am not opposed to a
format change that gives information that is already computed
but currently is hard to parse. If the format change breaks
existing scripts, we might want to do --numstat-extended,
though...
For example, I do not see a reason not to add "R98" in there.
I.e.
added deleted status TAB "src" (TAB "dst"){0,1} LF
added deleted status NUL "src" (NUL "dst"){0,1} NUL
where the dst path is present only when status says it is a
rename/copy, just like the --raw format.
> Did you mean --stat here?
No, I did mean --summary. But that was foolish of me. I forgot
that it had the same { namepart => namepart } issue.
>>> @@ -949,11 +955,19 @@ static void show_numstat(struct diffstat_t* data,
>>> printf("-\t-\t");
>>> else
>>> printf("%d\t%d\t", file->added, file->deleted);
>>> - if (options->line_termination && !file->is_renamed &&
>>> + if (options->line_termination &&
>>> quote_c_style(file->name, NULL, NULL, 0))
>>> quote_c_style(file->name, NULL, stdout, 0);
>>> else
>>> fputs(file->name, stdout);
>>> + if (file->is_renamed) {
>>> + printf("%s", options->line_termination ? "\t" : "\0\0");
>
> It passes "make test".
But you already grepped to make sure this codepath is never
exercised ;-).
What I was hoping you to notice was that printf("%s", "\0\0")
thing. %s would not even notice that the const char[] literal
is 2 bytes long.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH/RFC] diff: Make numstat machine friendly also for renames
2007-05-08 1:58 ` Junio C Hamano
@ 2007-05-08 12:33 ` Jakub Narebski
2007-05-09 4:59 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Jakub Narebski @ 2007-05-08 12:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano <junkio@cox.net> wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
>>>> The numstat format for rename is now
>>>>
>>>> added deleted TAB path for "src" TAB path for "dst" LF
>>>>
>>>> or if -z option is used
>>>>
>>>> added deleted TAB path for "src" NUL NUL path for "dst" NUL
>>>
>>> Why two NULs?
>>
>> That was the only way I could think of to separate pre-image name
>> from posi-image name for renames. Note that file name might look like
>> (part of) diffstat line, and there is no 'status' field in the
>> numstat to mark rename (as there is in "git diff-tree --raw" output).
>
> The --stat format is for human consumption, and --numstat (be it
> with -z or without) is for machines, so I am not opposed to a
> format change that gives information that is already computed
> but currently is hard to parse. If the format change breaks
> existing scripts, we might want to do --numstat-extended,
> though...
>
> For example, I do not see a reason not to add "R98" in there.
> I.e.
>
> added deleted status TAB "src" (TAB "dst"){0,1} LF
> added deleted status NUL "src" (NUL "dst"){0,1} NUL
>
> where the dst path is present only when status says it is a
> rename/copy, just like the --raw format.
That is a good idea, but wouldn't it break existing scripts? Well,
break more than a bit hacky idea of using NUL NUL as separator between
pre-image name and post-image name.
This would change output in every case, while my proposal doesn't change
output for the case without renames. '-M' should also work correctly,
perhaps scripts using it getting wrong filename. It is '-z -M' that changes
most.
>> Did you mean --stat here?
>
> No, I did mean --summary. But that was foolish of me. I forgot
> that it had the same { namepart => namepart } issue.
Ah. I somehow didn't get then that you meant for --numstat to have only
post-image names, and get pre-image names from rename information in
--summary. But as you have noticed it wouldn't help: rename information
is in the same for-humans format.
>>>> @@ -949,11 +955,19 @@ static void show_numstat(struct diffstat_t* data,
>>>> printf("-\t-\t");
>>>> else
>>>> printf("%d\t%d\t", file->added, file->deleted);
>>>> - if (options->line_termination && !file->is_renamed &&
>>>> + if (options->line_termination &&
>>>> quote_c_style(file->name, NULL, NULL, 0))
>>>> quote_c_style(file->name, NULL, stdout, 0);
>>>> else
>>>> fputs(file->name, stdout);
>>>> + if (file->is_renamed) {
>>>> + printf("%s", options->line_termination ? "\t" : "\0\0");
>
> What I was hoping you to notice was that printf("%s", "\0\0")
> thing. %s would not even notice that the const char[] literal
> is 2 bytes long.
Ooops. Shame on me. That is the result of trying to be too smart...
and changing separator between pathnames for rename from NUL to NUL NUL.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH/RFC] diff: Make numstat machine friendly also for renames
2007-05-08 12:33 ` Jakub Narebski
@ 2007-05-09 4:59 ` Junio C Hamano
0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2007-05-09 4:59 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git
Jakub Narebski <jnareb@gmail.com> writes:
>> The --stat format is for human consumption, and --numstat (be it
>> with -z or without) is for machines, so I am not opposed to a
>> format change that gives information that is already computed
>> but currently is hard to parse. If the format change breaks
>> existing scripts, we might want to do --numstat-extended,
>> though...
>>
>> For example, I do not see a reason not to add "R98" in there.
>> I.e.
>>
>> added deleted status TAB "src" (TAB "dst"){0,1} LF
>> added deleted status NUL "src" (NUL "dst"){0,1} NUL
>>
>> where the dst path is present only when status says it is a
>> rename/copy, just like the --raw format.
>
> That is a good idea, but wouldn't it break existing scripts? Well,
> break more than a bit hacky idea of using NUL NUL as separator between
> pre-image name and post-image name.
I think both would break equally. That's why I hinted --numstat-extended,
but I think the information is getting to be about --machine-readable, and
not necessarily about "stat" anymore.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2007-05-09 4:59 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-08 0:43 [PATCH/RFC] diff: Make numstat machine friendly also for renames Jakub Narebski
2007-05-08 1:10 ` Junio C Hamano
2007-05-08 1:45 ` Jakub Narebski
2007-05-08 1:58 ` Junio C Hamano
2007-05-08 12:33 ` Jakub Narebski
2007-05-09 4:59 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).