Possible regression in git-rev-list --header

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Possible regression in git-rev-list --header
@ 2006-12-30 17:56 Marco Costalba
  2006-12-30 18:30 ` Jakub Narebski
  2006-12-30 18:57 ` Johannes Schindelin
  0 siblings, 2 replies; 26+ messages in thread
From: Marco Costalba @ 2006-12-30 17:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

When 'commitencoding' variable is set in config file then git-rev-list
called with --header option sends also the encoding information.

$ git rev-list --header -n1 HEAD
6d751699cb04150abd79a730187d4e2ed6330c05
tree 70209eebdc59d108948feb15c3c5497f299ef290
parent 49a8186d7352d0454df79b289fecb18c8e535c32
author Marco Costalba <mcostalba@gmail.com> 1167500660 +0100
committer Marco Costalba <mcostalba@gmail.com> 1167500660 +0100
encoding UTF-8

    Test commit

    Let's see what git-rev-list --header spits out.

    Signed-off-by: Marco Costalba <mcostalba@gmail.com>

$ git --version
git version 1.5.0.rc0.g1ed48


This is very unfortunate because qgit parsing routine it's totally
broken after this change. Please revert the patch, or at least make
git-rev-list --header quiet as default, IOW add an option
--show-encoding that defaults to disabled.

Sorry to ask you this but _all_ the qgit users out there will see a
broken main view revision's list for all the commits created after the
patch went in.

Thanks
Marco

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-30 17:56 Possible regression in git-rev-list --header Marco Costalba
@ 2006-12-30 18:30 ` Jakub Narebski
  2006-12-30 18:57 ` Johannes Schindelin
  1 sibling, 0 replies; 26+ messages in thread
From: Jakub Narebski @ 2006-12-30 18:30 UTC (permalink / raw)
  To: git

Marco Costalba wrote:

> When 'commitencoding' variable is set in config file then git-rev-list
> called with --header option sends also the encoding information.
> 
> $ git rev-list --header -n1 HEAD
> 6d751699cb04150abd79a730187d4e2ed6330c05
> tree 70209eebdc59d108948feb15c3c5497f299ef290
> parent 49a8186d7352d0454df79b289fecb18c8e535c32
> author Marco Costalba <mcostalba@gmail.com> 1167500660 +0100
> committer Marco Costalba <mcostalba@gmail.com> 1167500660 +0100
> encoding UTF-8
> 
>     Test commit
> 
>     Let's see what git-rev-list --header spits out.
> 
>     Signed-off-by: Marco Costalba <mcostalba@gmail.com>
> 
> $ git --version
> git version 1.5.0.rc0.g1ed48
> 
> 
> This is very unfortunate because qgit parsing routine it's totally
> broken after this change. Please revert the patch, or at least make
> git-rev-list --header quiet as default, IOW add an option
> --show-encoding that defaults to disabled.

Well, that is incompatibile change, but it also shows design errors
in qgit, as it should ignore unknown headers, and rely on "\n\n" to
separate header from the body. Isn't it?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-30 17:56 Possible regression in git-rev-list --header Marco Costalba
  2006-12-30 18:30 ` Jakub Narebski
@ 2006-12-30 18:57 ` Johannes Schindelin
  2006-12-30 20:20   ` Junio C Hamano
  2006-12-30 20:22   ` [PATCH] Move commit reencoding parameter parsing to revision.c Junio C Hamano
  1 sibling, 2 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-12-30 18:57 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, Git Mailing List

Hi,

On Sat, 30 Dec 2006, Marco Costalba wrote:

> When 'commitencoding' variable is set in config file then git-rev-list
> called with --header option sends also the encoding information.

As Jakub pointed out, qgit should not expect to know all headers. I am 
very sorry, since I said I looked at all parsers of the commit header in 
git, but that was _only_ git, and no porcelains.

Please fix qgit, since I really consider this a bug.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-30 18:57 ` Johannes Schindelin
@ 2006-12-30 20:20   ` Junio C Hamano
  2006-12-30 22:19     ` Junio C Hamano
  2006-12-31  0:37     ` Johannes Schindelin
  2006-12-30 20:22   ` [PATCH] Move commit reencoding parameter parsing to revision.c Junio C Hamano
  1 sibling, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-12-30 20:20 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git, Johannes Schindelin

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sat, 30 Dec 2006, Marco Costalba wrote:
>
>> When 'commitencoding' variable is set in config file then git-rev-list
>> called with --header option sends also the encoding information.
>
> As Jakub pointed out, qgit should not expect to know all headers. I am 
> very sorry, since I said I looked at all parsers of the commit header in 
> git, but that was _only_ git, and no porcelains.
>
> Please fix qgit, since I really consider this a bug.

I have to agree with Johannes.  In principle Porcelains should
be prepared to see and ignore unknown headers.

However, this commit created by `commit-tree` certalinly can be
improved.

        $ git rev-list --header -n1 HEAD
        6d751699cb04150abd79a730187d4e2ed6330c05
        tree 70209eebdc59d108948feb15c3c5497f299ef290
        parent 49a8186d7352d0454df79b289fecb18c8e535c32
        author Marco Costalba <mcostalba@gmail.com> 1167500660 +0100
        committer Marco Costalba <mcostalba@gmail.com> 1167500660 +0100
        encoding UTF-8

           Test commit

           Let's see what git-rev-list --header spits out.

           Signed-off-by: Marco Costalba <mcostalba@gmail.com>

-- >8 --
commit-tree: cope with different ways "utf-8" can be spelled.

People can spell config.commitencoding differently from what we
internally have ("utf-8") to mean UTF-8.  Try to accept them and
treat them equally.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff --git a/builtin-commit-tree.c b/builtin-commit-tree.c
index 146aaff..0651e59 100644
--- a/builtin-commit-tree.c
+++ b/builtin-commit-tree.c
@@ -119,8 +119,7 @@ int cmd_commit_tree(int argc, const char **argv, const char *prefix)
 	}
 
 	/* Not having i18n.commitencoding is the same as having utf-8 */
-	encoding_is_utf8 = (!git_commit_encoding ||
-			    !strcmp(git_commit_encoding, "utf-8"));
+	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
 
 	init_buffer(&buffer, &size);
 	add_buffer(&buffer, &size, "tree %s\n", sha1_to_hex(tree_sha1));
diff --git a/utf8.c b/utf8.c
index 1eedd8b..7c80eec 100644
--- a/utf8.c
+++ b/utf8.c
@@ -277,6 +277,15 @@ void print_wrapped_text(const char *text, int indent, int indent2, int width)
 	}
 }
 
+int is_encoding_utf8(const char *name)
+{
+	if (!name)
+		return 1;
+	if (!strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8"))
+		return 1;
+	return 0;
+}
+
 /*
  * Given a buffer and its encoding, return it re-encoded
  * with iconv.  If the conversion fails, returns NULL.
diff --git a/utf8.h b/utf8.h
index cae2a8e..a07c5a8 100644
--- a/utf8.h
+++ b/utf8.h
@@ -3,6 +3,8 @@
 
 int utf8_width(const char **start);
 int is_utf8(const char *text);
+int is_encoding_utf8(const char *name);
+
 void print_wrapped_text(const char *text, int indent, int indent2, int len);
 
 #ifndef NO_ICONV

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-30 20:20   ` Junio C Hamano
@ 2006-12-30 22:19     ` Junio C Hamano
  2006-12-31  1:13       ` Johannes Schindelin
  2006-12-31  0:37     ` Johannes Schindelin
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2006-12-30 22:19 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git, Johannes Schindelin

Junio C Hamano <junkio@cox.net> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> On Sat, 30 Dec 2006, Marco Costalba wrote:
>>
>>> When 'commitencoding' variable is set in config file then git-rev-list
>>> called with --header option sends also the encoding information.
>>
>> As Jakub pointed out, qgit should not expect to know all headers. I am 
>> very sorry, since I said I looked at all parsers of the commit header in 
>> git, but that was _only_ git, and no porcelains.
>>
>> Please fix qgit, since I really consider this a bug.
>
> I have to agree with Johannes.  In principle Porcelains should
> be prepared to see and ignore unknown headers.
>
> However, this commit created by `commit-tree` certalinly can be
> improved.
> ...

Another thing.  I think it would make sense to remove "encoding"
header after pretty_print_commit successfully re-codes the
buffer.  An alternative is to rewrite "encoding" header to show
which encoding the log now uses (and omit it if it is UTF-8).

The attached patch does the latter, but I think removing the
header altogether in this case would make it easier to use
overall.  If you want to, you can change

	if (is_encoding_utf8(encoding))

in the patch with "if (1)" (the above macro is what was
introduced in my previous patch).

Pro for having "encoding" rewritten as this patch does is that
the output is still self-identifying.  You can tell what
encoding you are supposed to interpret the log messages in.  Con
against it is that the output is after re-coding as the user
asked (either with the config or --encoding=<encoding> option),
and the extra header becomes redundant information.

-- >8 --

 commit.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/commit.c b/commit.c
index e13b9cb..bf82fe8 100644
--- a/commit.c
+++ b/commit.c
@@ -624,6 +624,48 @@ static char *get_header(const struct commit *commit, const char *key)
 	}
 }
 
+static char *replace_encoding_header(char *buf, char *encoding)
+{
+	char *encoding_header = strstr(buf, "\nencoding ");
+	char *end_of_encoding_header;
+	int encoding_header_pos;
+	int encoding_header_len;
+	int new_len;
+	int need_len;
+	int buflen = strlen(buf) + 1;
+
+	if (!encoding_header)
+		return buf; /* should not happen but be defensive */
+	encoding_header++;
+	end_of_encoding_header = strchr(encoding_header, '\n');
+	if (!end_of_encoding_header)
+		return buf; /* should not happen but be defensive */
+	end_of_encoding_header++;
+
+	encoding_header_len = end_of_encoding_header - encoding_header;
+	encoding_header_pos = encoding_header - buf;
+
+	if (is_encoding_utf8(encoding)) {
+		/* we have re-coded to UTF-8; drop the header */
+		memmove(encoding_header, end_of_encoding_header,
+			buflen - (encoding_header_pos + encoding_header_len));
+		return buf;
+	}
+	new_len = strlen(encoding);
+	need_len = new_len + strlen("encoding \n");
+	if (encoding_header_len < need_len) {
+		buf = xrealloc(buf, buflen + (need_len - encoding_header_len));
+		encoding_header = buf + encoding_header_pos;
+		end_of_encoding_header = encoding_header + encoding_header_len;
+	}
+	memmove(end_of_encoding_header + (need_len - encoding_header_len),
+		end_of_encoding_header,
+		buflen - (encoding_header_pos + encoding_header_len));
+	memcpy(encoding_header + 9, encoding, strlen(encoding));
+	encoding_header[9 + new_len] = '\n';
+	return buf;
+}
+
 static char *logmsg_reencode(const struct commit *commit)
 {
 	char *encoding;
@@ -642,6 +684,8 @@ static char *logmsg_reencode(const struct commit *commit)
 		return NULL;
 	}
 	out = reencode_string(commit->buffer, output_encoding, encoding);
+	out = replace_encoding_header(out, output_encoding);
+
 	free(encoding);
 	if (!out)
 		return NULL;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-30 22:19     ` Junio C Hamano
@ 2006-12-31  1:13       ` Johannes Schindelin
  2006-12-31  1:45         ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2006-12-31  1:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, git

Hi,

On Sat, 30 Dec 2006, Junio C Hamano wrote:

> Another thing.  I think it would make sense to remove "encoding" header 
> after pretty_print_commit successfully re-codes the buffer.  An 
> alternative is to rewrite "encoding" header to show which encoding the 
> log now uses (and omit it if it is UTF-8).

I think it would be wrong. Sure, the output may be encoded differently, 
but the _original_ commit was not. And this is the information I want 
to see when I look at the raw commit.

If you do not want to see all headers, you have to choose a different 
--pretty option.

Ciao,
Dscho
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-31  1:13       ` Johannes Schindelin
@ 2006-12-31  1:45         ` Junio C Hamano
  2006-12-31 11:45           ` Marco Costalba
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2006-12-31  1:45 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Marco Costalba, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sat, 30 Dec 2006, Junio C Hamano wrote:
>
>> Another thing.  I think it would make sense to remove "encoding" header 
>> after pretty_print_commit successfully re-codes the buffer.  An 
>> alternative is to rewrite "encoding" header to show which encoding the 
>> log now uses (and omit it if it is UTF-8).
>
> I think it would be wrong. Sure, the output may be encoded differently, 
> but the _original_ commit was not. And this is the information I want 
> to see when I look at the raw commit.

When you want to see the raw commit, you would not ask for it to
re-code, so "removal after successfully re-codes" would not kick
in (if you _really_ want to look at the raw commit, I guess
cat-file can help, but let's not go there).  Re-coding the
message but still showing what the original encoding was does
not sound making much sense to me.

I've pushed this out after a small rework.

The rule is:

 - if you ask for re-coding (either by i18n.* configuration or
   an explicit --encoding option from the command line), and if
   re-coding successfully does its job, you do not see
   "encoding" header;

 - if the buffer cannot successfully be re-coded, no re-coding
   is done, and the caller can inspect "encoding" header.

 - if you do not ask for re-coding, "encoding" header is left as
   is, so is the commit log message.  The caller can deal with
   any re-coding itself.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-31  1:45         ` Junio C Hamano
@ 2006-12-31 11:45           ` Marco Costalba
  2006-12-31 15:27             ` Johannes Schindelin
  0 siblings, 1 reply; 26+ messages in thread
From: Marco Costalba @ 2006-12-31 11:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

I'm sorry to reply only now but I have problems with my internet
connection, I am also not able to test Junio patches now.

Regarding qgit parsing 'bug' I would like to point out something that
probably is not clear.

1) Parsing routine _must_ be able to sustain the loading of more then
40000 revisions in a couple of seconds, so must be very quick. A lot
of effort has been put to index the header info at maximum speed. Now
it takes about 300ms to parse the whole linux tree. You can have this
only if the header format is 'fixed enough', it means that you would
not expect whole new lines (new '\n' chars) to appear from nowhere in
header, with the exception of log message and parents info lines of
course.

2) I _have_ to rely on headers info because that's where I get data to
show the user, that's the whole point of calling git-rev-list with
--headers option.

3) The rule of double '\n\n' as an indicator of the start of log
message it's not only very slow, it's also broken in cases of no log,
see 7b7abfe3dd81d in Linux tree.

4) We are talking of keeping back compatibility, _fixing_ qgit does
not solves the issues with current qgit users.

So please I really would ask again to *do not print that encoding
extra line if not requested to do so*

Thanks again
Marco

P.S: One way to really speed up the indexing would be if git-rev-list
prints at the beginning the length of the record, i.e. the distance
from the next '\0' terminating point. I don't know if this info is
already available to git-rev-list before to print the record or has to
find it anyway, in the latter case there is no point in doing this.
But in the former case it would be possible to avoid a costly
find('\0'), starting from the beginning of log message. Log message is
more then the half of all the record length so it would be possible to
avoid a good amount of unuseful work because only the position of the
next '\0' is needed during parsing, not the log message, that is
retrieved and shown to the user only on demand, i.e. only for the
selected revision.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-31 11:45           ` Marco Costalba
@ 2006-12-31 15:27             ` Johannes Schindelin
  2006-12-31 15:43               ` Marco Costalba
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2006-12-31 15:27 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, git

Hi,

On Sun, 31 Dec 2006, Marco Costalba wrote:

> Regarding qgit parsing 'bug' I would like to point out something that
> probably is not clear.

But it _is_ a bug!

> 1) Parsing routine _must_ be able to sustain the loading of more then
> 40000 revisions in a couple of seconds, so must be very quick. A lot
> of effort has been put to index the header info at maximum speed. Now
> it takes about 300ms to parse the whole linux tree. You can have this
> only if the header format is 'fixed enough', it means that you would
> not expect whole new lines (new '\n' chars) to appear from nowhere in
> header, with the exception of log message and parents info lines of
> course.

I don't see why this would slow down parsing _at all_. Besides, you should 
really stop relying on the header format being fixed for all eternity. 
There has been talk about putting more useful information into the header, 
and there _are_ valid reasons to keep the header extensible.

Further, if you rely on parsing being super-fast, why not just parse 
_only_ the header information that you actually need? The header still 
consists of

	- exactly one "tree",
	- an arbitrary amount of "parent" lines,
	- exactly one "author", and
	- exactly one "committer" line

After that may come optional headers, but by that time you should 
_already_ have stopped parsing! And the order is fixed already 
(parse_commit_buffer() relies on it).

After all, you have an initial parsing for the purpose of organizing the 
commits, and you can have _another_ for the purpose of displaying the 
message (you can remember the offset where the first parsing stopped to 
accelerate the second). The latter parsing should be done individually, 
when displaying the commit.

And I still have to disagree with Junio that the encoding header is no 
longer needed when displaying the commit message. The "tree" and "parent" 
headers are also displayed, even if their information is already used for 
purposes of displaying them.

The commit header contains information about that particular commit, and 
if I ask to see the headers, I want to see them, and not be treated like 
an idiot who does not know how to handle that information.

(If I ask for git-log to show everything encoded in Latin-1, it might 
still be interesting to know who used which encoding. And if it is 
displayed in my local encoding, but the commit header says UTF-8, I _do_ 
know that this is the original encoding, not the displayed one, thank you 
very much!)

So please, Marco, fix that bug in qgit. Otherwise you will restrict our 
ability to enhance commit objects with useful meta information _anyway_. 
IOW, even if the encoding header is not shown (which I would not like), 
you should fix that bug.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-31 15:27             ` Johannes Schindelin
@ 2006-12-31 15:43               ` Marco Costalba
  2007-01-01  3:21                 ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Marco Costalba @ 2006-12-31 15:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On 12/31/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> Further, if you rely on parsing being super-fast, why not just parse
> _only_ the header information that you actually need? The header still
> consists of
>
>         - exactly one "tree",
>         - an arbitrary amount of "parent" lines,
>         - exactly one "author", and
>         - exactly one "committer" line
>
> After that may come optional headers,

If you intorduce the concept of an 'optional header part' you
logically and naturally _may_ also introduce the concept of disabling
the display of _that_ optional header, or better, to keep back
compatibility (I just remaind that's more then one year and an half
that git-rev-list works in that way!) you may accept the concept of an
option to show the additional optional header part. That's just what
I'm asking, no more no less.

>but by that time you should
> _already_ have stopped parsing! And the order is fixed already
> (parse_commit_buffer() relies on it).
>
> After all, you have an initial parsing for the purpose of organizing the
> commits, and you can have _another_ for the purpose of displaying the
> message (you can remember the offset where the first parsing stopped to
> accelerate the second). The latter parsing should be done individually,
> when displaying the commit.
>

The problem with your proposed algorithm is that you don't have _one_
commit but a sequence of commits to parse, so when you have parsed
until the committer line you must need to know where the next commit
starts, IOW you have to find the next '\0', that's what I was trying
to expose in my previous e-mail postscriptum.

Thanks
Marco

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-31 15:43               ` Marco Costalba
@ 2007-01-01  3:21                 ` Junio C Hamano
  2007-01-02 21:32                   ` Johannes Schindelin
  2007-01-03  9:21                   ` Marco Costalba
  0 siblings, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2007-01-01  3:21 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git, Johannes Schindelin

"Marco Costalba" <mcostalba@gmail.com> writes:

> On 12/31/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>>
>> Further, if you rely on parsing being super-fast, why not just parse
>> _only_ the header information that you actually need? The header still
>> consists of
>>
>>         - exactly one "tree",
>>         - an arbitrary amount of "parent" lines,
>>         - exactly one "author", and
>>         - exactly one "committer" line
>>
>> After that may come optional headers,

They are more like 'other' headers.  Nobody said the set of
headers are cast in stone forever.  The only things parsers
safely can assume are that the original four kinds come at the
beginning in the above order, and there is a blank line that
separates headers and the body.

> If you intorduce the concept of an 'optional header part' you
> logically and naturally _may_ also introduce the concept of disabling
> the display of _that_ optional header, or better, to keep back
> compatibility...

While I am somewhat sympathetic, and am willing to apologize for
trying to advance the i18n support without enough advance
warning, I think you already know what you are saying does not
make much sense in the larger picture and as the longer term
solution.  Does any MUA ask the filesystem, POP3 server or IMAP
server not to give X-* headers?

We could declare "headers are cast in stone and we will not
enhance it in any way forever", and go back to my original hack
to use hidden trailer, but I do not think it would solve
anything.  Porcelains that would try to take advantage of the
trailer would now start assuming incorrectly that the set of
trailers are cast in stone and will break when new information
is added to the trailer, which would bring us back to exactly
where we are now.

Having said that, I think what is in the current tip of 'master'
is of much less impact for normal repositories than the one that
bit you, in two aspects.

 * Your sample commit had "encoding UTF-8" header, presumably
   because the repository that the commit was created in had
   core.commitencoding set to "UTF-8".  The intent was not to
   add anything when the log message is UTF-8, but the earlier
   code was checking only for "utf-8".  With the current tip of
   master this should not happen anymore.

 * When the output encoding conversion is done successfully, the
   current tip of master drops "encoding" header from the
   output, so in your sample situation where the commit has
   (incorrectly) "encoding UTF-8" and your output encoding is
   also "UTF-8" (because either you have core.commitencoding set
   to it, or you do not have the configuration and let git fall
   back to its default which is UTF-8), you would not see the
   "encoding" header in the output from the rev-list.

The reason we did the latter, by the way, does not have anything
to do with helping broken parsers.  We drop the header after
re-coding the log message into an encoding specified by the user
(which is presumably different from what the commit was
originally recorded in) because the encoding recorded on
"encoding" header would not match the re-coded log message
anymore.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-01  3:21                 ` Junio C Hamano
@ 2007-01-02 21:32                   ` Johannes Schindelin
  2007-01-02 22:13                     ` Junio C Hamano
  2007-01-03  9:21                   ` Marco Costalba
  1 sibling, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2007-01-02 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, git

Hi,

On Sun, 31 Dec 2006, Junio C Hamano wrote:

> "Marco Costalba" <mcostalba@gmail.com> writes:
> 
> > On 12/31/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >>
> >> Further, if you rely on parsing being super-fast, why not just parse
> >> _only_ the header information that you actually need? The header still
> >> consists of
> >>
> >>         - exactly one "tree",
> >>         - an arbitrary amount of "parent" lines,
> >>         - exactly one "author", and
> >>         - exactly one "committer" line
> >>
> >> After that may come optional headers,
> 
> They are more like 'other' headers.

I should have been more clear: optional for the committer.

> > If you intorduce the concept of an 'optional header part' you
> > logically and naturally _may_ also introduce the concept of disabling
> > the display of _that_ optional header, or better, to keep back
> > compatibility...
> 
> While I am somewhat sympathetic, and am willing to apologize for
> trying to advance the i18n support without enough advance
> warning, I think you already know what you are saying does not
> make much sense in the larger picture and as the longer term
> solution.

Besides, when you say

> The problem with your proposed algorithm is that you don't have _one_ 
> commit but a sequence of commits to parse, so when you have parsed until 
> the committer line you must need to know where the next commit starts, 
> IOW you have to find the next '\0', that's what I was trying to expose 
> in my previous e-mail postscriptum.

I don't get _at all_ what could be a problem _with_ the encoding header 
that is no problem _without_ it. I assume you want to tell me something 
more than that you do not want to change your code? If so, I missed it.

>  * When the output encoding conversion is done successfully, the
>    current tip of master drops "encoding" header from the
>    output, [...]

Earlier, I said that I do not feel strongly about that issue.

But now I do.

If you drop the "encoding" header from the commit buffer, just because you 
reencoded it to whatever encoding happens to be the one the caller just 
asked for, you are _not_ interpreting the data, but _changing_ it.

That is not what git is about, IMHO. It would be a completely different 
thing if the caller had a way to ask for _specific_ headers, and asks to 
be left alone with all the other cruft. But the caller does not even have 
the chance to say that, let alone ask specifically _for_ it.

The encoding header bears information, just like the tree header or the 
committer header. I find it highly irritating that I am shielded from it. 
The encoding header has _nothing_ to do with the encoding that the output 
is being encoded with, but _all_ with how the commit message was encoded 
_by the committer_.

> The reason we did the latter, by the way, does not have anything
> to do with helping broken parsers.  We drop the header after
> re-coding the log message into an encoding specified by the user
> (which is presumably different from what the commit was
> originally recorded in) because the encoding recorded on
> "encoding" header would not match the re-coded log message
> anymore.

By the same reasoning, you'd have to rewrite the committer line to reflect 
the current GIT_COMMITTER_IDENT, or hide it. If you want to convince me, 
you have to try harder.

And Marco has to fix the header parsing anyway.

So, please, Junio, can you rethink that decision?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-02 21:32                   ` Johannes Schindelin
@ 2007-01-02 22:13                     ` Junio C Hamano
  2007-01-02 22:28                       ` Johannes Schindelin
  2007-01-02 22:29                       ` Marco Costalba
  0 siblings, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2007-01-02 22:13 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Marco Costalba, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> The reason we did the latter, by the way, does not have anything
>> to do with helping broken parsers.  We drop the header after
>> re-coding the log message into an encoding specified by the user
>> (which is presumably different from what the commit was
>> originally recorded in) because the encoding recorded on
>> "encoding" header would not match the re-coded log message
>> anymore.
>
> By the same reasoning, you'd have to rewrite the committer line to reflect 
> the current GIT_COMMITTER_IDENT, or hide it. If you want to convince me, 
> you have to try harder.

Sorry, but you completely lost me with that analogy.

I think showing log message in the user's preferred encoding is
more like passing the output to the colorization mechanism and
then to the pager.  We are interacting with humans at that
point, and we are changing the presentation without changing the
semantics of the data.  

I do not see why committer identity needs to be rewritten nor
hidden by the same reasoning.

> And Marco has to fix the header parsing anyway.

No question about that.  If iconv() punts, qgit will see
"encoding" header to deal with even when the re-coding is in
effect.  I think it may be a sensible thing for qgit to replace
the log message and show "log message in this encoding, which
cannot be shown in this window" instead in such a case, but that
is up to Porcelain.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-02 22:13                     ` Junio C Hamano
@ 2007-01-02 22:28                       ` Johannes Schindelin
  2007-01-02 22:29                       ` Marco Costalba
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2007-01-02 22:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, git

Hi,

On Tue, 2 Jan 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> >> The reason we did the latter, by the way, does not have anything
> >> to do with helping broken parsers.  We drop the header after
> >> re-coding the log message into an encoding specified by the user
> >> (which is presumably different from what the commit was
> >> originally recorded in) because the encoding recorded on
> >> "encoding" header would not match the re-coded log message
> >> anymore.
> >
> > By the same reasoning, you'd have to rewrite the committer line to reflect 
> > the current GIT_COMMITTER_IDENT, or hide it. If you want to convince me, 
> > you have to try harder.
> 
> Sorry, but you completely lost me with that analogy.
> 
> I think showing log message in the user's preferred encoding is
> more like passing the output to the colorization mechanism and
> then to the pager.  We are interacting with humans at that
> point, and we are changing the presentation without changing the
> semantics of the data.  
> 
> I do not see why committer identity needs to be rewritten nor
> hidden by the same reasoning.
> 
> > And Marco has to fix the header parsing anyway.
> 
> No question about that.  If iconv() punts, qgit will see
> "encoding" header to deal with even when the re-coding is in
> effect.  I think it may be a sensible thing for qgit to replace
> the log message and show "log message in this encoding, which
> cannot be shown in this window" instead in such a case, but that
> is up to Porcelain.

Ah! Now I get your reasoning. But it is wrong. You are misusing headers -- 
which should be metadata describing the commit -- to pass a vital 
information to a porcelain: "re-encoding failed, please have a try 
yourself".

But the encoding header describes a certain aspect of the commit object: 
how it was _originally_ encoded.

It's just like mails: often I look at the X-Mailer: header to find out if 
the sender by any chance used Windows, or Mac, because it may help me help 
the sender when incomplete information was provided.

And just like with the mail, I would not like the tool to _hide_ the 
headers from me when I ask for them, just because it happened to use them 
already to display the characters correctly.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-02 22:13                     ` Junio C Hamano
  2007-01-02 22:28                       ` Johannes Schindelin
@ 2007-01-02 22:29                       ` Marco Costalba
  1 sibling, 0 replies; 26+ messages in thread
From: Marco Costalba @ 2007-01-02 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On 1/2/07, Junio C Hamano <junkio@cox.net> wrote:
>
> > And Marco has to fix the header parsing anyway.
>
> No question about that.  If iconv() punts, qgit will see
> "encoding" header to deal with even when the re-coding is in
> effect.  I think it may be a sensible thing for qgit to replace
> the log message and show "log message in this encoding, which
> cannot be shown in this window" instead in such a case, but that
> is up to Porcelain.
>

Yes, git-rev-list output is stored in memory as a big chunk of row
bytes, conversion
to utf8 (the internal format used by Qt string class) is done _only_
for displayed items, just
before to show them to user so is not performance critical. I agree
with Junio, that info could be used to teach Qt string conversion
function how to handle the data. Something like:

         convertWithCodec(const char* data, codec* codecName)

So perhaps if Qt is better then iconv we could display a good log
message anyway.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-01  3:21                 ` Junio C Hamano
  2007-01-02 21:32                   ` Johannes Schindelin
@ 2007-01-03  9:21                   ` Marco Costalba
  2007-01-03 10:10                     ` Junio C Hamano
  2007-01-03 10:21                     ` Lars Hjemli
  1 sibling, 2 replies; 26+ messages in thread
From: Marco Costalba @ 2007-01-03  9:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

On 1/1/07, Junio C Hamano <junkio@cox.net> wrote:
> "Marco Costalba" <mcostalba@gmail.com> writes:
>
> > On 12/31/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> >>
> >> Further, if you rely on parsing being super-fast, why not just parse
> >> _only_ the header information that you actually need? The header still
> >> consists of
> >>
> >>         - exactly one "tree",
> >>         - an arbitrary amount of "parent" lines,
> >>         - exactly one "author", and
> >>         - exactly one "committer" line
> >>
> >> After that may come optional headers,
>
> They are more like 'other' headers.  Nobody said the set of
> headers are cast in stone forever.  The only things parsers
> safely can assume are that the original four kinds come at the
> beginning in the above order, and there is a blank line that
> separates headers and the body.
>

I'm cooking the qgit parser fix, please confirm the following
assumption is correct:

	When git-rev-list is called with --header option, after the first
	line with the commit sha, the following information is produced
	
	- one line with "tree"
	- an arbitrary amount of "parent" lines
	- one line with "author"
	- one line with "committer"
	- zero or more *non blank* lines with other info, as the encoding
	- one blank line
	- zero or one line with log title
	- zero or more lines with log message
	- a terminating '\0'


Thanks
Marco

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-03  9:21                   ` Marco Costalba
@ 2007-01-03 10:10                     ` Junio C Hamano
  2007-01-03 10:21                     ` Lars Hjemli
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2007-01-03 10:10 UTC (permalink / raw)
  To: git

"Marco Costalba" <mcostalba@gmail.com> writes:

> On 1/1/07, Junio C Hamano <junkio@cox.net> wrote:
>> "Marco Costalba" <mcostalba@gmail.com> writes:
>>
>> > On 12/31/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> >>
>> >> Further, if you rely on parsing being super-fast, why not just parse
>> >> _only_ the header information that you actually need? The header still
>> >> consists of
>> >>
>> >>         - exactly one "tree",
>> >>         - an arbitrary amount of "parent" lines,
>> >>         - exactly one "author", and
>> >>         - exactly one "committer" line
>> >>
>> >> After that may come optional headers,
>>
>> They are more like 'other' headers.  Nobody said the set of
>> headers are cast in stone forever.  The only things parsers
>> safely can assume are that the original four kinds come at the
>> beginning in the above order, and there is a blank line that
>> separates headers and the body.
>>
>
> I'm cooking the qgit parser fix, please confirm the following
> assumption is correct:
>
> 	When git-rev-list is called with --header option, after the first
> 	line with the commit sha, the following information is produced
> 	
> 	- one line with "tree"
> 	- an arbitrary amount of "parent" lines
> 	- one line with "author"
> 	- one line with "committer"
> 	- zero or more *non blank* lines with other info, as the encoding
> 	- one blank line
> 	- zero or one line with log title
> 	- zero or more lines with log message
> 	- a terminating '\0'

I think that is correct, except that rev-list does not even care
the distinction between log title and log message.

I do not think of a reason to change the order of the first four
offhand, so while it might be prudent to prepare the code to
accept them in any order if it is not too much trouble from
purist point of view, I do not think it is necessary in practice
and you should be able to depend on the order in which they
appear.  I do not think it would hurt anybody if we right now
declared that other extra headers people have talked about (such
as "notes") should always come after the first four.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-03  9:21                   ` Marco Costalba
  2007-01-03 10:10                     ` Junio C Hamano
@ 2007-01-03 10:21                     ` Lars Hjemli
  2007-01-03 10:35                       ` Marco Costalba
  2007-01-03 10:38                       ` Lars Hjemli
  1 sibling, 2 replies; 26+ messages in thread
From: Lars Hjemli @ 2007-01-03 10:21 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, git, Johannes Schindelin

On 1/3/07, Marco Costalba <mcostalba@gmail.com> wrote:
>         - one blank line
>         - zero or one line with log title
>         - zero or more lines with log message
>         - a terminating '\0'

I think the should be:
  -zero or more blank lines
  -zero or more non-blank lines with log title
  -zero or more blank lines
  -zero or more lines with log message
  -a terminating '\0'

...which implies that all of these rules (except the last) are
optional and only triggers if the _previous_ rule triggered.

In other words:

  commit ::= header [blank+ [title+ [blank+ text*]]] '\0'
  header ::= tree parent* author committer

-- 
larsh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-03 10:21                     ` Lars Hjemli
@ 2007-01-03 10:35                       ` Marco Costalba
  2007-01-03 11:14                         ` Lars Hjemli
  2007-01-04 15:18                         ` Andreas Ericsson
  2007-01-03 10:38                       ` Lars Hjemli
  1 sibling, 2 replies; 26+ messages in thread
From: Marco Costalba @ 2007-01-03 10:35 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: Junio C Hamano, git, Johannes Schindelin

On 1/3/07, Lars Hjemli <hjemli@gmail.com> wrote:
> On 1/3/07, Marco Costalba <mcostalba@gmail.com> wrote:
> >         - one blank line
> >         - zero or one line with log title
> >         - zero or more lines with log message
> >         - a terminating '\0'
>
> I think the should be:
>   -zero or more blank lines

Isn't it zero or _one_ blank line? Why more then one? would be it
useful? surely is slower to parse.

>   -zero or more non-blank lines with log title

multi lines titles are allowed? never saw one of them.

>   -zero or more blank lines

Why? this is necessary only to disambiguate muti (non blank) lines
titles, but as Junio pointed out distinction between log title and log
message is only in the Porcelain, not encoded in git. So the Porcalain
is going to show _one_line title if any an the remaining stuff in the
log message.

>   -zero or more lines with log message
>   -a terminating '\0'
>

 Marco

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-03 10:35                       ` Marco Costalba
@ 2007-01-03 11:14                         ` Lars Hjemli
  2007-01-04 15:21                           ` Andreas Ericsson
  2007-01-04 15:18                         ` Andreas Ericsson
  1 sibling, 1 reply; 26+ messages in thread
From: Lars Hjemli @ 2007-01-03 11:14 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, git, Johannes Schindelin

On 1/3/07, Marco Costalba <mcostalba@gmail.com> wrote:
> On 1/3/07, Lars Hjemli <hjemli@gmail.com> wrote:
> > On 1/3/07, Marco Costalba <mcostalba@gmail.com> wrote:
> > >         - one blank line
> > >         - zero or one line with log title
> > >         - zero or more lines with log message
> > >         - a terminating '\0'
> >
> > I think the should be:
> >   -zero or more blank lines
>
> Isn't it zero or _one_ blank line? Why more then one? would be it
> useful? surely is slower to parse.

My point is just that the log message is unformatted. There is nothing
stopping you from
adding blank lines at the start of the message.

But then I tested it, and it seems as 'git-commit' strips off any
leading blank lines (but git-commit-tree does not). So expecting one
blank _might_ be good enough.

>
> >   -zero or more non-blank lines with log title
>
> multi lines titles are allowed? never saw one of them.

It is allowed (I tested this)

>
> >   -zero or more blank lines
>
> Why? this is necessary only to disambiguate muti (non blank) lines
> titles, but as Junio pointed out distinction between log title and log
> message is only in the Porcelain, not encoded in git. So the Porcalain
> is going to show _one_line title if any an the remaining stuff in the
> log message.

Well, if the porcelain chooses to do so, it probably is ok. But I
think the parsing/presentation of the log title/message would be more
correct/better formatted if the parser is more "adaptive" to the
content.

-- 
larsh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-03 11:14                         ` Lars Hjemli
@ 2007-01-04 15:21                           ` Andreas Ericsson
  0 siblings, 0 replies; 26+ messages in thread
From: Andreas Ericsson @ 2007-01-04 15:21 UTC (permalink / raw)
  To: Lars Hjemli; +Cc: Marco Costalba, Junio C Hamano, git, Johannes Schindelin

Lars Hjemli wrote:
> On 1/3/07, Marco Costalba <mcostalba@gmail.com> wrote:
>> On 1/3/07, Lars Hjemli <hjemli@gmail.com> wrote:
>> > On 1/3/07, Marco Costalba <mcostalba@gmail.com> wrote:
>> > >         - one blank line
>> > >         - zero or one line with log title
>> > >         - zero or more lines with log message
>> > >         - a terminating '\0'
>> >
>> > I think the should be:
>> >   -zero or more blank lines
>>
>> Isn't it zero or _one_ blank line? Why more then one? would be it
>> useful? surely is slower to parse.
> 
> My point is just that the log message is unformatted. There is nothing
> stopping you from
> adding blank lines at the start of the message.
> 
> But then I tested it, and it seems as 'git-commit' strips off any
> leading blank lines (but git-commit-tree does not). So expecting one
> blank _might_ be good enough.
> 

Other scm's from which we can import repositories have no such 
mechanisms though. It would be nice to have this functionality in 
rev-list. I'm looking into it at the moment.

> 
>>
>> >   -zero or more blank lines
>>
>> Why? this is necessary only to disambiguate muti (non blank) lines
>> titles, but as Junio pointed out distinction between log title and log
>> message is only in the Porcelain, not encoded in git. So the Porcalain
>> is going to show _one_line title if any an the remaining stuff in the
>> log message.
> 
> Well, if the porcelain chooses to do so, it probably is ok. But I
> think the parsing/presentation of the log title/message would be more
> correct/better formatted if the parser is more "adaptive" to the
> content.
> 

I disagree. If the title spans over more than one line, it's most likely 
because it's too long to fit in one, in which case it won't look good 
(or even be viewable in qgit / gitk) when printed anyways.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-03 10:35                       ` Marco Costalba
  2007-01-03 11:14                         ` Lars Hjemli
@ 2007-01-04 15:18                         ` Andreas Ericsson
  1 sibling, 0 replies; 26+ messages in thread
From: Andreas Ericsson @ 2007-01-04 15:18 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Lars Hjemli, Junio C Hamano, git, Johannes Schindelin

Marco Costalba wrote:
> On 1/3/07, Lars Hjemli <hjemli@gmail.com> wrote:
>> On 1/3/07, Marco Costalba <mcostalba@gmail.com> wrote:
>> >         - one blank line
>> >         - zero or one line with log title
>> >         - zero or more lines with log message
>> >         - a terminating '\0'
>>
>> I think the should be:
>>   -zero or more blank lines
> 
> Isn't it zero or _one_ blank line? Why more then one? would be it
> useful? surely is slower to parse.
> 

I've imported a repository from CVS to git (the nagiosplugin repo), 
where one author consistently put an empty line before the "commit 
subject". I'm fairly certain this is done by the tool he uses (some GUI 
thing, no doubt).

I'm currently looking into stripping optional empty lines appearing 
postheaders-prebody from rev-list output, as it completely fudges qgit 
and gitk viewing alike. Good thing the author in question only does 
translations and no code work, or I'd be screaming with frustration 
trying to figure out which commit does what. Mind you, CVS's lack of 
forensic tools doesn't exactly inspire the clear and concise commit 
messages we see in git, but still...

>>   -zero or more non-blank lines with log title
> 
> multi lines titles are allowed? never saw one of them.
> 

They aren't disallowed. The log message part only by convention consists 
of a title line followed by a blank and then the message body. It's 
entirely possible to just start writing the message straight off, or put 
any number of blank lines wherever.

>>   -zero or more blank lines
> 
> Why? this is necessary only to disambiguate muti (non blank) lines
> titles, but as Junio pointed out distinction between log title and log
> message is only in the Porcelain, not encoded in git. So the Porcalain
> is going to show _one_line title if any an the remaining stuff in the
> log message.
> 

Sensible. There's no (reliable) way of knowing if the first line with 
text on is actually a title or just the start of a 25-line sentence, or 
even the signoff line, or completely empty.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2007-01-03 10:21                     ` Lars Hjemli
  2007-01-03 10:35                       ` Marco Costalba
@ 2007-01-03 10:38                       ` Lars Hjemli
  1 sibling, 0 replies; 26+ messages in thread
From: Lars Hjemli @ 2007-01-03 10:38 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Junio C Hamano, git, Johannes Schindelin

On 1/3/07, Lars Hjemli <hjemli@gmail.com> wrote:
>   commit ::= header [blank+ [title+ [blank+ text*]]] '\0'
>   header ::= tree parent* author committer

Sorry, this should have been

   commit ::= header [blank+ [title+ [blank+ text*]]] '\0'
   header ::= tree parent* author committer other_header*

-- 
larsh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Possible regression in git-rev-list --header
  2006-12-30 20:20   ` Junio C Hamano
  2006-12-30 22:19     ` Junio C Hamano
@ 2006-12-31  0:37     ` Johannes Schindelin
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-12-31  0:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, git

Hi,

On Sat, 30 Dec 2006, Junio C Hamano wrote:

> commit-tree: cope with different ways "utf-8" can be spelled.

Does iconv handle these? If not, we should reset git_commit_encoding.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH] Move commit reencoding parameter parsing to revision.c
  2006-12-30 18:57 ` Johannes Schindelin
  2006-12-30 20:20   ` Junio C Hamano
@ 2006-12-30 20:22   ` Junio C Hamano
  2006-12-31  1:10     ` Johannes Schindelin
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2006-12-30 20:22 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git, Johannes Schindelin

Also I think we need this patch for completeness on top of what
we have in 'master'.

-- >8 --
[PATCH] Move commit reencoding parameter parsing to revision.c

This way, git-rev-list and git-diff-tree with --pretty can use
it.  Because they are both plumbing, we default not to do any
conversion unless --encoding=<encoding> is explicitly given from
the command line.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 Documentation/git-rev-list.txt   |    1 +
 Documentation/pretty-formats.txt |    8 ++++++++
 builtin-diff-tree.c              |    6 ++++++
 builtin-rev-list.c               |    6 ++++++
 revision.c                       |    8 ++++++++
 5 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-rev-list.txt b/Documentation/git-rev-list.txt
index 9e0dcf8..86c94e7 100644
--- a/Documentation/git-rev-list.txt
+++ b/Documentation/git-rev-list.txt
@@ -21,6 +21,7 @@ SYNOPSIS
 	     [ \--stdin ]
 	     [ \--topo-order ]
 	     [ \--parents ]
+	     [ \--encoding[=<encoding>] ]
 	     [ \--(author|committer|grep)=<pattern> ]
 	     [ [\--objects | \--objects-edge] [ \--unpacked ] ]
 	     [ \--pretty | \--header ]
diff --git a/Documentation/pretty-formats.txt b/Documentation/pretty-formats.txt
index 996f628..8cba13f 100644
--- a/Documentation/pretty-formats.txt
+++ b/Documentation/pretty-formats.txt
@@ -76,3 +76,11 @@ displayed in full, regardless of whether --abbrev or
 --no-abbrev are used, and 'parents' information show the
 true parent commits, without taking grafts nor history
 simplification into account.
+
+
+--encoding[=<encoding>]::
+	The commit objects record the encoding used for the log message
+	in their encoding header; this option can be used to tell the
+	command to re-code the commit log message in the encoding
+	preferred by the user.  For non plumbing commands this
+	defaults to UTF-8.
diff --git a/builtin-diff-tree.c b/builtin-diff-tree.c
index 24cb2d7..6ce2c0f 100644
--- a/builtin-diff-tree.c
+++ b/builtin-diff-tree.c
@@ -67,6 +67,12 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	static struct rev_info *opt = &log_tree_opt;
 	int read_stdin = 0;
 
+	/* default to turn the conversion off -- diff-tree is not a
+	 * Porcelain but as low plumbing as it can go, as far as
+	 * two-tree comparison is concerned.
+	 */
+	git_log_output_encoding = "";
+
 	init_revisions(opt, prefix);
 	git_config(git_default_config); /* no "diff" UI options */
 	nr_sha1 = 0;
diff --git a/builtin-rev-list.c b/builtin-rev-list.c
index 1bb3a06..0c3dce6 100644
--- a/builtin-rev-list.c
+++ b/builtin-rev-list.c
@@ -226,6 +226,12 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	int i;
 	int read_from_stdin = 0;
 
+	/* default to turn the conversion off -- rev-list is not a
+	 * Porcelain but as low plumbing as it can go, as far as
+	 * revision traversal is concerned.
+	 */
+	git_log_output_encoding = "";
+
 	init_revisions(&revs, prefix);
 	revs.abbrev = 0;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
diff --git a/revision.c b/revision.c
index af9f874..6e4ec46 100644
--- a/revision.c
+++ b/revision.c
@@ -1039,6 +1039,14 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, const ch
 				all_match = 1;
 				continue;
 			}
+			if (!strncmp(arg, "--encoding=", 11)) {
+				arg += 11;
+				if (strcmp(arg, "none"))
+					git_log_output_encoding = strdup(arg);
+				else
+					git_log_output_encoding = "";
+				continue;
+			}
 
 			opts = diff_opt_parse(&revs->diffopt, argv+i, argc-i);
 			if (opts > 0) {
-- 
1.5.0.rc0.g6bb1

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] Move commit reencoding parameter parsing to revision.c
  2006-12-30 20:22   ` [PATCH] Move commit reencoding parameter parsing to revision.c Junio C Hamano
@ 2006-12-31  1:10     ` Johannes Schindelin
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-12-31  1:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Costalba, git

Hi,

this patch makes a lot of sense IMHO.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-01-04 15:21 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-30 17:56 Possible regression in git-rev-list --header Marco Costalba
2006-12-30 18:30 ` Jakub Narebski
2006-12-30 18:57 ` Johannes Schindelin
2006-12-30 20:20   ` Junio C Hamano
2006-12-30 22:19     ` Junio C Hamano
2006-12-31  1:13       ` Johannes Schindelin
2006-12-31  1:45         ` Junio C Hamano
2006-12-31 11:45           ` Marco Costalba
2006-12-31 15:27             ` Johannes Schindelin
2006-12-31 15:43               ` Marco Costalba
2007-01-01  3:21                 ` Junio C Hamano
2007-01-02 21:32                   ` Johannes Schindelin
2007-01-02 22:13                     ` Junio C Hamano
2007-01-02 22:28                       ` Johannes Schindelin
2007-01-02 22:29                       ` Marco Costalba
2007-01-03  9:21                   ` Marco Costalba
2007-01-03 10:10                     ` Junio C Hamano
2007-01-03 10:21                     ` Lars Hjemli
2007-01-03 10:35                       ` Marco Costalba
2007-01-03 11:14                         ` Lars Hjemli
2007-01-04 15:21                           ` Andreas Ericsson
2007-01-04 15:18                         ` Andreas Ericsson
2007-01-03 10:38                       ` Lars Hjemli
2006-12-31  0:37     ` Johannes Schindelin
2006-12-30 20:22   ` [PATCH] Move commit reencoding parameter parsing to revision.c Junio C Hamano
2006-12-31  1:10     ` Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).