* [PATCH] Reencode committer info to utf-8 before formatting mail header
From: David Kågedal @ 2007-01-12 13:06 UTC
To: git

The add_user_info function formats the commit as a mail message, and
uses add_rfc2047 to format the From: line. The add_rfc2047 assumes
that the string is encoded as utf-8.
---
 builtin-mailinfo.c |    2 +-
 commit.c           |   10 +++++++++-
 utf8.c             |    9 +++++++--
 utf8.h             |    2 +-
 4 files changed, 18 insertions(+), 5 deletions(-)

I was hit by this problem when working with an old repository where I
had used latin1, and I tried to use "git rebase".

Another option would have been to use the correct encoding in the
RFC2047 header, but this was a quicker solution.

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index 583da38..3fd8e00 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -513,7 +513,7 @@ static void convert_to_utf8(char *line, char *charset)
 {
 	static char latin_one[] = "latin1";
 	char *input_charset = *charset ? charset : latin_one;
-	char *out = reencode_string(line, metainfo_charset, input_charset);
+	char *out = reencode_string(line, metainfo_charset, input_charset, NULL);
 
 	if (!out)
 		die("cannot convert from %s to %s\n",
diff --git a/commit.c b/commit.c
index 496d37a..8477fa7 100644
--- a/commit.c
+++ b/commit.c
@@ -486,6 +486,10 @@ static int add_rfc2047(char *buf, const char *line, int len)
 	if (!needquote)
 		return sprintf(buf, "%.*s", len, line);
 
+	if (git_commit_encoding)
+		line = reencode_string(line, "utf-8",
+				       git_commit_encoding, &len);
+
 	memcpy(bp, q_utf8, sizeof(q_utf8)-1);
 	bp += sizeof(q_utf8)-1;
 	for (i = 0; i < len; i++) {
@@ -501,6 +505,10 @@ static int add_rfc2047(char *buf, const char *line, int len)
 	}
 	memcpy(bp, "?=", 2);
 	bp += 2;
+
+	if (git_commit_encoding)
+		free((char *)line);
+
 	return bp - buf;
 }
 
@@ -687,7 +695,7 @@ static char *logmsg_reencode(const struct commit *commit)
 		out = strdup(commit->buffer);
 	else
 		out = reencode_string(commit->buffer,
-				      output_encoding, encoding);
+				      output_encoding, encoding, NULL);
 	if (out)
 		out = replace_encoding_header(out, output_encoding);
 
diff --git a/utf8.c b/utf8.c
index 7c80eec..ee9f514 100644
--- a/utf8.c
+++ b/utf8.c
@@ -291,7 +291,7 @@ int is_encoding_utf8(const char *name)
  * with iconv.  If the conversion fails, returns NULL.
  */
 #ifndef NO_ICONV
-char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding)
+char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding, int *len)
 {
 	iconv_t conv;
 	size_t insz, outsz, outalloc;
@@ -302,7 +302,10 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
 	conv = iconv_open(out_encoding, in_encoding);
 	if (conv == (iconv_t) -1)
 		return NULL;
-	insz = strlen(in);
+	if (len)
+		insz = *len;
+	else
+		insz = strlen(in);
 	outsz = insz;
 	outalloc = outsz + 1; /* for terminating NUL */
 	out = xmalloc(outalloc);
@@ -332,6 +335,8 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
 		}
 		else {
 			*outpos = '\0';
+			if (len)
+				*len = outpos - out;
 			break;
 		}
 	}
diff --git a/utf8.h b/utf8.h
index a07c5a8..eb64d46 100644
--- a/utf8.h
+++ b/utf8.h
@@ -8,7 +8,7 @@ int is_encoding_utf8(const char *name);
 void print_wrapped_text(const char *text, int indent, int indent2, int len);
 
 #ifndef NO_ICONV
-char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding);
+char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding, int *len);
 #else
 #define reencode_string(a,b,c) NULL
 #endif
-- 
1.4.4.4.ge10a-dirty

-- 
David Kågedal
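The patch above turns `len` into an in/out argument because re-encoding can change the byte count. A minimal sketch of that effect, assuming a hand-written Latin-1 to UTF-8 converter in place of iconv; `latin1_to_utf8` is a hypothetical helper for illustration and is not part of git:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of git: convert `len` bytes of Latin-1
 * to UTF-8 without iconv.  Returns a malloc'd, NUL-terminated buffer and
 * stores the converted length in *out_len, mirroring the in/out `len`
 * argument the patch adds to reencode_string().
 */
static char *latin1_to_utf8(const char *in, size_t len, size_t *out_len)
{
	/* Worst case: every input byte becomes two output bytes. */
	char *out = malloc(2 * len + 1);
	size_t i, o = 0;

	assert(out_len != NULL);
	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		unsigned char ch = (unsigned char)in[i];
		if (ch < 0x80) {
			out[o++] = (char)ch;
		} else {
			/* Latin-1 0x80..0xFF map to two-byte UTF-8 sequences. */
			out[o++] = (char)(0xC0 | (ch >> 6));
			out[o++] = (char)(0x80 | (ch & 0x3F));
		}
	}
	out[o] = '\0';
	*out_len = o;
	return out;
}
```

With the 7-byte Latin-1 name "K\xE5gedal" as input, the result is 8 bytes long, which is why a caller that keeps looping over the old `len` would stop one byte short.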
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Junio C Hamano @ 2007-01-12 22:11 UTC
To: David Kågedal; +Cc: git

David Kågedal <davidk@lysator.liu.se> writes:

> The add_user_info function formats the commit as a mail message, and
> uses add_rfc2047 to format the From: line. The add_rfc2047 assumes
> that the string is encoded as utf-8.

pretty_print_commit() also labels the commit log message, not just
the author name, as UTF-8 when plain_non_ascii is set.

It might make more sense to just set the log_output_encoding to
be always UTF-8 when generating an e-mail output, in
git-format-patch.

> diff --git a/utf8.h b/utf8.h
> index a07c5a8..eb64d46 100644
> --- a/utf8.h
> +++ b/utf8.h
> @@ -8,7 +8,7 @@ int is_encoding_utf8(const char *name);
>  void print_wrapped_text(const char *text, int indent, int indent2, int len);
>
>  #ifndef NO_ICONV
> -char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding);
> +char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding, int *len);
>  #else
>  #define reencode_string(a,b,c) NULL
>  #endif

This feels fishy...
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Junio C Hamano @ 2007-01-13 1:31 UTC
To: David Kågedal; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> It might make more sense to just set the log_output_encoding to
> be always UTF-8 when generating an e-mail output, in
> git-format-patch.

Actually, I do not want to be an UTF-8 imperialist, so how about
doing this?

-- >8 --
Use log output encoding in --pretty=email headers.

Private functions add_rfc2047() and pretty_print_commit()
assumed they are only emitting UTF-8.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/commit.c b/commit.c
index 496d37a..9b2b842 100644
--- a/commit.c
+++ b/commit.c
@@ -464,20 +464,29 @@ static int get_one_line(const char *msg, unsigned long len)
 	return ret;
 }
 
+/* High bit set, or ISO-2022-INT */
+static int non_ascii(int ch)
+{
+	ch = (ch & 0xff);
+	return ((ch & 0x80) || (ch == 0x1b));
+}
+
 static int is_rfc2047_special(char ch)
 {
-	return ((ch & 0x80) || (ch == '=') || (ch == '?') || (ch == '_'));
+	return (non_ascii(ch) || (ch == '=') || (ch == '?') || (ch == '_'));
 }
 
-static int add_rfc2047(char *buf, const char *line, int len)
+static int add_rfc2047(char *buf, const char *line, int len,
+		       const char *encoding)
 {
 	char *bp = buf;
 	int i, needquote;
-	static const char q_utf8[] = "=?utf-8?q?";
+	char q_encoding[128];
+	const char *q_encoding_fmt = "=?%s?q?";
 
 	for (i = needquote = 0; !needquote && i < len; i++) {
-		unsigned ch = line[i];
-		if (ch & 0x80)
+		int ch = line[i];
+		if (non_ascii(ch))
 			needquote++;
 		if ((i + 1 < len) &&
 		    (ch == '=' && line[i+1] == '?'))
@@ -486,8 +495,11 @@ static int add_rfc2047(char *buf, const char *line, int len)
 	if (!needquote)
 		return sprintf(buf, "%.*s", len, line);
 
-	memcpy(bp, q_utf8, sizeof(q_utf8)-1);
-	bp += sizeof(q_utf8)-1;
+	i = snprintf(q_encoding, sizeof(q_encoding), q_encoding_fmt, encoding);
+	if (sizeof(q_encoding) < i)
+		die("Insanely long encoding name %s", encoding);
+	memcpy(bp, q_encoding, i);
+	bp += i;
 	for (i = 0; i < len; i++) {
 		unsigned ch = line[i] & 0xFF;
 		if (is_rfc2047_special(ch)) {
@@ -505,7 +517,8 @@ static int add_rfc2047(char *buf, const char *line, int len)
 }
 
 static int add_user_info(const char *what, enum cmit_fmt fmt, char *buf,
-			 const char *line, int relative_date)
+			 const char *line, int relative_date,
+			 const char *encoding)
 {
 	char *date;
 	int namelen;
@@ -533,7 +546,8 @@ static int add_user_info(const char *what, enum cmit_fmt fmt, char *buf,
 			filler = "";
 		strcpy(buf, "From: ");
 		ret = strlen(buf);
-		ret += add_rfc2047(buf + ret, line, display_name_length);
+		ret += add_rfc2047(buf + ret, line, display_name_length,
+				   encoding);
 		memcpy(buf + ret, name_tail, namelen - display_name_length);
 		ret += namelen - display_name_length;
 		buf[ret++] = '\n';
@@ -668,21 +682,18 @@ static char *replace_encoding_header(char *buf, char *encoding)
 	return buf;
 }
 
-static char *logmsg_reencode(const struct commit *commit)
+static char *logmsg_reencode(const struct commit *commit,
+			     char *output_encoding)
 {
 	char *encoding;
 	char *out;
-	char *output_encoding = (git_log_output_encoding
-				 ? git_log_output_encoding
-				 : git_commit_encoding);
+	char *utf8 = "utf-8";
 
-	if (!output_encoding)
-		output_encoding = "utf-8";
-	else if (!*output_encoding)
+	if (!*output_encoding)
 		return NULL;
 	encoding = get_header(commit, "encoding");
 	if (!encoding)
-		return NULL;
+		encoding = utf8;
 	if (!strcmp(encoding, output_encoding))
 		out = strdup(commit->buffer);
 	else
@@ -691,7 +702,8 @@ static char *logmsg_reencode(const struct commit *commit)
 	if (out)
 		out = replace_encoding_header(out, output_encoding);
 
-	free(encoding);
+	if (encoding != utf8)
+		free(encoding);
 	if (!out)
 		return NULL;
 	return out;
@@ -711,8 +723,15 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
 	int parents_shown = 0;
 	const char *msg = commit->buffer;
 	int plain_non_ascii = 0;
-	char *reencoded = logmsg_reencode(commit);
+	char *reencoded;
+	char *encoding;
 
+	encoding = (git_log_output_encoding
+		    ? git_log_output_encoding
+		    : git_commit_encoding);
+	if (!encoding)
+		encoding = "utf-8";
+	reencoded = logmsg_reencode(commit, encoding);
 	if (reencoded)
 		msg = reencoded;
 
@@ -738,7 +757,7 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
 				    i + 1 < len && msg[i+1] == '\n')
 					in_body = 1;
 			}
-			else if (ch & 0x80) {
+			else if (non_ascii(ch)) {
 				plain_non_ascii = 1;
 				break;
 			}
@@ -797,13 +816,15 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
 				offset += add_user_info("Author", fmt,
 							buf + offset,
 							line + 7,
-							relative_date);
+							relative_date,
+							encoding);
 			if (!memcmp(line, "committer ", 10) &&
 			    (fmt == CMIT_FMT_FULL || fmt == CMIT_FMT_FULLER))
 				offset += add_user_info("Commit", fmt,
 							buf + offset,
 							line + 10,
-							relative_date);
+							relative_date,
+							encoding);
 			continue;
 		}
 
@@ -826,7 +847,8 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
 			int slen = strlen(subject);
 			memcpy(buf + offset, subject, slen);
 			offset += slen;
-			offset += add_rfc2047(buf + offset, line, linelen);
+			offset += add_rfc2047(buf + offset, line, linelen,
+					      encoding);
 		}
 		else {
 			memset(buf + offset, ' ', indent);
@@ -837,11 +859,17 @@ unsigned long pretty_print_commit(enum cmit_fmt fmt,
 		if (fmt == CMIT_FMT_ONELINE)
 			break;
 		if (subject && plain_non_ascii) {
-			static const char header[] =
-				"Content-Type: text/plain; charset=UTF-8\n"
+			int sz;
+			char header[512];
+			const char *header_fmt =
+				"Content-Type: text/plain; charset=%s\n"
 				"Content-Transfer-Encoding: 8bit\n";
-			memcpy(buf + offset, header, sizeof(header)-1);
-			offset += sizeof(header)-1;
+			sz = snprintf(header, sizeof(header), header_fmt,
+				      encoding);
+			if (sizeof(header) < sz)
+				die("Encoding name %s too long", encoding);
+			memcpy(buf + offset, header, sz);
+			offset += sz;
 		}
 		if (after_subject) {
 			int slen = strlen(after_subject);
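The core of the patch above is that the charset label in the RFC 2047 encoded-word is no longer hard-coded. A minimal standalone sketch of that idea follows; `q_encode` and `needs_q_escape` are hypothetical names, not git's functions, and the buffer handling is an assumption of this sketch. One detail worth noting: snprintf signals truncation by returning a value greater than or equal to the buffer size, so the sketch tests with `>=`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bytes that must be escaped inside an RFC 2047 "Q" encoded-word. */
static int needs_q_escape(unsigned char ch)
{
	return (ch & 0x80) || ch == 0x1b ||
	       ch == '=' || ch == '?' || ch == '_';
}

/*
 * Hypothetical sketch, not git's add_rfc2047(): write
 * "=?<charset>?q?<escaped text>?=" into buf (bufsz bytes).
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int q_encode(char *buf, size_t bufsz, const char *charset,
		    const char *text, size_t len)
{
	size_t i, o;
	int n;

	assert(buf != NULL && charset != NULL && text != NULL);
	n = snprintf(buf, bufsz, "=?%s?q?", charset);
	if (n < 0 || (size_t)n >= bufsz)
		return -1;	/* charset name does not fit */
	o = (size_t)n;
	for (i = 0; i < len; i++) {
		unsigned char ch = (unsigned char)text[i];
		if (o + 6 > bufsz)	/* room for "=XX", "?=" and NUL */
			return -1;
		if (needs_q_escape(ch))
			o += (size_t)sprintf(buf + o, "=%02X", (unsigned)ch);
		else if (ch == ' ')
			buf[o++] = '_';
		else
			buf[o++] = (char)ch;
	}
	if (o + 3 > bufsz)
		return -1;
	memcpy(buf + o, "?=", 3);	/* copies the terminating NUL too */
	return (int)(o + 2);
}
```

Called with charset "iso-8859-1" and the Latin-1 bytes "K\xE5gedal", this produces `=?iso-8859-1?q?K=E5gedal?=`, so the header stays valid without converting the name to UTF-8 first.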
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Junio C Hamano @ 2007-01-13 1:43 UTC
To: David Kågedal; +Cc: git

Side note.  The previous patch does not help if your commit were
made in non UTF-8 with not too recent git; the code assumes that
commit messages without the new "encoding" headers are in UTF-8.

We might want to help transitioning people by doing something
like this on top of the previous patch.  Then when dealing with
an ancient commit (sorry, I am not saying commits older than 3
weeks are ancient -- but it will be 6 months from now ;-), you
can override that default by setting an environment variable.

---
diff --git a/commit.c b/commit.c
index 9b2b842..a1b5705 100644
--- a/commit.c
+++ b/commit.c
@@ -692,8 +692,12 @@ static char *logmsg_reencode(const struct commit *commit,
 	if (!*output_encoding)
 		return NULL;
 	encoding = get_header(commit, "encoding");
-	if (!encoding)
-		encoding = utf8;
+	if (!encoding) {
+		if (getenv("GIT_OLD_COMMIT_ENCODING"))
+			encoding = strdup(getenv("GIT_OLD_COMMIT_ENCODING"));
+		else
+			encoding = utf8;
+	}
 	if (!strcmp(encoding, output_encoding))
 		out = strdup(commit->buffer);
 	else
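The fallback order proposed above (the commit's own "encoding" header, then GIT_OLD_COMMIT_ENCODING, then UTF-8) can be isolated into a small function. This is a sketch under assumptions: `pick_encoding` and `commit_encoding_or_default` are hypothetical names, and returning a borrowed pointer instead of a strdup'd copy is a choice made here for brevity, not what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: choose the encoding to assume for a commit.
 * `header` is the commit's "encoding" header value (NULL if absent),
 * `env_fallback` is the override for old commits (NULL if unset).
 * The result points at one of the arguments or at a string literal,
 * so the caller must not free it.
 */
static const char *pick_encoding(const char *header, const char *env_fallback)
{
	if (header && *header)
		return header;
	if (env_fallback && *env_fallback)
		return env_fallback;
	return "utf-8";
}

/* Wires the helper to the environment variable named in the patch. */
static const char *commit_encoding_or_default(const char *header)
{
	const char *enc = pick_encoding(header,
					getenv("GIT_OLD_COMMIT_ENCODING"));

	assert(enc != NULL && *enc != '\0');
	return enc;
}
```

Because nothing is allocated, there is no need for a sentinel comparison such as `encoding != utf8` before freeing; the trade-off is that the result is only valid while `header` and the environment are unchanged.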
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Johannes Schindelin @ 2007-01-13 11:19 UTC
To: Junio C Hamano; +Cc: David Kågedal, git

Hi,

On Fri, 12 Jan 2007, Junio C Hamano wrote:

> Side note.  The previous patch does not help if your commit were
> made in non UTF-8 with not too recent git; the code assumes that
> commit messages without the new "encoding" headers are in UTF-8.

Why not just use is_utf8() and warn, or error out, if the message is not
UTF-8? (I tend towards the erroring out, since this _is_ a new feature,
and gives undesired results with "old" commits.)

Ciao,
Dscho
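The check suggested above amounts to validating that a byte string is well-formed UTF-8. A minimal sketch of such a validator follows; `looks_like_utf8` is a hypothetical function written for illustration and makes no claim about how git's own is_utf8() is implemented.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical validator: returns 1 if the `len` bytes at `text` form
 * well-formed UTF-8 (no stray continuation bytes, truncated sequences,
 * overlong forms, surrogates, or values above U+10FFFF), else 0.
 */
static int looks_like_utf8(const char *text, size_t len)
{
	const unsigned char *s = (const unsigned char *)text;
	size_t i = 0;

	assert(text != NULL);
	while (i < len) {
		unsigned char c = s[i];
		size_t need, k;
		unsigned long cp, min;

		if (c < 0x80) {			/* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			need = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			need = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return 0;		/* invalid lead byte */
		}
		if (len - i <= need)
			return 0;		/* sequence is truncated */
		for (k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return 0;	/* not a continuation byte */
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return 0;
		i += need + 1;
	}
	return 1;
}
```

A Latin-1 name such as "K\xE5gedal" fails the check because 0xE5 announces a three-byte sequence that never arrives, which is the case the warning or error would catch. Pure ASCII text passes, so the check cannot tell Latin-1 from UTF-8 when no high bytes are present.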
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Junio C Hamano @ 2007-01-13 17:57 UTC
To: Johannes Schindelin; +Cc: David Kågedal, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Why not just use is_utf8() and warn, or error out, if the message is not
> UTF-8? (I tend towards the erroring out, since this _is_ a new feature,
> and gives undesired results with "old" commits.)

That sounds sensible.
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: David Kågedal @ 2007-01-15 16:58 UTC
To: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Fri, 12 Jan 2007, Junio C Hamano wrote:
>
>> Side note.  The previous patch does not help if your commit were
>> made in non UTF-8 with not too recent git; the code assumes that
>> commit messages without the new "encoding" headers are in UTF-8.
>
> Why not just use is_utf8() and warn, or error out, if the message is not
> UTF-8? (I tend towards the erroring out, since this _is_ a new feature,
> and gives undesired results with "old" commits.)

What do you mean? I have an old repository with latin1 commits without
any encoding markers. I want to be able to use format-patch from that
and at least get a From: line with something readable. You can't just
barf and say "This isn't UTF-8, go away".

-- 
David Kågedal
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Johannes Schindelin @ 2007-01-16 11:41 UTC
To: David Kågedal; +Cc: git

Hi,

On Mon, 15 Jan 2007, David Kågedal wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > On Fri, 12 Jan 2007, Junio C Hamano wrote:
> >
> >> Side note.  The previous patch does not help if your commit were
> >> made in non UTF-8 with not too recent git; the code assumes that
> >> commit messages without the new "encoding" headers are in UTF-8.
> >
> > Why not just use is_utf8() and warn, or error out, if the message is not
> > UTF-8? (I tend towards the erroring out, since this _is_ a new feature,
> > and gives undesired results with "old" commits.)
>
> What do you mean? I have an old repository with latin1 commits without
> any encoding markers. I want to be able to use format-patch from that
> and at least get a From: line with something readable. You can't just
> barf and say "This isn't UTF-8, go away".

So what do you want to do instead? Just pretend that the unrecoded --
Latin-1 encoded -- text is UTF-8? That's plain wrong.

Ciao,
Dscho
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: David Kågedal @ 2007-01-16 12:43 UTC
To: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Mon, 15 Jan 2007, David Kågedal wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > On Fri, 12 Jan 2007, Junio C Hamano wrote:
>> >
>> >> Side note.  The previous patch does not help if your commit were
>> >> made in non UTF-8 with not too recent git; the code assumes that
>> >> commit messages without the new "encoding" headers are in UTF-8.
>> >
>> > Why not just use is_utf8() and warn, or error out, if the message is not
>> > UTF-8? (I tend towards the erroring out, since this _is_ a new feature,
>> > and gives undesired results with "old" commits.)
>>
>> What do you mean? I have an old repository with latin1 commits without
>> any encoding markers. I want to be able to use format-patch from that
>> and at least get a From: line with something readable. You can't just
>> barf and say "This isn't UTF-8, go away".
>
> So what do you want to do instead? Just pretend that the unrecoded --
> Latin-1 encoded -- text is UTF-8? That's plain wrong.

That is what git did before I wrote my patch, so it is obviously not
what I want.

I want to be able to tell git what encoding it is. My patch reused the
i18n.commitencoding configuration parameter for that, but Junio is
probably right that it is only meant for new commits, and that an
environment variable makes more sense.

So just barfing on a commit that isn't utf-8 isn't a complete
solution. But maybe there was some context to your comment above that
I missed.

-- 
David Kågedal
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Robin Rosenberg @ 2007-01-13 12:23 UTC
To: Junio C Hamano; +Cc: David Kågedal, git

On Saturday 13 January 2007 02:43, Junio C Hamano wrote:
> Side note.  The previous patch does not help if your commit were
> made in non UTF-8 with not too recent git; the code assumes that
> commit messages without the new "encoding" headers are in UTF-8.

Wasn't there a repository option, "commitencoding"? I can't see it being
used here.

I.e., we should err out if the log message is not UTF-8 and the option is not
set (giving a message telling the user to set it). If it is set we should
consider the repository encoding to be the one and if that too is wrong (only
possible to detect for some encodings), just assume iso-8859-1 as anything
could in theory be iso-8859-1 encoded.

-- robin
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Junio C Hamano @ 2007-01-13 17:54 UTC
To: Robin Rosenberg; +Cc: David Kågedal, git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> On Saturday 13 January 2007 02:43, Junio C Hamano wrote:
>> Side note.  The previous patch does not help if your commit were
>> made in non UTF-8 with not too recent git; the code assumes that
>> commit messages without the new "encoding" headers are in UTF-8.
>
> Wasn't there a repository option, "commitencoding"? I can't see it being
> used here.

commitencoding is about what encoding the commit newly created
in this repository right now should claim to have -- in other
words what is fed to commit-tree.

We are talking about examining an existing commit that might have
come from another repository or created some time ago when the
repository configuration was set differently.
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: David Kågedal @ 2007-01-15 16:54 UTC
To: git

Junio C Hamano <junkio@cox.net> writes:

> Side note.  The previous patch does not help if your commit were
> made in non UTF-8 with not too recent git; the code assumes that
> commit messages without the new "encoding" headers are in UTF-8.

This was exactly the problem I was trying to solve.

> We might want to help transitioning people by doing something
> like this on top of the previous patch.  Then when dealing with
> an ancient commit (sorry, I am not saying commits older than 3
> weeks are ancient -- but it will be 6 months from now ;-), you
> can override that default by setting an environment variable.
>
> ---
> diff --git a/commit.c b/commit.c
> index 9b2b842..a1b5705 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -692,8 +692,12 @@ static char *logmsg_reencode(const struct commit *commit,
>  	if (!*output_encoding)
>  		return NULL;
>  	encoding = get_header(commit, "encoding");
> -	if (!encoding)
> -		encoding = utf8;
> +	if (!encoding) {
> +		if (getenv("GIT_OLD_COMMIT_ENCODING"))
> +			encoding = strdup(getenv("GIT_OLD_COMMIT_ENCODING"));
> +		else
> +			encoding = utf8;
> +	}
>  	if (!strcmp(encoding, output_encoding))
>  		out = strdup(commit->buffer);
>  	else

-- 
David Kågedal
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Alex Riesen @ 2007-01-13 11:02 UTC
To: Junio C Hamano; +Cc: David Kågedal, git

Junio C Hamano, Sat, Jan 13, 2007 02:31:35 +0100:
> +/* High bit set, or ISO-2022-INT */
> +static int non_ascii(int ch)
> +{
> +	ch = (ch & 0xff);
> +	return ((ch & 0x80) || (ch == 0x1b));
> +}
> +

"return (ch & 0x0x80) || (ch & 0xff) == 0x1b;" :)
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Horst H. von Brand @ 2007-01-14 0:42 UTC
To: Alex Riesen; +Cc: Junio C Hamano, David Kågedal, git

Alex Riesen <fork0@t-online.de> wrote:
> Junio C Hamano, Sat, Jan 13, 2007 02:31:35 +0100:
> > +/* High bit set, or ISO-2022-INT */
> > +static int non_ascii(int ch)
> > +{
> > +	ch = (ch & 0xff);
> > +	return ((ch & 0x80) || (ch == 0x1b));
> > +}
> > +
>
> "return (ch & 0x0x80) || (ch & 0xff) == 0x1b;" :)
                ^^

Is the same, if ch == 0x9b, it will match the first part anyway.

The outer parentheses can (should?) go.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria              +56 32 2654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 2797513
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Alex Riesen @ 2007-01-14 19:25 UTC
To: Horst H. von Brand; +Cc: Junio C Hamano, David Kågedal, git

Horst H. von Brand, Sun, Jan 14, 2007 01:42:57 +0100:
> Alex Riesen <fork0@t-online.de> wrote:
> > Junio C Hamano, Sat, Jan 13, 2007 02:31:35 +0100:
> > > +/* High bit set, or ISO-2022-INT */
> > > +static int non_ascii(int ch)
> > > +{
> > > +	ch = (ch & 0xff);
> > > +	return ((ch & 0x80) || (ch == 0x1b));
> > > +}
> > > +
> >
> > "return (ch & 0x0x80) || (ch & 0xff) == 0x1b;" :)
>                 ^^

Oops :)

> Is the same, if ch == 0x9b, it will match the first part anyway.

So it should. 0x9b isn't ASCII.

> The outer parentheses can (should?) go.

It's "question of style", I'm afraid :)
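The exchange above claims the two spellings of non_ascii() behave identically. A short sketch that checks this exhaustively over every value a signed or unsigned char can take when widened to int; the function names are hypothetical labels for the two variants, and the second has the "0x0x80" typo corrected to 0x80:

```c
#include <assert.h>

/* As posted in the patch: mask to one byte, then test. */
static int non_ascii_masked(int ch)
{
	ch = (ch & 0xff);
	return ((ch & 0x80) || (ch == 0x1b));
}

/* The one-line variant from the reply, typo corrected. */
static int non_ascii_inline(int ch)
{
	return (ch & 0x80) || (ch & 0xff) == 0x1b;
}

/*
 * Aborts via assert() if the two variants ever disagree.  The range
 * covers sign-extended chars (-128..-1) as well as 0..255.
 */
static void check_variants_agree(void)
{
	int ch;

	for (ch = -128; ch <= 255; ch++)
		assert(non_ascii_masked(ch) == non_ascii_inline(ch));
}
```

Both forms test bit 7 of the low byte, and masking with 0xff does not change that bit, so the only real difference is whether the mask is applied once up front or inline in the ESC comparison.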
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: Junio C Hamano @ 2007-01-13 22:18 UTC
To: git; +Cc: David Kågedal, Johannes Schindelin

On this topic, along with the "format-patch" fix (which
automatically makes "rebase without --merge" do the right thing
because it is "format-patch piped to am" in essence), I have
another commit to make "cherry-pick", "rebase --merge" and
"commit -c/-C" do the right thing according to the
commitencoding specified in the repository where the new commit
is being created.

The issue is that an existing commit might have come from a
different repository or from the past when this repository had
commitencoding that was different from the current value.
Running "cat-file commit" to extract the old commit log message
and feeding it directly to create the new commit would not work,
because the value of commitencoding in this repository may be
different.

This should not affect old encoding-unaware setup where people
use _only_ a legacy encoding and do not bother to specify any
commitencoding.  In such a case, both input and output are the
same and while we pretend both are UTF-8, we actually do not
trigger conversion.  To support such a configuration is one
reason I did not actually take Johannes's suggestion to error
out on an existing commit that does _not_ have an encoding header
and whose contents do not look like valid UTF-8.

The series is currently sitting in 'next'.  If people do not see
a problem with it, I think it should go in v1.5.0.
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: David Kågedal @ 2007-01-15 16:57 UTC
To: git

Junio C Hamano <junkio@cox.net> writes:

> -static int add_rfc2047(char *buf, const char *line, int len)
> +static int add_rfc2047(char *buf, const char *line, int len,
> +		       const char *encoding)
>  {
>  	char *bp = buf;
>  	int i, needquote;
> -	static const char q_utf8[] = "=?utf-8?q?";
> +	char q_encoding[128];
> +	const char *q_encoding_fmt = "=?%s?q?";

This goes against the old principle of being forgiving in what you
accept, and strict in what you send. The names of the encoding in the
headers should probably be normalized before putting them in an
e-mail.

I.e. we might accept "utf-8", "utf8", "UTF-8", and "UTF8" (this
depends on iconv, I suppose), but the RFC2047 encoding should be the
one blessed by RFC4027. But I admit that I haven't read the RFC, and
I'm writing this offline so I can't check right now.

-- 
David Kågedal
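The normalization suggested above could be done with a small alias table consulted before the charset name is written into a header. This is a sketch under assumptions: `canonical_charset` and `same_name` are hypothetical helpers, the table is illustrative and far from complete, and the canonical spellings chosen here are the commonly used MIME names "UTF-8" and "ISO-8859-1".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive string equality. */
static int same_name(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;	/* equal only if both ended together */
}

/*
 * Hypothetical helper: map spellings that iconv tends to accept onto
 * one canonical name for use in mail headers.  Unknown names are
 * returned unchanged rather than rejected.
 */
static const char *canonical_charset(const char *name)
{
	static const struct {
		const char *alias;
		const char *canonical;
	} table[] = {
		{ "utf8",       "UTF-8" },
		{ "utf-8",      "UTF-8" },
		{ "latin1",     "ISO-8859-1" },
		{ "latin-1",    "ISO-8859-1" },
		{ "iso8859-1",  "ISO-8859-1" },
		{ "iso-8859-1", "ISO-8859-1" },
	};
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (same_name(name, table[i].alias))
			return table[i].canonical;
	return name;
}
```

Passing unknown names through keeps the behaviour of the posted patch for encodings the table does not cover, so the lookup only ever tightens what is sent.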
* Re: [PATCH] Reencode committer info to utf-8 before formatting mail header
From: David Kågedal @ 2007-01-15 16:53 UTC
To: git

Junio C Hamano <junkio@cox.net> writes:

>> diff --git a/utf8.h b/utf8.h
>> index a07c5a8..eb64d46 100644
>> --- a/utf8.h
>> +++ b/utf8.h
>> @@ -8,7 +8,7 @@ int is_encoding_utf8(const char *name);
>>  void print_wrapped_text(const char *text, int indent, int indent2, int len);
>>
>>  #ifndef NO_ICONV
>> -char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding);
>> +char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding, int *len);
>>  #else
>>  #define reencode_string(a,b,c) NULL
>>  #endif
>
> This feels fishy...

I admit that I didn't test-compile with NO_ICONV.

-- 
David Kågedal
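The problem with the quoted hunk is that the NO_ICONV fallback macro still takes three arguments while every caller now passes four, which the preprocessor rejects. A minimal sketch of a matching fallback follows; defining NO_ICONV inside the file and the `can_reencode` wrapper are assumptions made only so the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

#define NO_ICONV 1	/* assumption for this sketch: build without iconv */

#ifndef NO_ICONV
char *reencode_string(const char *in, const char *out_encoding,
		      const char *in_encoding, int *len);
#else
/* Four parameters, so four-argument call sites still expand. */
#define reencode_string(in, out_enc, in_enc, len) NULL
#endif

/* Hypothetical caller: reports whether a conversion is available. */
static int can_reencode(const char *text)
{
	assert(text != NULL);
	return reencode_string(text, "utf-8", "latin1", NULL) != NULL;
}
```

With the fallback in effect every call yields NULL, which callers in the posted patches already treat as "conversion failed".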