From: Jeff King <peff@peff.net>
To: katsu <gkatsu.ne@gmail.com>
Cc: git@vger.kernel.org, gitster@pobox.com,
Takeharu Katsuyama <tkatsu.ne@gmail.com>
Subject: Re: [PATCH] Fix Q-encoded multi-octet-char split in email.
Date: Tue, 3 Jul 2012 02:35:11 -0400
Message-ID: <20120703063511.GA16679@sigill.intra.peff.net>
In-Reply-To: <1341279697-4596-1-git-send-email-gkatsu.ne@gmail.com>

On Tue, Jul 03, 2012 at 10:41:37AM +0900, katsu wrote:

> Issue: Email subjects written in a multi-octet language such as Japanese
> are not displayed correctly in the recipient's email client, because a
> Q-encoded subject longer than 78 octets is split at line breaks by octet
> rather than by character.
> e.g.)
> "=?utf-8?q? [PATCH] ... =E8=83=86=E8=81=A9?="
> |
> V
> "=?utf-8?q? [PATCH] ... =E8=83=86=E8?="
> "=?utf-8?q?=81=A9?="
>
> Changes: Add a check for whether an octet is part of a UTF-8 multi-octet
> character, and split at line breaks by character rather than by octet in
> the function add_rfc2047() in pretty.c, as follows.
>
> "=?utf-8?q? [PATCH] ... =E8=83=86?="
> "=?utf-8?q?=E8=81=A9?="
>
> Signed-off-by: Takeharu Katsuyama <tkatsu.ne@gmail.com>

Yeah, we definitely don't handle that properly according to the RFC.
This patch is going in the right direction, but I have a few
comments:

> --- a/pretty.c
> +++ b/pretty.c
> @@ -272,6 +272,12 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len,
>  	static const int max_length = 78; /* per rfc2822 */
>  	int i;
>  	int line_len;
> +	int utf_ctr, use_utf;
> +
> +	if (!strcmp(encoding, "UTF-8") || !strcmp(encoding, "utf-8"))
> +		use_utf = 1;
> +	else
> +		use_utf = 0;

Please use is_encoding_utf8, which handles both of these spellings, as
well as "utf8" and "UTF8" (it also handles encoding==NULL; I don't think
that can happen in this code path, but it is nice to be defensive).
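
To show the behavior I mean, here is a minimal standalone sketch. The name looks_like_utf8() is a hypothetical stand-in for illustration, not the helper in git, and the body is my assumption of an equivalent check (case-insensitive match on "utf-8" and "utf8", with NULL treated as UTF-8):

```c
#include <stddef.h>
#include <strings.h> /* strcasecmp() is POSIX, not ISO C */

/*
 * Hypothetical stand-in for the check described above; the name and
 * body are illustrative only and not copied from git's sources.
 */
int looks_like_utf8(const char *name)
{
	/* Be defensive: treat a missing encoding name as UTF-8. */
	if (!name)
		return 1;
	/* Accept either spelling, in any letter case. */
	return !strcasecmp(name, "utf-8") || !strcasecmp(name, "utf8");
}
```

With a helper like that, the six added lines in the hunk above would collapse into a single assignment.
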
> @@ -293,10 +299,31 @@ needquote:
>  	strbuf_grow(sb, len * 3 + strlen(encoding) + 100);
>  	strbuf_addf(sb, "=?%s?q?", encoding);
>  	line_len += strlen(encoding) + 5; /* 5 for =??q? */
> +	utf_ctr = 0;
>  	for (i = 0; i < len; i++) {
>  		unsigned ch = line[i] & 0xFF;
>
> -		if (line_len >= max_length - 2) {
> +		/*
> +		 * Judge if it is an utf-8 char, to avoid inserting newline
> +		 * in the middle of utf-8 char code.
> +		 */
> +		if (use_utf) {
> +			if (ch >= 0xC2 && ch <= 0xDF) /* 1'st byte of 2-bytes utf-8 */
> +				utf_ctr = 1;
> +			else if (ch >= 0xE0 && ch <= 0xEF) /* 3-bytes utf-8 */
> +				utf_ctr = 2;
> +			else if (ch >= 0xF0 && ch <= 0xF7) /* 4-bytes utf-8 */
> +				utf_ctr = 3;
> +			else if (ch >= 0xF8 && ch <= 0xFB) /* 5-bytes utf-8 */
> +				utf_ctr = 4;
> +			else if (ch >= 0xFC && ch <= 0xFD) /* 6-bytes utf-8 */
> +				utf_ctr = 5;
> +			else if (ch >= 0x80 && ch <= 0xBF) /* 2'nd to 6'th byte of utf-8 */
> +				utf_ctr--;
> +			else
> +				utf_ctr = 0;
> +		}
> +		if (line_len >= (max_length - 2 - utf_ctr *3)) {

Can we re-use utf8_width here instead of rewriting these rules?
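
To illustrate the shape I have in mind, here is a rough standalone sketch that walks the input one whole character at a time and only considers a line break between characters. Everything in it is an assumption for illustration: utf8_char_len(), is_plain() and q_encode_utf8() are hypothetical names, the fixed "utf-8" charset and caller-supplied buffer stand in for the strbuf code in pretty.c, and utf8_char_len() is a simplified substitute for the existing UTF-8 helper rather than a copy of it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: byte length of the UTF-8 character at s, looking
 * at no more than `remain` bytes. Returns 1 for ASCII and for anything
 * malformed or truncated, so the caller always makes progress.
 */
size_t utf8_char_len(const unsigned char *s, size_t remain)
{
	size_t n, i;

	if (s[0] >= 0xC2 && s[0] <= 0xDF)
		n = 2;
	else if (s[0] >= 0xE0 && s[0] <= 0xEF)
		n = 3;
	else if (s[0] >= 0xF0 && s[0] <= 0xF4)
		n = 4;
	else
		return 1;
	if (n > remain)
		return 1;
	for (i = 1; i < n; i++)
		if ((s[i] & 0xC0) != 0x80)
			return 1;
	return n;
}

/* Simplified: only ASCII letters and digits are emitted unencoded. */
static int is_plain(unsigned char ch)
{
	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
	       (ch >= '0' && ch <= '9');
}

/*
 * Q-encode `len` bytes of `in` into `out`, starting a new encoded-word
 * whenever the next whole character plus the closing "?=" would not fit
 * within max_len. Returns the number of bytes written (excluding the
 * NUL), or -1 if `out` is too small.
 */
int q_encode_utf8(char *out, size_t outsz, const char *in, size_t len,
		  size_t max_len)
{
	const unsigned char *p = (const unsigned char *)in;
	size_t used, line_len, i;
	int n;

	n = snprintf(out, outsz, "=?utf-8?q?");
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	used = line_len = (size_t)n;

	while (len) {
		size_t clen = utf8_char_len(p, len);
		size_t need = 0;

		/* Encoded size of the whole character, not of one octet. */
		for (i = 0; i < clen; i++)
			need += is_plain(p[i]) ? 1 : 3;

		if (line_len + need + 2 > max_len) {
			n = snprintf(out + used, outsz - used, "?=\n =?utf-8?q?");
			if (n < 0 || (size_t)n >= outsz - used)
				return -1;
			used += (size_t)n;
			line_len = (size_t)n - 3; /* what follows the newline */
		}
		for (i = 0; i < clen; i++) {
			if (is_plain(p[i]))
				n = snprintf(out + used, outsz - used, "%c", p[i]);
			else
				n = snprintf(out + used, outsz - used, "=%02X",
					     (unsigned)p[i]);
			if (n < 0 || (size_t)n >= outsz - used)
				return -1;
			used += (size_t)n;
			line_len += (size_t)n;
		}
		p += clen;
		len -= clen;
	}
	n = snprintf(out + used, outsz - used, "?=");
	if (n < 0 || (size_t)n >= outsz - used)
		return -1;
	return (int)(used + (size_t)n);
}
```

Feeding it the six octets from your example with a deliberately small max_len puts each three-octet character in its own encoded-word instead of splitting after =E8. The point is only that the length check happens once per character, so the byte-range rules live in one helper rather than in the encoding loop.
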
-Peff