From mboxrd@z Thu Jan 1 00:00:00 1970 From: katsu Subject: [PATCH] Fix Q-encoded multi-octet-char split in email. Date: Tue, 3 Jul 2012 10:41:37 +0900 Message-ID: <1341279697-4596-1-git-send-email-gkatsu.ne@gmail.com> Cc: katsu , Takeharu Katsuyama To: git@vger.kernel.org, gitster@pobox.com X-From: git-owner@vger.kernel.org Tue Jul 03 03:50:29 2012 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1SlsFz-0002Rp-Qw for gcvg-git-2@plane.gmane.org; Tue, 03 Jul 2012 03:50:24 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933317Ab2GCBuP (ORCPT ); Mon, 2 Jul 2012 21:50:15 -0400 Received: from mx2.avis.ne.jp ([202.247.192.54]:44738 "EHLO mx2.avis.ne.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932139Ab2GCBuM (ORCPT ); Mon, 2 Jul 2012 21:50:12 -0400 X-Greylist: delayed 506 seconds by postgrey-1.27 at vger.kernel.org; Mon, 02 Jul 2012 21:50:12 EDT Received: from negw.ne.njrc.co.jp (f219-42.ip.avis.ne.jp [202.247.219.42]) by mx2.avis.ne.jp (Postfix) with ESMTP id 0D1B97814A; Tue, 3 Jul 2012 10:41:44 +0900 (JST) Received: from alpha.ne.njrc.co.jp (alpha.ne.njrc.co.jp [172.19.33.7]) by negw.ne.njrc.co.jp (unknown) with ESMTP id q631fhKT003843; Tue, 3 Jul 2012 10:41:43 +0900 Received: from localhost (pce-224.ne.njrc.co.jp [172.19.33.224]) by alpha.ne.njrc.co.jp (Postfix) with ESMTP id 2532760B62; Tue, 3 Jul 2012 10:41:43 +0900 (JST) X-Mailer: git-send-email 1.7.9 Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Issue: Email subject written in multi-octet language like japanese cannot be displayed in correct at destinations's email client, because the Q-encoded subject which is longer than 78 octets is split by a octet not by a character at line breaks. e.g.) "=?utf-8?q? [PATCH] ... =E8=83=86=E8=81=A9?=" | V "=?utf-8?q? [PATCH] ... =E8=83=86=E8?=" "=?utf-8?q?=81=A9=?" Changes: Add a judge if a character is an part of utf-8 muti-octet, and split the characters by a character not by a octet at line breaks in function add_rfc2407() in pretty.c. Like following. "=?utf-8?q? [PATCH] ... =E8=83=86?=" "=?utf-8?q?=E8=81=A9=?" Signed-off-by: Takeharu Katsuyama --- pretty.c | 29 ++++++++++++++++++++++++++++- 1 files changed, 28 insertions(+), 1 deletions(-) mode change 100644 => 100755 pretty.c diff --git a/pretty.c b/pretty.c old mode 100644 new mode 100755 index 8b1ea9f..266a8fe --- a/pretty.c +++ b/pretty.c @@ -272,6 +272,12 @@ static void add_rfc2047(struct strbuf *sb, const char *line, int len, static const int max_length = 78; /* per rfc2822 */ int i; int line_len; + int utf_ctr, use_utf; + + if (!strcmp(encoding, "UTF-8") || !strcmp(encoding, "utf-8")) + use_utf = 1; + else + use_utf = 0; /* How many bytes are already used on the current line? */ for (i = sb->len - 1; i >= 0; i--) @@ -293,10 +299,31 @@ needquote: strbuf_grow(sb, len * 3 + strlen(encoding) + 100); strbuf_addf(sb, "=?%s?q?", encoding); line_len += strlen(encoding) + 5; /* 5 for =??q? */ + utf_ctr = 0; for (i = 0; i < len; i++) { unsigned ch = line[i] & 0xFF; - if (line_len >= max_length - 2) { + /* + * Judge if it is an utf-8 char, to avoid inserting newline + * in the middle of utf-8 char code. + */ + if (use_utf) { + if (ch >= 0xC2 && ch <= 0xDF) /* 1'st byte of 2-bytes utf-8 */ + utf_ctr = 1; + else if (ch >= 0xE0 && ch <= 0xEF) /* 3-bytes utf-8 */ + utf_ctr = 2; + else if (ch >= 0xF0 && ch <= 0xF7) /* 4-bytes utf-8 */ + utf_ctr = 3; + else if (ch >= 0xF8 && ch <= 0xFB) /* 5-bytes utf-8 */ + utf_ctr = 4; + else if (ch >= 0xFC && ch <= 0xFD) /* 6-bytes utf-8 */ + utf_ctr = 5; + else if (ch >= 0x80 && ch <= 0xBF) /* 2'nd to 6'th byte of utf-8 */ + utf_ctr--; + else + utf_ctr = 0; + } + if (line_len >= (max_length - 2 - utf_ctr *3)) { strbuf_addf(sb, "?=\n =?%s?q?", encoding); line_len = strlen(encoding) + 5 + 1; /* =??q? plus SP */ } -- 1.7.9