git.vger.kernel.org archive mirror
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH v4 08/13] pretty: two phase conversion for non utf-8 commits
Date: Fri, 19 Apr 2013 09:08:47 +1000
Message-ID: <1366326532-28517-9-git-send-email-pclouds@gmail.com>
In-Reply-To: <1366326532-28517-1-git-send-email-pclouds@gmail.com>

Always assume that format_commit_item() takes a UTF-8 string, to keep
string handling simple (we can handle UTF-8 strings, but not strings
in other encodings).

If the commit message is not in UTF-8, or the output encoding is not
UTF-8, the commit is first converted to UTF-8, processed, and the
result is then converted to the output encoding. This of course only
works with encodings that can be converted to and from Unicode.

This also fixes the iso8859-1 test in t6006. The test is supposed to
create an iso8859-1 commit, but the commit content in t6006 is in
UTF-8. t6006 is now converted to iso8859-1 (the downside is that we
can't put UTF-8 strings there anymore).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 pretty.c                   | 24 ++++++++++++++++++++++--
 t/t6006-rev-list-format.sh | 12 ++++++------
 2 files changed, 28 insertions(+), 8 deletions(-)
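
For illustration only, here is a minimal standalone sketch of the
two-phase idea described in the commit message: convert to UTF-8, run
UTF-8-only processing, then convert back. It is not the code in this
patch. The patch relies on reencode_string_len(), whereas the sketch
hand-rolls the ISO-8859-1 case so it has no dependencies. All function
names below are hypothetical, and the "processing" step is just a
truncation to a number of characters.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Phase 1: ISO-8859-1 -> UTF-8. Each Latin-1 byte is the code point of
 * the same value, so bytes >= 0x80 become a two-byte sequence.
 * Returns a malloc'd, NUL-terminated buffer; *outlen gets its length. */
static char *latin1_to_utf8(const char *in, size_t len, size_t *outlen)
{
	char *out = malloc(2 * len + 1);
	size_t i, o = 0;

	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)in[i];
		if (c < 0x80) {
			out[o++] = (char)c;
		} else {
			out[o++] = (char)(0xC0 | (c >> 6));
			out[o++] = (char)(0x80 | (c & 0x3F));
		}
	}
	out[o] = '\0';
	*outlen = o;
	return out;
}

/* Stand-in for UTF-8-only processing: byte length of the prefix that
 * holds at most max_chars code points. Assumes well-formed UTF-8. */
static size_t utf8_prefix_len(const char *s, size_t len, size_t max_chars)
{
	size_t i = 0, chars = 0;

	while (i < len && chars < max_chars) {
		i++;
		/* skip continuation bytes (10xxxxxx) */
		while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
			i++;
		chars++;
	}
	return i;
}

/* Phase 2: UTF-8 -> ISO-8859-1. Returns NULL on malformed input or on
 * any code point above U+00FF, which Latin-1 cannot represent. */
static char *utf8_to_latin1(const char *in, size_t len, size_t *outlen)
{
	char *out = malloc(len + 1);
	size_t i = 0, o = 0;

	if (!out)
		return NULL;
	while (i < len) {
		unsigned char c = (unsigned char)in[i];
		if (c < 0x80) {
			out[o++] = (char)c;
			i++;
		} else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
			   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
			out[o++] = (char)(((c & 0x03) << 6) |
					  ((unsigned char)in[i + 1] & 0x3F));
			i += 2;
		} else {
			free(out);
			return NULL;
		}
	}
	out[o] = '\0';
	*outlen = o;
	return out;
}

/* Both phases around the processing step: Latin-1 in, Latin-1 out. */
static char *truncate_latin1_via_utf8(const char *msg, size_t len,
				      size_t max_chars, size_t *outlen)
{
	size_t ulen;
	char *utf8 = latin1_to_utf8(msg, len, &ulen);
	char *out;

	if (!utf8)
		return NULL;
	ulen = utf8_prefix_len(utf8, ulen, max_chars);
	out = utf8_to_latin1(utf8, ulen, outlen);
	free(utf8);
	return out;
}
```

Calling truncate_latin1_via_utf8() on the seven bytes "\xa1" "bueno!"
with max_chars set to 3 should give back the three Latin-1 bytes for
"¡bu". The second phase can fail when the processed text contains
characters the target encoding lacks, which is why the sketch returns
NULL in that case.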

diff --git a/pretty.c b/pretty.c
index e0f93ba..5947275 100644
--- a/pretty.c
+++ b/pretty.c
@@ -954,7 +954,8 @@ static int format_reflog_person(struct strbuf *sb,
 	return format_person_part(sb, part, ident, strlen(ident), dmode);
 }
 
-static size_t format_commit_one(struct strbuf *sb, const char *placeholder,
+static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
+				const char *placeholder,
 				void *context)
 {
 	struct format_commit_context *c = context;
@@ -1193,7 +1194,8 @@ static size_t format_commit_one(struct strbuf *sb, const char *placeholder,
 	return 0;	/* unknown placeholder */
 }
 
-static size_t format_commit_item(struct strbuf *sb, const char *placeholder,
+static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
+				 const char *placeholder,
 				 void *context)
 {
 	int consumed;
@@ -1273,6 +1275,7 @@ void format_commit_message(const struct commit *commit,
 {
 	struct format_commit_context context;
 	const char *output_enc = pretty_ctx->output_encoding;
+	const char *utf8 = "UTF-8";
 
 	memset(&context, 0, sizeof(context));
 	context.commit = commit;
@@ -1285,6 +1288,23 @@ void format_commit_message(const struct commit *commit,
 	strbuf_expand(sb, format, format_commit_item, &context);
 	rewrap_message_tail(sb, &context, 0, 0, 0);
 
+	if (output_enc) {
+		if (same_encoding(utf8, output_enc))
+			output_enc = NULL;
+	} else {
+		if (context.commit_encoding &&
+		    !same_encoding(context.commit_encoding, utf8))
+			output_enc = context.commit_encoding;
+	}
+
+	if (output_enc) {
+		int outsz;
+		char *out = reencode_string_len(sb->buf, sb->len,
+						output_enc, utf8, &outsz);
+		if (out)
+			strbuf_attach(sb, out, outsz, outsz + 1);
+	}
+
 	free(context.commit_encoding);
 	logmsg_free(context.message, commit);
 	free(context.signature_check.gpg_output);
diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index 3fc3b74..0393c9f 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -184,7 +184,7 @@ Test printing of complex bodies
 
 This commit message is much longer than the others,
 and it will be encoded in iso8859-1. We should therefore
-include an iso8859 character: ¡bueno!
+include an iso8859 character: ¡bueno!
 EOF
 test_expect_success 'setup complex body' '
 git config i18n.commitencoding iso8859-1 &&
@@ -192,14 +192,14 @@ git config i18n.commitencoding iso8859-1 &&
 '
 
 test_format complex-encoding %e <<'EOF'
-commit f58db70b055c5718631e5c61528b28b12090cdea
+commit 1ed88da4a5b5ed8c449114ac131efc62178734c3
 iso8859-1
 commit 131a310eb913d107dd3c09a65d1651175898735d
 commit 86c75cfd708a0e5868dc876ed5b8bb66c80b4873
 EOF
 
 test_format complex-subject %s <<'EOF'
-commit f58db70b055c5718631e5c61528b28b12090cdea
+commit 1ed88da4a5b5ed8c449114ac131efc62178734c3
 Test printing of complex bodies
 commit 131a310eb913d107dd3c09a65d1651175898735d
 changed foo
@@ -208,17 +208,17 @@ added foo
 EOF
 
 test_format complex-body %b <<'EOF'
-commit f58db70b055c5718631e5c61528b28b12090cdea
+commit 1ed88da4a5b5ed8c449114ac131efc62178734c3
 This commit message is much longer than the others,
 and it will be encoded in iso8859-1. We should therefore
-include an iso8859 character: ¡bueno!
+include an iso8859 character: ¡bueno!
 
 commit 131a310eb913d107dd3c09a65d1651175898735d
 commit 86c75cfd708a0e5868dc876ed5b8bb66c80b4873
 EOF
 
 test_expect_success '%x00 shows NUL' '
-	echo  >expect commit f58db70b055c5718631e5c61528b28b12090cdea &&
+	echo  >expect commit 1ed88da4a5b5ed8c449114ac131efc62178734c3 &&
 	echo >>expect fooQbar &&
 	git rev-list -1 --format=foo%x00bar HEAD >actual.nul &&
 	nul_to_q <actual.nul >actual &&
-- 
1.8.2.82.gc24b958
