All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH v4 08/13] pretty: two phase conversion for non utf-8 commits
Date: Fri, 19 Apr 2013 09:08:47 +1000	[thread overview]
Message-ID: <1366326532-28517-9-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1366326532-28517-1-git-send-email-pclouds@gmail.com>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 4649 bytes --]

Always assume format_commit_item() takes an utf-8 string for string
handling simplicity (we can handle utf-8 strings, but can't with other
encodings).

If commit message is in non-utf8, or output encoding is not, then the
commit is first converted to utf-8, processed, then output converted
to output encoding. This of course only works with encodings that are
compatible with Unicode.

This also fixes the iso8859-1 test in t6006. It's supposed to create
an iso8859-1 commit, but the commit content in t6006 is in UTF-8.
t6006 is now converted back in UTF-8 (the downside is we can't put
utf-8 strings there anymore).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 pretty.c                   | 24 ++++++++++++++++++++++--
 t/t6006-rev-list-format.sh | 12 ++++++------
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/pretty.c b/pretty.c
index e0f93ba..5947275 100644
--- a/pretty.c
+++ b/pretty.c
@@ -954,7 +954,8 @@ static int format_reflog_person(struct strbuf *sb,
 	return format_person_part(sb, part, ident, strlen(ident), dmode);
 }
 
-static size_t format_commit_one(struct strbuf *sb, const char *placeholder,
+static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
+				const char *placeholder,
 				void *context)
 {
 	struct format_commit_context *c = context;
@@ -1193,7 +1194,8 @@ static size_t format_commit_one(struct strbuf *sb, const char *placeholder,
 	return 0;	/* unknown placeholder */
 }
 
-static size_t format_commit_item(struct strbuf *sb, const char *placeholder,
+static size_t format_commit_item(struct strbuf *sb, /* in UTF-8 */
+				 const char *placeholder,
 				 void *context)
 {
 	int consumed;
@@ -1273,6 +1275,7 @@ void format_commit_message(const struct commit *commit,
 {
 	struct format_commit_context context;
 	const char *output_enc = pretty_ctx->output_encoding;
+	const char *utf8 = "UTF-8";
 
 	memset(&context, 0, sizeof(context));
 	context.commit = commit;
@@ -1285,6 +1288,23 @@ void format_commit_message(const struct commit *commit,
 	strbuf_expand(sb, format, format_commit_item, &context);
 	rewrap_message_tail(sb, &context, 0, 0, 0);
 
+	if (output_enc) {
+		if (same_encoding(utf8, output_enc))
+			output_enc = NULL;
+	} else {
+		if (context.commit_encoding &&
+		    !same_encoding(context.commit_encoding, utf8))
+			output_enc = context.commit_encoding;
+	}
+
+	if (output_enc) {
+		int outsz;
+		char *out = reencode_string_len(sb->buf, sb->len,
+						output_enc, utf8, &outsz);
+		if (out)
+			strbuf_attach(sb, out, outsz, outsz + 1);
+	}
+
 	free(context.commit_encoding);
 	logmsg_free(context.message, commit);
 	free(context.signature_check.gpg_output);
diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index 3fc3b74..0393c9f 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -184,7 +184,7 @@ Test printing of complex bodies
 
 This commit message is much longer than the others,
 and it will be encoded in iso8859-1. We should therefore
-include an iso8859 character: ¡bueno!
+include an iso8859 character: ¡bueno!
 EOF
 test_expect_success 'setup complex body' '
 git config i18n.commitencoding iso8859-1 &&
@@ -192,14 +192,14 @@ git config i18n.commitencoding iso8859-1 &&
 '
 
 test_format complex-encoding %e <<'EOF'
-commit f58db70b055c5718631e5c61528b28b12090cdea
+commit 1ed88da4a5b5ed8c449114ac131efc62178734c3
 iso8859-1
 commit 131a310eb913d107dd3c09a65d1651175898735d
 commit 86c75cfd708a0e5868dc876ed5b8bb66c80b4873
 EOF
 
 test_format complex-subject %s <<'EOF'
-commit f58db70b055c5718631e5c61528b28b12090cdea
+commit 1ed88da4a5b5ed8c449114ac131efc62178734c3
 Test printing of complex bodies
 commit 131a310eb913d107dd3c09a65d1651175898735d
 changed foo
@@ -208,17 +208,17 @@ added foo
 EOF
 
 test_format complex-body %b <<'EOF'
-commit f58db70b055c5718631e5c61528b28b12090cdea
+commit 1ed88da4a5b5ed8c449114ac131efc62178734c3
 This commit message is much longer than the others,
 and it will be encoded in iso8859-1. We should therefore
-include an iso8859 character: ¡bueno!
+include an iso8859 character: ¡bueno!
 
 commit 131a310eb913d107dd3c09a65d1651175898735d
 commit 86c75cfd708a0e5868dc876ed5b8bb66c80b4873
 EOF
 
 test_expect_success '%x00 shows NUL' '
-	echo  >expect commit f58db70b055c5718631e5c61528b28b12090cdea &&
+	echo  >expect commit 1ed88da4a5b5ed8c449114ac131efc62178734c3 &&
 	echo >>expect fooQbar &&
 	git rev-list -1 --format=foo%x00bar HEAD >actual.nul &&
 	nul_to_q <actual.nul >actual &&
-- 
1.8.2.82.gc24b958

  parent reply	other threads:[~2013-04-18 23:10 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-16  2:24 [PATCH 00/12] Layout control placeholders for pretty format Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 01/12] pretty-formats.txt: wrap long lines Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 02/12] pretty: share code between format_decoration and show_decorations Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 03/12] utf8.c: move display_mode_esc_sequence_len() for use by other functions Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 04/12] utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 05/12] pretty: save commit encoding from logmsg_reencode if the caller needs it Nguyễn Thái Ngọc Duy
2013-03-17  8:57   ` Eric Sunshine
2013-03-16  2:24 ` [PATCH 06/12] pretty: get the correct encoding for --pretty:format=%e Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 07/12] utf8: keep NULs in reencode_string() Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 08/12] pretty: two phase conversion for non utf-8 commits Nguyễn Thái Ngọc Duy
2013-03-16  2:24 ` [PATCH 09/12] pretty: add %C(auto) for auto-coloring on the next placeholder Nguyễn Thái Ngọc Duy
2013-03-17  8:59   ` Eric Sunshine
2013-03-16  2:24 ` [PATCH 10/12] pretty: support padding placeholders, %< %> and %>< Nguyễn Thái Ngọc Duy
2013-03-17  9:03   ` Eric Sunshine
2013-03-16  2:24 ` [PATCH 11/12] pretty: support truncating in %>, %< " Nguyễn Thái Ngọc Duy
2013-03-16  9:04   ` Paul Campbell
2013-03-16  2:24 ` [PATCH 12/12] pretty: support %>> that steal trailing spaces Nguyễn Thái Ngọc Duy
2013-03-17  9:06   ` Eric Sunshine
2013-03-30  9:31     ` Duy Nguyen
2013-03-30  9:35 ` [PATCH v2 00/12] Layout control placeholders for pretty format Nguyễn Thái Ngọc Duy
2013-03-30  9:35   ` [PATCH v2 01/12] pretty-formats.txt: wrap long lines Nguyễn Thái Ngọc Duy
2013-03-30  9:35   ` [PATCH v2 02/12] pretty: share code between format_decoration and show_decorations Nguyễn Thái Ngọc Duy
2013-04-01 17:53     ` Junio C Hamano
2013-04-05  7:57       ` Jakub Narębski
2013-04-12 23:36         ` Duy Nguyen
2013-04-12 23:34       ` Duy Nguyen
2013-03-30  9:35   ` [PATCH v2 03/12] utf8.c: move display_mode_esc_sequence_len() for use by other functions Nguyễn Thái Ngọc Duy
2013-03-30  9:35   ` [PATCH v2 04/12] utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences Nguyễn Thái Ngọc Duy
2013-04-01 18:04     ` Junio C Hamano
2013-03-30  9:35   ` [PATCH v2 05/12] pretty: save commit encoding from logmsg_reencode if the caller needs it Nguyễn Thái Ngọc Duy
2013-04-01 18:10     ` Junio C Hamano
2013-03-30  9:35   ` [PATCH v2 06/12] pretty: get the correct encoding for --pretty:format=%e Nguyễn Thái Ngọc Duy
2013-03-30  9:35   ` [PATCH v2 07/12] utf8: keep NULs in reencode_string() Nguyễn Thái Ngọc Duy
2013-03-30 17:06     ` Torsten Bögershausen
2013-03-31  0:23       ` Duy Nguyen
2013-03-30  9:35   ` [PATCH v2 08/12] pretty: two phase conversion for non utf-8 commits Nguyễn Thái Ngọc Duy
2013-03-30  9:35   ` [PATCH v2 09/12] pretty: add %C(auto) for auto-coloring on the next placeholder Nguyễn Thái Ngọc Duy
2013-04-01 18:26     ` Junio C Hamano
2013-04-05  2:21       ` Duy Nguyen
2013-04-05 17:13         ` Junio C Hamano
2013-04-15  9:54           ` Duy Nguyen
2013-03-30  9:35   ` [PATCH v2 10/12] pretty: support padding placeholders, %< %> and %>< Nguyễn Thái Ngọc Duy
2013-03-30  9:35   ` [PATCH v2 11/12] pretty: support truncating in %>, %< " Nguyễn Thái Ngọc Duy
2013-03-30  9:35   ` [PATCH v2 12/12] pretty: support %>> that steal trailing spaces Nguyễn Thái Ngọc Duy
2013-04-01 18:39     ` Junio C Hamano
2013-04-16  8:24   ` [PATCH v3 00/13] nd/pretty-formats Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 01/13] pretty: save commit encoding from logmsg_reencode if the caller needs it Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 02/13] pretty: get the correct encoding for --pretty:format=%e Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 03/13] pretty-formats.txt: wrap long lines Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 04/13] pretty: share code between format_decoration and show_decorations Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 05/13] utf8.c: move display_mode_esc_sequence_len() for use by other functions Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 06/13] utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 07/13] utf8.c: add reencode_string_len() that can handle NULs in string Nguyễn Thái Ngọc Duy
2013-04-16  8:30       ` Duy Nguyen
2013-04-18 17:25       ` Junio C Hamano
2013-04-16  8:24     ` [PATCH v3 08/13] pretty: two phase conversion for non utf-8 commits Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 09/13] pretty: split color parsing into a separate function Nguyễn Thái Ngọc Duy
2013-04-16  8:24     ` [PATCH v3 10/13] pretty: add %C(auto) for auto-coloring Nguyễn Thái Ngọc Duy
2013-04-16 21:33       ` Junio C Hamano
2013-04-17  9:55         ` Duy Nguyen
2013-04-17 15:28           ` Junio C Hamano
2013-04-16 21:37       ` Junio C Hamano
2013-04-16  8:25     ` [PATCH v3 11/13] pretty: support padding placeholders, %< %> and %>< Nguyễn Thái Ngọc Duy
2013-04-16 20:41       ` Junio C Hamano
2013-04-16 20:43         ` Junio C Hamano
2013-04-17  9:45         ` Duy Nguyen
2013-04-16  8:25     ` [PATCH v3 12/13] pretty: support truncating in %>, %< " Nguyễn Thái Ngọc Duy
2013-04-16  8:25     ` [PATCH v3 13/13] pretty: support %>> that steal trailing spaces Nguyễn Thái Ngọc Duy
     [not found]     ` <516D57BD.7080208@web.de>
2013-04-16 14:47       ` [PATCH v3 00/13] nd/pretty-formats Torsten Bögershausen
2013-04-18 23:08     ` [PATCH v4 " Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 01/13] pretty: save commit encoding from logmsg_reencode if the caller needs it Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 02/13] pretty: get the correct encoding for --pretty:format=%e Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 03/13] pretty-formats.txt: wrap long lines Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 04/13] pretty: share code between format_decoration and show_decorations Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 05/13] utf8.c: move display_mode_esc_sequence_len() for use by other functions Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 06/13] utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 07/13] utf8.c: add reencode_string_len() that can handle NULs in string Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` Nguyễn Thái Ngọc Duy [this message]
2013-04-18 23:08       ` [PATCH v4 09/13] pretty: split color parsing into a separate function Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 10/13] pretty: add %C(auto) for auto-coloring Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 11/13] pretty: support padding placeholders, %< %> and %>< Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 12/13] pretty: support truncating in %>, %< " Nguyễn Thái Ngọc Duy
2013-04-18 23:08       ` [PATCH v4 13/13] pretty: support %>> that steal trailing spaces Nguyễn Thái Ngọc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1366326532-28517-9-git-send-email-pclouds@gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.