From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH 2/4] Do not attempt pretty print in ASCII-incompatible encodings
Date: Tue, 21 Feb 2012 21:24:50 +0700
Message-ID: <1329834292-2511-2-git-send-email-pclouds@gmail.com>
In-Reply-To: <1329834292-2511-1-git-send-email-pclouds@gmail.com>

We rely on ASCII everywhere. For example, we print "\n" directly
without converting it to the output encoding. If the output encoding
is not compatible with ASCII, the result is a mix of that encoding and
raw ASCII bytes. Do not do that: die instead of producing such output.
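
For illustration, here is a minimal C sketch of that mix, assuming
UTF-16LE as the output encoding. The helpers ascii_to_utf16le() and
mixed_output_length() are hypothetical and not part of git. The first
only handles ASCII input and stands in for a real conversion of the
commit text. The second appends an unconverted "\n" the way the pretty
printer would.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encode pure-ASCII text as UTF-16LE, standing in
 * for a real iconv conversion of the commit message. Each ASCII byte b
 * becomes the two bytes b, 0x00. One spare byte is reserved so the
 * caller can append a raw ASCII "\n" afterwards. */
static unsigned char *ascii_to_utf16le(const char *in, size_t *out_len)
{
	size_t len = strlen(in), i;
	unsigned char *out = malloc(2 * len + 1);

	if (!out)
		return NULL;
	for (i = 0; i < len; i++) {
		out[2 * i] = (unsigned char)in[i];
		out[2 * i + 1] = 0x00;
	}
	*out_len = 2 * len;
	return out;
}

/* Mimic the pretty printer: converted text followed by an unconverted
 * "\n". Returns the byte count of the mixed output (0 on allocation
 * failure). */
static size_t mixed_output_length(const char *text)
{
	size_t n;
	unsigned char *buf = ascii_to_utf16le(text, &n);

	if (!buf)
		return 0;
	buf[n++] = '\n';	/* raw 0x0A, not the UTF-16LE pair 0x0A 0x00 */
	free(buf);
	return n;
}
```

With this sketch, mixed_output_length("subject") gives 15 bytes: seven
two-byte code units plus one stray byte. The stream is therefore not a
whole number of UTF-16 code units, and the newline never appears as
0x0A 0x00.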

In theory we could convert everything to UTF-8 as an intermediate
medium, do all the processing there, then convert the final output to
the desired encoding. But that is a lot of work (unless we have a
pager-like converter) for little real use. Users can simply pipe the
output through iconv instead.
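
Here is a rough C sketch of the "convert once at the end" alternative,
which is also what piping through iconv(1) amounts to. The output is
kept in UTF-8 and the finished buffer goes to iconv(3) in one pass.
convert_final_output() is a hypothetical helper, not a git function.
Its output buffer size is an assumption that suffices for targets such
as UTF-16 and UTF-32. If the buffer is too small, iconv() fails with
E2BIG and the sketch returns NULL.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a finished UTF-8 buffer to to_encoding
 * in one pass with iconv(3), as "... | iconv -f UTF-8 -t <enc>" would.
 * Returns a malloc'd buffer (its length in *out_len) or NULL on error. */
static char *convert_final_output(const char *utf8, size_t len,
				  const char *to_encoding, size_t *out_len)
{
	/* Assumption: 4 output bytes per input byte plus room for a BOM is
	 * enough for targets like UTF-16 and UTF-32; otherwise iconv()
	 * fails with E2BIG and we return NULL. */
	size_t cap = 4 * len + 8;
	size_t left_in = len, left_out = cap;
	char *in_p = (char *)utf8;	/* iconv() does not modify the input */
	char *out, *out_p;
	iconv_t cd;

	assert(out_len);
	cd = iconv_open(to_encoding, "UTF-8");
	if (cd == (iconv_t)-1)
		return NULL;	/* unknown or unsupported encoding */
	out = malloc(cap);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}
	out_p = out;
	/* Convert the data, then flush any shift state (stateful encodings). */
	if (iconv(cd, &in_p, &left_in, &out_p, &left_out) == (size_t)-1 ||
	    iconv(cd, NULL, NULL, &out_p, &left_out) == (size_t)-1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}
	iconv_close(cd);
	*out_len = cap - left_out;
	return out;
}
```

From the shell, the same idea is roughly "git log | iconv -f UTF-8 -t
UTF-16", assuming the log output is in UTF-8 (the default).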

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 It seems about half of the encodings listed by "iconv -l" do not pass
 the ascii_superset_encoding() test. I just assume they are either
 exotic or duplicate names.

 pretty.c |    7 +++++++
 utf8.c   |   15 +++++++++++++++
 utf8.h   |    1 +
 3 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/pretty.c b/pretty.c
index 8688b8f..5c433a2 100644
--- a/pretty.c
+++ b/pretty.c
@@ -493,12 +493,19 @@ char *logmsg_reencode(const struct commit *commit,
 		      const char *output_encoding)
 {
 	static const char *utf8 = "UTF-8";
+	static const char *last_output_encoding = NULL;
 	const char *use_encoding;
 	char *encoding;
 	char *out;
 
 	if (!*output_encoding)
 		return NULL;
+	if (last_output_encoding != output_encoding) {
+		if (!ascii_superset_encoding(output_encoding))
+			die("encoding %s is not a superset of ASCII.",
+			    output_encoding);
+		last_output_encoding = output_encoding;
+	}
 	encoding = get_header(commit, "encoding");
 	use_encoding = encoding ? encoding : utf8;
 	if (!strcmp(use_encoding, output_encoding))
diff --git a/utf8.c b/utf8.c
index 8acbc66..def93ee 100644
--- a/utf8.c
+++ b/utf8.c
@@ -482,3 +482,18 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
 	return out;
 }
 #endif
+
+int ascii_superset_encoding(const char *encoding)
+{
+	const char *sample = " !\"#$%&'()*+,-./0123456789:;<=>?@"
+		"ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
+		"abcdefghijklmnopqrstuvwxyz{|}~\n";
+	char *output;
+	int ret;
+	if (!encoding)
+		return 1;
+	output = reencode_string(sample, encoding, "US-ASCII");
+	ret = !output || !strcmp(sample, output);
+	free(output);
+	return ret;
+}
diff --git a/utf8.h b/utf8.h
index 81f2c82..75bc128 100644
--- a/utf8.h
+++ b/utf8.h
@@ -12,6 +12,7 @@ int strbuf_add_wrapped_text(struct strbuf *buf,
 		const char *text, int indent, int indent2, int width);
 int strbuf_add_wrapped_bytes(struct strbuf *buf, const char *data, int len,
 			     int indent, int indent2, int width);
+int ascii_superset_encoding(const char *encoding);
 
 #ifndef NO_ICONV
 char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding);
-- 
1.7.8.36.g69ee2
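
The check can also be tried outside git. The following C sketch
approximates ascii_superset_encoding() by calling iconv(3) directly
instead of git's internal reencode_string(). The name ascii_superset()
and the fixed output buffer are assumptions of this sketch. Raw iconv
output is not NUL-terminated, so the sketch compares bytes with an
explicit length instead of using strcmp().

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Approximation of the patch's ascii_superset_encoding(): convert every
 * printable ASCII character (plus "\n") from US-ASCII to `encoding` and
 * require the bytes to come out unchanged. */
static int ascii_superset(const char *encoding)
{
	static const char sample[] = " !\"#$%&'()*+,-./0123456789:;<=>?@"
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
		"abcdefghijklmnopqrstuvwxyz{|}~\n";
	size_t len = sizeof(sample) - 1;	/* exclude the trailing NUL */
	char buf[4 * sizeof(sample) + 8];	/* room for UTF-32 plus a BOM */
	char *in_p = (char *)sample, *out_p = buf;
	size_t left_in = len, left_out = sizeof(buf);
	iconv_t cd;
	int ret;

	if (!encoding)
		return 1;	/* as in the patch: no encoding means no check */
	cd = iconv_open(encoding, "US-ASCII");
	if (cd == (iconv_t)-1)
		return 1;	/* mirrors the patch's "!output" case */
	if (iconv(cd, &in_p, &left_in, &out_p, &left_out) == (size_t)-1)
		ret = 0;	/* any conversion error counts as "not a superset" */
	else
		ret = (size_t)(out_p - buf) == len && !memcmp(buf, sample, len);
	iconv_close(cd);
	return ret;
}
```

Under glibc, ascii_superset("UTF-8") should return 1 and
ascii_superset("UTF-16LE") should return 0. An encoding name that iconv
does not recognise returns 1, matching the patch, which lets such names
through. Looping the function over the names printed by "iconv -l" is
one way to see which encodings would be rejected.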

Thread overview: 12+ messages
2012-02-21 14:24 [PATCH 1/4] t3900: add missing UTF-16.txt and mark the test successful Nguyễn Thái Ngọc Duy
2012-02-21 14:24 ` Nguyễn Thái Ngọc Duy [this message]
2012-02-21 14:53   ` [PATCH 2/4] Do attempt pretty print in ASCII-incompatible encodings Nguyen Thai Ngoc Duy
2012-02-21 18:21   ` Jeff King
2012-02-22  2:17     ` Nguyen Thai Ngoc Duy
2012-02-23 11:25     ` Peter Krefting
2012-02-21 14:24 ` [PATCH 3/4] utf8: die if failed to re-encoding Nguyễn Thái Ngọc Duy
2012-02-21 17:36   ` Junio C Hamano
2012-02-21 14:24 ` [PATCH 4/4] Only re-encode certain parts in commit object, not the whole Nguyễn Thái Ngọc Duy
2012-02-21 18:25   ` Jeff King
2012-02-22  2:01     ` Nguyen Thai Ngoc Duy
2012-02-22  3:14       ` Junio C Hamano
