From: Alexander Gavrilov <angavrilov@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>, Paul Mackerras <paulus@samba.org>
Subject: [RFC PATCH] commit: Warn about encodings unsupported by iconv.
Date: Tue, 21 Oct 2008 01:25:18 +0400
Message-ID: <1224537918-14024-1-git-send-email-angavrilov@gmail.com>

Currently git-commit and git-commit-tree silently accept
encodings that are unknown to iconv. This may confuse the
user, because automatic encoding conversion in git-log and
friends will then not work, for no immediately obvious
reason. Note that the difference between a supported and
an unsupported name may be as small as CP1251 vs CP-1251,
or Shift-JIS vs ShiftJIS.

This commit adds a check for such cases, which produces
a warning similar to the one issued when a message claims
to be utf-8, but actually is not.

Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
---

	I wonder how common such a situation actually is, and
	whether gitk & git-gui (or core git itself) should explicitly
	provide some way to deal with it in old commits. I personally
	hit it while experimenting, and wrongly concluded that gitk
	does not support using multiple encodings in one repository.
	
	The current gitk implementation generally allows working around
	it by setting i18n.commitencoding to a valid name of the
	encoding used in the mislabeled commits. However, the
	downside is that if the selected encoding cannot represent
	some characters of an otherwise completely valid commit,
	they come out as garbage. Always using --encoding=utf-8
	from the GUI and relying on the conversion done by git-log
	should fix this case, but it breaks the workaround.
	
	-- Alexander
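
The check described in the commit message can be tried outside of git
with a small standalone sketch. It is only an illustration under stated
assumptions: it calls iconv_open() directly instead of git's
reencode_string(), and the helper name encoding_is_supported() is
hypothetical and does not appear in the patch. Which spellings of a
name are accepted depends on the iconv implementation installed.

```c
#include <iconv.h>

/*
 * Hypothetical helper, not part of git: returns 1 if the local iconv
 * can open a conversion from `encoding` to UTF-8, and 0 otherwise.
 * This approximates the patch's test of reencoding an empty string.
 */
static int encoding_is_supported(const char *encoding)
{
	/* iconv_open() takes the target encoding first, then the source. */
	iconv_t cd = iconv_open("UTF-8", encoding);

	if (cd == (iconv_t)-1)
		return 0;	/* name unknown to this iconv */
	iconv_close(cd);
	return 1;
}
```

Calling the helper with two spellings of the same encoding, such as
the pairs mentioned in the commit message, shows whether the local
iconv accepts both. On systems where iconv lives in a separate
library, linking may need -liconv.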

 builtin-commit-tree.c |   25 +++++++++++++++++++++++--
 1 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/builtin-commit-tree.c b/builtin-commit-tree.c
index 0453425..7f325a3 100644
--- a/builtin-commit-tree.c
+++ b/builtin-commit-tree.c
@@ -45,6 +45,28 @@ static const char commit_utf8_warn[] =
 "You may want to amend it after fixing the message, or set the config\n"
 "variable i18n.commitencoding to the encoding your project uses.\n";
 
+static const char commit_bad_encoding_warn[] =
+"Warning: commit encoding '%s' is not supported.\n"
+"You may want to change the value of the i18n.commitencoding config\n"
+"variable, and redo the commit. Use 'iconv --list' to see the list of\n"
+"available encoding names.\n";
+
+static void verify_commit_encoding(const char *text, const char *encoding)
+{
+	if (is_encoding_utf8(encoding)) {
+		if (!is_utf8(text))
+			fprintf(stderr, commit_utf8_warn);
+	}
+#ifndef NO_ICONV
+	else {
+		char *conv = reencode_string("", "utf-8", encoding);
+		if (!conv)
+			fprintf(stderr, commit_bad_encoding_warn, encoding);
+		free(conv);
+	}
+#endif
+}
+
 int commit_tree(const char *msg, unsigned char *tree,
 		struct commit_list *parents, unsigned char *ret,
 		const char *author)
@@ -87,8 +109,7 @@ int commit_tree(const char *msg, unsigned char *tree,
 	strbuf_addstr(&buffer, msg);
 
 	/* And check the encoding */
-	if (encoding_is_utf8 && !is_utf8(buffer.buf))
-		fprintf(stderr, commit_utf8_warn);
+	verify_commit_encoding(buffer.buf, git_commit_encoding);
 
 	result = write_sha1_file(buffer.buf, buffer.len, commit_type, ret);
 	strbuf_release(&buffer);
-- 
1.6.0.20.g6148bc
