git.vger.kernel.org archive mirror
* [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo
@ 2010-06-06 11:59 Zhang Le
  2010-06-06 11:59 ` [PATCH 2/2] [RFC] add --recode-patch to git-am Zhang Le
  2010-06-06 20:03 ` [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo Junio C Hamano
  0 siblings, 2 replies; 4+ messages in thread
From: Zhang Le @ 2010-06-06 11:59 UTC
  To: git; +Cc: Zhang Le

I have a translation project which uses UTF-8 as its charset.
So the patch itself must be encoded in UTF-8, not just the commit message etc.
We use a Google Group as our mailing list.

Recently, due to unknown reason, mails saved from Gmail are encoded in GB2312.
This never happened before. I guess Google has changed something,
but I haven't found a way to change this behavior.

So I took another approach, i.e. adding this option to git-mailinfo.
I hope this can benefit others as well.

Signed-off-by: Zhang Le <r0bertz@gentoo.org>
---
 builtin/mailinfo.c  |    8 +++++++-
 man1/git-mailinfo.1 |    7 ++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
index 4a9729b..73f51f3 100644
--- a/builtin/mailinfo.c
+++ b/builtin/mailinfo.c
@@ -12,6 +12,7 @@ static FILE *cmitmsg, *patchfile, *fin, *fout;
 static int keep_subject;
 static int keep_non_patch_brackets_in_subject;
 static const char *metainfo_charset;
+static int recode_patch;
 static struct strbuf line = STRBUF_INIT;
 static struct strbuf name = STRBUF_INIT;
 static struct strbuf email = STRBUF_INIT;
@@ -830,6 +831,8 @@ static int handle_commit_msg(struct strbuf *line)
 
 static void handle_patch(const struct strbuf *line)
 {
+	if (recode_patch)
+		convert_to_utf8(line, charset.buf);
 	fwrite(line->buf, 1, line->len, patchfile);
 	patch_lines++;
 }
@@ -1021,7 +1024,7 @@ static int git_mailinfo_config(const char *var, const char *value, void *unused)
 }
 
 static const char mailinfo_usage[] =
-	"git mailinfo [-k|-b] [-u | --encoding=<encoding> | -n] [--scissors | --no-scissors] msg patch < mail >info";
+	"git mailinfo [-k|-b] [-u | --encoding=<encoding> | -n] [--recode-patch] [--scissors | --no-scissors] msg patch < mail >info";
 
 int cmd_mailinfo(int argc, const char **argv, const char *prefix)
 {
@@ -1034,6 +1037,7 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix)
 
 	def_charset = (git_commit_encoding ? git_commit_encoding : "UTF-8");
 	metainfo_charset = def_charset;
+	recode_patch = 0;
 
 	while (1 < argc && argv[1][0] == '-') {
 		if (!strcmp(argv[1], "-k"))
@@ -1046,6 +1050,8 @@ int cmd_mailinfo(int argc, const char **argv, const char *prefix)
 			metainfo_charset = NULL;
 		else if (!prefixcmp(argv[1], "--encoding="))
 			metainfo_charset = argv[1] + 11;
+		else if (!strcmp(argv[1], "--recode-patch"))
+			recode_patch = 1;
 		else if (!strcmp(argv[1], "--scissors"))
 			use_scissors = 1;
 		else if (!strcmp(argv[1], "--no-scissors"))
diff --git a/man1/git-mailinfo.1 b/man1/git-mailinfo.1
index 4d0e929..d52457f 100644
--- a/man1/git-mailinfo.1
+++ b/man1/git-mailinfo.1
@@ -22,7 +22,7 @@
 git-mailinfo \- Extracts patch and authorship from a single e\-mail message
 .SH "SYNOPSIS"
 .sp
-\fIgit mailinfo\fR [\-k|\-b] [\-u | \-\-encoding=<encoding> | \-n] [\-\-scissors] <msg> <patch>
+\fIgit mailinfo\fR [\-k|\-b] [\-u | \-\-encoding=<encoding> | \-n] [\-\-recode\-patch] [\-\-scissors] <msg> <patch>
 .SH "DESCRIPTION"
 .sp
 Reads a single e\-mail message from the standard input, and writes the commit log message in <msg> file, and the patches in <patch> file\&. The author name, e\-mail and e\-mail subject are written out to the standard output to be used by \fIgit am\fR to create a commit\&. It is usually not necessary to use this command directly\&. See \fBgit-am\fR(1) instead\&.
@@ -70,6 +70,11 @@ Similar to \-u but if the local convention is different from what is specified b
 Disable all charset re\-coding of the metadata\&.
 .RE
 .PP
+\-\-recode\-patch
+.RS 4
+Re\-code patch as well, using the same encoding as metadata\&. The default is off\&.
+.RE
+.PP
 \-\-scissors
 .RS 4
 Remove everything in body before a scissors line\&. A line that mainly consists of scissors (either ">8" or "8<") and perforation (dash "\-") marks is called a scissors line, and is used to request the reader to cut the message at that line\&. If such a line appears in the body of the message before the patch, everything before it (including the scissors line itself) is ignored when this option is used\&.
-- 
1.7.1


* [PATCH 2/2] [RFC] add --recode-patch to git-am
  2010-06-06 11:59 [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo Zhang Le
@ 2010-06-06 11:59 ` Zhang Le
  2010-06-06 20:03 ` [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo Junio C Hamano
  1 sibling, 0 replies; 4+ messages in thread
From: Zhang Le @ 2010-06-06 11:59 UTC
  To: git; +Cc: Zhang Le

The rationale is explained in the "add --recode-patch option to git-mailinfo"
patch (1/2).

Signed-off-by: Zhang Le <r0bertz@gentoo.org>
---
 git-am.sh     |   13 +++++++++++--
 man1/git-am.1 |   12 +++++++++++-
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/git-am.sh b/git-am.sh
index 1056075..62e7870 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -14,6 +14,7 @@ b,binary*       (historical option -- no-op)
 q,quiet         be quiet
 s,signoff       add a Signed-off-by line to the commit message
 u,utf8          recode into utf8 (default)
+recode-patch    pass --recode-patch flag to git-mailinfo
 k,keep          pass -k flag to git-mailinfo
 keep-cr         pass --keep-cr flag to git-mailsplit for mbox format
 no-keep-cr      do not pass --keep-cr flag to git-mailsplit independent of am.keepcr
@@ -294,7 +295,7 @@ split_patches () {
 prec=4
 dotest="$GIT_DIR/rebase-apply"
 sign= utf8=t keep= keepcr= skip= interactive= resolved= rebasing= abort=
-resolvemsg= resume= scissors= no_inbody_headers=
+resolvemsg= resume= scissors= no_inbody_headers= recode_patch=
 git_apply_opt=
 committer_date_is_author_date=
 ignore_date=
@@ -320,6 +321,8 @@ do
 		utf8=t ;; # this is now default
 	--no-utf8)
 		utf8= ;;
+	--recode-patch)
+		recode_patch=t ;;
 	-k|--keep)
 		keep=t ;;
 	-c|--scissors)
@@ -463,6 +466,7 @@ else
 	echo "$threeway" >"$dotest/threeway"
 	echo "$sign" >"$dotest/sign"
 	echo "$utf8" >"$dotest/utf8"
+	echo "$recode_patch" >"$dotest/recode_patch"
 	echo "$keep" >"$dotest/keep"
 	echo "$keepcr" >"$dotest/keepcr"
 	echo "$scissors" >"$dotest/scissors"
@@ -504,6 +508,10 @@ then
 else
 	utf8=-n
 fi
+if test "$(cat "$dotest/recode_patch")" = t
+then
+	recodepatch=--recode-patch
+fi
 if test "$(cat "$dotest/keep")" = t
 then
 	keep=-k
@@ -580,7 +588,8 @@ do
 	# by the user, or the user can tell us to do so by --resolved flag.
 	case "$resume" in
 	'')
-		git mailinfo $keep $no_inbody_headers $scissors $utf8 "$dotest/msg" "$dotest/patch" \
+		git mailinfo $keep $no_inbody_headers $scissors $utf8 \
+		$recodepatch "$dotest/msg" "$dotest/patch" \
 			<"$dotest/$msgnum" >"$dotest/info" ||
 			stop_here $this
 
diff --git a/man1/git-am.1 b/man1/git-am.1
index c6a0d27..b5bc0e8 100644
--- a/man1/git-am.1
+++ b/man1/git-am.1
@@ -24,7 +24,7 @@ git-am \- Apply a series of patches from a mailbox
 .sp
 .nf
 \fIgit am\fR [\-\-signoff] [\-\-keep] [\-\-keep\-cr | \-\-no\-keep\-cr] [\-\-utf8 | \-\-no\-utf8]
-         [\-\-3way] [\-\-interactive] [\-\-committer\-date\-is\-author\-date]
+         [\-\-recode\-patch] [\-\-3way] [\-\-interactive] [\-\-committer\-date\-is\-author\-date]
          [\-\-ignore\-date] [\-\-ignore\-space\-change | \-\-ignore\-whitespace]
          [\-\-whitespace=<option>] [\-C<n>] [\-p<n>] [\-\-directory=<dir>]
          [\-\-reject] [\-q | \-\-quiet] [\-\-scissors | \-\-no\-scissors]
@@ -116,6 +116,16 @@ flag to
 \fBgit-mailinfo\fR(1))\&.
 .RE
 .PP
+\-\-recode\-patch
+.RS 4
+Pass
+\-\-recode\-patch
+flag to
+\fIgit mailinfo\fR
+(see
+\fBgit-mailinfo\fR(1))\&.
+.RE
+.PP
 \-3, \-\-3way
 .RS 4
 When the patch does not apply cleanly, fall back on 3\-way merge if the patch records the identity of blobs it is supposed to apply to and we have those blobs available locally\&.
-- 
1.7.1


* Re: [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo
  2010-06-06 11:59 [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo Zhang Le
  2010-06-06 11:59 ` [PATCH 2/2] [RFC] add --recode-patch to git-am Zhang Le
@ 2010-06-06 20:03 ` Junio C Hamano
  2010-06-07  1:44   ` Zhang Le
  1 sibling, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2010-06-06 20:03 UTC
  To: Zhang Le; +Cc: git

Zhang Le <r0bertz@gentoo.org> writes:

> I have a translation project which uses UTF-8 as its charset.
> So the patch itself must be encoded in UTF-8, not just the commit message etc.
> We use a Google Group as our mailing list.
>
> Recently, due to unknown reason, mails saved from Gmail are encoded in GB2312.
> This never happened before. I guess Google has changed something,
> but I haven't found a way to change this behavior.
>
> So I took another approach, i.e. adding this option to git-mailinfo.
> I hope this can benefit others as well.
>
> Signed-off-by: Zhang Le <r0bertz@gentoo.org>
> ---
>  builtin/mailinfo.c  |    8 +++++++-
>  man1/git-mailinfo.1 |    7 ++++++-

Don't patch anything in man?/ as they are autogenerated files and not
source; patch the source file in Documentation/ directory instead.

I take it that you recode from whatever encoding the mail message is in
(probably stated in "Content-type: ...; charset=xxx" header) to the
encoding specified with --encoding option (defaulting to UTF-8), but it
wasn't very clear from the documentation.  We might want to improve 
the descriptions of both this new option and --encoding option.
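
As a rough illustration of the re-coding described above (from the charset
declared in the mail to the charset selected with --encoding), here is a
minimal sketch built on POSIX iconv(3). The helper name recode_buffer() and
its signature are hypothetical and are not part of git; which charset names
are accepted (e.g. "GB2312") depends on the local iconv implementation.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not git's API: convert in[0..in_len) from the
 * charset `from` to the charset `to`.  Returns a malloc'ed, NUL-terminated
 * buffer and stores its length in *out_len; returns NULL on failure.
 */
char *recode_buffer(const char *in, size_t in_len,
		    const char *from, const char *to, size_t *out_len)
{
	iconv_t cd;
	size_t cap, src_left, dst_left;
	char *out, *src, *dst;

	assert(in != NULL && out_len != NULL);

	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return NULL;		/* unknown charset name */

	cap = in_len * 4 + 16;		/* generous first guess */
	out = malloc(cap + 1);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}

	src = (char *)in;		/* iconv() wants char **, input is not modified */
	src_left = in_len;
	dst = out;
	dst_left = cap;

	while (src_left > 0) {
		size_t used;
		char *tmp;

		if (iconv(cd, &src, &src_left, &dst, &dst_left) != (size_t)-1)
			break;		/* all input converted */
		if (errno != E2BIG) {	/* invalid or truncated input */
			free(out);
			iconv_close(cd);
			return NULL;
		}
		/* Output buffer full: double it and continue. */
		used = (size_t)(dst - out);
		cap *= 2;
		tmp = realloc(out, cap + 1);
		if (!tmp) {
			free(out);
			iconv_close(cd);
			return NULL;
		}
		out = tmp;
		dst = out + used;
		dst_left = cap - used;
	}

	iconv_close(cd);
	*out_len = (size_t)(dst - out);
	*dst = '\0';
	return out;
}
```

With such a helper, each patch line could be passed through
recode_buffer(line, len, mail_charset, target_charset, &n) before it is
written out, where mail_charset and target_charset stand for the two
charsets mentioned above.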

Also it might be useful to find out what that "due to unknown reason" is,
at least to see if that is what Google did or what the user did.

Thanks.


* Re: [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo
  2010-06-06 20:03 ` [PATCH 1/2] [RFC] add --recode-patch option to git-mailinfo Junio C Hamano
@ 2010-06-07  1:44   ` Zhang Le
  0 siblings, 0 replies; 4+ messages in thread
From: Zhang Le @ 2010-06-07  1:44 UTC
  To: Junio C Hamano; +Cc: git, druggo

On 13:03 Sun 06 Jun, Junio C Hamano wrote:
> Zhang Le <r0bertz@gentoo.org> writes:
> 
> > I have a translation project which uses UTF-8 as its charset.
> > So the patch itself must be encoded in UTF-8, not just the commit message etc.
> > We use a Google Group as our mailing list.
> >
> > Recently, due to unknown reason, mails saved from Gmail are encoded in GB2312.
> > This never happened before. I guess Google has changed something,
> > but I haven't found a way to change this behavior.
> >
> > So I took another approach, i.e. adding this option to git-mailinfo.
> > I hope this can benefit others as well.
> >
> > Signed-off-by: Zhang Le <r0bertz@gentoo.org>
> > ---
> >  builtin/mailinfo.c  |    8 +++++++-
> >  man1/git-mailinfo.1 |    7 ++++++-
> 
> Don't patch anything in man?/ as they are autogenerated files and not
> source; patch the source file in Documentation/ directory instead.

Thanks, will do it.

> 
> I take it that you recode from whatever encoding the mail message is in
> (probably stated in "Content-type: ...; charset=xxx" header) to the
> encoding specified with --encoding option (defaulting to UTF-8), but it
> wasn't very clear from the documentation.  We might want to improve 
> the descriptions of both this new option and --encoding option.

That is exactly the purpose of this patch.
I will try to improve the documentation.

> 
> Also it might be useful to find out what that "due to unknown reason" is,
> at least to see if that is what Google did or what the user did.

One of my friends, Yang Xiaoguang, found that Google tries to detect the
language of the email and re-codes it into that language's native charset.
For Simplified Chinese, it is GB2312.
For Traditional Chinese, it is Big5.

In his test, Yang sent all emails using the UTF-8 charset.
He sent those mails to a Google Group and then checked the "Content-type: ...;
charset=xxx" header in Gmail.

If the mail is written in Simplified Chinese, the charset becomes GB2312.
If the mail is written in Traditional Chinese, the charset becomes Big5.
If the mail mixes Simplified and Traditional Chinese, the charset
remains UTF-8.
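
For anyone who wants to repeat this check on saved mails, here is a minimal
sketch of pulling the charset out of a Content-type header value. The helper
extract_charset() is hypothetical and written only for illustration. It is
not how git-mailinfo parses headers, and it simply looks for the first
"charset=" substring rather than implementing the full MIME parameter
grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the charset parameter of a Content-type
 * header value into out (at most out_size - 1 bytes plus a NUL).
 * Returns 1 if a non-empty charset was found, 0 otherwise.
 */
int extract_charset(const char *header, char *out, size_t out_size)
{
	static const char key[] = "charset=";
	const size_t key_len = sizeof(key) - 1;
	const char *p;
	size_t n = 0;
	int quoted;

	assert(header != NULL && out != NULL && out_size > 0);
	out[0] = '\0';

	/* Case-insensitive search for "charset=". */
	for (p = header; *p; p++) {
		size_t i;

		for (i = 0; i < key_len; i++)
			if (tolower((unsigned char)p[i]) != key[i])
				break;
		if (i == key_len)
			break;
	}
	if (!*p)
		return 0;	/* no charset parameter */
	p += key_len;

	/* The value may be a quoted string or a bare token. */
	quoted = (*p == '"');
	if (quoted)
		p++;

	while (*p && n + 1 < out_size) {
		if (quoted) {
			if (*p == '"')
				break;
		} else if (*p == ';' || isspace((unsigned char)*p)) {
			break;
		}
		out[n++] = *p++;
	}
	out[n] = '\0';
	return n > 0;
}
```

Given the header value "text/plain; charset=GB2312", the helper stores
"GB2312" in the output buffer, which is the value Yang compared across the
test mails.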

-- 
Zhang, Le
Gentoo/Loongson Developer
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
