git.vger.kernel.org archive mirror
* Encoding problems using git-svn
@ 2008-10-29  3:14 James North
  2008-10-30  3:28 ` James North
  2008-10-30  7:41 ` Eric Wong
  0 siblings, 2 replies; 6+ messages in thread
From: James North @ 2008-10-29  3:14 UTC (permalink / raw)
  To: git

Hi,

I'm using git-svn on a system with ISO-8859-1 encoding. The problem is
when I try to use "git svn dcommit" to send changes to a remote svn
(also ISO-8859-1).

It seems that git-svn is sending commit messages as UTF-8 (just a
guess...) and they look bad in the remote svn log, e.g. "Ca?\241a
de cami?\243n".

I have tried using i18n.commitencoding=ISO-8859-1 as suggested by the
warning when doing "git svn dcommit", but messages are still sent with
the wrong encoding.

Am I missing something?

Thanks everyone
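
One way to read the escapes in that log output, assuming svn prints each byte it cannot display as `?\` followed by the decimal byte value: 241 and 243 are the ISO-8859-1 codes for ñ and ó. Under that assumption, the message reached SVN as raw ISO-8859-1 bytes rather than as UTF-8. A minimal shell sketch (the helper name is hypothetical, and `iconv` and `od` are assumed to be available) prints the UTF-8 bytes SVN would have needed instead:

```shell
#!/bin/sh
# Hypothetical helper: take a decimal byte value, treat it as an
# ISO-8859-1 character, and print its UTF-8 encoding as hex.
latin1_dec_to_utf8_hex () {
	# printf only understands octal escapes, so convert decimal to octal
	oct=$(printf '%03o' "$1")
	printf "\\$oct" | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

latin1_dec_to_utf8_hex 241; echo   # n with tilde
latin1_dec_to_utf8_hex 243; echo   # o with acute accent
```

Each single ISO-8859-1 byte becomes a two-byte UTF-8 sequence, which is the conversion that has to happen somewhere before the message reaches the SVN repository.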


* Re: Encoding problems using git-svn
  2008-10-29  3:14 Encoding problems using git-svn James North
@ 2008-10-30  3:28 ` James North
  2008-10-30  7:41 ` Eric Wong
  1 sibling, 0 replies; 6+ messages in thread
From: James North @ 2008-10-30  3:28 UTC (permalink / raw)
  To: git

OK, I made a quick change to the git-svn script and it seems to be working
now on my system with the locale set to ISO-8859-1.

I don't know if this is the right place to post this, but I hope someone
knowledgeable sees it and can tell whether it would work as a general fix.

This patch is against 1.6.0.2.

--- git-svn     2008-09-15 13:04:46.000000000 +0200
+++ git-svn.mine        2008-10-30 04:21:09.000000000 +0100
@@ -43,6 +43,7 @@
 use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev/;
 use IPC::Open3;
 use Git;
+use Encode;

 BEGIN {
        # import functions from Git into our packages, en masse
@@ -1061,6 +1062,7 @@
                    && !$saw_from) {
                        $msgbuf .= "\n\nFrom: $author";
                }
+        $msgbuf = encode("utf8", $msgbuf);
                print $log_fh $msgbuf or croak $!;
                command_close_pipe($msg_fh, $ctx);
        }
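
Whether an unconditional encode step is a general fix depends on what the message bytes already are. Perl's `encode` treats an undecoded byte string as ISO-8859-1 characters, so the change above is correct for ISO-8859-1 input but would encode a message a second time if it were already UTF-8. The sketch below imitates that behaviour with `iconv`; it is an illustration under that assumption, not the git-svn code, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: hex dump of stdin without separators.
to_hex () { od -An -tx1 | tr -d ' \n'; }

# Treat every input byte as ISO-8859-1 and emit UTF-8, regardless of
# what encoding the input really uses.
assume_latin1 () { iconv -f ISO-8859-1 -t UTF-8; }

# n-tilde stored as ISO-8859-1 (one byte, 0xF1): converted correctly.
printf '\361' | assume_latin1 | to_hex; echo

# n-tilde already stored as UTF-8 (0xC3 0xB1): both bytes are encoded
# again, producing four bytes of mojibake.
printf '\303\261' | assume_latin1 | to_hex; echo
```

The second case is why the conversion has to know the source encoding, for example from i18n.commitencoding, instead of assuming one.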



* Re: Encoding problems using git-svn
  2008-10-29  3:14 Encoding problems using git-svn James North
  2008-10-30  3:28 ` James North
@ 2008-10-30  7:41 ` Eric Wong
  2008-10-30 15:14   ` James North
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Wong @ 2008-10-30  7:41 UTC (permalink / raw)
  To: James North; +Cc: git

Hi James,

I saw your other patch too late; I had already started working on my
patch earlier today but got distracted by other things (being at
GitTogether :) and lacked a stable Internet connection afterwards.

Anyway, here's my version. It handles the case where the user specifies
the --edit option to interactively edit the commit message before
committing, and it also reencodes the messages when fetching from SVN.
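
The two conversion directions involved (to UTF-8 when committing to SVN, back to the configured encoding when fetching) can be sketched in shell. This is only an illustration using `iconv`, not the Perl code from the patch; the function names are hypothetical and `enc` stands in for the value of i18n.commitencoding.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the two directions.
to_svn ()   { iconv -f "$1" -t UTF-8; }   # dcommit: commit encoding -> UTF-8
from_svn () { iconv -f UTF-8 -t "$1"; }   # fetch: UTF-8 -> commit encoding
to_hex ()   { od -An -tx1 | tr -d ' \n'; }

enc=ISO-8859-1   # stand-in for the i18n.commitencoding setting

# "Ca", n-tilde (0xF1 in ISO-8859-1), "a"
printf 'Ca\361a' | to_svn "$enc" | to_hex; echo

# Converting to UTF-8 and back should reproduce the original bytes.
printf 'Ca\361a' | to_svn "$enc" | from_svn "$enc" | to_hex; echo
```

The round trip is lossless here because every ISO-8859-1 character exists in UTF-8; the reverse direction can fail when an SVN log message contains characters the configured encoding cannot represent.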

Can you let me know if it works for you?

Note: I'll be in transit tomorrow and may not have time to follow
up on this until Saturday.

From 84f003e0c39414ebf27a98de167643e95bed6abb Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Wed, 29 Oct 2008 23:49:26 -0700
Subject: [PATCH] git-svn: respect i18n.commitencoding config

SVN itself always stores log messages in the repository as
UTF-8.  git always stores/retrieves everything as raw binary
data with no transformations whatsoever.

To interact with SVN, we need to encode log messages as UTF-8
before sending them to SVN, as SVN cannot do it for us.  When
retrieving log messages from SVN, we also need to (attempt to)
reencode the UTF-8 log message back to the user-specified commit
encoding.

Note, handling i18n.logoutputencoding for "git svn log" also
needs to be done in a future change.

Also, this change only deals with the encoding of commit
messages and nothing else (path names, blob content, ...).

In-Reply-To: <8b168cfb0810282014r789ac01dnec51824de1078f0@mail.gmail.com>
James North <tocapicha@gmail.com> wrote:
> Hi,
>
> I'm using git-svn on a system with ISO-8859-1 encoding. The problem is
> when I try to use "git svn dcommit" to send changes to a remote svn
> (also ISO-8859-1).
>
> It seems that git-svn is sending commit messages as UTF-8 (just a
> guess...) and they look bad in the remote svn log, e.g. "Ca?\241a
> de cami?\243n".
>
> I have tried using i18n.commitencoding=ISO-8859-1 as suggested by the
> warning when doing "git svn dcommit", but messages are still sent with
> the wrong encoding.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-svn.perl                           |   24 ++++++++-
 t/t9129-git-svn-i18n-commitencoding.sh |   80 ++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+), 3 deletions(-)
 create mode 100755 t/t9129-git-svn-i18n-commitencoding.sh

diff --git a/git-svn.perl b/git-svn.perl
index f90ddac..f24559c 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1136,9 +1136,19 @@ sub get_commit_entry {
 		system($editor, $commit_editmsg);
 	}
 	rename $commit_editmsg, $commit_msg or croak $!;
-	open $log_fh, '<', $commit_msg or croak $!;
-	{ local $/; chomp($log_entry{log} = <$log_fh>); }
-	close $log_fh or croak $!;
+	{
+		# SVN requires messages to be UTF-8 when entering the repo
+		local $/;
+		open $log_fh, '<', $commit_msg or croak $!;
+		binmode $log_fh;
+		chomp($log_entry{log} = <$log_fh>);
+
+		if (my $enc = Git::config('i18n.commitencoding')) {
+			require Encode;
+			Encode::from_to($log_entry{log}, $enc, 'UTF-8');
+		}
+		close $log_fh or croak $!;
+	}
 	unlink $commit_msg;
 	\%log_entry;
 }
@@ -2273,6 +2283,14 @@ sub do_git_commit {
 	}
 	defined(my $pid = open3(my $msg_fh, my $out_fh, '>&STDERR', @exec))
 	                                                           or croak $!;
+	binmode $msg_fh;
+
+	# we always get UTF-8 from SVN, but we may want our commits in
+	# a different encoding.
+	if (my $enc = Git::config('i18n.commitencoding')) {
+		require Encode;
+		Encode::from_to($log_entry->{log}, 'UTF-8', $enc);
+	}
 	print $msg_fh $log_entry->{log} or croak $!;
 	restore_commit_header_env($old_env);
 	unless ($self->no_metadata) {
diff --git a/t/t9129-git-svn-i18n-commitencoding.sh b/t/t9129-git-svn-i18n-commitencoding.sh
new file mode 100755
index 0000000..2848e46
--- /dev/null
+++ b/t/t9129-git-svn-i18n-commitencoding.sh
@@ -0,0 +1,80 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Eric Wong
+
+test_description='git svn honors i18n.commitEncoding in config'
+
+. ./lib-git-svn.sh
+
+compare_git_head_with () {
+	nr=`wc -l < "$1"`
+	a=7
+	b=$(($a + $nr - 1))
+	git cat-file commit HEAD | sed -ne "$a,${b}p" >current &&
+	test_cmp current "$1"
+}
+
+compare_svn_head_with () {
+	LC_ALL=en_US.UTF-8 svn log --limit 1 `git svn info --url` | \
+		sed -e 1,3d -e "/^-\+\$/d" >current &&
+	test_cmp current "$1"
+}
+
+for H in ISO-8859-1 EUCJP ISO-2022-JP
+do
+	test_expect_success "$H setup" '
+		mkdir $H &&
+		svn import -m "$H test" $H "$svnrepo"/$H &&
+		git svn clone "$svnrepo"/$H $H
+	'
+done
+
+for H in ISO-8859-1 EUCJP ISO-2022-JP
+do
+	test_expect_success "$H commit on git side" '
+	(
+		cd $H &&
+		git config i18n.commitencoding $H &&
+		git checkout -b t refs/remotes/git-svn &&
+		echo $H >F &&
+		git add F &&
+		git commit -a -F "$TEST_DIRECTORY"/t3900/$H.txt &&
+		E=$(git cat-file commit HEAD | sed -ne "s/^encoding //p") &&
+		test "z$E" = "z$H" &&
+		compare_git_head_with "$TEST_DIRECTORY"/t3900/$H.txt
+	)
+	'
+done
+
+for H in ISO-8859-1 EUCJP ISO-2022-JP
+do
+	test_expect_success "$H dcommit to svn" '
+	(
+		cd $H &&
+		git svn dcommit &&
+		git cat-file commit HEAD | grep git-svn-id: &&
+		E=$(git cat-file commit HEAD | sed -ne "s/^encoding //p") &&
+		test "z$E" = "z$H" &&
+		compare_git_head_with "$TEST_DIRECTORY"/t3900/$H.txt
+	)
+	'
+done
+
+test_expect_success 'ISO-8859-1 should match UTF-8 in svn' '
+(
+	cd ISO-8859-1 &&
+	compare_svn_head_with "$TEST_DIRECTORY"/t3900/1-UTF-8.txt
+)
+'
+
+for H in EUCJP ISO-2022-JP
+do
+	test_expect_success "$H should match UTF-8 in svn" '
+	(
+		cd $H &&
+		compare_svn_head_with "$TEST_DIRECTORY"/t3900/2-UTF-8.txt
+	)
+	'
+done
+
+test_done
-- 
Eric Wong
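
The test helpers in the patch read two things out of the raw commit object: the `encoding` header and the message body starting at line 7. A minimal sketch, using a made-up commit object (all hashes, names, and dates are placeholders), shows why line 7 is where the message begins when the header consists of tree, parent, author, committer, and encoding lines followed by a blank line:

```shell
#!/bin/sh
# A made-up commit object in the layout "git cat-file commit" prints.
# Every value here is a placeholder for illustration only.
sample_commit () {
cat <<'EOF'
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 0000000000000000000000000000000000000000
author A U Thor <author@example.com> 1225350000 +0000
committer C O Mitter <committer@example.com> 1225350000 +0000
encoding ISO-8859-1

subject line
second line
EOF
}

# The encoding header, extracted the same way the test does it.
sample_commit | sed -ne "s/^encoding //p"

# The message body: nr lines starting at line 7.
nr=2
a=7
b=$(($a + $nr - 1))
sample_commit | sed -ne "$a,${b}p"
```

Limiting the range to the length of the expected file presumably keeps anything appended after the original message, such as the git-svn-id line added by dcommit, out of the comparison.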


* Re: Encoding problems using git-svn
  2008-10-30  7:41 ` Eric Wong
@ 2008-10-30 15:14   ` James North
  2008-11-02  9:48     ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: James North @ 2008-10-30 15:14 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Hi Eric,

Don't worry about not seeing the patch and thanks for the answer :)

Your patch works great.

Messages appear without problems in "svn log" and "git log"; I haven't
found any gotchas so far.

The weird thing is that nobody had found this problem before; I would
have guessed there are other people with a setup similar to mine.

Thanks again.


* Re: Encoding problems using git-svn
  2008-10-30 15:14   ` James North
@ 2008-11-02  9:48     ` Eric Wong
  2008-11-02 13:45       ` Robin Rosenberg
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2008-11-02  9:48 UTC (permalink / raw)
  To: James North, Junio C Hamano; +Cc: git

James North <tocapicha@gmail.com> wrote:
> Hi Eric,
> 
> Don't worry about not seeing the patch and thanks for the answer :)
> 
> Your patch works great.
> 
> Messages appear without problems in "svn log" and "git log"; I haven't
> found any gotchas so far.

Thanks for the confirmation.

> The weird thing is that nobody had found this problem before; I would
> have guessed there are other people with a setup similar to mine.

Squeaky wheel gets the grease :)

Honestly, I think most folks have just moved on to UTF-8 entirely and
left legacy encodings behind, especially people using modern tools like
git (along with SVN enforcing UTF-8 at the repository/protocol level).


Junio:

I've pushed the following out to git://git.bogomips.org/git-svn.git:

Eric Wong (2):
      git-svn: don't escape tilde ('~') for http(s) URLs
      git-svn: respect i18n.commitencoding config

I'll try to get around to the more robust escaping checks
and splitting out the monolithic git-svn.perl source next
week.

-- 
Eric Wong


* Re: Encoding problems using git-svn
  2008-11-02  9:48     ` Eric Wong
@ 2008-11-02 13:45       ` Robin Rosenberg
  0 siblings, 0 replies; 6+ messages in thread
From: Robin Rosenberg @ 2008-11-02 13:45 UTC (permalink / raw)
  To: Eric Wong; +Cc: James North, Junio C Hamano, git

On Sunday 02 November 2008 10:48 Eric Wong wrote:
> Honestly, I think most folks have just moved on to UTF-8 entirely and
> left legacy encodings behind, especially people using modern tools like
> git (along with SVN enforcing UTF-8 at the repository/protocol level).

"Most" people don't have a legacy encoding problem, but some of us do, and
tools that help with migration by enforcing UTF-8 internally are useful. SVN is
one such example, though not very helpful as an SCM. That way we can still use
legacy encodings for old, stupid tools until we can move to an all-UTF-8 world.
We're not there yet, but hopefully will be in a few years. That's why it's sad
that the git command line, for example, still enforces the legacy encoding. Some
GUIs, like git gui, jgit, and probably a few others, help by recoding when
necessary.

-- robiin
