* Encoding problems using git-svn
From: James North @ 2008-10-29 3:14 UTC
To: git

Hi,

I'm using git-svn on a system with ISO-8859-1 encoding. The problem
appears when I use "git svn dcommit" to send changes to a remote svn
(also ISO-8859-1).

It seems git-svn is sending commit messages as UTF-8 (just a guess...)
and they look bad in the remote svn log, e.g. "Ca?\241a de cami?\243n".

I have tried setting i18n.commitencoding=ISO-8859-1, as suggested by the
warning printed by "git svn dcommit", but messages are still sent with
the wrong encoding.

Am I missing something?

Thanks everyone
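One way to read the garbled sample: assuming svn prints each byte it cannot interpret as UTF-8 as `?\` followed by the byte's decimal value (an assumption about svn, not something stated in this thread), 241 and 243 would be the ISO-8859-1 codes for ñ and ó. That would suggest the raw ISO-8859-1 bytes reached the repository unconverted. A minimal shell sketch to check the byte values; the helper name `latin1_from_decimals` is made up for illustration and `iconv` is assumed to be installed:

```shell
#!/bin/sh
# Hypothetical helper: emit the bytes with the given decimal values,
# then reinterpret them as ISO-8859-1 and convert them to UTF-8.
latin1_from_decimals() {
	for dec in "$@"; do
		# printf only understands octal escapes, so convert first.
		printf "\\$(printf '%o' "$dec")"
	done | iconv -f ISO-8859-1 -t UTF-8
}

# 241 and 243 are the values shown in the garbled log message.
latin1_from_decimals 241 243
echo
```

On a UTF-8 terminal this should print the two accented letters missing from the sample message.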
* Re: Encoding problems using git-svn
From: James North @ 2008-10-30 3:28 UTC
To: git

OK, I made a quick change to the git-svn script and it seems to be
working now on my system with the locale set to ISO-8859-1.

I don't know if this is the right place to post this, but I hope someone
knowledgeable sees it and can tell me whether it would work as a general
fix.

This patch is against 1.6.0.2.

--- git-svn	2008-09-15 13:04:46.000000000 +0200
+++ git-svn.mine	2008-10-30 04:21:09.000000000 +0100
@@ -43,6 +43,7 @@
 use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev/;
 use IPC::Open3;
 use Git;
+use Encode;
 
 BEGIN {
 	# import functions from Git into our packages, en masse
@@ -1061,6 +1062,7 @@
 			    && !$saw_from) {
 				$msgbuf .= "\n\nFrom: $author";
 			}
+			$msgbuf = encode("utf8", $msgbuf);
 			print $log_fh $msgbuf or croak $!;
 			command_close_pipe($msg_fh, $ctx);
 		}
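The one-line change boils down to re-encoding the message as UTF-8 before it is written out. As a rough shell analogue (not the Perl code itself), and assuming the message bytes are ISO-8859-1, which is the only setup this patch was reported against, the effect on the bytes can be inspected like this. The helper names `hex` and `msg_latin1` are made up for illustration, and `iconv` and `od` are assumed to be available:

```shell
#!/bin/sh
# Hypothetical helper: print stdin as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# "Caña" as ISO-8859-1 bytes (\361 is octal for 0xF1, the ñ).
msg_latin1() { printf 'Ca\361a'; }

# The same text before and after conversion to UTF-8.
echo "ISO-8859-1: $(msg_latin1 | hex)"
echo "UTF-8:      $(msg_latin1 | iconv -f ISO-8859-1 -t UTF-8 | hex)"
```

The single byte f1 becomes the two-byte sequence c3 b1, which is the form an SVN repository expects for log messages.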
* Re: Encoding problems using git-svn
From: Eric Wong @ 2008-10-30 7:41 UTC
To: James North; +Cc: git

Hi James,

I saw your other patch too late; I had already started working on my
patch earlier today but got distracted by other things (being at
GitTogether :) and lacked a stable Internet connection afterwards.

Anyway, here's my version. It handles the case where the user specifies
the --edit option to interactively edit the commit message before
committing, and it also reencodes the messages when fetching from SVN.

Can you let me know if it works for you?

Note: I'll be in transit tomorrow and may not have time to follow
up on this until Saturday.

From 84f003e0c39414ebf27a98de167643e95bed6abb Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Wed, 29 Oct 2008 23:49:26 -0700
Subject: [PATCH] git-svn: respect i18n.commitencoding config

SVN itself always stores log messages in the repository as
UTF-8.  git always stores/retrieves everything as raw binary
data with no transformations whatsoever.

To interact with SVN, we need to encode log messages as UTF-8
before sending them to SVN, as SVN cannot do it for us.  When
retrieving log messages from SVN, we also need to (attempt to)
reencode the UTF-8 log message back to the user-specified commit
encoding.

Note, handling i18n.logoutputencoding for "git svn log" also
needs to be done in a future change.

Also, this change only deals with the encoding of commit
messages and nothing else (path names, blob content, ...).

In-Reply-To: <8b168cfb0810282014r789ac01dnec51824de1078f0@mail.gmail.com>
James North <tocapicha@gmail.com> wrote:
> Hi,
>
> I'm using git-svn on a system with ISO-8859-1 encoding. The problem is
> when I try to use "git svn dcommit" to send changes to a remote svn
> (also ISO-8859-1).
>
> Seems like git-svn is sending commit messages with utf-8 (just a
> guessing...) and they look bad on the remote svn log. E.g. "Ca?\241a
> de cami?\243n"
>
> I have tried using i18n.commitencoding=ISO-8859-1 as suggested by the
> warning when doing "git svn dcommit" but messages still are sent with
> wrong encoding.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-svn.perl                           |   24 ++++++++-
 t/t9129-git-svn-i18n-commitencoding.sh |   80 ++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+), 3 deletions(-)
 create mode 100755 t/t9129-git-svn-i18n-commitencoding.sh

diff --git a/git-svn.perl b/git-svn.perl
index f90ddac..f24559c 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1136,9 +1136,19 @@ sub get_commit_entry {
 		system($editor, $commit_editmsg);
 	}
 	rename $commit_editmsg, $commit_msg or croak $!;
-	open $log_fh, '<', $commit_msg or croak $!;
-	{ local $/; chomp($log_entry{log} = <$log_fh>); }
-	close $log_fh or croak $!;
+	{
+		# SVN requires messages to be UTF-8 when entering the repo
+		local $/;
+		open $log_fh, '<', $commit_msg or croak $!;
+		binmode $log_fh;
+		chomp($log_entry{log} = <$log_fh>);
+
+		if (my $enc = Git::config('i18n.commitencoding')) {
+			require Encode;
+			Encode::from_to($log_entry{log}, $enc, 'UTF-8');
+		}
+		close $log_fh or croak $!;
+	}
 	unlink $commit_msg;
 	\%log_entry;
 }
@@ -2273,6 +2283,14 @@ sub do_git_commit {
 	}
 	defined(my $pid = open3(my $msg_fh, my $out_fh, '>&STDERR', @exec))
 	                                                           or croak $!;
+	binmode $msg_fh;
+
+	# we always get UTF-8 from SVN, but we may want our commits in
+	# a different encoding.
+	if (my $enc = Git::config('i18n.commitencoding')) {
+		require Encode;
+		Encode::from_to($log_entry->{log}, 'UTF-8', $enc);
+	}
 	print $msg_fh $log_entry->{log} or croak $!;
 	restore_commit_header_env($old_env);
 	unless ($self->no_metadata) {
diff --git a/t/t9129-git-svn-i18n-commitencoding.sh b/t/t9129-git-svn-i18n-commitencoding.sh
new file mode 100755
index 0000000..2848e46
--- /dev/null
+++ b/t/t9129-git-svn-i18n-commitencoding.sh
@@ -0,0 +1,80 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Eric Wong
+
+test_description='git svn honors i18n.commitEncoding in config'
+
+. ./lib-git-svn.sh
+
+compare_git_head_with () {
+	nr=`wc -l < "$1"`
+	a=7
+	b=$(($a + $nr - 1))
+	git cat-file commit HEAD | sed -ne "$a,${b}p" >current &&
+	test_cmp current "$1"
+}
+
+compare_svn_head_with () {
+	LC_ALL=en_US.UTF-8 svn log --limit 1 `git svn info --url` | \
+		sed -e 1,3d -e "/^-\+\$/d" >current &&
+	test_cmp current "$1"
+}
+
+for H in ISO-8859-1 EUCJP ISO-2022-JP
+do
+	test_expect_success "$H setup" '
+		mkdir $H &&
+		svn import -m "$H test" $H "$svnrepo"/$H &&
+		git svn clone "$svnrepo"/$H $H
+	'
+done
+
+for H in ISO-8859-1 EUCJP ISO-2022-JP
+do
+	test_expect_success "$H commit on git side" '
+	(
+		cd $H &&
+		git config i18n.commitencoding $H &&
+		git checkout -b t refs/remotes/git-svn &&
+		echo $H >F &&
+		git add F &&
+		git commit -a -F "$TEST_DIRECTORY"/t3900/$H.txt &&
+		E=$(git cat-file commit HEAD | sed -ne "s/^encoding //p") &&
+		test "z$E" = "z$H" &&
+		compare_git_head_with "$TEST_DIRECTORY"/t3900/$H.txt
+	)
+	'
+done
+
+for H in ISO-8859-1 EUCJP ISO-2022-JP
+do
+	test_expect_success "$H dcommit to svn" '
+	(
+		cd $H &&
+		git svn dcommit &&
+		git cat-file commit HEAD | grep git-svn-id: &&
+		E=$(git cat-file commit HEAD | sed -ne "s/^encoding //p") &&
+		test "z$E" = "z$H" &&
+		compare_git_head_with "$TEST_DIRECTORY"/t3900/$H.txt
+	)
+	'
+done
+
+test_expect_success 'ISO-8859-1 should match UTF-8 in svn' '
+(
+	cd ISO-8859-1 &&
+	compare_svn_head_with "$TEST_DIRECTORY"/t3900/1-UTF-8.txt
+)
+'
+
+for H in EUCJP ISO-2022-JP
+do
+	test_expect_success "$H should match UTF-8 in svn" '
+	(
+		cd $H &&
+		compare_svn_head_with "$TEST_DIRECTORY"/t3900/2-UTF-8.txt
+	)
+	'
+done
+
+test_done
-- 
Eric Wong
* Re: Encoding problems using git-svn
From: James North @ 2008-10-30 15:14 UTC
To: Eric Wong; +Cc: git

Hi Eric,

Don't worry about not seeing the patch and thanks for the answer :)

Your patch works great.

Messages appear without problems on "svn log" and "git log", I haven't
found any gotcha that I know of.

The weird thing is that this problem was not found by anyone before, I
guessed there should be some people with a setup similar to mine.

Thanks again.

On Thu, Oct 30, 2008 at 8:41 AM, Eric Wong <normalperson@yhbt.net> wrote:
> Hi James,
>
> I saw your other patch too late, I had already started working on my
> patch earlier today but got distracted by other things (being at
> GitTogether :) and lacked a stable Internet connection afterwards.
>
> Anyways, here's my version, it handles the case where the user specifies
> the --edit option to interactively edit the commit message before
> committing; and also reencodes the messages when fetching from SVN.
>
> Can you let me know if it works for you?
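Seeing correct messages in both "svn log" and "git log" corresponds to a lossless round trip between the commit encoding and UTF-8. A rough shell sketch of that check follows; it is an analogue of the two conversions, not the patch's Perl code. The variable `commit_encoding` stands in for the i18n.commitencoding setting, the helper names `to_svn`, `from_svn`, and `hex` are made up for illustration, and `iconv` is assumed to be installed:

```shell
#!/bin/sh
# Stand-in for the i18n.commitencoding setting (assumed value).
commit_encoding=ISO-8859-1

# Hypothetical helpers mirroring the two directions of the conversion.
to_svn()   { iconv -f "$commit_encoding" -t UTF-8; }   # dcommit: git -> SVN
from_svn() { iconv -f UTF-8 -t "$commit_encoding"; }   # fetch:   SVN -> git

# Hypothetical helper: print stdin as a compact hex string.
hex() { od -An -tx1 | tr -d ' \n'; }

# "camión" in ISO-8859-1 (\363 is octal for 0xF3, the ó).
original=$(printf 'cami\363n' | hex)
roundtrip=$(printf 'cami\363n' | to_svn | from_svn | hex)

if [ "$original" = "$roundtrip" ]; then
	echo "round trip preserved the bytes: $roundtrip"
else
	echo "round trip changed the bytes: $original -> $roundtrip"
fi
```

The round trip is only lossless when every character in the message exists in the commit encoding; text fetched from SVN that contains other characters cannot be represented exactly, which is presumably why the patch description says "(attempt to)".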
* Re: Encoding problems using git-svn
From: Eric Wong @ 2008-11-02 9:48 UTC
To: James North, Junio C Hamano; +Cc: git

James North <tocapicha@gmail.com> wrote:
> Hi Eric,
>
> Don't worry about not seeing the patch and thanks for the answer :)
>
> Your patch works great.
>
> Messages appear without problems on "svn log" and "git log", I haven't
> found any gotcha that I know of.

Thanks for the confirmation.

> The weird thing is that this problem was not found by anyone before, I
> guessed there should be some people with a setup similar to mine.

Squeaky wheel gets the grease :)

Honestly, I think most folks have just moved onto UTF-8 entirely and
left legacy encodings behind.  Especially people using modern tools like
git (along with SVN enforcing UTF-8 at the repository/protocol level).

Junio:

I've pushed the following out to git://git.bogomips.org/git-svn.git:

Eric Wong (2):
      git-svn: don't escape tilde ('~') for http(s) URLs
      git-svn: respect i18n.commitencoding config

I'll try to get around to the more robust escaping checks and
splitting out the monolithic git-svn.perl source next week.

-- 
Eric Wong
* Re: Encoding problems using git-svn
From: Robin Rosenberg @ 2008-11-02 13:45 UTC
To: Eric Wong; +Cc: James North, Junio C Hamano, git

On Sunday 02 November 2008 10:48 Eric Wong wrote:
> James North <tocapicha@gmail.com> wrote:
> > Hi Eric,
> >
> > Don't worry about not seeing the patch and thanks for the answer :)
> >
> > Your patch works great.
> >
> > Messages appear without problems on "svn log" and "git log", I haven't
> > found any gotcha that I know of.
>
> Thanks for the confirmation.
>
> > The weird thing is that this problem was not found by anyone before, I
> > guessed there should be some people with a setup similar to mine.
>
> Squeaky wheel gets the grease :)
>
> Honestly, I think most folks have just moved onto UTF-8 entirely and
> left legacy encodings behind.  Especially people using modern tools like
> git (along with SVN enforcing UTF-8 at the repository/protocol level).

"Most" people don't have a legacy encoding problem, but some of us do,
and tools that help with migration by enforcing UTF-8 internally help.
SVN is such an example, though not very helpful as an SCM. That way we
can still use legacy encodings for old stupid tools until we can move to
an all-UTF-8 world. We're not there yet, but hopefully we will be in a
few years. That's why it's sad that the git command line, for example,
still enforces the legacy encoding. Some GUIs, like git gui, jgit and
probably a few others, help by recoding when necessary.

-- robiin