* Non-ASCII paths and git-cvsserver
From: sf @ 2006-11-09 11:11 UTC
To: git

Hello,

I want to access a git repository via git-cvsserver. The problem is that
the repository contains paths with umlauts. These paths come out quoted
and escaped when checked out with cvs.

Test case:

#! /bin/sh

set -e -u -x

WORK='/tmp/gittest'
FILE=$'\303\244'

mkdir "${WORK}"
mkdir "${WORK}/git"

#trap 'rm -r "${WORK}"' EXIT

cd "${WORK}/git"
git init-db
git repo-config gitcvs.enabled 1
git repo-config gitcvs.logfile "${WORK}/git/.git/cvslog.txt"
touch "${FILE}"
git add "${FILE}"
git commit -a -mx

cd "${WORK}"
CVS_SERVER='git-cvsserver'
export CVS_SERVER
cvs -d ":fork:${WORK}/git/.git" co master
ls master

### end

This is what I get:

+ WORK=/tmp/gittest
+ FILE=$'\303\244'
+ mkdir /tmp/gittest
+ mkdir /tmp/gittest/git
+ cd /tmp/gittest/git
+ git init-db
defaulting to local storage area
+ git repo-config gitcvs.enabled 1
+ git repo-config gitcvs.logfile /tmp/gittest/git/.git/cvslog.txt
+ touch $'\303\244'
+ git add $'\303\244'
+ git commit -a -mx
Committing initial tree 23d6145738bba135994775c19d6e8ae707d399ee
+ cd /tmp/gittest
+ CVS_SERVER=git-cvsserver
+ export CVS_SERVER
+ cvs -d :fork:/tmp/gittest/git/.git co master
cvs checkout: Updating master
U master/"\303\244"
+ ls master
"\303\244"
CVS

I do not speak Perl, so can anyone help?

Regards

Stephan
* Re: Non-ASCII paths and git-cvsserver
From: Martin Langhoff @ 2006-11-10 18:59 UTC
To: sf; +Cc: git

On 11/9/06, sf <sf@b-i-t.de> wrote:
> I want to access a git repository via git-cvsserver. The problem is that
> the repository contains paths with umlauts. These paths come out quoted
> and escaped when checked out with cvs.

Thanks for the detailed report! I am travelling right now, so with "high
latency" and on a machine that's missing sqlite libs :-/

But I'll give it a go anyway. Does this mini-patch help? You'll need Perl
5.8.x and probably a recent SQLite for this.

diff --git a/git-cvsserver.perl b/git-cvsserver.perl
index 8817f8b..c534de5 100755
--- a/git-cvsserver.perl
+++ b/git-cvsserver.perl
@@ -22,6 +22,9 @@ use Fcntl;
 use File::Temp qw/tempdir tempfile/;
 use File::Basename;
 
+binmode(STDIN, ':utf8');
+binmode(STDOUT, ':utf8');
+
 my $log = GITCVS::log->new();
 my $cfg;
 
@@ -2104,6 +2107,11 @@ sub new
         $self->{tables}{$table} = 1;
     }
 
+    # this will set the encoding for new DBs
+    # or return false for existing DBs that are not
+    # utf-8
+    $self->{dbh}->do('PRAGMA encoding = "UTF-8"');
+
     # Construct the revision table if required
     unless ( $self->{tables}{revision} )
* Re: Non-ASCII paths and git-cvsserver
From: Junio C Hamano @ 2006-11-10 19:49 UTC
To: sf; +Cc: git, Martin Langhoff

sf <sf@b-i-t.de> writes:

> I want to access a git repository via git-cvsserver. The problem is
> that the repository contains paths with umlauts. These paths come out
> quoted and escaped when checked out with cvs.

I think this is because the cvsserver invokes diff-tree and
ls-tree without -z, and the output from these commands quotes
non-ASCII letters as unsafe.

Martin's sqlite change may be needed as well, but regardless
of that something like this patch is needed -- otherwise what
populates the sqlite database will be quoted to begin with, so it
would not help much.

I've tested with your reproduction recipe, but otherwise not
tested this patch.

-- >8 --
diff --git a/git-cvsserver.perl b/git-cvsserver.perl
index 8817f8b..ca519b7 100755
--- a/git-cvsserver.perl
+++ b/git-cvsserver.perl
@@ -2343,67 +2343,72 @@ sub update
         if ( defined ( $lastpicked ) )
         {
-            my $filepipe = open(FILELIST, '-|', 'git-diff-tree', '-r', $lastpicked, $commit->{hash}) or die("Cannot call git-diff-tree : $!");
+            my $filepipe = open(FILELIST, '-|', 'git-diff-tree', '-z', '-r', $lastpicked, $commit->{hash}) or die("Cannot call git-diff-tree : $!");
+            local ($/) = "\0";
             while ( <FILELIST> )
             {
-                unless ( /^:\d{6}\s+\d{3}(\d)\d{2}\s+[a-zA-Z0-9]{40}\s+([a-zA-Z0-9]{40})\s+(\w)\s+(.*)$/o )
+                chomp;
+                unless ( /^:\d{6}\s+\d{3}(\d)\d{2}\s+[a-zA-Z0-9]{40}\s+([a-zA-Z0-9]{40})\s+(\w)$/o )
                 {
                     die("Couldn't process git-diff-tree line : $_");
                 }
+                my ($mode, $hash, $change) = ($1, $2, $3);
+                my $name = <FILELIST>;
+                chomp($name);
-                # $log->debug("File mode=$1, hash=$2, change=$3, name=$4");
+                # $log->debug("File mode=$mode, hash=$hash, change=$change, name=$name");
                 my $git_perms = "";
-                $git_perms .= "r" if ( $1 & 4 );
-                $git_perms .= "w" if ( $1 & 2 );
-                $git_perms .= "x" if ( $1 & 1 );
+                $git_perms .= "r" if ( $mode & 4 );
+                $git_perms .= "w" if ( $mode & 2 );
+                $git_perms .= "x" if ( $mode & 1 );
                 $git_perms = "rw" if ( $git_perms eq "" );
-                if ( $3 eq "D" )
+                if ( $change eq "D" )
                 {
-                    #$log->debug("DELETE $4");
-                    $head->{$4} = {
-                        name => $4,
-                        revision => $head->{$4}{revision} + 1,
+                    #$log->debug("DELETE $name");
+                    $head->{$name} = {
+                        name => $name,
+                        revision => $head->{$name}{revision} + 1,
                         filehash => "deleted",
                         commithash => $commit->{hash},
                         modified => $commit->{date},
                         author => $commit->{author},
                         mode => $git_perms,
                     };
-                    $self->insert_rev($4, $head->{$4}{revision}, $2, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
+                    $self->insert_rev($name, $head->{$name}{revision}, $hash, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
                 }
-                elsif ( $3 eq "M" )
+                elsif ( $change eq "M" )
                 {
-                    #$log->debug("MODIFIED $4");
-                    $head->{$4} = {
-                        name => $4,
-                        revision => $head->{$4}{revision} + 1,
-                        filehash => $2,
+                    #$log->debug("MODIFIED $name");
+                    $head->{$name} = {
+                        name => $name,
+                        revision => $head->{$name}{revision} + 1,
+                        filehash => $hash,
                         commithash => $commit->{hash},
                         modified => $commit->{date},
                         author => $commit->{author},
                         mode => $git_perms,
                     };
-                    $self->insert_rev($4, $head->{$4}{revision}, $2, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
+                    $self->insert_rev($name, $head->{$name}{revision}, $hash, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
                 }
-                elsif ( $3 eq "A" )
+                elsif ( $change eq "A" )
                 {
-                    #$log->debug("ADDED $4");
-                    $head->{$4} = {
-                        name => $4,
+                    #$log->debug("ADDED $name");
+                    $head->{$name} = {
+                        name => $name,
                         revision => 1,
-                        filehash => $2,
+                        filehash => $hash,
                         commithash => $commit->{hash},
                         modified => $commit->{date},
                         author => $commit->{author},
                         mode => $git_perms,
                     };
-                    $self->insert_rev($4, $head->{$4}{revision}, $2, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
+                    $self->insert_rev($name, $head->{$name}{revision}, $hash, $commit->{hash}, $commit->{date}, $commit->{author}, $git_perms);
                 }
                 else
                 {
-                    $log->warn("UNKNOWN FILE CHANGE mode=$1, hash=$2, change=$3, name=$4");
+                    $log->warn("UNKNOWN FILE CHANGE mode=$mode, hash=$hash, change=$change, name=$name");
                     die;
                 }
             }
@@ -2412,10 +2417,12 @@ sub update
             # this is used to detect files removed from the repo
             my $seen_files = {};
-            my $filepipe = open(FILELIST, '-|', 'git-ls-tree', '-r', $commit->{hash}) or die("Cannot call git-ls-tree : $!");
+            my $filepipe = open(FILELIST, '-|', 'git-ls-tree', '-z', '-r', $commit->{hash}) or die("Cannot call git-ls-tree : $!");
+            local $/ = "\0";
             while ( <FILELIST> )
             {
-                unless ( /^(\d+)\s+(\w+)\s+([a-zA-Z0-9]+)\s+(.*)$/o )
+                chomp;
+                unless ( /^(\d+)\s+(\w+)\s+([a-zA-Z0-9]+)\t(.*)$/o )
                 {
                     die("Couldn't process git-ls-tree line : $_");
                 }
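The record layout that the `-z` change relies on can be sketched outside Perl. The following is only an illustration, not code from git-cvsserver: it assumes `git ls-tree -z -r` records of the form `<mode> <type> <sha>`, a TAB, the path, and a terminating NUL byte. The function name `parse_ls_tree_z` and the all-zero sample hash are made up for the example.

```python
def parse_ls_tree_z(data: bytes):
    """Split NUL-terminated `git ls-tree -z` style records into tuples.

    Paths stay as raw bytes, so no encoding is assumed for them.
    """
    entries = []
    for record in data.split(b"\0"):
        if not record:
            # The trailing NUL leaves one empty element at the end.
            continue
        # Everything after the first TAB is the path, taken verbatim.
        meta, path = record.split(b"\t", 1)
        mode, objtype, sha = meta.split(b" ")
        entries.append((mode.decode("ascii"), objtype.decode("ascii"),
                        sha.decode("ascii"), path))
    return entries


# Hypothetical sample: one blob whose name is the UTF-8 bytes for "ä".
sample = b"100644 blob " + b"0" * 40 + b"\t\xc3\xa4\0"
entries = parse_ls_tree_z(sample)
```

Because records end at NUL rather than at a newline, a path may contain spaces, newlines, or non-ASCII bytes and still arrive unmodified, which is why the quoting is no longer needed.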
* Re: Non-ASCII paths and git-cvsserver
From: sf @ 2006-11-13 13:58 UTC
To: Junio C Hamano; +Cc: git, Martin Langhoff

Junio C Hamano wrote:
> sf <sf@b-i-t.de> writes:
>
>> I want to access a git repository via git-cvsserver. The problem is
>> that the repository contains paths with umlauts. These paths come out
>> quoted and escaped when checked out with cvs.
>
> I think this is because the cvsserver invokes diff-tree and
> ls-tree without -z, and the output from these commands quotes
> non-ASCII letters as unsafe.

I knew I had seen that kind of quoting before, but right then I thought it
was related to Perl or SQLite.

> Martin's sqlite change may be needed as well, but regardless
> of that something like this patch is needed -- otherwise what
> populates the sqlite database will be quoted to begin with, so it
> would not help much.

Martin, are you sure your patch is needed? (see below)

> I've tested with your reproduction recipe, but otherwise not
> tested this patch.

Thanks, Junio. Paths with umlauts are returned correctly now both in
UTF-8 and ISO-8859-1. I guess git-cvsserver is now as encoding agnostic
as git core.

Regards
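For readers who have not met this kind of quoting, here is a rough sketch of how a byte string such as the UTF-8 "ä" ends up as `"\303\244"`. It is a simplified approximation written for this thread, not git's actual implementation; the name `c_quote` is hypothetical.

```python
def c_quote(path: bytes) -> str:
    # Simplified approximation of C-style path quoting.
    # Assumption: every control or non-ASCII byte becomes a three-digit
    # octal escape; real git also has short forms for some control bytes.
    out = []
    quoted = False
    for byte in path:
        if byte in (0x22, 0x5C):            # double quote or backslash
            out.append("\\" + chr(byte))
            quoted = True
        elif byte < 0x20 or byte >= 0x7F:   # control or non-ASCII byte
            out.append("\\%03o" % byte)
            quoted = True
        else:
            out.append(chr(byte))
    text = "".join(out)
    # Only names that needed an escape are wrapped in double quotes.
    return '"' + text + '"' if quoted else text


utf8_name = c_quote(b"\xc3\xa4")     # UTF-8 bytes for "ä"
latin1_name = c_quote(b"\xe4")       # ISO-8859-1 byte for "ä"
plain_name = c_quote(b"plain.txt")
```

The escapes are computed per byte, so the same scheme applies to UTF-8 and ISO-8859-1 names alike; only the byte values differ.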
* Re: Non-ASCII paths and git-cvsserver
From: Jakub Narebski @ 2006-11-13 14:20 UTC
To: git

sf wrote:

> Thanks, Junio. Paths with umlauts are returned correctly now both in
> UTF-8 and ISO-8859-1. I guess git-cvsserver is now as encoding agnostic
> as git core.

By the way, now that git has per user config file, ~/.gitconfig, perhaps
it is time to add i18n.filesystemEncoding configuration variable, to
automatically convert between filesystem encoding (something you usually
don't have any control over) and UTF-8 encoding of paths in tree objects.

--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Non-ASCII paths and git-cvsserver
From: Robin Rosenberg @ 2006-11-13 18:30 UTC
To: Jakub Narebski; +Cc: git

On Monday 13 November 2006 15:20, Jakub Narebski wrote:
> sf wrote:
> > Thanks, Junio. Paths with umlauts are returned correctly now both in
> > UTF-8 and ISO-8859-1. I guess git-cvsserver is now as encoding agnostic
> > as git core.
>
> By the way, now that git has per user config file, ~/.gitconfig, perhaps
> it is time to add i18n.filesystemEncoding configuration variable, to
> automatically convert between filesystem encoding (something you usually
> don't have any control over) and UTF-8 encoding of paths in tree objects.

I'd prefer git to store filenames and comments in UTF-8 and convert on
input/output when and if it is necessary rather than forcing everybody to
take the hit. Most systems, but far from all, already use UTF-8 so it's a
noop for them. The only reason I want conversion is for the years to come
where we still live in two worlds of non-utf-8 and utf-8 and then forget
about everything non-utf-8, rather than carry around the baggage forever.
* Re: Non-ASCII paths and git-cvsserver
From: Jakub Narebski @ 2006-11-13 18:57 UTC
To: Robin Rosenberg; +Cc: git

On Monday 13 November 2006 19:30, Robin Rosenberg wrote:
> On Monday 13 November 2006 15:20, Jakub Narebski wrote:
>> sf wrote:
>>> Thanks, Junio. Paths with umlauts are returned correctly now both in
>>> UTF-8 and ISO-8859-1. I guess git-cvsserver is now as encoding agnostic
>>> as git core.
>>
>> By the way, now that git has per user config file, ~/.gitconfig, perhaps
>> it is time to add i18n.filesystemEncoding configuration variable, to
>> automatically convert between filesystem encoding (something you usually
>> don't have any control over) and UTF-8 encoding of paths in tree objects.
>
> I'd prefer git to store filenames and comments in UTF-8 and convert on
> input/output when and if it is necessary rather than forcing everybody to
> take the hit. Most systems, but far from all, already use UTF-8 so it's a
> noop for them. The only reason I want conversion is for the years to come
> where we still live in two worlds of non-utf-8 and utf-8 and then forget
> about everything non-utf-8, rather than carry around the baggage forever.

That was my idea: to have an i18n.filesystemEncoding configuration variable
to convert between the filesystem encoding (which is usually something you
don't have control over, and which varies from place to place, but not
from repository to repository) and the UTF-8 encoding in which git would
store filenames.

--
Jakub Narebski
* Re: Non-ASCII paths and git-cvsserver
From: Robin Rosenberg @ 2006-11-13 21:41 UTC
To: Jakub Narebski; +Cc: git

On Monday 13 November 2006 19:57, Jakub Narebski wrote:
> That was my idea: to have an i18n.filesystemEncoding configuration variable
> to convert between the filesystem encoding (which is usually something you
> don't have control over, and which varies from place to place, but not
> from repository to repository) and the UTF-8 encoding in which git would
> store filenames.

Yes, I know.
* Re: Non-ASCII paths and git-cvsserver
From: Junio C Hamano @ 2006-11-13 19:48 UTC
To: Robin Rosenberg; +Cc: git

Robin Rosenberg <robin.rosenberg.lists@dewire.com> writes:

> On Monday 13 November 2006 15:20, Jakub Narebski wrote:
>> sf wrote:
>> > Thanks, Junio. Paths with umlauts are returned correctly now both in
>> > UTF-8 and ISO-8859-1. I guess git-cvsserver is now as encoding agnostic
>> > as git core.
>>
>> By the way, now that git has per user config file, ~/.gitconfig, perhaps
>> it is time to add i18n.filesystemEncoding configuration variable, to
>> automatically convert between filesystem encoding (something you usually
>> don't have any control over) and UTF-8 encoding of paths in tree objects.
>
> I'd prefer git to store filenames and comments in UTF-8 and convert on
> input/output when and if it is necessary rather than forcing everybody to
> take the hit. Most systems, but far from all, already use UTF-8 so it's a
> noop for them. The only reason I want conversion is for the years to come
> where we still live in two worlds of non-utf-8 and utf-8 and then forget
> about everything non-utf-8, rather than carry around the baggage forever.

Pathnames in git core are encoding agnostic, just like UNIX
pathnames are. As you say, if the project convention is UTF-8
then it would not make any difference either way, so the status
quo is fine for people living in a UTF-8 only world.

To people for whom it is inconvenient to work with UTF-8,
including me, it is always wrong to record UTF-8 at the core
level and try to autoconvert. If (non-git) tools, libraries and
legacy-to-unicode roundtrip conversion were perfect, we would
have already converted and be living in a UTF-8 only world.
Projects that choose to run with legacy pathname encoding should
be allowed to do so without taking the roundtrip risk of converting
to and from UTF-8.

Interestingly enough, Linus mentioned this once, a lot better
than I would have, here:

    http://thread.gmane.org/gmane.comp.version-control.git/12240/focus=12279

Having said that, I am not opposed to having an option that makes
the external interfaces do the pathname conversion. If your
project chooses to use euc-jp for commit messages, your
configuration variable i18n.commitencoding is set to euc-jp, and
if gitweb always wants to do its thing in utf-8 (which is
probably a sensible thing to do), it would make a lot of sense
to take the commit message and convert it from euc-jp to utf-8
before rendering it in HTML. Maybe i18n.pathnameencoding could
be used for similar purposes for external interfaces.

But the core will stay encoding agnostic; pathnames stored in
the index and tree are what you can feed stat() and open(), and
what you read from readdir().

Maybe we could revisit this decision in five years, but not now.
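The kind of conversion at the external interface described above can be sketched as follows. This is a minimal illustration under stated assumptions, not gitweb code: the helper name `reencode` is hypothetical, the source encoding is passed in by hand rather than read from i18n.commitencoding, and the stored bytes are kept untouched.

```python
def reencode(message: bytes, source_encoding: str,
             target_encoding: str = "utf-8") -> bytes:
    # Decode the stored bytes with the project's declared encoding,
    # then encode them for the interface that wants UTF-8.
    return message.decode(source_encoding).encode(target_encoding)


# Hypothetical commit message stored by a project that uses euc-jp.
stored = "日本語".encode("euc_jp")

# What a UTF-8 front end would render; `stored` itself is not changed.
rendered = reencode(stored, "euc_jp")
```

The conversion happens only on the way out, so the repository contents stay exactly as the project recorded them and no roundtrip through UTF-8 is forced on the core.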
* Re: Non-ASCII paths and git-cvsserver
From: Martin Langhoff @ 2006-11-13 18:22 UTC
To: sf; +Cc: Junio C Hamano, git

On 11/13/06, sf <sf@b-i-t.de> wrote:
> Martin, are you sure your patch is needed? (see below)

Not 100% sure. I was just making sure we crossed all the Ts and dotted
the Is. I gather you have tried my patch and it didn't make any
difference. What SQLite and Perl versions are you using?

cheers,
* Re: Non-ASCII paths and git-cvsserver
From: sf @ 2006-11-14 10:40 UTC
To: Martin Langhoff; +Cc: Junio C Hamano, git

Martin Langhoff wrote:
> On 11/13/06, sf <sf@b-i-t.de> wrote:
>> Martin, are you sure your patch is needed? (see below)
>
> Not 100% sure. I was just making sure we crossed all the Ts and dotted
> the Is. I gather you have tried my patch and it didn't make any
> difference. What SQLite and Perl versions are you using?

Your patch did make a difference but the outcome is not good:

+ WORK=/tmp/gittest
+ FILE=$'\303\244'
+ mkdir /tmp/gittest
+ mkdir /tmp/gittest/git
+ cd /tmp/gittest/git
+ git init-db
defaulting to local storage area
+ git repo-config gitcvs.enabled 1
+ git repo-config gitcvs.logfile /tmp/gittest/git/.git/cvslog.txt
+ touch $'\303\244'
+ git add $'\303\244'
+ git commit -a -mx
Committing initial tree 23d6145738bba135994775c19d6e8ae707d399ee
+ cd /tmp/gittest
+ CVS_SERVER=git-cvsserver
+ export CVS_SERVER
+ cvs -d :fork:/tmp/gittest/git/.git co master
cvs checkout: Updating master
U master/ä
+ ls master
ä
CVS

The pathname has been UTF-8 encoded _twice_!

Perl's version is 5.8.8. How do I get the version of SQLite? Do you mean
DBD-SQLite-1.11?

Regards

Stephan
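One plausible way a name gets UTF-8 encoded twice is sketched below. The thread does not confirm the exact cause, so treat this as an assumption: bytes that are already UTF-8 are taken to be one character per byte (as in Latin-1) and are then written through a layer that encodes characters as UTF-8 again. The variable names are illustrative only.

```python
# UTF-8 bytes for "ä", as they come out of the repository.
raw = b"\xc3\xa4"

# Assumed failure mode: each byte is treated as one Latin-1 character ...
as_chars = raw.decode("latin-1")

# ... and a UTF-8 output layer then encodes those characters again.
double_encoded = as_chars.encode("utf-8")

# Undoing the extra step recovers the original bytes.
recovered = double_encoded.decode("utf-8").encode("latin-1")
```

Under this assumption the two original bytes turn into four, which is the usual signature of double encoding; passing the bytes through untouched avoids the problem.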