* [PATCH] git-p4: add "--path-encoding" option
From: larsxschneider @ 2015-08-31 15:40 UTC
To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Hi,

I think I discovered a path encoding issue if you migrate P4 repositories
that contain path names generated with Windows. I added a test case to prove
my point.

Character encoding is a complicated topic. Feedback is highly appreciated.

Thanks,
Lars

Lars Schneider (1):
  git-p4: add "--path-encoding" option

 Documentation/git-p4.txt        |  4 ++++
 git-p4.py                       |  6 ++++++
 t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)
 create mode 100755 t/t9821-git-p4-path-encoding.sh

--
2.5.1.1.g36ff854

* [PATCH] git-p4: add "--path-encoding" option
From: larsxschneider @ 2015-08-31 15:40 UTC
To: git; +Cc: luke, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/git-p4.txt        |  4 ++++
 git-p4.py                       |  6 ++++++
 t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)
 create mode 100755 t/t9821-git-p4-path-encoding.sh

diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
index 82aa5d6..98b6c0f 100644
--- a/Documentation/git-p4.txt
+++ b/Documentation/git-p4.txt
@@ -252,6 +252,10 @@ Git repository:
 	Use a client spec to find the list of interesting files in p4.
 	See the "CLIENT SPEC" section below.
 
+----path-encoding <encoding>::
+	The encoding to use when reading p4 client paths. With this option
+	non ASCII paths are properly stored in Git. For example, the encoding 'cp1252' is often used on Windows systems.
+
 -/ <path>::
 	Exclude selected depot paths when cloning or syncing.

diff --git a/git-p4.py b/git-p4.py
index 073f87b..2b3bfc4 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1981,6 +1981,8 @@ class P4Sync(Command, P4UserMap):
                 optparse.make_option("--silent", dest="silent", action="store_true"),
                 optparse.make_option("--detect-labels", dest="detectLabels", action="store_true"),
                 optparse.make_option("--import-labels", dest="importLabels", action="store_true"),
+                optparse.make_option("--path-encoding", dest="pathEncoding", type="string",
+                                     help="Encoding to use for paths"),
                 optparse.make_option("--import-local", dest="importIntoRemotes", action="store_false",
                                      help="Import into refs/heads/ , not refs/remotes"),
                 optparse.make_option("--max-changes", dest="maxChanges",
@@ -2025,6 +2027,7 @@ class P4Sync(Command, P4UserMap):
         self.clientSpecDirs = None
         self.tempBranches = []
         self.tempBranchLocation = "git-p4-tmp"
+        self.pathEncoding = None
 
         if gitConfig("git-p4.syncFromOrigin") == "false":
             self.syncWithOrigin = False
@@ -2213,6 +2216,9 @@ class P4Sync(Command, P4UserMap):
             text = regexp.sub(r'$\1$', text)
             contents = [ text ]
 
+        if self.pathEncoding:
+            relPath = relPath.decode(self.pathEncoding).encode('utf8', 'replace')
+
         self.gitStream.write("M %s inline %s\n" % (git_mode, relPath))
 
         # total length...
diff --git a/t/t9821-git-p4-path-encoding.sh b/t/t9821-git-p4-path-encoding.sh
new file mode 100755
index 0000000..f6bb79c
--- /dev/null
+++ b/t/t9821-git-p4-path-encoding.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='Clone repositories with non ASCII paths'
+
+. ./lib-git-p4.sh
+
+test_expect_success 'start p4d' '
+	start_p4d
+'
+
+test_expect_success 'Create a repo containing cp1251 encoded paths' '
+	cd "$cli" &&
+
+	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
+	>"$FILENAME" &&
+	p4 add "$FILENAME" &&
+	p4 submit -d "test"
+'
+
+test_expect_success 'Clone repo containing cp1251 encoded paths' '
+	git p4 clone --destination="$git" --path-encoding=cp1252 //depot &&
+	test_when_finished cleanup_git &&
+	(
+		cd "$git" &&
+		git init . &&
+		cat >expect <<-\EOF &&
+		"a-\303\244_o-\303\266_u-\303\274.txt"
+		EOF
+		git ls-files >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'kill p4d' '
+	kill_p4d
+'
+
+test_done
--
2.5.1.1.g36ff854
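
For readers who want to see the core of the patch in isolation, the path conversion can be sketched as a standalone function. This is only an illustration, not code from git-p4: the patch targets Python 2, where relPath is a byte string, while the sketch below uses Python 3 bytes, and the name reencode_path is hypothetical.

```python
from typing import Optional


def reencode_path(rel_path: bytes, path_encoding: Optional[str]) -> bytes:
    """Hypothetical helper mirroring the decode/encode step in the patch.

    Without an encoding the path bytes pass through untouched, which
    matches the patch's default of pathEncoding = None.
    """
    if not path_encoding:
        return rel_path
    # Decode from the source encoding, then re-encode as UTF-8.
    return rel_path.decode(path_encoding).encode("utf8", "replace")


# The file name from the test case, as cp1252 bytes (what a Windows
# client might have stored).
cp1252_name = "a-\u00e4_o-\u00f6_u-\u00fc.txt".encode("cp1252")
utf8_name = reencode_path(cp1252_name, "cp1252")
print(utf8_name)
```

The resulting bytes correspond to the octal-escaped name in the test's expect file (\303\244 is 0xC3 0xA4, and so on).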

* Re: [PATCH] git-p4: add "--path-encoding" option
From: Junio C Hamano @ 2015-08-31 17:40 UTC
To: larsxschneider; +Cc: git, luke

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>  Documentation/git-p4.txt        |  4 ++++
>  git-p4.py                       |  6 ++++++
>  t/t9821-git-p4-path-encoding.sh | 38 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 48 insertions(+)
>  create mode 100755 t/t9821-git-p4-path-encoding.sh
>
> diff --git a/Documentation/git-p4.txt b/Documentation/git-p4.txt
> index 82aa5d6..98b6c0f 100644
> --- a/Documentation/git-p4.txt
> +++ b/Documentation/git-p4.txt
> @@ -252,6 +252,10 @@ Git repository:
>  	Use a client spec to find the list of interesting files in p4.
>  	See the "CLIENT SPEC" section below.
>
> +----path-encoding <encoding>::
> +	The encoding to use when reading p4 client paths. With this option
> +	non ASCII paths are properly stored in Git. For example, the encoding 'cp1252' is often used on Windows systems.
> +

This line is overly long. Let AsciiDoc wrap it upon output and keep
the source within a reasonable limit (see existing lines around the
new text to see what is considered reasonable).

Do I see too many dashes before the option name, by the way, or is
it my e-mail client tricking my eyes?

> diff --git a/t/t9821-git-p4-path-encoding.sh b/t/t9821-git-p4-path-encoding.sh
> new file mode 100755
> index 0000000..f6bb79c
> --- /dev/null
> +++ b/t/t9821-git-p4-path-encoding.sh
> @@ -0,0 +1,38 @@
> +#!/bin/sh
> +
> +test_description='Clone repositories with non ASCII paths'
> +
> +. ./lib-git-p4.sh
> +
> +test_expect_success 'start p4d' '
> +	start_p4d
> +'
> +
> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
> +	cd "$cli" &&
> +
> +	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&

Hmm, we'd be better off not having a bare UTF-8 sequence in the
source like this, especially when you already have the same thing
backslash-escaped in the "expect" file below. Perhaps

	NAME="a-\303\244_o-\303\266_u-\303\274.txt" &&

	UTF8=$(printf "$NAME") &&
	CP1252=$(printf "$NAME" | iconv -t cp1252) &&
	echo "\"$UTF8\"" >expect &&

	>"$CP1252" &&
	p4 add "$CP1252" &&
	...

or something along that line?

> +	>"$FILENAME" &&
> +	p4 add "$FILENAME" &&
> +	p4 submit -d "test"
> +'
> +
> +test_expect_success 'Clone repo containing cp1251 encoded paths' '
> +	git p4 clone --destination="$git" --path-encoding=cp1252 //depot &&
> +	test_when_finished cleanup_git &&
> +	(
> +		cd "$git" &&
> +		git init . &&
> +		cat >expect <<-\EOF &&
> +		"a-\303\244_o-\303\266_u-\303\274.txt"
> +		EOF
> +		git ls-files >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success 'kill p4d' '
> +	kill_p4d
> +'
> +
> +test_done
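
The idea in the suggestion above, keeping a single backslash-escaped spelling of the name and deriving every other form from it, can be illustrated outside the shell. The following is a rough Python sketch under stated assumptions: unescape_octal is a hypothetical helper that only handles three-digit octal escapes, a small subset of what printf(1) accepts, and nothing here is part of the proposed test.

```python
import re

# The escaped spelling from the suggestion; a raw string keeps the
# backslashes literal, like the shell's double-quoted assignment.
NAME = r"a-\303\244_o-\303\266_u-\303\274.txt"


def unescape_octal(escaped: str) -> bytes:
    """Hypothetical helper: expand \\NNN octal escapes into raw bytes."""
    return re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        escaped.encode("ascii"),
    )


# Counterpart of UTF8=$(printf "$NAME")
utf8_bytes = unescape_octal(NAME)
# Counterpart of CP1252=$(printf "$NAME" | iconv -t cp1252), assuming
# the input is UTF-8
cp1252_bytes = utf8_bytes.decode("utf-8").encode("cp1252")
# The quoted, escaped form that the patch's "expect" file contains
quoted_escaped = '"' + NAME + '"'
```

With this arrangement the non-ASCII characters never appear literally in the source; both the on-disk name and the expected output come from the one escaped string.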

* Re: [PATCH] git-p4: add "--path-encoding" option
From: Torsten Bögershausen @ 2015-08-31 19:22 UTC
To: Junio C Hamano, larsxschneider; +Cc: git, luke

On 2015-08-31 19.40, Junio C Hamano wrote:
> larsxschneider@gmail.com writes:

>> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
>> +	cd "$cli" &&
>> +
>> +	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
>
> Hmm, we'd be better off not having a bare UTF-8 sequence in the
> source like this, especially when you already have the same thing
> backslash-escaped in the "expect" file below. Perhaps
>
> 	NAME="a-\303\244_o-\303\266_u-\303\274.txt" &&
>
> 	UTF8=$(printf "$NAME") &&
> 	CP1252=$(printf "$NAME" | iconv -t cp1252) &&
> 	echo "\"$UTF8\"" >expect &&
>
> 	>"$CP1252" &&
> 	p4 add "$CP1252" &&
> 	...
>

Using file names and iconv like this may not be portable:
- cp1252 may be called CP1252 (or may not be available)
- reading from stdin is not necessarily supported by iconv
- creating files in CP1252 may not be supported under Mac OS
  (Not sure about Windows)

One solution could be to use ISO-8859-1, convert into UTF-8,
and "convert into UTF-8" one more time.

We can skip using iconv in the test case completely,
and use something like this:
(Fully untested)

UTF8=$(printf '\303\203\302\204')
NAME=$(printf '\303\204')
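
The two byte strings in the untested suggestion above appear to be related by exactly one ISO-8859-1 to UTF-8 conversion: NAME is a valid UTF-8 file name, and UTF8 is what results when those same bytes are read as ISO-8859-1 and re-encoded. A small Python check of that reading follows; the variable names are made up for the illustration, and the assumption that the clone would be run with --path-encoding=iso-8859-1 is inferred, not stated in the message.

```python
# Bytes of the file name as it would be created on disk: the UTF-8
# encoding of U+00C4 (Python bytes literals accept octal escapes).
name_on_disk = b"\303\204"

# What a converter produces if it is told the path is ISO-8859-1:
# each byte becomes one code point, then gets re-encoded as UTF-8.
double_encoded = name_on_disk.decode("iso-8859-1").encode("utf-8")

# The value the suggestion assigns to UTF8.
suggested_utf8 = b"\303\203\302\204"

print(double_encoded == suggested_utf8)
```

Because the on-disk name is plain UTF-8, this approach would not require the file system to accept cp1252 or ISO-8859-1 names, which seems to be the portability concern it is meant to address.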

* Re: [PATCH] git-p4: add "--path-encoding" option
From: Junio C Hamano @ 2015-08-31 20:09 UTC
To: Torsten Bögershausen; +Cc: larsxschneider, git, luke

Torsten Bögershausen <tboegi@web.de> writes:

> On 2015-08-31 19.40, Junio C Hamano wrote:
>> larsxschneider@gmail.com writes:
>
>>> +test_expect_success 'Create a repo containing cp1251 encoded paths' '
>>> +	cd "$cli" &&
>>> +
>>> +	FILENAME="$(echo "a-ä_o-ö_u-ü.txt" | iconv -f utf-8 -t cp1252)" &&
>> ...
> Using file names and iconv like this may not be portable:
> - cp1252 may be called CP1252 (or may not be available)

"git grep 'cp[0-9]' t/" does tell us that we refrain from using them
and I am sure the portability worry is a big reason. Thank you
for pointing it out.

> - reading from stdin is not necessarily supported by iconv

"git grep '| iconv' t/" tells me that this is irrelevant; we already
heavily depend on it.

> - creating files in CP1252 may not be supported under Mac OS
>   (Not sure about Windows)

The same as the first point, which is a good thing to worry about.

> One solution could be to use ISO-8859-1, convert into UTF-8,
> and "convert into UTF-8" one more time.

I do not quite get it; do you need to do anything more than just
replacing cp1252 with iso-8859-1 in the patch being discussed?