From: Luke Diamand <luke@diamand.org>
To: Pete Wyckoff <pw@padd.com>
Cc: git@vger.kernel.org, Vitor Antunes <vitor.hda@gmail.com>,
Chris Li <git@chrisli.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/5] git-p4: handle utf16 filetype properly
Date: Fri, 23 Sep 2011 19:01:11 +0100
Message-ID: <4E7CC967.8010502@diamand.org>
In-Reply-To: <20110918012831.GB4619@arf.padd.com>
On 18/09/11 02:28, Pete Wyckoff wrote:
> One of the filetypes that p4 supports is utf16. Its behavior is
> odd in this case. The data delivered through "p4 -G print" is
> not encoded in utf16, although "p4 print -o" will produce the
> proper utf16-encoded file.
>
> When dealing with this filetype, discard the data from -G, and
> intstead read the contents directly.
"intstead" - should be "instead", or perhaps "int32_tstead".
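For anyone following along, the approach quoted above could be sketched roughly as below. This is only an illustration under assumptions: `fetch_contents` and the injected `read_pipe` callable are hypothetical names, not git-p4 functions; the patch itself calls `p4_read_pipe` inline inside P4Sync.

```python
def fetch_contents(file_info, contents, read_pipe):
    """Return the list of data chunks to store for one p4 file.

    file_info: dict with 'type' and 'depotFile' keys, as used in the patch.
    contents:  chunks already received through "p4 -G print".
    read_pipe: hypothetical callable that runs a p4 command string and
               returns its output (a stand-in for p4_read_pipe).
    """
    if file_info['type'].startswith("utf16"):
        # Discard the -G data and re-read the file via "print -q -o -",
        # which yields the properly utf16-encoded contents.
        text = read_pipe('print -q -o - "%s"' % file_info['depotFile'])
        return [text]
    # Every other filetype keeps the data delivered through -G.
    return contents
```

Passing the command runner in as a parameter is purely a choice for this sketch, so the logic can be exercised without a p4 server.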
>
> An alternate approach would be to try to encode the data in
> python. That worked for true utf16 files, but for other files
> marked as utf16, p4 delivers mangled text in no recognizable encoding.
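A rough sketch of what the "encode it in python" alternative might look like, for illustration only. It assumes the -G output of a true utf16 file arrives as UTF-8 bytes (the patch comment only says "ascii or utf8, perhaps") and that little-endian output with a BOM is wanted; `reencode_utf16` is a hypothetical helper, not part of git-p4.

```python
import codecs

def reencode_utf16(data):
    """Re-encode bytes from "p4 -G print" as UTF-16 LE with a BOM.

    Assumption: data is valid UTF-8 (plain ASCII is a subset of that).
    """
    text = data.decode("utf-8")
    # codecs.BOM_UTF16_LE is the two bytes \377\376 (0xFF 0xFE).
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

This only helps when the input really is decodable text. For files merely marked as utf16, where the -G data is in no recognizable encoding, there is nothing sensible to decode from, which is why the patch reads the contents with "print -o" rather than converting.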
>
> Add a test case to check utf16 handling, and +k and +ko handling.
>
> Reported-by: Chris Li <git@chrisli.org>
> Signed-off-by: Pete Wyckoff <pw@padd.com>
> ---
> contrib/fast-import/git-p4 | 11 +++++
> t/t9802-git-p4-filetype.sh | 107 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 118 insertions(+), 0 deletions(-)
> create mode 100755 t/t9802-git-p4-filetype.sh
>
> diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
> index 2f7b270..e69caf3 100755
> --- a/contrib/fast-import/git-p4
> +++ b/contrib/fast-import/git-p4
> @@ -1238,6 +1238,15 @@ class P4Sync(Command, P4UserMap):
>              data = ''.join(contents)
>              contents = [data[:-1]]
>
> +        if file['type'].startswith("utf16"):
> +            # p4 delivers different text in the python output to -G
> +            # than it does when using "print -o", or normal p4 client
> +            # operations.  utf16 is converted to ascii or utf8, perhaps.
> +            # But ascii text saved as -t utf16 is completely mangled.
> +            # Invoke print -o to get the real contents.
> +            text = p4_read_pipe('print -q -o - "%s"' % file['depotFile'])
> +            contents = [ text ]
> +
>          if self.isWindows and file["type"].endswith("text"):
>              mangled = []
>              for data in contents:
> @@ -1245,6 +1254,8 @@ class P4Sync(Command, P4UserMap):
>                  mangled.append(data)
>              contents = mangled
>
> +        # Note that we do not try to de-mangle keywords on utf16 files,
> +        # even though in theory somebody may want that.
>          if file['type'] in ('text+ko', 'unicode+ko', 'binary+ko'):
>              contents = map(lambda text: re.sub(r'(?i)\$(Id|Header):[^$]*\$',r'$\1$', text), contents)
>          elif file['type'] in ('text+k', 'ktext', 'kxtext', 'unicode+k', 'binary+k'):
> diff --git a/t/t9802-git-p4-filetype.sh b/t/t9802-git-p4-filetype.sh
> new file mode 100755
> index 0000000..f112eaa
> --- /dev/null
> +++ b/t/t9802-git-p4-filetype.sh
> @@ -0,0 +1,107 @@
> +#!/bin/sh
> +
> +test_description='git-p4 p4 filetype tests'
> +
> +. ./lib-git-p4.sh
> +
> +test_expect_success 'start p4d' '
> + kill_p4d || :&&
> + start_p4d&&
> + cd "$TRASH_DIRECTORY"
> +'
> +
> +test_expect_success 'utf-16 file create' '
> + cd "$cli"&&
> +
> + # p4 saves this verbatim
> + echo -e "three\nline\ntext"> f-ascii&&
> + p4 add -t text f-ascii&&
> +
> + # p4 adds \377\376 header
> + cp f-ascii f-ascii-as-utf16&&
> + p4 add -t utf16 f-ascii-as-utf16&&
> +
> + # p4 saves this exactly as iconv produced it
> + echo -e "three\nline\ntext" | iconv -f ascii -t utf-16> f-utf16&&
> + p4 add -t utf16 f-utf16&&
> +
> + # this also is unchanged
> + cp f-utf16 f-utf16-as-text&&
> + p4 add -t text f-utf16-as-text&&
> +
> + p4 submit -d "f files"&&
> +
> + # force update of client files
> + p4 sync -f&&
> + cd "$TRASH_DIRECTORY"
> +'
> +
> +test_expect_success 'utf-16 file test' '
> + test_when_finished cleanup_git&&
> + "$GITP4" clone --dest="$git" //depot@all&&
> + cd "$git"&&
> +
> + cmp "$cli/f-ascii" f-ascii&&
> + cmp "$cli/f-ascii-as-utf16" f-ascii-as-utf16&&
> + cmp "$cli/f-utf16" f-utf16&&
> + cmp "$cli/f-utf16-as-text" f-utf16-as-text
> +'
> +
> +test_expect_success 'keyword file create' '
> + cd "$cli"&&
> +
> + echo -e "id\n\$Id\$\n\$Author\$\ntext"> k-text-k&&
> + p4 add -t text+k k-text-k&&
> +
> + cp k-text-k k-text-ko&&
> + p4 add -t text+ko k-text-ko&&
> +
> + cat k-text-k | iconv -f ascii -t utf-16> k-utf16-k&&
> + p4 add -t utf16+k k-utf16-k&&
> +
> + cp k-utf16-k k-utf16-ko&&
> + p4 add -t utf16+ko k-utf16-ko&&
> +
> + p4 submit -d "k files"&&
> + p4 sync -f&&
> + cd "$TRASH_DIRECTORY"
> +'
> +
> +ko_smush() {
> + cat>smush.py<<-EOF&&
> + import re, sys
> + sys.stdout.write(re.sub(r'(?i)\\\$(Id|Header):[^$]*\\\$', r'$\1$', sys.stdin.read()))
> + EOF
> + python smush.py< "$1"
> +}
> +
> +k_smush() {
> + cat>smush.py<<-EOF&&
> + import re, sys
> + sys.stdout.write(re.sub(r'(?i)\\\$(Id|Header|Author|Date|DateTime|Change|File|Revision):[^$]*\\\$', r'$\1$', sys.stdin.read()))
> + EOF
> + python smush.py< "$1"
> +}
> +
> +test_expect_success 'keyword file test' '
> + test_when_finished cleanup_git&&
> + "$GITP4" clone --dest="$git" //depot@all&&
> + cd "$git"&&
> +
> + # text, ensure unexpanded
> + k_smush "$cli/k-text-k"> cli-k-text-k-smush&&
> + cmp cli-k-text-k-smush k-text-k&&
> + ko_smush "$cli/k-text-ko"> cli-k-text-ko-smush&&
> + cmp cli-k-text-ko-smush k-text-ko&&
> +
> + # utf16, even though p4 expands keywords, git-p4 does not
> + # try to undo that
> + cmp "$cli/k-utf16-k" k-utf16-k&&
> + cmp "$cli/k-utf16-ko" k-utf16-ko
> +'
> +
> +test_expect_success 'kill p4d' '
> + kill_p4d
> +'
> +
> +test_done
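To make the keyword flattening in the test helpers easier to follow, here is a small standalone sketch of the two substitutions. The patterns are the ones from ko_smush and k_smush in the quoted test script with the shell escaping removed; the module-level names `KO_PATTERN` and `K_PATTERN` and the Python 3 string handling are assumptions made for this illustration.

```python
import re

# +ko files: only $Id$ and $Header$ are expanded by p4.
KO_PATTERN = re.compile(r'(?i)\$(Id|Header):[^$]*\$')
# +k files: the full keyword set used by k_smush in the test.
K_PATTERN = re.compile(
    r'(?i)\$(Id|Header|Author|Date|DateTime|Change|File|Revision):[^$]*\$')

def ko_smush(text):
    """Collapse expanded $Id: ... $ / $Header: ... $ back to $Id$ / $Header$."""
    return KO_PATTERN.sub(r'$\1$', text)

def k_smush(text):
    """Collapse every expanded keyword in the +k set back to its bare form."""
    return K_PATTERN.sub(r'$\1$', text)
```

For example, given the text "$Id: //depot/k-text-k#1 $" followed by "$Author: pw $", k_smush flattens both keywords, while ko_smush flattens only the $Id$ one and leaves the expanded $Author$ untouched.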