git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Luke Diamand <luke@diamand.org>
To: Pete Wyckoff <pw@padd.com>
Cc: git@vger.kernel.org, Vitor Antunes <vitor.hda@gmail.com>,
	Chris Li <git@chrisli.org>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 2/5] git-p4: handle utf16 filetype properly
Date: Fri, 23 Sep 2011 19:01:11 +0100	[thread overview]
Message-ID: <4E7CC967.8010502@diamand.org> (raw)
In-Reply-To: <20110918012831.GB4619@arf.padd.com>

On 18/09/11 02:28, Pete Wyckoff wrote:
> One of the filetypes that p4 supports is utf16.  Its behavior is
> odd in this case.  The data delivered through "p4 -G print" is
> not encoded in utf16, although "p4 print -o" will produce the
> proper utf16-encoded file.
>
> When dealing with this filetype, discard the data from -G, and
> intstead read the contents directly.

"intstead" - should be "instead", or perhaps "int32_tstead".



>
> An alternate approach would be to try to encode the data in
> python.  That worked for true utf16 files, but for other files
> marked as utf16, p4 delivers mangled text in no recognizable encoding.
>
> Add a test case to check utf16 handling, and +k and +ko handling.
>
> Reported-by: Chris Li<git@chrisli.org>
> Signed-off-by: Pete Wyckoff<pw@padd.com>
> ---
>   contrib/fast-import/git-p4 |   11 +++++
>   t/t9802-git-p4-filetype.sh |  107 ++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 118 insertions(+), 0 deletions(-)
>   create mode 100755 t/t9802-git-p4-filetype.sh
>
> diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
> index 2f7b270..e69caf3 100755
> --- a/contrib/fast-import/git-p4
> +++ b/contrib/fast-import/git-p4
> @@ -1238,6 +1238,15 @@ class P4Sync(Command, P4UserMap):
>               data = ''.join(contents)
>               contents = [data[:-1]]
>
> +        if file['type'].startswith("utf16"):
> +            # p4 delivers different text in the python output to -G
> +            # than it does when using "print -o", or normal p4 client
> +            # operations.  utf16 is converted to ascii or utf8, perhaps.
> +            # But ascii text saved as -t utf16 is completely mangled.
> +            # Invoke print -o to get the real contents.
> +            text = p4_read_pipe('print -q -o - "%s"' % file['depotFile'])
> +            contents = [ text ]
> +
>           if self.isWindows and file["type"].endswith("text"):
>               mangled = []
>               for data in contents:
> @@ -1245,6 +1254,8 @@ class P4Sync(Command, P4UserMap):
>                   mangled.append(data)
>               contents = mangled
>
> +        # Note that we do not try to de-mangle keywords on utf16 files,
> +        # even though in theory somebody may want that.
>           if file['type'] in ('text+ko', 'unicode+ko', 'binary+ko'):
>               contents = map(lambda text: re.sub(r'(?i)\$(Id|Header):[^$]*\$',r'$\1$', text), contents)
>           elif file['type'] in ('text+k', 'ktext', 'kxtext', 'unicode+k', 'binary+k'):
> diff --git a/t/t9802-git-p4-filetype.sh b/t/t9802-git-p4-filetype.sh
> new file mode 100755
> index 0000000..f112eaa
> --- /dev/null
> +++ b/t/t9802-git-p4-filetype.sh
> @@ -0,0 +1,107 @@
> +#!/bin/sh
> +
> +test_description='git-p4 p4 filetype tests'
> +
> +. ./lib-git-p4.sh
> +
> +test_expect_success 'start p4d' '
> +	kill_p4d || :&&
> +	start_p4d&&
> +	cd "$TRASH_DIRECTORY"
> +'
> +
> +test_expect_success 'utf-16 file create' '
> +	cd "$cli"&&
> +
> +	# p4 saves this verbatim
> +	echo -e "three\nline\ntext">  f-ascii&&
> +	p4 add -t text f-ascii&&
> +
> +	# p4 adds \377\376 header
> +	cp f-ascii f-ascii-as-utf16&&
> +	p4 add -t utf16 f-ascii-as-utf16&&
> +
> +	# p4 saves this exactly as iconv produced it
> +	echo -e "three\nline\ntext" | iconv -f ascii -t utf-16>  f-utf16&&
> +	p4 add -t utf16 f-utf16&&
> +
> +	# this also is unchanged
> +	cp f-utf16 f-utf16-as-text&&
> +	p4 add -t text f-utf16-as-text&&
> +
> +	p4 submit -d "f files"&&
> +
> +	# force update of client files
> +	p4 sync -f&&
> +	cd "$TRASH_DIRECTORY"
> +'
> +
> +test_expect_success 'utf-16 file test' '
> +	test_when_finished cleanup_git&&
> +	"$GITP4" clone --dest="$git" //depot@all&&
> +	cd "$git"&&
> +
> +	cmp "$cli/f-ascii" f-ascii&&
> +	cmp "$cli/f-ascii-as-utf16" f-ascii-as-utf16&&
> +	cmp "$cli/f-utf16" f-utf16&&
> +	cmp "$cli/f-utf16-as-text" f-utf16-as-text
> +'
> +
> +test_expect_success 'keyword file create' '
> +	cd "$cli"&&
> +
> +	echo -e "id\n\$Id\$\n\$Author\$\ntext">  k-text-k&&
> +	p4 add -t text+k k-text-k&&
> +
> +	cp k-text-k k-text-ko&&
> +	p4 add -t text+ko k-text-ko&&
> +
> +	cat k-text-k | iconv -f ascii -t utf-16>  k-utf16-k&&
> +	p4 add -t utf16+k k-utf16-k&&
> +
> +	cp k-utf16-k k-utf16-ko&&
> +	p4 add -t utf16+ko k-utf16-ko&&
> +
> +	p4 submit -d "k files"&&
> +	p4 sync -f&&
> +	cd "$TRASH_DIRECTORY"
> +'
> +
> +ko_smush() {
> +	cat>smush.py<<-EOF&&
> +	import re, sys
> +	sys.stdout.write(re.sub(r'(?i)\\\$(Id|Header):[^$]*\\\$', r'$\1$', sys.stdin.read()))
> +	EOF
> +	python smush.py<  "$1"
> +}
> +
> +k_smush() {
> +	cat>smush.py<<-EOF&&
> +	import re, sys
> +	sys.stdout.write(re.sub(r'(?i)\\\$(Id|Header|Author|Date|DateTime|Change|File|Revision):[^$]*\\\$', r'$\1$', sys.stdin.read()))
> +	EOF
> +	python smush.py<  "$1"
> +}
> +
> +test_expect_success 'keyword file test' '
> +	test_when_finished cleanup_git&&
> +	"$GITP4" clone --dest="$git" //depot@all&&
> +	cd "$git"&&
> +
> +	# text, ensure unexpanded
> +	k_smush "$cli/k-text-k">  cli-k-text-k-smush&&
> +	cmp cli-k-text-k-smush k-text-k&&
> +	ko_smush "$cli/k-text-ko">  cli-k-text-ko-smush&&
> +	cmp cli-k-text-ko-smush k-text-ko&&
> +
> +	# utf16, even though p4 expands keywords, git-p4 does not
> +	# try to undo that
> +	cmp "$cli/k-utf16-k" k-utf16-k&&
> +	cmp "$cli/k-utf16-ko" k-utf16-ko
> +'
> +
> +test_expect_success 'kill p4d' '
> +	kill_p4d
> +'
> +
> +test_done

  reply	other threads:[~2011-09-23 18:01 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-18  1:26 [PATCH 0/5] git-p4 filetype handling Pete Wyckoff
2011-09-18  1:27 ` [PATCH 1/5] git-p4 tests: refactor, split out common functions Pete Wyckoff
2011-09-18 21:48   ` Junio C Hamano
2011-09-21  1:29     ` Pete Wyckoff
2011-09-21  2:34       ` Junio C Hamano
2011-09-21  2:35         ` Junio C Hamano
2011-09-18  1:28 ` [PATCH 2/5] git-p4: handle utf16 filetype properly Pete Wyckoff
2011-09-23 18:01   ` Luke Diamand [this message]
2011-09-18  1:29 ` Pete Wyckoff
2011-09-18  1:31 ` [PATCH 3/5] git-p4: recognize all p4 filetypes Pete Wyckoff
2011-09-18  1:32 ` [PATCH 4/5] git-p4: stop ignoring apple filetype Pete Wyckoff
2011-09-18  1:33 ` [PATCH 5/5] git-p4: keyword flattening fixes Pete Wyckoff
2011-09-23 17:56 ` [PATCH 0/5] git-p4 filetype handling Luke Diamand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E7CC967.8010502@diamand.org \
    --to=luke@diamand.org \
    --cc=git@chrisli.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pw@padd.com \
    --cc=vitor.hda@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).