git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] git-p4: import utf16 file properly
@ 2011-09-13 21:33 Chris Li
  2011-09-14  7:55 ` Luke Diamand
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Li @ 2011-09-13 21:33 UTC (permalink / raw)
  To: git; +Cc: Pete Wyckoff, Junio C Hamano

The current git-p4 does not handle utf16 files properly.
The "p4 print" command, when output to stdout, converts the
utf16 file into utf8. That effectively imported the utf16 file
as utf8 for git. In other words, git-p4 import a different
file compare to file check out by perforce. This breakes my
windows build in the company project.

The fix is simple, just ask perforce to print the depot
file into a real file. This way perforce will not performe
the utf16 to utf8 conversion. Git can import the exact same
file as perforce checkout.
---
 contrib/fast-import/git-p4 |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 6b9de9e..5fb1ac7 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -1239,6 +1239,11 @@ class P4Sync(Command, P4UserMap):
             contents = map(lambda text:
re.sub(r'(?i)\$(Id|Header):[^$]*\$',r'$\1$', text), contents)
         elif file['type'] in ('text+k', 'ktext', 'kxtext',
'unicode+k', 'binary+k'):
             contents = map(lambda text:
re.sub(r'\$(Id|Header|Author|Date|DateTime|Change|File|Revision):[^$\n]*\$',r'$\1$',
text), contents)
+        elif file['type'] == 'utf16':
+             tmpFile = tempfile.NamedTemporaryFile()
+             p4CmdList("print -o %s %s"%(tmpFile.name, file['depotFile']))
+             contents = [ open(tmpFile.name).read() ]
+             tmpFile.close()

         self.gitStream.write("M %s inline %s\n" % (mode, relPath))

-- 
1.7.6

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4: import utf16 file properly
  2011-09-13 21:33 [PATCH] git-p4: import utf16 file properly Chris Li
@ 2011-09-14  7:55 ` Luke Diamand
  2011-09-14 18:29   ` Chris Li
  0 siblings, 1 reply; 6+ messages in thread
From: Luke Diamand @ 2011-09-14  7:55 UTC (permalink / raw)
  To: Chris Li; +Cc: git, Pete Wyckoff, Junio C Hamano

On 13/09/11 22:33, Chris Li wrote:
> The current git-p4 does not handle utf16 files properly.
> The "p4 print" command, when output to stdout, converts the
> utf16 file into utf8. That effectively imported the utf16 file
> as utf8 for git. In other words, git-p4 import a different
> file compare to file check out by perforce. This breakes my
> windows build in the company project.
>
> The fix is simple, just ask perforce to print the depot
> file into a real file. This way perforce will not performe
> the utf16 to utf8 conversion. Git can import the exact same
> file as perforce checkout.

Does this change do the right thing with RCS keywords in UTF16 files?

If p4CmdList() fails, e.g. due to running out of diskspace, will this 
just happily import a truncated/corrupt file?

(And I could be wrong about this, but does you patch have newline 
damage? It didn't seem to apply for me).

Regards!
Luke

> ---
>   contrib/fast-import/git-p4 |    5 +++++
>   1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
> index 6b9de9e..5fb1ac7 100755
> --- a/contrib/fast-import/git-p4
> +++ b/contrib/fast-import/git-p4
> @@ -1239,6 +1239,11 @@ class P4Sync(Command, P4UserMap):
>               contents = map(lambda text:
> re.sub(r'(?i)\$(Id|Header):[^$]*\$',r'$\1$', text), contents)
>           elif file['type'] in ('text+k', 'ktext', 'kxtext',
> 'unicode+k', 'binary+k'):
>               contents = map(lambda text:
> re.sub(r'\$(Id|Header|Author|Date|DateTime|Change|File|Revision):[^$\n]*\$',r'$\1$',
> text), contents)
> +        elif file['type'] == 'utf16':
> +             tmpFile = tempfile.NamedTemporaryFile()
> +             p4CmdList("print -o %s %s"%(tmpFile.name, file['depotFile']))
> +             contents = [ open(tmpFile.name).read() ]
> +             tmpFile.close()
>
>           self.gitStream.write("M %s inline %s\n" % (mode, relPath))
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4: import utf16 file properly
  2011-09-14  7:55 ` Luke Diamand
@ 2011-09-14 18:29   ` Chris Li
  2011-09-14 18:39     ` Chris Li
  2011-09-14 18:56     ` Luke Diamand
  0 siblings, 2 replies; 6+ messages in thread
From: Chris Li @ 2011-09-14 18:29 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git, Pete Wyckoff, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1175 bytes --]

On Wed, Sep 14, 2011 at 12:55 AM, Luke Diamand <luke@diamand.org> wrote:
> On 13/09/11 22:33, Chris Li wrote:
>> The fix is simple, just ask perforce to print the depot
>> file into a real file. This way perforce will not performe
>> the utf16 to utf8 conversion. Git can import the exact same
>> file as perforce checkout.
>
> Does this change do the right thing with RCS keywords in UTF16 files?

I don't know what is the rules about the RCS keyword in UTF16 files.
I look at the current git-p4, it does not do any keyword replacement in
utf16 files. So this patch did not change that. It should be a separate issue.

The way I see it, this patch is a straight enhancement compare to the
current git-p4 because the current git-p4 *corrupts* the utf16 files.

>
> If p4CmdList() fails, e.g. due to running out of diskspace, will this just
> happily import a truncated/corrupt file?

Good point. I add the error check and attach the new patch.

> (And I could be wrong about this, but does you patch have newline damage? It
> didn't seem to apply for me).

Gmail dmage the white space. I should always use the attachment.
Does the attached patch work for you?

Thanks

Chris

[-- Attachment #2: 0001-git-p4-import-utf16-file-properly.patch --]
[-- Type: text/x-patch, Size: 1846 bytes --]

From 06de9cfdcd89e8bfb6575f40d36fdfcefe1a1985 Mon Sep 17 00:00:00 2001
From: Chris Li <git@chrisli.org>
Date: Tue, 13 Sep 2011 13:57:31 -0700
Subject: [PATCH] git-p4: import utf16 file properly

The current git-p4 does not handle utf16 files properly.
The "p4 print" command, when output to stdout, convert the
utf16 file into utf8. That effectively imported the utf16 file
as utf8 for git. In other words, git-p4 import a different
file compare to file check out by perforce. This breaks my
windows build in the company project.

The fix is simple, just ask perforce to print the depot
file into a real file. This way perforce will not perform
the utf16 to utf8 conversion. Git can import the exact same
file as perforce checkout.
---
 contrib/fast-import/git-p4 |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/contrib/fast-import/git-p4 b/contrib/fast-import/git-p4
index 6b9de9e..c111cad 100755
--- a/contrib/fast-import/git-p4
+++ b/contrib/fast-import/git-p4
@@ -1239,6 +1239,14 @@ class P4Sync(Command, P4UserMap):
             contents = map(lambda text: re.sub(r'(?i)\$(Id|Header):[^$]*\$',r'$\1$', text), contents)
         elif file['type'] in ('text+k', 'ktext', 'kxtext', 'unicode+k', 'binary+k'):
             contents = map(lambda text: re.sub(r'\$(Id|Header|Author|Date|DateTime|Change|File|Revision):[^$\n]*\$',r'$\1$', text), contents)
+        elif file['type'] == 'utf16':
+             tmpFile = tempfile.NamedTemporaryFile()
+             cmd = "print -o %s %s"%(tmpFile.name, file['depotFile'])
+             result = p4Cmd(cmd)
+             if "p4ExitCode" in result:
+                 die("Problems executing p4 %s"%cmd)
+             contents = [ open(tmpFile.name).read() ]
+             tmpFile.close()
 
         self.gitStream.write("M %s inline %s\n" % (mode, relPath))
 
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4: import utf16 file properly
  2011-09-14 18:29   ` Chris Li
@ 2011-09-14 18:39     ` Chris Li
  2011-09-18  1:19       ` Pete Wyckoff
  2011-09-14 18:56     ` Luke Diamand
  1 sibling, 1 reply; 6+ messages in thread
From: Chris Li @ 2011-09-14 18:39 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git, Pete Wyckoff, Junio C Hamano

On Wed, Sep 14, 2011 at 11:29 AM, Chris Li <git@chrisli.org> wrote:
>> Does this change do the right thing with RCS keywords in UTF16 files?
>
> I don't know what is the rules about the RCS keyword in UTF16 files.

I did a little bit research and found this:

http://www.perforce.com/perforce/doc.current/manuals/p4guide/ab_filetypes.html

RCS keyword expand should only happen for "+k" or "+ko" modifiers.
There for, "utf16" files without modifier should not be converted.
In that regard, the patch is correct. But both the original and patched version
did not handle "utf16+k" type of files. I still consider it as a separate issue.

Chris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4: import utf16 file properly
  2011-09-14 18:29   ` Chris Li
  2011-09-14 18:39     ` Chris Li
@ 2011-09-14 18:56     ` Luke Diamand
  1 sibling, 0 replies; 6+ messages in thread
From: Luke Diamand @ 2011-09-14 18:56 UTC (permalink / raw)
  To: Chris Li; +Cc: git, Pete Wyckoff, Junio C Hamano

On 14/09/11 19:29, Chris Li wrote:
> On Wed, Sep 14, 2011 at 12:55 AM, Luke Diamand<luke@diamand.org>  wrote:
>> On 13/09/11 22:33, Chris Li wrote:
>>> The fix is simple, just ask perforce to print the depot
>>> file into a real file. This way perforce will not performe
>>> the utf16 to utf8 conversion. Git can import the exact same
>>> file as perforce checkout.
>>
>> Does this change do the right thing with RCS keywords in UTF16 files?
>
> I don't know what is the rules about the RCS keyword in UTF16 files.
> I look at the current git-p4, it does not do any keyword replacement in
> utf16 files. So this patch did not change that. It should be a separate issue.
>
> The way I see it, this patch is a straight enhancement compare to the
> current git-p4 because the current git-p4 *corrupts* the utf16 files.
>
>>
>> If p4CmdList() fails, e.g. due to running out of diskspace, will this just
>> happily import a truncated/corrupt file?
>
> Good point. I add the error check and attach the new patch.
>
>> (And I could be wrong about this, but does you patch have newline damage? It
>> didn't seem to apply for me).

Looks good to me. I think you're right about the RCS keywords not being 
relevant here.



>
> Gmail dmage the white space. I should always use the attachment.
> Does the attached patch work for you?
>
> Thanks
>
> Chris

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] git-p4: import utf16 file properly
  2011-09-14 18:39     ` Chris Li
@ 2011-09-18  1:19       ` Pete Wyckoff
  0 siblings, 0 replies; 6+ messages in thread
From: Pete Wyckoff @ 2011-09-18  1:19 UTC (permalink / raw)
  To: Chris Li; +Cc: Luke Diamand, git

git@chrisli.org wrote on Wed, 14 Sep 2011 11:39 -0700:
> On Wed, Sep 14, 2011 at 11:29 AM, Chris Li <git@chrisli.org> wrote:
> >> Does this change do the right thing with RCS keywords in UTF16 files?
> >
> > I don't know what is the rules about the RCS keyword in UTF16 files.
> 
> I did a little bit research and found this:
> 
> http://www.perforce.com/perforce/doc.current/manuals/p4guide/ab_filetypes.html
> 
> RCS keyword expand should only happen for "+k" or "+ko" modifiers.
> There for, "utf16" files without modifier should not be converted.
> In that regard, the patch is correct. But both the original and patched version
> did not handle "utf16+k" type of files. I still consider it as a separate issue.

Your patch looks good and this all makes sense.  I redid
it, adding a test case, and some more patches to clean up
some of the filetype detection code.  I'll send it out for
review soon here.

Luke:  thanks for the comments; they prompted me to think
about keywords and beyond.

		-- Pete

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-09-18  1:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-13 21:33 [PATCH] git-p4: import utf16 file properly Chris Li
2011-09-14  7:55 ` Luke Diamand
2011-09-14 18:29   ` Chris Li
2011-09-14 18:39     ` Chris Li
2011-09-18  1:19       ` Pete Wyckoff
2011-09-14 18:56     ` Luke Diamand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).