git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] remote-hg: unquote C-style paths when exporting
@ 2013-10-18 17:03 Antoine Pelisse
  2013-10-22 19:13 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-18 17:03 UTC (permalink / raw)
  To: git; +Cc: Felipe Contreras, Antoine Pelisse

git-fast-import documentation says that paths can be C-style quoted.
Unfortunately, the current remote-hg helper doesn't unquote quoted
path and pass them as-is to Mercurial when the commit is created.

This result in the following situation:

- clone a mercurial repository with git
- Add a file with space: `mkdir dir/foo\ bar`
- Commit that new file, and push the change to mercurial
- The mercurial repository as now a new directory named '"dir', which
contains a file named 'foo bar"'

Use python ast.literal_eval to unquote the string if it starts with ".
It has been tested with quotes, spaces, and utf-8 encoded file-names.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
---
 contrib/remote-helpers/git-remote-hg | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
index 92d994e..0141949 100755
--- a/contrib/remote-helpers/git-remote-hg
+++ b/contrib/remote-helpers/git-remote-hg
@@ -14,6 +14,7 @@
 
 from mercurial import hg, ui, bookmarks, context, encoding, node, error, extensions, discovery, util
 
+import ast
 import re
 import sys
 import os
@@ -742,6 +743,8 @@ def parse_commit(parser):
             f = { 'deleted' : True }
         else:
             die('Unknown file command: %s' % line)
+        if path.startswith('"'):
+            path = ast.literal_eval(path)
         files[path] = f
 
     # only export the commits if we are on an internal proxy repo
-- 
1.8.4.1.507.g9768648.dirty

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] remote-hg: unquote C-style paths when exporting
  2013-10-18 17:03 [PATCH] remote-hg: unquote C-style paths when exporting Antoine Pelisse
@ 2013-10-22 19:13 ` Junio C Hamano
  2013-10-22 20:49   ` Antoine Pelisse
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2013-10-22 19:13 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: git, Felipe Contreras

Antoine Pelisse <apelisse@gmail.com> writes:

> git-fast-import documentation says that paths can be C-style quoted.
> Unfortunately, the current remote-hg helper doesn't unquote quoted
> path and pass them as-is to Mercurial when the commit is created.
>
> This result in the following situation:
>
> - clone a mercurial repository with git
> - Add a file with space: `mkdir dir/foo\ bar`
> - Commit that new file, and push the change to mercurial
> - The mercurial repository as now a new directory named '"dir', which
> contains a file named 'foo bar"'
>
> Use python ast.literal_eval to unquote the string if it starts with ".
> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>
> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
> ---

A path you read in fast-import input indeed needs to be unquoted
when it begins with a dq, and I _think_ by using ast.literal_eval(),
you probably can correctly unquote any valid C-quoted string.

But it bothers me somewhat that what the patch does seems to be
overly broad.  Doesn't ast.literal_eval() take a lot more than just
strings?

    ast.literal_eval(node_or_string)

        Safely evaluate an expression node or a Unicode or Latin-1
        encoded string containing a Python expression. The string or
        node provided may only consist of the following Python literal
        structures: strings, numbers, tuples, lists, dicts, booleans,
        and None.

Also doesn't Python's double-quoted string have a lot more magic
than C-quoted string, e.g.

	$ python -i
        >>> import ast
        >>> not_cq_path = '"abc" "def"'
        >>> ast.literal_eval(not_cq_path)
	'abcdef'

>  contrib/remote-helpers/git-remote-hg | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
> index 92d994e..0141949 100755
> --- a/contrib/remote-helpers/git-remote-hg
> +++ b/contrib/remote-helpers/git-remote-hg
> @@ -14,6 +14,7 @@
>  
>  from mercurial import hg, ui, bookmarks, context, encoding, node, error, extensions, discovery, util
>  
> +import ast
>  import re
>  import sys
>  import os
> @@ -742,6 +743,8 @@ def parse_commit(parser):
>              f = { 'deleted' : True }
>          else:
>              die('Unknown file command: %s' % line)
> +        if path.startswith('"'):
> +            path = ast.literal_eval(path)
>          files[path] = f
>  
>      # only export the commits if we are on an internal proxy repo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] remote-hg: unquote C-style paths when exporting
  2013-10-22 19:13 ` Junio C Hamano
@ 2013-10-22 20:49   ` Antoine Pelisse
  2013-10-23  0:45     ` Felipe Contreras
  0 siblings, 1 reply; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-22 20:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Felipe Contreras

On Tue, Oct 22, 2013 at 9:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Antoine Pelisse <apelisse@gmail.com> writes:
>
>> git-fast-import documentation says that paths can be C-style quoted.
>> Unfortunately, the current remote-hg helper doesn't unquote quoted
>> path and pass them as-is to Mercurial when the commit is created.
>>
>> This result in the following situation:
>>
>> - clone a mercurial repository with git
>> - Add a file with space: `mkdir dir/foo\ bar`

Note to myself, mkdir doesn't create a "file"

>> - Commit that new file, and push the change to mercurial
>> - The mercurial repository as now a new directory named '"dir', which
>> contains a file named 'foo bar"'
>>
>> Use python ast.literal_eval to unquote the string if it starts with ".
>> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>>
>> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
>> ---
>
> A path you read in fast-import input indeed needs to be unquoted
> when it begins with a dq, and I _think_ by using ast.literal_eval(),
> you probably can correctly unquote any valid C-quoted string.
>
> But it bothers me somewhat that what the patch does seems to be
> overly broad.  Doesn't ast.literal_eval() take a lot more than just
> strings?

Good point

>     ast.literal_eval(node_or_string)
>
>         Safely evaluate an expression node or a Unicode or Latin-1
>         encoded string containing a Python expression. The string or
>         node provided may only consist of the following Python literal
>         structures: strings, numbers, tuples, lists, dicts, booleans,
>         and None.

Fortunately, I don't believe any of the other type can start with a
dq. So currently, I don't believe we can end-up with anything else but
a string. We could certainly check that this is always true though.

> Also doesn't Python's double-quoted string have a lot more magic
> than C-quoted string, e.g.
>
>         $ python -i
>         >>> import ast
>         >>> not_cq_path = '"abc" "def"'
>         >>> ast.literal_eval(not_cq_path)
>         'abcdef'

It is true that I have expected "valid output" from git-fast-export.
And I don't have in mind any easy solution to detect that the output
is broken, yet still accepted as a valid string by python. We could
obviously write a unquote_c_style() equivalent in python if needed.

Thanks,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] remote-hg: unquote C-style paths when exporting
  2013-10-22 20:49   ` Antoine Pelisse
@ 2013-10-23  0:45     ` Felipe Contreras
  2013-10-23  8:38       ` Antoine Pelisse
  0 siblings, 1 reply; 7+ messages in thread
From: Felipe Contreras @ 2013-10-23  0:45 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: Junio C Hamano, git

On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse <apelisse@gmail.com> wrote:

> It is true that I have expected "valid output" from git-fast-export.
> And I don't have in mind any easy solution to detect that the output
> is broken, yet still accepted as a valid string by python. We could
> obviously write a unquote_c_style() equivalent in python if needed.

Something like this?

def c_style_unescape(string):
    if string[0] == string[-1] == '"':
        return string.decode('string-escape')[1:-1]
    return string

It's in git-remote-bzr.py.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] remote-hg: unquote C-style paths when exporting
  2013-10-23  0:45     ` Felipe Contreras
@ 2013-10-23  8:38       ` Antoine Pelisse
  2013-10-23 15:44         ` Re* " Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-23  8:38 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: Junio C Hamano, git

On Wed, Oct 23, 2013 at 2:45 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse <apelisse@gmail.com> wrote:
>
>> It is true that I have expected "valid output" from git-fast-export.
>> And I don't have in mind any easy solution to detect that the output
>> is broken, yet still accepted as a valid string by python. We could
>> obviously write a unquote_c_style() equivalent in python if needed.
>
> Something like this?
>
> def c_style_unescape(string):
>     if string[0] == string[-1] == '"':
>         return string.decode('string-escape')[1:-1]
>     return string
>
> It's in git-remote-bzr.py.

Yeah, that's certainly better,

Thanks,

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re* [PATCH] remote-hg: unquote C-style paths when exporting
  2013-10-23  8:38       ` Antoine Pelisse
@ 2013-10-23 15:44         ` Junio C Hamano
  2013-10-23 15:53           ` Antoine Pelisse
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2013-10-23 15:44 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: Felipe Contreras, git

Antoine Pelisse <apelisse@gmail.com> writes:

>> def c_style_unescape(string):
>>     if string[0] == string[-1] == '"':
>>         return string.decode('string-escape')[1:-1]
>>     return string
>>
>> It's in git-remote-bzr.py.
>
> Yeah, that's certainly better,
>
> Thanks,

OK, so an amended one will look like this?

-- >8 --
From: Antoine Pelisse <apelisse@gmail.com>
Subject: remote-hg: unquote C-style paths when exporting

git-fast-import documentation says that paths can be C-style quoted.
Unfortunately, the current remote-hg helper doesn't unquote quoted
path and pass them as-is to Mercurial when the commit is created.

This result in the following situation:

- clone a mercurial repository with git
- Add a file with space: `mkdir dir/foo\ bar`
- Commit that new file, and push the change to mercurial
- The mercurial repository as now a new directory named '"dir', which
contains a file named 'foo bar"'

Use python ast.literal_eval to unquote the string if it starts with ".
It has been tested with quotes, spaces, and utf-8 encoded file-names.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 contrib/remote-helpers/git-remote-hg | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
index 0194c67..85abbed 100755
--- a/contrib/remote-helpers/git-remote-hg
+++ b/contrib/remote-helpers/git-remote-hg
@@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files):
             f = { 'ctx' : repo[p1][e] }
             files[e] = f
 
+def c_style_unescape(string):
+    if string[0] == string[-1] == '"':
+        return string.decode('string-escape')[1:-1]
+    return string
+
 def parse_commit(parser):
     global marks, blob_marks, parsed_refs
     global mode
@@ -720,6 +725,7 @@ def parse_commit(parser):
             f = { 'deleted' : True }
         else:
             die('Unknown file command: %s' % line)
+        path = c_style_unescape(path).decode('utf-8')
         files[path] = f
 
     # only export the commits if we are on an internal proxy repo

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Re* [PATCH] remote-hg: unquote C-style paths when exporting
  2013-10-23 15:44         ` Re* " Junio C Hamano
@ 2013-10-23 15:53           ` Antoine Pelisse
  0 siblings, 0 replies; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-23 15:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Felipe Contreras, git

On Wed, Oct 23, 2013 at 5:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Antoine Pelisse <apelisse@gmail.com> writes:
>
>>> def c_style_unescape(string):
>>>     if string[0] == string[-1] == '"':
>>>         return string.decode('string-escape')[1:-1]
>>>     return string
>>>
>>> It's in git-remote-bzr.py.
>>
>> Yeah, that's certainly better,
>>
>> Thanks,
>
> OK, so an amended one will look like this?

The commit message needs to be updated as well.

> -- >8 --
> From: Antoine Pelisse <apelisse@gmail.com>
> Subject: remote-hg: unquote C-style paths when exporting
>
> git-fast-import documentation says that paths can be C-style quoted.
> Unfortunately, the current remote-hg helper doesn't unquote quoted
> path and pass them as-is to Mercurial when the commit is created.
>
> This result in the following situation:

s/result/&s/

> - clone a mercurial repository with git
> - Add a file with space: `mkdir dir/foo\ bar`

- Add a file with space in a directory: `>dir/foo\ bar`

> - Commit that new file, and push the change to mercurial
> - The mercurial repository as now a new directory named '"dir', which
> contains a file named 'foo bar"'

I'm so ashamed I'd rather not report this one: s/as/has/

> Use python ast.literal_eval to unquote the string if it starts with ".

Use python str.decode('string-escape') to unquote the string if it
starts and ends with ".

> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>
> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  contrib/remote-helpers/git-remote-hg | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
> index 0194c67..85abbed 100755
> --- a/contrib/remote-helpers/git-remote-hg
> +++ b/contrib/remote-helpers/git-remote-hg
> @@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files):
>              f = { 'ctx' : repo[p1][e] }
>              files[e] = f
>
> +def c_style_unescape(string):
> +    if string[0] == string[-1] == '"':
> +        return string.decode('string-escape')[1:-1]
> +    return string
> +
>  def parse_commit(parser):
>      global marks, blob_marks, parsed_refs
>      global mode
> @@ -720,6 +725,7 @@ def parse_commit(parser):
>              f = { 'deleted' : True }
>          else:
>              die('Unknown file command: %s' % line)
> +        path = c_style_unescape(path).decode('utf-8')
>          files[path] = f
>
>      # only export the commits if we are on an internal proxy repo

That is consistent with git-remote-bzr,

Thanks

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-10-23 15:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-18 17:03 [PATCH] remote-hg: unquote C-style paths when exporting Antoine Pelisse
2013-10-22 19:13 ` Junio C Hamano
2013-10-22 20:49   ` Antoine Pelisse
2013-10-23  0:45     ` Felipe Contreras
2013-10-23  8:38       ` Antoine Pelisse
2013-10-23 15:44         ` Re* " Junio C Hamano
2013-10-23 15:53           ` Antoine Pelisse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).