* [PATCH] remote-hg: unquote C-style paths when exporting
@ 2013-10-18 17:03 Antoine Pelisse
2013-10-22 19:13 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-18 17:03 UTC (permalink / raw)
To: git; +Cc: Felipe Contreras, Antoine Pelisse
git-fast-import documentation says that paths can be C-style quoted.
Unfortunately, the current remote-hg helper doesn't unquote quoted
path and pass them as-is to Mercurial when the commit is created.
This result in the following situation:
- clone a mercurial repository with git
- Add a file with space: `mkdir dir/foo\ bar`
- Commit that new file, and push the change to mercurial
- The mercurial repository as now a new directory named '"dir', which
contains a file named 'foo bar"'
Use python ast.literal_eval to unquote the string if it starts with ".
It has been tested with quotes, spaces, and utf-8 encoded file-names.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
---
contrib/remote-helpers/git-remote-hg | 3 +++
1 file changed, 3 insertions(+)
diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
index 92d994e..0141949 100755
--- a/contrib/remote-helpers/git-remote-hg
+++ b/contrib/remote-helpers/git-remote-hg
@@ -14,6 +14,7 @@
from mercurial import hg, ui, bookmarks, context, encoding, node, error, extensions, discovery, util
+import ast
import re
import sys
import os
@@ -742,6 +743,8 @@ def parse_commit(parser):
f = { 'deleted' : True }
else:
die('Unknown file command: %s' % line)
+ if path.startswith('"'):
+ path = ast.literal_eval(path)
files[path] = f
# only export the commits if we are on an internal proxy repo
--
1.8.4.1.507.g9768648.dirty
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] remote-hg: unquote C-style paths when exporting
2013-10-18 17:03 [PATCH] remote-hg: unquote C-style paths when exporting Antoine Pelisse
@ 2013-10-22 19:13 ` Junio C Hamano
2013-10-22 20:49 ` Antoine Pelisse
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2013-10-22 19:13 UTC (permalink / raw)
To: Antoine Pelisse; +Cc: git, Felipe Contreras
Antoine Pelisse <apelisse@gmail.com> writes:
> git-fast-import documentation says that paths can be C-style quoted.
> Unfortunately, the current remote-hg helper doesn't unquote quoted
> path and pass them as-is to Mercurial when the commit is created.
>
> This result in the following situation:
>
> - clone a mercurial repository with git
> - Add a file with space: `mkdir dir/foo\ bar`
> - Commit that new file, and push the change to mercurial
> - The mercurial repository as now a new directory named '"dir', which
> contains a file named 'foo bar"'
>
> Use python ast.literal_eval to unquote the string if it starts with ".
> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>
> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
> ---
A path you read in fast-import input indeed needs to be unquoted
when it begins with a dq, and I _think_ by using ast.literal_eval(),
you probably can correctly unquote any valid C-quoted string.
But it bothers me somewhat that what the patch does seems to be
overly broad. Doesn't ast.literal_eval() take a lot more than just
strings?
ast.literal_eval(node_or_string)
Safely evaluate an expression node or a Unicode or Latin-1
encoded string containing a Python expression. The string or
node provided may only consist of the following Python literal
structures: strings, numbers, tuples, lists, dicts, booleans,
and None.
Also doesn't Python's double-quoted string have a lot more magic
than C-quoted string, e.g.
$ python -i
>>> import ast
>>> not_cq_path = '"abc" "def"'
>>> ast.literal_eval(not_cq_path)
'abcdef'
> contrib/remote-helpers/git-remote-hg | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
> index 92d994e..0141949 100755
> --- a/contrib/remote-helpers/git-remote-hg
> +++ b/contrib/remote-helpers/git-remote-hg
> @@ -14,6 +14,7 @@
>
> from mercurial import hg, ui, bookmarks, context, encoding, node, error, extensions, discovery, util
>
> +import ast
> import re
> import sys
> import os
> @@ -742,6 +743,8 @@ def parse_commit(parser):
> f = { 'deleted' : True }
> else:
> die('Unknown file command: %s' % line)
> + if path.startswith('"'):
> + path = ast.literal_eval(path)
> files[path] = f
>
> # only export the commits if we are on an internal proxy repo
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] remote-hg: unquote C-style paths when exporting
2013-10-22 19:13 ` Junio C Hamano
@ 2013-10-22 20:49 ` Antoine Pelisse
2013-10-23 0:45 ` Felipe Contreras
0 siblings, 1 reply; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-22 20:49 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Felipe Contreras
On Tue, Oct 22, 2013 at 9:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Antoine Pelisse <apelisse@gmail.com> writes:
>
>> git-fast-import documentation says that paths can be C-style quoted.
>> Unfortunately, the current remote-hg helper doesn't unquote quoted
>> path and pass them as-is to Mercurial when the commit is created.
>>
>> This result in the following situation:
>>
>> - clone a mercurial repository with git
>> - Add a file with space: `mkdir dir/foo\ bar`
Note to myself, mkdir doesn't create a "file"
>> - Commit that new file, and push the change to mercurial
>> - The mercurial repository as now a new directory named '"dir', which
>> contains a file named 'foo bar"'
>>
>> Use python ast.literal_eval to unquote the string if it starts with ".
>> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>>
>> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
>> ---
>
> A path you read in fast-import input indeed needs to be unquoted
> when it begins with a dq, and I _think_ by using ast.literal_eval(),
> you probably can correctly unquote any valid C-quoted string.
>
> But it bothers me somewhat that what the patch does seems to be
> overly broad. Doesn't ast.literal_eval() take a lot more than just
> strings?
Good point
> ast.literal_eval(node_or_string)
>
> Safely evaluate an expression node or a Unicode or Latin-1
> encoded string containing a Python expression. The string or
> node provided may only consist of the following Python literal
> structures: strings, numbers, tuples, lists, dicts, booleans,
> and None.
Fortunately, I don't believe any of the other type can start with a
dq. So currently, I don't believe we can end-up with anything else but
a string. We could certainly check that this is always true though.
> Also doesn't Python's double-quoted string have a lot more magic
> than C-quoted string, e.g.
>
> $ python -i
> >>> import ast
> >>> not_cq_path = '"abc" "def"'
> >>> ast.literal_eval(not_cq_path)
> 'abcdef'
It is true that I have expected "valid output" from git-fast-export.
And I don't have in mind any easy solution to detect that the output
is broken, yet still accepted as a valid string by python. We could
obviously write a unquote_c_style() equivalent in python if needed.
Thanks,
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] remote-hg: unquote C-style paths when exporting
2013-10-22 20:49 ` Antoine Pelisse
@ 2013-10-23 0:45 ` Felipe Contreras
2013-10-23 8:38 ` Antoine Pelisse
0 siblings, 1 reply; 7+ messages in thread
From: Felipe Contreras @ 2013-10-23 0:45 UTC (permalink / raw)
To: Antoine Pelisse; +Cc: Junio C Hamano, git
On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse <apelisse@gmail.com> wrote:
> It is true that I have expected "valid output" from git-fast-export.
> And I don't have in mind any easy solution to detect that the output
> is broken, yet still accepted as a valid string by python. We could
> obviously write a unquote_c_style() equivalent in python if needed.
Something like this?
def c_style_unescape(string):
if string[0] == string[-1] == '"':
return string.decode('string-escape')[1:-1]
return string
It's in git-remote-bzr.py.
--
Felipe Contreras
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] remote-hg: unquote C-style paths when exporting
2013-10-23 0:45 ` Felipe Contreras
@ 2013-10-23 8:38 ` Antoine Pelisse
2013-10-23 15:44 ` Re* " Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-23 8:38 UTC (permalink / raw)
To: Felipe Contreras; +Cc: Junio C Hamano, git
On Wed, Oct 23, 2013 at 2:45 AM, Felipe Contreras
<felipe.contreras@gmail.com> wrote:
> On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse <apelisse@gmail.com> wrote:
>
>> It is true that I have expected "valid output" from git-fast-export.
>> And I don't have in mind any easy solution to detect that the output
>> is broken, yet still accepted as a valid string by python. We could
>> obviously write a unquote_c_style() equivalent in python if needed.
>
> Something like this?
>
> def c_style_unescape(string):
> if string[0] == string[-1] == '"':
> return string.decode('string-escape')[1:-1]
> return string
>
> It's in git-remote-bzr.py.
Yeah, that's certainly better,
Thanks,
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re* [PATCH] remote-hg: unquote C-style paths when exporting
2013-10-23 8:38 ` Antoine Pelisse
@ 2013-10-23 15:44 ` Junio C Hamano
2013-10-23 15:53 ` Antoine Pelisse
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2013-10-23 15:44 UTC (permalink / raw)
To: Antoine Pelisse; +Cc: Felipe Contreras, git
Antoine Pelisse <apelisse@gmail.com> writes:
>> def c_style_unescape(string):
>> if string[0] == string[-1] == '"':
>> return string.decode('string-escape')[1:-1]
>> return string
>>
>> It's in git-remote-bzr.py.
>
> Yeah, that's certainly better,
>
> Thanks,
OK, so an amended one will look like this?
-- >8 --
From: Antoine Pelisse <apelisse@gmail.com>
Subject: remote-hg: unquote C-style paths when exporting
git-fast-import documentation says that paths can be C-style quoted.
Unfortunately, the current remote-hg helper doesn't unquote quoted
path and pass them as-is to Mercurial when the commit is created.
This result in the following situation:
- clone a mercurial repository with git
- Add a file with space: `mkdir dir/foo\ bar`
- Commit that new file, and push the change to mercurial
- The mercurial repository as now a new directory named '"dir', which
contains a file named 'foo bar"'
Use python ast.literal_eval to unquote the string if it starts with ".
It has been tested with quotes, spaces, and utf-8 encoded file-names.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
contrib/remote-helpers/git-remote-hg | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
index 0194c67..85abbed 100755
--- a/contrib/remote-helpers/git-remote-hg
+++ b/contrib/remote-helpers/git-remote-hg
@@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files):
f = { 'ctx' : repo[p1][e] }
files[e] = f
+def c_style_unescape(string):
+ if string[0] == string[-1] == '"':
+ return string.decode('string-escape')[1:-1]
+ return string
+
def parse_commit(parser):
global marks, blob_marks, parsed_refs
global mode
@@ -720,6 +725,7 @@ def parse_commit(parser):
f = { 'deleted' : True }
else:
die('Unknown file command: %s' % line)
+ path = c_style_unescape(path).decode('utf-8')
files[path] = f
# only export the commits if we are on an internal proxy repo
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Re* [PATCH] remote-hg: unquote C-style paths when exporting
2013-10-23 15:44 ` Re* " Junio C Hamano
@ 2013-10-23 15:53 ` Antoine Pelisse
0 siblings, 0 replies; 7+ messages in thread
From: Antoine Pelisse @ 2013-10-23 15:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Felipe Contreras, git
On Wed, Oct 23, 2013 at 5:44 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Antoine Pelisse <apelisse@gmail.com> writes:
>
>>> def c_style_unescape(string):
>>> if string[0] == string[-1] == '"':
>>> return string.decode('string-escape')[1:-1]
>>> return string
>>>
>>> It's in git-remote-bzr.py.
>>
>> Yeah, that's certainly better,
>>
>> Thanks,
>
> OK, so an amended one will look like this?
The commit message needs to be updated as well.
> -- >8 --
> From: Antoine Pelisse <apelisse@gmail.com>
> Subject: remote-hg: unquote C-style paths when exporting
>
> git-fast-import documentation says that paths can be C-style quoted.
> Unfortunately, the current remote-hg helper doesn't unquote quoted
> path and pass them as-is to Mercurial when the commit is created.
>
> This result in the following situation:
s/result/&s/
> - clone a mercurial repository with git
> - Add a file with space: `mkdir dir/foo\ bar`
- Add a file with space in a directory: `>dir/foo\ bar`
> - Commit that new file, and push the change to mercurial
> - The mercurial repository as now a new directory named '"dir', which
> contains a file named 'foo bar"'
I'm so ashamed I'd rather not report this one: s/as/has/
> Use python ast.literal_eval to unquote the string if it starts with ".
Use python str.decode('string-escape') to unquote the string if it
starts and ends with ".
> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>
> Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> contrib/remote-helpers/git-remote-hg | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
> index 0194c67..85abbed 100755
> --- a/contrib/remote-helpers/git-remote-hg
> +++ b/contrib/remote-helpers/git-remote-hg
> @@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files):
> f = { 'ctx' : repo[p1][e] }
> files[e] = f
>
> +def c_style_unescape(string):
> + if string[0] == string[-1] == '"':
> + return string.decode('string-escape')[1:-1]
> + return string
> +
> def parse_commit(parser):
> global marks, blob_marks, parsed_refs
> global mode
> @@ -720,6 +725,7 @@ def parse_commit(parser):
> f = { 'deleted' : True }
> else:
> die('Unknown file command: %s' % line)
> + path = c_style_unescape(path).decode('utf-8')
> files[path] = f
>
> # only export the commits if we are on an internal proxy repo
That is consistent with git-remote-bzr,
Thanks
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-10-23 15:53 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-18 17:03 [PATCH] remote-hg: unquote C-style paths when exporting Antoine Pelisse
2013-10-22 19:13 ` Junio C Hamano
2013-10-22 20:49 ` Antoine Pelisse
2013-10-23 0:45 ` Felipe Contreras
2013-10-23 8:38 ` Antoine Pelisse
2013-10-23 15:44 ` Re* " Junio C Hamano
2013-10-23 15:53 ` Antoine Pelisse
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).