* [PATCH] translate bad characters in refnames during git-svn fetch
@ 2007-07-15 13:05 martin f krafft
2007-07-16 3:30 ` Eric Wong
0 siblings, 1 reply; 10+ messages in thread
From: martin f krafft @ 2007-07-15 13:05 UTC (permalink / raw)
To: git discussion list
Hi,
I am trying to track/convert the Debian pkg-mdadm repository with
git-svn:
svn://svn.debian.org/svn/pkg-mdadm/mdadm/trunk
My problem is that the fetching fails:
fatal: refs/remotes/tags/2.6.1-1~exp.1: cannot lock the ref
update-ref -m r311 refs/remotes/tags/2.6.1-1~exp.1
c6e351ea25dc90714048e33693099595c2d5dab8: command returned error:
128
This is because the ~ character is an invalid character for
a refname (it's used to specify the nth parent).
So I figured that the best way to deal with this is to introduce
a conversion filter to git-svn, but I cannot figure out where it has
to go. My perl is rusty and even after an hour now with the code,
I could not find the right spot.
The following patch works, but I can't really explain why. Moreover,
it does not change the STDERR output, so you'll still get stuff like
r340 = 0dc5693471af9dfdb712c1342071ba1040af8963
(tags/2.6.1-1~exp.3)
which makes me think that it's translating the refname too late.
However, the end result looks sane.
Comments welcome,
m
---
git-check-ref-format(1) documents which characters may be contained in
a refname. Since Subversion has different rules, an import can result in
problems, such as:
fatal: refs/remotes/tags/2.6.1-1~exp.1: cannot lock the ref
update-ref -m r311 refs/remotes/tags/2.6.1-1~exp.1
c6e351ea25dc90714048e33693099595c2d5dab8: command returned error: 128
This patch translates bad characters to valid substitutes to enable imports of
tags/branches/whatever using characters that git does not allow in refnames.
Signed-off-by: martin f. krafft <madduck@piper.oerlikon.madduck.net>
---
git-svn.perl | 24 +++++++++++++++++++++++-
1 files changed, 23 insertions(+), 1 deletions(-)
diff --git a/git-svn.perl b/git-svn.perl
index 299b40f..de43697 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1239,7 +1239,29 @@ sub new {
$self;
}
-sub refname { "refs/remotes/$_[0]->{ref_id}" }
+sub refname {
+ my ($refname) = $_[0]->{ref_id};
+ ## transform the refname as per rules in git-check-ref-format(1):
+ # no slash-separated component can begin with a dot .
+ # /.* becomes /,*
+ $refname =~ s|/\.|/,|g;
+ # It cannot have two consecutive dots .. anywhere
+ # .. becomes ,,
+ $refname =~ s|\.\.|,,|g;
+ # It cannot have ASCII control character space, tilde ~, caret ^,
+ # colon :, question-mark ?, asterisk *, or open bracket[ anywhere
+ # <space> becomes _
+ # ~ becomes =
+ # ^ becomes @
+ # : becomes %
+ # ? becomes $
+ # * becomes +
+ # [ becomes (
+ $refname =~ y| ~^:?*[|_=@%\$+(|;
+ # It cannot end with a slash /
+ $refname =~ s|/$||g;
+ "refs/remotes/$refname";
+}
sub svm_uuid {
my ($self) = @_;
--
1.5.3.rc1.27.ga5e40
--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
spamtraps: madduck.bogus@madduck.net
"a warm bed in a house sounds a mite better
than eating a hot dog on a stick
with an old geezer traveling on a lawn mower."
-- alvin straight (the straight story)
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-15 13:05 [PATCH] translate bad characters in refnames during git-svn fetch martin f krafft
@ 2007-07-16 3:30 ` Eric Wong
2007-07-16 11:15 ` Jan Hudec
0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2007-07-16 3:30 UTC (permalink / raw)
To: git discussion list
martin f krafft <madduck@madduck.net> wrote:
> Hi,
>
> I am trying to track/convert the Debian pkg-mdadm repository with
> git-svn:
>
> svn://svn.debian.org/svn/pkg-mdadm/mdadm/trunk
>
> My problem is that the fetching fails:
>
> fatal: refs/remotes/tags/2.6.1-1~exp.1: cannot lock the ref
> update-ref -m r311 refs/remotes/tags/2.6.1-1~exp.1
> c6e351ea25dc90714048e33693099595c2d5dab8: command returned error:
> 128
>
> This is because the ~ character is an invalid character for
> a refname (it's used to specify the nth parent).
>
> So I figured that the best way to deal with this is to introduce
> a conversion filter to git-svn, but I cannot figure out where it has
> to go. My perl is rusty and even after an hour now with the code,
> I could not find the right spot.
>
> The following patch works, but I can't really explain why. Moreover,
> it does not change the STDERR output, so you'll still get stuff like
>
> r340 = 0dc5693471af9dfdb712c1342071ba1040af8963
> (tags/2.6.1-1~exp.3)
>
> which makes me think that it's translating the refname too late.
> However, the end result looks sane.
>
> Comments welcome,
The major issue with this is that it doesn't handle odd cases
where a refname is sanitized into something
(say "1234~2" sanitizes to "1234=2"), and then another branch
is created named "1234=2".
git-svn should at least keep track of what it got sanitized to, to
avoid clobbering branches.
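A minimal sketch of the clobbering case, assuming the same character translation as the patch. The helper names translate_ref and find_collisions are hypothetical and not part of git-svn; they only show that two distinct SVN names can land on one git refname:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the character translation in the patch.
sub translate_ref {
    my ($name) = @_;
    # tr/// does not interpolate, so '$' and '@' are literal here.
    (my $out = $name) =~ tr{ ~^:?*[}{_=@%$+(};
    return $out;
}

# Return a hash (git-side name => list of SVN names) containing only
# those git-side names that are claimed by more than one SVN name.
sub find_collisions {
    my (@svn_names) = @_;
    my %by_git_name;
    push @{ $by_git_name{ translate_ref($_) } }, $_ for @svn_names;
    return map  { ($_ => $by_git_name{$_}) }
           grep { @{ $by_git_name{$_} } > 1 }
           keys %by_git_name;
}

my %clash = find_collisions('1234~2', '1234=2', 'trunk');
for my $git_name (sort keys %clash) {
    print "$git_name <- @{ $clash{$git_name} }\n";
}
```

Keeping a map like this around (or persisting it) is one way git-svn could notice the clash before update-ref overwrites an existing ref.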
I started working on this a while back but haven't gotten around
to revisiting it:
http://thread.gmane.org/gmane.comp.version-control.git/45651
> ---
> git-check-ref-format(1) documents which characters may be contained in
> a refname. Since Subversion has different rules, an import can result in
> problems, such as:
>
> fatal: refs/remotes/tags/2.6.1-1~exp.1: cannot lock the ref
> update-ref -m r311 refs/remotes/tags/2.6.1-1~exp.1
> c6e351ea25dc90714048e33693099595c2d5dab8: command returned error: 128
>
> This patch translates bad characters to valid substitutes to enable imports of
> tags/branches/whatever using characters that git does not allow in refnames.
>
> Signed-off-by: martin f. krafft <madduck@piper.oerlikon.madduck.net>
> ---
> git-svn.perl | 24 +++++++++++++++++++++++-
> 1 files changed, 23 insertions(+), 1 deletions(-)
>
> diff --git a/git-svn.perl b/git-svn.perl
> index 299b40f..de43697 100755
> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -1239,7 +1239,29 @@ sub new {
> $self;
> }
>
> -sub refname { "refs/remotes/$_[0]->{ref_id}" }
> +sub refname {
> + my ($refname) = $_[0]->{ref_id};
> + ## transform the refname as per rules in git-check-ref-format(1):
> + # no slash-separated component can begin with a dot .
> + # /.* becomes /,*
> + $refname =~ s|/\.|/,|g;
> + # It cannot have two consecutive dots .. anywhere
> + # .. becomes ,,
> + $refname =~ s|\.\.|,,|g;
> + # It cannot have ASCII control character space, tilde ~, caret ^,
> + # colon :, question-mark ?, asterisk *, or open bracket[ anywhere
> + # <space> becomes _
> + # ~ becomes =
> + # ^ becomes @
> + # : becomes %
> + # ? becomes $
> + # * becomes +
> + # [ becomes (
> + $refname =~ y| ~^:?*[|_=@%\$+(|;
> + # It cannot end with a slash /
> + $refname =~ s|/$||g;
> + "refs/remotes/$refname";
> +}
>
> sub svm_uuid {
> my ($self) = @_;
> --
--
Eric Wong
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-16 3:30 ` Eric Wong
@ 2007-07-16 11:15 ` Jan Hudec
2007-07-16 17:47 ` martin f krafft
0 siblings, 1 reply; 10+ messages in thread
From: Jan Hudec @ 2007-07-16 11:15 UTC (permalink / raw)
To: Eric Wong; +Cc: git discussion list
On Sun, Jul 15, 2007 at 20:30:50 -0700, Eric Wong wrote:
> The major issue with this is that it doesn't handle odd cases
> where a refname is sanitized into something
> (say "1234~2" sanitizes to "1234=2"), and then another branch
> is created named "1234=2".
>
> git-svn should at least keep track of what it got sanitized to, to
> avoid clobbering branches.
>
> I started working on this a while back but haven't gotten around
> to revisiting it:
> http://thread.gmane.org/gmane.comp.version-control.git/45651
I believe % is safe, right? So what if git-svn just url-escaped stuff in the
branch name it does not like. Of course % would be included in the list of
characters it does not like. Eg. 1234~2 would escape to 1234%7E2 and if the
user ever had 1234%7E2 in svn, it would simply be escaped too, to 1234%257E2.
Space is rather common, but that's why there is the + rule in url-encoding --
"foo bar" escapes to "foo+bar" and "foo+bar" escapes to "foo%2Bbar". Or you
could use something else to escape space. I can only think of "=", "_" is too
common to have it escaped and anything else would conflict with either git or
shell.
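A rough sketch of that escaping, assuming plain %XX for every rejected character (space becomes %20 here rather than +, and the leading-dot and ".." rules are left out). The names escape_ref and unescape_ref are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: percent-escape characters git rejects in refnames,
# plus '%' itself so that the mapping stays reversible.
sub escape_ref {
    my ($name) = @_;
    $name =~ s{([%~^:?*\[ \t])}{sprintf('%%%02X', ord($1))}eg;
    return $name;
}

# Hypothetical inverse: turn each %XX (uppercase hex) back into its character.
sub unescape_ref {
    my ($name) = @_;
    $name =~ s{%([0-9A-F]{2})}{chr(hex($1))}eg;
    return $name;
}

for my $svn_name ('1234~2', '1234%7E2', 'foo bar') {
    my $git_name = escape_ref($svn_name);
    print "$svn_name -> $git_name -> ", unescape_ref($git_name), "\n";
}
```

Because % is escaped along with everything else, an SVN name that already looks escaped gets a distinct git name, so the two examples above cannot collide.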
--
Jan 'Bulb' Hudec <bulb@ucw.cz>
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-16 11:15 ` Jan Hudec
@ 2007-07-16 17:47 ` martin f krafft
2007-07-17 12:28 ` Eric Wong
0 siblings, 1 reply; 10+ messages in thread
From: martin f krafft @ 2007-07-16 17:47 UTC (permalink / raw)
To: git discussion list
also sprach Eric Wong <normalperson@yhbt.net> [2007.07.16.0530 +0200]:
> The major issue with this is that it doesn't handle odd cases
> where a refname is sanitized into something (say "1234~2"
> sanitizes to "1234=2"), and then another branch is created named
> "1234=2".
Well, we can't please everyone, can we? :)
I like Jan's proposal about using the % escape, even though it
doesn't make pretty branch names.
On the other hand, we could make the translation regexps
configurable...
--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
spamtraps: madduck.bogus@madduck.net
"if they can get you asking the wrong questions,
they don't have to worry about answers."
-- thomas pynchon
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-16 17:47 ` martin f krafft
@ 2007-07-17 12:28 ` Eric Wong
2007-07-17 13:17 ` martin f krafft
2007-07-28 7:23 ` Mike Hommey
0 siblings, 2 replies; 10+ messages in thread
From: Eric Wong @ 2007-07-17 12:28 UTC (permalink / raw)
To: git discussion list
martin f krafft <madduck@madduck.net> wrote:
> also sprach Eric Wong <normalperson@yhbt.net> [2007.07.16.0530 +0200]:
> > The major issue with this is that it doesn't handle odd cases
> > where a refname is sanitized into something (say "1234~2"
> > sanitizes to "1234=2"), and then another branch is created named
> > "1234=2".
>
> Well, we can't please everyone, can we? :)
>
> I like Jan's proposal about using the % escape, even though it
> doesn't make pretty branch names.
I like it, too. How about something like the two functions below? This
will break things a bit for people currently using % in refnames,
however.
I think this will work rather nicely once I've figured out how the path
globbing code works[1] and where to sanitize/desanitize the refnames
properly.
It would be far easier to take your approach and sanitize them only
for the command-line, but storing unsanitized git refnames into the
.git/config is something I want to avoid:
Somebody naming directories on the SVN side with the path component
":refs/remotes" in them could screw things up for us.
# transform the refname as per rules in git-check-ref-format(1):
sub sanitize_ref_name {
    my ($refname) = @_;

    # It cannot end with a slash /, we'll throw up on this because
    # SVN can't have directories with a slash in their name, either:
    if ($refname =~ m{/$}) {
        die "ref: '$refname' ends with a trailing slash, this is ",
            "not permitted by git nor Subversion\n";
    }

    # It cannot have ASCII control character space, tilde ~, caret ^,
    # colon :, question-mark ?, asterisk *, or open bracket [ anywhere
    #
    # Additionally, % must be escaped because it is used for escaping
    # and we want our escaped refname to be reversible
    # (note the character class: each listed character is matched on its own)
    $refname =~ s{([ \%~\^:\?\*\[\t])}{uc sprintf('%%%02x',ord($1))}eg;

    # no slash-separated component can begin with a dot .
    # /.* becomes /%2E*
    $refname =~ s{/\.}{/%2E}g;

    # It cannot have two consecutive dots .. anywhere
    # .. becomes %2E%2E
    $refname =~ s{\.\.}{%2E%2E}g;

    $refname;
}

sub desanitize_ref_name {
    my ($refname) = @_;
    # /e is needed so the replacement is evaluated as code
    $refname =~ s{%([0-9A-F]{2})}{chr hex($1)}eg;

    $refname;
}
> On the other hand, we could make the translation regexps
> configurable...
Hopefully not needed. I fear it would just add to confusion.
[1] I don't remember writing the globbing code myself, maybe it was my
psychotic alter ego, but I'm having trouble following it at this time of
the night/morning.
--
Eric Wong
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-17 12:28 ` Eric Wong
@ 2007-07-17 13:17 ` martin f krafft
2007-07-26 10:59 ` Robert Ewald
2007-07-28 7:23 ` Mike Hommey
1 sibling, 1 reply; 10+ messages in thread
From: martin f krafft @ 2007-07-17 13:17 UTC (permalink / raw)
To: git discussion list
also sprach Eric Wong <normalperson@yhbt.net> [2007.07.17.1428 +0200]:
> I like it, too. How about something like the two functions below?
> This will break things a bit for people currently using % in
> refnames, however.
Well, wait. git-svn usually works in its own repo, and if that's
tracked by another repo, then it is tracked under the
remote/whatever namespace, so there should not be any conflicts. You
also hardly ever run git-svn to clone stuff *into* an existing repo,
so there can't be conflicts with existing refnames-with-%. Thus the
only breakage is if a person creates a new refname inside a git-svn
repo, which uses % in such a way as to collide with an imported
branch/tag/whatever from git-svn. That's not breakage, since git
will just refuse to do it.
Remember that we're only translating from <char> -> %XX, never the
other way around, really. Okay, we might be during git-svn
rebase/dcommit, but only for those refnames which we store in
.git/svn/ anyway. So a user-specified refname containing % will not
be a problem, will it?
> I think this will work rather nicely once I've figured out how the path
> globbing code works[1] and where to sanitize/desanitize the refnames
> properly.
I am glad you're having the same problem; makes me feel less stupid.
:)
> Somebody naming directories on the SVN side with the path component
> ":refs/remotes" in them could screw things up for us.
Those people should be tarred and feathered. git owns the trademark
on these names.
> sub desanitize_ref_name {
> my ($refname) = @_;
> $refname =~ s{%([0-9A-F]{2})}{chr hex($1)}eg;
>
> $refname;
> }
We could make it escape to %25; instead of %25. That's ugly, but it
would make desanitization a little safer.
> > On the other hand, we could make the translation regexps
> > configurable...
>
> Hopefully not needed. I fear it would just add to confusion.
I was thinking about something like.
git-svn clone ...
...
error: remote branch/tag name includes ~, which git does not
allow. please specify a replacement character in .git/config
and then have config.svn-remote.svn.translations simply be a list of
pairs in vim pairlist syntax:
~:!,^:#,.:\,
--
martin; (greetings from the heart of the sun.)
\____ echo mailto: !#^."<*>"|tr "<*> mailto:" net@madduck
spamtraps: madduck.bogus@madduck.net
"it is easier to be a lover than a husband for the simple reason
that it is more difficult to be witty every day
than to say pretty things from time to time."
-- honoré de balzac
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-17 13:17 ` martin f krafft
@ 2007-07-26 10:59 ` Robert Ewald
2007-07-26 12:35 ` Martin F Krafft
0 siblings, 1 reply; 10+ messages in thread
From: Robert Ewald @ 2007-07-26 10:59 UTC (permalink / raw)
To: git
Hello,
I am very interested in a functionality like this.
martin f krafft wrote:
>> sub desanitize_ref_name {
>> my ($refname) = @_;
>> $refname =~ s{%([0-9A-F]{2})}{chr hex($1)}eg;
>>
>> $refname;
>> }
>
> We could make it escape to %25; instead of %25. That's ugly, but it
> would make desanitization a little safer.
In my limited knowledge I wonder if that would confuse shell scripts.
>> > On the other hand, we could make the translation regexps
>> > configurable...
>>
>> Hopefully not needed. I fear it would just add to confusion.
>
> I was thinking about something like.
>
> git-svn clone ...
> ...
> error: remote branch/tag name includes ~, which git does not
> allow. please specify a replacement character in .git/config
>
> and then have config.svn-remote.svn.translations simply be a list of
> pairs in vim pairlist syntax:
>
> ~:!,^:#,.:\,
>
Having the user specify replacements leads to divergence, which is not
desirable. Consider the case where two git users clone an svn repo and later
pull from each other. Different replacements would cause confusion in this
case. That can of course be remedied by agreeing on the same replacements, but
then configuration is not needed.
Is there anybody working on this feature at the moment? Can I pull from
somewhere? I am hard pressed for that feature but my ability to contribute
is only in testing and reporting bugs.
Greetings
--
Robert Ewald
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-26 10:59 ` Robert Ewald
@ 2007-07-26 12:35 ` Martin F Krafft
0 siblings, 0 replies; 10+ messages in thread
From: Martin F Krafft @ 2007-07-26 12:35 UTC (permalink / raw)
To: git
also sprach Robert Ewald <robert.ewald@nov.com> [2007.07.26.1259 +0200]:
> Is there anybody working on this feature at the moment? Can I pull
> from somewhere? I am hard pressed for that feature but my ability
> to contribute is only in testing and reporting bugs.
As I told you on IRC, I am on my way out for holiday but will read
email, so if you need non-urgent feedback, please write. Please
include my name or reply to this thread for me to see it.
--
Martin F. Krafft Artificial Intelligence Laboratory
Ph.D. Student Department of Information Technology
Email: krafft@ailab.ch University of Zurich
Tel: +41.(0)44.63-54323 Andreasstrasse 15, Office 2.18
http://ailab.ch/people/krafft CH-8050 Zurich, Switzerland
Spamtraps: krafft.bogus@ailab.ch krafft.bogus@ifi.unizh.ch
gentoo: the performance placebo.
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-17 12:28 ` Eric Wong
2007-07-17 13:17 ` martin f krafft
@ 2007-07-28 7:23 ` Mike Hommey
2007-07-28 7:33 ` David Kastrup
1 sibling, 1 reply; 10+ messages in thread
From: Mike Hommey @ 2007-07-28 7:23 UTC (permalink / raw)
To: git
Eric Wong <normalperson <at> yhbt.net> writes:
> martin f krafft <madduck <at> madduck.net> wrote:
> > also sprach Eric Wong <normalperson <at> yhbt.net> [2007.07.16.0530 +0200]:
> > > The major issue with this is that it doesn't handle odd cases
> > > where a refname is sanitized into something (say "1234~2"
> > > sanitizes to "1234=2"), and then another branch is created named
> > > "1234=2".
> >
> > Well, we can't please everyone, can we? :)
> >
> > I like Jan's proposal about using the % escape, even though it
> > doesn't make pretty branch names.
>
> I like it, too. How about something like the two functions below? This
> will break things a bit for people currently using % in refnames,
> however.
>
> I think this will work rather nicely once I've figured out how the path
> globbing code works[1] and where to sanitize/desanitize the refnames
> properly.
>
> It would be far easier to take your approach and sanitize them only
> for the command-line, but storing unsanitized git refnames into the
> .git/config is something I want to avoid:
>
> Somebody naming directories on the SVN side with the path component
> ":refs/remotes" in them could screw things up for us.
Why not "simply" allow some form of escaping in refs, such that special
characters CAN be used anywhere. Then git-svn would just have to escape these
characters.
Something like:
git update-ref "refs/remotes/tags/sometag\~1" $sha1
I'm pretty sure that could help fix a lot of other similar issues.
Mike
* Re: [PATCH] translate bad characters in refnames during git-svn fetch
2007-07-28 7:23 ` Mike Hommey
@ 2007-07-28 7:33 ` David Kastrup
0 siblings, 0 replies; 10+ messages in thread
From: David Kastrup @ 2007-07-28 7:33 UTC (permalink / raw)
To: Mike Hommey; +Cc: git
Mike Hommey <mh@glandium.org> writes:
> Eric Wong <normalperson <at> yhbt.net> writes:
>>
>> Somebody naming directories on the SVN side with the path component
>> ":refs/remotes" in them could screw things up for us.
>
> Why not "simply" allow some form of escaping in refs, such that special
> characters CAN be used anywhere. Then git-svn would just have to escape these
> characters.
>
> Something like:
> git update-ref "refs/remotes/tags/sometag\~1" $sha1
>
> I'm pretty sure that could help fix a lot of other similar issues.
Well, having had to do my fair share of porting shell scripts,
installation stuff and so on to Windows, Mac OS X and the like, where spaces
(and other characters) in file names are considered business as usual:
it is a bottomless pit. You'll always find one more place in your
software that does not get this right.
It may be a more confined problem to make the interoperation utility
responsible for quoting/renaming.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum