* reposurgeon now writes Subversion repositories
From: Eric S. Raymond @ 2012-11-29 5:59 UTC (permalink / raw)
To: dev, git
This is something that probably doesn't happen very often - a
cross-post to the Subversion and git dev lists that is on-topic for
both :-).
The repo head version of reposurgeon can now write Subversion
repositories from its common git-import-stream-based representation of
repository histories, as well as reading them in. This joins full
support for git, hg, and bzr; it means that in theory reposurgeon
could now be used to move revision histories from these systems to
Subversion, as well as the other way around.
(For those of you who have been living under a rock, reposurgeon is a
multi-VCS surgery and conversion tool. Since 2.x it does a more
intelligent job of lifting from Subversion to anything else than any
other tool I know of. Much more at <http://www.catb.org/esr/reposurgeon/>.)
Presently, writing (as opposed to reading) Subversion repos is more of
a stunt than a real production technique, and may always remain so.
It has serious limitations. I am posting because I think the details
of those limitations will be of some technical interest to both
Subversion and git developers.
The indented paragraphs are the documentation from reposurgeon's manual
page. I have added some further notes.
    In summary, Subversion repository histories do not round-trip through
    reposurgeon editing. File content changes are preserved but some
    metadata is unavoidably lost. Furthermore, writing out a DVCS history
    in Subversion also loses significant portions of its metadata.
    Writing a Subversion repository or dump stream discards author
    information, the committer's name, and the hostname part of the commit
    address; only the commit timestamp and the local part of the
    committer's email address are preserved, the latter becoming the
    Subversion author field. However, reading a Subversion repository and
    writing it out again will preserve the author fields.
Subversion's metadata doesn't have separate author and committer
properties, and doesn't store anything but a Unix user ID as
attribution. I don't see any way around this.
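As a rough illustration of the attribution mapping described in the manual-page text above (only the local part of the committer's email address survives as the Subversion author), here is a minimal Python sketch. The function name `svn_author_from_committer` is hypothetical, and this is not reposurgeon's actual code; it only assumes the committer arrives in the `Name <local@host>` form used by import streams.

```python
def svn_author_from_committer(committer: str) -> str:
    """Return the local part of the address in a 'Name <local@host>' string.

    Hypothetical helper for illustration only; the name and the error
    handling are assumptions, not reposurgeon's implementation.
    """
    start = committer.rfind("<")
    end = committer.rfind(">")
    if start == -1 or end < start:
        raise ValueError("no <email> part in %r" % committer)
    email = committer[start + 1:end]
    # Everything after the '@' (the hostname part) is discarded.
    local, _sep, _host = email.partition("@")
    return local


if __name__ == "__main__":
    print(svn_author_from_committer("Eric S. Raymond <esr@thyrsus.com>"))
```

The full name and the hostname are simply dropped here, which is the information loss the paragraph above describes.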
    Import-stream timestamps have 1-second granularity. The subsecond
    parts of Subversion commit timestamps will be lost on their way through
    reposurgeon.
Unavoidable in moving from Subversion to git import streams, and one
of two places where git's data model requires us to throw away
information.
However, I think I could preserve this information in a
Subversion-to-Subversion editing scenario by storing the incoming
timestamps as floats and only truncating them on import-stream output,
leaving the subseconds in place for Subversion output.
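The float-timestamp idea in the note above can be sketched as follows. This is only an illustration under stated assumptions: the three function names are hypothetical, the `svn:date` input is assumed to use the usual `YYYY-MM-DDTHH:MM:SS.ffffffZ` form, and the import-stream output uses the raw `<seconds> <offset>` date form.

```python
from datetime import datetime, timezone

SVN_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_svn_date(text: str) -> float:
    """Parse an svn:date value into float seconds since the epoch (UTC)."""
    stamp = datetime.strptime(text, SVN_DATE_FORMAT)
    return stamp.replace(tzinfo=timezone.utc).timestamp()


def import_stream_date(stamp: float) -> str:
    """Truncate to whole seconds, as import streams require."""
    return "%d +0000" % int(stamp)


def svn_date(stamp: float) -> str:
    """Write the timestamp back out with its subsecond part intact."""
    moment = datetime.fromtimestamp(stamp, tz=timezone.utc)
    return moment.strftime(SVN_DATE_FORMAT)


if __name__ == "__main__":
    original = "2012-11-29T05:59:45.250000Z"  # illustrative value
    stamp = parse_svn_date(original)
    print(import_stream_date(stamp))  # subseconds dropped
    print(svn_date(stamp))            # subseconds kept
```

Truncation happens only in the import-stream writer, so a Subversion-to-Subversion pass would never go through the lossy step.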
    Empty directories aren't represented in import streams. Consequently,
    reading and writing Subversion repositories preserves file content,
    but not empty directories. It is also not guaranteed that, after
    editing a Subversion repository, the sequence of directory
    creations and deletions relative to other operations will be
    identical; the only guarantee is that enclosing directories will be
    created before any files in them are.
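The one ordering guarantee stated above (enclosing directories are created before the files inside them) could look roughly like this. It is a sketch, not reposurgeon's code: `creation_order` and the `("mkdir", path)` / `("add", path)` operation tuples are hypothetical names chosen for illustration.

```python
from pathlib import PurePosixPath


def creation_order(file_paths):
    """Return a list of operations in which every enclosing directory
    is created before any file inside it is added."""
    created = set()
    operations = []
    for path in file_paths:
        # parents runs from the nearest directory up to '.', so drop '.'
        # and walk from the outermost directory inwards.
        parents = list(PurePosixPath(path).parents)[:-1]
        for parent in reversed(parents):
            name = str(parent)
            if name not in created:
                created.add(name)
                operations.append(("mkdir", name))
        operations.append(("add", path))
    return operations


if __name__ == "__main__":
    for op in creation_order(["trunk/src/main.c", "trunk/README"]):
        print(op)
```

Directories appear only as a side effect of the files they contain, which is why a directory that never holds a file cannot survive the trip.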
    When reading a Subversion repository, reposurgeon discards the special
    directory-copy nodes associated with branch creations. These can't be
    recreated if and when the repository is written back out to
    Subversion; rather, each branch copy node from the original translates
    into a branch creation plus the first set of file modifications on the
    branch.
In theory, I could relax the rules of reposurgeon's internal
representation so that empty directory-creation and deletion nodes are
not discarded at read time but only when outputting a git event stream.
That would bring Subversion repositories closer to round-tripping, but
not get all the way there. One problem is botched branch copies -
directory copies with cp(1) followed by Subversion add operations.
This is not an uncommon malformation; reposurgeon takes it in stride,
treating these as though they had been real branch copies and
simplifying the backlinks appropriately.
    When reading a Subversion repository, reposurgeon also automatically
    breaks apart mixed-branch commits.
It has to. These just can't be represented in the import-stream model of
branching.
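To make "breaking apart a mixed-branch commit" concrete, here is a minimal sketch that groups the paths touched by one Subversion revision by the branch they belong to, assuming a standard trunk/branches/tags layout. The helper names `branch_of` and `split_by_branch` are hypothetical; the real analysis in reposurgeon is certainly more involved.

```python
def branch_of(path):
    """Return the branch a path belongs to under a standard layout,
    or None if the path lies outside trunk/branches/tags."""
    parts = path.split("/")
    if parts[0] == "trunk":
        return "trunk"
    if parts[0] in ("branches", "tags") and len(parts) > 1:
        return "/".join(parts[:2])
    return None


def split_by_branch(paths):
    """Group the paths of one revision by branch; each group would
    become a separate commit in an import stream."""
    groups = {}
    for path in paths:
        groups.setdefault(branch_of(path), []).append(path)
    return groups


if __name__ == "__main__":
    mixed = ["trunk/a.c", "branches/stable/a.c", "trunk/b.c"]
    for branch, files in split_by_branch(mixed).items():
        print(branch, files)
```

One Subversion revision yielding more than one group is exactly the case that cannot be written as a single import-stream commit.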
    Because of the preceding two points, it is not guaranteed that
    even revision numbers will be stable when a Subversion repository
    is read in and then written out!
So not only can Subversion repos fail to round-trip exactly, in the
presence of lots of branch copies and mixed-branch commits the
relationship between the read-in and written-out revision numbers
could get pretty unpredictable.
    Subversion repositories are always written with a standard
    (trunk/tags/branches) layout. Thus, a repository with a nonstandard
    shape that has been analyzed by reposurgeon won't be written out with
    the same shape.
In particular, this means linear Subversion repositories with no trunk
(an organization some smaller projects used to use and might still)
will turn into branchy repos with trunk on the way out.
    Subversion has a concept of "flows"; that is, named segments of
    history corresponding to files or directories that are created when
    the path is added, cloned when the path is copied, and deleted when
    the path is deleted. This information is not preserved in import
    streams or the internal representation that reposurgeon uses. Thus,
    after editing, the flow boundaries of a Subversion history may be
    arbitrarily changed.
This is me being obsessive about documenting the details. I think it
is doubtful that most Subversion users even know flows exist.
    Bugs: Presently, writing out a history to a Subversion repository does
    not create mergeinfo properties representing branch merges. It also
    loses all information about lightweight tags (though annotated tags
    are turned into Subversion-style directory copies). These bugs will
    probably be fixed in future reposurgeon releases.
I'm also not sure the present code handles branchiness exactly right.
My next task is to write a test suite for this new feature.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
The Constitution is not neutral. It was designed to take the
government off the backs of the people.
-- Justice William O. Douglas
* Re: reposurgeon now writes Subversion repositories
From: Daniel Shahaf @ 2012-11-29 7:58 UTC (permalink / raw)
To: Eric S. Raymond; +Cc: dev, git
Eric S. Raymond wrote on Thu, Nov 29, 2012 at 00:59:45 -0500:
> In summary, Subversion repository histories do not round-trip through
> reposurgeon editing. File content changes are preserved but some
> metadata is unavoidably lost. Furthermore, writing out a DVCS history
> in Subversion also loses significant portions of its metadata.
>
> Writing a Subversion repository or dump stream discards author
> information, the committer's name, and the hostname part of the commit
> address; only the commit timestamp and the local part of the
> committer's email address are preserved, the latter becoming the
> Subversion author field. However, reading a Subversion repository and
> writing it out again will preserve the author fields.
>
> Subversion's metadata doesn't have separate author and committer
> properties, and doesn't store anything but a Unix user ID as
> attribution. I don't see any way around this.
You're not fully informed, then.
1) svn:author revprops can contain any UTF-8 string. They are not
restricted to Unix user id's. (For example, they can contain full
names, if the administrator so chooses.)
2) You can define custom revision properties. In your case, the easiest
way would be to set a reposurgeon:author property, alongside the
svn:author property.
You might also seek community consensus to reserve an svn:foo name for
the "original author" property --- perhaps svn:original-author --- so
that reposurgeon and other git->svn tools can interoperate in the way
they transfer the "original author" information.
I note that one can set revision properties at commit time:
svn commit -m logmsg --with-revprop svn:original-author="Patch Submitter <foo@bar.example>"
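For a tool that writes dump streams rather than running `svn commit`, the same idea would mean emitting an extra revision property. The sketch below serializes revision properties in the `K`/`V`/`PROPS-END` form used by Subversion dump files, with lengths counted in bytes. The function name `revprop_block` is hypothetical, `svn:original-author` is only the name proposed in this message (not a reserved Subversion property), and the surrounding `Revision-number:` and `Prop-content-length:` headers are left out.

```python
def revprop_block(props):
    """Serialize a dict of revision properties as a dump-file
    property section. Keys and values are encoded as UTF-8 and
    the K/V lengths are byte counts, not character counts."""
    out = bytearray()
    for key, value in props.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        out += b"K %d\n%s\nV %d\n%s\n" % (len(k), k, len(v), v)
    out += b"PROPS-END\n"
    return bytes(out)


if __name__ == "__main__":
    block = revprop_block({
        "svn:author": "esr",
        # Proposed name from this thread, not an existing svn: property.
        "svn:original-author": "Patch Submitter <foo@bar.example>",
    })
    print(block.decode("utf-8"), end="")
```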
> Empty directories aren't represented in import streams. Consequently,
> reading and writing Subversion repositories preserves file content,
> but not empty directories. It is also not guaranteed that after
> editing a Subversion repository that the sequence of directory
> creations and deletions relative to other operations will be
> identical; the only guarantee is that enclosing directories will be
> created before any files in them are.
How does reposurgeon handle empty directories with (node) properties?
% svnadmin create r
% svnmucc -mm -U file://$PWD/r mkdir foo propset k v foo
> Subversion has a concept of "flows"; that is, named segments of
> history corresponding to files or directories that are created when
> the path is added, cloned when the path is copied, and deleted when
> the path is deleted. This information is not preserved in import
> streams or the internal representation that reposurgeon uses. Thus,
> after editing, the flow boundaries of a Subversion history may be
> arbitrarily changed.
>
> This is me being obsessive about documenting the details. I think it
> is doubtful that most Subversion users even know flows exist.
>
I think you're saying that adds might turn into copies, and vice-versa.
That is something users would notice --- it is certainly exposed in the
UI --- even though node-id's are not exposed to clients.
>
Cheers
Daniel
* Re: reposurgeon now writes Subversion repositories
From: Branko Čibej @ 2012-11-29 10:31 UTC (permalink / raw)
To: dev; +Cc: Eric S. Raymond, git
On 29.11.2012 08:58, Daniel Shahaf wrote:
> I think you're saying that adds might turn into copies, and
> vice-versa. That is something users would notice --- it is certainly
> exposed in the UI --- even though node-id's are not exposed to clients.
... yet. But there are plans underway to expose them.
--
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
* Re: reposurgeon now writes Subversion repositories
From: Eric S. Raymond @ 2012-11-29 11:46 UTC (permalink / raw)
To: Daniel Shahaf; +Cc: dev, git
Daniel Shahaf <danielsh@elego.de>:
> > Subversion's metadata doesn't have separate author and committer
> > properties, and doesn't store anything but a Unix user ID as
> > attribution. I don't see any way around this.
>
> You're not fully informed, then.
>
> 1) svn:author revprops can contain any UTF-8 string. They are not
> restricted to Unix user id's. (For example, they can contain full
> names, if the administrator so chooses.)
Right. At one point during the development of this feature I was
accidentally storing the full email field in this property. So I
already knew that this is allowed at some level.
And, I have no trouble believing that svn log will cheerfully echo
anything that I choose to stuff in that field.
But...
(1) How much work would it be to set up a Subversion installation
so that when I svn commit, the tool does the right thing, e.g. puts
a DVCS-style fullname/email string in there?
(2) Have the tools been tested for bugs arising from having whitespace
in that data?
Really, if it's actually easy to set up DVCS-style globally unique IDs you
Subversion guys ought to be shouting it from the housetops. The absence
of this capability is a serious PITA in several situations, including
for example migrating projects between forges.
RFC: If I wrote a patch that let Subversion users set their own
content string for the author field in ~/.subversion/config, would
you merge it? Because I'd totally write that.
> 2) You can define custom revision properties. In your case, the easiest
> way would be to set a reposurgeon:author property, alongside the
> svn:author property.
Yeah, sure, I've assumed all along this wouldn't break if I tried it.
If I actually thought you guys were capable of designing a data model
with a perfectly general-looking store of key/value pairs and then
arbitrarily restricting the key set so I couldn't do that, I'd almost
have to find each and every one of you and kick your asses into next
Tuesday on account of blatant stupidity. I have no such plans :-).
But...what good does this capability do? OK, it would assist
round-tripping back to gitspace, but while that's kind of cool I don't
see any help for a normal Subversion workflow here.
> You might also seek community consensus to reserve an svn:foo name for
> the "original author" property --- perhaps svn:original-author --- so
> that reposurgeon and other git->svn tools can interoperate in the way
> they transfer the "original author" information.
OK. But I like the idea of letting the users set their own author
content string better. Instead of another layer of kluges, why
shouldn't Subversion join the DVCSes in the happy land of
Internet-scoped attributions?
> How does reposurgeon handle empty directories with (node) properties?
Currently by ignoring all of them except svn:ignore, which it turns
into .gitignore content on the gitspace side. And now vice-versa, too.
Not clear what else it *could* do. I'd take suggestions.
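The svn:ignore translation mentioned above might be sketched like this. It is an assumption-laden illustration, not reposurgeon's actual behavior: `gitignore_from_svn_ignore` is a hypothetical name, and the leading `/` is added on the assumption that svn:ignore patterns apply only to the directory carrying the property, whereas unanchored .gitignore patterns also match in subdirectories.

```python
def gitignore_from_svn_ignore(prop_value):
    """Turn the value of an svn:ignore property into the content of a
    .gitignore file placed in the same directory.

    Each pattern is anchored with a leading '/' so that it matches only
    in that directory, mirroring svn:ignore's non-recursive scope.
    """
    lines = []
    for pattern in prop_value.splitlines():
        pattern = pattern.strip()
        if pattern:  # skip blank lines in the property value
            lines.append("/" + pattern)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    print(gitignore_from_svn_ignore("*.o\nbuild\n"), end="")
```

Going the other way would be lossier, since a .gitignore can hold recursive and negated patterns that svn:ignore has no equivalent for.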
> > Subversion has a concept of "flows"; that is, named segments of
> > history corresponding to files or directories that are created when
> > the path is added, cloned when the path is copied, and deleted when
> > the path is deleted. This information is not preserved in import
> > streams or the internal representation that reposurgeon uses. Thus,
> > after editing, the flow boundaries of a Subversion history may be
> > arbitrarily changed.
> >
> > This is me being obsessive about documenting the details. I think it
> > is doubtful that most Subversion users even know flows exist.
>
> I think you're saying that adds might turn into copies, and vice-versa.
> That is something users would notice --- it is certainly exposed in the
> UI --- even though node-id's are not exposed to clients.
I'm saying nobody thinks of flows when they do branch copies. It's
not just that users don't see node IDs, it's that no part of most users'
mental model of how Subversion works resembles them.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: reposurgeon now writes Subversion repositories
From: Daniel Shahaf @ 2012-11-29 13:42 UTC (permalink / raw)
To: Eric S. Raymond; +Cc: dev, git
(note, other half of the thread is on dev@svn only..)
Eric S. Raymond wrote on Thu, Nov 29, 2012 at 06:46:37 -0500:
> Daniel Shahaf <danielsh@elego.de>:
> > You might also seek community consensus to reserve an svn:foo name for
> > the "original author" property --- perhaps svn:original-author --- so
> > that reposurgeon and other git->svn tools can interoperate in the way
> > they transfer the "original author" information.
>
> OK. But I like the idea of letting the users set their own author
> content string better. Instead of another layer of kluges, why
I don't see the kludge here --- git has an "author" != "committer"
distinction, svn doesn't, so if you want to grow that distinction the
most natural way is a new property. Storing additional information in
svn:author is a separate issue.
> > > Subversion has a concept of "flows"; that is, named segments of
> > > history corresponding to files or directories that are created when
> > > the path is added, cloned when the path is copied, and deleted when
> > > the path is deleted. This information is not preserved in import
> > > streams or the internal representation that reposurgeon uses. Thus,
> > > after editing, the flow boundaries of a Subversion history may be
> > > arbitrarily changed.
> > >
> > > This is me being obsessive about documenting the details. I think it
> > > is doubtful that most Subversion users even know flows exist.
> >
> > I think you're saying that adds might turn into copies, and vice-versa.
> > That is something users would notice --- it is certainly exposed in the
> > UI --- even though node-id's are not exposed to clients.
>
> I'm saying nobody thinks of flows when they do branch copies. It's
> not just that users don't see node IDs, it's that no part of most users'
> mental model of how Subversion works resembles them.
I'm still not sure what you have in mind. I note that 'svn log' and
'svn blame' cross both file copies and branch creation --- that's one
effect of "'svn cp foo bar; svn ci' causes bar to be related to foo".
> --
> <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* Re: reposurgeon now writes Subversion repositories
From: Eric S. Raymond @ 2012-11-29 13:55 UTC (permalink / raw)
To: Daniel Shahaf; +Cc: dev, git
Daniel Shahaf <danielsh@elego.de>:
> I don't see the kludge here --- git has an "author" != "committer"
> distinction, svn doesn't, so if you want to grow that distinction the
> most natural way is a new property. Storing additional information in
> svn:author is a separate issue.
See my advocacy to Branko of going to Internet-scoped IDs. The kludge
would be maintaining the local and Internet-scoped identifications
as different properties and having to decide which one to key on
ad-hoc. Nothing to do with the author/committer distinction.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
* AW: reposurgeon now writes Subversion repositories
From: Markus Schaber @ 2012-11-29 13:43 UTC (permalink / raw)
To: esr@thyrsus.com, Daniel Shahaf
Cc: dev@subversion.apache.org, git@vger.kernel.org
Hi,
Von: Eric S. Raymond [mailto:esr@thyrsus.com]
> > How does reposurgeon handle empty directories with (node) properties?
>
> Currently by ignoring all of them except svn:ignore, which it turns
> into .gitignore content on the gitspace side. And now vice-versa, too.
>
> Not clear what else it *could* do. I'd take suggestions.
AFAIR, SvnBridge (which bridges SVN to Team Foundation Server for CodePlex) creates a hidden .svnproperties file where all the properties of the directory and files are stored.
I'm not really sure, but maybe this could be used as some standard to bridge svn properties to non-svn VCSes.
Best regards
Markus Schaber
CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH
Inspiring Automation Solutions
3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50
E-Mail: m.schaber@codesys.com | Web: http://www.codesys.com
CODESYS internet forum: http://forum.codesys.com
Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915