* [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
@ 2008-03-19 4:08 Bryan Donlan
2008-03-20 4:31 ` Sam Vilain
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Bryan Donlan @ 2008-03-19 4:08 UTC (permalink / raw)
To: git
Hello,
I'm planning to apply for the git summer of code project. My proposal
is based on the project idea of a subversion gateway for git,
implemented with a new subversion filesystem layer. A draft of my
proposal follows; I'd appreciate any comments/questions on it before
the application period proper begins.
Thanks,
Bryan Donlan
=== Project Goals ===
I propose to implement a subversion filesystem driver (libsvn-fs-git) that
uses a git repository as its backing store. Commits will be supported either
directly in the git repository, or in the corresponding subversion repository,
and automatically mirrored to the other side as appropriate.
I intend to support the following:
* Full or near full (possibly forbidding modification of the toplevel
trunk/ branches/ tags/ structure) read/write access from subversion
* svnadmin create/dump/load to convert existing subversion repositories
* Support for wrapping a pre-existing git repository and presenting it
as a subversion repository
* Support for mapping git branches and tags onto subversion branches
and tags (and vice versa)
* Support for syncing svn:executable with git file mode information
* Representation of git merge data using svk:merge and/or svn:mergeinfo
* Syncing .gitignore and svn:ignore data
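A minimal sketch of the svn:executable mapping from the list above. It assumes that only regular files are involved; git records regular files as mode 100644 or 100755, and Subversion marks executables by the presence of the svn:executable property. The function and macro names are hypothetical.

```c
#include <stdbool.h>

/* Git tree-entry modes (octal) for regular and executable files. */
#define GIT_MODE_REGULAR    0100644u
#define GIT_MODE_EXECUTABLE 0100755u

/* True if a file with this git mode should carry svn:executable. */
bool git_mode_is_svn_executable(unsigned int mode)
{
    return mode == GIT_MODE_EXECUTABLE;
}

/* Pick the git mode for a regular file, given whether svn:executable is set. */
unsigned int git_mode_from_svn_executable(bool has_svn_executable)
{
    return has_svn_executable ? GIT_MODE_EXECUTABLE : GIT_MODE_REGULAR;
}
```

Symlinks and submodule entries use other modes and would need separate handling.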
As both subversion and git are written in C, this driver will also be in C.
Here are some tentative milestones:
* Read-only access from SVN to the master branch (no trunk/ etc layout)
= Conversion of git commit information into svn revprops
= git mode/.gitignore -> svn property conversion here?
* Read-write access from SVN to the master branch
= Map svn usernames to git full name/email according to a configuration map
- how should git commits with names unknown to svn be handled? Just pass
them through, spaces and <@> as well?
= Bidirectional svn:executable and svn:ignore conversion.
= Copyfrom and file property information needs to be recorded
= Test importing a largish repository (without converting merge information)
to git (the svn toplevel stuff would be left as-is in the git tree)
= Consider developing git-svn-fs on a git-svn-fs repository itself for
testing purposes
* Standard toplevel SVN layout (trunk/ tags/ branches/)
= SVN branch creation might come a bit later
= Test importing a largish repository with tags and branches carried across
(might not efficiently support copy-from information)
* Merge information annotation (git->svn)
= Try to guess the copy source for a new tag or branch - and for merges
* Merge information annotation (svn->git)
* Import of a largish repository with svk or similar merge information into git,
and vice versa (eg, exporting git.git with merge tracking as a subversion
repo)
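The username map in the read-write milestone could be read from a simple text file. The sketch below assumes one entry per line in the form "login = Full Name <email>", which is the layout git-svn accepts for its authors file; the proposal itself does not fix a format, and the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct author {
    char login[64];   /* svn username */
    char name[128];   /* git full name */
    char email[128];  /* git email address */
};

/*
 * Parse one map line of the assumed form "login = Full Name <email>".
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_author_line(const char *line, struct author *a)
{
    size_t n;

    if (sscanf(line, " %63[^ \t=] = %127[^<]<%127[^>]>",
               a->login, a->name, a->email) != 3)
        return -1;

    /* The name field picks up the blanks before '<'; trim them. */
    n = strlen(a->name);
    while (n > 0 && a->name[n - 1] == ' ')
        a->name[--n] = '\0';
    return n > 0 ? 0 : -1;
}
```

For example, "jdoe = John Doe <jdoe@example.com>" yields login "jdoe", name "John Doe", and email "jdoe@example.com". What to do with git authors that have no entry is the open question raised in the milestone.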
=== Interfaces ===
As mentioned before, this driver will plug into the existing subversion stack
as a filesystem driver. This immediately allows access using any of subversion's
access methods (direct filesystem access, mod_dav_svn, svnserve).
On the git side I intend to use libgit for all git repository access. If I find
it lacking a necessary feature, I will attempt to add the missing interfaces
to libgit if at all feasible.
I anticipate svn-git-fs living either in git.git's contrib or in an outside
repository. There should be few, if any, changes to git itself.
=== About me ===
I am a sophomore computer science student at the University of Maine at Orono.
I have been programming since well before I entered college, and am experienced
in C, although I have not done much work in large (in terms of number of
developers) projects. I have experience in using Subversion, including doing
merges with svk, but I am somewhat less experienced with git. I hope to become
more familiar with git both before and during this project. I also have
some ability with Japanese... but possibly not enough yet to translate the
strings and documentation in this project :)
This particular project idea caught my eye partially because I have been hoping
to convert another open-source project that I hack on [1] to git, but as one
of the other developers is testing it primarily on windows, we've been reluctant
to move there. A git<->svn gateway would be ideal to help ease the transition.
(We haven't yet tried the cygwin port, so admittedly this may be moot already)
I have submitted a small documentation patch to git.git recently[2], and lurked
on the mailing lists for a while during its early days, but I have not yet
become actively involved in development.
[1] - http://openc2e.ccdevnet.org
[2] - commit 81fa145917c40b68a5e2cca6afc6a10cdfdbd25b
=== (Tentative) design notes ===
My current plan for storing the additional information the subversion side will
need (fileprops, revprops, copyfrom information...) is to create an additional
branch on the git repository (possibly .git-svn or similar) to hold the
necessary metadata. Configuration, including author maps, branch/tag maps,
etc, would be on another branch (git-svn-config or similar).
The layout might look like this:
/tree/{trunk/,branches/,tags/} - the tree as svn currently sees it
/props/{trunk/,branches/,tags/} - file properties; props on directories will be
represented with a reserved filename (._GIT-SVN-DIRPROPS perhaps)
copyfrom information might be in /props, or in a separate tree
/revprops/NNN - revision properties for the given revision number
/revmap/NNN - a reference to the commit hash in the .git-svn branch
corresponding to the given subversion revision number
Each subversion commit corresponds to two .git-svn commits: one to update
tree, props, and revprops, and one to update revmap. The first commit will
additionally have the following metadata in its commit:
git-svn-revno NNN
git-svn-parent (commit hash of corresponding git-side commit)
If the commit was initiated from the svn side, the git-side commit will be
committed first, and will contain a git-svn-revno field as well. The overall
commit will be performed while holding a git-svn-fs-specific lock (also held
when replicating new git commits to the svn side).
If a commit is performed on the git side, the next query to the
subversion layer that checks the current youngest revision number will also
scan for updated git heads, assign revision numbers, and create the necessary
subversion metadata in the .git-svn branch.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-19 4:08 [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem Bryan Donlan
@ 2008-03-20 4:31 ` Sam Vilain
2008-03-20 4:56 ` Shawn O. Pearce
2008-03-22 5:02 ` Bryan Donlan
2 siblings, 0 replies; 11+ messages in thread
From: Sam Vilain @ 2008-03-20 4:31 UTC (permalink / raw)
To: Bryan Donlan; +Cc: git
Bryan Donlan wrote:
> Here are some tentative milestones:
> * Read-only access from SVN to the master branch (no trunk/ etc layout)
> = Conversion of git commit information into svn revprops
> = git mode/.gitignore -> svn property conversion here?
This seems like a large milestone. Can you break this up any more?
For instance, your design notes on storing the necessary mapping
information are good. How about a separate milestone of having a test
suite for the library functions you make for accessing that information.
I would be tempted to check the protocol -
http://svn.collab.net/repos/svn/trunk/subversion/libsvn_ra_svn/protocol
- and make milestones for each request type that the protocol allows
for. Perhaps there is a more relevant list that you can find, such as
groups of tests in the back-end test suite that ships with Subversion.
Even taking the list of svn sub-commands, and deciding which fit into
each category would be a good enhancement.
> * Read-write access from SVN to the master branch
> = Map svn usernames to git full name/email according to a configuration map
> - how should git commits with names unknown to svn be handled? Just pass
> them through, spaces and <@> as well?
Meh. Just ignore them, and set revprops with all of the git committer
information.
> = Bidirectional svn:executable and svn:ignore conversion.
> = Copyfrom and file property information needs to be recorded
> = Test importing a largish repository (without converting merge information)
> to git (the svn toplevel stuff would be left as-is in the git tree)
> = Consider developing git-svn-fs on a git-svn-fs repository itself for
> testing purposes
An honourable notion, but I'd steer away from worrying about
self-hosting, if it is irrelevant to the task at hand. Focus more on
finding a good test suite to check that you support all the operations.
Eg, can the test suite bundled with the Subversion project run against
your back-end?
> * Standard toplevel SVN layout (trunk/ tags/ branches/)
> = SVN branch creation might come a bit later
> = Test importing a largish repository with tags and branches carried across
> (might not efficiently support copy-from information)
> * Merge information annotation (git->svn)
> = Try to guess the copy source for a new tag or branch - and for merges
I don't like this word "guess". It might be dangerous to not
deterministically or repeatably answer a request. If any random
decisions were made, or information derived based on things that might
change, then it should be stored in your mapping information branch. In
this instance, we didn't 'guess', we decided.
> * Merge information annotation (svn->git)
> * Import of a largish repository with svk or similar merge information into git,
> and vice versa (eg, exporting git.git with merge tracking as a subversion
> repo)
Whew! That's a lot of big milestones, but it's your summer ... :)
I think the merging thing is a nice-to-have, and doing it would just
prove that you can use the metadata that you have collected well.
One thing I like about your approach is that the tracking branch itself
could be replicated, leaving an audit of what happened.
> === Interfaces ===
>
> As mentioned before, this driver will plug into the existing subversion stack
> as a filesystem driver. This immediately allows access using any of subversion's
> access methods (direct filesystem access, mod_dav_svn, svnserve).
>
> On the git side I intend to use libgit for all git repository access. If I find
> it lacking a necessary feature, I will attempt to add the missing interfaces
> to libgit if at all feasible.
AFAIK the interface for libgit is not yet finalized, so bear in mind the
application will possibly need porting work for each release.
Sam.
* Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-19 4:08 [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem Bryan Donlan
2008-03-20 4:31 ` Sam Vilain
@ 2008-03-20 4:56 ` Shawn O. Pearce
2008-03-20 6:18 ` Harvey Harrison
` (2 more replies)
2008-03-22 5:02 ` Bryan Donlan
2 siblings, 3 replies; 11+ messages in thread
From: Shawn O. Pearce @ 2008-03-20 4:56 UTC (permalink / raw)
To: Bryan Donlan; +Cc: git
Bryan Donlan <bdonlan@gmail.com> wrote:
> I'm planning to apply for the git summer of code project. My proposal
> is based on the project idea of a subversion gateway for git,
> implemented with a new subversion filesystem layer. A draft of my
> proposal follows; I'd appreciate any comments/questions on it before
> the application period proper begins.
Very cool. Have you had a chance to look at the prototype python
implementation of an SVN server that Julian Phillips started?
http://git.q42.co.uk/w/git_svn_server.git
I'm just curious what your take is regarding this approach. Why
would you choose to construct libsvn-fs-git over a standalone server?
There are several advantages and drawbacks to both approaches.
I am not advocating one over the other, but want to make sure you have
thought it through for yourself.
> I intend to support the following:
> * Full or near full (possibly forbidding modification of the toplevel
> trunk/ branches/ tags/ structure) read/write access from subversion
That's probably the only sane way to go about it; disallow read/write
on the top level, map whatever branch "HEAD" points to in Git to the
trunk/, put the other branches in branches/ and the tags under tags/.
Block everything else.
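The mapping described above can be sketched as a small function. This is only an illustration: ref_to_svn_path is a hypothetical name, and the caller is assumed to already know which branch HEAD points to.

```c
#include <stdio.h>
#include <string.h>

/*
 * Map a git ref name onto an SVN path: the branch HEAD points to becomes
 * trunk, other branches go under branches/, tags under tags/.
 * `head_branch` is the short name of the branch HEAD points to.
 * Returns 0 on success, -1 if the ref is not exposed or `out` is too small.
 */
int ref_to_svn_path(const char *ref, const char *head_branch,
                    char *out, size_t outlen)
{
    static const char heads[] = "refs/heads/";
    static const char tags[] = "refs/tags/";
    const char *name;
    int n;

    if (strncmp(ref, heads, sizeof(heads) - 1) == 0) {
        name = ref + sizeof(heads) - 1;
        if (*name == '\0')
            return -1;
        if (strcmp(name, head_branch) == 0)
            n = snprintf(out, outlen, "trunk");
        else
            n = snprintf(out, outlen, "branches/%s", name);
    } else if (strncmp(ref, tags, sizeof(tags) - 1) == 0) {
        name = ref + sizeof(tags) - 1;
        if (*name == '\0')
            return -1;
        n = snprintf(out, outlen, "tags/%s", name);
    } else {
        return -1;  /* anything else (e.g. refs/remotes/...) is blocked */
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With HEAD on master, refs/heads/master maps to trunk, refs/heads/topic to branches/topic, and refs/tags/v1.0 to tags/v1.0. Branch names containing slashes would turn into nested directories here, which a real implementation would have to decide how to handle.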
> * Support for syncing svn:executable with git file mode information
> * Representation of git merge data using svk:merge and/or svn:mergeinfo
> * Syncing .gitignore and svn:ignore data
These are gravy. Sure they are going to be difficult to make work,
but people can limp by without them. Most users who want an SVN
client to speak to a Git repository are trying to do so from a
platform that does not honor executable bits (hi Windows!) and
telling users to edit the funny ".gitignore" file to alter ignore
lists is something they can work around without too much trouble
if they are already able to modify and commit files.
Though their clients won't provide the proper ignore support out
of the box. *sigh*
> As both subversion and git are written in C, this driver will also be in C.
I think you may have underestimated the challenges associated with
linking "libgit.a" (which is _not_ a library) with SVN. Critical
routines within libgit that you want to be able to invoke will do
not so nice things like leak massive amounts of memory or cause
your process to terminate if the function is fed an invalid input.
Most of the C code of Git is designed for single-shot execution.
We leak memory like mad because it is more efficient to load up what
we need, exit, and let the OS just return the pages to the free pool.
Long running processes have simply not been something we do.
> My current plan for storing the additional information the subversion side will
> need (fileprops, revprops, copyfrom information...) is to create an additional
> branch on the git repository (possibly .git-svn or similar) to hold the
> necessary metadata. Configuration, including author maps, branch/tag maps,
> etc, would be on another branch (git-svn-config or similar).
>
> The layout might look like this:
>
> /tree/{trunk/,branches/,tags/} - the tree as svn currently sees it
I don't think you'd want to put a copy of the tree inside of a tree,
as this can then get out of sync with changes made directly through
git, plus you run into issues about connecting the two histories
together in a meaningful way.
I would suggest having the root directory of the SVN tree be built
on the fly based upon the list of available branches and tags in
the Git repository (aka the output of git-show-ref).
> /props/{trunk/,branches/,tags/} - file properties; props on directories will be
> represented with a reserved filename (._GIT-SVN-DIRPROPS perhaps)
> copyfrom information might be in /props, or in a separate tree
How critical are file properties to an SVN client for proper
functioning? Given the challenges already in front of you for this
project I would almost encourage you to avoid dealing with file
level properties. It's hard enough to make something that speaks SVN
on the wire but reads/writes Git on disk, not to mention you have
to somehow "flatten" the Git DAG down into a sequential revision
namespace to make the SVN clients happy. So deferring property
support until later may be wise.
> /revprops/NNN - revision properties for the given revision number
Ditto. Aside from the special merge properties you mentioned,
I wonder if you can simply avoid implementing support for these
early on.
> /revmap/NNN - a reference to the commit hash in the .git-svn branch
> corresponding to the given subversion revision number
How about using a simple flat file interface? To initially prime
the file you can do something like:
git rev-list --topo-order --date-order --reverse --all >.git/svn-map
and then number the revisions by the line number that they appear on.
Locating a Git SHA-1 for a specific SVN revision would be a simple
case of lseek(fd, 41 * rev, SEEK_SET). Going the other direction
would be more of a challenge, but is still doable.
Updating the file should just require appending new commits; if
the SVN client wants a new commit you append on and return the
line number. If Git has caused new commits not in this file you
need to rebuild the log. This would have to be done incrementally,
to prevent changing a prior SVN revision number that clients may
already know about.
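A minimal sketch of the flat-file lookup and append, assuming fixed 41-byte records (40 hex digits plus a newline) and that revision 1 is the first line, so the record for `rev` sits at offset 41 * (rev - 1). The function names are hypothetical, and no locking is shown, so concurrent writers would need to be serialized by the caller.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SHA1_HEX_LEN 40
#define MAP_RECORD_LEN (SHA1_HEX_LEN + 1)  /* 40 hex digits plus '\n' */

/*
 * Read the commit id recorded for SVN revision `rev` (1-based).
 * Returns 0 on success, -1 if the revision is out of range or on I/O error.
 */
int svn_map_lookup(const char *path, long rev, char out[SHA1_HEX_LEN + 1])
{
    char record[MAP_RECORD_LEN];
    ssize_t n;
    int fd;

    if (rev < 1)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (lseek(fd, (off_t)MAP_RECORD_LEN * (rev - 1), SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    n = read(fd, record, sizeof(record));
    close(fd);
    if (n != (ssize_t)sizeof(record) || record[SHA1_HEX_LEN] != '\n')
        return -1;
    memcpy(out, record, SHA1_HEX_LEN);
    out[SHA1_HEX_LEN] = '\0';
    return 0;
}

/*
 * Append a commit id and return the revision number it was assigned,
 * i.e. the line number of the new record. Returns -1 on error.
 */
long svn_map_append(const char *path, const char *sha1_hex)
{
    char record[MAP_RECORD_LEN];
    off_t end;
    int fd;

    if (strlen(sha1_hex) != SHA1_HEX_LEN)
        return -1;
    memcpy(record, sha1_hex, SHA1_HEX_LEN);
    record[SHA1_HEX_LEN] = '\n';

    fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, record, sizeof(record)) != (ssize_t)sizeof(record)) {
        close(fd);
        return -1;
    }
    end = lseek(fd, 0, SEEK_END);  /* file size after the append */
    close(fd);
    return end < 0 ? -1 : (long)(end / MAP_RECORD_LEN);
}
```

Because existing records are never rewritten, revision numbers that clients already know stay stable. The reverse lookup (commit id to revision) would need a scan or a second index.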
--
Shawn.
* Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-20 4:56 ` Shawn O. Pearce
@ 2008-03-20 6:18 ` Harvey Harrison
2008-03-20 9:22 ` Julian Phillips
2008-03-20 10:01 ` Jakub Narebski
2 siblings, 0 replies; 11+ messages in thread
From: Harvey Harrison @ 2008-03-20 6:18 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Bryan Donlan, git
On Thu, 2008-03-20 at 00:56 -0400, Shawn O. Pearce wrote:
> Bryan Donlan <bdonlan@gmail.com> wrote:
> > /revmap/NNN - a reference to the commit hash in the .git-svn branch
> > corresponding to the given subversion revision number
>
> How about using a simple flat file interface? To initially prime
> the file you can do something like:
>
> git rev-list --topo-order --date-order --reverse --all >.git/svn-map
>
> and then number the revisions by the line number that they appear on.
> Locating a Git SHA-1 for a specific SVN revision would be a simple
> case of lseek(fd, 41 * rev, SEEK_SET). Going the other direction
> would be more of a challenge, but is still doable.
>
> Updating the file should just require appending new commits; if
> the SVN client wants a new commit you append on and return the
> line number. If Git has caused new commits not in this file you
> need to rebuild the log. This would have to be done incrementally,
> to prevent changing a prior SVN revision number that clients may
> already know about.
Why not just copy the rev_map format git-svn already uses? It's pretty
efficient.
Harvey
* Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-20 4:56 ` Shawn O. Pearce
2008-03-20 6:18 ` Harvey Harrison
@ 2008-03-20 9:22 ` Julian Phillips
2008-03-20 10:01 ` Jakub Narebski
2 siblings, 0 replies; 11+ messages in thread
From: Julian Phillips @ 2008-03-20 9:22 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Bryan Donlan, git
On Thu, 20 Mar 2008, Shawn O. Pearce wrote:
> Bryan Donlan <bdonlan@gmail.com> wrote:
>> I'm planning to apply for the git summer of code project. My proposal
>> is based on the project idea of a subversion gateway for git,
>> implemented with a new subversion filesystem layer. A draft of my
>> proposal follows; I'd appreciate any comments/questions on it before
>> the application period proper begins.
>
> Very cool. Have you had a chance to look at the prototype python
> implementation of an SVN server that Julian Phillips started?
>
> http://git.q42.co.uk/w/git_svn_server.git
(now with partial support for 'svn log' ... ;))
>> /props/{trunk/,branches/,tags/} - file properties; props on directories will be
>> represented with a reserved filename (._GIT-SVN-DIRPROPS perhaps)
>> copyfrom information might be in /props, or in a separate tree
>
> How critical are file properties to an SVN client for proper
> functioning? Given the challenges already in front of you for this
> project I would almost encourage you to avoid dealing with file
> level properties. It's hard enough to make something that speaks SVN
> on the wire but reads/writes Git on disk, not to mention you have
> to somehow "flatten" the Git DAG down into a sequential revision
> namespace to make the SVN clients happy. So deferring property
> support until later may be wise.
You might need to get svn:eol-style working to prevent the svn client from
munging any binary files? Can't think of any other vital properties atm.
>> /revprops/NNN - revision properties for the given revision number
>
> Ditto. Aside from the special merge properties you mentioned,
> I wonder if you can simply avoid implementing support for these
> early on.
Since you have to explicitly enable revprop editing in the subversion
repository by enabling a hook script, I should think that this was
definitely something that could be left at the bottom of the TODO list ...
Though you do need to be able to convert commit info into the appropriate
revprops (e.g. commit msg -> svn:log revprop)
--
Julian
---
Often statistics are used as a drunken man uses lampposts -- for support
rather than illumination.
* Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-20 4:56 ` Shawn O. Pearce
2008-03-20 6:18 ` Harvey Harrison
2008-03-20 9:22 ` Julian Phillips
@ 2008-03-20 10:01 ` Jakub Narebski
2 siblings, 0 replies; 11+ messages in thread
From: Jakub Narebski @ 2008-03-20 10:01 UTC (permalink / raw)
To: git
[Cc: Shawn O. Pearce <spearce@spearce.org>,
Bryan Donlan <bdonlan@gmail.com>,
git@vger.kernel.org]
Shawn O. Pearce wrote:
> Bryan Donlan <bdonlan@gmail.com> wrote:
>> /revmap/NNN - a reference to the commit hash in the .git-svn branch
>> corresponding to the given subversion revision number
>
> How about using a simple flat file interface? To initially prime
> the file you can do something like:
>
> git rev-list --topo-order --date-order --reverse --all \
> >.git/svn-map
>
> and then number the revisions by the line number that they appear on.
> Locating a Git SHA-1 for a specific SVN revision would be a simple
> case of lseek(fd, 41 * rev, SEEK_SET). Going the other direction
> would be more of a challenge, but is still doable.
>
> Updating the file should just require appending new commits; if
> the SVN client wants a new commit you append on and return the
> line number. If Git has caused new commits not in this file you
> need to rebuild the log. This would have to be done incrementally,
> to prevent changing a prior SVN revision number that clients may
> already know about.
By the way, have you looked into what git-svn uses? IIRC it had some
improvements to avoid spending more disk space on SVN revno <-> Git SHA-1
mapping than on the repository itself...
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-19 4:08 [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem Bryan Donlan
2008-03-20 4:31 ` Sam Vilain
2008-03-20 4:56 ` Shawn O. Pearce
@ 2008-03-22 5:02 ` Bryan Donlan
2008-03-22 11:35 ` thread-safe libgit.a as a GSoC project, was " Johannes Schindelin
2 siblings, 1 reply; 11+ messages in thread
From: Bryan Donlan @ 2008-03-22 5:02 UTC (permalink / raw)
To: git
Cc: Sam Vilain, Shawn O. Pearce, Harvey Harrison, Julian Phillips,
Jakub Narebski
On Wed, Mar 19, 2008 at 12:08 AM, Bryan Donlan <bdonlan@gmail.com> wrote:
> Hello,
>
> I'm planning to apply for the git summer of code project. My proposal
> is based on the project idea of a subversion gateway for git,
> implemented with a new subversion filesystem layer. A draft of my
> proposal follows; I'd appreciate any comments/questions on it before
> the application period proper begins.
Hi all,
Thanks for all the comments. To try to avoid spamming the list, I've
replied in a single message; if it'd be better to reply individually
in the future, please let me know.
On Thu, Mar 20, 2008 at 12:31 AM, Sam Vilain <sam@vilain.net> wrote:
> Bryan Donlan wrote:
>
> > Here are some tentative milestones:
> > * Read-only access from SVN to the master branch (no trunk/ etc layout)
> > = Conversion of git commit information into svn revprops
> > = git mode/.gitignore -> svn property conversion here?
>
> This seems like a large milestone. Can you break this up any more?
>
> For instance, your design notes on storing the necessary mapping
> information are good. How about a separate milestone of having a test
> suite for the library functions you make for accessing that information.
That seems reasonable - eg, a milestone for cloning over a git tree
into the .git-svn branch.
> I would be tempted to check the protocol -
> http://svn.collab.net/repos/svn/trunk/subversion/libsvn_ra_svn/protocol
> - and make milestones for each request type that the protocol allows
> for. Perhaps there is a more relevant list that you can find, such as
> groups of tests in the back-end test suite that ships with Subversion.
> Even taking the list of svn sub-commands, and deciding which fit into
> each category would be a good enhancement.
I haven't decided whether I will try to take this into the subversion tree
proper - but I could try to shoehorn in the subversion tests. That
said, they tend to work by checking in test data, then verifying it,
so they won't work until write support works.
I could set milestones based on specific libsvn_fs APIs, but once I've
got the metadata cloned over I don't think any individual operation
will be particularly difficult in itself (they'd all be just giving a
view of the /tree/ in the .git-svn branch)
> > = Bidirectional svn:executable and svn:ignore conversion.
> > = Copyfrom and file property information needs to be recorded
> > = Test importing a largish repository (without converting merge information)
> > to git (the svn toplevel stuff would be left as-is in the git tree)
> > = Consider developing git-svn-fs on a git-svn-fs repository itself for
> > testing purposes
>
> An honourable notion, but I'd steer away from worrying about
> self-hosting, if it is irrelevant to the task at hand. Focus more on
> finding a good test suite to check that you support all the operations.
> Eg, can the test suite bundled with the Subversion project run against
> your back-end?
Once commits work, I think it should be possible to get it working
(it's already engineered to support two backends). svnadmin create
support might be needed as well though.
> > * Standard toplevel SVN layout (trunk/ tags/ branches/)
> > = SVN branch creation might come a bit later
> > = Test importing a largish repository with tags and branches carried across
> > (might not efficiently support copy-from information)
> > * Merge information annotation (git->svn)
> > = Try to guess the copy source for a new tag or branch - and for merges
>
> I don't like this word "guess". It might be dangerous to not
> deterministically or repeatably answer a request. If any random
> decisions were made, or information derived based on things that might
> change, then it should be stored in your mapping information branch. In
> this instance, we didn't 'guess', we decided.
Indeed, it would be saved by creating subversion commits (recorded in
the .git-svn branch as changes to /tree etc) corresponding to any
changes to branches or tags. The reason I say 'guess' is that, since a
git commit does not belong to any one branch, it is difficult to attribute
it to a single branch. eg:
branch2     C
           /
branch1   B-D
         /
master  A---E
Commit 'B' would have to be attributed to one or the other of branch1
or branch2, but given just the final state of (C,D,E) we can't
uniquely determine which it should go to. Thus some kind of heuristic
will be needed. Merges can be worse:
branch   D-----E
        /     / \
master A-----B---C
If commit 'E' is from a remote branch, we won't have enough branches
to go around, and either the merge information would be discarded
(leaving an incomplete view of history), or an automatically-named
branch would need to be made with the history of the other branch.
> > * Merge information annotation (svn->git)
> > * Import of a largish repository with svk or similar merge information into git,
> > and vice versa (eg, exporting git.git with merge tracking as a subversion
> > repo)
>
> Whew! That's a lot of big milestones, but it's your summer ... :)
>
> I think the merging thing is a nice-to-have, and doing it would just
> prove that you can use the metadata that you have collected well.
Okay, I think I'll move some of the milestones into 'nice to have, but
might run out of time' then :)
On Thu, Mar 20, 2008 at 12:56 AM, Shawn O. Pearce <spearce@spearce.org> wrote:
> Bryan Donlan <bdonlan@gmail.com> wrote:
> > I'm planning to apply for the git summer of code project. My proposal
> > is based on the project idea of a subversion gateway for git,
> > implemented with a new subversion filesystem layer. A draft of my
> > proposal follows; I'd appreciate any comments/questions on it before
> > the application period proper begins.
>
> Very cool. Have you had a chance to look at the prototype python
> implementation of an SVN server that Julian Phillips started?
>
> http://git.q42.co.uk/w/git_svn_server.git
>
> I'm just curious what your take is regarding this approach. Why
> would you choose to construct libsvn-fs-git over a standalone server?
> There are several advantages and drawbacks to both approaches.
> I am not advocating one over the other, but want to make sure you have
> thought it through for yourself.
The main reason is generality - I want to give the user the choice of
svn://, http://, or even (for testing, I hope) file:// access to the
repository. Also, it may be possible to convert a subversion
repository to git with just a svnadmin load, once sufficient support
is in place.
Allowing the svn server and repository access layer to go in front of
the git filesystem also lets me benefit from any sanity checks in
there, hopefully reducing the impact of any possible security bugs.
Finally, it seems a bit simpler of an API than the libsvn_ra API that
svnserve wraps. For one, I don't need to keep track of the client's
working copy state, nor do I need to mess with non-blocking IO to
avoid deadlocks.
> > I intend to support the following:
> > * Full or near full (possibly forbidding modification of the toplevel
> > trunk/ branches/ tags/ structure) read/write access from subversion
>
> That's probably the only sane way to go about it; disallow read/write
> on the top level, map whatever branch "HEAD" points to in Git to the
> trunk/, put the other branches in branches/ and the tags under tags/.
> Block everything else.
It'd be nice to unblock later, to allow svnadmin load to effectively
convert a svn repository to git.
> > * Support for syncing svn:executable with git file mode information
> > * Representation of git merge data using svk:merge and/or svn:mergeinfo
> > * Syncing .gitignore and svn:ignore data
>
> These are gravy. Sure they are going to be difficult to make work,
> but people can limp by without them. Most users who want an SVN
> client to speak to a Git repository are trying to do so from a
> platform that does not honor executable bits (hi Windows!) and
> telling users to edit the funny ".gitignore" file to alter ignore
> lists is something they can work around without too much trouble
> if they are already able to modify and commit files.
>
> Though their clients won't provide the proper ignore support out
> of the box. *sigh*
Mhm, perhaps I'll move this to a later (would-be-nice-if-there's-time)
milestone then.
> > As both subversion and git are written in C, this driver will also be in C.
>
> I think you may have underestimated the challenges associated with
> linking "libgit.a" (which is _not_ a library) with SVN. Critical
> routines within libgit that you want to be able to invoke will do
> not so nice things like leak massive amounts of memory or cause
> your process to terminate if the function is fed an invalid input.
>
> Most of the C code of Git is designed for single-shot execution.
> We leak memory like mad because it is more efficient to load up what
> we need, exit, and let the OS just return the pages to the free pool.
> Long running processes have simply not been something we do.
Mmm, and on further inspection there are global variables everywhere, no
locking, and what looks like not much support for multiple git
directories. I might have to skip libgit and just write my own code to
access the git object store - subversion requires thread safety, and
support for opening multiple filesystems (or even the same filesystem
multiple times).
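
To make the "own code" idea concrete, here is a rough sketch of reading a loose object without any process-wide state: each handle carries its own git directory, so several repositories (or the same one twice) can be open at once. It is Python for brevity; `ObjectStore` and `read_loose` are hypothetical names, and packed objects, alternates and locking are deliberately left out.

```python
import os
import zlib


class ObjectStore:
    """Handle for one git directory; holds no global or mutable state."""

    def __init__(self, git_dir):
        self.git_dir = git_dir

    def read_loose(self, sha1_hex):
        """Return (type, body) for a loose object stored under objects/xx/yyyy..."""
        path = os.path.join(self.git_dir, "objects", sha1_hex[:2], sha1_hex[2:])
        with open(path, "rb") as f:
            raw = zlib.decompress(f.read())
        # A loose object is "<type> <size>\0<body>", zlib-compressed on disk.
        header, sep, body = raw.partition(b"\0")
        if not sep:
            raise ValueError("missing object header terminator")
        obj_type, size = header.split(b" ")
        if int(size) != len(body):
            raise ValueError("object size does not match header")
        return obj_type.decode("ascii"), body
```

Because the handle is immutable after construction, two threads can share one instance, or each can open its own, without stepping on each other.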
> > My current plan for storing the additional information the subversion side will
> > need (fileprops, revprops, copyfrom information...) is to create an additional
> > branch on the git repository (possibly .git-svn or similar) to hold the
> > necessary metadata. Configuration, including author maps, branch/tag maps,
> > etc, would be on another branch (git-svn-config or similar).
> >
> > The layout might look like this:
> >
> > /tree/{trunk/,branches/,tags/} - the tree as svn currently sees it
>
> I don't think you'd want to put a copy of the tree inside of a tree,
> as this can then get out of sync with changes made directly through
> git, plus you run into issues about connecting the two histories
> together in a meaningful way.
>
> I would suggest having the root directory of the SVN tree be built
> on the fly based upon the list of available branches and tags in
> the Git repository (aka the output of git-show-ref).
Subversion absolutely requires that revisions be immutable - the
client will do things like present the server with a revision number
and ask for all changes since then. As such, once we decide what a
revision looks like, we must record that and use the same tree in the
future. Explicitly saving the tree seemed to me like the most
effective way to do that - and it also means many of the filesystem
access APIs can simply directly inspect this git tree.
As for synchronization, it'll be necessary to explicitly convert git
commits to subversion revisions anyway, as actions such as making a
new branch, which in git doesn't need a new commit, do require a copy
operation and commit in subversion.
> > /props/{trunk/,branches/,tags/} - file properties; props on directories will be
> > represented with a reserved filename (._GIT-SVN-DIRPROPS perhaps)
> > copyfrom information might be in /props, or in a separate tree
>
> How critical are file properties to an SVN client for proper
> functioning? Given the challenges already in front of you for this
> project I would almost encourage you to avoid dealing with file
> level properties. It's hard enough to make something that speaks SVN
> on the wire but reads/writes Git on disk, not to mention you have
> to somehow "flatten" the Git DAG down into a sequential revision
> namespace to make the SVN clients happy. So deferring property
> support until later may be wise.
Okay. Property support is probably not too difficult, but I see no
problem with moving it to a later milestone.
> > /revprops/NNN - revision properties for the given revision number
>
> Ditto. Aside from the special merge properties you mentioned,
> I wonder if you can simply avoid implementing support for these
> early on.
Merge properties are actually file properties. Revprops are needed for
svn log support; they store the commit message, author, and date at
the least. It may be possible to get the subversion client limping
along without them though.
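
As a rough illustration of the minimum svn log needs, the sketch below derives the three standard revprops (svn:author, svn:date, svn:log) from one commit's metadata. It is Python for brevity; `revprops_from_commit`, its parameters, and the fall-back of using the raw email address when the author map has no entry are all assumptions, not decisions made in this thread.

```python
from datetime import datetime, timezone


def revprops_from_commit(email, timestamp, message, author_map):
    """Build the svn:author / svn:date / svn:log revprops for one git commit.

    timestamp is seconds since the Unix epoch; author_map maps git email
    addresses to svn usernames (hypothetical configuration).
    """
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {
        # Placeholder policy: unknown authors keep their email address.
        "svn:author": author_map.get(email, email),
        # Subversion stores dates as UTC with microsecond precision.
        "svn:date": when.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
        "svn:log": message,
    }
```

Since everything here is computed from the commit, read-only revprops would need no extra storage; only editing them would.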
> > /revmap/NNN - a reference to the commit hash in the .git-svn branch
> > corresponding to the given subversion revision number
>
> How about using a simple flat file interface? To initially prime
> the file you can do something like:
>
> git rev-list --topo-order --date-order --reverse --all >.git/svn-map
>
> and then number the revisions by the line number that they appear on.
> Locating a Git SHA-1 for a specific SVN revision would be a simple
> case of lseek(fd, 41 * rev, SEEK_SET). Going the other direction
> would be more of a challenge, but is still doable.
>
> Updating the file should just require appending new commits; if
> the SVN client wants a new commit you append on and return the
> line number. If Git has caused new commits not in this file you
> need to rebuild the log. This would have to be done incrementally,
> to prevent changing a prior SVN revision number that clients may
> already know about.
Hm, yes, that does seem a better way. Actually since the canonical
revision numbers are in the commits on the metadata branch anyway,
this can just be a cache, and not part of the git tree itself.
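
A small sketch of such a flat-file cache, in Python for brevity. Each record is 40 hex digits plus a newline (41 bytes), so lookup is a single seek. One assumption to flag: the sketch numbers revisions from 1 by line, which puts revision N at offset 41 * (N - 1); the names `append_commit` and `lookup` are made up for the example.

```python
import os

RECORD_LEN = 41  # 40 hex digits of SHA-1 plus a trailing newline


def append_commit(map_path, sha1_hex):
    """Append one commit to the map and return the revision number it received."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character hex SHA-1")
    with open(map_path, "ab") as f:
        f.write(sha1_hex.encode("ascii") + b"\n")
    # Existing records are never rewritten, so earlier revision numbers stay stable.
    return os.path.getsize(map_path) // RECORD_LEN


def lookup(map_path, rev):
    """Return the SHA-1 recorded for svn revision rev (1-based)."""
    if rev < 1:
        raise KeyError(rev)
    with open(map_path, "rb") as f:
        f.seek(RECORD_LEN * (rev - 1))
        record = f.read(RECORD_LEN)
    if len(record) != RECORD_LEN:
        raise KeyError(rev)
    return record[:40].decode("ascii")
```

If the file is lost it can only be treated as a cache when the metadata branch really does record the canonical numbering; the sketch does not rebuild it.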
On Thu, Mar 20, 2008 at 2:18 AM, Harvey Harrison
<harvey.harrison@gmail.com> wrote:
> On Thu, 2008-03-20 at 00:56 -0400, Shawn O. Pearce wrote:
> > Bryan Donlan <bdonlan@gmail.com> wrote:
>
> > > /revmap/NNN - a reference to the commit hash in the .git-svn branch
> > > corresponding to the given subversion revision number
> >
> > How about using a simple flat file interface? To initially prime
> > the file you can do something like:
> >
> > git rev-list --topo-order --date-order --reverse --all >.git/svn-map
> >
> > and then number the revisions by the line number that they appear on.
> > Locating a Git SHA-1 for a specific SVN revision would be a simple
> > case of lseek(fd, 41 * rev, SEEK_SET). Going the other direction
> > would be more of a challenge, but is still doable.
> >
> > Updating the file should just require appending new commits; if
> > the SVN client wants a new commit you append on and return the
> > line number. If Git has caused new commits not in this file you
> > need to rebuild the log. This would have to be done incrementally,
> > to prevent changing a prior SVN revision number that clients may
> > already know about.
>
> Why not just copy the rev_map format git-svn already uses, it's pretty
> efficient.
2008/3/20 Jakub Narebski <jnareb@gmail.com>:
> By the way, have you looked into what git-svn uses? IIRC it had some
> improvements to avoid spending more disk space on SVN revno <-> Git SHA-1
> mapping than on the repository itself...
git-svn's rev_map format is designed to support gaps in revision
numbers. Since I need to record all revisions, the four-byte revision
number field can go - but apart from that, that seems fine for the
revision map cache.
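
Roughly what dropping the revision-number field would look like, sketched in Python: with no gaps, each record is just the 20 raw SHA-1 bytes and the revision number is implied by position. The 1-based numbering and the function names are assumptions for illustration; the reverse lookup is a plain linear scan, not an optimized index.

```python
SHA1_LEN = 20  # raw SHA-1 bytes per record; no per-record revision number


def pack_revmap(sha1_hex_list):
    """Pack SHA-1s, in revision order, into one dense byte string."""
    for h in sha1_hex_list:
        if len(h) != 2 * SHA1_LEN:
            raise ValueError("expected a 40-character hex SHA-1")
    return b"".join(bytes.fromhex(h) for h in sha1_hex_list)


def sha_for_rev(buf, rev):
    """Return the hex SHA-1 for revision rev (1-based)."""
    start = SHA1_LEN * (rev - 1)
    if rev < 1 or start + SHA1_LEN > len(buf):
        raise KeyError(rev)
    return buf[start:start + SHA1_LEN].hex()


def rev_for_sha(buf, sha1_hex):
    """Return the revision number for a commit by scanning every record."""
    needle = bytes.fromhex(sha1_hex)
    for start in range(0, len(buf), SHA1_LEN):
        if buf[start:start + SHA1_LEN] == needle:
            return start // SHA1_LEN + 1
    raise KeyError(sha1_hex)
```

Compared with 41-byte text lines this roughly halves the size of the map, at the cost of no longer being readable with a pager.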
On Thu, Mar 20, 2008 at 5:22 AM, Julian Phillips
<julian@quantumfyre.co.uk> wrote:
> On Thu, 20 Mar 2008, Shawn O. Pearce wrote:
>
> > Bryan Donlan <bdonlan@gmail.com> wrote:
> >> /props/{trunk/,branches/,tags/} - file properties; props on directories will be
> >> represented with a reserved filename (._GIT-SVN-DIRPROPS perhaps)
> >> copyfrom information might be in /props, or in a separate tree
> >
> > How critical are file properties to an SVN client for proper
> > functioning? Given the challenges already in front of you for this
> > project I would almost encourage you to avoid dealing with file
> > level properties. It's hard enough to make something that speaks SVN
> > on the wire but reads/writes Git on disk, not to mention you have
> > to somehow "flatten" the Git DAG down into a sequential revision
> > namespace to make the SVN clients happy. So deferring property
> > support until later may be wise.
>
> You might need to get svn:eol-style working to prevent the svn client from
> munging any binary files? Can't think of any other vital properties atm.
The subversion client won't touch your binaries /unless/ svn:eol-style
is set. If I just return an empty set of properties it'll leave them
alone.
Dealing with eol-style in particular is somewhat hard with git - it
works by transforming the file into a canonical end-of-line style on
the client before sending it to the server, and transforming back when
it's checked out. I don't know how it'd behave if someone on the git
side committed with a different eol style than it expects.
>
>
> >> /revprops/NNN - revision properties for the given revision number
> >
> > Ditto. Aside from the special merge properties you mentioned,
> > I wonder if you can simply avoid implementing support for these
> > early on.
>
> Since you have to explicitly enable revprop editing in the subversion
> repository by enabling a hook script, I should think that this was
> definitely something that could be left at the bottom of the TODO list ...
>
> Though you do need to be able to convert commit info into the appropriate
> revprops (e.g. commit msg -> svn:log revprop)
Right, and at that point adding full revprop editing ought not to be too hard.
Anyway, I'll rework my milestones a bit before I submit the proper
proposal. Also, after looking at libgit in a bit more detail, I think
it might be necessary to not use it after all, as subversion requires
support for multiple open repositories, as well as thread safety (at
least when accessing different open repos from different threads).
Perhaps a thread-safe git library would be a nice SoC project as well?
:)
Thanks for the feedback,
Bryan Donlan
* thread-safe libgit.a as a GSoC project, was Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-22 5:02 ` Bryan Donlan
@ 2008-03-22 11:35 ` Johannes Schindelin
2008-03-23 1:34 ` Govind Salinas
2008-03-24 19:50 ` Bryan Donlan
0 siblings, 2 replies; 11+ messages in thread
From: Johannes Schindelin @ 2008-03-22 11:35 UTC (permalink / raw)
To: Bryan Donlan
Cc: git, Sam Vilain, Shawn O. Pearce, Harvey Harrison,
Julian Phillips, Jakub Narebski
Hi,
On Sat, 22 Mar 2008, Bryan Donlan wrote:
> On Wed, Mar 19, 2008 at 12:08 AM, Bryan Donlan <bdonlan@gmail.com> wrote:
>
> > I'm planning to apply for the git summer of code project. My proposal
> > is based on the project idea of a subversion gateway for git,
> > implemented with a new subversion filesystem layer. A draft of my
> > proposal follows; I'd appreciate any comments/questions on it before
> > the application period proper begins.
>
> Thanks for all the comments. To try to avoid spamming the list, I've
> replied in a single message, if it'd be better to reply individually
> in the future please let me know.
My preference is to have single replies, possibly changing the subject
("xyz, was Re: blabla"), but it is maybe just me.
> Also, after looking at libgit in a bit more detail, I think it might be
> necessary to not use it after all, as subversion requires support for
> multiple open repositories, as well as thread safety (at least when
> accessing different open repos from different threads). Perhaps a
> thread-safe git library would be a nice SoC project as well?
As I said on IRC yesterday, I think that such a libgit.a would be nice,
_but_
- a lot of git programs expect to be one-shot, and libgit.a shows that,
- not many people will help you with your effort, but just ignore it and
actively introduce things that do not help libification (at least that's
my experience),
- unless you have a proper need for such a library, I do not think there
is enough motivation to actually get it to completion.
I once thought that libification would be nice, and important, but as I do
not need it myself, I reversed my opinion.
Ciao,
Dscho
* Re: thread-safe libgit.a as a GSoC project, was Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-22 11:35 ` thread-safe libgit.a as a GSoC project, was " Johannes Schindelin
@ 2008-03-23 1:34 ` Govind Salinas
2008-03-23 2:10 ` Johannes Schindelin
2008-03-24 19:50 ` Bryan Donlan
1 sibling, 1 reply; 11+ messages in thread
From: Govind Salinas @ 2008-03-23 1:34 UTC (permalink / raw)
To: Johannes Schindelin
Cc: Bryan Donlan, git, Sam Vilain, Shawn O. Pearce, Harvey Harrison,
Julian Phillips, Jakub Narebski
On Sat, Mar 22, 2008 at 6:35 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Sat, 22 Mar 2008, Bryan Donlan wrote:
> > Also, after looking at libgit in a bit more detail, I think it might be
> > necessary to not use it after all, as subversion requires support for
> > multiple open repositories, as well as thread safety (at least when
> > accessing different open repos from different threads). Perhaps a
> > thread-safe git library would be a nice SoC project as well?
>
> As I said on IRC yesterday, I think that such a libgit.a would be nice,
> _but_
>
> - a lot of git programs expect to be one-shot, and libgit.a shows that,
>
> - not many people will help you with your effort, but just ignore it and
> actively introduce things that do not help libification (at least that's
> my experience),
>
> - unless you have a proper need for such a library, I do not think there
> is enough motivation to actually get it to completion.
>
I would use it for my pyrite work, although it will be some time before I
could contribute to such an effort. I expect it would be useful for
anyone who wants to make a language binding that uses native
git underneath.
Just so you know *someone* will use it.
Thanks,
Govind.
* Re: thread-safe libgit.a as a GSoC project, was Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-23 1:34 ` Govind Salinas
@ 2008-03-23 2:10 ` Johannes Schindelin
0 siblings, 0 replies; 11+ messages in thread
From: Johannes Schindelin @ 2008-03-23 2:10 UTC (permalink / raw)
To: Govind Salinas
Cc: Bryan Donlan, git, Sam Vilain, Shawn O. Pearce, Harvey Harrison,
Julian Phillips, Jakub Narebski
Hi,
On Sat, 22 Mar 2008, Govind Salinas wrote:
> I would use it for my pyrite work, although it will be some time before
> I could contribute to such an effort. I expect it would be useful for
> anyone who wants to make a language binding that uses native git
> underneath.
>
> Just so you know *someone* will use it.
I know people would use it. My point was: those people that want to use
it have the best starting point to make it happen, because they (should)
actually care about libification.
Ciao,
Dscho
* Re: thread-safe libgit.a as a GSoC project, was Re: [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem
2008-03-22 11:35 ` thread-safe libgit.a as a GSoC project, was " Johannes Schindelin
2008-03-23 1:34 ` Govind Salinas
@ 2008-03-24 19:50 ` Bryan Donlan
1 sibling, 0 replies; 11+ messages in thread
From: Bryan Donlan @ 2008-03-24 19:50 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
On Sat, Mar 22, 2008 at 7:35 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Sat, 22 Mar 2008, Bryan Donlan wrote:
>
> > On Wed, Mar 19, 2008 at 12:08 AM, Bryan Donlan <bdonlan@gmail.com> wrote:
> >
> > > I'm planning to apply for the git summer of code project. My proposal
> > > is based on the project idea of a subversion gateway for git,
> > > implemented with a new subversion filesystem layer. A draft of my
> > > proposal follows; I'd appreciate any comments/questions on it before
> > > the application period proper begins.
> >
> > Thanks for all the comments. To try to avoid spamming the list, I've
> > replied in a single message, if it'd be better to reply individually
> > in the future please let me know.
>
> My preference is to have single replies, possibly changing the subject
> ("xyz, was Re: blabla"), but it is maybe just me.
>
> > Also, after looking at libgit in a bit more detail, I think it might be
> > necessary to not use it after all, as subversion requires support for
> > multiple open repositories, as well as thread safety (at least when
> > accessing different open repos from different threads). Perhaps a
> > thread-safe git library would be a nice SoC project as well?
>
> As I said on IRC yesterday, I think that such a libgit.a would be nice,
> _but_
>
> - a lot of git programs expect to be one-shot, and libgit.a shows that,
>
> - not many people will help you with your effort, but just ignore it and
> actively introduce things that do not help libification (at least that's
> my experience),
>
> - unless you have a proper need for such a library, I do not think there
> is enough motivation to actually get it to completion.
>
> I once thought that libification would be nice, and important, but as I do
> not need it myself, I reversed my opinion.
All right. If I do end up having to recreate (thread-safe,
multiple-git-dir-safe) logic for my project, I'll try to keep in mind
the possibility of spinning it off into a proper library later though
:)
Thread overview: 11+ messages
2008-03-19 4:08 [SoC RFC] libsvn-fs-git: A git backend for the subversion filesystem Bryan Donlan
2008-03-20 4:31 ` Sam Vilain
2008-03-20 4:56 ` Shawn O. Pearce
2008-03-20 6:18 ` Harvey Harrison
2008-03-20 9:22 ` Julian Phillips
2008-03-20 10:01 ` Jakub Narebski
2008-03-22 5:02 ` Bryan Donlan
2008-03-22 11:35 ` thread-safe libgit.a as a GSoC project, was " Johannes Schindelin
2008-03-23 1:34 ` Govind Salinas
2008-03-23 2:10 ` Johannes Schindelin
2008-03-24 19:50 ` Bryan Donlan