From: Jonathan Nieder <jrnieder@gmail.com>
To: Florian Achleitner <florian.achleitner@student.tugraz.at>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>,
David Barr <davidbarr@google.com>,
Git Mailing List <git@vger.kernel.org>,
Andrew Sayers <andrew-git@pileofstuff.org>,
Sverre Rabbelier <srabbelier@gmail.com>,
Dmitry Ivankov <divanorama@gmail.com>
Subject: Re: GSOC Proposal draft: git-remote-svn
Date: Mon, 2 Apr 2012 15:57:00 -0500
Message-ID: <20120402205659.GA13725@burratino>
In-Reply-To: <2487557.B8qfnaixh3@flomedio>

Hi Florian,
Florian Achleitner wrote:
> Here is my draft of the proposal for the GSoC project. RFC!
> Please comment and tell me what you think and if I understood it all right!
I like the rough idea. I also agree with Ram that the scope seems too
wide for one summer and think it would be useful to narrow the scope a
little.
Some tasks I can think of:
- getting Dmitry's importer into contrib/ and making sure it works
reliably. This might require some fixes to svnrdump, svn-fe,
and the transport-helper. Some known problems that I suspect may
still be unresolved:
- files marked with both svn:special (symlink) and svn:executable
- dealing with after-the-fact edits to the svn repository. For
example, revprops including svn:log can be and often are changed
after the fact.
- what happens when the connection to the Subversion server is
interrupted? The Subversion dump format does not have an
"end of commit" marker, so currently we can get confused and
appear to succeed.
- svn-fe does not correctly handle revs that change a text file to
a symlink or vice versa without changing its text.
- UI for importing only some revisions (e.g., "all revisions after
r1000"). Dmitry has a patch for the svn-fe plumbing to handle
this but I don't think the corresponding change for the remote
helper has been written.
- this would probably also require changes to svnrdump. What
happens when r2000 involves copying a file from a version before
r1000? If imports do not start at r0, normal dumps of r1000:
are not self-contained.
- UI for storing the mapping between Subversion revision numbers and
git commit names in the git object db somewhere. Currently we
store it in a marks file. There is a script floating around to
convert that marks file into a set of commit notes and Dmitry also
has a patch for svn-fe to make it write commit notes directly.
What happens when the notes and marks file go out of sync --- which
is authoritative?
This also implies that repeated fetches would not have to start
importing again at r1.
- Storing empty directories and path-specific properties like
svn:ignore that we don't currently handle.
- Splitting history into branches.
Somehow svn-fe has to communicate "svn cp" source and target
information to the branch mapper so we can trace history to before
the birth of the paths we are following. That is, the full history
of branches/1.7.x/ includes the early history of trunk/ if the
1.7.x branch was originally created as a copy of the trunk.
This might be able to use a mechanism similar to the one for storing
empty directories and path properties.
- UI for importing only a subset of paths (e.g., "just the trunk").
- this would probably also require changes to svnrdump. What
happens when r2000 involves copying a file from a branch we
have chosen not to import?
- Mapping authorship information from Subversion (which usually
amounts to a remote username) to something more idiomatic in git
(usually a human's name and email address) in a way that makes
round trips possible.
- Sharing an imported repository with other users of the remote
helper.
- this might involve changes to the remote helper machinery to
allow new clones to use some fetch/push ref specification
different from refs/heads/*:refs/remotes/origin/*, or it might
involve some change to core git to automatically push notes
corresponding to some refs in some situations.
- Importing <rev, path> pairs that have multiple parents. In the
Subversion model, path nodes have only one (copyfrom) parent,
but repositories can use the svn:mergeinfo property to indicate
that changes made in certain revs to another path have been
incorporated. Under what circumstances is that enough
justification to add a second parent on the git side?
- Because svn:mergeinfo is a normal path property, the branch
mapper could have enough information to take care of this with
the help of the previously mentioned facility for storing path
properties.
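To make the svn:mergeinfo question a little more concrete, here is a rough Python sketch. It is hypothetical, not code from svn-fe or the remote helper, and every name in it is made up for illustration. It parses a property value such as "/branches/feature:1000-1005,1010*" into per-path revision ranges ("N-M" is an inclusive range, a trailing "*" marks a non-inheritable range) and then applies one possible heuristic for adding a second parent: every revision that touched the source branch up to some point must be recorded as merged. The list of revisions that touched the source branch is assumed to come from the branch mapper, since mergeinfo alone does not provide it.

```python
def parse_mergeinfo(value):
    """Parse svn:mergeinfo text into {path: [(start, end, inheritable), ...]}."""
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last ':' because paths may themselves contain colons.
        path, _, ranges = line.rpartition(":")
        spans = []
        for item in ranges.split(","):
            # A trailing '*' marks a non-inheritable range.
            inheritable = not item.endswith("*")
            item = item.rstrip("*")
            if "-" in item:
                start, end = (int(x) for x in item.split("-", 1))
            else:
                start = end = int(item)
            spans.append((start, end, inheritable))
        result[path] = spans
    return result


def covers(spans, rev):
    """True if rev lies inside an inheritable range (ranges are inclusive).

    Non-inheritable ranges are conservatively treated as not merged.
    """
    return any(start <= rev <= end and inh for start, end, inh in spans)


def merged_through(mergeinfo, source, source_revs, upto):
    """Hypothetical heuristic: every rev <= upto that changed `source` is merged.

    `source_revs` (revisions that touched the source branch) is assumed to
    be supplied by the branch mapper.
    """
    spans = mergeinfo.get(source, [])
    return all(covers(spans, rev) for rev in source_revs if rev <= upto)


info = parse_mergeinfo("/branches/feature:1000-1005,1010*\n/trunk:900")
print(merged_through(info, "/branches/feature", [1001, 1004], 1005))  # True
print(merged_through(info, "/branches/feature", [1001, 1010], 1010))  # False
```

Whether "fully merged through some rev" is the right threshold is exactly the open question above; the sketch only shows that the information is available from the property once path properties are stored.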
All of the above is just for reasonable fetch support.
For push support, one early problem to solve is that pushing a commit
in such a way that re-importing it produces the same git commit id
requires permission to set the svn:date property. Is our target
audience one that already has that permission? Is that permission
something reasonable for a committer to ask for from the repository
admin in order to use the remote helper?
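The reason svn:date matters is that a git commit name is the SHA-1 of the commit object, and the committer line inside that object includes a timestamp and a timezone. The Python sketch below (the helper name is hypothetical) hashes a commit object the way git hashes any object, "commit <size>\0" followed by the raw contents, to show that two otherwise identical commits whose committer dates differ get different names. It assumes the importer derives the committer date from svn:date, so unless the pusher can set svn:date to match, the re-imported commit will not keep its original id.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """SHA-1 name of a commit object, computed the way git hashes objects."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines.append("author " + author)
    lines.append("committer " + committer)
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # Git hashes "<type> <size>\0" followed by the object contents.
    header = b"commit %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
AUTHOR = "A U Thor <author@example.com> 1333396620 -0500"

# The commit as pushed, and as re-imported with a server-assigned date that
# is one minute later and in UTC (svn:date is recorded in UTC).
pushed = commit_id(EMPTY_TREE, [], AUTHOR,
                   "A U Thor <author@example.com> 1333396620 -0500", "Fix typo\n")
reimported = commit_id(EMPTY_TREE, [], AUTHOR,
                       "A U Thor <author@example.com> 1333396680 +0000", "Fix typo\n")
print(pushed == reimported)  # False
```

Even a matching instant with a different timezone offset would change the name, which is why timezones also come up under (4) below.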
Because of the above:
> 1. Write a new bi-directional remote helper in C.
The word "new" makes me worried that you'd be throwing away whatever
work already exists. :)
[...]
> { Hmm.. so it looks like thats a lot? what do you think? }
I agree --- what you've described is more than one summer's worth
of work. Are there any aspects you're particularly interested in
focusing on? For example,
(1) If we focus on repositories without any branching structure at
all and where the user has full ability to write whatever she
pleases to the repository, I think developing a bidirectional
remote helper is feasible during the summer. Round-trip
support (i.e., commit ids staying the same with a push followed
by a fetch) is feasible with such a quick plan if we're willing
to store some git-specific junk in the repo.
(2) Regarding a tool that sits between svn-fe and the remote helper
and implements the "follow parent" rule for tracing the full
history of a single (linear) branch: I think developing that
_and_ getting it merged could fit in the summer.
(3) Regarding storing and sharing Subversion's path-specific
and revision-specific properties: I think implementing a
mechanism for that and getting it merged could fit in one
summer.
(4) Regarding handling git weirdness like distinct author and
committer names, the lack of rename information recorded at commit
time, and timezones in author and committer dates during pushes to
Subversion, in a non-invasive way that is user-friendly for the
pusher and likely to be acceptable on the receiving side for
normal projects: that could certainly fill a summer.
(5) Handling Subversion weirdness like revs that change the entire
repository at once in a many-branch repo, non-standard file modes,
and svn:log messages that were changed after the fact (which we
need to notice and act on appropriately) could fill another summer.
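The "follow parent" rule from (2) can be sketched as a walk back along copy sources. The Python below is a minimal illustration under simplifying assumptions: the hypothetical `copies` table maps each branch root to the revision that created it and its "svn cp" source (the information svn-fe would have to hand to the branch mapper), and no path is ever deleted and recreated. A real tool would key on <rev, path> pairs instead of bare paths.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input from svn-fe: branch root -> (revision that created it,
# copy source as (path, rev), or None if it was not created by "svn cp").
Copies = Dict[str, Tuple[int, Optional[Tuple[str, int]]]]


def follow_parent(copies: Copies, path: str, head_rev: int) -> List[Tuple[str, int, int]]:
    """Trace a linear branch back through its copy sources.

    Returns (path, first_rev, last_rev) segments, newest first.
    """
    segments = []
    last = head_rev
    while True:
        born, source = copies[path]
        segments.append((path, born, last))
        if source is None:
            return segments
        # Continue in the copy source, up to the revision that was copied.
        path, last = source


copies: Copies = {
    "/trunk": (1, None),
    "/branches/1.7.x": (1500, ("/trunk", 1499)),
}
print(follow_parent(copies, "/branches/1.7.x", 2000))
# [('/branches/1.7.x', 1500, 2000), ('/trunk', 1, 1499)]
```

With trunk created in r1 and branches/1.7.x copied from trunk@1499 in r1500, following the branch from r2000 yields its own r1500..r2000 segment followed by trunk's r1..r1499, i.e. the full history including the early trunk history.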
So ideally I would like 5 students working on the remote helper
project. ;-)
Hope that helps,
Jonathan