From: Jonathan Nieder <jrnieder@gmail.com>
To: Florian Achleitner <florian.achleitner2.6.31@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Ramkumar Ramachandra <artagnon@gmail.com>,
	David Barr <davidbarr@google.com>,
	Andrew Sayers <andrew-git@pileofstuff.org>,
	Sverre Rabbelier <srabbelier@gmail.com>,
	Dmitry Ivankov <divanorama@gmail.com>
Subject: Re: GSOC Proposal draft: git-remote-svn
Date: Tue, 10 Apr 2012 12:17:07 -0500
Message-ID: <20120410171707.GA3869@burratino>
In-Reply-To: <1421035.yALBSXSHGd@flomedio>

Hi,

Florian Achleitner wrote:

> Thanks for your inputs. I've now submitted a slightly updated version of my 
> proposal to google. Additionally it's on github [1].
>
> Summary of diffs:
> I'll concentrate on the fetching from svn, writing a remote helper without 
> branch detection (like svn-fe) first, and then creating the branch mapper.

Thanks for the update.

If I understand correctly, the remote helper from the first half would
do essentially the same thing as Dmitry's remote-svn-alpha script.
Since in shell script form it is very simple, I don't think it should
take more than a couple of days to write such a thing in C.

> Timeline
>
> GSoC timeline and summer holidays
> Summer holidays in Austria at 9th of July. So until the mid-term
> evaluations my git project will have to co-exist with my regular
> university work and projects. But holidays extend until the beginning
> of October, so there’s some time left to catch up after the official
> end of GSoC.

Another possibility that some people in similar situations have
followed is to start early.  That works a little better since it means
that by the time midterm evaluations come around we can have a
reasonable idea of whether a change in strategy is needed for the
project to be finished on time.

> I plan to split the project in two parts:
>
> Writing the remote helper using existing functions in vcs-svn to
> import svn history without detecting branches, like svn-fe does.
> Milestone: 9th of July, GSoC mid-term
>
> Writing a branch mapper for the remote helper that reads the config
> language (SBL) and imports branches trying to deal as good as possible
> with all the little pitfalls that will occur. Milestone: 20th of
> August, GSoC end

Could you flesh out this timeline more?  Ideally it would be nice to
have a definite plan here, even to the point of listing what patches
would need to be written, so during the summer all that would need to
happen is to execute and deal with bugs as they come.

Given the goal described here of an import with support for
automatically detecting branches, here are some rough steps I imagine
would be involved:

 . baseline: remote helper in C

 . option to import starting with a particular numbered revision.
   This would be good practice for seeing how options passed to
   "git clone -c" can be read from the config file.

 . option or URL scheme to import a single project from a large
   Subversion repository that houses several projects.  This would
   already be useful in practice since importing the entire Apache
   Software Foundation repository takes a while, which is a waste
   when one only wants the history of the Subversion project.

   How should the importer handle Subversion copy commands that
   refer to other projects in this case?

 . automatically detecting trunk when importing a project with the
   standard layout.  The trunk usually is not branched from elsewhere
   so this does not require copyfrom info.  Some design questions
   come up here: should the remote helper import the entire project
   tree, too?  (I think "yes", since copy commands that copy from
   other branches are very common and that would ensure the relevant
   info is available to git.)  What should the mapping of git commit
   names to Subversion revision numbers that is stored in notes say
   in this case?

 . detecting trunk and branches and exposing them as different remote
   branches.  This is a small step that just involves understanding
   how remote helpers expose branches.

 . storing path properties and copyfrom information in the commits
   produced by the vcs-svn/ library.  How should these be stored?
   For example, there could be a parallel directory structure
   in the tree:

	foo/
		bar.c
	baz/
		qux.c
	.properties/
		foo.properties
		foo/
			bar.c.properties
		baz/
			qux.c.properties

   with properties for <path> stored at .properties/<path>.properties.
   This strawman scheme doesn't work if the repository being imported
   has any paths ending with ".properties", though.  Ideas?

 . tracing history past branch creation events, using the now-saved
   copyfrom information.

 . tracing second-parent history using svn:mergeinfo properties.
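
To illustrate the weakness of the strawman .properties layout from the
list above, here is a minimal sketch (in Python for brevity, not the C
that would actually be used).  The function names are made up for this
example.  It computes where the properties for each path would be
stored and reports property files that would also have to be
directories, which is what happens when a directory name ends with
".properties".

```python
import posixpath

PROPS_DIR = ".properties"  # directory name taken from the strawman layout


def properties_path(path):
    """Return where the properties for <path> would live in the tree."""
    return posixpath.join(PROPS_DIR, path + ".properties")


def find_conflicts(paths):
    """Return property files that would also need to be directories.

    `paths` is an iterable of repository paths (files and directories)
    that carry properties.  Hypothetical helper, for illustration only.
    """
    prop_files = {properties_path(p) for p in paths}
    conflicts = set()
    for prop_file in prop_files:
        # Walk up the parents of each property file, stopping at PROPS_DIR.
        parent = posixpath.dirname(prop_file)
        while parent and parent != PROPS_DIR:
            if parent in prop_files:
                conflicts.add(parent)
            parent = posixpath.dirname(parent)
    return sorted(conflicts)
```

With paths "foo" and "foo.properties/x", the properties of "foo" and
the parent directory of the properties of "foo.properties/x" both want
the name .properties/foo.properties, so the scheme breaks.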

In other words, in the above list the strategy is:

 1. First convert the remote helper to C so it doesn't have to be
    translated again later.

 2. Teach the remote helper to import a single project from a
    repository that houses multiple projects (i.e., path limiting).

 3. Teach the remote helper to split an imported project that uses
    the standard layout into branches (an application of the code
    from (2)).  This complicates the scheme for mapping between
    Subversion revision numbers and git commit ids.

 4. Teach the SVN dumpfile to fast-import stream converter not to
    lose the information that is needed in order to get parenthood
    information.

 5. Use the information from step (4) to get parenthood right for a
    project split into branches.

 6. Get the second parent right (i.e., merges).  I mentioned
    this for fun but I don't expect there to be time for it.

Does that seem right, or does it need tweaks?  How long would each
step take?  Can the steps be subdivided into smaller steps?
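
To make steps (2) and (3) a little more concrete, here is a rough
sketch in Python (again only as an illustration, not the C code).  The
function names and the way branches are named are assumptions of this
example.  It assumes the standard Subversion layout of trunk/,
branches/<name>/ and tags/<name>/ underneath the project directory.

```python
def limit_to_project(path, project):
    """Step (2): return `path` relative to `project`, or None if outside it."""
    project = project.strip("/")
    path = path.strip("/")
    if not project:
        return path
    if path == project:
        return ""
    if path.startswith(project + "/"):
        return path[len(project) + 1:]
    return None


def split_branch(rel_path):
    """Step (3): map a project-relative path to (branch, path in branch).

    Returns (None, rel_path) for paths outside trunk/branches/tags.
    """
    parts = rel_path.split("/")
    if parts[0] == "trunk":
        return "trunk", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2 and parts[1]:
        return parts[0] + "/" + parts[1], "/".join(parts[2:])
    return None, rel_path
```

A copy command whose source makes limit_to_project() return None is
exactly the "copy from another project" case asked about earlier; the
sketch does not answer that question, it only makes it easy to spot.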

Another question is: what is the design for this?  With the existing
remote-svn-alpha script, there are a few different components with
well defined interfaces:

	commands like "git fetch"
	  |
	  | (1)
	  |
	transport-helper --- (2) --- git fast-import
	  |                               |
	  | (2, 3)                        |
	  |                               |
	remote-svn-alpha                  | (3)
	  |             ''..              |
	  | (2)             ''(2)..       |
	  |                        ''..   |
	svnrdump --------- (3) -------- svn-fe

 (1) communicates using function calls and shared data
 (2) launches
 (3) communicates over pipe

Once remote-svn-alpha is rewritten in C, the same structure is still
present, though it might be less obvious because some of the (2)
and (3) can change into (1).
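
For reference, the pipe between transport-helper and the remote helper
carries a simple line-based conversation.  Below is a minimal sketch
of the helper's side of it, in Python rather than C and handling only
"capabilities", "list" and batches of "import" lines.  The serve()
function, its arguments and the example ref name are assumptions made
for this illustration; a real helper would write a fast-import stream
(for example by running svnrdump and svn-fe) where the sketch calls
do_import.

```python
import io


def serve(commands, out, refs, do_import):
    """Answer remote-helper commands read from `commands`, writing to `out`.

    `refs` lists the ref names to advertise and `do_import(refs, out)` is a
    callback that would emit the fast-import stream.  Both are placeholders.
    """
    pending = []  # refs named by the current batch of "import" lines
    for line in commands:
        line = line.rstrip("\n")
        if line == "capabilities":
            out.write("import\n\n")
        elif line == "list":
            for ref in refs:
                out.write("? %s\n" % ref)  # "?" means the value is unknown
            out.write("\n")
        elif line.startswith("import "):
            pending.append(line[len("import "):])
        elif line == "":
            if not pending:
                break  # blank line outside a batch: end of the session
            do_import(pending, out)
            pending = []
        else:
            raise ValueError("unsupported command: %r" % line)


# Example session fed from a string instead of a real pipe.
session = io.StringIO("capabilities\nlist\nimport refs/heads/master\n\n\n")
reply = io.StringIO()
requested = []
serve(session, reply, ["refs/heads/master"],
      lambda refs, out: requested.extend(refs))
```

Keeping the streams and the import step as parameters is what lets a
launch-and-pipe interface of type (2) or (3) later turn into a plain
function call of type (1).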

Where does the functionality you are adding fit into this picture?
Are there any new components being added, and if so what do they take
as input and output?

Hope that helps,
Jonathan

> [1] https://github.com/flyingflo/git/wiki/
