git.vger.kernel.org archive mirror
From: Andrew Sayers <andrew-git@pileofstuff.org>
To: Florian Achleitner <florian.achleitner@student.tugraz.at>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>,
	David Barr <davidbarr@google.com>,
	Git Mailing List <git@vger.kernel.org>,
	Jonathan Nieder <jrnieder@gmail.com>,
	Sverre Rabbelier <srabbelier@gmail.com>,
	Dmitry Ivankov <divanorama@gmail.com>
Subject: Re: GSOC Proposal draft: git-remote-svn
Date: Mon, 02 Apr 2012 23:17:48 +0100
Message-ID: <4F7A258C.5000200@pileofstuff.org>
In-Reply-To: <2487557.B8qfnaixh3@flomedio>

Hey Florian,

Comments below.  The nitpickier ones aren't so much there to help the
proposal as for general information.

On 02/04/12 09:30, Florian Achleitner wrote:
<snip>
> 
> Subversion (svn) [2] was created as a successor of CVS, both follow a strict 
> client-server design, where the repository exclusively lives on the central 
> server and every client only checks out a copy of a single revision at a time. 
> SVN doesn't truly have a concept of branches. SVN branches are a copy of a 
> directory (so are tags).

Just a little nitpick - SVN was primarily inspired by CVS, but there's
no formal connection between the projects - the two are developed by
separate teams even to this day.

<snip>
> git-fast-import [4] is a format to serialize a git repository into a text 
> format. It is used by the tools git-fast-import and git-fast-export.
> 
> The remote helper has to convert the foreign protocol and data (svn) to the 
> git-fast-import format.

As discussed on IRC, I'd like to see some discussion of solutions that
use plumbing directly (e.g. git-commit-tree) if you choose to focus on
branch import.
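
To make the plumbing option concrete, here is a rough Python sketch of
creating a commit with explicit parents via git-commit-tree and then
pointing a branch at it with git-update-ref.  The helper names
(git, commit_tree, import_revision) and the "run" parameter are made up
for illustration; only the two git commands themselves are real.  It
assumes the tree object has already been written to the object database.

```python
import subprocess


def git(argv, stdin=b""):
    """Run a git plumbing command and return its stdout, stripped."""
    result = subprocess.run(
        ["git", *argv], input=stdin, stdout=subprocess.PIPE, check=True
    )
    return result.stdout.decode().strip()


def commit_tree(tree, parents, message, run=git):
    """Create a commit object for an existing tree and return its ID.

    Listing several parents is what lets an importer record a merge.
    git-commit-tree reads the log message from stdin when -m/-F are absent.
    """
    argv = ["commit-tree", tree]
    for parent in parents:
        argv += ["-p", parent]
    return run(argv, message.encode())


def import_revision(branch, tree, parents, message, run=git):
    """Commit `tree` and move refs/heads/<branch> to the new commit."""
    commit = commit_tree(tree, parents, message, run)
    run(["update-ref", f"refs/heads/{branch}", commit])
    return commit
```

The "run" parameter is only there so the command construction can be
exercised without a repository; a real importer would more likely keep
one long-running process than spawn git per revision.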

<snip>
> Branches exist due to the convention of having branches/, trunk/, and tags/ 
> directories in a repository, so do tags. But this is not mandatory and 
> therefore there are many different layouts. It follows that in svn it is also 
> possible to commit across branches. This means that a single commit can change 
> files on more than one branch (accidentally or deliberately).

This is basically accurate, but a contrived example might help explain
why fully automatic branch export is impossible in the general case:

Imagine a repository that consists of a single revision with a single
file, "scratchpad/libfoo/foo.c" - how would we decide which directory is
the branch?  Has the author even decided yet?  For example, he might
be learning version control and not understand what branches are.
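
To spell the ambiguity out, a small Python sketch (the function name is
hypothetical) that lists every directory that could serve as the branch
root for that one file.  With no other history to go on, nothing in the
data prefers one candidate over another:

```python
from pathlib import PurePosixPath


def candidate_branch_roots(path):
    """Return every ancestor directory of `path`, outermost first.

    The repository root is represented as the empty string.
    """
    parents = [str(p) for p in PurePosixPath(path).parents]
    # PurePosixPath reports the top of a relative path as "."; show it as "".
    return ["" if p == "." else p for p in reversed(parents)]


print(candidate_branch_roots("scratchpad/libfoo/foo.c"))
# ['', 'scratchpad', 'scratchpad/libfoo']
```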

Having said that, automatic branch export might be possible in some
important special cases (like repositories that use the standard
layout).  I haven't really looked into this yet.
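
For the standard-layout case, the mapping from path to branch is
mechanical.  The sketch below is only an illustration under that
assumption (trunk/, branches/<name>/, tags/<name>/ at the repository
root); the function names are invented.  It also shows how a single
commit can end up touching more than one branch:

```python
def branch_of(path):
    """Map an SVN path to (branch, path inside branch) under the standard layout.

    Returns (None, path) for anything outside trunk/, branches/<name>/
    and tags/<name>/.
    """
    parts = path.split("/")
    if parts[0] == "trunk":
        return "trunk", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        return f"{parts[0]}/{parts[1]}", "/".join(parts[2:])
    return None, path


def branches_touched(changed_paths):
    """Return the set of branches affected by one commit's changed paths."""
    return {branch_of(p)[0] for p in changed_paths}


# A commit that changes files on two branches at once:
print(sorted(branches_touched(["trunk/foo.c", "branches/b1/foo.c"])))
# ['branches/b1', 'trunk']
```

A None in the result means the commit touched something outside any
branch, which is exactly the kind of case that needs a human decision.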

<snip>
>   - Because generating the branch mapping configuration already requires that 
> you have a dump of the svn repo, the helper should probably be able to read 
> from a file in place of svnrdump too.

It might help if I explain how the SVN branch exporter will work:

First, it will read an SVN dump and create a file containing JSON blobs
summarising each revision - e.g. it specifies which files were changed,
but not the contents of the changes.  As Ram mentioned, downloading the
dump and tee'ing it to both this process and svn-fe makes a lot of sense.
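
As a rough idea of what that first pass might look like, here is a
Python sketch that walks an SVN dump stream and keeps only the
per-revision metadata.  The JSON layout (the "revision", "changes" and
"copyfrom" keys) is invented for this example and is not the format the
exporter actually uses; the dump header names (Revision-number,
Node-path, Node-kind, Node-action, Node-copyfrom-rev,
Node-copyfrom-path, Content-length) are the ones found in SVN dump
files.

```python
import json


def read_headers(stream):
    """Read one header block from a binary stream; return None at EOF."""
    headers = {}
    while True:
        line = stream.readline()
        if not line:                      # end of file
            return headers or None
        line = line.rstrip(b"\n")
        if not line:
            if headers:
                return headers            # blank line ends the block
            continue                      # skip blank separators between records
        key, _, value = line.partition(b": ")
        headers[key.decode()] = value.decode()


def summarise(stream):
    """Return a list of per-revision summaries, ignoring file contents."""
    revisions = []
    current = None
    while (headers := read_headers(stream)) is not None:
        # Skip properties and text so content is never mistaken for headers.
        stream.read(int(headers.get("Content-length", 0)))
        if "Revision-number" in headers:
            current = {"revision": int(headers["Revision-number"]), "changes": []}
            revisions.append(current)
        elif "Node-path" in headers and current is not None:
            change = {
                "path": headers["Node-path"],
                "action": headers.get("Node-action"),
                "kind": headers.get("Node-kind"),
            }
            if "Node-copyfrom-path" in headers:
                change["copyfrom"] = [
                    int(headers["Node-copyfrom-rev"]),
                    headers["Node-copyfrom-path"],
                ]
            current["changes"].append(change)
    return revisions


def to_json_lines(revisions):
    """Serialise the summaries as one JSON object per line."""
    return "".join(json.dumps(rev) + "\n" for rev in revisions)
```

Because it only needs a readable binary stream, something like this
could sit on one side of the tee while svn-fe reads the other.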

Next, it will read the JSON file and detect trunks.  This turns out to
be extremely fast now it's been freed from the SVN dump format.

Next, the user will have the opportunity to review the detected trunks.
For example, if somebody put a "README.txt" in the root directory, the
previous step will need to be rerun with that file ignored.
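
To show why a stray file matters, here is a deliberately naive
stand-in for trunk detection, written in Python.  This is NOT the
real heuristic, just an assumption for illustration: treat the
outermost directories that directly contain freshly added (not copied)
files as trunk candidates.  The input shape (dicts with "changes",
"path", "kind", "action" and optional "copyfrom") and the function name
are also invented.

```python
from pathlib import PurePosixPath


def detect_trunks(revisions, ignore=()):
    """Naive guess: outermost directories holding files added without copy history."""
    dirs = set()
    for rev in revisions:
        for change in rev["changes"]:
            if (
                change["kind"] == "file"
                and change["action"] == "add"
                and "copyfrom" not in change
                and change["path"] not in ignore
            ):
                parent = str(PurePosixPath(change["path"]).parent)
                dirs.add("" if parent == "." else parent)   # "" is the root

    def inside(inner, outer):
        return outer == "" or inner.startswith(outer + "/")

    # Drop any directory that lies inside another candidate.
    return sorted(
        d for d in dirs if not any(o != d and inside(d, o) for o in dirs)
    )


history = [{"revision": 1, "changes": [
    {"path": "README.txt", "kind": "file", "action": "add"},
    {"path": "trunk/foo.c", "kind": "file", "action": "add"},
    {"path": "trunk/src/bar.c", "kind": "file", "action": "add"},
]}]

print(detect_trunks(history))                         # ['']
print(detect_trunks(history, ignore={"README.txt"}))  # ['trunk']
```

With README.txt present, the whole repository root looks like a trunk;
rerunning with that file ignored gives the answer the user wanted.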

Next, the main branch detection stage will be run using the JSON file
and the previous branch information.

Next, the user has another chance to make changes.  Some users will blow
straight past this stage, but sufficiently fussy users with sufficiently
large repositories could spend several days looping through this and the
previous stage until their branches and merges are just right.

The SBL file is finally complete whenever the user decides - you'll need
to tell them how to restart the import process, in case they restarted
their computer while they were refining the file.

<snip>
> 3. Add output capabilities to vcs-svn. Currently the code in vcs-svn can only 
> convert svn to git. To push to svn we also need conversion and mapping from 
> git to svn. The actual mapping code for branches should also be placed here 
> {??} and called by the remote helper.

I agree with Jonathan and Ram that we're not ready for this yet.  Even
mapping git branches back to a branchless representation won't be
practical until branch import is fairly mature.

	- Andrew

[1] https://github.com/andrew-sayers/Proof-of-concept-History-Converter/blob/master/git-branch-import.pl
[2] git sources: git/Documentation/git-commit-tree.txt
