git.vger.kernel.org archive mirror
From: Florian Achleitner <florian.achleitner2.6.31@gmail.com>
To: Andrew Sayers <andrew-git@pileofstuff.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Ramkumar Ramachandra <artagnon@gmail.com>,
	David Barr <davidbarr@google.com>,
	Sverre Rabbelier <srabbelier@gmail.com>,
	Dmitry Ivankov <divanorama@gmail.com>
Subject: Re: GSOC Proposal draft: git-remote-svn
Date: Sat, 14 Apr 2012 22:09:39 +0200
Message-ID: <1472353.TRfidGPc01@flomedio>
In-Reply-To: <4F875785.6040103@pileofstuff.org>

Hi!

Thanks for your explanations.

On Thursday 12 April 2012 23:30:29 Andrew Sayers wrote:
> On 12/04/12 16:28, Florian Achleitner wrote:
> > I'm not sure if storing this in a separate directory tree makes sense,
> > mostly looking at performance. All these files will only contain some
> > bytes, I guess. Andrew, why did you choose JSON?
> 
> JSON has become my default storage format in recent years, so it seemed
> like the natural thing to use for a format I wanted to chuck in and get
> on with my work :)
> 
> JSON is my default format because it's reasonably space-efficient,
> human-readable, widely supported and can represent everything I care
> about except recursive data structures (which I didn't need for this
> job).  You can do cleverer things if you don't mind being
> language-specific (e.g. Perl's "Storable" module supports recursive data
> structures but can't be used with other languages) or if you don't mind
> needing special tools (e.g. git's index is highly efficient but can't be
> debugged with `less`).  I've found you won't go far wrong if you start
> with JSON and pick something else when the requirements become more obvious.
> 
> I gzipped the file because JSON isn't *that* space-efficient, and
> because very large repositories are likely to produce enough JSON that
> people will notice.  I found that gzipping the file significantly
> reduced its size without having too much effect on run time.
> 
> I've attached a sample file representing the first few commits from the
> GNU R repository.  The problem I referred to obliquely before isn't with
> JSON, but with gzip - how would you add more revisions to the end of the
> file without gunzipping it, adding one line, then gzipping it again?
> One very nice feature of a directory structure is that you could store
> it in git and get all that stuff for free.
> 
> To be clear, I'm not pushing any particular solution to this problem,
> just offering some anecdotal evidence.  I'm pretty sure that SVN branch
> export is an I/O bound problem - David Barr has said much the same about
> svn-fe, but I was surprised to see it was still the bottleneck with a
> problem that stripped out almost all the data from the dump and pushed
> it through not-particularly-optimised Perl.  Having said that, the
> initial import problem (potentially hundreds of thousands of revisions
> needing manual attention) doesn't necessarily want the same solution as
> update (tens of revisions that can almost always be read automatically).

JSON seems to be a good initial choice.
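
On the gzip question: the gzip format allows several compressed members
to be concatenated in one file, and readers treat them as a single
stream. So one option might be to append each new batch of revisions as
its own member instead of recompressing everything. A minimal Python
sketch, assuming one JSON record per line; the function names and the
record fields (`rev`, `copyfrom`) are made up for illustration and are
not the format of Andrew's sample file.

```python
import gzip
import json
import os
import tempfile


def append_revision(path, record):
    """Append one JSON record to `path` as a new gzip member.

    Mode "at" opens the file for appending in text mode; the existing
    compressed data is left untouched.
    """
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_revisions(path):
    """Read every record back, across all concatenated gzip members."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # Hypothetical records, only to show the round trip.
    path = os.path.join(tempfile.mkdtemp(), "revisions.json.gz")
    append_revision(path, {"rev": 1})
    append_revision(path, {"rev": 2, "copyfrom": "/trunk@1"})
    print(read_revisions(path))
```

Each appended member carries its own header and is compressed
independently, so many tiny appends will compress worse than one large
file. Batching updates would reduce that cost.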

> 
> >>  . tracing history past branch creation events, using the now-saved
> >>    copyfrom information.
> >>  . tracing second-parent history using svn:mergeinfo properties.
> > 
> > This is about detection when to create a git merge-commit, right?
> 
> Yes - SVN has always stored metadata about where a directory was copied
> from (unlike git, which prefers to detect it automatically), and since
> version 1.5, SVN has added "svn:mergeinfo" metadata to files and
> directories specifying which revisions of which other files or
> directories have been cherry-picked into them.
> 
> If you know a directory is a branch, "copyfrom" metadata is a very
> useful signal for detecting branches created from it.  Unfortunately,
> "svn:mergeinfo" is not as useful - aside from anything else, older
> repositories often exhibit a period where there's no metadata at all,
> then a gradual migration through SVN's early experiments with merge
> tracking (like svnmerge.py), before everyone gradually standardises on
> svn:mergeinfo and leaves the other tools behind.  Oh, and the interface
> doesn't tell you about unmerged revisions, so if anybody ever forgets to
> merge a revision then you'll probably never notice.

This doesn't look very straightforward. The svn docs say there is a
command that outputs which changesets are eligible to merge:
http://svnbook.red-bean.com/en/1.7/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.mergeinfo

But I don't know if that helps.
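
To make the idea concrete: an svn:mergeinfo value is a list of lines of
the form "path:revisions", where revisions are comma-separated numbers
or ranges and a trailing "*" marks a non-inheritable entry. Given that,
the unmerged revisions could be computed by subtracting the merged set
from the revisions known to have touched the source branch. A rough
Python sketch; the function names and the sample data are hypothetical,
and the list of source revisions would have to come from somewhere else
(e.g. the log), which this sketch does not do.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo property value into {path: set(revisions)}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon, in case the path itself contains one.
        path, _, revlist = line.rpartition(":")
        revs = merged.setdefault(path, set())
        for item in revlist.split(","):
            item = item.strip().rstrip("*")  # "*" = non-inheritable
            if not item:
                continue
            start, _, end = item.partition("-")
            revs.update(range(int(start), int(end or start) + 1))
    return merged


def unmerged(source_revs, merged_revs):
    """Return the source revisions that are missing from the merged set."""
    return sorted(set(source_revs) - set(merged_revs))


if __name__ == "__main__":
    # Made-up property value and source revision list.
    info = parse_mergeinfo("/branches/foo:3-5,8\n/trunk:10*")
    print(unmerged([3, 4, 5, 6, 7, 8], info["/branches/foo"]))
```

This only covers repositories that actually use svn:mergeinfo; the
older tools Andrew mentions (like svnmerge.py) store their data
differently and would need separate handling.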
>
> I'm planning to tackle this stuff in the work I'm doing, but I expect
> people will be reporting edge cases until the day the last SVN
> repository shuts down.  You shouldn't need to worry about it much on the
> git side of SBL, which is probably best for your sanity ;)

:)

> 
> 	- Andrew
