From: Florian Achleitner <florian.achleitner2.6.31@gmail.com>
To: Andrew Sayers <andrew-git@pileofstuff.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Ramkumar Ramachandra <artagnon@gmail.com>,
	David Barr <davidbarr@google.com>,
	Sverre Rabbelier <srabbelier@gmail.com>,
	Dmitry Ivankov <divanorama@gmail.com>
Subject: Re: GSOC Proposal draft: git-remote-svn
Date: Sat, 14 Apr 2012 22:09:39 +0200
Message-ID: <1472353.TRfidGPc01@flomedio>
In-Reply-To: <4F875785.6040103@pileofstuff.org>

Hi!

Thanks for your explanations.

On Thursday 12 April 2012 23:30:29 Andrew Sayers wrote:
> On 12/04/12 16:28, Florian Achleitner wrote:
> > I'm not sure if storing this in a separate directory tree makes sense,
> > mostly looking at performance. All these files will only contain some
> > bytes, I guess. Andrew, why did you choose JSON?
> 
> JSON has become my default storage format in recent years, so it seemed
> like the natural thing to use for a format I wanted to chuck in and get
> on with my work :)
> 
> JSON is my default format because it's reasonably space-efficient,
> human-readable, widely supported and can represent everything I care
> about except recursive data structures (which I didn't need for this
> job).  You can do cleverer things if you don't mind being
> language-specific (e.g. Perl's "Storable" module supports recursive data
> structures but can't be used with other languages) or if you don't mind
> needing special tools (e.g. git's index is highly efficient but can't be
> debugged with `less`).  I've found you won't go far wrong if you start
> with JSON and pick something else when the requirements become more obvious.
> 
> I gzipped the file because JSON isn't *that* space-efficient, and
> because very large repositories are likely to produce enough JSON that
> people will notice.  I found that gzipping the file significantly
> reduced its size without having too much effect on run time.
> 
> I've attached a sample file representing the first few commits from the
> GNU R repository.  The problem I referred to obliquely before isn't with
> JSON, but with gzip - how would you add more revisions to the end of the
> file without gunzipping it, adding one line, then gzipping it again?
> One very nice feature of a directory structure is that you could store
> it in git and get all that stuff for free.
> 
> To be clear, I'm not pushing any particular solution to this problem,
> just offering some anecdotal evidence.  I'm pretty sure that SVN branch
> export is an I/O bound problem - David Barr has said much the same about
> svn-fe, but I was surprised to see it was still the bottleneck with a
> problem that stripped out almost all the data from the dump and pushed
> it through not-particularly-optimised Perl.  Having said that, the
> initial import problem (potentially hundreds of thousands of revisions
> needing manual attention) doesn't necessarily want the same solution as
> update (tens of revisions that can almost always be read automatically).

JSON seems to be a good initial choice.
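
On the gzip append question: the gzip format allows several compressed
members to be concatenated in one file, and readers treat them as a single
stream. A minimal Python sketch of that idea follows. The file name and the
record fields ("revision", "action", "branch", "copyfrom") are made up for
illustration and are not the format of Andrew's attached sample.

```python
import gzip
import json
import os
import tempfile


def append_revision(path, record):
    """Append one JSON record as a new gzip member at the end of path."""
    # Mode "ab" writes a fresh gzip member after any existing ones;
    # nothing already in the file is decompressed or rewritten.
    with gzip.open(path, "ab") as fh:
        fh.write((json.dumps(record) + "\n").encode("utf-8"))


def read_revisions(path):
    """Read all records back; concatenated members read as one stream."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def demo():
    # Hypothetical file name and record layout, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "branches.json.gz")
    append_revision(path, {"revision": 1, "action": "create",
                           "branch": "trunk"})
    append_revision(path, {"revision": 2, "action": "copy",
                           "branch": "branches/r-1-0",
                           "copyfrom": "trunk"})
    return read_revisions(path)
```

The trade-off is that every small member carries its own header and
compresses on its own, so a file built one revision at a time will be
larger than the same data gzipped in one go. Occasionally rewriting the
whole file would win that space back.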

> 
> >>  . tracing history past branch creation events, using the now-saved
> >>    copyfrom information.
> >>  . tracing second-parent history using svn:mergeinfo properties.
> > 
> > This is about detection when to create a git merge-commit, right?
> 
> Yes - SVN has always stored metadata about where a directory was copied
> from (unlike git, which prefers to detect it automatically), and since
> version 1.5, SVN has added "svn:mergeinfo" metadata to files and
> directories specifying which revisions of which other files or
> directories have been cherry-picked in to them.
> 
> If you know a directory is a branch, "copyfrom" metadata is a very
> useful signal for detecting branches created from it.  Unfortunately,
> "svn:mergeinfo" is not as useful - aside from anything else, older
> repositories often exhibit a period where there's no metadata at all,
> then a gradual migration through SVN's early experiments with merge
> tracking (like svnmerge.py), before everyone gradually standardises on
> svn:mergeinfo and leaves the other tools behind.  Oh, and the interface
> doesn't tell you about unmerged revisions, so if anybody ever forgets to
> merge a revision then you'll probably never notice.

This doesn't look very straightforward. The SVN docs say there is a
command that outputs which changesets are eligible to merge:
http://svnbook.red-bean.com/en/1.7/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.mergeinfo

But I don't know if that helps.
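
The command described there is "svn mergeinfo --show-revs eligible". To
make the idea concrete, here is a rough Python sketch of the same
calculation done by hand: parse an svn:mergeinfo property value (one
"/path:revision-list" entry per line) and subtract it from the revisions
known to exist on the source branch. The function names are hypothetical,
and the list of source revisions is assumed to come from somewhere else,
such as the log of the source branch.

```python
def parse_revision_list(text):
    """Expand a list such as "3-5,8*" into a set of revision numbers."""
    revs = set()
    for item in text.split(","):
        # A trailing "*" marks a non-inheritable range; ignored here.
        item = item.strip().rstrip("*")
        if not item:
            continue
        if "-" in item:
            start, end = item.split("-", 1)
            revs.update(range(int(start), int(end) + 1))
        else:
            revs.add(int(item))
    return revs


def parse_mergeinfo(value):
    """Map each merge source path to the set of revisions merged from it."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon: the revision list never contains one.
        path, _, revlist = line.rpartition(":")
        merged[path] = parse_revision_list(revlist)
    return merged


def unmerged_revisions(source_revs, mergeinfo, source_path):
    """Return revisions on the source branch not recorded as merged."""
    return sorted(set(source_revs) - mergeinfo.get(source_path, set()))
```

This only sees what svn:mergeinfo records, so it would not help for
history made before the property existed or for merges tracked by other
tools such as svnmerge.py.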
>
> I'm planning to tackle this stuff in the work I'm doing, but I expect
> people will be reporting edge cases until the day the last SVN
> repository shuts down.  You shouldn't need to worry about it much on the
> git side of SBL, which is probably best for your sanity ;)

:)

> 
> 	- Andrew
