From: Florian Achleitner <florian.achleitner2.6.31@gmail.com>
To: Andrew Sayers <andrew-git@pileofstuff.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>,
Git Mailing List <git@vger.kernel.org>,
Ramkumar Ramachandra <artagnon@gmail.com>,
David Barr <davidbarr@google.com>,
Sverre Rabbelier <srabbelier@gmail.com>,
Dmitry Ivankov <divanorama@gmail.com>
Subject: Re: GSOC Proposal draft: git-remote-svn
Date: Sat, 14 Apr 2012 22:09:39 +0200
Message-ID: <1472353.TRfidGPc01@flomedio>
In-Reply-To: <4F875785.6040103@pileofstuff.org>
Hi!
Thanks for your explanations.
On Thursday 12 April 2012 23:30:29 Andrew Sayers wrote:
> On 12/04/12 16:28, Florian Achleitner wrote:
> > I'm not sure if storing this in a separate directory tree makes sense,
> > mostly looking at performance. All these files will only contain some
> > bytes, I guess. Andrew, why did you choose JSON?
>
> JSON has become my default storage format in recent years, so it seemed
> like the natural thing to use for a format I wanted to chuck in and get
> on with my work :)
>
> JSON is my default format because it's reasonably space-efficient,
> human-readable, widely supported and can represent everything I care
> about except recursive data structures (which I didn't need for this
> job). You can do cleverer things if you don't mind being
> language-specific (e.g. Perl's "Storable" module supports recursive data
> structures but can't be used with other languages) or if you don't mind
> needing special tools (e.g. git's index is highly efficient but can't be
> debugged with `less`). I've found you won't go far wrong if you start
> with JSON and pick something else when the requirements become more obvious.
>
> I gzipped the file because JSON isn't *that* space-efficient, and
> because very large repositories are likely to produce enough JSON that
> people will notice. I found that gzipping the file significantly
> reduced its size without having too much effect on run time.
>
> I've attached a sample file representing the first few commits from the
> GNU R repository. The problem I referred to obliquely before isn't with
> JSON, but with gzip - how would you add more revisions to the end of the
> file without gunzipping it, adding one line, then gzipping it again?
> One very nice feature of a directory structure is that you could store
> it in git and get all that stuff for free.
>
> To be clear, I'm not pushing any particular solution to this problem,
> just offering some anecdotal evidence. I'm pretty sure that SVN branch
> export is an I/O bound problem - David Barr has said much the same about
> svn-fe, but I was surprised to see it was still the bottleneck with a
> problem that stripped out almost all the data from the dump and pushed
> it through not-particularly-optimised Perl. Having said that, the
> initial import problem (potentially hundreds of thousands of revisions
> needing manual attention) doesn't necessarily want the same solution as
> update (tens of revisions that can almost always be read automatically).
JSON seems to be a good initial choice.
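
On the gzip append question, one possible approach: the gzip format allows
several compressed members to be concatenated in one file, and a reader
treats them as a single stream. The Python sketch below relies on that. It
assumes the metadata is stored as one JSON record per line (not as a single
JSON document), and the function names, file name and record fields are made
up for illustration, not taken from Andrew's tool.

```python
import gzip
import json
import os
import tempfile


def append_revision(path, record):
    """Append one JSON record as a new gzip member at the end of `path`."""
    # Mode "ab" writes a fresh gzip member after the existing bytes;
    # nothing already in the file is decompressed or rewritten.
    with gzip.open(path, "ab") as f:
        f.write((json.dumps(record) + "\n").encode("utf-8"))


def read_revisions(path):
    """Read every record back, across all concatenated gzip members."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical file name and record layout, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "branches.json.gz")
    append_revision(path, {"rev": 1, "action": "create", "branch": "trunk"})
    append_revision(path, {"rev": 2, "action": "copy",
                           "branch": "branches/foo", "copyfrom": "trunk@1"})
    print(read_revisions(path))
```

The trade-off is that every member carries its own header and trailer and is
compressed on its own, so many one-line members will compress worse than one
large stream.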
>
> >> . tracing history past branch creation events, using the now-saved
> >>   copyfrom information.
> >>
> >> . tracing second-parent history using svn:mergeinfo properties.
> >
> > This is about detection when to create a git merge-commit, right?
>
> Yes - SVN has always stored metadata about where a directory was copied
> from (unlike git, which prefers to detect it automatically), and since
> version 1.5, SVN has added "svn:mergeinfo" metadata to files and
> directories specifying which revisions of which other files or
> directories have been cherry-picked into them.
>
> If you know a directory is a branch, "copyfrom" metadata is a very
> useful signal for detecting branches created from it. Unfortunately,
> "svn:mergeinfo" is not as useful - aside from anything else, older
> repositories often exhibit a period where there's no metadata at all,
> then a gradual migration through SVN's early experiments with merge
> tracking (like svnmerge.py), before everyone gradually standardises on
> svn:mergeinfo and leaves the other tools behind. Oh, and the interface
> doesn't tell you about unmerged revisions, so if anybody ever forgets to
> merge a revision then you'll probably never notice.
This doesn't look very straightforward. In the svn docs they say there is a
command (svn mergeinfo --show-revs eligible) that outputs which changesets
are eligible to merge:
http://svnbook.red-bean.com/en/1.7/svn.branchmerge.basicmerging.html#svn.branchmerge.basicmerging.mergeinfo
But I don't know if that helps.
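
To make the idea concrete, here is a minimal Python sketch of reading an
svn:mergeinfo value and working out which source revisions are not yet
recorded as merged. It assumes the property value is a list of lines of the
form "/path:ranges", with inclusive ranges such as 3-5 and a trailing '*'
marking non-inheritable ranges. The function names and the sample data are
hypothetical, and the list of revisions on the source branch has to come from
somewhere else (e.g. the log), so this is only an approximation of what the
svn command computes.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: set_of_revisions}."""
    merged = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # The path may itself contain ':', so split on the last one.
        path, _, ranges = line.rpartition(":")
        revs = merged.setdefault(path, set())
        for item in ranges.split(","):
            # '*' marks a non-inheritable range; this sketch ignores that.
            item = item.strip().rstrip("*")
            if not item:
                continue
            start, _, end = item.partition("-")
            revs.update(range(int(start), int(end or start) + 1))
    return merged


def unmerged(source_revs, mergeinfo, source_path):
    """Return the revisions of source_path that mergeinfo does not list."""
    return sorted(set(source_revs) - mergeinfo.get(source_path, set()))


if __name__ == "__main__":
    # Made-up property value and revision list, for illustration only.
    info = parse_mergeinfo("/branches/foo:3-5,8\n/trunk:1-2")
    print(unmerged([3, 4, 5, 6, 8], info, "/branches/foo"))
```

This only covers repositories that actually use svn:mergeinfo; merges tracked
by older tools like svnmerge.py, or not tracked at all, would not show up.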
>
> I'm planning to tackle this stuff in the work I'm doing, but I expect
> people will be reporting edge cases until the day the last SVN
> repository shuts down. You shouldn't need to worry about it much on the
> git side of SBL, which is probably best for your sanity ;)
:)
>
> - Andrew