From: Jonathan Nieder <jrnieder@gmail.com>
To: Dmitry Ivankov <divanorama@gmail.com>
Cc: git@vger.kernel.org, artagnon@gmail.com, david.barr@cordelta.com,
srabbelier@gmail.com, Eric Wong <normalperson@yhbt.net>
Subject: Re: GSoC proposal for svn remote helper
Date: Fri, 8 Apr 2011 00:21:26 -0500
Message-ID: <20110408052126.GA22256@elie>
In-Reply-To: <BANLkTinHE-E5_mK8aKYv2f7yExVvfOFVRw@mail.gmail.com>
(+cc: Eric who brought us git-svn)
Hi Dmitry,
Dmitry Ivankov wrote:
> This is the second iteration of my GSoC proposal
Great; let's iron this out.
> I would like to work on "Remote helper for Subversion and git-svn".
> My major motivation is to make git-svn repository easy to clone, and to make
> git-svn (fetch) faster on huge repositories.
So, my new first impression is that this goal might make things hard[1].
I think replacing git-svn with an imperfect emulation would not leave
people happy. Existing configurations need to continue to work.
> Project Goals:
> + * Design and create fully functional prototype of new git-svn which is
> cloneable and quite fast.
*If* one does not have this goal ("new git-svn") then there is a
chance to move past some of git-svn's limitations[2].
All that said, these tools could be used to speed up git-svn.
> By fully functional I mean that it'll be
> able to fetch, push, etc. but probably won't have automatic tags and
> branches discovery and like, but will allow it to be implemented on
> top. Oh, it just hit me that given a path (read trunk) to track and a
> svndump it looks trivial to discover all it's branches - just seek for
> copies.
As mentioned before, this sounds very ambitious. Once we have a
timeline showing how this breaks down into small steps, the way
forward should hopefully be clearer.
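To make the "seek for copies" idea concrete, here is a minimal sketch
in Python (an illustration only, not code from svn-fe or git-svn). It
assumes the dump has already been reduced to its header blocks, so file
contents and property bodies never appear; a real parser would have to
skip those using the Content-length headers. The function name
find_branches and the sample dump are made up for the example; the
header names are the ones the svn dump format uses.

```python
def find_branches(dump_text, tracked="trunk"):
    """Return (branch_path, copyfrom_rev, created_in_rev) for every node
    that was added as a copy of the `tracked` path."""
    branches = []
    revision = None
    block = {}
    # A blank line ends a header block; append one so the last block is flushed.
    for line in dump_text.splitlines() + [""]:
        if line.strip():
            key, sep, value = line.partition(": ")
            if sep:
                block[key] = value
            continue
        if "Revision-number" in block:
            # Start of a new revision record.
            revision = int(block["Revision-number"])
        elif block.get("Node-copyfrom-path") == tracked:
            # A node copied straight from the tracked path: a branch candidate.
            branches.append(
                (block["Node-path"], int(block["Node-copyfrom-rev"]), revision)
            )
        block = {}
    return branches


# Hypothetical header-only dump: r1 creates trunk, r2 copies it to a branch.
SAMPLE = """\
Revision-number: 1

Node-path: trunk
Node-kind: dir
Node-action: add

Revision-number: 2

Node-path: branches/feature
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 1
Node-copyfrom-path: trunk
"""

print(find_branches(SAMPLE))
```

Real histories are messier: a copy of a subdirectory of trunk, or a
copy of a copy, would not be found by this exact-match test, which is
part of why the heuristics listed in [1] exist.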
> + * Get all the needed core git changes merged.
The following is probably controversial. It's my opinion only.
Since you can't control what other people do, I don't think it's right
to judge your project's success or failure based on whether it gets
merged. Put another way, the product of your work that can be judged
is not whatever fraction gets accepted in git.git by the end of the
summer[3].
So I think the goal is whatever it is (a working and suitable "git
clone svn://foo" command, say) and getting feedback by pushing changes
upstream and responding to it is a part of how that happens.
Sooner or later there will probably be a point of no return --- "if the
design of this patch is not right, I would have to rewrite everything
on top of a redesign of it". I'd encourage getting input on such
patches _very_ early and working hard to get them merged at least to
"next" (i.e., to have a rough consensus that they are suitable modulo
small tweaks). I would love it if the proposal included a timeline
pointing out some examples of this.
> Some of these exist already and
> only need help with polishing, reviewing and merging.
Do you mean support for parsing "svnadmin dump --deltas" output? It
is already polished and reviewed; it's only sitting out-of-tree for
now because it makes the commandline usage awkward and it would be
nice to merge some improvements to that at the same time.
> + * Make the prototype as close to being merged as possible.
That's kind of vague, you know. :)
> Milestones for prototype functionality:
[list of features snipped]
Could you say something about how you would go about implementing
these?
Sorry for the ramble, and thanks for working on this.
Ciao,
Jonathan
[1] git-svn.perl is a work of art and a wonder to behold, and if your aim
is to make a compatible replacement for it, the first step will be to
understand its design deeply. And the thing is, that much, while
valuable anyway, is pretty hard already.
You see, "git svn" has heuristics for
- matching up git history to svn history by reading commit messages;
- pushing mergy history as linear history by rebasing internally
(dcommit);
- finding the branches, merges, branch renames, and so on in an
imperfectly structured history (find_parent etc);
- deciding what particular paths are relevant (--ignore-paths)
and maintains some of its own data in the repository:
- a configuration scheme and wide variety of supported configurations;
- a log for unhandled pieces of history;
- a cache mapping svn revision numbers to git commits
and people rely a lot on an odd coincidence:
- using "git svn clone" twice with the same configuration on the same
repository will, at least most of the time, give the same commit
names.
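As a rough illustration of the first heuristic and of the revision
cache, here is a small Python sketch (not git-svn's actual Perl
implementation). It assumes the usual "git-svn-id: URL@REV UUID"
trailer in the commit message; the function names and the sample
commit names are hypothetical.

```python
import re

# Trailer written by git-svn at the end of each imported commit message.
GIT_SVN_ID = re.compile(
    r"^git-svn-id: (?P<url>\S+)@(?P<rev>\d+) (?P<uuid>[0-9a-f-]+)$",
    re.MULTILINE,
)


def parse_git_svn_id(message):
    """Return (url, revision, repository_uuid), or None if there is no trailer."""
    matches = list(GIT_SVN_ID.finditer(message))
    if not matches:
        return None
    last = matches[-1]  # if the message quotes older trailers, the last one wins
    return last["url"], int(last["rev"]), last["uuid"]


def build_rev_map(commits):
    """Map svn revision number -> git commit name for (name, message) pairs."""
    rev_map = {}
    for name, message in commits:
        parsed = parse_git_svn_id(message)
        if parsed is not None:
            rev_map[parsed[1]] = name
    return rev_map
```

Commits without a trailer (local work that has not been dcommitted yet)
are simply left out of the map.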
[2] Well, it mostly comes down to one limitation. To give a quick
sketch:
If I clone a repository with "git svn", then I am in a way a
second-class citizen. The history shown with "git log" is filled
with "git-svn-id:" lines that are not very interesting to me (the
revision number is still interesting, of course). I cannot use
"git push" to push my work, and in fact I cannot push my work as a
branch reflecting the real development history at all --- I have to
rebase it at the same time as pushing. Whenever I push, the commit
names for my work change, so other branches based on my work don't
show up in "gitk" as based on my work any more.
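Why do the names change? A commit name is a hash over the commit's
content, message and parents included, so adding a "git-svn-id:" line
produces a different commit, and so does every commit built on top of
it. A minimal Python sketch, assuming SHA-1 object names; the identity,
timestamp and messages are made up for the example:

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree


def git_object_name(kind, body):
    """Hash an object the way git does: '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_name(message, parent=None):
    """Name of a commit with an empty tree and a fixed, made-up identity."""
    ident = "A U Thor <author@example.com> 1302240086 -0500"
    lines = [f"tree {EMPTY_TREE}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_object_name("commit", "\n".join(lines).encode())


local = commit_name("Fix typo\n")
rewritten = commit_name(
    "Fix typo\n\ngit-svn-id: http://svn.example.org/repo/trunk@42 "
    "01234567-89ab-cdef-0123-456789abcdef\n"
)

print(local != rewritten)  # the rewritten commit gets a new name
# A child commit names its parent, so it changes as well:
print(commit_name("child\n", local) != commit_name("child\n", rewritten))
```

The same property explains the "odd coincidence" in [1]: identical
input gives identical names, and any difference in the message or the
parents gives different ones.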
Wouldn't it be nicer to be able to do
alice$ git clone svn::http://svn.apache.org/repos/asf/subversion
alice$ cd subversion
alice$ ... hack hack hack ...
bob$ git clone 'alice:~/src/subversion'
bob$ cd subversion
bob$ ... hack hack hack ...; # make some changes on top of alice's work
alice$ git fetch origin; # anything new upstream?
alice$ git push origin; # push my changes upstream
bob$ git remote add upstream svn::http://svn.apache.org/repos/asf/subversion
bob$ git fetch upstream
bob$ # push my changes on top of alice's (which were already pushed):
bob$ git push upstream
That is the dream. Because there is not a clearly appropriate
one-to-one mapping between possible svn histories and possible git
histories, there are going to have to be limitations[1], but that is
an ideal to strive for.
Sounds hard, maybe? Yeah, it is, but getting at least fetch support
using the tools David and Ram made sounds easier to me than a fully
compatible replacement for git-svn.
[3] Meanwhile, just writing and publishing code is not enough, since
the code might have a fatal flaw that means no one will use it ("ivory
tower syndrome"). So what do I mean by the above?
As students work, I hope they will keep the mailing list posted on
their progress and find small pieces to review and merge early. In
response they might get some questions and suggestions for
improvement; the response to these is just as important as the code.
On one hand this feedback is an important sanity check on the broad
features of your work and a means to get the details right for
inclusion in git (i.e., get it merged). On the other hand, one should
not let interesting side tracks get in the way of finishing the actual
project; you have to be able to say "no, I will not be working on
that". Out of these conversations emerge better code and
documentation of the design in the form of list archives.
See [4] for a better explanation of this workflow.
[4] http://thread.gmane.org/gmane.comp.version-control.git/142623/focus=142877