git.vger.kernel.org archive mirror
* Re: GSoC proposal for svn remote helper
       [not found] <BANLkTinYyxxkZpmEF2PYXMb_BjCVcbTkYw@mail.gmail.com>
@ 2011-04-08  3:42 ` Dmitry Ivankov
  2011-04-08  5:21   ` Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Dmitry Ivankov @ 2011-04-08  3:42 UTC
  To: git; +Cc: jrnieder, artagnon, david.barr, srabbelier

Resending in plain text to make vger.kernel.org happy.

Hi.

This is the second iteration of my GSoC proposal. The first one was on
Melange and got nice responses from Jonathan Nieder, David Barr and Sverre
Rabbelier. However, the conclusion was that it looks too ambitious and that I
should roll it out to the list. So here it is, diff-style.

I would like to work on "Remote helper for Subversion and git-svn".
My major motivation is to make git-svn repository easy to clone, and to make
git-svn (fetch) faster on huge repositories.

Project Goals:
+ * Design and create a fully functional prototype of a new git-svn which is
cloneable and quite fast. By fully functional I mean that it'll be
able to fetch, push, etc. It probably won't have automatic tag and
branch discovery and the like, but will allow them to be implemented on
top. Oh, it just hit me that given a path (read: trunk) to track and an
svndump, it looks trivial to discover all its branches - just look for
copies.
+ * Get all the needed core git changes merged. Some of these exist already and
only need help with polishing, reviewing and merging.
- * Complete git-remote-svn and get it merged.
- * Implement new git-svn and get it merged too.
+ * Make the prototype as close to being merged as possible.
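
To make the "look for copies" idea concrete, here is a rough sketch. It
assumes a heavily simplified dump that contains only header records
separated by blank lines; a real svndump also carries property and
content blocks that this parser does not handle. The function name and
the sample dump are made up for illustration.

```python
def find_branch_copies(dump_text, tracked_path):
    """Return (new_path, copyfrom_rev) for each directory copied from tracked_path."""
    copies = []
    # Simplification: every record is a block of "Key: value" header lines.
    for record in dump_text.split("\n\n"):
        headers = {}
        for line in record.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                headers[key] = value
        # A branch shows up as a directory added by copying the tracked path.
        if (headers.get("Node-action") == "add"
                and headers.get("Node-kind") == "dir"
                and headers.get("Node-copyfrom-path") == tracked_path):
            copies.append((headers["Node-path"],
                           int(headers["Node-copyfrom-rev"])))
    return copies


# Hypothetical two-revision dump: r13 branches trunk, r14 edits a file.
SAMPLE_DUMP = """\
Revision-number: 13

Node-path: branches/stable
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 12
Node-copyfrom-path: trunk

Revision-number: 14

Node-path: trunk/README
Node-kind: file
Node-action: change
"""

print(find_branch_copies(SAMPLE_DUMP, "trunk"))
```

Copies of subdirectories of the tracked path, renames, and copies of
copies would need extra handling that is left out here.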

Milestones for prototype functionality:
 * Be able to track whole / as the only remote branch.
 - * Be able to dumbly track some path as the only remote branch. Could be done
 either via pruning in git-remote-svn or via maintaining a custom branch,
 probably with the help of git-filter-branch.
 - * Finally, be able to work with multiple svn branches. There are many ways to
 achieve this and several sets of features that "working with branches" could
 include, so there should even be a milestone for choosing the ones to implement.
 + * Be able to track several "paths" as branches so that they have expected
 history (whole path copying is branching), and so that these branches are
 cloneable (will be the same in different git-svn repositories tracking the
 same svn repo, under reasonable assumptions like svn:author not being
 propedited :) ).
 * Anything else that'll appear to be interesting and related.

 Sorry for the late submission to this list; I was really puzzled about how to
 make my proposal more realistic and still as useful for git-svn as possible.


* Re: GSoC proposal for svn remote helper
  2011-04-08  3:42 ` GSoC proposal for svn remote helper Dmitry Ivankov
@ 2011-04-08  5:21   ` Jonathan Nieder
  2011-04-08  7:11     ` Jonathan Nieder
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jonathan Nieder @ 2011-04-08  5:21 UTC
  To: Dmitry Ivankov; +Cc: git, artagnon, david.barr, srabbelier, Eric Wong

(+cc: Eric who brought us git-svn)
Hi Dmitry,

Dmitry Ivankov wrote:

> This is the second iteration of my GSoC proposal

Great; let's iron this out.

> I would like to work on "Remote helper for Subversion and git-svn".
> My major motivation is to make git-svn repository easy to clone, and to make
> git-svn (fetch) faster on huge repositories.

So, my new first impression is that this goal might make things hard[1].

I think replacing git-svn with an imperfect emulation would not leave
people happy.  Existing configurations need to continue to work.

> Project Goals:
> + * Design and create fully functional prototype of new git-svn which is
> cloneable and quite fast.

*If* one does not have this goal ("new git-svn") then there is a
chance to move past some of git-svn's limitations[2].

All that said, these tools could be used to speed up git-svn.  

> By fully functional I mean that it'll be
> able to fetch, push, etc. but probably won't have automatic tags and
> branches discovery and like, but will allow it to be implemented on
> top. Oh, it just hit me that given a path (read trunk) to track and a
> svndump it looks trivial to discover all it's branches - just seek for
> copies.

As mentioned before, this sounds very ambitious.  Once we have a
timeline showing how this breaks down into small steps it should
hopefully be clearer way.

> + * Get all the needed core git changes merged.

The following is probably controversial.  It's my opinion only.

Since you can't control what other people do, I don't think it's right
to judge your project's success or failure based on whether it gets
merged.  Put another way, the product of your work that can be judged
is not whatever fraction gets accepted in git.git by the end of the
summer[3].

So I think the goal is whatever it is (a working and suitable "git
clone svn://foo" command, say) and getting feedback by pushing changes
upstream and responding to it is a part of how that happens.

At some point there will probably be a point of no return --- "if the
design of this patch is not right, I would have to rewrite everything
on top of a redesign of it".  I'd encourage getting input on such
patches _very_ early and working hard to get them merged at least to
"next" (i.e., to have a rough consensus that they are suitable modulo
small tweaks).  I would love it if the proposal included a timeline
pointing out some examples of this.

> Some of these exist already and
> only need help with polishing, reviewing and merging.

Do you mean support for parsing "svnadmin dump --deltas" output?  It
is already polished and reviewed; it's only sitting out-of-tree for
now because it makes the commandline usage awkward and it would be
nice to merge some improvements to that at the same time.

> + * Make the prototype as close to being merged as possible.

That's kind of vague, you know. :)

> Milestones for prototype functionality:
[list of features snipped]

Could you say something about how you would go about implementing
these?

Sorry for the ramble, and thanks for working on this.

Ciao,
Jonathan

[1] git-svn.perl is a work of art and a wonder to behold, and if your aim
is to make a compatible replacement for it, the first step will be to
understand its design deeply.  And the thing is, that much, while
valuable anyway, is pretty hard already.

You see, "git svn" has heuristics for

 - matching up git history to svn history by reading commit messages;
 - pushing mergy history as linear history by rebasing internally
   (dcommit);
 - finding the branches, merges, branch renames, and so on in an
   imperfectly structured history (find_parent etc)
 - what particular paths are relevant  (--ignore-paths)

and maintains some of its own data in the repository:

 - a configuration scheme and wide variety of supported configurations;
 - a log for unhandled pieces of history;
 - a cache mapping svn revision numbers to git commits

and people rely a lot on an odd coincidence:

 - using "git svn clone" twice with the same configuration on the same
   repository will, at least most of the time, give the same commit
   names.
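
   The reason this works at all is that a commit name is just a hash of
   the commit's contents. A small sketch of that, assuming the usual
   layout of a git commit object; the identity string and message below
   are invented for the example:

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree


def commit_name(tree, parents, author, committer, message):
    """Compute the SHA-1 name of a commit object built from these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode("utf-8")
    # Objects are hashed with a "<type> <size>\0" header in front.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical values derived only from svn data (author, date, log).
ident = "alice <alice@example.invalid> 1302234120 +0000"
msg = "Fix typo\n"

first = commit_name(EMPTY_TREE, [], ident, ident, msg)
second = commit_name(EMPTY_TREE, [], ident, ident, msg)
print(first == second)
```

   So two clones agree exactly when tree, parents, identities, dates
   and message are all derived the same way from the same svn data.
   Change any one of them (a different authors file, say) and every
   later name changes too.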

[2] Well, it mostly comes down to one limitation.  To give a quick
sketch:

If I clone a repository with "git svn", then I am in a way a
second-class citizen.  The history shown with "git log" is filled
with "git-svn-id:" lines that are not very interesting to me (the
revision number is still interesting, of course).  I cannot use
"git push" to push my work, and in fact I cannot push my work as a
branch reflecting the real development history at all --- I have to
rebase it at the same time as pushing.  Whenever I push, the commit
names for my work change, so other branches based on my work don't
show up in "gitk" as based on my work any more.

Wouldn't it be nicer to be able to do

 alice$ git clone svn::http://svn.apache.org/repos/asf/subversion
 alice$ cd subversion
 alice$ ... hack hack hack ...

 bob$ git clone 'alice:~/src/subversion'
 bob$ cd subversion
 bob$ ... hack hack hack ...;	# make some changes on top of alice's work

 alice$ git fetch origin; # anything new upstream?
 alice$ git push origin; # push my changes upstream

 bob$ git remote add upstream svn::http://svn.apache.org/repos/asf/subversion
 bob$ git fetch upstream
 bob$ # push my changes on top of alice's (which were already pushed):
 bob$ git push upstream

That is the dream.  Because there is not a clearly appropriate
one-to-one mapping between possible svn histories and possible git
histories, there are going to have to be limitations[1], but that is
an ideal to strive for.

Sounds hard, maybe?  Yeah, it is, but getting at least fetch support
using the tools David and Ram made sounds easier to me than a fully
compatible replacement for git-svn.

[3] Meanwhile, just writing and publishing code is not enough, since
the code might have a fatal flaw that means no one will use it ("ivory
tower syndrome").  So what do I mean by the above?

As students work, I hope they will keep the mailing list posted on
their progress and find small pieces to review and merge early.  In
response they might get some questions and suggestions for
improvement; the response to these is just as important as the code.

On one hand this feedback is an important sanity check on the broad
features of your work and a means to get the details right for
inclusion in git (i.e., get it merged).  On the other hand, one should
not be tempted by interesting side tracks into neglecting the actual
project; you have to be able to say "no, I will not be working on
that".  Out of these conversations emerge better code and
documentation of the design in the form of list archives.

See [4] for a better explanation of this workflow.

[4] http://thread.gmane.org/gmane.comp.version-control.git/142623/focus=142877


* Re: GSoC proposal for svn remote helper
  2011-04-08  5:21   ` Jonathan Nieder
@ 2011-04-08  7:11     ` Jonathan Nieder
  2011-04-08  8:47     ` Dmitry Ivankov
  2011-04-09 23:19     ` Eric Wong
  2 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2011-04-08  7:11 UTC
  To: Dmitry Ivankov; +Cc: git, artagnon, david.barr, srabbelier, Eric Wong

Hi again,

A small clarification.

Jonathan Nieder wrote:

> As mentioned before, this sounds very ambitious.  Once we have a
> timeline showing how this breaks down into small steps it should
> hopefully be clearer way.

Agh, that sentence doesn't even parse.  I ought to have said:
"Breaking the task into concrete steps will make it easier to see what
is realistic."


* Re: GSoC proposal for svn remote helper
  2011-04-08  5:21   ` Jonathan Nieder
  2011-04-08  7:11     ` Jonathan Nieder
@ 2011-04-08  8:47     ` Dmitry Ivankov
  2011-04-08 22:31       ` Jonathan Nieder
  2011-04-09 23:19     ` Eric Wong
  2 siblings, 1 reply; 7+ messages in thread
From: Dmitry Ivankov @ 2011-04-08  8:47 UTC
  To: Jonathan Nieder; +Cc: git, artagnon, david.barr, srabbelier, Eric Wong

On Fri, Apr 8, 2011 at 11:21 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> (+cc: Eric who brought us git-svn)
> Hi Dmitry,
>
> Dmitry Ivankov wrote:
>
>> This is the second iteration of my GSoC proposal
>
> Great; let's iron this out.
>
>> I would like to work on "Remote helper for Subversion and git-svn".
>> My major motivation is to make git-svn repository easy to clone, and to make
>> git-svn (fetch) faster on huge repositories.
>
> So, my new first impression is that this goal might make things hard[1].
>
> I think replacing git-svn with an imperfect emulation would not leave
> people happy.  Existing configurations need to continue to work.
I should have used different names for the current git-svn.perl and for
the thing that should track an svn repo in git somewhat better than
git-svn.perl does. Maybe call it git-svn-ng. It should definitely support
common workflows, but I think that it should not be too close in
configuration and behavior details. A git-svn.perl setup is personal
and I doubt anyone shares theirs, so the transition won't be hard. My
focus is on git-svn-ng core operations - fast fetch, push, the ability to
clone at least the upstream svn state from another git repo - and I see the
way to a complete replacement as follows:
1) introductory step, nothing new compared to git-svn.perl, but
expected to be already faster on allowed operations
git clone svn::/svnroot (svnroot can be a path in local or remote
repo, this is already supported by svn layers)
..hack..
git svn dcommit or like - put changes to the remote

2) allow private clones, that is be able to exchange svn updates or do
initial clone via git
git clone /somewhere/git_ro_version_of_svn.git (maybe some additional
command/key to get svn metadata)
git remote add upstream svn://svnroot (git remote update should be as
quick as on origin)
..hack..
git svn dcommit

3) allow tracking of a path, something like, intermediate,
functionally the same or slightly different in corner cases
git remote add svn svn:://svnroot/ (it's still a good idea to have
whole root specified)
git svn branch trunk svn/trunk@12 (at first it'll be ok to behave like
as if root is svnroot/trunk, just a new syntax)

4) follow tracked path, create branches
git remote add svn svn:://svnroot/
git svn branch trunk svn/trunk@12
git svn branch trunk svn/branches/stable@14
git remote update
git merge-base svn/trunk svn/branches/stable
#for example svn/trunk@13
git remote update
#got svn/trunk@23 -> svn/branches/fixups (maybe if option discover is
set; here we need the root to check that the destination is ok)
git svn branch trunk svn/branches/old@13 (if somehow it wasn't discovered)
git remote update (maybe, if we store whole root history, will be fast)
and also create svn branches with git, ugly way is to:
git checkout svnroot
cp -r trunk branches/experimental
git add branches/experimental
git commit && git svn dcommit
or maybe:
git svn branch trunk svn/branches/experimental
git checkout svn/branches/experimental
git commit --allow-empty -m "create new branch"
git svn dcommit

These four imho are fine as minimal functionality - one can use it
as a git-svn.perl replacement if performance and clones are more
important than heuristics and ease of use.

5) tweak 3-4), allow tracking branches, sharing branches, committing
merges, and a lot more, see below for an idea of getting rid of
rebase.

Uh, that became a long story. Anyway, I clearly see that there is
git-svn.perl, which can do git<->svn interaction quite comfortably for
users; there is an almost ready, faster git<->svn transport; and there are
already a bunch of remote helpers available. That is the good side. On the
bad side, it's currently hard to get even an initial clone. So I consider it
quite possible for a GSoC project to get a kind of git-svn-ng that is
cloneable, faster than git-svn.perl, and hopefully doesn't require
deep understanding and patching of git-svn.perl. All this with the idea
of extending it to handle git workflows between two git-svn-ng clones
and an svn repo, or just better git workflows inside one git-svn-ng.

>
>> Project Goals:
>> + * Design and create fully functional prototype of new git-svn which is
>> cloneable and quite fast.
>
> *If* one does not have this goal ("new git-svn") then there is a
> chance to move past some of git-svn's limitations[2].
I'll write inline in [2]
>
> All that said, these tools could be used to speed up git-svn.
>
>> By fully functional I mean that it'll be
>> able to fetch, push, etc. but probably won't have automatic tags and
>> branches discovery and like, but will allow it to be implemented on
>> top. Oh, it just hit me that given a path (read trunk) to track and a
>> svndump it looks trivial to discover all it's branches - just seek for
>> copies.
>
> As mentioned before, this sounds very ambitious.  Once we have a
> timeline showing how this breaks down into small steps it should
> hopefully be clearer way.
Ok, I'll try to break it into some steps.
>
>> + * Get all the needed core git changes merged.
>
> The following is probably controversial.  It's my opinion only.
>
> Since you can't control what other people do, I don't think it's right
> to judge your project's success or failure based on whether it gets
> merged.  Put another way, the product of your work that can be judged
> is not whatever fraction gets accepted in git.git by the end of the
> summer[3].
That means one can't blame them if it's not merged, but git is also a
mentor, and it'd be strange to choose a goal like "write a thing I'll
use myself" :)

> So I think the goal is whatever it is (a working and suitable "git
> clone svn://foo" command, say) and getting feedback by pushing changes
> upstream and responding to it is a part of how that happens.
It makes great sense to keep both things in mind, yes :)
>
> At some point there will probably be a point of no return --- "if the
> design of this patch is not right, I would have to rewrite everything
> on top of a redesign of it".  I'd encourage getting input on such
> patches _very_ early and working hard to get them merged at least to
> "next" (i.e., to have a rough consensus that they are suitable modulo
> small tweaks).  I would love it if the proposal included a timeline
> pointing out some examples of this.
>
>> Some of these exist already and
>> only need help with polishing, reviewing and merging.
>
> Do you mean support for parsing "svnadmin dump --deltas" output?  It
> is already polished and reviewed; it's only sitting out-of-tree for
> now because it makes the commandline usage awkward and it would be
> nice to merge some improvements to that at the same time.
Yep, help in whatever way I can with this one. I also saw helper
branches introducing new remote-helper commands or extending existing
core functions. I hope all the needed core changes will pop up quickly
in the early stages.
>
>> + * Make the prototype as close to being merged as possible.
>
> That's kind of vague, you know. :)
Yep, but I don't know any good metric for this :)
>
>> Milestones for prototype functionality:
> [list of features snipped]
>
> Could you say something about how you would go about implementing
> these?
>
> Sorry for the ramble, and thanks for working on this.
No problem at all; thank you for the feedback, I like challenges.
>
> Ciao,
> Jonathan
>
> [1] git-svn.perl is a work of art and a wonder to behold, and if your aim
> is to make a compatible replacement for it, the first step will be to
> understand its design deeply.  And the thing is, that much, while
> valuable anyway, is pretty hard already.
[skipped git-svn.perl heuristics]
> and people rely a lot on an odd coincidence:
>
>  - using "git svn clone" twice with the same configuration on the same
>   repository will, at least most of the time, give the same commit
>   names.
I want this to happen always, whenever two clone & fetch sequences
reach the same remote revision.

>
> [2] Well, it mostly comes down to one limitation.  To give a quick
> sketch:
>
> If I clone a repository with "git svn", then I am in a way a
> second-class citizen.  The history shown with "git log" is filled
> with "git-svn-id:" lines that are not very interesting to me (the
> revision number is still interesting, of course).
As already mentioned, I didn't mean an exact emulation of git-svn.
Ideally I'd like the git commit object to include only immutable svn data,
and, even more, it should be the same after a round trip to the svn repo
and back.

> I cannot use
> "git push" to push my work, and in fact I cannot push my work as a
> branch reflecting the real development history at all --- I have to
> rebase it at the same time as pushing.  Whenever I push, the commit
> names for my work change, so other branches based on my work don't
> show up in "gitk" as based on my work any more.
>
> Wouldn't it be nicer to be able to do
>
>  alice$ git clone svn::http://svn.apache.org/repos/asf/subversion
>  alice$ cd subversion
>  alice$ ... hack hack hack ...
>
>  bob$ git clone 'alice:~/src/subversion'
>  bob$ cd subversion
>  bob$ ... hack hack hack ...;   # make some changes on top of alice's work
>
>  alice$ git fetch origin; # anything new upstream?
>  alice$ git push origin; # push my changes upstream
>
>  bob$ git remote add upstream svn::http://svn.apache.org/repos/asf/subversion
>  bob$ git fetch upstream
>  bob$ # push my changes on top of alice's (which were already pushed):
>  bob$ git push upstream
>
> That is the dream.  Because there is not a clearly appropriate
> one-to-one mapping between possible svn histories and possible git
> histories, there are going to have to be limitations[1], but that is
> an ideal to strive for.
I have an idea for it, quite raw, but it could work.
We are limited by svn:
- commit isn't a push, it's rebase & push. So we don't control the parent ref.
- there are no branches in svn. There are paths, but we can convert
them to branches of other paths.
- there are no merges in svn. That's trouble, but maybe we can try
to use svn:mergeinfo to create and read multiple parent refs.
- we definitely want to keep everything needed inside svn, or we are
sure to diverge in different clones at some point.

So what we do:
- get rid of the svn revision in the commit: just keep a path@rev -> sha1
mapping separately; of course the history of path@rev should look like the
history of sha1
- learn to create and fetch back merge commits: try svn:mergeinfo
- be sure to control the parents: don't let svn commit on top of
something different from the git parent:
-- if the path wasn't changed in the repo while we were hacking, commit it
and it'll come back as the same sha1
-- if it was, create an svn branch of our parent, commit there, and
then create a merge commit of these two, commit it and get the same merge
history back
-- and if we are committing a merge, create/commit to branches as necessary

Not perfect, but it can hardly be cleaner to emulate git history in
svn and get it back unchanged. And it should be optional too; not all
svn commits need this.
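
The "control the parents" rule above could be sketched roughly like
this. It is only a planning function with made-up action tuples and a
made-up side-branch name; nothing here talks to svn.

```python
def plan_push(parent_rev, head_rev, path="trunk",
              side_branch="branches/tmp-push"):
    """Plan the svn-side actions needed to push one non-merge commit.

    parent_rev: svn revision of `path` that our git parent corresponds to.
    head_rev:   latest svn revision in which `path` changed.
    """
    if head_rev < parent_rev:
        raise ValueError("local mapping is ahead of the svn repository")
    if head_rev == parent_rev:
        # Nobody touched the path: commit directly, parent is preserved.
        return [("commit", path)]
    # The path moved on: commit on a copy of our parent, then merge back.
    return [
        ("copy", f"{path}@{parent_rev}", side_branch),
        ("commit", side_branch),
        ("merge", side_branch, path),
    ]


print(plan_push(12, 12))
print(plan_push(12, 15))
```

Pushing an actual merge commit would need one such side branch per
extra parent, which is not covered here.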

> Sounds hard, maybe?  Yeah, it is, but getting at least fetch support
> using the tools David and Ram made sounds easier to me than a fully
> compatible replacement for git-svn.
>
> [3] Meanwhile, just writing and publishing code is not enough, since
> the code might have a fatal flaw that means no one will use it ("ivory
> tour syndrome").  So what do I mean by the above?
>
> As students work, I hope they will keep the mailing list posted on
> their progress and find small pieces to review and merge early.  In
> response they might get some questions and suggestions for
> improvement; the response to these is just as important as the code.
>
> On one hand this feedback is an important sanity check on the broad
> features of your work and a means to get the details right for
> inclusion in git (i.e., get it merged).  On the other hand, one should
> not be tempted by interesting side tracks and avoid getting the actual
> project done; you have to be able to say "no, I will not be working on
> that".  Out of these conversations emerge better code and
> documentation of the design in the form of list archives.
>
> See [4] for a better explanation of this workflow.
>
> [4] http://thread.gmane.org/gmane.comp.version-control.git/142623/focus=142877
>


* Re: GSoC proposal for svn remote helper
  2011-04-08  8:47     ` Dmitry Ivankov
@ 2011-04-08 22:31       ` Jonathan Nieder
  2011-04-09  8:21         ` Dmitry Ivankov
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2011-04-08 22:31 UTC
  To: Dmitry Ivankov; +Cc: git, artagnon, david.barr, srabbelier, Eric Wong

Hi again,

Dmitry Ivankov wrote:

> I should have used different names for current git-svn.perl and what
> should be tracking svn repo in git somewhat better than git-svn.perl.
> Maybe call it git-svn-ng.

Ah, sorry I misunderstood.

> It should definitely support common
> workflows, but I think that it should not be too close in
> configuration and behavior details.

At the moment I am more concerned with what its guts will look like
than what features it will support.  A feature list is just a way to
advertise how good the guts are. ;-)

> I have an idea for it, quite raw, but could work.

The discussion here might also be helpful:
http://thread.gmane.org/gmane.comp.version-control.git/159235/focus=159264

> We are limited by svn:
> - commit isn't a push, it's rebase & push. So we don't control the parent ref.

Yes, this is unfortunate.  When I send in my change A, there is no
server-side locking to prevent someone else from making a change B
on the same branch first, with a semantically meaningless result.

svnrdump load has to deal with the same problem.  Possible
workarounds:

 * prevent pushes to the svn repository (using some out-of-band
   mechanism) before allowing git to push to it.  This is what svnsync
   users typically do.

 * lock directory, check parent, commit if okay, unlock directory.
   This is probably the best we can do.

 * extend the SVN RA protocol to provide a lock that is automatically
   removed if the connection is interrupted, and use that.

Ram, does the above (#2) make sense?  (I think this would be something
for svn fast-import to take care of; by the time the stream gets to
"svnrdump load" hopefully the relevant branches are already locked or
protected from contention some other way.)
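
For concreteness, workaround #2 has roughly the following shape.  This
is a toy in-memory model, not the Subversion API: FakeSvnPath and its
method are invented here, and a threading.Lock stands in for whatever
locking the server would really offer.

```python
import threading


class FakeSvnPath:
    """In-memory stand-in for one svn directory (hypothetical)."""

    def __init__(self, head_rev):
        self.head_rev = head_rev
        self._lock = threading.Lock()

    def commit_if_parent_matches(self, expected_parent_rev):
        """Lock, check the parent, commit if okay, unlock.

        Returns the new revision number, or None if someone else
        committed first and the caller has to fetch and retry.
        """
        with self._lock:                               # lock directory
            if self.head_rev != expected_parent_rev:   # check parent
                return None
            self.head_rev += 1                         # commit
            return self.head_rev
        # Leaving the "with" block releases the lock, even on error.


trunk = FakeSvnPath(head_rev=12)
print(trunk.commit_if_parent_matches(12))   # our parent is still the head
print(trunk.commit_if_parent_matches(12))   # stale parent: refused
```

The model glosses over the fact that svn revision numbers are global to
the repository rather than per path.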

So I think this is a problem but not insurmountable.  I think it's
even possible to ignore at first (by assuming exclusive access to the
target svn repo --- i.e., blame the user :)) as long as it is
documented.  Thanks for the reminder.

Reference: http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_ra_svn/protocol

> - there are no branches in svn. There are paths, but we can convert
> them to branches of another paths.

I think it's reasonable to assume a standard trunk / tags / branches
layout and simple "svn cp" records for new and renamed branches, at
least to start out.
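
Under that assumption, mapping an svn path to a branch is a few lines.
A sketch; the refs/remotes/svn/ naming is only an example, not
something any existing tool is known to use:

```python
def branch_ref_for_path(svn_path):
    """Map an svn path to (git ref, path inside the branch).

    Assumes the standard trunk / branches / tags layout and returns
    None for anything outside it.
    """
    parts = svn_path.strip("/").split("/")
    if parts[0] == "trunk":
        return "refs/remotes/svn/trunk", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        ref = f"refs/remotes/svn/{parts[0]}/{parts[1]}"
        return ref, "/".join(parts[2:])
    return None


print(branch_ref_for_path("trunk/src/main.c"))
print(branch_ref_for_path("/branches/stable/README"))
print(branch_ref_for_path("www/index.html"))
```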

> - there are no merges in svn. That's a trouble, but maybe we can try
> to use svn:mergeinfo to create and read multiple parent refs.

The hard part is distinguishing merges from cherry-picks.  Seems
optional to start out (i.e., a prototype can assume each commit has
a single parent).
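
One possible heuristic, sketched below: treat an svn:mergeinfo entry as
a real merge only if it covers every revision in which the source
branch changed, and as cherry-picks otherwise.  This is my guess at a
starting point, not an established rule, and the helper names are
made up.

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source path: set of revisions}."""
    merged = {}
    for line in value.splitlines():
        if not line.strip():
            continue
        path, _, ranges = line.rpartition(":")
        revs = set()
        for item in ranges.split(","):
            item = item.strip().rstrip("*")   # "*" marks non-inheritable
            first, _, last = item.partition("-")
            revs.update(range(int(first), int(last or first) + 1))
        merged[path] = revs
    return merged


def looks_like_full_merge(mergeinfo_value, branch_path, branch_revs):
    """True if every revision that changed the branch has been merged."""
    merged = parse_mergeinfo(mergeinfo_value).get(branch_path, set())
    return bool(branch_revs) and set(branch_revs) <= merged


info = "/branches/foo:3-5,8"
print(looks_like_full_merge(info, "/branches/foo", [3, 5, 8]))
print(looks_like_full_merge(info, "/branches/foo", [3, 5, 8, 9]))
```

The caller would have to supply branch_revs from the branch's own
history, which is where the real work hides.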

> - we definitely want to keep everything needed inside svn, or we are
> sure to diverge in different clones sometime.

Right --- this means we have to make it very clear to the user when
they are creating history that will be unrepresentable in svn.

> So what we do:
> - get rid of svn revision: just have path@rev -> sha1 mapping
> separately, of course history of path@rev should look like the history
> of sha1

The current idea that's been floating in the air is to use a "git notes"
tree (see git-notes(1)) to store the mapping from git commit names to svn
path@rev specifications.  This way, (1) they can be easily shared but
(2) they are not part of the commit name.  So it is easy to discard
the mapping, or to adjust it when the project moves to a different svn
server.

Meanwhile svn-fe needs quick mapping in the other direction, revs ->
commits, for its work.  So one would want to cache the reverse mapping
(with the notes as the master copy).
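
A sketch of how the cache relates to the notes, with a plain dict
standing in for the notes tree (one "path@rev" per line of note text).
The commit names are placeholders and the note format is an
assumption:

```python
def build_rev_cache(notes):
    """Rebuild the (path, rev) -> commit cache from the notes.

    notes maps a commit name to note text with one "path@rev" per line.
    The cache is derived data: it can be thrown away and rebuilt.
    """
    cache = {}
    for commit, text in notes.items():
        for spec in text.splitlines():
            spec = spec.strip()
            if not spec:
                continue
            path, _, rev = spec.rpartition("@")
            cache[(path, int(rev))] = commit
    return cache


notes = {
    "c0ffee1": "trunk@12",
    "c0ffee2": "trunk@13\nbranches/stable@14",
}
cache = build_rev_cache(notes)
print(cache[("trunk", 13)])
print(cache[("branches/stable", 14)])
```

Moving to a different svn server then means rewriting or dropping the
notes and rebuilding the cache, without touching any commit.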

> - learn to create and fetch back merge commits: try svn:mergeinfo

See above.

> - be sure to control the parents: don't let svn to commit on top of
> something different from git parent:

This requires more information from svn-fe (copyfrom info).  So either
svn-fe needs to take care of this itself, or the next program in the
pipeline would get some information from it (using an extension to the
fast-import format, an extra output stream, or extra data in comments,
"progress" commands, or the log message).

> -- if path wasn't changed in the repo while we were hacking, commit it
> and it'll come back as the same sha1

Hopefully. :)

> -- if it was, create a svn branch of our parent, commit there, and
> then create a merge commit of these two, commit it and get same merge
> history back

Yikes.  I don't think typical projects would like the resulting
history.

> -- and if we are commiting a merge, create/commit to branches as necessary

Yes, this is an interesting question.  Given a history like this (time
flowing left to right):

         E --- F --- G
        /             \
 A --- B --- C -- D -- H

where A is the latest rev of trunk/, how do we push this history to
svn?  Where is the name of the side branch recorded in the git
history?

One answer is that it's possible to learn the historical branch name by
parsing the commit message for "H".  Yuck.

I'd put off pushing merges to start.

> Not perfect, but it hardly can be cleaner to emulate git history in
> svn, and get it back unchanged. And it should be optional too, not all
> svn commits need this.

I think the cleanest solution is often to reject a push if it is not
obvious how to represent it remotely, just as though the remote server
had a hook that rejected it.
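
For the single-parent prototype, that check could look something like
the sketch below, run against the example graph above.  The parents
dict and the function name are illustrative only.

```python
def check_linear(tip, remote_tip, parents):
    """Decide whether tip can be pushed as linear history onto remote_tip.

    parents maps each commit to the list of its parents.  Returns
    (True, commits oldest-first) or (False, reason).
    """
    chain = []
    commit = tip
    while commit != remote_tip:
        ps = parents.get(commit, [])
        if len(ps) != 1:
            return False, f"rejected: {commit} has {len(ps)} parents"
        chain.append(commit)
        commit = ps[0]
    return True, list(reversed(chain))


# The history from the diagram; A is the latest rev of trunk/.
PARENTS = {
    "A": [], "B": ["A"], "C": ["B"], "D": ["C"],
    "E": ["B"], "F": ["E"], "G": ["F"], "H": ["D", "G"],
}

print(check_linear("D", "A", PARENTS))
print(check_linear("H", "A", PARENTS))
```

A tip that does not descend from the remote tip is also refused, since
the walk reaches a root commit with no parents.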

Jonathan


* Re: GSoC proposal for svn remote helper
  2011-04-08 22:31       ` Jonathan Nieder
@ 2011-04-09  8:21         ` Dmitry Ivankov
  0 siblings, 0 replies; 7+ messages in thread
From: Dmitry Ivankov @ 2011-04-09  8:21 UTC
  To: Jonathan Nieder; +Cc: git, artagnon, david.barr, srabbelier, Eric Wong

Hi

On Sat, Apr 9, 2011 at 4:31 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> At the moment I am more concerned with what its guts will look like
> than what features it will support.  A feature list is just a way to
> advertise how good the guts are. ;-)
My current view is as follows:
Use the svnrdump stream to track / in, say, an svnroot branch. It'll be linear
(1), commits will include actual diffs, and some revprops will be translated
to their git counterparts, like svn:log, svn:author, svn:date (2). And
it'll be a bridge for svn interaction.
In a git-notes tree, store the sha1 -> svn rev mapping (3), and also for each
svn rev store all its revprops.
Store and maintain /path branches - there we have some freedom in
choosing git parents.
That's all about fetch. Obviously svnrdump will be used to push
fast-forward linear history back to svnroot, or to a /path branch,
which goes the same way in the fast-forward case.
And to be somewhat usable we want to be able to rebase-push/dcommit
(if there is a dense stream of svn commits going, we don't want the
user to type git rebase, git push and fail until he gets lucky with the
timing).
Merges need more thinking, and may not be that necessary for a start.
Path ignores, or even revision ignores, should be possible to implement
in the code, but only as emergency tools for a user (sometimes people
commit, by mistake, something enormous or incompatible with
filesystem names or the like, so that the user considers it ok to trash it
out of his history). There could also be a need for a permanent path
filter (like track /projX, not /bin) - just the same, be ready that
at some point it'll have to be implemented.
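
To illustrate the revprop translation, a minimal sketch. It assumes
svn:date always has the form 2011-04-08T03:42:00.000000Z, and the way
an e-mail address is invented from svn:author is a placeholder, since
svn only stores a user name.

```python
from datetime import datetime, timezone


def revprops_to_git(revprops, email_domain="example.invalid"):
    """Translate svn:author, svn:date and svn:log into git commit fields."""
    when = datetime.strptime(
        revprops["svn:date"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    author = revprops.get("svn:author", "unknown")
    # Hypothetical address scheme; a real tool needs a policy for this.
    ident = f"{author} <{author}@{email_domain}> {int(when.timestamp())} +0000"
    message = revprops.get("svn:log", "")
    if not message.endswith("\n"):
        message += "\n"
    return {"author": ident, "committer": ident, "message": message}


print(revprops_to_git({
    "svn:author": "alice",
    "svn:date": "2011-04-08T03:42:00.000000Z",
    "svn:log": "Fix typo",
}))
```

Only these three properties end up in the commit object, so a propedit
on any other revprop would not change the commit name.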

[skipped some of svn commit "races" and merge tricks]

>> -- if it was, create a svn branch of our parent, commit there, and
>> then create a merge commit of these two, commit it and get same merge
>> history back
>
> Yikes.  I don't think typical projects would like the resulting
> history.
Will make them mad, but in some cases it should be ok, if we are
pushing a lengthy topic branch they'll sometimes prefer to see it as
one commit.

> Yes, this is an interesting question.  Given a history like this (time
> flowing left to right):
>
>         E --- F --- G
>        /             \
>  A --- B --- C -- D -- H
>
> where A is the latest rev of trunk/, how do we push this history to
> svn?  Where is the name of the side branch recorded in the git
> history?
Could be either autogenerated with some user pattern, like
/branches/user/tmpXX, or specified explicitly in git-notes or
somewhere, or maybe we have already pushed a placeholder branch to svn
and will commit there.
>
> On answer is that it's possible to learn the historical branch name by
> parsing the commit message for "H".  Yuck.
>
> I'd put off pushing merges to start.
It's definitely not in the minimal plan.

> I think the cleanest solution is often to reject a push if it is not
> obvious how to represent it remotely, just as though the remote server
> had a hook that rejected it.
Makes sense, after all plain svn users want to see svn-like history,
because they still use svn.

(1) In theory we could track whole-svnroot merges (from svnroot2 on
the same repo for example, or a hypothetical merge from another repo)
but that's hardly used by anyone.
(2) The hardest thing is to decide which ones to store in git.
Translating more gives a better look and feel; translating less
lowers the risk of getting different git objects on another clone. And
what git should do if this data changes is not a trivial choice either.
(3) A funny thing that could happen is that there are path1@rev1 and
path2@rev2 that produce the same sha1. That's perfectly fine because
they are just refs; care should be taken when choosing a path to
commit to, though. Also svn will distinguish them, but it's just a
corner case.


* Re: GSoC proposal for svn remote helper
  2011-04-08  5:21   ` Jonathan Nieder
  2011-04-08  7:11     ` Jonathan Nieder
  2011-04-08  8:47     ` Dmitry Ivankov
@ 2011-04-09 23:19     ` Eric Wong
  2 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2011-04-09 23:19 UTC
  To: Jonathan Nieder; +Cc: Dmitry Ivankov, git, artagnon, david.barr, srabbelier

Jonathan Nieder <jrnieder@gmail.com> wrote:
> [1] git-svn.perl is a work of art and a wonder to behold, and if your aim
> is to make a compatible replacement for it, the first step will be to
> understand its design deeply.  And the thing is, that much, while
> valuable anyway, is pretty hard already.

Thanks.  I credit the use of automated tests for making things work
as well as they do, especially given how ugly the code has gotten.

I think the first step to understanding much of it is to split the
modules into individual files and then understanding the test cases.

> and people rely a lot on an odd coincidence:
> 
>  - using "git svn clone" twice with the same configuration on the same
>    repository will, at least most of the time, give the same commit
>    names.

I did design it with that in mind when I first started; I'm glad it still
mostly works after all this time :)

> That is the dream.

*My* dream was to replace Subversion and get people using git.  git-svn
was designed to be self-obsoleting from the start.  For the most part
I consider it a success since I no longer need to use it :)

-- 
Eric Wong
