* GSOC remote-svn: branch detection
From: Florian Achleitner @ 2012-08-03 9:43 UTC
To: git; +Cc: David Michael Barr, Jonathan Nieder, Andrew Sayers

Hi!

I'm playing around in vcs-svn/ to start a framework for detecting and
processing branches in svndumps. So I wanted to let you know about my ideas.

Two approaches:

1. Import linearly and split later:
One idea is to import from svn linearly, i.e. one revision on top of its
predecessor, like now, and detect and split branches afterwards. The svn
metadata is stored in git notes, so the required information would be
available.
+ allows recovery, because the linear history is always here.
+ it's easier to peek around in the git history than in the svn dump during
import to do the branch detection.
- requires creation of new commits in the branch detection stage.
- this results in double commits and awkward history, linear vs. branched.

2. Split during import:
Detect branches as they are created while reading the svn dump and identify
to which branch a following node belongs.
First step is to restructure svndump.c to be able to buffer one complete
revision for inspection before starting to write a commit to fast import.
Probably it's possible to feed the blobs to fast import directly, buffer only
the node metadata, and defer commit creation, without buffering the file data.
Currently, at the beginning of a new revision on the svn side, a new commit
is created on top of a constant ref. When we support branches, we don't know
the ref, i.e. the branch(es) the revision changes, before reading all the
'Node-*' lines.
+ feels more 'right'
- requires revision buffering

Generally:
Detect branches as they are created by 'Node-copyfrom*' to some commonly used
branch directories, like branches/. More complex branch detection can be
implemented later, of course.
Store detected branches permanently (necessary for incremental fetches), and
assign every file modification to one of those branches, if possible. Else
assign them to, hm ..
If a revision modifies more than one branch, create multiple commits.

Thanks for your comments and ideas!

--
Florian

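
The detection rule outlined in this message (a 'Node-copyfrom*' copy into a
commonly used branch directory starts a branch, and later changes are assigned
to the branch that contains their path) might look roughly like the following.
This is an illustrative Python sketch, not code from vcs-svn/. The `Node`
class, the `BRANCH_DIRS` list and all function names are assumptions made for
the example. Changes that match no known branch are grouped under `None`,
because the message leaves that case open.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set

# Assumption: directories whose direct children are treated as branches.
BRANCH_DIRS = ("branches/", "tags/")


@dataclass
class Node:
    """Hypothetical stand-in for the per-node headers of an svn dump."""
    path: str                            # Node-path
    kind: str                            # Node-kind: "dir" or "file"
    action: str                          # Node-action: "add", "change", ...
    copyfrom_path: Optional[str] = None  # Node-copyfrom-path
    copyfrom_rev: Optional[int] = None   # Node-copyfrom-rev


def is_branch_start(node: Node) -> bool:
    """True if the node copies a directory directly into a branch directory."""
    if node.kind != "dir" or node.action != "add" or node.copyfrom_path is None:
        return False
    for prefix in BRANCH_DIRS:
        if node.path.startswith(prefix):
            rest = node.path[len(prefix):]
            # Only direct children count, e.g. branches/foo but not branches/foo/bar.
            return rest != "" and "/" not in rest
    return False


def branch_of(path: str, branches: Iterable[str]) -> Optional[str]:
    """Return the longest known branch containing path, or None."""
    best = None
    for branch in branches:
        if path == branch or path.startswith(branch + "/"):
            if best is None or len(branch) > len(best):
                best = branch
    return best


def split_revision(nodes: List[Node], branches: Set[str]) -> Dict[Optional[str], List[Node]]:
    """Group one revision's nodes by branch, recording newly detected branches.

    A revision touching several branches yields several groups, which would
    become several commits.
    """
    groups: Dict[Optional[str], List[Node]] = {}
    for node in nodes:
        if is_branch_start(node):
            branches.add(node.path)
        groups.setdefault(branch_of(node.path, branches), []).append(node)
    return groups
```

The `branches` set is mutated in place so that it can be persisted between
runs, which matches the requirement that detected branches be stored
permanently for incremental fetches. How it is persisted is left open here.
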
* Re: GSOC remote-svn: branch detection
From: Jonathan Nieder @ 2012-08-03 18:17 UTC
To: Florian Achleitner
Cc: git, David Michael Barr, Andrew Sayers, Dmitry Ivankov,
    Ramkumar Ramachandra, Sam Vilain

Hi,

Florian Achleitner wrote:

> Two approaches:
> 1. Import linearly and split later:
> One idea is to import from svn linearly, i.e. one revision on top of its
> predecessor, like now, and detect and split branches afterwards. The svn
> metadata is stored in git notes, so the required information would be
> available.
> + allows recovery, because the linear history is always here.
> + it's easier to peek around in the git history than in the svn dump during
> import to do the branch detection.
> - requires creation of new commits in the branch detection stage.
> - this results in double commits and awkward history, linear vs. branched.

I don't think you've captured the real pros and cons here.

 + Divides responsibility between a component that fetches and a component
   that splits branches, making for easier debugging, independent refactoring
   of components, reuse in other contexts (e.g., splitting out branches in
   other similar VCSen, etc)

 - Divides responsibility between a component that fetches and a component
   that splits branches, which is tricky because it involves designing an
   interface between them and documenting it. And maybe a different
   interface would be better.

There are also performance and history-clarity ramifications as you've
mentioned, but they do not seem as important.

Hope that helps,
Jonathan

> 2. Split during import:

* Re: GSOC remote-svn: branch detection
From: Dmitry Ivankov @ 2012-08-04 6:40 UTC
To: Jonathan Nieder
Cc: Florian Achleitner, git, David Michael Barr, Andrew Sayers,
    Ramkumar Ramachandra, Sam Vilain

Hi,

On Sat, Aug 4, 2012 at 12:17 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Hi,
>
> Florian Achleitner wrote:
>
>> Two approaches:
>> 1. Import linearly and split later:
>> One idea is to import from svn linearly, i.e. one revision on top of its
>> predecessor, like now, and detect and split branches afterwards. The svn
>> metadata is stored in git notes, so the required information would be
>> available.
>> + allows recovery, because the linear history is always here.

This is a good one, but I'd put questions another way:
- do we want to query svn server only for newer revisions even if our
settings changed (branch layout ones for example), maybe we don't mind some
queries in settings change case (like git-svn.perl)?
- do we want to be able to filter svn history early (like take trunk,
branches, tags, skip tests_data as it's huge but sometimes there are svn cp
to/from it, or maybe the repo has weird permissions or even is corrupted)?
- do we just want a completely separate (fast) (local) storage like svn dump
file to use it for imports and settings changes?
I personally still haven't decided on those.

My set of pros/cons:
+ should be the simplest thing for simple small repos
+ keeps all the original data details and looks quite robust
- becomes complicated if we don't want or can't import some parts of the
history. While git-svn.perl somehow handles it.
- looks like a thing to store and access svn dump information, do we really
want it to be in a form of git objects (almost sure), how stable, flexible,
independent from svn helper should it be (that's what Jonathan talks about).

Weird idea: what if we keep everything in one huge git tree like
rXX/{data,props,copy-from,..}/path/path/path/file. It should represent all
the known svn info so far.
Ok, I know it's a late stage now and this thing is completely raw, just
posting to have it written out somewhere :)

>> + it's easier to peek around in the git history than in the svn dump during
>> import to do the branch detection.
>> - requires creation of new commits in the branch detection stage.
>> - this results in double commits and awkward history, linear vs. branched.
>
> I don't think you've captured the real pros and cons here.
>
> + Divides responsibility between a component that fetches and a component
> that splits branches, making for easier debugging, independent refactoring
> of components, reuse in other contexts (e.g., splitting out branches in
> other similar VCSen, etc)
>
> - Divides responsibility between a component that fetches and a component
> that splits branches, which is tricky because it involves designing an
> interface between them and documenting it. And maybe a different
> interface would be better.
>
> There are also performance and history-clarity ramifications as you've
> mentioned, but they do not seem as important.
>
> Hope that helps,
> Jonathan
>
>> 2. Split during import:

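
To make the "one huge git tree" idea in this message concrete, here is a
small Python sketch of how one svn node could be spread over the
rXX/{data,props,copy-from}/... layout. It is only an illustration of the
path scheme. The function name, the "key=value" serialization of properties
and the "path@rev" encoding of the copy source are assumptions, since the
message itself calls the idea completely raw.

```python
import posixpath
from typing import Dict, Optional, Tuple


def raw_tree_entries(
    rev: int,
    node_path: str,
    data: Optional[bytes] = None,
    props: Optional[Dict[str, str]] = None,
    copy_from: Optional[Tuple[str, int]] = None,
) -> Dict[str, bytes]:
    """Return {path in the big tree: blob content} for a single svn node."""
    base = "r%d" % rev
    entries: Dict[str, bytes] = {}
    if data is not None:
        # File content goes under rXX/data/<svn path>.
        entries[posixpath.join(base, "data", node_path)] = data
    if props is not None:
        # Assumed serialization: one "key=value" line per property, sorted.
        text = "".join("%s=%s\n" % (k, v) for k, v in sorted(props.items()))
        entries[posixpath.join(base, "props", node_path)] = text.encode()
    if copy_from is not None:
        # Assumed encoding of Node-copyfrom-path / Node-copyfrom-rev.
        src_path, src_rev = copy_from
        entries[posixpath.join(base, "copy-from", node_path)] = (
            "%s@%d\n" % (src_path, src_rev)
        ).encode()
    return entries
```

For example, a file copied from trunk/a.c at r11 into branches/feature/a.c
in r12 would produce entries below r12/data/, r12/props/ and r12/copy-from/,
each ending in branches/feature/a.c.
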
* Re: GSOC remote-svn: branch detection
From: Ramkumar Ramachandra @ 2012-08-04 18:23 UTC
To: Florian Achleitner
Cc: Jonathan Nieder, git, David Michael Barr, Andrew Sayers,
    Dmitry Ivankov, Sam Vilain

Hi,

Florian Achleitner wrote:
> 1. Import linearly and split later:

I think this approach will be a lot less messy if you can cleanly
separate the fetching component from the mapper. Currently, svndump
re-creates the layout of the SVN repository. And the series you
posted last week contains a patch that attaches a note with SVN
metadata to each commit. Do you have thoughts on how the mapping will
take place?

Ram

* Re: GSOC remote-svn: branch detection
From: Florian Achleitner @ 2012-08-07 21:26 UTC
To: Ramkumar Ramachandra
Cc: Florian Achleitner, Jonathan Nieder, git, David Michael Barr,
    Andrew Sayers, Dmitry Ivankov, Sam Vilain

On Saturday 04 August 2012 23:53:58 Ramkumar Ramachandra wrote:
> Hi,
>
> Florian Achleitner wrote:
> > 1. Import linearly and split later:
> I think this approach will be a lot less messy if you can cleanly
> separate the fetching component from the mapper. Currently, svndump
> re-creates the layout of the SVN repository. And the series you
> posted last week contains a patch that attaches a note with SVN
> metadata to each commit. Do you have thoughts on how the mapping will
> take place?

The mapping itself is currently a black box for me, its internals could be
rather complex.
It could get a function like is_branch_start, that is called with a node ctx
and tells if this is likely to be the start of a branch. The detected
branches are stored and upcoming changes in the associated directories are
mapped to a commit on a branch.
The detection of branch starts and the list of existing branches can be taken
from whatever logic we want.
So that's approx. the idea.

Currently I'm working on more basic preparations. I want to split the
creation of commits and the creation of blobs in svndump.c. This is necessary
because fast import requires a branch name as an argument to the 'commit'
command, and currently a 'commit' command is started when a new revision is
encountered in the svndump. But to decide on which branch the commit should
go, or even if it will be more than one commit, it is necessary to read all
the nodes first.
To prevent buffering the node content, I want to replace the inline data
format (currently used) by 'blob' commands. While parsing the dump, every
node change creates a blob command to feed the data immediately into
fast-import while the node metadata (struct node_ctx) is stored at least
until the revision ends. Then the blobs can be put on a linear master tree
and other branch trees.
The node metadata could also be read from notes, if remapping branches.

That's not so easy to do, because the current implementation mixes
tree-operations and blob-operations heavily, and relies on only one global
node_ctx.

>
> Ram

Flo

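
The restructuring described in this message (emit each node's content as a
'blob' command right away, keep only the node metadata until the revision
ends, then write one 'commit' per affected branch) can be illustrated with a
small stream writer. This is a Python sketch under stated assumptions, not
the svndump.c design: `RevisionWriter`, the `mapper` callback, the fixed
`100644` mode and the `+0000` timezone are hypothetical. Only the
fast-import commands `blob`, `mark`, `data`, `commit`, `committer` and `M`
are real stream syntax. Deletions, directories and the `from` line needed
when a branch forks from an existing commit are left out.

```python
from typing import BinaryIO, Callable, Dict, List, Tuple


class RevisionWriter:
    """Writes blobs immediately and defers commits to the end of a revision."""

    def __init__(self, out: BinaryIO) -> None:
        self.out = out
        self.next_mark = 1
        # Buffered node metadata for the current revision: (svn path, blob mark).
        self.pending: List[Tuple[str, int]] = []

    def node(self, path: str, content: bytes) -> None:
        """Feed one node's content to fast-import now, remember only its mark."""
        mark = self.next_mark
        self.next_mark += 1
        self.out.write(b"blob\nmark :%d\ndata %d\n" % (mark, len(content)))
        self.out.write(content)
        self.out.write(b"\n")
        self.pending.append((path, mark))

    def end_revision(
        self,
        mapper: Callable[[str], Tuple[str, str]],
        committer: str,
        when: int,
        message: bytes,
    ) -> None:
        """Emit one commit per ref once all nodes of the revision are known.

        mapper is an assumed callback: svn path -> (ref, path inside branch).
        """
        by_ref: Dict[str, List[Tuple[str, int]]] = {}
        for path, mark in self.pending:
            ref, rel_path = mapper(path)
            by_ref.setdefault(ref, []).append((rel_path, mark))
        self.pending = []
        for ref, changes in by_ref.items():
            self.out.write(b"commit %s\n" % ref.encode())
            self.out.write(b"committer %s %d +0000\n" % (committer.encode(), when))
            self.out.write(b"data %d\n" % len(message))
            self.out.write(message)
            self.out.write(b"\n")
            for rel_path, mark in changes:
                # The blob was already sent, so the commit only refers to its mark.
                self.out.write(b"M 100644 :%d %s\n" % (mark, rel_path.encode()))
            self.out.write(b"\n")
```

Because the commit lines reference marks rather than inline data, the
branch decision can be made after the whole revision has been read, while
memory use stays bounded by the amount of node metadata.
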