git.vger.kernel.org archive mirror
* challenges using fast-import and svn
@ 2007-07-02 19:26 David Frech
From: David Frech @ 2007-07-02 19:26 UTC (permalink / raw)
  To: git

I have an svn repo containing several small projects that is an odd
"shape" (in terms of directories) because of its history;
git-svnimport doesn't like the directory structure, and I wasn't able
to coax it to work.

I looked around for other options, and discovered fast-import (thanks
Shawn!). I decided that the "easiest" approach would be to parse the
svn dump file and feed the commits into fast-import.

So I wrote, in Lua, a parser for the (terrible) svn dump file format
that feeds commands into fast-import. The parser took a day and a half
to write; the fast-import backend took about an hour. ;-)

However, there are issues. I don't currently track branch copies
correctly, so branches start out with no history, rather than with
the history of the branch they are copied from; and handling deletes
is tricky.

This last thing is my main "question" to the list, although I'm
curious if anyone else has played with svn dump files, and whether my
approach makes sense.

Here is the problem: if a file or directory is deleted in svn, the
dumpfile shows simply this:

Node-path: trunk/project/file-or-directory
Node-action: delete

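For anyone trying the same thing: the node headers shown above are plain "Key: value" lines ended by a blank line, so pulling out the path and action is simple. Below is a minimal sketch in Python. The actual parser was written in Lua, and the function name here is hypothetical. It assumes one record's lines have already been split out of the dump:

```python
def parse_node_headers(lines):
    """Collect the 'Key: value' header lines of one dump record into a dict.

    Reading stops at the first blank line, which ends the header block.
    """
    headers = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            break  # end of the header block
        key, sep, value = line.partition(": ")
        if sep:  # ignore anything that is not a 'Key: value' line
            headers[key] = value
    return headers


record = [
    "Node-path: trunk/project/file-or-directory\n",
    "Node-action: delete\n",
    "\n",
]
node = parse_node_headers(record)
print(node["Node-path"], node["Node-action"])
# prints: trunk/project/file-or-directory delete
```

A real dump record can also carry Prop-content-length and Text-content-length headers that say how many bytes of property and file data follow the header block. This sketch only covers the headers themselves.
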
In the case of a file, I can simply feed a "D" command to fast-import;
but if I'm deleting a whole directory, my code knows nothing about
what files exist in that directory. Is fast-import smart about this?
Will it barf if given a directory argument rather than a file for "D"
commands?

I could cache the directory contents in my code, but isn't that partly
what fast-import is good for?

Any thoughts are welcome.

Cheers,

- David

-- 
If I have not seen farther, it is because I have stood in the
footsteps of giants.


* Re: challenges using fast-import and svn
@ 2007-07-02 22:24 ` Shawn O. Pearce
From: Shawn O. Pearce @ 2007-07-02 22:24 UTC (permalink / raw)
  To: David Frech; +Cc: git

David Frech <nimblemachines@gmail.com> wrote:
> So I wrote, in Lua, a parser for the (terrible) svn dump file format
> that feeds commands into fast-import. The parser took a day and a half
> to write; the fast-import backend took about an hour. ;-)

Heh.  That's about what most folks say.  ;-)
 
> However, there are issues. I don't currently track branch copies
> correctly, so branches start out with no history, rather than with
> the history of the branch they are copied from; and handling deletes
> is tricky.

Branches are easy to create from the right branch in fast-import,
but with the SVN dump file it's hard to know where a branch starts from.

One trick folks have used in the past is to assign a mark in
fast-import for each SVN revision.  Marks are very cheap and make
it easy to reference a commit in a from command when you need to
make a new branch.  You can just use the SVN revision number you
get from the SVN dump file.

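A minimal Python sketch of the mark-per-revision trick, assuming the frontend writes the fast-import stream to stdout. The function names, the committer identity and the timestamps are hypothetical. Each commit gets `mark :<svn revision>`, so a branch copied from revision N can start with `from :N`:

```python
import sys


def data_block(text):
    """Encode a fast-import 'data' command: exact byte count, then the bytes."""
    raw = text.encode("utf-8")
    return b"data %d\n" % len(raw) + raw + b"\n"


def commit_for_revision(ref, rev, committer, when, message, parent_rev=None):
    """Build one fast-import commit marked with its SVN revision number.

    SVN revision 0 carries no changes, so real revisions (and marks) start at 1.
    If parent_rev is given (the revision a branch was copied from), the
    matching 'from :<parent_rev>' points the new branch at that commit.
    """
    out = b"commit %s\n" % ref.encode("utf-8")
    out += b"mark :%d\n" % rev
    # Raw date format: seconds since the epoch plus a UTC offset.
    out += b"committer %s %d +0000\n" % (committer.encode("utf-8"), when)
    out += data_block(message)
    if parent_rev is not None:
        out += b"from :%d\n" % parent_rev
    # File commands (M, D, ...) for this revision would follow here.
    return out


if __name__ == "__main__":
    who = "A U Thor <author@example.com>"  # hypothetical committer
    stream = commit_for_revision("refs/heads/master", 1, who, 1183404360, "r1\n")
    stream += commit_for_revision(
        "refs/heads/feature", 5, who, 1183404400, "branch copied from r1\n", parent_rev=1
    )
    sys.stdout.buffer.write(stream)
```

Piping this output into `git fast-import` in an empty repository should create both branches. Later commits on a branch that already exists need no `from` line, because fast-import continues from that branch's current tip.
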
> Here is the problem: if a file or directory is deleted in svn, the
> dumpfile shows simply this:
> 
> Node-path: trunk/project/file-or-directory
> Node-action: delete
> 
> In the case of a file, I can simply feed a "D" command to fast-import;
> but if I'm deleting a whole directory, my code knows nothing about
> what files exist in that directory. Is fast-import smart about this?
> Will it barf if given a directory argument rather than a file for "D"
> commands?

I just read the code again.  You can delete an entire subdirectory
just by sending a D command for that subdirectory, assuming you
don't end the name with a '/'.  So you should be able to just do:

  D file-or-directory

and whatever file-or-directory is, it goes away.  If you were to
send a trailing '/':

  D file-or-directory/

it's likely bad things will happen because fast-import will try to
remove the file or directory named "" (yes, empty string) in the
subdirectory called "file-or-directory" but leave the subdirectory.

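In a dump-file converter, that advice reduces to something like the following Python sketch. The function name is made up, and `branch_prefix` is a hypothetical stand-in for however the converter maps SVN paths such as "trunk/" onto Git branches:

```python
def delete_command(node_path, branch_prefix):
    """Translate an svn 'Node-action: delete' path into a fast-import D command.

    The same command removes a file or a whole directory.  A trailing '/'
    is stripped because 'D dir/' would not remove dir itself.
    Paths containing LF or starting with '"' would need C-style quoting,
    which this sketch does not handle.
    """
    if not node_path.startswith(branch_prefix):
        raise ValueError("%r is not under %r" % (node_path, branch_prefix))
    path = node_path[len(branch_prefix):].rstrip("/")
    if not path:
        raise ValueError("refusing to emit D for the branch root")
    return "D %s\n" % path


print(delete_command("trunk/project/file-or-directory", "trunk/"), end="")
# prints: D project/file-or-directory
```

The converter never needs to know whether the deleted path was a file or a directory. fast-import's own tree handles that.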

Another option is that you can replace a tree with a file at any point in
time, without first deleting it.  So you could also just overwrite
the entire subdirectory with an empty file via the M command, then
delete the file.  But that shouldn't be necessary as the D command
should already do exactly what you want it to do.  Internally
the "replace entire directory with single file" is the same
implementation as the "delete entire directory" implementation...


So I guess this means a documentation update for the D command
would be a good idea?

> I could cache the directory contents in my code, but isn't that partly
> what fast-import is good for?

Yes.  fast-import is really quite good at helping you do the Git side
of the equation.  ;-)

-- 
Shawn.


* Re: challenges using fast-import and svn
@ 2007-07-02 23:28   ` David Frech
From: David Frech @ 2007-07-02 23:28 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

On 7/2/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> David Frech <nimblemachines@gmail.com> wrote:
> > However, there are issues. I don't currently track branch copies
> > correctly, so branches start out with no history, rather than with
> > the history of the branch they are copied from; and handling deletes
> > is tricky.
>
> Branches are easy to create from the right branch in fast-import,
> but with the SVN dump file it's hard to know where a branch starts from.
>
> One trick folks have used in the past is to assign a mark in
> fast-import for each SVN revision.  Marks are very cheap and make
> it easy to reference a commit in a from command when you need to
> make a new branch.  You can just use the SVN revision number you
> get from the SVN dump file.

I think I know how to do this. I'm already using marks for each commit.


> > Here is the problem: if a file or directory is deleted in svn, the
> > dumpfile shows simply this:
> >
> > Node-path: trunk/project/file-or-directory
> > Node-action: delete
> >
> > In the case of a file, I can simply feed a "D" command to fast-import;
> > but if I'm deleting a whole directory, my code knows nothing about
> > what files exist in that directory. Is fast-import smart about this?
> > Will it barf if given a directory argument rather than a file for "D"
> > commands?
>
> I just read the code again.  You can delete an entire subdirectory
> just by sending a D command for that subdirectory, assuming you
> don't end the name with a '/'.  So you should be able to just do:
>
>   D file-or-directory
>
> and whatever file-or-directory is, it goes away.  If you were to
> send a trailing '/':
>
>   D file-or-directory/
>
> it's likely bad things will happen because fast-import will try to
> remove the file or directory named "" (yes, empty string) in the
> subdirectory called "file-or-directory" but leave the subdirectory.

This is great! I'll update the code and see what happens...


> So I guess this means a documentation update for the D command
> would be a good idea?

Sounds good to me. Right now it really implies "this only works on files".

Thanks for the snappy reply, and thanks again for writing fast-import!
It was a pleasure to use.

> --
> Shawn.
>

- David

-- 
If I have not seen farther, it is because I have stood in the
footsteps of giants.

