git.vger.kernel.org archive mirror
* native-git-svn: A Summer of Code 2010 proposal
@ 2010-03-19 17:18 Ramkumar Ramachandra
From: Ramkumar Ramachandra @ 2010-03-19 17:18 UTC
  To: Git Mailing List; +Cc: Sverre Rabbelier

Hi,

I picked up a project I liked from the Wiki
[https://git.wiki.kernel.org/index.php/SoC2010Ideas#A_remote_helper_for_svn]
and discussed it with Sverre. I now have a preliminary draft of my
proposal ready, and I'd really appreciate feedback.

=====================================
Project Proposal: native-git-svn | Native SVN support in Git

== The Outline ==
Currently, git-svn.perl is used to interface with SVN repositories.
However, it has serious shortcomings:
1. It is essentially an arcane 5000-line Perl script that uses neither
git-fast-import nor git-fast-export; it converts an SVN repository to
a Git repository by hand. This makes it virtually unmaintainable.
2. Its UI is unnecessarily complex. Some git-svn subcommands
correspond to ordinary git commands, and it can be quite difficult for
the user to understand which one to use in a given situation. These
could easily be merged.
3. It handles the standard trunk/branches/tags layout well, but it
doesn't know how to handle non-standard or changing SVN layouts.
4. There is an array of other annoyances that make it quite
imperfect. For example, it ignores all SVN properties except
svn:executable.

While many of these problems could be tackled in git-svn.perl itself,
problem 1 is the most pressing: git-svn.perl is very difficult to
modify or even maintain. A more permanent solution is required.

My proposal is to start from scratch and build an application that
makes dealing with SVN repositories very easy. The plan is to build
it component by component, in a modular manner. The project can be
considered fully successful only after the functionality described in
all the components has been written and the project is merged upstream.
It will involve minimal changes to the current Git codebase, if any at
all. I additionally hope that this project will serve as a roadmap for
other projects that involve natively supporting other versioning
systems in Git.

== The Technicalities ==
The distinct components I plan to write are:
1. An SVN client that uses libsvn to fetch revisions from, and push
revisions to, a remote SVN repository.
2. An exporter for SVN repositories, which will extract all the
relevant revision history and metadata to import into Git.
3. A remote helper for Git that takes the data from this SVN exporter,
and uses git-fast-import to create corresponding commits in Git.
4. Another remote helper to export commit data and metadata from Git
to import into SVN.
5. An importer for SVN, which will create revisions in SVN
corresponding to commits in Git.
6. A UI that glues all the components together into one large
consistent interface.
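As a rough illustration of how components 2 and 3 could meet, here is
a minimal Python sketch (not part of the plan above) that turns one
exported SVN revision into a git-fast-import stream and hands it to
the compiled git executable. The SvnRevision record and its fields are
hypothetical stand-ins for whatever the SVN exporter produces.
Deletions, copies, executable bits and "from" lines are left out, so
consecutive commits to the same ref within one stream simply chain
onto each other.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class SvnRevision:
    """Hypothetical record produced by the SVN exporter (component 2)."""
    number: int      # SVN revision number; r0 (always empty) is never passed in
    author: str      # already mapped to "Real Name <email>"
    timestamp: int   # seconds since the epoch, UTC
    message: str
    files: dict[str, bytes] = field(default_factory=dict)  # path -> new contents


def data_block(payload: bytes) -> bytes:
    # fast-import "data" command: exact byte count, raw bytes, optional trailing LF.
    return b"data %d\n" % len(payload) + payload + b"\n"


def revision_to_fast_import(rev: SvnRevision, ref: str = "refs/heads/master") -> bytes:
    out = [
        f"commit {ref}\n".encode("utf-8"),
        f"mark :{rev.number}\n".encode("utf-8"),  # the SVN revision doubles as the mark
        f"committer {rev.author} {rev.timestamp} +0000\n".encode("utf-8"),
        data_block(rev.message.encode("utf-8")),
    ]
    for path, contents in sorted(rev.files.items()):
        # Inline file modification with a regular (non-executable) file mode.
        out.append(f"M 100644 inline {path}\n".encode("utf-8"))
        out.append(data_block(contents))
    return b"".join(out)


def feed_git(stream: bytes, repo_dir: str) -> None:
    # Run the compiled git executable instead of linking against Git.
    subprocess.run(["git", "fast-import", "--quiet"], input=stream,
                   cwd=repo_dir, check=True)
```

Calling feed_git(revision_to_fast_import(rev), "/path/to/repo") would
run git fast-import inside that repository. Keeping Git behind a
subprocess boundary is what allows the helper to link to libsvn
without linking to Git.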

Due to a licensing conflict, the details of which can be found in
[1], native-git-svn will link to libsvn, but will NOT link to Git.
Instead, it will use a thin wrapper that calls the compiled Git
executables (the "remote helper" referred to in that thread). The six
components will be developed and tested independently.

The following resources are relevant to the project:
1. git_remote_helpers/git/git.py is a minimalistic remote helper
written by Sverre. I plan to extend this as much as possible before
rewriting it in C.
2. libsvn comes with excellent documentation and clear examples for
writing the SVN client.
3. git-svn.perl has a lot of functionality that I plan to re-implement
in native-git-svn:
   3.1 parse_svn_date: Given a date (in UTC) from Subversion, return a
string in the format "<TZ Offset> <local date/time>" that Git will use
   3.2 load_authors: Read the "<svn username> = real-name <email
address>" mapping, based on git-svnimport
   3.3 do_git_init_db: Create and maintain svn-remotes
   3.4 get_commit_entry: Parse commit messages, and encode them; SVN
requires messages to be UTF-8 when entering the repo
   3.5 cmd_branch: Handle branching and tagging
   3.6 cmd_create_ignore: Read svn:ignore and put the information
into .gitignore
4. There are several existing third-party SVN exporters worth looking into [2].
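Items 3.1 and 3.6 are small enough to sketch. The Python below is a
hypothetical re-implementation, not a port of the Perl code.
parse_svn_date assumes SVN's usual "YYYY-MM-DDTHH:MM:SS.ffffffZ"
timestamp format (always UTC) and produces the "<TZ Offset> <local
date/time>" string described in 3.1 for a caller-chosen time zone.
svn_ignore_to_gitignore relies on svn:ignore patterns applying only to
the directory that carries the property, so each pattern is anchored
with a leading "/" in that directory's .gitignore to keep it
non-recursive.

```python
import re
from datetime import datetime, timezone, tzinfo

# Assumed SVN timestamp format, e.g. "2010-03-19T17:18:00.123456Z".
SVN_DATE_RE = re.compile(r"^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)(?:\.\d+)?Z$")


def parse_svn_date(svn_date: str, tz: tzinfo = timezone.utc) -> str:
    """Return "<TZ Offset> <local date/time>" for an SVN UTC timestamp."""
    m = SVN_DATE_RE.match(svn_date)
    if m is None:
        raise ValueError(f"unrecognised SVN date: {svn_date!r}")
    # The fractional seconds are matched but deliberately dropped.
    utc = datetime(*(int(g) for g in m.groups()), tzinfo=timezone.utc)
    local = utc.astimezone(tz)
    minutes = int(local.utcoffset().total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}{mins:02d} {local:%Y-%m-%d %H:%M:%S}"


def svn_ignore_to_gitignore(svn_ignore: str) -> str:
    """Turn an svn:ignore value into .gitignore lines for the same directory."""
    lines = []
    for pattern in svn_ignore.splitlines():
        pattern = pattern.strip()
        if pattern:
            # svn:ignore is not recursive; "/" anchors the pattern to this directory.
            lines.append("/" + pattern)
    return "\n".join(lines) + ("\n" if lines else "")
```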

I've additionally discussed the project with Sverre Rabbelier at
length over email.

== Who am I? ==
I'm Ramkumar, a student at the Indian Institute of Technology,
Kharagpur. I haven't contributed more than a few small patches to Git
[3], and I look at this project as a fantastic opportunity to get more
involved with the community. In the summer and winter of 2008, I
worked with a Django-based startup. The team consisted of three
experienced Python developers, one designer to steer the project, and
an undergraduate student: me. We versioned everything in Git and
deployed on Apache and PostgreSQL, using Amazon S3 for static content.
While working with the startup, I also contributed to South, a
migration framework for Django. More details are on my resume [4].

C, C++ [5], and Python are my strongest languages. I've additionally
learnt Common Lisp through an Emacs Lisp application I wrote in summer
2009 [6]. I'm known to be very communicative, both in person and over
email and chat. The style and clarity of my communication can be seen
in the slides I presented at FOSS.IN/2009 [7].

== Notes ==
[1] http://thread.gmane.org/gmane.comp.version-control.git/139545
[2] svn-all-fast-export (git://repo.or.cz/svn-all-fast-export.git) and
fast-export (git://repo.or.cz/fast-export.git)
[3] 52eb5173ac and 88d50e78c3
[4] TODO
[5] On a related note, I've also contributed a little to Chromium
[6] http://github.com/artagnon/ublog.el
[7] http://artagnon.com/wp-content/uploads/haskell-internals.pdf and
http://artagnon.com/wp-content/uploads/unladen-swallow.pdf
=====================================

Thanks!

Regards,
Ramkumar

* Re: native-git-svn: A Summer of Code 2010 proposal
@ 2010-03-27  5:40 Steven Michalske
From: Steven Michalske @ 2010-03-27  5:40 UTC
  To: git

Ramkumar Ramachandra <artagnon <at> gmail.com> writes:
---8<------
> The following resources are relevant to the project:
> 1. git_remote_helpers/git/git.py is a minimalistic remote helper
> written by Sverre. I plan to extend this as much as possible before
> rewriting it in C.

Would Cython give the Python code the speed it needs without requiring
a rewrite in C?

> 2. libsvn contains excellent documentation and clear examples to
> create the SVN client.
> 3. git-svn.perl has a lot functionality that I plan to re-implement in
> native-git-svn:
>    3.1 parse_svn_date: Given a date (in UTC) from Subversion, return a
> string in the format "<TZ Offset> <local date/time>" that Git will use
>    3.2 load_authors: <svn username> = real-name <email address>
> mapping based on git-svnimport

One feature that I would like to see is a way to call an external
application to look up author names when maintaining the authors file.

That is, if a username is missing from the SVN authors file, call the
lookup tool.

I work at a company with an LDAP server where I can look up an SVN
username to get the real name and email address. This way I wouldn't
have to manually maintain an SVN authors file.

This really belongs in a generic remote helper component, not just the
SVN one.
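A minimal sketch of that fallback, assuming an authors file in the
"<svn username> = real-name <email address>" format from item 3.2 and
a hypothetical lookup program that takes the username as its only
argument and prints "Real Name <email>" on stdout (an LDAP query
script, say). Neither the program nor that calling convention exists
in the proposal; both are placeholders.

```python
import re
import subprocess
from typing import Callable, Optional

Author = tuple[str, str]                  # (real name, email)
Lookup = Callable[[str], Optional[Author]]

# e.g. "jdoe = John Doe <jdoe@example.com>"; blank and comment lines do not match.
AUTHOR_LINE_RE = re.compile(r"^\s*(\S+)\s*=\s*(.+?)\s*<(.*)>\s*$")
NAME_EMAIL_RE = re.compile(r"^\s*(.+?)\s*<(.*)>\s*$")


def load_authors(text: str) -> dict[str, Author]:
    authors = {}
    for line in text.splitlines():
        m = AUTHOR_LINE_RE.match(line)
        if m:
            authors[m.group(1)] = (m.group(2), m.group(3))
    return authors


def lookup_program(program: str) -> Lookup:
    # Hypothetical convention: `program <username>` prints "Real Name <email>".
    def lookup(username: str) -> Optional[Author]:
        result = subprocess.run([program, username], capture_output=True, text=True)
        if result.returncode != 0:
            return None
        m = NAME_EMAIL_RE.match(result.stdout)
        return (m.group(1), m.group(2)) if m else None
    return lookup


def resolve_author(username: str, authors: dict[str, Author],
                   fallback: Optional[Lookup] = None) -> Author:
    if username in authors:
        return authors[username]
    if fallback is not None:
        found = fallback(username)
        if found is not None:
            authors[username] = found  # cache so the program runs once per user
            return found
    raise KeyError(f"author {username!r} not in authors file")
```

Since the fallback is just a callable, the same code would serve any
remote helper, not only the SVN one.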

>    3.3 do_git_init_db: Create and maintain svn-remotes
>    3.4 get_commit_entry: Parse commit messages, and encode them; SVN
> requires messages to be UTF-8 when entering the repo
>    3.5 cmd_branch: Handle branching/ tagging

I'm torn on how the current system handles this. I'd like all tags to
be tags, and if a tag showed branch-like behavior (bad SVN users!), a
branch should exist for it, with the tag pointing to that branch's head.
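That policy could be modeled roughly as follows. This hypothetical
sketch assumes the importer already knows which Git commits were
produced from a tag's SVN path, oldest first, the first being the
"svn copy" that created the tag. The refs/heads/tags/<name> branch
naming is likewise an invented convention.

```python
from dataclasses import dataclass


@dataclass
class RefUpdate:
    ref: str
    target: str  # Git commit id


def refs_for_svn_tag(name: str, commits: list[str]) -> list[RefUpdate]:
    """commits: Git commit ids imported from the tag's SVN path, oldest first."""
    if not commits:
        raise ValueError(f"tag {name!r} has no commits")
    if len(commits) == 1:
        # Well-behaved tag: never modified after the copy that created it.
        return [RefUpdate(f"refs/tags/{name}", commits[0])]
    # Someone committed to the tag: keep its history on a branch and
    # point the tag at that branch's head.
    head = commits[-1]
    return [
        RefUpdate(f"refs/heads/tags/{name}", head),
        RefUpdate(f"refs/tags/{name}", head),
    ]
```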

>    3.6 cmd_create_ignore: Reads svn:ignore and puts the information
> into .gitignore
> 4. There are several existing third-party SVN exporters worth looking into [2].
-----8<------

A couple of side thoughts.

--
Support for SVN's empty folders. Some of the old build systems I have
used need the empty folders, so I have to create them to make the build
work :-( and I can't use git-bisect easily. Well, I have to make the
bisect run script create the needed folders; not too hard, but
annoying. Could we track whether a particular SVN revision created or
removed an empty folder, and stuff a hidden file '.git_svn_empty_folder'
or a .gitignore with a * in it into the folder, so that git can track
SVN's empty folder? And if the SVN folder later gets contents, the
.gitignore would need the ignore removed.
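A hypothetical sketch of that bookkeeping: given the directories and
real files present after an SVN revision, compute which directories
are empty and need a placeholder, and which existing placeholders
should go because their directory gained contents or disappeared. The
placeholder name comes from the suggestion above; representing paths
as slash-separated strings without a leading "/" is an assumption.

```python
import posixpath

PLACEHOLDER = ".git_svn_empty_folder"


def placeholder_changes(dirs: set[str], files: set[str],
                        placeholders: set[str]) -> tuple[set[str], set[str]]:
    """Return (placeholders_to_add, placeholders_to_remove).

    dirs:         directories in the SVN revision, e.g. {"trunk", "trunk/obj"}
    files:        real files in the revision (placeholders excluded)
    placeholders: placeholder paths already present in the Git tree
    """
    # Every ancestor of a file or directory has contents, so it is not empty.
    non_empty = set()
    for path in files | dirs:
        parent = posixpath.dirname(path)
        while parent:
            non_empty.add(parent)
            parent = posixpath.dirname(parent)
    wanted = {posixpath.join(d, PLACEHOLDER) for d in dirs if d not in non_empty}
    return wanted - placeholders, placeholders - wanted
```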

--
One of my SVN repositories fails to import with the current system
because the repository is missing a revision in its SVN history. In
other words, the SVN repo has corrupted history, so the current
git-svn fails to import the repository.

Example: R31 of the SVN repo is fine, R32 fails to check out due to an
SVN error, but R33 checks out and is valid. I would like to see the
helper pause and ask me what to do: either fail or skip that revision.
It's a shame the history is gone, but I now have to tell the current
git-svn to do a shallow clone starting at R33, and I lose all of the
commits R1 to R31. This leads to branches that have no known roots.
In my case this happened at roughly revision 1700: the server's hard
drive crashed and the restore was done from a backup that was at a
revision around 1500, so there is a big gap of lost history. That
history is 5-6 years old, so the daily backups needed to reconstruct
it are LONG gone. Too bad we didn't have git back then; we could have
restored all the history with a push! ;-)

If you want me to test your work on a hairy repository with corrupt history and
thousands of branches, I'll do that for you.
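A sketch of the "pause and ask" behaviour, assuming a hypothetical
fetch_revision(n) function (standing in for the libsvn client of
component 1) that raises RevisionUnavailable when a revision cannot be
retrieved, plus a policy callback that decides whether to skip or
abort. Nothing here is a real libsvn call.

```python
from typing import Callable, Iterator


class RevisionUnavailable(Exception):
    """Raised by the (hypothetical) SVN client when a revision cannot be fetched."""


def fetch_history(first: int, last: int,
                  fetch_revision: Callable[[int], object],
                  on_missing: Callable[[int, Exception], str]) -> Iterator[object]:
    """Yield revisions first..last in order; on_missing returns "skip" or "abort"."""
    for n in range(first, last + 1):
        try:
            rev = fetch_revision(n)
        except RevisionUnavailable as err:
            if on_missing(n, err) == "skip":
                continue  # keep everything before rN and carry on with rN+1
            raise
        yield rev


def ask_user(n: int, err: Exception) -> str:
    # Interactive policy: pause and let the user decide.
    answer = input(f"r{n} could not be fetched ({err}). [s]kip or [a]bort? ")
    return "skip" if answer.strip().lower().startswith("s") else "abort"
```

If the client fetches deltas rather than full trees, skipping R32 also
means R33 has to be imported from its full contents; this sketch
leaves that design question open.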


--
In that same corrupt repository, each branch has a large PDF that NEVER
changes. This makes me think that exports might be faster if I could
tell the SVN client that a file is static, and to only check whether it
gets removed or its size changes. I don't know if libsvn would let you
do that. You might even be able to detect that kind of condition
(large, unchanging, binary-like files) automatically; it might make a
great bandwidth/speed optimization.
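One way to get that saving, sketched under the assumption that the SVN
client can report a file's size and checksum before downloading its
contents (the download and store callables below are hypothetical
placeholders): key imported blobs by (size, checksum), so an
unchanging PDF that appears on many branches is fetched and stored
only once.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BlobCache:
    """Remember which (size, checksum) pairs were already imported as Git blobs."""
    blobs: dict[tuple[int, str], str] = field(default_factory=dict)  # key -> blob id
    downloads: int = 0

    def blob_for(self, size: int, checksum: str,
                 download: Callable[[], bytes],
                 store: Callable[[bytes], str]) -> str:
        key = (size, checksum)
        if key not in self.blobs:
            # Only unseen contents cost bandwidth; repeats reuse the stored blob id.
            self.downloads += 1
            self.blobs[key] = store(download())
        return self.blobs[key]
```

store could, for instance, pipe the bytes into
"git hash-object -w --stdin" and return the object id it prints.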



Sorry if these points were already brought up in other emails about
your proposal, but working at a company with lots of history in SVN
makes me passionate about SVN integration in git :-)


Good luck!
Steve


Thread overview: 39+ messages
2010-03-19 17:18 native-git-svn: A Summer of Code 2010 proposal Ramkumar Ramachandra
2010-03-19 18:32 ` Avery Pennarun
2010-03-19 18:39   ` Sverre Rabbelier
2010-03-19 21:30     ` Avery Pennarun
2010-03-20  9:19       ` Ramkumar Ramachandra
2010-03-20 10:48       ` Johannes Schindelin
2010-03-20 20:34         ` Ramkumar Ramachandra
2010-03-20 20:55           ` Ramkumar Ramachandra
2010-03-20 21:04           ` Jonathan Nieder
2010-03-21 10:26             ` Johannes Schindelin
2010-03-21 11:08               ` Jonathan Nieder
2010-03-21 11:47                 ` Johannes Schindelin
2010-03-21 12:25                   ` Ramkumar Ramachandra
2010-03-21 12:31                     ` Johannes Schindelin
2010-03-21 12:36                     ` Sverre Rabbelier
2010-03-21 17:58                     ` Jonathan Nieder
2010-03-22  0:33                     ` Daniel Barkalow
2010-03-22  2:41                       ` Christian Couder
2010-03-22  3:49                         ` Ramkumar Ramachandra
2010-03-22 11:33                           ` Johannes Schindelin
     [not found]                             ` <f3271551003220643j3a726d09o2d3a078292fd8bf6@mail.gmail.com>
2010-03-22 19:52                               ` Johannes Schindelin
2010-03-23  7:49                                 ` Ramkumar Ramachandra
2010-03-21 16:43                   ` Best example of GSoC student participation (was: Re: native-git-svn: A Summer of Code 2010 proposal) Jakub Narebski
2010-03-21 17:27                     ` Best example of GSoC student participation Johannes Schindelin
2010-03-20 21:58           ` native-git-svn: A Summer of Code 2010 proposal Daniel Barkalow
2010-03-20 22:19             ` Ramkumar Ramachandra
2010-03-21  5:36             ` Ramkumar Ramachandra
2010-03-21 22:56               ` Daniel Barkalow
2010-03-21 17:08             ` Ilari Liusvaara
2010-03-21  7:40           ` Peter Baumann
2010-03-21 23:51       ` Dave Olszewski
2010-03-19 20:53   ` Jonathan Nieder
2010-03-19 21:00     ` Johannes Schindelin
  -- strict thread matches above, loose matches on Subject: below --
2010-03-27  5:40 Steven Michalske
2010-03-27  6:46 ` Ramkumar Ramachandra
2010-03-27  8:03   ` Steven Michalske
2010-03-27  9:19 ` Eric Raymond
     [not found]   ` <f3271551003280225v17af30d4s6d3d24b4d548ff7d@mail.gmail.com>
2010-03-28 12:10     ` Eric Raymond
2010-03-29 20:04       ` Ramkumar Ramachandra
