From: Tomas Carnecky <tom@dbservice.com>
To: git@vger.kernel.org
Cc: Tomas Carnecky <tom@dbservice.com>
Subject: [PATCH 6/6] Add git-remote-svn
Date: Sun,  3 Oct 2010 14:21:51 +0200
Message-ID: <1286108511-55876-6-git-send-email-tom@dbservice.com>
In-Reply-To: <4CA86A12.6080905@dbservice.com>

This is an experimental git remote helper for svn repositories. It uses
the new git fast-import-helper. It only works with local svn repos (not
over the network). It uses notes (refs/notes/svn) to save the mapping
from git commit to svn revision.
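
The note text written by feed() in the patch below has the shape
"<repository uuid>/<path>@<revision>", and parent() parses it back with
a regular expression. A minimal sketch of that round trip, independent
of the svn bindings; the names format_note and parse_note and the uuid
value are made up for illustration and are not part of the patch:

```python
import re

# Same shape as the note written by feed(): "<uuid>/<path>@<rev>\n".
NOTE_PATTERN = re.compile(r"([0-9a-f-]+)/([^@]*)@(\d+)")

def format_note(uuid, path, rev):
    """Build the note text for a commit imported from `path` at `rev`."""
    return "%s/%s@%d\n" % (uuid, path.strip("/"), rev)

def parse_note(note):
    """Return (uuid, path, rev), or None if the note has another format."""
    match = NOTE_PATTERN.match(note)
    if not match:
        return None
    return (match.group(1), match.group(2), int(match.group(3)))

# Illustrative uuid, not taken from any real repository.
note = format_note("00000000-1111-2222-3333-444444444444", "/trunk/", 42)
print(parse_note(note))
```

Storing the uuid in the note lets the helper ignore notes that came
from a different svn repository with the same path.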

This remote helper serves as a technology preview of what the new type
of remote helpers can do.

It assumes that the svn repo uses the standard layout (trunk, branches).
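
As a rough sketch of what the standard-layout assumption means for ref
discovery: only /trunk and the direct children of /branches are
advertised, everything else (for example /tags) is ignored. The
list_dir callable and the layout dictionary below are stand-ins for
the svn filesystem calls and are purely illustrative:

```python
def discover_refs(list_dir):
    """Return the svn paths that would be advertised as refs.

    `list_dir` is a hypothetical stand-in for the svn directory listing:
    it takes a repository path and returns the entry names found there.
    """
    refs = []
    top = list_dir("/")
    if "trunk" in top:
        refs.append("trunk")
    if "branches" in top:
        for name in sorted(list_dir("/branches")):
            refs.append("branches/" + name)
    return refs

# A made-up repository layout, for illustration only.
layout = {
    "/": ["trunk", "branches", "tags"],
    "/branches": ["stable", "experimental"],
}
print(discover_refs(layout.__getitem__))
```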
---
 .gitignore        |    1 +
 Makefile          |    1 +
 git-remote-svn.py |  408 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 410 insertions(+), 0 deletions(-)
 create mode 100644 git-remote-svn.py

diff --git a/.gitignore b/.gitignore
index c8aa8c7..0a6011c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -114,6 +114,7 @@
 /git-remote-ftp
 /git-remote-ftps
 /git-remote-testgit
+/git-remote-svn
 /git-repack
 /git-replace
 /git-repo-config
diff --git a/Makefile b/Makefile
index f8a9c40..eb959c9 100644
--- a/Makefile
+++ b/Makefile
@@ -387,6 +387,7 @@ SCRIPT_PERL += git-send-email.perl
 SCRIPT_PERL += git-svn.perl
 
 SCRIPT_PYTHON += git-remote-testgit.py
+SCRIPT_PYTHON += git-remote-svn.py
 
 SCRIPTS = $(patsubst %.sh,%,$(SCRIPT_SH)) \
 	  $(patsubst %.perl,%,$(SCRIPT_PERL)) \
diff --git a/git-remote-svn.py b/git-remote-svn.py
new file mode 100644
index 0000000..c617743
--- /dev/null
+++ b/git-remote-svn.py
@@ -0,0 +1,408 @@
+#!/usr/bin/env python
+
+import sys, os, re, time, subprocess
+import svn.core, svn.repos, svn.fs
+
+ct_short = ['M', 'A', 'D', 'R', 'X']
+
+############################################################################
+# Class which encapsulates the fast import helper. Provides methods to
+# read/write data to it. 
+class FastImportHelper:
+
+	def start(self):
+		PIPE = subprocess.PIPE
+		args = ['git', 'fast-import-helper']
+		self.helper = subprocess.Popen(args, stdin=PIPE, stdout=PIPE)
+
+	def close(self):
+		self.helper.stdin.close()
+		self.helper.wait()
+		del self.helper
+
+	def write(self, data):
+		self.helper.stdin.write(data)
+
+	# Expect the helper to write the mark mapping to its stdout. Verify that
+	# the mark is the same that we've given and return the git object name
+	def response(self, mark):
+		line = self.helper.stdout.readline().strip().split(' ')
+		assert(str(line[1][1:]) == str(mark))
+
+		return line[2]
+
+	# Make a commit from the given arguments, return the git object name
+	# corresponding to that just created commit
+	def commit(self, mark, ref, parents, author, committer, changes, message):
+		self.write("commit %s\n" % ref)
+		self.write("mark :%s\n" % mark)
+		if author:
+			self.write("author %s %s -0000\n" % author)
+		self.write("committer %s %s -0000\n" % committer)
+		self.write("data %s\n" % len(message))
+		self.write(message)
+
+		parent = parents.pop(0)
+		if parent:
+			self.write("from %s\n" % parent)
+		for parent in parents:
+			self.write("merge %s\n" % parent)
+	
+		self.write(''.join(changes))
+
+		# Make it happen
+		self.write("\n")
+		return self.response(mark)
+
+	# Create a blob from the given arguments. 'read' is callable object
+	# which returns data. Return the git object name
+	def blob(self, mark, length, read):
+		self.write("blob\nmark :%s\n" % mark)
+		self.write("data %s\n" % length)
+		
+		while length > 0:
+			avail = min(length, 4096)
+			data = read(avail)
+			self.write(data)
+			length -= avail
+
+		# Make it happen
+		self.write("\n")
+		return self.response(mark)
+
+
+
+############################################################################
+# Base class for python remote helpers. It handles the main command loop.
+# This class also manages the fast-import-helper and marks. If you want to
+# use the fih, call self.fih.start() first and after you're done call .close()
+class RemoteHelper(object):
+
+	def __init__(self, kind):
+		self.kind = kind
+		self.fih = FastImportHelper()
+		
+		self.notes = []
+		
+		# nfrom is the current notes commit, we'll need that later when
+		# adding new notes. Check if that ref exists, if not set nfrom
+		# to None, if yes, get the object name and store it in nfrom
+		argv = [ 'git', 'rev-parse', 'refs/notes/%s^0' % kind ]
+		PIPE = subprocess.PIPE
+		proc = subprocess.Popen(argv, stdout=PIPE, stderr=PIPE)
+		( out, err, ) = proc.communicate()
+		if proc.returncode == 0:
+			self.nfrom = out.strip()
+		else:
+			self.nfrom = None
+
+	# The commands we understand
+	COMMANDS = ( 'capabilities', 'list', 'fetch', )
+
+	# Read next command. Raise an exception if the command is invalid.
+	# Return a tuple (command, args,)
+	def read_next_command(self):
+		line = sys.stdin.readline()
+		if not line:
+			return ( None, None, )
+	
+		cmdline = line.strip().split()
+		if not cmdline:
+			return ( None, None, )
+
+		cmd = cmdline.pop(0)
+		if cmd not in self.COMMANDS:
+			raise Exception("Invalid command '%s'" % cmd)
+		
+		return ( cmd, cmdline, )
+
+	# Run the remote helper, process commands until the end of the world. Or
+	# until we're told to finish.
+	def run(self):
+		while (True):
+			( cmd, args, ) = self.read_next_command()
+			if cmd is None:
+				return
+
+			func = getattr(self, cmd, None)
+			if func is None or not callable(func):
+				raise Exception("Command '%s' not implemented" % cmd)
+
+			result = func(args)
+			sys.stdout.flush()
+
+
+	# Convenience method for writing data back to git
+	def reply(self, data):
+		sys.stdout.write(data)
+
+	# Return all refs and the contents of the note attached to each.
+	# This can be used by the remote helper to find out what the latest
+	# version is that we fetched into this repo.
+	# Returns list of tuples of (sha1, typename, refname, note,)
+	def refs(self):
+		refs = []
+		
+		PIPE = subprocess.PIPE
+		args = [ 'git', 'for-each-ref' ]
+		gfer = subprocess.Popen(args, stdin=PIPE, stdout=PIPE)
+		
+		# Regular expression for matching the output from g-f-e-r
+		pattern = re.compile(r"(.{40}) (\w+)	(.*)")
+		while (True):
+			line = gfer.stdout.readline()
+			if not line:
+				break 
+
+			match = pattern.match(line)
+			
+			# The sha1 and name of the ref
+			sha1 = match.group(1)
+			typename = match.group(2)
+			refname = match.group(3)
+			
+			# Extract the note using `git notes show <sha>`
+			git_notes_show = [ 'git', 'notes', 'show', sha1 ]
+			
+			# Set GIT_NOTES_REF to point to the notes of our kind
+			env = dict(os.environ, GIT_NOTES_REF="refs/notes/%s" % self.kind)
+			
+			note = subprocess.Popen(git_notes_show, env=env, stdout=PIPE, stderr=PIPE)
+			refs.append(( sha1, typename, refname, note.communicate()[0] ))
+
+		gfer.wait()
+
+		return refs
+
+	# Attach text to an object. objects are currently limited to commits
+	def note(self, obj, text):
+		self.notes.append(( obj, text, ))
+		if len(self.notes) >= 10:
+			self.flush()
+
+	# Commit all outstanding notes. Don't forget to flush the notes before
+	# you close the fih
+	def flush(self):
+		if len(self.notes) == 0:
+			return
+
+		now = int(time.time())
+		mark = "%s-notes" % self.kind
+		ref = "refs/notes/%s" % self.kind
+		parents = [ self.nfrom ]
+		author = ( 'nobody <nobody@localhost>', now, )
+		committer = ( 'nobody <nobody@localhost>', now, )
+		message = "Update notes"
+
+		changes = []
+		for ( obj, text, ) in self.notes:
+			changes.append("N inline %s\ndata %s\n" % (obj, len(text)))
+			changes.append(text)
+			changes.append("\n")
+		
+		self.nfrom = self.fih.commit(mark, ref, parents, author, committer, changes, message)
+		self.notes = []
+
+
+
+############################################################################
+# Remote helper for Subversion
+class RemoteHelperSubversion(RemoteHelper):
+
+	def __init__(self, url):
+		super(RemoteHelperSubversion, self).__init__("svn")
+
+		url = svn.core.svn_path_canonicalize(url)
+		self.repo = svn.repos.svn_repos_open(url)
+		self.fs = svn.repos.svn_repos_fs(self.repo)
+		self.uuid = svn.fs.svn_fs_get_uuid(self.fs)
+
+
+	# Here follow the commands this helper implements
+	
+	# RH command 'capabilities'
+	def capabilities(self, args):
+		self.reply("list\nfetch\n\n")
+
+	# RH command 'list'
+	def list(self, args):
+		rev = svn.fs.svn_fs_youngest_rev(self.fs)
+		root = svn.fs.svn_fs_revision_root(self.fs, rev)
+
+		refs = self.discover(root)
+		for ( name, rev, ) in refs:
+			self.reply(":r%s %s\n" % ( rev, name, ))
+
+		if len(refs) > 0:
+			self.reply("@%s HEAD\n" % refs[0][0])
+		self.reply("\n")
+
+	# RH command 'fetch'
+	def fetch(self, args):
+		# Start the fast-import helper
+		self.fih.start()
+
+		# Fetches are done in batches. Process fetch lines until we see a
+		# blank newline
+		while args:
+			# The revision to fetch, strip the leading 'r' from 'r42'
+			new = int(args[0][1:])
+			
+			# Trailing slash to ensure that it's a directory
+			prefix = "/%s/" % args[1]
+			
+			( sha1, old, ) = self.parent(args[1])
+			sys.stderr.write("Best parent: %s %s\n" % (old, new,))
+			
+			if old != new:
+				sha1 = self.fi(prefix, old, new, sha1)
+			self.reply("map r%s %s\n" % ( new, sha1 ))
+
+			# Read next line, break if it's a newline (ending this fetch batch)
+			( cmd, args, ) = self.read_next_command()
+			if not cmd:
+				break
+
+		self.flush()
+		self.fih.close()
+		
+		# Before finishing this command, make sure to emit the 'silent'
+		# command to register the notes
+		self.reply("silent refs/notes/%s %s\n" % (self.kind, self.nfrom, ))
+		
+		self.reply("\n")
+
+	# Discover all refs (trunk, branches) in the repository
+	def discover(self, root):
+		refs = []
+		
+		# First check /trunk
+		entries = svn.fs.svn_fs_dir_entries(root, "/")
+		names = entries.keys()
+		
+		if 'trunk' in names:
+			refs.append(( 'trunk', self.rev(root, '/trunk'), ))
+		
+		if 'branches' in names:
+			entries = svn.fs.svn_fs_dir_entries(root, "/branches")
+			names = entries.keys()
+			for name in names:
+				refs.append(( 'branches/'+name, self.rev(root, '/branches/%s' % name), ))
+
+		return refs
+
+	# Get the revision when `path` was last modified
+	def rev(self, root, path):
+		history = svn.fs.svn_fs_node_history(root, path)
+		
+		# Yes, this is required.
+		history = svn.fs.svn_fs_history_prev(history, True)
+		if not history:
+			return 1
+
+		( path, rev, ) = svn.fs.svn_fs_history_location(history)		
+		return rev
+
+
+	# Find the git commit we can use as parent when importing from the
+	# repo with the given prefix. All commits imported from svn
+	# have a note attached which contains this information. But to make our
+	# job easier, we only scan ref heads and not the whole history.
+	# Go through all refs, see which one has a note that matches the given
+	# prefix and extract the svn revision number from the note.
+	# Return a tuple (sha1, rev,) which identifies the git commit and svn
+	# revision.
+	def parent(self, prefix):
+		pattern = re.compile(r"([0-9a-f-]+)/([^@]*)@(\d+)")
+		res = []
+		for ( sha1, typename, name, note, ) in self.refs():
+			if typename != "commit":
+				continue
+
+			match = pattern.match(note)
+			if not match:
+				continue
+
+			if match.group(2) == prefix and match.group(1) == self.uuid:
+				rev = int(match.group(3))
+				res.append(( sha1, rev ))
+
+		if len(res) == 0:
+			return ( None, 1, )
+
+		res.sort(key=lambda r: r[1], reverse=True)
+		return res[0]
+
+
+	# Run fast import of revision `old` up to `new`, only considering files
+	# under the given prefix. Use `sha1` as the parent of the first commit.
+	# Return the git commit name that corresponds to the last revision so
+	# we can report it back to git.
+	def fi(self, prefix, old, new, sha1):
+		for rev in xrange(old or 1, new + 1):
+			sha1 = self.feed(rev, prefix, sha1)
+
+		return sha1	
+
+
+	# Feed the fast-import helper with the given revision
+	def feed(self, rev, prefix, sha1):
+		# Open the root at that revision and get the changes
+		root = svn.fs.svn_fs_revision_root(self.fs, rev)
+		changes = svn.fs.svn_fs_paths_changed(root)
+
+		i, file_changes = 1, []
+		for path, change_type in changes.iteritems():
+			if svn.fs.svn_fs_is_dir(root, path):
+				continue
+			if not path.startswith(prefix):
+				continue
+		
+			realpath = path[len(prefix):]
+
+			c_t = ct_short[change_type.change_kind]
+			if c_t == 'D':
+				file_changes.append("D %s\n" % realpath)
+			else:
+				file_changes.append("M 644 :%s %s\n" % (i, realpath))
+
+				length = int(svn.fs.svn_fs_file_length(root, path))
+				stream = svn.fs.svn_fs_file_contents(root, path)
+				read = lambda x: svn.core.svn_stream_read(stream, x)
+				self.fih.blob(i, length, read)
+				svn.core.svn_stream_close(stream)
+				i += 1
+
+		if len(file_changes) == 0:
+			return sha1
+
+		props = svn.fs.svn_fs_revision_proplist(self.fs, rev)
+
+		# Collect all the needed information to create the commit
+		mark = str(rev)
+		ref = "refs/heads/master"
+		parents = [ sha1 ]
+
+		svndate = props['svn:date'][0:-8]
+		commit_time = time.mktime(time.strptime(svndate, '%Y-%m-%dT%H:%M:%S'))
+		
+		if props.has_key('svn:author'):
+			author = "%s <%s@localhost>" % (props['svn:author'], props['svn:author'])
+		else:
+			author = 'nobody <nobody@localhost>'
+
+		committer = ( author, int(commit_time), )
+		message = props['svn:log']
+		
+		sha1 = self.fih.commit(mark, ref, parents, None, committer, file_changes, message)
+		
+		note = "%s%s@%s\n" % (svn.fs.svn_fs_get_uuid(self.fs), prefix[:-1], rev)
+		self.note(sha1, note)
+		
+		return sha1
+
+
+
+if __name__ == '__main__':
+	helper = RemoteHelperSubversion(sys.argv[2])
+	helper.run()
-- 
1.7.3.37.gb6088b

Thread overview: 21+ messages
2010-10-03 11:33 [RFC] New type of remote helpers Tomas Carnecky
2010-10-03 12:21 ` [PATCH 1/6] Remote helper: accept ':<value> <name>' as a response to 'list' Tomas Carnecky
2010-10-05  2:00   ` Jonathan Nieder
2010-10-07 21:17     ` Sverre Rabbelier
2010-10-03 12:21 ` [PATCH 2/6] Allow more than one keepfile in the transport Tomas Carnecky
2010-10-05  2:11   ` Jonathan Nieder
2010-10-03 12:21 ` [PATCH 3/6] Allow the transport fetch command to add additional refs Tomas Carnecky
2010-10-05  2:18   ` Jonathan Nieder
2010-10-03 12:21 ` [PATCH 4/6] Rename get_mode() to decode_tree_mode() and export it Tomas Carnecky
2010-10-05  2:23   ` Jonathan Nieder
2010-10-03 12:21 ` [PATCH 5/6] Introduce the git fast-import-helper Tomas Carnecky
2010-10-03 15:31   ` Jonathan Nieder
2010-10-03 15:45     ` Tomas Carnecky
2010-10-03 15:53       ` Sverre Rabbelier
2010-10-03 17:39         ` Tomas Carnecky
2010-10-03 23:15           ` Sverre Rabbelier
2010-10-03 12:21 ` Tomas Carnecky [this message]
2010-10-05  2:26   ` [PATCH 6/6] Add git-remote-svn Jonathan Nieder
2010-10-03 13:56 ` [RFC] New type of remote helpers Sverre Rabbelier
2010-10-03 15:13   ` Jonathan Nieder
2010-10-03 17:07     ` Ramkumar Ramachandra