* [PATCH 0/6] Merge David's SVN exporter into git.git
@ 2010-06-04 13:26 Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 1/6] Add memory pool library Ramkumar Ramachandra
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier,
	Junio C Hamano

Hi,

This is another attempt to merge David's SVN exporter into
git.git. What changed since last time: David implemented incremental
dump support, and I fixed certain things for the merge, as suggested
by Jonathan Nieder. Preparing patches for the list eats up a lot of my
time, and if this batch is more-or-less okay, I'd like it to be merged
at least into `pu`: we can squash in minor fixes later. The exporter is
functionally complete and validated against ~940k revisions of the ASF
repository.

You can see the complete revision history in my `git-merge` branch of
my fork of svn-dump-fast-export [1].

The issue of authorship has already been discussed, but it isn't very
clear to me what exactly I should do. In my opinion, the author of all
six patches should be hand-edited to:
David Barr <david.barr@cordelta.com>

[1]: http://github.com/artagnon/svn-dump-fast-export

Ramkumar Ramachandra (6):
  Add memory pool library
  Add cpp macro implementation of treaps
  Add library for string-specific memory pool
  Add stream helper library
  Add infrastructure to write revisions in fast-export format
  Add SVN dump parser

 vcs-svn/fast_export.c |   69 ++++++++++
 vcs-svn/fast_export.h |   14 ++
 vcs-svn/line_buffer.c |  129 ++++++++++++++++++
 vcs-svn/line_buffer.h |   14 ++
 vcs-svn/obj_pool.h    |   98 ++++++++++++++
 vcs-svn/repo_tree.c   |  353 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   27 ++++
 vcs-svn/string_pool.c |  110 +++++++++++++++
 vcs-svn/string_pool.h |   14 ++
 vcs-svn/svndump.c     |  294 ++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h     |    7 +
 vcs-svn/trp.h         |  118 ++++++++++++++++
 vcs-svn/trp.txt       |   62 +++++++++
 13 files changed, 1309 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h
 create mode 100644 vcs-svn/obj_pool.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt


* [PATCH 1/6] Add memory pool library
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
@ 2010-06-04 13:26 ` Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 2/6] Add cpp macro implementation of treaps Ramkumar Ramachandra
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier,
	Junio C Hamano

Add a memory pool library implemented using cpp macros. The library
provides macros that can be used to create a type-specific memory pool
API.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
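Reviewer note (not part of the commit): the generated API hands out
uint32_t offsets rather than pointers because the base address can
change whenever the pool grows. Below is a minimal sketch of that idea,
assuming a plain realloc()-backed array instead of the mmap()ed file
that obj_pool.h uses. The names int_pool, int_pool_alloc and
int_pool_pointer are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified pool of ints; not the obj_pool.h code. */
struct int_pool {
	uint32_t size;      /* objects handed out so far */
	uint32_t capacity;  /* objects the backing store can hold */
	int *base;          /* may move whenever the pool grows */
};

/* Reserve `count` objects and return the offset of the first one. */
static uint32_t int_pool_alloc(struct int_pool *pool, uint32_t count)
{
	uint32_t offset;
	if (pool->size + count > pool->capacity) {
		/* Assumed initial capacity of 4; doubled until it fits. */
		uint32_t capacity = pool->capacity ? pool->capacity : 4;
		int *base;
		while (pool->size + count > capacity)
			capacity *= 2;
		base = realloc(pool->base, capacity * sizeof(*base));
		assert(base != NULL);  /* sketch only: no real error handling */
		pool->base = base;
		pool->capacity = capacity;
	}
	offset = pool->size;
	pool->size += count;
	return offset;
}

/* Translate an offset back into a pointer; NULL if out of range. */
static int *int_pool_pointer(struct int_pool *pool, uint32_t offset)
{
	return offset >= pool->size ? NULL : &pool->base[offset];
}
```

An offset obtained before a growth step stays valid afterwards, whereas
a pointer obtained before it may not. Callers should therefore keep
offsets and convert them to pointers only for immediate use.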
 vcs-svn/obj_pool.h |   98 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 98 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/obj_pool.h

diff --git a/vcs-svn/obj_pool.h b/vcs-svn/obj_pool.h
new file mode 100644
index 0000000..84c8321
--- /dev/null
+++ b/vcs-svn/obj_pool.h
@@ -0,0 +1,98 @@
+#ifndef OBJ_POOL_H_
+#define OBJ_POOL_H_
+
+#include "git-compat-util.h"
+
+/*
+ * The obj_pool_gen() macro generates a type-specific memory pool
+ * implementation.
+ *
+ * Arguments:
+ *
+ *   pre              : Prefix for generated functions (ex: string_).
+ *   obj_t            : Type of the objects stored in the pool (ex: char).
+ *   initial_capacity : The initial size of the memory pool (ex: 4096).
+ *
+ */
+#define obj_pool_gen(pre, obj_t, initial_capacity) \
+static struct { \
+	uint32_t size; \
+	uint32_t capacity; \
+	obj_t *base; \
+	FILE *file; \
+} pre##_pool = { 0, 0, NULL, NULL}; \
+static void pre##_init(void) \
+{ \
+	struct stat st; \
+	size_t ps = sysconf (_SC_PAGESIZE); \
+	/* Touch binary file before opening read/write */ \
+	pre##_pool.file = fopen(#pre ".bin", "a"); \
+	fclose(pre##_pool.file); \
+	/* Open, check size, compute capacity */ \
+	pre##_pool.file = fopen(#pre ".bin", "r+"); \
+	fstat(fileno(pre##_pool.file), &st); \
+	pre##_pool.size = st.st_size / sizeof(obj_t); \
+	pre##_pool.capacity = ((st.st_size + ps - 1) & ~(ps - 1)) / sizeof(obj_t); \
+	if (pre##_pool.capacity < initial_capacity) \
+		pre##_pool.capacity = initial_capacity; \
+	/* Truncate to calculated capacity and map to VM */ \
+	ftruncate(fileno(pre##_pool.file), pre##_pool.capacity * sizeof(obj_t)); \
+	pre##_pool.base = mmap(0, pre##_pool.capacity * sizeof(obj_t), \
+				PROT_READ | PROT_WRITE, MAP_SHARED, \
+				fileno(pre##_pool.file), 0); \
+} \
+static uint32_t pre##_alloc(uint32_t count) \
+{ \
+	uint32_t offset; \
+	if (pre##_pool.size + count > pre##_pool.capacity) { \
+		if (NULL == pre##_pool.base) \
+			pre##_init(); \
+		fsync(fileno(pre##_pool.file)); \
+		munmap(pre##_pool.base, \
+			pre##_pool.capacity * sizeof(obj_t)); \
+		pre##_pool.base = NULL; \
+		while (pre##_pool.size + count > pre##_pool.capacity) \
+			if (pre##_pool.capacity) \
+				pre##_pool.capacity *= 2; \
+			else \
+				pre##_pool.capacity = initial_capacity; \
+		ftruncate(fileno(pre##_pool.file), \
+				pre##_pool.capacity * sizeof(obj_t)); \
+		pre##_pool.base = \
+			mmap(0, pre##_pool.capacity * sizeof(obj_t), \
+				PROT_READ | PROT_WRITE, MAP_SHARED, \
+				fileno(pre##_pool.file), 0); \
+	} \
+	offset = pre##_pool.size; \
+	pre##_pool.size += count; \
+	return offset; \
+} \
+static void pre##_free(uint32_t count) \
+{ \
+	pre##_pool.size -= count; \
+} \
+static uint32_t pre##_offset(obj_t *obj) \
+{ \
+	return obj == NULL ? ~0 : obj - pre##_pool.base; \
+} \
+static obj_t *pre##_pointer(uint32_t offset) \
+{ \
+	return offset >= pre##_pool.size ? NULL : &pre##_pool.base[offset]; \
+} \
+static void pre##_reset(void) \
+{ \
+	if (pre##_pool.base) { \
+		fsync(fileno(pre##_pool.file)); \
+		munmap(pre##_pool.base, \
+			pre##_pool.capacity * sizeof(obj_t)); \
+		ftruncate(fileno(pre##_pool.file), \
+				pre##_pool.size * sizeof(obj_t)); \
+		fclose(pre##_pool.file); \
+	} \
+	pre##_pool.base = NULL; \
+	pre##_pool.size = 0; \
+	pre##_pool.capacity = 0; \
+	pre##_pool.file = NULL; \
+}
+
+#endif
-- 
1.7.1


* [PATCH 2/6] Add cpp macro implementation of treaps
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 1/6] Add memory pool library Ramkumar Ramachandra
@ 2010-06-04 13:26 ` Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 3/6] Add library for string-specific memory pool Ramkumar Ramachandra
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier,
	Junio C Hamano

The implementation exposes an API to generate a type-specific treap
implementation and various functions to operate on it. It uses
obj_pool.h to store memory nodes in a treap.

Treaps provide a memory-efficient binary search tree structure.
Insertion/deletion/search are about as fast in the average
case as red-black trees and the chances of worst-case behavior are
vanishingly small, thanks to (pseudo-)randomness.  That is a small
price to pay, given that treaps are much simpler to implement.

[db: Altered to reference nodes by offset from a common base pointer]
[db: Bob Jenkins' hashing implementation dropped for Knuth's]
[db: Methods unnecessary for search and insert dropped]

From: Jason Evans <jasone@canonware.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
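Reviewer note (not part of the commit): a minimal sketch of the
insertion scheme described above, assuming ordinary pointers instead
of pool offsets and an int key instead of a caller-supplied comparison
function. As in trp.h, a node's priority is Knuth's multiplicative
hash of its address, and a child whose priority is lower than its
parent's is rotated up. The names tnode, treap_insert, treap_find and
treap_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tnode {
	int key;
	struct tnode *left, *right;
};

/* Pseudo-random priority: multiplicative hash of the node address. */
static uint32_t tnode_prio(const struct tnode *n)
{
	return 2654435761u * (uint32_t)(uintptr_t)n;
}

static struct tnode *rotate_right(struct tnode *n)
{
	struct tnode *l = n->left;
	n->left = l->right;
	l->right = n;
	return l;
}

static struct tnode *rotate_left(struct tnode *n)
{
	struct tnode *r = n->right;
	n->right = r->left;
	r->left = n;
	return r;
}

/* Insert `ins` below `cur`; returns the new root of this subtree. */
static struct tnode *treap_insert(struct tnode *cur, struct tnode *ins)
{
	assert(ins != NULL);
	if (cur == NULL) {
		ins->left = ins->right = NULL;
		return ins;
	}
	if (ins->key < cur->key) {
		cur->left = treap_insert(cur->left, ins);
		if (tnode_prio(cur->left) < tnode_prio(cur))
			cur = rotate_right(cur);
	} else {
		cur->right = treap_insert(cur->right, ins);
		if (tnode_prio(cur->right) < tnode_prio(cur))
			cur = rotate_left(cur);
	}
	return cur;
}

/* Plain binary-search-tree lookup; NULL if key is absent. */
static struct tnode *treap_find(struct tnode *n, int key)
{
	while (n != NULL && n->key != key)
		n = key < n->key ? n->left : n->right;
	return n;
}

/* 1 if keys in [lo, hi] obey search order and priorities obey heap order. */
static int treap_check(const struct tnode *n, int lo, int hi)
{
	if (n == NULL)
		return 1;
	if (n->key < lo || n->key > hi)
		return 0;
	if (n->left && tnode_prio(n->left) < tnode_prio(n))
		return 0;
	if (n->right && tnode_prio(n->right) < tnode_prio(n))
		return 0;
	return treap_check(n->left, lo, n->key) &&
	       treap_check(n->right, n->key, hi);
}
```

The shape of the tree depends on where the nodes happen to live in
memory, but the search order of the keys does not, so lookups behave
the same on every run.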
 vcs-svn/trp.h   |  118 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/trp.txt |   62 +++++++++++++++++++++++++++++
 2 files changed, 180 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/trp.h
 create mode 100644 vcs-svn/trp.txt

diff --git a/vcs-svn/trp.h b/vcs-svn/trp.h
new file mode 100644
index 0000000..c90f5c3
--- /dev/null
+++ b/vcs-svn/trp.h
@@ -0,0 +1,118 @@
+/*
+ * cpp macro implementation of treaps.
+ *
+ * Usage:
+ *   #include <stdint.h>
+ *   #include <trp.h>
+ *   trp_gen(...)
+ */
+
+#ifndef TRP_H_
+#define TRP_H_
+
+/* Node structure. */
+struct trp_node {
+	uint32_t trpn_left;
+	uint32_t trpn_right;
+};
+
+/* Root structure. */
+struct trp_root {
+	uint32_t trp_root;
+};
+
+/* Pointer/Offset conversion */
+#define trpn_pointer(a_base, a_offset) (a_base##_pointer(a_offset))
+#define trpn_offset(a_base, a_pointer) (a_base##_offset(a_pointer))
+
+/* Left accessors. */
+#define trp_left_get(a_base, a_field, a_node) \
+	trpn_pointer(a_base, (a_node)->a_field.trpn_left)
+#define trp_left_set(a_base, a_field, a_node, a_left) \
+	(a_node)->a_field.trpn_left = trpn_offset(a_base, a_left)
+
+/* Right accessors. */
+#define trp_right_get(a_base, a_field, a_node) \
+	trpn_pointer(a_base, (a_node)->a_field.trpn_right)
+#define trp_right_set(a_base, a_field, a_node, a_right) \
+	(a_node)->a_field.trpn_right = trpn_offset(a_base, a_right)
+
+/* Priority accessors. */
+#define KNUTH_GOLDEN_RATIO_32BIT 2654435761u
+#define trp_prio_get(a_node) \
+	(KNUTH_GOLDEN_RATIO_32BIT*(uint32_t)(uintptr_t)(a_node))
+
+/* Node initializer. */
+#define trp_node_new(a_base, a_field, a_node) \
+	trp_left_set(a_base, a_field, (a_node), NULL); \
+	trp_right_set(a_base, a_field, (a_node), NULL)
+
+/* Internal utility macros. */
+#define trpn_rotate_left(a_base, a_field, a_node, r_node) \
+	do { (r_node) = trp_right_get(a_base, a_field, (a_node)); \
+	trp_right_set(a_base, a_field, (a_node), \
+		trp_left_get(a_base, a_field, (r_node))); \
+	trp_left_set(a_base, a_field, (r_node), (a_node)); } while(0)
+
+#define trpn_rotate_right(a_base, a_field, a_node, r_node) \
+	do { (r_node) = trp_left_get(a_base, a_field, (a_node)); \
+	trp_left_set(a_base, a_field, (a_node), \
+		trp_right_get(a_base, a_field, (r_node))); \
+	trp_right_set(a_base, a_field, (r_node), (a_node)); } while(0)
+
+#define trp_gen(a_attr, a_pre, a_type, a_field, a_base, a_cmp) \
+a_attr a_type *a_pre##psearch(struct trp_root *treap, a_type *key) \
+{ \
+	a_type *ret; \
+	a_type *tnode = trpn_pointer(a_base, treap->trp_root); \
+	ret = NULL; \
+	while (tnode != NULL) { \
+		int cmp = (a_cmp)(key, tnode); \
+		if (cmp < 0) \
+			tnode = trp_left_get(a_base, a_field, tnode); \
+		else if (cmp > 0) { \
+			ret = tnode; \
+			tnode = trp_right_get(a_base, a_field, tnode); \
+		} else { \
+			ret = tnode; \
+			break; \
+		} \
+	} \
+	return (ret); \
+} \
+a_attr a_type *a_pre##insert_recurse(a_type *cur_node, a_type *ins_node) \
+{ \
+	if (cur_node == NULL) \
+		return (ins_node); \
+	else { \
+		a_type *ret; \
+		int cmp = a_cmp(ins_node, cur_node); \
+		if (cmp < 0) { \
+			a_type *left = a_pre##insert_recurse( \
+				trp_left_get(a_base, a_field, cur_node), ins_node); \
+			trp_left_set(a_base, a_field, cur_node, left); \
+			if (trp_prio_get(left) < trp_prio_get(cur_node)) \
+				trpn_rotate_right(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} else { \
+			a_type *right = a_pre##insert_recurse( \
+				trp_right_get(a_base, a_field, cur_node), ins_node); \
+			trp_right_set(a_base, a_field, cur_node, right); \
+			if (trp_prio_get(right) < trp_prio_get(cur_node)) \
+				trpn_rotate_left(a_base, a_field, cur_node, ret); \
+			else \
+				ret = cur_node; \
+		} \
+		return (ret); \
+	} \
+} \
+a_attr void a_pre##insert(struct trp_root *treap, a_type *node) \
+{ \
+	trp_node_new(a_base, a_field, node); \
+	treap->trp_root = trpn_offset(a_base, a_pre##insert_recurse( \
+					      trpn_pointer(a_base, treap->trp_root), \
+					      node)); \
+}
+
+#endif
diff --git a/vcs-svn/trp.txt b/vcs-svn/trp.txt
new file mode 100644
index 0000000..7cf9b40
--- /dev/null
+++ b/vcs-svn/trp.txt
@@ -0,0 +1,61 @@
+TODO: Update this documentation to match the changes to trp.h
+
+The trp_gen() macro generates a type-specific treap implementation,
+based on the above cpp macros.
+
+Arguments:
+
+  a_attr     : Function attribute for generated functions (ex: static).
+  a_pre      : Prefix for generated functions (ex: treap_).
+  a_t_type   : Type for treap data structure (ex: treap_t).
+  a_type     : Type for treap node data structure (ex: treap_node_t).
+  a_field    : Name of treap node linkage (ex: treap_link).
+  a_base     : Expression for the base pointer from which nodes are offset.
+  a_cmp      : Node comparison function name, with the following prototype:
+                 int (a_cmp *)(a_type *a_node, a_type *a_other);
+                                       ^^^^^^
+                                    or a_key
+               Interpretation of comparison function return values:
+                 -1 : a_node <  a_other
+                  0 : a_node == a_other
+                  1 : a_node >  a_other
+               In all cases, the a_node or a_key macro argument is the first
+               argument to the comparison function, which makes it possible
+               to write comparison functions that treat the first argument
+               specially.
+
+Assuming the following setup:
+
+  typedef struct ex_node_s ex_node_t;
+  struct ex_node_s {
+      trp_node(ex_node_t) ex_link;
+  };
+  typedef trp(ex_node_t) ex_t;
+  static ex_node_t ex_base[MAX_NODES];
+  trp_gen(static, ex_, ex_t, ex_node_t, ex_link, ex_base, ex_cmp)
+
+The following API is generated:
+
+  static void
+  ex_new(ex_t *treap);
+      Description: Initialize a treap structure.
+      Args:
+        treap: Pointer to an uninitialized treap object.
+
+  static ex_node_t *
+  ex_psearch(ex_t *treap, ex_node_t *key);
+      Description: Search for node that matches key.  If no match is found,
+                   return what would be key's successor/predecessor, were
+                   key in treap.
+      Args:
+        treap: Pointer to an initialized treap object.
+        key  : Search key.
+      Ret: Node in treap that matches key, or if no match, hypothetical
+           node's successor/predecessor (NULL if no successor/predecessor).
+
+  static void
+  ex_insert(ex_t *treap, ex_node_t *node);
+      Description: Insert node into treap.
+      Args:
+        treap: Pointer to an initialized treap object.
+        node : Node to be inserted into treap.
-- 
1.7.1


* [PATCH 3/6] Add library for string-specific memory pool
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 1/6] Add memory pool library Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 2/6] Add cpp macro implementation of treaps Ramkumar Ramachandra
@ 2010-06-04 13:26 ` Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 4/6] Add stream helper library Ramkumar Ramachandra
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier,
	Junio C Hamano

This library uses the macros in obj_pool.h and trp.h to create a
memory pool for strings and to expose an API for handling them.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
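Reviewer note (not part of the commit): the point of interning is that
equal strings map to the same uint32_t, so path components can be
compared as integers, with ~0 marking "no string". The sketch below
illustrates that contract, assuming fixed-size static tables and a
linear scan in place of the treap lookup used by string_pool.c. All
sketch_* names and the MAX_* limits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 64
#define MAX_BYTES 1024

static char bytes[MAX_BYTES];         /* all strings, NUL-separated */
static uint32_t starts[MAX_STRINGS];  /* start of each interned string */
static uint32_t n_strings, n_bytes;

/* Return a stable id for key; ~0 if key is NULL or the table is full. */
static uint32_t sketch_intern(const char *key)
{
	uint32_t i;
	size_t len;
	if (key == NULL)
		return ~0u;
	for (i = 0; i < n_strings; i++)
		if (!strcmp(&bytes[starts[i]], key))
			return i;
	len = strlen(key) + 1;
	if (n_strings == MAX_STRINGS || len > MAX_BYTES - n_bytes)
		return ~0u;
	memcpy(&bytes[n_bytes], key, len);
	starts[n_strings] = n_bytes;
	n_bytes += (uint32_t)len;
	return n_strings++;
}

/* Map an id back to its string; NULL for unknown ids. */
static const char *sketch_fetch(uint32_t id)
{
	return id < n_strings ? &bytes[starts[id]] : NULL;
}

/*
 * Split path on '/' into interned ids, terminated by ~0.
 * Modifies path in place (strtok) and returns the depth.
 */
static uint32_t sketch_path(char *path, uint32_t *seq, uint32_t max)
{
	uint32_t depth = 0;
	char *tok;
	assert(max > 0);  /* need room for the ~0 terminator */
	for (tok = strtok(path, "/"); tok && depth + 1 < max;
	     tok = strtok(NULL, "/"))
		seq[depth++] = sketch_intern(tok);
	seq[depth] = ~0u;
	return depth;
}
```

Splitting "trunk/src/main.c" and "branches/src" this way yields the
same id for "src" in both sequences, which is the property the
directory lookups in the later patches rely on.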
 vcs-svn/string_pool.c |  110 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/string_pool.h |   14 ++++++
 2 files changed, 124 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/string_pool.c
 create mode 100644 vcs-svn/string_pool.h

diff --git a/vcs-svn/string_pool.c b/vcs-svn/string_pool.c
new file mode 100644
index 0000000..cfcf127
--- /dev/null
+++ b/vcs-svn/string_pool.c
@@ -0,0 +1,110 @@
+#include "git-compat-util.h"
+
+#include "trp.h"
+#include "obj_pool.h"
+#include "string_pool.h"
+
+typedef struct node_s node_t;
+static struct trp_root tree = { ~0 };
+
+struct node_s {
+	uint32_t offset;
+	struct trp_node children;
+};
+
+/* Create two memory pools: one for node_t, and another for strings */
+obj_pool_gen(node, node_t, 4096);
+obj_pool_gen(string, char, 4096);
+
+static char *node_value(node_t *node)
+{
+	return node ? string_pointer(node->offset) : NULL;
+}
+
+static int node_value_cmp(node_t *a, node_t *b)
+{
+	return strcmp(node_value(a), node_value(b));
+}
+
+static int node_indentity_cmp(node_t *a, node_t *b)
+{
+	int r = node_value_cmp(a, b);
+	return r ? r : (((uintptr_t) a) > ((uintptr_t) b))
+		- (((uintptr_t) a) < ((uintptr_t) b));
+}
+
+/* Build a Treap from the node_s structure (a trp_node w/ offset) */
+trp_gen(static, tree_, node_t, children, node, node_indentity_cmp);
+
+char *pool_fetch(uint32_t entry)
+{
+	return node_value(node_pointer(entry));
+}
+
+uint32_t pool_intern(char *key)
+{
+	/* Canonicalize key */
+	node_t *match = NULL;
+	uint32_t key_len;
+	if (key == NULL)
+		return ~0;
+	key_len = strlen(key) + 1;
+	node_t *node = node_pointer(node_alloc(1));
+	node->offset = string_alloc(key_len);
+	strcpy(node_value(node), key);
+	match = tree_psearch(&tree, node);
+	if (!match || node_value_cmp(node, match)) {
+		tree_insert(&tree, node);
+	} else {
+		node_free(1);
+		string_free(key_len);
+		node = match;
+	}
+	return node_offset(node);
+}
+
+uint32_t pool_tok_r(char *str, const char *delim, char **saveptr)
+{
+	char *token = strtok_r(str, delim, saveptr);
+	return token ? pool_intern(token) : ~0;
+}
+
+void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream)
+{
+	uint32_t i;
+	for (i = 0; i < len && ~seq[i]; i++) {
+		fputs(pool_fetch(seq[i]), stream);
+		if (i < len - 1 && ~seq[i + 1])
+			fputc(delim, stream);
+	}
+}
+
+uint32_t pool_tok_seq(uint32_t max, uint32_t *seq, char *delim, char *str)
+{
+	char *context = NULL;
+	uint32_t length = 0, token = str ? pool_tok_r(str, delim, &context) : ~0;
+	while (length < max) {
+		seq[length++] = token;
+		if (token == ~0)
+			break;
+		token = pool_tok_r(NULL, delim, &context);
+	}
+	seq[length ? length - 1 : 0] = ~0;
+	return length;
+}
+
+void pool_init(void)
+{
+	uint32_t node;
+	node_init();
+	string_init();
+	for (node = 0; node < node_pool.size; node++) {
+		tree_insert(&tree, node_pointer(node));
+	}
+}
+
+void pool_reset(void)
+{
+	node_reset();
+	string_reset();
+}
diff --git a/vcs-svn/string_pool.h b/vcs-svn/string_pool.h
new file mode 100644
index 0000000..e2cc447
--- /dev/null
+++ b/vcs-svn/string_pool.h
@@ -0,0 +1,14 @@
+#ifndef STRING_POOL_H_
+#define	STRING_POOL_H_
+
+#include "git-compat-util.h"
+
+uint32_t pool_intern(char *key);
+char *pool_fetch(uint32_t entry);
+uint32_t pool_tok_r(char *str, const char *delim, char **saveptr);
+void pool_print_seq(uint32_t len, uint32_t *seq, char delim, FILE *stream);
+uint32_t pool_tok_seq(uint32_t max, uint32_t *seq, char *delim, char *str);
+void pool_init(void);
+void pool_reset(void);
+
+#endif
-- 
1.7.1


* [PATCH 4/6] Add stream helper library
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
                   ` (2 preceding siblings ...)
  2010-06-04 13:26 ` [PATCH 3/6] Add library for string-specific memory pool Ramkumar Ramachandra
@ 2010-06-04 13:26 ` Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 5/6] Add infrastructure to write revisions in fast-export format Ramkumar Ramachandra
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier,
	Junio C Hamano

This library provides facilities to read streams into buffers. It
maintains a couple of static buffers and provides an API to use them.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
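Reviewer note (not part of the commit): the intended access pattern is
to read a header line and then an exact number of bytes that the
header announced. The sketch below shows that pattern, assuming
caller-supplied buffers and stdio's own buffering rather than the
static buffers in line_buffer.c. The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf, stripping the newline; NULL at EOF or on error. */
static char *sketch_read_line(FILE *in, char *buf, size_t size)
{
	size_t len;
	assert(size > 0);
	if (!fgets(buf, (int)size, in))
		return NULL;
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';
	return buf;
}

/*
 * Read up to len bytes and NUL-terminate them.
 * buf must have room for len + 1 bytes; returns the bytes actually read.
 */
static size_t sketch_read_string(FILE *in, char *buf, size_t len)
{
	size_t n = fread(buf, 1, len, in);
	buf[n] = '\0';
	return n;
}
```

A caller would parse the length out of a line such as
"Content-length: 5", call sketch_read_string() with that length, and
then resume reading lines. A short return value signals a truncated
stream.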
 vcs-svn/line_buffer.c |  129 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/line_buffer.h |   14 +++++
 2 files changed, 143 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/line_buffer.c
 create mode 100644 vcs-svn/line_buffer.h

diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
new file mode 100644
index 0000000..740676d
--- /dev/null
+++ b/vcs-svn/line_buffer.c
@@ -0,0 +1,129 @@
+#include "git-compat-util.h"
+
+#include "line_buffer.h"
+#include "obj_pool.h"
+
+#define LINE_BUFFER_LEN 10000
+#define COPY_BUFFER_LEN 4096
+
+/* Create memory pool for char sequence of known length */
+obj_pool_gen(blob, char, 4096);
+
+static char line_buffer[LINE_BUFFER_LEN];
+static char byte_buffer[COPY_BUFFER_LEN];
+static uint32_t line_buffer_len = 0;
+static uint32_t line_len = 0;
+static FILE *infile;
+
+int buffer_init(char *filename)
+{
+	infile = fopen(filename, "r");
+	if (!infile)
+		return 1;
+	return 0;
+}
+
+int buffer_deinit(void)
+{
+	fclose(infile);
+	return 0;
+}
+
+char *buffer_read_line(void)
+{
+	char *end;
+	uint32_t n_read;
+
+	if (line_len) {
+		memmove(line_buffer, &line_buffer[line_len],
+			line_buffer_len - line_len);
+		line_buffer_len -= line_len;
+		line_len = 0;
+	}
+
+	end = memchr(line_buffer, '\n', line_buffer_len);
+	while (line_buffer_len < LINE_BUFFER_LEN - 1 &&
+	       !feof(infile) && !ferror(infile) && NULL == end) {
+		n_read = fread(&line_buffer[line_buffer_len], 1,
+			       LINE_BUFFER_LEN - 1 - line_buffer_len,
+			       infile);
+		end = memchr(&line_buffer[line_buffer_len], '\n', n_read);
+		line_buffer_len += n_read;
+	}
+
+	if (ferror(infile))
+		return NULL;
+
+	if (end != NULL) {
+		line_len = end - line_buffer;
+		line_buffer[line_len++] = '\0';
+	} else {
+		line_len = line_buffer_len;
+		line_buffer[line_buffer_len] = '\0';
+	}
+
+	if (line_len == 0)
+		return NULL;
+
+	return line_buffer;
+}
+
+char *buffer_read_string(uint32_t len)
+{
+	char *s;
+	blob_free(blob_pool.size);
+	s = blob_pointer(blob_alloc(len + 1));
+	uint32_t offset = 0;
+	if (line_buffer_len > line_len) {
+		offset = line_buffer_len - line_len;
+		if (offset > len)
+			offset = len;
+		memcpy(s, &line_buffer[line_len], offset);
+		line_len += offset;
+	}
+	if (offset < len)
+		offset += fread(&s[offset], 1, len - offset, infile);
+	s[offset] = '\0';
+	return s;
+}
+
+void buffer_copy_bytes(uint32_t len)
+{
+	uint32_t in;
+	if (line_buffer_len > line_len) {
+		in = line_buffer_len - line_len;
+		if (in > len)
+			in = len;
+		fwrite(&line_buffer[line_len], 1, in, stdout);
+		len -= in;
+		line_len += in;
+	}
+	while (len > 0 && !feof(infile) && !ferror(infile)) {
+		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		in = fread(byte_buffer, 1, in, infile);
+		len -= in;
+		fwrite(byte_buffer, 1, in, stdout);
+	}
+}
+
+void buffer_skip_bytes(uint32_t len)
+{
+	uint32_t in;
+	if (line_buffer_len > line_len) {
+		in = line_buffer_len - line_len;
+		if (in > len)
+			in = len;
+		line_len += in;
+		len -= in;
+	}
+	while (len > 0 && !feof(infile) && !ferror(infile)) {
+		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
+		in = fread(byte_buffer, 1, in, infile);
+		len -= in;
+	}
+}
+
+void buffer_reset(void)
+{
+	blob_reset();
+}
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
new file mode 100644
index 0000000..a6c42d7
--- /dev/null
+++ b/vcs-svn/line_buffer.h
@@ -0,0 +1,14 @@
+#ifndef LINE_BUFFER_H_
+#define LINE_BUFFER_H_
+
+#include <stdint.h>
+
+int buffer_init(char *filename);
+int buffer_deinit(void);
+char *buffer_read_line(void);
+char *buffer_read_string(uint32_t len);
+void buffer_copy_bytes(uint32_t len);
+void buffer_skip_bytes(uint32_t len);
+void buffer_reset(void);
+
+#endif
-- 
1.7.1


* [PATCH 5/6] Add infrastructure to write revisions in fast-export format
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
                   ` (3 preceding siblings ...)
  2010-06-04 13:26 ` [PATCH 4/6] Add stream helper library Ramkumar Ramachandra
@ 2010-06-04 13:26 ` Ramkumar Ramachandra
  2010-06-04 13:26 ` [PATCH 6/6] Add SVN dump parser Ramkumar Ramachandra
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier,
	Junio C Hamano

repo_tree maintains the exporter's state and provides a facility to
call fast_export, which then writes objects to stdout in a form
suitable for consumption by git-fast-import.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
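Reviewer note (not part of the commit): a minimal sketch of the kind
of stream fast_export.c produces, assuming a single file per commit
and a FILE * argument instead of stdout so the output can be captured.
The sketch_* names and the "@local" address are hypothetical; the
blob, mark, data, commit, committer and M commands are those of the
git-fast-import input format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Emit one blob and label it with a mark for later reference. */
static void sketch_blob(FILE *out, uint32_t mark, const char *data, size_t len)
{
	assert(out != NULL);
	fprintf(out, "blob\nmark :%" PRIu32 "\ndata %lu\n",
		mark, (unsigned long)len);
	fwrite(data, 1, len, out);
	fputc('\n', out);
}

/* Emit a commit on master that sets `path` to the blob behind `mark`. */
static void sketch_commit(FILE *out, const char *author, const char *log,
			  unsigned long timestamp, uint32_t mark,
			  const char *path)
{
	assert(out != NULL);
	fprintf(out, "commit refs/heads/master\n");
	fprintf(out, "committer %s <%s@local> %lu +0000\n",
		author, author, timestamp);
	fprintf(out, "data %lu\n%s\n", (unsigned long)strlen(log), log);
	fprintf(out, "M 100644 :%" PRIu32 " %s\n\n", mark, path);
}
```

The byte count after "data" must match the payload exactly, which is
why the length is computed from the same string that is then written.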
 vcs-svn/fast_export.c |   69 ++++++++++
 vcs-svn/fast_export.h |   14 ++
 vcs-svn/repo_tree.c   |  353 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/repo_tree.h   |   27 ++++
 4 files changed, 463 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/fast_export.c
 create mode 100644 vcs-svn/fast_export.h
 create mode 100644 vcs-svn/repo_tree.c
 create mode 100644 vcs-svn/repo_tree.h

diff --git a/vcs-svn/fast_export.c b/vcs-svn/fast_export.c
new file mode 100644
index 0000000..e5eb409
--- /dev/null
+++ b/vcs-svn/fast_export.c
@@ -0,0 +1,69 @@
+#include "git-compat-util.h"
+
+#include "fast_export.h"
+#include "line_buffer.h"
+#include "repo_tree.h"
+#include "string_pool.h"
+
+#define MAX_GITSVN_LINE_LEN 4096
+
+static uint32_t first_commit_done;
+
+void fast_export_delete(uint32_t depth, uint32_t *path)
+{
+	putchar('D');
+	putchar(' ');
+	pool_print_seq(depth, path, '/', stdout);
+	putchar('\n');
+}
+
+void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
+                        uint32_t mark)
+{
+	printf("M %06o :%d ", mode, mark);
+	pool_print_seq(depth, path, '/', stdout);
+	putchar('\n');
+}
+
+static char gitsvnline[MAX_GITSVN_LINE_LEN];
+void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+                        uint32_t uuid, uint32_t url,
+                        unsigned long timestamp)
+{
+	if (!log)
+		log = "";
+	if (~uuid && ~url) {
+		snprintf(gitsvnline, MAX_GITSVN_LINE_LEN, "\n\ngit-svn-id: %s@%d %s\n",
+				 pool_fetch(url), revision, pool_fetch(uuid));
+	} else {
+		*gitsvnline = '\0';
+	}
+	printf("commit refs/heads/master\n");
+	printf("committer %s <%s@%s> %lu +0000\n",
+		   ~author ? pool_fetch(author) : "nobody",
+		   ~author ? pool_fetch(author) : "nobody",
+		   ~uuid ? pool_fetch(uuid) : "local", timestamp);
+	printf("data %"PRIuMAX"\n%s%s\n",
+		   (uintmax_t)(strlen(log) + strlen(gitsvnline)), log, gitsvnline);
+	if (!first_commit_done) {
+		if (revision > 1)
+			printf("from refs/heads/master^0\n");
+		first_commit_done = 1;
+	}
+	repo_diff(revision - 1, revision);
+	fputc('\n', stdout);
+
+	printf("progress Imported commit %d.\n\n", revision);
+}
+
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len)
+{
+	if (mode == REPO_MODE_LNK) {
+		/* svn symlink blobs start with "link " */
+		buffer_skip_bytes(5);
+		len -= 5;
+	}
+	printf("blob\nmark :%d\ndata %d\n", mark, len);
+	buffer_copy_bytes(len);
+	fputc('\n', stdout);
+}
diff --git a/vcs-svn/fast_export.h b/vcs-svn/fast_export.h
new file mode 100644
index 0000000..e9ea3ed
--- /dev/null
+++ b/vcs-svn/fast_export.h
@@ -0,0 +1,14 @@
+#ifndef FAST_EXPORT_H_
+#define FAST_EXPORT_H_
+
+#include <stdint.h>
+#include <time.h>
+
+void fast_export_delete(uint32_t depth, uint32_t *path);
+void fast_export_modify(uint32_t depth, uint32_t *path, uint32_t mode,
+                        uint32_t mark);
+void fast_export_commit(uint32_t revision, uint32_t author, char *log,
+                        uint32_t uuid, uint32_t url, unsigned long timestamp);
+void fast_export_blob(uint32_t mode, uint32_t mark, uint32_t len);
+
+#endif
diff --git a/vcs-svn/repo_tree.c b/vcs-svn/repo_tree.c
new file mode 100644
index 0000000..5dfffb8
--- /dev/null
+++ b/vcs-svn/repo_tree.c
@@ -0,0 +1,353 @@
+#include "git-compat-util.h"
+
+#include "string_pool.h"
+#include "repo_tree.h"
+#include "obj_pool.h"
+#include "fast_export.h"
+
+struct repo_dirent {
+	uint32_t name_offset;
+	uint32_t mode;
+	uint32_t content_offset;
+};
+
+struct repo_dir {
+	uint32_t size;
+	uint32_t first_offset;
+};
+
+struct repo_commit {
+	uint32_t mark;
+	uint32_t root_dir_offset;
+};
+
+/* Generate memory pools for commit, dir and dirent */
+obj_pool_gen(commit, struct repo_commit, 4096);
+obj_pool_gen(dir, struct repo_dir, 4096);
+obj_pool_gen(dirent, struct repo_dirent, 4096);
+
+static uint32_t num_dirs_saved;
+static uint32_t num_dirents_saved;
+static uint32_t active_commit;
+static uint32_t _mark;
+
+uint32_t next_blob_mark(void)
+{
+	return _mark++;
+}
+
+static struct repo_dir *repo_commit_root_dir(struct repo_commit *commit)
+{
+	return dir_pointer(commit->root_dir_offset);
+}
+
+static struct repo_dirent *repo_first_dirent(struct repo_dir *dir)
+{
+	return dirent_pointer(dir->first_offset);
+}
+
+static int repo_dirent_name_cmp(const void *a, const void *b)
+{
+	const struct repo_dirent *dirent1 = a, *dirent2 = b;
+	uint32_t a_offset = dirent1->name_offset;
+	uint32_t b_offset = dirent2->name_offset;
+	return (a_offset > b_offset) - (a_offset < b_offset);
+}
+
+static struct repo_dirent *repo_dirent_by_name(struct repo_dir *dir,
+                                          uint32_t name_offset)
+{
+	struct repo_dirent key;
+	if (dir == NULL || dir->size == 0)
+		return NULL;
+	key.name_offset = name_offset;
+	return bsearch(&key, repo_first_dirent(dir), dir->size,
+				   sizeof(struct repo_dirent), repo_dirent_name_cmp);
+}
+
+static int repo_dirent_is_dir(struct repo_dirent *dirent)
+{
+	return dirent != NULL && dirent->mode == REPO_MODE_DIR;
+}
+
+static struct repo_dir *repo_dir_from_dirent(struct repo_dirent *dirent)
+{
+	if (!repo_dirent_is_dir(dirent))
+		return NULL;
+	return dir_pointer(dirent->content_offset);
+}
+
+static uint32_t dir_with_dirents_alloc(uint32_t size)
+{
+	uint32_t offset = dir_alloc(1);
+	dir_pointer(offset)->size = size;
+	dir_pointer(offset)->first_offset = dirent_alloc(size);
+	return offset;
+}
+
+static struct repo_dir *repo_clone_dir(struct repo_dir *orig_dir, uint32_t padding)
+{
+	uint32_t orig_o, new_o, dirent_o;
+	orig_o = dir_offset(orig_dir);
+	if (orig_o < num_dirs_saved) {
+		new_o = dir_with_dirents_alloc(orig_dir->size + padding);
+		orig_dir = dir_pointer(orig_o);
+		dirent_o = dir_pointer(new_o)->first_offset;
+	} else {
+		if (padding == 0)
+			return orig_dir;
+		new_o = orig_o;
+		dirent_o = dirent_alloc(orig_dir->size + padding);
+	}
+	memcpy(dirent_pointer(dirent_o), repo_first_dirent(orig_dir),
+		   orig_dir->size * sizeof(struct repo_dirent));
+	dir_pointer(new_o)->size = orig_dir->size + padding;
+	dir_pointer(new_o)->first_offset = dirent_o;
+	return dir_pointer(new_o);
+}
+
+static struct repo_dirent *repo_read_dirent(uint32_t revision, uint32_t *path)
+{
+	uint32_t name = 0;
+	struct repo_dir *dir = NULL;
+	struct repo_dirent *dirent = NULL;
+	dir = repo_commit_root_dir(commit_pointer(revision));
+	while (~(name = *path++)) {
+		dirent = repo_dirent_by_name(dir, name);
+		if (dirent == NULL) {
+			return NULL;
+		} else if (repo_dirent_is_dir(dirent)) {
+			dir = repo_dir_from_dirent(dirent);
+		} else {
+			break;
+		}
+	}
+	return dirent;
+}
+
+static void
+repo_write_dirent(uint32_t *path, uint32_t mode, uint32_t content_offset,
+                  uint32_t del)
+{
+	uint32_t name, revision, dirent_o = ~0, dir_o = ~0, parent_dir_o = ~0;
+	struct repo_dir *dir;
+	struct repo_dirent *dirent = NULL;
+	revision = active_commit;
+	dir = repo_commit_root_dir(commit_pointer(revision));
+	dir = repo_clone_dir(dir, 0);
+	commit_pointer(revision)->root_dir_offset = dir_offset(dir);
+	while (~(name = *path++)) {
+		parent_dir_o = dir_offset(dir);
+		dirent = repo_dirent_by_name(dir, name);
+		if (dirent == NULL) {
+			dir = repo_clone_dir(dir, 1);
+			dirent = &repo_first_dirent(dir)[dir->size - 1];
+			dirent->name_offset = name;
+			dirent->mode = REPO_MODE_DIR;
+			qsort(repo_first_dirent(dir), dir->size,
+				  sizeof(struct repo_dirent), repo_dirent_name_cmp);
+			dirent = repo_dirent_by_name(dir, name);
+			dir_o = dir_with_dirents_alloc(0);
+			dirent->content_offset = dir_o;
+			dir = dir_pointer(dir_o);
+		} else if ((dir = repo_dir_from_dirent(dirent))) {
+			dirent_o = dirent_offset(dirent);
+			dir = repo_clone_dir(dir, 0);
+			if (dirent_o != ~0)
+				dirent_pointer(dirent_o)->content_offset = dir_offset(dir);
+		} else {
+			dirent->mode = REPO_MODE_DIR;
+			dirent_o = dirent_offset(dirent);
+			dir_o = dir_with_dirents_alloc(0);
+			dirent = dirent_pointer(dirent_o);
+			dir = dir_pointer(dir_o);
+			dirent->content_offset = dir_o;
+		}
+	}
+	if (dirent) {
+		dirent->mode = mode;
+		dirent->content_offset = content_offset;
+		if (del && ~parent_dir_o) {
+			dirent->name_offset = ~0;
+			dir = dir_pointer(parent_dir_o);
+			qsort(repo_first_dirent(dir), dir->size,
+				  sizeof(struct repo_dirent), repo_dirent_name_cmp);
+			dir->size--;
+		}
+	}
+}
+
+uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst)
+{
+	uint32_t mode = 0, content_offset = 0;
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(revision, src);
+	if (src_dirent != NULL) {
+		mode = src_dirent->mode;
+		content_offset = src_dirent->content_offset;
+		repo_write_dirent(dst, mode, content_offset, 0);
+	}
+	return mode;
+}
+
+void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark)
+{
+	repo_write_dirent(path, mode, blob_mark, 0);
+}
+
+uint32_t repo_replace(uint32_t *path, uint32_t blob_mark)
+{
+	uint32_t mode = 0;
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(active_commit, path);
+	if (src_dirent != NULL) {
+		mode = src_dirent->mode;
+		repo_write_dirent(path, mode, blob_mark, 0);
+	}
+	return mode;
+}
+
+void repo_modify(uint32_t *path, uint32_t mode, uint32_t blob_mark)
+{
+	struct repo_dirent *src_dirent;
+	src_dirent = repo_read_dirent(active_commit, path);
+	if (src_dirent != NULL && blob_mark == 0) {
+		blob_mark = src_dirent->content_offset;
+	}
+	repo_write_dirent(path, mode, blob_mark, 0);
+}
+
+void repo_delete(uint32_t *path)
+{
+	repo_write_dirent(path, 0, 0, 1);
+}
+
+static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir);
+
+static void repo_git_add(uint32_t depth, uint32_t *path, struct repo_dirent *dirent)
+{
+	if (repo_dirent_is_dir(dirent)) {
+		repo_git_add_r(depth, path, repo_dir_from_dirent(dirent));
+	} else {
+		fast_export_modify(depth, path, dirent->mode, dirent->content_offset);
+	}
+}
+
+static void repo_git_add_r(uint32_t depth, uint32_t *path, struct repo_dir *dir)
+{
+	uint32_t o;
+	struct repo_dirent *de;
+	de = repo_first_dirent(dir);
+	for (o = 0; o < dir->size; o++) {
+		path[depth] = de[o].name_offset;
+		repo_git_add(depth + 1, path, &de[o]);
+	}
+}
+
+static void repo_diff_r(uint32_t depth, uint32_t *path, struct repo_dir *dir1,
+			struct repo_dir *dir2)
+{
+	struct repo_dirent *de1, *de2, *max_de1, *max_de2;
+	de1 = repo_first_dirent(dir1);
+	de2 = repo_first_dirent(dir2);
+	max_de1 = &de1[dir1->size];
+	max_de2 = &de2[dir2->size];
+
+	while (de1 < max_de1 && de2 < max_de2) {
+		if (de1->name_offset < de2->name_offset) {
+			path[depth] = (de1++)->name_offset;
+			fast_export_delete(depth + 1, path);
+		} else if (de1->name_offset > de2->name_offset) {
+			path[depth] = de2->name_offset;
+			repo_git_add(depth + 1, path, de2++);
+		} else {
+			path[depth] = de1->name_offset;
+			if (de1->mode != de2->mode ||
+				de1->content_offset != de2->content_offset) {
+				if (repo_dirent_is_dir(de1) && repo_dirent_is_dir(de2)) {
+					repo_diff_r(depth + 1, path,
+								repo_dir_from_dirent(de1),
+								repo_dir_from_dirent(de2));
+				} else {
+					if (repo_dirent_is_dir(de1) != repo_dirent_is_dir(de2)) {
+						fast_export_delete(depth + 1, path);
+					}
+					repo_git_add(depth + 1, path, de2);
+				}
+			}
+			de1++;
+			de2++;
+		}
+	}
+	while (de1 < max_de1) {
+		path[depth] = (de1++)->name_offset;
+		fast_export_delete(depth + 1, path);
+	}
+	while (de2 < max_de2) {
+		path[depth] = de2->name_offset;
+		repo_git_add(depth + 1, path, de2++);
+	}
+}
+
+static uint32_t path_stack[REPO_MAX_PATH_DEPTH];
+
+void repo_diff(uint32_t r1, uint32_t r2)
+{
+	repo_diff_r(0,
+	            path_stack,
+	            repo_commit_root_dir(commit_pointer(r1)),
+	            repo_commit_root_dir(commit_pointer(r2)));
+}
+
+void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
+                 uint32_t url, unsigned long timestamp)
+{
+	fast_export_commit(revision, author, log, uuid, url, timestamp);
+	num_dirs_saved = dir_pool.size;
+	num_dirents_saved = dirent_pool.size;
+	active_commit = commit_alloc(1);
+	commit_pointer(active_commit)->root_dir_offset =
+		commit_pointer(active_commit - 1)->root_dir_offset;
+}
+
+static void mark_init(void)
+{
+	uint32_t i;
+	_mark = 0;
+	for (i = 0; i < dirent_pool.size; i++)
+		if (!repo_dirent_is_dir(dirent_pointer(i)) &&
+			dirent_pointer(i)->content_offset > _mark)
+			_mark = dirent_pointer(i)->content_offset;
+	_mark++;
+}
+
+void repo_init(void) {
+	pool_init();
+	commit_init();
+	dir_init();
+	dirent_init();
+	mark_init();
+	num_dirs_saved = dir_pool.size;
+	num_dirents_saved = dirent_pool.size;
+	active_commit = commit_pool.size - 1;
+	if (active_commit == -1) {
+		commit_alloc(2);
+		/* Create empty tree for commit 0. */
+		commit_pointer(0)->root_dir_offset =
+			dir_with_dirents_alloc(0);
+		/* Preallocate commit 1, ready for changes. */
+		commit_pointer(1)->root_dir_offset =
+			commit_pointer(0)->root_dir_offset;
+		active_commit = 1;
+		num_dirs_saved = dir_pool.size;
+		num_dirents_saved = dirent_pool.size;
+	}
+}
+
+void repo_reset(void)
+{
+	pool_reset();
+	commit_reset();
+	dir_reset();
+	dirent_reset();
+}
diff --git a/vcs-svn/repo_tree.h b/vcs-svn/repo_tree.h
new file mode 100644
index 0000000..c49b65a
--- /dev/null
+++ b/vcs-svn/repo_tree.h
@@ -0,0 +1,27 @@
+#ifndef REPO_TREE_H_
+#define REPO_TREE_H_
+
+#include <stdint.h>
+#include <time.h>
+
+#define REPO_MODE_DIR 0040000
+#define REPO_MODE_BLB 0100644
+#define REPO_MODE_EXE 0100755
+#define REPO_MODE_LNK 0120000
+
+#define REPO_MAX_PATH_LEN 4096
+#define REPO_MAX_PATH_DEPTH 1000
+
+uint32_t next_blob_mark(void);
+uint32_t repo_copy(uint32_t revision, uint32_t *src, uint32_t *dst);
+void repo_add(uint32_t *path, uint32_t mode, uint32_t blob_mark);
+uint32_t repo_replace(uint32_t *path, uint32_t blob_mark);
+void repo_modify(uint32_t *path, uint32_t mode, uint32_t blob_mark);
+void repo_delete(uint32_t *path);
+void repo_commit(uint32_t revision, uint32_t author, char *log, uint32_t uuid,
+                 uint32_t url, unsigned long timestamp);
+void repo_diff(uint32_t r1, uint32_t r2);
+void repo_init(void);
+void repo_reset(void);
+
+#endif
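
For readers following repo_diff_r() above: it compares two directory
listings that are kept sorted by name_offset, so one lockstep pass finds
deletions, additions and changes. A minimal standalone sketch of that
idea follows. The names `struct entry`, `struct diff_counts` and
`diff_sorted` are hypothetical and not part of this patch, and the
counters merely stand in for the fast_export_delete() and repo_git_add()
calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry: an interned name plus a content id. */
struct entry {
	uint32_t name;
	uint32_t content;
};

/* Hypothetical tally standing in for the fast-export calls. */
struct diff_counts {
	unsigned deleted, added, changed;
};

/*
 * Walk two listings, both sorted by name, in lockstep.
 * A name only in `a` counts as deleted, a name only in `b` as added,
 * and a name in both with different content as changed.
 */
static struct diff_counts diff_sorted(const struct entry *a, size_t na,
				      const struct entry *b, size_t nb)
{
	struct diff_counts c = { 0, 0, 0 };
	size_t i = 0, j = 0;

	assert(a != NULL || na == 0);
	assert(b != NULL || nb == 0);
	while (i < na && j < nb) {
		if (a[i].name < b[j].name) {
			c.deleted++;
			i++;
		} else if (a[i].name > b[j].name) {
			c.added++;
			j++;
		} else {
			if (a[i].content != b[j].content)
				c.changed++;
			i++;
			j++;
		}
	}
	/* Whatever is left on one side has no counterpart on the other. */
	c.deleted += (unsigned)(na - i);
	c.added += (unsigned)(nb - j);
	return c;
}
```

Because both inputs are sorted, the pass is linear in the combined
listing size. The real function additionally recurses when both entries
are directories, which this sketch leaves out.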
-- 
1.7.1


* [PATCH 6/6] Add SVN dump parser
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
                   ` (4 preceding siblings ...)
  2010-06-04 13:26 ` [PATCH 5/6] Add infrastructure to write revisions in fast-export format Ramkumar Ramachandra
@ 2010-06-04 13:26 ` Ramkumar Ramachandra
  2010-06-04 13:29 ` [PATCH 0/6] Merge David's SVN exporter into git.git Sverre Rabbelier
  2010-06-04 13:31 ` Michael J Gruber
  7 siblings, 0 replies; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:26 UTC (permalink / raw)
  To: Git Mailing List
  Cc: David Michael Barr, Jonathan Nieder, Sverre Rabbelier,
	Junio C Hamano

svndump uses line_buffer to parse the SVN dumpfile format produced by
`svnadmin dump`, and uses repo_tree and fast_export to emit a git
fast-import stream.

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 vcs-svn/svndump.c |  294 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndump.h |    7 ++
 2 files changed, 301 insertions(+), 0 deletions(-)
 create mode 100644 vcs-svn/svndump.c
 create mode 100644 vcs-svn/svndump.h
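
For reviewers: the dump headers are "Key: value" lines, and
svndump_read() below splits each one in place with strstr(). A minimal
standalone sketch of that step, where the helper name `split_header` is
hypothetical and does not appear in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a "Key: value" line in place at the first ": ".
 * On success the key is left NUL-terminated in `line` and a pointer to
 * the value is returned. Returns NULL when there is no separator.
 */
static char *split_header(char *line)
{
	char *val;

	assert(line != NULL);
	val = strstr(line, ": ");
	if (!val)
		return NULL;
	*val++ = '\0';	/* terminate the key, overwriting ':' */
	*val++ = '\0';	/* overwrite the space; val now points at the value */
	return val;
}
```

Given "Node-path: trunk/README", the buffer then holds the key
"Node-path" and the returned pointer refers to "trunk/README". Lines
without a separator, such as the blank lines between records, yield NULL
and can be skipped.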

diff --git a/vcs-svn/svndump.c b/vcs-svn/svndump.c
new file mode 100644
index 0000000..9ee1246
--- /dev/null
+++ b/vcs-svn/svndump.c
@@ -0,0 +1,294 @@
+/*
+ * Parse and rearrange a svnadmin dump.
+ * Create the dump with:
+ * svnadmin dump --incremental -r<startrev>:<endrev> <repository> >outfile
+ */
+
+#include "cache.h"
+#include "git-compat-util.h"
+
+#include "repo_tree.h"
+#include "fast_export.h"
+#include "line_buffer.h"
+#include "obj_pool.h"
+#include "string_pool.h"
+
+#define NODEACT_REPLACE 4
+#define NODEACT_DELETE 3
+#define NODEACT_ADD 2
+#define NODEACT_CHANGE 1
+#define NODEACT_UNKNOWN 0
+
+#define DUMP_CTX 0
+#define REV_CTX  1
+#define NODE_CTX 2
+
+#define LENGTH_UNKNOWN (~0)
+#define DATE_RFC2822_LEN 31
+
+/* Create memory pool for log messages */
+obj_pool_gen(log, char, 4096);
+
+static char *log_copy(uint32_t length, char *log)
+{
+	char *buffer;
+	log_free(log_pool.size);
+	buffer = log_pointer(log_alloc(length));
+	strncpy(buffer, log, length);
+	return buffer;
+}
+
+static struct {
+	uint32_t action, propLength, textLength, srcRev, srcMode, mark, type;
+	uint32_t src[REPO_MAX_PATH_DEPTH], dst[REPO_MAX_PATH_DEPTH];
+} node_ctx;
+
+static struct {
+	uint32_t revision, author;
+	unsigned long timestamp;
+	char *log;
+} rev_ctx;
+
+static struct {
+	uint32_t uuid, url;
+} dump_ctx;
+
+static struct {
+	uint32_t svn_log, svn_author, svn_date, svn_executable, svn_special, uuid,
+		revision_number, node_path, node_kind, node_action,
+		node_copyfrom_path, node_copyfrom_rev, text_content_length,
+		prop_content_length, content_length;
+} keys;
+
+static void reset_node_ctx(char *fname)
+{
+	node_ctx.type = 0;
+	node_ctx.action = NODEACT_UNKNOWN;
+	node_ctx.propLength = LENGTH_UNKNOWN;
+	node_ctx.textLength = LENGTH_UNKNOWN;
+	node_ctx.src[0] = ~0;
+	node_ctx.srcRev = 0;
+	node_ctx.srcMode = 0;
+	pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.dst, "/", fname);
+	node_ctx.mark = 0;
+}
+
+static void reset_rev_ctx(uint32_t revision)
+{
+	rev_ctx.revision = revision;
+	rev_ctx.timestamp = 0;
+	rev_ctx.log = NULL;
+	rev_ctx.author = ~0;
+}
+
+static void reset_dump_ctx(uint32_t url)
+{
+	dump_ctx.url = url;
+	dump_ctx.uuid = ~0;
+}
+
+static void init_keys(void)
+{
+	keys.svn_log = pool_intern("svn:log");
+	keys.svn_author = pool_intern("svn:author");
+	keys.svn_date = pool_intern("svn:date");
+	keys.svn_executable = pool_intern("svn:executable");
+	keys.svn_special = pool_intern("svn:special");
+	keys.uuid = pool_intern("UUID");
+	keys.revision_number = pool_intern("Revision-number");
+	keys.node_path = pool_intern("Node-path");
+	keys.node_kind = pool_intern("Node-kind");
+	keys.node_action = pool_intern("Node-action");
+	keys.node_copyfrom_path = pool_intern("Node-copyfrom-path");
+	keys.node_copyfrom_rev = pool_intern("Node-copyfrom-rev");
+	keys.text_content_length = pool_intern("Text-content-length");
+	keys.prop_content_length = pool_intern("Prop-content-length");
+	keys.content_length = pool_intern("Content-length");
+}
+
+static void read_props(void)
+{
+	uint32_t len;
+	uint32_t key = ~0;
+	char buffer[27];
+	char *val = NULL;
+	char *t;
+	while ((t = buffer_read_line()) && strcmp(t, "PROPS-END")) {
+		if (!strncmp(t, "K ", 2)) {
+			len = atoi(&t[2]);
+			key = pool_intern(buffer_read_string(len));
+			buffer_read_line();
+		} else if (!strncmp(t, "V ", 2)) {
+			len = atoi(&t[2]);
+			val = buffer_read_string(len);
+			if (key == keys.svn_log) {
+				/* Value length excludes terminating nul. */
+				rev_ctx.log = log_copy(len + 1, val);
+			} else if (key == keys.svn_author) {
+				rev_ctx.author = pool_intern(val);
+			} else if (key == keys.svn_date) {
+				if (parse_date(val, buffer, sizeof(buffer)) > 0)
+					rev_ctx.timestamp = strtoul(buffer, NULL, 0);
+				else
+					fprintf(stderr, "Invalid timestamp: %s\n", val);
+			} else if (key == keys.svn_executable) {
+				node_ctx.type = REPO_MODE_EXE;
+			} else if (key == keys.svn_special) {
+				node_ctx.type = REPO_MODE_LNK;
+			}
+			key = ~0;
+			buffer_read_line();
+		}
+	}
+}
+
+static void handle_node(void)
+{
+	if (node_ctx.propLength != LENGTH_UNKNOWN && node_ctx.propLength) {
+		read_props();
+	}
+
+	if (node_ctx.srcRev) {
+		node_ctx.srcMode = repo_copy(node_ctx.srcRev, node_ctx.src, node_ctx.dst);
+	}
+
+	if (node_ctx.textLength != LENGTH_UNKNOWN &&
+		node_ctx.type != REPO_MODE_DIR) {
+		node_ctx.mark = next_blob_mark();
+	}
+
+	if (node_ctx.action == NODEACT_DELETE) {
+		repo_delete(node_ctx.dst);
+	} else if (node_ctx.action == NODEACT_CHANGE ||
+			   node_ctx.action == NODEACT_REPLACE) {
+		if (node_ctx.action == NODEACT_REPLACE &&
+			node_ctx.type == REPO_MODE_DIR) {
+			repo_replace(node_ctx.dst, node_ctx.mark);
+		} else if (node_ctx.propLength != LENGTH_UNKNOWN ) {
+			repo_modify(node_ctx.dst, node_ctx.type, node_ctx.mark);
+		} else if (node_ctx.textLength != LENGTH_UNKNOWN) {
+			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
+		}
+	} else if (node_ctx.action == NODEACT_ADD) {
+		if (node_ctx.srcRev &&
+			node_ctx.propLength == LENGTH_UNKNOWN &&
+			node_ctx.textLength != LENGTH_UNKNOWN) {
+			node_ctx.srcMode = repo_replace(node_ctx.dst, node_ctx.mark);
+		} else if ((node_ctx.type == REPO_MODE_DIR && !node_ctx.srcRev) ||
+				   node_ctx.textLength != LENGTH_UNKNOWN){
+			repo_add(node_ctx.dst, node_ctx.type, node_ctx.mark);
+		}
+	}
+
+	if (node_ctx.propLength == LENGTH_UNKNOWN && node_ctx.srcMode) {
+		node_ctx.type = node_ctx.srcMode;
+	}
+
+	if (node_ctx.mark) {
+		fast_export_blob(node_ctx.type, node_ctx.mark, node_ctx.textLength);
+	} else if (node_ctx.textLength != LENGTH_UNKNOWN) {
+		buffer_skip_bytes(node_ctx.textLength);
+	}
+}
+
+static void handle_revision(void)
+{
+	if (rev_ctx.revision)
+		repo_commit(rev_ctx.revision, rev_ctx.author, rev_ctx.log,
+		            dump_ctx.uuid, dump_ctx.url, rev_ctx.timestamp);
+}
+
+void svndump_read(char *url)
+{
+	char *val;
+	char *t;
+	uint32_t active_ctx = DUMP_CTX;
+	uint32_t len;
+	uint32_t key;
+
+	reset_dump_ctx(pool_intern(url));
+	while ((t = buffer_read_line())) {
+		val = strstr(t, ": ");
+		if (!val) continue;
+		*val++ = '\0';
+		*val++ = '\0';
+		key = pool_intern(t);
+
+		if (key == keys.uuid) {
+			dump_ctx.uuid = pool_intern(val);
+		} else if (key == keys.revision_number) {
+			if (active_ctx == NODE_CTX) handle_node();
+			if (active_ctx != DUMP_CTX) handle_revision();
+			active_ctx = REV_CTX;
+			reset_rev_ctx(atoi(val));
+		} else if (key == keys.node_path) {
+			if (active_ctx == NODE_CTX)
+				handle_node();
+			active_ctx = NODE_CTX;
+			reset_node_ctx(val);
+		} else if (key == keys.node_kind) {
+			if (!strcmp(val, "dir")) {
+				node_ctx.type = REPO_MODE_DIR;
+			} else if (!strcmp(val, "file")) {
+				node_ctx.type = REPO_MODE_BLB;
+			} else {
+				fprintf(stderr, "Unknown node-kind: %s\n", val);
+			}
+		} else if (key == keys.node_action) {
+			if (!strcmp(val, "delete")) {
+				node_ctx.action = NODEACT_DELETE;
+			} else if (!strcmp(val, "add")) {
+				node_ctx.action = NODEACT_ADD;
+			} else if (!strcmp(val, "change")) {
+				node_ctx.action = NODEACT_CHANGE;
+			} else if (!strcmp(val, "replace")) {
+				node_ctx.action = NODEACT_REPLACE;
+			} else {
+				fprintf(stderr, "Unknown node-action: %s\n", val);
+				node_ctx.action = NODEACT_UNKNOWN;
+			}
+		} else if (key == keys.node_copyfrom_path) {
+			pool_tok_seq(REPO_MAX_PATH_DEPTH, node_ctx.src, "/", val);
+		} else if (key == keys.node_copyfrom_rev) {
+			node_ctx.srcRev = atoi(val);
+		} else if (key == keys.text_content_length) {
+			node_ctx.textLength = atoi(val);
+		} else if (key == keys.prop_content_length) {
+			node_ctx.propLength = atoi(val);
+		} else if (key == keys.content_length) {
+			len = atoi(val);
+			buffer_read_line();
+			if (active_ctx == REV_CTX) {
+				read_props();
+			} else if (active_ctx == NODE_CTX) {
+				handle_node();
+				active_ctx = REV_CTX;
+			} else {
+				fprintf(stderr, "Unexpected content length header: %"PRIu32"\n", len);
+				buffer_skip_bytes(len);
+			}
+		}
+	}
+	if (active_ctx == NODE_CTX) handle_node();
+	if (active_ctx != DUMP_CTX) handle_revision();
+}
+
+static void svndump_init(void)
+{
+	log_init();
+	repo_init();
+	reset_dump_ctx(~0);
+	reset_rev_ctx(0);
+	reset_node_ctx(NULL);
+	init_keys();
+}
+
+void svndump_reset(void)
+{
+	log_reset();
+	buffer_reset();
+	repo_reset();
+	reset_dump_ctx(~0);
+	reset_rev_ctx(0);
+	reset_node_ctx(NULL);
+}
diff --git a/vcs-svn/svndump.h b/vcs-svn/svndump.h
new file mode 100644
index 0000000..e205f1f
--- /dev/null
+++ b/vcs-svn/svndump.h
@@ -0,0 +1,7 @@
+#ifndef SVNDUMP_H_
+#define SVNDUMP_H_
+
+void svndump_read(char *url);
+void svndump_reset(void);
+
+#endif
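
For readers of read_props() above: a property block is a run of
"K <len>" and "V <len>" lines, each followed by exactly <len> bytes of
payload and a newline, and it ends with "PROPS-END". The sketch below
looks up one key in such a block held in memory. It is an illustration
only: `find_prop` is a hypothetical name, and it assumes a
NUL-terminated text block, whereas the patch reads from line_buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the value stored under `key` into `out` (NUL-terminated).
 * Returns the value length, or -1 if the key is absent, the block is
 * malformed, or the value does not fit in `outsz` bytes.
 */
static long find_prop(const char *block, const char *key,
		      char *out, size_t outsz)
{
	const char *p = block;
	size_t keylen;
	int match = 0;

	assert(block != NULL && key != NULL && out != NULL);
	keylen = strlen(key);
	while (*p && strncmp(p, "PROPS-END", 9)) {
		char kind = p[0];
		char *end;
		unsigned long len;

		if ((kind != 'K' && kind != 'V') || p[1] != ' ')
			return -1;
		len = strtoul(p + 2, &end, 10);
		if (end == p + 2 || *end != '\n')
			return -1;
		p = end + 1;	/* first byte of the payload */
		if (strlen(p) < len + 1)
			return -1;	/* payload or its newline is missing */
		if (kind == 'K') {
			/* Lengths are byte counts, so compare with memcmp. */
			match = (len == keylen && !memcmp(p, key, len));
		} else if (match) {
			if (len >= outsz)
				return -1;
			memcpy(out, p, len);
			out[len] = '\0';
			return (long)len;
		}
		p += len + 1;	/* skip the payload and the newline after it */
	}
	return -1;
}
```

Reading by length rather than by line matters because a value such as a
log message may itself contain newlines.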
-- 
1.7.1


* Re: [PATCH 0/6] Merge David's SVN exporter into git.git
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
                   ` (5 preceding siblings ...)
  2010-06-04 13:26 ` [PATCH 6/6] Add SVN dump parser Ramkumar Ramachandra
@ 2010-06-04 13:29 ` Sverre Rabbelier
  2010-06-04 13:31 ` Michael J Gruber
  7 siblings, 0 replies; 11+ messages in thread
From: Sverre Rabbelier @ 2010-06-04 13:29 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Jonathan Nieder,
	Junio C Hamano

Heya,

On Fri, Jun 4, 2010 at 15:26, Ramkumar Ramachandra <artagnon@gmail.com> wrote:
> in my opinion the author of all six
> patches should be hand-edited to:
> David Barr <david.barr@gmail.com>

If that is the case you should include a "From" line at the top of your patch.

-- 
Cheers,

Sverre Rabbelier


* Re: [PATCH 0/6] Merge David's SVN exporter into git.git
  2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
                   ` (6 preceding siblings ...)
  2010-06-04 13:29 ` [PATCH 0/6] Merge David's SVN exporter into git.git Sverre Rabbelier
@ 2010-06-04 13:31 ` Michael J Gruber
  2010-06-04 13:41   ` Ramkumar Ramachandra
  7 siblings, 1 reply; 11+ messages in thread
From: Michael J Gruber @ 2010-06-04 13:31 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Jonathan Nieder,
	Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra venit, vidit, dixit 04.06.2010 15:26:
> Hi,
> 
> This is another attempt to merge David's SVN exporter into
> git.git. What changed since last time: David implemented incremental
> dump support, and I fixed certain things for the merge, as suggested
> by Jonathan Nieder. Preparing patches for the list eats up a lot of my
> time, and if this batch is more-or-less okay, I'd like it to be merged
> atleast into `pu`: we can squash in minor fixes later. The exporter is
> functionally complete and validated against ~940k revisions of the ASF
> repository.
> 
> You can see the complete revision history in my `git-merge` branch of
> my fork of svn-dump-fast-export [1].
> 
> The issue of authorship has already been discussed, but what exactly I
> should do isn't very clear to me- in my opinion the author of all six
> patches should be hand-edited to:
> David Barr <david.barr@gmail.com>

Well, in that case you should rewrite the patches to have David in the
author field (e.g. rebase -i and commit --amend --author=...). Git
format-patch/send-email will insert an additional From header in the
patch e-mail then. (Sorry if that was clear to you already. I see you
have a different author in 2/6.)

Michael


* Re: [PATCH 0/6] Merge David's SVN exporter into git.git
  2010-06-04 13:31 ` Michael J Gruber
@ 2010-06-04 13:41   ` Ramkumar Ramachandra
  2010-06-04 13:47     ` Michael J Gruber
  0 siblings, 1 reply; 11+ messages in thread
From: Ramkumar Ramachandra @ 2010-06-04 13:41 UTC (permalink / raw)
  To: Michael J Gruber
  Cc: Git Mailing List, David Michael Barr, Jonathan Nieder,
	Sverre Rabbelier, Junio C Hamano

Hi,

Sverre and Michael: I'm awfully sorry about this. I didn't realize
that I could change the author while committing. I've re-posted the
series now with the author corrected.

Thanks,
Ram


* Re: [PATCH 0/6] Merge David's SVN exporter into git.git
  2010-06-04 13:41   ` Ramkumar Ramachandra
@ 2010-06-04 13:47     ` Michael J Gruber
  0 siblings, 0 replies; 11+ messages in thread
From: Michael J Gruber @ 2010-06-04 13:47 UTC (permalink / raw)
  To: Ramkumar Ramachandra
  Cc: Git Mailing List, David Michael Barr, Jonathan Nieder,
	Sverre Rabbelier, Junio C Hamano

Ramkumar Ramachandra venit, vidit, dixit 04.06.2010 15:41:
> Hi,
> 
> Sverre and Michael: I'm awfully sorry about this. I didn't realize
> that I could change the author while committing. I've re-posted the
> series now with the author corrected.

Please don't feel sorry. We're here to help each other learn, and we
take turns in the learner's seat ;)

Cheers,
Michael


Thread overview: 11+ messages
2010-06-04 13:26 [PATCH 0/6] Merge David's SVN exporter into git.git Ramkumar Ramachandra
2010-06-04 13:26 ` [PATCH 1/6] Add memory pool library Ramkumar Ramachandra
2010-06-04 13:26 ` [PATCH 2/6] Add cpp macro implementation of treaps Ramkumar Ramachandra
2010-06-04 13:26 ` [PATCH 3/6] Add library for string-specific memory pool Ramkumar Ramachandra
2010-06-04 13:26 ` [PATCH 4/6] Add stream helper library Ramkumar Ramachandra
2010-06-04 13:26 ` [PATCH 5/6] Add infrastructure to write revisions in fast-export format Ramkumar Ramachandra
2010-06-04 13:26 ` [PATCH 6/6] Add SVN dump parser Ramkumar Ramachandra
2010-06-04 13:29 ` [PATCH 0/6] Merge David's SVN exporter into git.git Sverre Rabbelier
2010-06-04 13:31 ` Michael J Gruber
2010-06-04 13:41   ` Ramkumar Ramachandra
2010-06-04 13:47     ` Michael J Gruber
