* [RFC PATCH 0/3] Towards a Git-to-SVN bridge
From: Ramkumar Ramachandra @ 2011-01-15 6:51 UTC
To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier
Hi,
Over the last couple of days, I've been working on a parser that
converts a fast-import stream into a SVN dumpfile. So far, it's very
rough and works minimally for some common fast-import
commands. However, the major roadblock is persisting blobs: in this
implementation, they're persisted as an array of strbufs. This is very
memory-intensive and not scalable at all. With some valuable insight
from Jonathan on IRC, I've decided to try re-implementing fast-export
to eliminate blob marks and produce them inline instead [1].
Comments are much appreciated.
[1]: http://colabti.org/irclogger/irclogger_log/git-devel?date=2011-01-14
Ramkumar Ramachandra (3):
date: Expose the time_to_tm function
vcs-svn: Start working on the dumpfile producer
Build an svn-fi target in contrib/svn-fe
Makefile | 2 +-
cache.h | 1 +
contrib/svn-fe/Makefile | 23 ++++-
contrib/svn-fe/svn-fi.c | 16 +++
contrib/svn-fe/svn-fi.txt | 28 +++++
date.c | 2 +-
vcs-svn/dump_export.c | 73 +++++++++++
vcs-svn/svnload.c | 294 +++++++++++++++++++++++++++++++++++++++++++++
8 files changed, 435 insertions(+), 4 deletions(-)
create mode 100644 contrib/svn-fe/svn-fi.c
create mode 100644 contrib/svn-fe/svn-fi.txt
create mode 100644 vcs-svn/dump_export.c
create mode 100644 vcs-svn/svnload.c
--
1.7.4.rc1.7.g2cf08.dirty
* [PATCH 1/3] date: Expose the time_to_tm function
From: Ramkumar Ramachandra @ 2011-01-15 6:51 UTC
To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
cache.h | 1 +
date.c | 2 +-
2 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/cache.h b/cache.h
index d83d68c..95fea31 100644
--- a/cache.h
+++ b/cache.h
@@ -816,6 +816,7 @@ enum date_mode {
DATE_RAW
};
+struct tm *time_to_tm(unsigned long time, int tz);
const char *show_date(unsigned long time, int timezone, enum date_mode mode);
const char *show_date_relative(unsigned long time, int tz,
const struct timeval *now,
diff --git a/date.c b/date.c
index 00f9eb5..e601a50 100644
--- a/date.c
+++ b/date.c
@@ -54,7 +54,7 @@ static time_t gm_time_t(unsigned long time, int tz)
* thing, which means that tz -0100 is passed in as the integer -100,
* even though it means "sixty minutes off"
*/
-static struct tm *time_to_tm(unsigned long time, int tz)
+struct tm *time_to_tm(unsigned long time, int tz)
{
time_t t = gm_time_t(time, tz);
return gmtime(&t);
--
1.7.4.rc1.7.g2cf08.dirty
* [PATCH 2/3] vcs-svn: Start working on the dumpfile producer
From: Ramkumar Ramachandra @ 2011-01-15 6:51 UTC
To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier
Start off with some broad design sketches. Compile succeeds, but
parser is incorrect. Include a Makefile rule to build it into
vcs-svn/lib.a.
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
Makefile | 2 +-
vcs-svn/dump_export.c | 73 ++++++++++++
vcs-svn/svnload.c | 294 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 368 insertions(+), 1 deletions(-)
create mode 100644 vcs-svn/dump_export.c
create mode 100644 vcs-svn/svnload.c
diff --git a/Makefile b/Makefile
index 1345c38..40f6691 100644
--- a/Makefile
+++ b/Makefile
@@ -1834,7 +1834,7 @@ ifndef NO_CURL
endif
XDIFF_OBJS = xdiff/xdiffi.o xdiff/xprepare.o xdiff/xutils.o xdiff/xemit.o \
xdiff/xmerge.o xdiff/xpatience.o
-VCSSVN_OBJS = vcs-svn/line_buffer.o \
+VCSSVN_OBJS = vcs-svn/line_buffer.o vcs-svn/svnload.o vcs-svn/dump_export.o \
vcs-svn/repo_tree.o vcs-svn/fast_export.o vcs-svn/sliding_window.o \
vcs-svn/svndiff.o vcs-svn/svndump.o
VCSSVN_TEST_OBJS = test-obj-pool.o \
diff --git a/vcs-svn/dump_export.c b/vcs-svn/dump_export.c
new file mode 100644
index 0000000..04ede06
--- /dev/null
+++ b/vcs-svn/dump_export.c
@@ -0,0 +1,73 @@
+/*
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "git-compat-util.h"
+#include "strbuf.h"
+#include "line_buffer.h"
+#include "dump_export.h"
+
+void dump_export_begin_rev(int revision, const char *revprops,
+ int prop_len) {
+ printf("Revision-number: %d\n", revision);
+ printf("Prop-content-length: %d\n", prop_len);
+ printf("Content-length: %d\n\n", prop_len);
+ printf("%s\n", revprops);
+}
+
+void dump_export_node(const char *path, enum node_kind kind,
+ enum node_action action, unsigned long text_len,
+ unsigned long copyfrom_rev, const char *copyfrom_path) {
+ printf("Node-path: %s\n", path);
+ printf("Node-kind: ");
+ switch (kind) {
+ case NODE_KIND_NORMAL:
+ printf("file\n");
+ break;
+ case NODE_KIND_EXECUTABLE:
+ printf("file\n");
+ break;
+ case NODE_KIND_SYMLINK:
+ printf("file\n");
+ break;
+ case NODE_KIND_GITLINK:
+ printf("file\n");
+ break;
+ case NODE_KIND_SUBDIR:
+ die("Unsupported: subdirectory");
+ default:
+ break;
+ }
+ printf("Node-action: ");
+ switch (action) {
+ case NODE_ACTION_CHANGE:
+ printf("change\n");
+ break;
+ case NODE_ACTION_ADD:
+ printf("add\n");
+ break;
+ case NODE_ACTION_REPLACE:
+ printf("replace\n");
+ break;
+ case NODE_ACTION_DELETE:
+ printf("delete\n");
+ break;
+ default:
+ break;
+ }
+ if (copyfrom_rev != SVN_INVALID_REV) {
+ printf("Node-copyfrom-rev: %lu\n", copyfrom_rev);
+ printf("Node-copyfrom-path: %s\n", copyfrom_path);
+ }
+ printf("Prop-delta: false\n");
+ printf("Prop-content-length: 10\n"); /* Constant 10 for "PROPS-END" */
+ printf("Text-delta: false\n");
+ printf("Text-content-length: %lu\n", text_len);
+ printf("Content-length: %lu\n\n", text_len + 10);
+ printf("PROPS-END\n\n");
+}
+
+void dump_export_text(struct line_buffer *data, off_t len) {
+ buffer_copy_bytes(data, len);
+}
diff --git a/vcs-svn/svnload.c b/vcs-svn/svnload.c
new file mode 100644
index 0000000..7043ae7
--- /dev/null
+++ b/vcs-svn/svnload.c
@@ -0,0 +1,294 @@
+/*
+ * Produce a dumpfile v3 from a fast-import stream.
+ * Load the dump into the SVN repository with:
+ * svnrdump load <URL> <dumpfile
+ *
+ * Licensed under a two-clause BSD-style license.
+ * See LICENSE for details.
+ */
+
+#include "cache.h"
+#include "git-compat-util.h"
+#include "line_buffer.h"
+#include "dump_export.h"
+#include "strbuf.h"
+
+#define SVN_DATE_FORMAT "%Y-%m-%dT%H:%M:%S.000000Z"
+#define SVN_DATE_LEN 28
+#define LENGTH_UNKNOWN (~0)
+
+static struct line_buffer input = LINE_BUFFER_INIT;
+static struct strbuf blobs[100];
+
+static struct {
+ unsigned long prop_len, text_len, copyfrom_rev, mark;
+ int text_delta, prop_delta; /* Boolean */
+ enum node_action action;
+ enum node_kind kind;
+ struct strbuf copyfrom_path, path;
+} node_ctx;
+
+static struct {
+ int rev, text_len;
+ struct strbuf props, log;
+ struct strbuf svn_author, author, committer;
+ struct strbuf author_date, committer_date;
+ struct strbuf author_email, committer_email;
+} rev_ctx;
+
+static enum {
+ UNKNOWN_CTX,
+ COMMIT_CTX,
+ BLOB_CTX
+} active_ctx;
+
+static void reset_rev_ctx(int revision)
+{
+ rev_ctx.rev = revision;
+ strbuf_reset(&rev_ctx.props);
+ strbuf_reset(&rev_ctx.log);
+ strbuf_reset(&rev_ctx.svn_author);
+ strbuf_reset(&rev_ctx.author);
+ strbuf_reset(&rev_ctx.committer);
+ strbuf_reset(&rev_ctx.author_date);
+ strbuf_reset(&rev_ctx.committer_date);
+ strbuf_reset(&rev_ctx.author_email);
+ strbuf_reset(&rev_ctx.committer_email);
+}
+
+static void reset_node_ctx(void)
+{
+ node_ctx.prop_len = LENGTH_UNKNOWN;
+ node_ctx.text_len = LENGTH_UNKNOWN;
+ node_ctx.mark = 0;
+ node_ctx.copyfrom_rev = 0;
+ node_ctx.text_delta = -1;
+ node_ctx.prop_delta = -1;
+ strbuf_reset(&node_ctx.copyfrom_path);
+ strbuf_reset(&node_ctx.path);
+}
+
+static void populate_props(struct strbuf *props, const char *author,
+ const char *log, const char *date) {
+ strbuf_reset(props);
+ strbuf_addf(props, "K\nsvn:author\nV\n%s\n", author);
+ strbuf_addf(props, "K\nsvn:log\nV\n%s", log);
+ strbuf_addf(props, "K\nsvn:date\nV\n%s\n", date);
+ strbuf_add(props, "PROPS-END\n", 10);
+}
+
+static void parse_author_line(char *val, struct strbuf *name,
+ struct strbuf *email, struct strbuf *date) {
+ char *t, *tz_off;
+ char time_buf[SVN_DATE_LEN];
+ const struct tm *tm_time;
+
+ /* Simon Hausmann <shausman@trolltech.com> 1170199019 +0100 */
+ strbuf_reset(name);
+ strbuf_reset(email);
+ strbuf_reset(date);
+ tz_off = strrchr(val, ' ');
+ *tz_off++ = '\0';
+ t = strrchr(val, ' ');
+ *(t - 1) = '\0'; /* Ignore '>' from email */
+ t ++;
+ tm_time = time_to_tm(strtoul(t, NULL, 10), atoi(tz_off));
+ strftime(time_buf, SVN_DATE_LEN, SVN_DATE_FORMAT, tm_time);
+ strbuf_add(date, time_buf, SVN_DATE_LEN);
+ t = strchr(val, '<');
+ *(t - 1) = '\0'; /* Ignore ' <' from email */
+ t ++;
+ strbuf_add(email, t, strlen(t));
+ strbuf_add(name, val, strlen(val));
+}
+
+void svnload_read(void) {
+ char *t, *val;
+ int mode_incr;
+ struct strbuf *to_dump;
+
+ while ((t = buffer_read_line(&input))) {
+ val = strchr(t, ' ');
+ if (!val) {
+ if (!memcmp(t, "blob", 4))
+ active_ctx = BLOB_CTX;
+ else if (!memcmp(t, "deleteall", 9))
+ ;
+ continue;
+ }
+ *val++ = '\0';
+
+ /* strlen(key) */
+ switch (val - t - 1) {
+ case 1:
+ if (!memcmp(t, "D", 1)) {
+ node_ctx.action = NODE_ACTION_DELETE;
+ }
+ else if (!memcmp(t, "C", 1)) {
+ node_ctx.action = NODE_ACTION_ADD;
+ }
+ else if (!memcmp(t, "R", 1)) {
+ node_ctx.action = NODE_ACTION_REPLACE;
+ }
+ else if (!memcmp(t, "M", 1)) {
+ node_ctx.action = NODE_ACTION_CHANGE;
+ mode_incr = 7;
+ if (!memcmp(val, "100644", 6))
+ node_ctx.kind = NODE_KIND_NORMAL;
+ else if (!memcmp(val, "100755", 6))
+ node_ctx.kind = NODE_KIND_EXECUTABLE;
+ else if (!memcmp(val, "120000", 6))
+ node_ctx.kind = NODE_KIND_SYMLINK;
+ else if (!memcmp(val, "160000", 6))
+ node_ctx.kind = NODE_KIND_GITLINK;
+ else if (!memcmp(val, "040000", 6))
+ node_ctx.kind = NODE_KIND_SUBDIR;
+ else {
+ if (!memcmp(val, "755", 3))
+ node_ctx.kind = NODE_KIND_EXECUTABLE;
+ else if(!memcmp(val, "644", 3))
+ node_ctx.kind = NODE_KIND_NORMAL;
+ else
+ die("Unrecognized mode: %s", val);
+ mode_incr = 4;
+ }
+ val += mode_incr;
+ t = strchr(val, ' ');
+ *t++ = '\0';
+ strbuf_reset(&node_ctx.path);
+ strbuf_add(&node_ctx.path, t, strlen(t));
+ if (!memcmp(val + 1, "inline", 6))
+ die("Unsupported dataref: inline");
+ else if (*val == ':')
+ to_dump = &blobs[strtoul(val + 1, NULL, 10)];
+ else
+ die("Unsupported dataref: sha1");
+ dump_export_node(node_ctx.path.buf, node_ctx.kind,
+ node_ctx.action, to_dump->len,
+ 0, NULL);
+ printf("%s", to_dump->buf);
+ }
+ break;
+ case 3:
+ if (!memcmp(t, "tag", 3))
+ continue;
+ break;
+ case 4:
+ if (!memcmp(t, "mark", 4))
+ switch(active_ctx) {
+ case COMMIT_CTX:
+ /* What do we do with commit marks? */
+ continue;
+ case BLOB_CTX:
+ node_ctx.mark = strtoul(val + 1, NULL, 10);
+ break;
+ default:
+ break;
+ }
+ else if (!memcmp(t, "from", 4))
+ continue;
+ else if (!memcmp(t, "data", 4)) {
+ switch (active_ctx) {
+ case COMMIT_CTX:
+ strbuf_reset(&rev_ctx.log);
+ buffer_read_binary(&input,
+ &rev_ctx.log,
+ strtoul(val, NULL, 10));
+ populate_props(&rev_ctx.props,
+ rev_ctx.svn_author.buf,
+ rev_ctx.log.buf,
+ rev_ctx.author_date.buf);
+ dump_export_begin_rev(rev_ctx.rev,
+ rev_ctx.props.buf,
+ rev_ctx.props.len);
+ break;
+ case BLOB_CTX:
+ node_ctx.text_len = strtoul(val, NULL, 10);
+ buffer_read_binary(&input,
+ &blobs[node_ctx.mark],
+ node_ctx.text_len);
+ break;
+ default:
+ break;
+ }
+ }
+ break;
+ case 5:
+ if (!memcmp(t, "reset", 5))
+ continue;
+ if (!memcmp(t, "merge", 5))
+ continue;
+ break;
+ case 6:
+ if (!memcmp(t, "author", 6)) {
+ parse_author_line(val, &rev_ctx.author,
+ &rev_ctx.author_email,
+ &rev_ctx.author_date);
+ /* Build svn_author */
+ t = strchr(rev_ctx.author_email.buf, '@');
+ strbuf_reset(&rev_ctx.svn_author);
+ strbuf_add(&rev_ctx.svn_author,
+ rev_ctx.author_email.buf,
+ t - rev_ctx.author_email.buf);
+
+ }
+ else if (!memcmp(t, "commit", 6)) {
+ rev_ctx.rev ++;
+ active_ctx = COMMIT_CTX;
+ }
+ break;
+ case 9:
+ if (!memcmp(t, "committer", 9))
+ parse_author_line(val, &rev_ctx.committer,
+ &rev_ctx.committer_email,
+ &rev_ctx.committer_date);
+ break;
+ default:
+ break;
+ }
+ }
+}
+
+int svnload_init(const char *filename)
+{
+ int i;
+ if (buffer_init(&input, filename))
+ return error("cannot open %s: %s", filename, strerror(errno));
+ active_ctx = UNKNOWN_CTX;
+ strbuf_init(&rev_ctx.props, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&rev_ctx.log, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&rev_ctx.author, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&rev_ctx.committer, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&rev_ctx.author_date, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&rev_ctx.committer_date, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&rev_ctx.author_email, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&rev_ctx.committer_email, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&node_ctx.path, MAX_GITSVN_LINE_LEN);
+ strbuf_init(&node_ctx.copyfrom_path, MAX_GITSVN_LINE_LEN);
+ for (i = 0; i < 100; i ++)
+ strbuf_init(&blobs[i], 10000);
+ return 0;
+}
+
+void svnload_deinit(void)
+{
+ int i;
+ reset_rev_ctx(0);
+ reset_node_ctx();
+ strbuf_release(&rev_ctx.props);
+ strbuf_release(&rev_ctx.log);
+ strbuf_release(&rev_ctx.author);
+ strbuf_release(&rev_ctx.committer);
+ strbuf_release(&rev_ctx.author_date);
+ strbuf_release(&rev_ctx.committer_date);
+ strbuf_release(&rev_ctx.author_email);
+ strbuf_release(&rev_ctx.committer_email);
+ strbuf_release(&node_ctx.path);
+ strbuf_release(&node_ctx.copyfrom_path);
+ for (i = 0; i < 100; i ++)
+ strbuf_release(&blobs[i]);
+ if (buffer_deinit(&input))
+ fprintf(stderr, "Input error\n");
+ if (ferror(stdout))
+ fprintf(stderr, "Output error\n");
+}
--
1.7.4.rc1.7.g2cf08.dirty
* [PATCH 3/3] Build an svn-fi target in contrib/svn-fe
From: Ramkumar Ramachandra @ 2011-01-15 6:51 UTC
To: Git List; +Cc: Jonathan Nieder, David Barr, Sverre Rabbelier
Build an svn-fi target for testing the dumpfile producer in vcs-svn/.
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
contrib/svn-fe/Makefile | 23 +++++++++++++++++++++--
contrib/svn-fe/svn-fi.c | 16 ++++++++++++++++
contrib/svn-fe/svn-fi.txt | 28 ++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 2 deletions(-)
create mode 100644 contrib/svn-fe/svn-fi.c
create mode 100644 contrib/svn-fe/svn-fi.txt
diff --git a/contrib/svn-fe/Makefile b/contrib/svn-fe/Makefile
index 360d8da..555a8ff 100644
--- a/contrib/svn-fe/Makefile
+++ b/contrib/svn-fe/Makefile
@@ -37,7 +37,7 @@ svn-fe$X: svn-fe.o $(VCSSVN_LIB) $(GIT_LIB)
$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ svn-fe.o \
$(ALL_LDFLAGS) $(LIBS)
-svn-fe.o: svn-fe.c ../../vcs-svn/svndump.h
+svn-fe.o: svn-fe.c ../../vcs-svn/svnload.h
$(QUIET_CC)$(CC) -I../../vcs-svn -o $*.o -c $(ALL_CFLAGS) $<
svn-fe.html: svn-fe.txt
@@ -51,6 +51,24 @@ svn-fe.1: svn-fe.txt
../contrib/svn-fe/$@
$(MV) ../../Documentation/svn-fe.1 .
+svn-fi$X: svn-fi.o $(VCSSVN_LIB) $(GIT_LIB)
+ $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ svn-fi.o \
+ $(ALL_LDFLAGS) $(LIBS)
+
+svn-fi.o: svn-fi.c ../../vcs-svn/svnload.h
+ $(QUIET_CC)$(CC) -I../../vcs-svn -o $*.o -c $(ALL_CFLAGS) $<
+
+svn-fi.html: svn-fi.txt
+ $(QUIET_SUBDIR0)../../Documentation $(QUIET_SUBDIR1) \
+ MAN_TXT=../contrib/svn-fe/svn-fi.txt \
+ ../contrib/svn-fe/$@
+
+svn-fi.1: svn-fi.txt
+ $(QUIET_SUBDIR0)../../Documentation $(QUIET_SUBDIR1) \
+ MAN_TXT=../contrib/svn-fe/svn-fi.txt \
+ ../contrib/svn-fe/$@
+ $(MV) ../../Documentation/svn-fi.1 .
+
../../vcs-svn/lib.a: FORCE
$(QUIET_SUBDIR0)../.. $(QUIET_SUBDIR1) vcs-svn/lib.a
@@ -58,6 +76,7 @@ svn-fe.1: svn-fe.txt
$(QUIET_SUBDIR0)../.. $(QUIET_SUBDIR1) libgit.a
clean:
- $(RM) svn-fe$X svn-fe.o svn-fe.html svn-fe.xml svn-fe.1
+ $(RM) svn-fe$X svn-fe.o svn-fe.html svn-fe.xml svn-fe.1 \
+ svn-fi$X svn-fi.o svn-fi.html svn-fi.xml svn-fi.1
.PHONY: all clean FORCE
diff --git a/contrib/svn-fe/svn-fi.c b/contrib/svn-fe/svn-fi.c
new file mode 100644
index 0000000..81347b0
--- /dev/null
+++ b/contrib/svn-fe/svn-fi.c
@@ -0,0 +1,16 @@
+/*
+ * This file is in the public domain.
+ * You may freely use, modify, distribute, and relicense it.
+ */
+
+#include <stdlib.h>
+#include "svnload.h"
+
+int main(int argc, char **argv)
+{
+ if (svnload_init(NULL))
+ return 1;
+ svnload_read();
+ svnload_deinit();
+ return 0;
+}
diff --git a/contrib/svn-fe/svn-fi.txt b/contrib/svn-fe/svn-fi.txt
new file mode 100644
index 0000000..996a175
--- /dev/null
+++ b/contrib/svn-fe/svn-fi.txt
@@ -0,0 +1,28 @@
+svn-fi(1)
+=========
+
+NAME
+----
+svn-fi - convert fast-import stream to an SVN "dumpfile"
+
+SYNOPSIS
+--------
+[verse]
+svn-fi
+
+DESCRIPTION
+-----------
+
+Converts a git-fast-import(1) stream into a Subversion dumpfile.
+
+INPUT FORMAT
+-------------
+The fast-import format is documented by the git-fast-import(1)
+manual page.
+
+OUTPUT FORMAT
+------------
+Subversion's repository dump format is documented in full in
+`notes/dump-load-format.txt` from the Subversion source tree.
+Files in this format can be generated using the 'svnadmin dump' or
+'svk admin dump' command.
--
1.7.4.rc1.7.g2cf08.dirty
* Re: [RFC PATCH 0/3] Towards a Git-to-SVN bridge
From: Jonathan Nieder @ 2011-01-15 7:22 UTC
To: Ramkumar Ramachandra; +Cc: Git List, David Barr, Sverre Rabbelier
Hi Ram,
Ramkumar Ramachandra wrote:
> Over the last couple of days, I've been working on a parser that
> converts a fast-import stream into a SVN dumpfile. So far, it's very
> rough and works minimally for some common fast-import
> commands.
Some early questions:
- what are the design goals? Is this meant to be super fast?
Robust? Simple? Why should I be excited about it?[1]
- what subset of fast-import commands is supported? Is it well
enough defined to make a manpage?
- does this produce v2 or v3 dumpfiles?
- why would I use this instead of git2svn? Does git2svn do anything
this will not eventually be able to do? (Not a trick question ---
I don't have enough experience with git2svn to tell its strengths
and weaknesses.)
> I've decided to try re-implementing fast-export
> to eliminate blob marks
Hopefully "re-implement" means "patch" here. :)
I can comment on the code but it's probably better if I have a sense
of the design first (in any event, thanks for sending it).
Regards,
Jonathan
[1] I found the original svn-fe design interesting because
(1) it reused code from an existing svndump parser, at least in
spirit,
(2) the repo_tree data structure was well fitted to the design
constraints,
(3) the line_buffer input abstraction was oddly satisfying, even
though it does not buy anything obvious out of the box over
direct use of strbuf and stdio;
(4) speed; and, most importantly
(5) the command-line interface was easy to debug, very flexible,
and dead simple.
I find the current svn-fe satisfying in a different way --- a sort of
"line by line" translation between dump formats is becoming possible.
* Re: [PATCH 2/3] vcs-svn: Start working on the dumpfile producer
From: Peter Baumann @ 2011-01-15 7:39 UTC
To: Ramkumar Ramachandra
Cc: Git List, Jonathan Nieder, David Barr, Sverre Rabbelier
On Sat, Jan 15, 2011 at 12:21:11PM +0530, Ramkumar Ramachandra wrote:
> Start off with some broad design sketches. Compile succeeds, but
> parser is incorrect. Include a Makefile rule to build it into
> vcs-svn/lib.a.
>
> Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
> ---
> Makefile | 2 +-
> vcs-svn/dump_export.c | 73 ++++++++++++
> vcs-svn/svnload.c | 294 +++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 368 insertions(+), 1 deletions(-)
> create mode 100644 vcs-svn/dump_export.c
> create mode 100644 vcs-svn/svnload.c
>
...
> diff --git a/vcs-svn/svnload.c b/vcs-svn/svnload.c
> new file mode 100644
> index 0000000..7043ae7
> --- /dev/null
> +++ b/vcs-svn/svnload.c
> @@ -0,0 +1,294 @@
> +/*
> + * Produce a dumpfile v3 from a fast-import stream.
> + * Load the dump into the SVN repository with:
> + * svnrdump load <URL> <dumpfile
> + *
> + * Licensed under a two-clause BSD-style license.
> + * See LICENSE for details.
> + */
> +
> +#include "cache.h"
> +#include "git-compat-util.h"
> +#include "line_buffer.h"
> +#include "dump_export.h"
> +#include "strbuf.h"
> +
> +#define SVN_DATE_FORMAT "%Y-%m-%dT%H:%M:%S.000000Z"
> +#define SVN_DATE_LEN 28
> +#define LENGTH_UNKNOWN (~0)
> +
> +static struct line_buffer input = LINE_BUFFER_INIT;
> +static struct strbuf blobs[100];
> +
> +static struct {
> + unsigned long prop_len, text_len, copyfrom_rev, mark;
> + int text_delta, prop_delta; /* Boolean */
> + enum node_action action;
> + enum node_kind kind;
> + struct strbuf copyfrom_path, path;
> +} node_ctx;
> +
> +static struct {
> + int rev, text_len;
> + struct strbuf props, log;
> + struct strbuf svn_author, author, committer;
> + struct strbuf author_date, committer_date;
> + struct strbuf author_email, committer_email;
> +} rev_ctx;
> +
> +static enum {
> + UNKNOWN_CTX,
> + COMMIT_CTX,
> + BLOB_CTX
> +} active_ctx;
> +
> +static void reset_rev_ctx(int revision)
> +{
> + rev_ctx.rev = revision;
> + strbuf_reset(&rev_ctx.props);
> + strbuf_reset(&rev_ctx.log);
> + strbuf_reset(&rev_ctx.svn_author);
> + strbuf_reset(&rev_ctx.author);
> + strbuf_reset(&rev_ctx.committer);
> + strbuf_reset(&rev_ctx.author_date);
> + strbuf_reset(&rev_ctx.committer_date);
> + strbuf_reset(&rev_ctx.author_email);
> + strbuf_reset(&rev_ctx.committer_email);
> +}
> +
> +static void reset_node_ctx(void)
> +{
> + node_ctx.prop_len = LENGTH_UNKNOWN;
> + node_ctx.text_len = LENGTH_UNKNOWN;
> + node_ctx.mark = 0;
> + node_ctx.copyfrom_rev = 0;
> + node_ctx.text_delta = -1;
> + node_ctx.prop_delta = -1;
> + strbuf_reset(&node_ctx.copyfrom_path);
> + strbuf_reset(&node_ctx.path);
> +}
> +
> +static void populate_props(struct strbuf *props, const char *author,
> + const char *log, const char *date) {
> + strbuf_reset(props);
> + strbuf_addf(props, "K\nsvn:author\nV\n%s\n", author);
> + strbuf_addf(props, "K\nsvn:log\nV\n%s", log);
> + strbuf_addf(props, "K\nsvn:date\nV\n%s\n", date);
> + strbuf_add(props, "PROPS-END\n", 10);
> +}
> +
> +static void parse_author_line(char *val, struct strbuf *name,
> + struct strbuf *email, struct strbuf *date) {
> + char *t, *tz_off;
> + char time_buf[SVN_DATE_LEN];
> + const struct tm *tm_time;
> +
> + /* Simon Hausmann <shausman@trolltech.com> 1170199019 +0100 */
> + strbuf_reset(name);
> + strbuf_reset(email);
> + strbuf_reset(date);
> + tz_off = strrchr(val, ' ');
> + *tz_off++ = '\0';
> + t = strrchr(val, ' ');
> + *(t - 1) = '\0'; /* Ignore '>' from email */
> + t ++;
> + tm_time = time_to_tm(strtoul(t, NULL, 10), atoi(tz_off));
> + strftime(time_buf, SVN_DATE_LEN, SVN_DATE_FORMAT, tm_time);
> + strbuf_add(date, time_buf, SVN_DATE_LEN);
> + t = strchr(val, '<');
> + *(t - 1) = '\0'; /* Ignore ' <' from email */
> + t ++;
> + strbuf_add(email, t, strlen(t));
> + strbuf_add(name, val, strlen(val));
> +}
> +
> +void svnload_read(void) {
> + char *t, *val;
> + int mode_incr;
> + struct strbuf *to_dump;
> +
> + while ((t = buffer_read_line(&input))) {
> + val = strchr(t, ' ');
> + if (!val) {
> + if (!memcmp(t, "blob", 4))
> + active_ctx = BLOB_CTX;
> + else if (!memcmp(t, "deleteall", 9))
> + ;
> + continue;
I have no idea what the input you are reading might look like, but
seeing those two memcmp comparisons above makes me wonder whether 't' might
ever be shorter than 4 bytes (or 9 for the else branch), which would
obviously lead to a segfault. The code below also has memcmp calls that
might step outside the buffer.
> + }
> + *val++ = '\0';
> +
> + /* strlen(key) */
> + switch (val - t - 1) {
> + case 1:
> + if (!memcmp(t, "D", 1)) {
> + node_ctx.action = NODE_ACTION_DELETE;
> + }
> + else if (!memcmp(t, "C", 1)) {
> + node_ctx.action = NODE_ACTION_ADD;
> + }
> + else if (!memcmp(t, "R", 1)) {
> + node_ctx.action = NODE_ACTION_REPLACE;
> + }
> + else if (!memcmp(t, "M", 1)) {
> + node_ctx.action = NODE_ACTION_CHANGE;
> + mode_incr = 7;
> + if (!memcmp(val, "100644", 6))
> + node_ctx.kind = NODE_KIND_NORMAL;
> + else if (!memcmp(val, "100755", 6))
> + node_ctx.kind = NODE_KIND_EXECUTABLE;
> + else if (!memcmp(val, "120000", 6))
> + node_ctx.kind = NODE_KIND_SYMLINK;
> + else if (!memcmp(val, "160000", 6))
> + node_ctx.kind = NODE_KIND_GITLINK;
> + else if (!memcmp(val, "040000", 6))
> + node_ctx.kind = NODE_KIND_SUBDIR;
> + else {
> + if (!memcmp(val, "755", 3))
> + node_ctx.kind = NODE_KIND_EXECUTABLE;
> + else if(!memcmp(val, "644", 3))
> + node_ctx.kind = NODE_KIND_NORMAL;
> + else
> + die("Unrecognized mode: %s", val);
> + mode_incr = 4;
> + }
> + val += mode_incr;
> + t = strchr(val, ' ');
> + *t++ = '\0';
> + strbuf_reset(&node_ctx.path);
> + strbuf_add(&node_ctx.path, t, strlen(t));
> + if (!memcmp(val + 1, "inline", 6))
> + die("Unsupported dataref: inline");
> + else if (*val == ':')
> + to_dump = &blobs[strtoul(val + 1, NULL, 10)];
> + else
> + die("Unsupported dataref: sha1");
> + dump_export_node(node_ctx.path.buf, node_ctx.kind,
> + node_ctx.action, to_dump->len,
> + 0, NULL);
> + printf("%s", to_dump->buf);
> + }
> + break;
> + case 3:
> + if (!memcmp(t, "tag", 3))
> + continue;
> + break;
> + case 4:
> + if (!memcmp(t, "mark", 4))
> + switch(active_ctx) {
> + case COMMIT_CTX:
> + /* What do we do with commit marks? */
> + continue;
> + case BLOB_CTX:
> + node_ctx.mark = strtoul(val + 1, NULL, 10);
> + break;
> + default:
> + break;
> + }
> + else if (!memcmp(t, "from", 4))
> + continue;
> + else if (!memcmp(t, "data", 4)) {
> + switch (active_ctx) {
> + case COMMIT_CTX:
> + strbuf_reset(&rev_ctx.log);
> + buffer_read_binary(&input,
> + &rev_ctx.log,
> + strtoul(val, NULL, 10));
> + populate_props(&rev_ctx.props,
> + rev_ctx.svn_author.buf,
> + rev_ctx.log.buf,
> + rev_ctx.author_date.buf);
> + dump_export_begin_rev(rev_ctx.rev,
> + rev_ctx.props.buf,
> + rev_ctx.props.len);
> + break;
> + case BLOB_CTX:
> + node_ctx.text_len = strtoul(val, NULL, 10);
> + buffer_read_binary(&input,
> + &blobs[node_ctx.mark],
> + node_ctx.text_len);
> + break;
> + default:
> + break;
> + }
> + }
> + break;
> + case 5:
> + if (!memcmp(t, "reset", 5))
> + continue;
> + if (!memcmp(t, "merge", 5))
> + continue;
> + break;
> + case 6:
> + if (!memcmp(t, "author", 6)) {
> + parse_author_line(val, &rev_ctx.author,
> + &rev_ctx.author_email,
> + &rev_ctx.author_date);
> + /* Build svn_author */
> + t = strchr(rev_ctx.author_email.buf, '@');
> + strbuf_reset(&rev_ctx.svn_author);
> + strbuf_add(&rev_ctx.svn_author,
> + rev_ctx.author_email.buf,
> + t - rev_ctx.author_email.buf);
> +
> + }
> + else if (!memcmp(t, "commit", 6)) {
> + rev_ctx.rev ++;
> + active_ctx = COMMIT_CTX;
> + }
> + break;
> + case 9:
> + if (!memcmp(t, "committer", 9))
> + parse_author_line(val, &rev_ctx.committer,
> + &rev_ctx.committer_email,
> + &rev_ctx.committer_date);
> + break;
> + default:
> + break;
> + }
> + }
> +}
> +
> +int svnload_init(const char *filename)
> +{
> + int i;
> + if (buffer_init(&input, filename))
> + return error("cannot open %s: %s", filename, strerror(errno));
> + active_ctx = UNKNOWN_CTX;
> + strbuf_init(&rev_ctx.props, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&rev_ctx.log, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&rev_ctx.author, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&rev_ctx.committer, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&rev_ctx.author_date, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&rev_ctx.committer_date, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&rev_ctx.author_email, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&rev_ctx.committer_email, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&node_ctx.path, MAX_GITSVN_LINE_LEN);
> + strbuf_init(&node_ctx.copyfrom_path, MAX_GITSVN_LINE_LEN);
> + for (i = 0; i < 100; i ++)
> + strbuf_init(&blobs[i], 10000);
> + return 0;
> +}
> +
> +void svnload_deinit(void)
> +{
> + int i;
> + reset_rev_ctx(0);
> + reset_node_ctx();
> + strbuf_release(&rev_ctx.props);
> + strbuf_release(&rev_ctx.log);
> + strbuf_release(&rev_ctx.author);
> + strbuf_release(&rev_ctx.committer);
> + strbuf_release(&rev_ctx.author_date);
> + strbuf_release(&rev_ctx.committer_date);
> + strbuf_release(&rev_ctx.author_email);
> + strbuf_release(&rev_ctx.committer_email);
> + strbuf_release(&node_ctx.path);
> + strbuf_release(&node_ctx.copyfrom_path);
> + for (i = 0; i < 100; i ++)
> + strbuf_release(&blobs[i]);
> + if (buffer_deinit(&input))
> + fprintf(stderr, "Input error\n");
> + if (ferror(stdout))
> + fprintf(stderr, "Output error\n");
> +}
> --
> 1.7.4.rc1.7.g2cf08.dirty
* Re: [RFC PATCH 0/3] Towards a Git-to-SVN bridge
From: Ramkumar Ramachandra @ 2011-01-15 7:43 UTC
To: Jonathan Nieder; +Cc: Git List, David Barr, Sverre Rabbelier
Hi Jonathan,
Jonathan Nieder writes:
> Ramkumar Ramachandra wrote:
>
> > Over the last couple of days, I've been working on a parser that
> > converts a fast-import stream into a SVN dumpfile. So far, it's very
> > rough and works minimally for some common fast-import
> > commands.
>
> Some early questions:
Thanks for raising these questions. People interested in the project
should find this useful.
> - what are the design goals? Is this meant to be super fast?
> Robust? Simple? Why should I be excited about it?[1]
I want it to be a lot like the current svn-fe: as you can see, I've
re-used many parsing ideas from it. It has to be at least as fast as
svnrdump, because I don't want it to become a bottleneck in the remote
helper pipeline. It has to be simple because it will give rise to
other simple remote helpers: all the complexity has to be offloaded
onto lower layers like fast-import/fast-export, and not onto the
developer of the remote helper.
> - what subset of fast-import commands is supported? Is it well
> enough defined to make a manpage?
Currently, it supports just "commit", "blob", "author", "committer",
and a "mark" that appears after a blob. It should support more commands
soon enough; this implementation is just a proof of concept. Also,
instead of giving it the ability to parse /any/ valid fast-import
stream, I want to focus on parsing the stream produced by git
fast-export. That should explain why I'm primarily trying to patch git
fast-export.
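To illustrate the "author" and "committer" lines mentioned above, here is a
minimal sketch of splitting one such line into a name, an email, and an
SVN-style date. It is not the code from the patch: `struct ident` and
`parse_ident` are hypothetical names, the fixed buffer sizes are arbitrary,
and the sketch assumes the timestamp is seconds since the epoch in UTC, so it
ignores the timezone offset and formats the date with plain gmtime().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical holder for the pieces of an ident line (not from the patch). */
struct ident {
	char name[128];
	char email[128];
	char svn_date[28];	/* "YYYY-MM-DDTHH:MM:SS.000000Z" plus NUL */
};

/*
 * Parse "Name <email> <epoch> <tz>" without modifying the input.
 * Returns 0 on success and -1 on malformed or oversized input.
 */
int parse_ident(const char *line, struct ident *out)
{
	const char *lt, *gt;
	char *end;
	unsigned long when;
	size_t name_len, email_len;
	time_t t;
	struct tm *tm;

	assert(line && out);
	lt = strchr(line, '<');
	gt = lt ? strchr(lt, '>') : NULL;
	if (!lt || !gt || gt[1] != ' ')
		return -1;

	/* The name is everything before " <"; an empty name is tolerated. */
	name_len = lt - line;
	if (name_len && line[name_len - 1] == ' ')
		name_len--;
	email_len = gt - lt - 1;
	if (name_len >= sizeof(out->name) || email_len >= sizeof(out->email))
		return -1;
	memcpy(out->name, line, name_len);
	out->name[name_len] = '\0';
	memcpy(out->email, lt + 1, email_len);
	out->email[email_len] = '\0';

	/* The timestamp must be followed by a space and the tz offset. */
	when = strtoul(gt + 2, &end, 10);
	if (end == gt + 2 || *end != ' ')
		return -1;
	t = (time_t)when;
	tm = gmtime(&t);
	if (!tm)
		return -1;
	/* strftime returns 0 if the result does not fit in the buffer. */
	if (!strftime(out->svn_date, sizeof(out->svn_date),
		      "%Y-%m-%dT%H:%M:%S.000000Z", tm))
		return -1;
	return 0;
}
```

Working on a const input and copying into bounded buffers avoids the in-place
pointer arithmetic around '<' and '>', at the cost of fixed size limits.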
> - does this produce v2 or v3 dumpfiles?
This is one issue I haven't fully thought through yet. I'm currently
thinking of generating a non-deltified dumpfile v3, something that
svnrdump will accept. Generating deltas might be unnecessary
overhead, but as you pointed out yesterday, that clearly needs more
thought.
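For reference, the property block of a revision record in the dump format is
length-prefixed: each pair is written as "K <keylen>", the key, "V <vallen>",
the value, and the block ends with "PROPS-END". The following is a minimal
sketch of that encoding, not the code in this series. `append_prop` and
`format_revprops` are hypothetical names, and the sketch assumes values are
NUL-terminated text that fits in a caller-supplied buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append one "K <len>\n<key>\nV <len>\n<value>\n" pair to buf.
 * Returns the number of bytes written, or -1 if it does not fit.
 */
int append_prop(char *buf, size_t size, const char *key, const char *val)
{
	int n;

	assert(buf && key && val);
	n = snprintf(buf, size, "K %lu\n%s\nV %lu\n%s\n",
		     (unsigned long)strlen(key), key,
		     (unsigned long)strlen(val), val);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Build the revision property block for author, log and date.
 * Returns the block length in bytes (the value to print as
 * Prop-content-length), or -1 if the buffer is too small.
 */
int format_revprops(char *buf, size_t size, const char *author,
		    const char *log, const char *date)
{
	const char *keys[3] = { "svn:author", "svn:log", "svn:date" };
	const char *vals[3];
	size_t used = 0;
	int i, n;

	vals[0] = author;
	vals[1] = log;
	vals[2] = date;
	for (i = 0; i < 3; i++) {
		n = append_prop(buf + used, size - used, keys[i], vals[i]);
		if (n < 0)
			return -1;
		used += n;
	}
	/* The terminator counts towards the property length. */
	n = snprintf(buf + used, size - used, "PROPS-END\n");
	if (n < 0 || (size_t)n >= size - used)
		return -1;
	return (int)(used + n);
}
```

The returned length is what a producer would report in both
Prop-content-length and Content-length for a revision record, since a
revision carries no text content.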
> - why would I use this instead of git2svn? Does git2svn do anything
> this will not eventually be able to do? (Not a trick question ---
> I don't have enough experience with git2svn to tell its strengths
> and weaknesses.)
git2svn persists blobs in-memory. It's written in Perl and it's
slow. I thought we needed something nicer to be used with a remote
helper, and started writing svn-fi.
> > I've decided to try re-implementing fast-export
> > to eliminate blob marks
>
> Hopefully "re-implement" means "patch" here. :)
Yep. Just a big one :)
> I can comment on the code but it's probably better if I have a sense
> of the design first (in any event, thanks for sending it).
I haven't had time to clean up the code. Note that it "just works" at
the moment, and yes, it's already very, very fast :) However, I'm going
to stall the branch and work on fast-export-inline for now.
-- Ram
* Re: [PATCH 2/3] vcs-svn: Start working on the dumpfile producer
From: Ramkumar Ramachandra @ 2011-01-15 8:11 UTC
To: Peter Baumann; +Cc: Git List, Jonathan Nieder, David Barr, Sverre Rabbelier
Hi Peter,
Peter Baumann writes:
> > + while ((t = buffer_read_line(&input))) {
> > + val = strchr(t, ' ');
> > + if (!val) {
> > + if (!memcmp(t, "blob", 4))
> > + active_ctx = BLOB_CTX;
> > + else if (!memcmp(t, "deleteall", 9))
> > + ;
> > + continue;
>
> I have no idea what the input you are reading might look like, but
> seeing those two memcmp comparisons above makes me wonder whether 't' might
> ever be shorter than 4 bytes (or 9 for the else branch), which would
> obviously lead to a segfault. The code below also has memcmp calls that
> might step outside the buffer.
Right. Silly mistake on my part. Thanks for pointing it out.
There are probably many more trivial mistakes; I was in a hurry to get
/something/ working, and didn't have a chance to clean up the code.
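One possible shape for a bounded comparison, as a minimal sketch rather than
a final fix: `line_has_key` is a hypothetical helper, not a function from the
patch or from git. It assumes the line is NUL-terminated and relies on
strncmp() stopping at the first NUL, so a short line is never read past its
terminator. It also rejects longer words that merely start with the key.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the NUL-terminated line is exactly `key`, or `key`
 * followed by a space; return 0 otherwise.
 */
int line_has_key(const char *line, const char *key)
{
	size_t len;

	assert(line && key);
	len = strlen(key);
	/* strncmp stops at the first NUL in either string. */
	if (strncmp(line, key, len))
		return 0;
	/* All len bytes matched, so line[len] is within the string. */
	return line[len] == '\0' || line[len] == ' ';
}
```

With a helper like this, the key can be tested before the line is split at
the first space, so both "blob" and "mark :1" go through the same check.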
-- Ram