git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Mockup an svndiff version 0 parser
@ 2010-09-04  7:43 Ramkumar Ramachandra
  2010-09-04  7:43 ` [WIP PATCH] vcs-svn: Add an svndiff0 parser Ramkumar Ramachandra
  0 siblings, 1 reply; 4+ messages in thread
From: Ramkumar Ramachandra @ 2010-09-04  7:43 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Jonathan Nieder, David Michael Barr, Sverre Rabbelier

Hi,

I finally rolled up my sleeves and wrote that svndiff0 parser
yesterday. I've tested it with a few source-target combinations, and
it seems to work fine. It's based on `master`, now that `jn/svn-fe`
has graduated. It's not intended for inclusion yet- it's more of a
"here's what I've been upto, and I'd like some feedback".
TODO:
1. malloc, realloc and free calls: I should factor these out to use
the obj_pool library in vcs-svn/
2. line_buffer. Only operations on stdin are currently
buffered. Should we extend the line_buffer library to be more generic?
3. Performance: The applier I've written is VERY naively. I can't help
but think that there must be a better way to do it. I still don't have
a solution though- the main problem is copyfrom_target.
4. Using the context dumpfilev3. The svndiff0 itself has no end
markers and it's difficult to say when it's going to end. Instead of
unconditionally waiting for more data (which is a problem when the
connection breaks), limit the number of bytes to parse using the
Content-Length header in the dumpfile v3.

Thanks.

p.s- I'll be MIA for a few weeks; exams are coming up in a week.

Ramkumar Ramachandra (1):
  vcs-svn: Add an svndiff0 parser

 vcs-svn/Makefile      |    2 +
 vcs-svn/line_buffer.c |    7 +-
 vcs-svn/line_buffer.h |    2 +-
 vcs-svn/svndiff.c     |  240 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.h     |   41 +++++++++
 5 files changed, 288 insertions(+), 4 deletions(-)
 create mode 100644 vcs-svn/Makefile
 create mode 100644 vcs-svn/svndiff.c
 create mode 100644 vcs-svn/svndiff.h

-- 
1.7.2.2.409.gdbb11.dirty

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [WIP PATCH] vcs-svn: Add an svndiff0 parser
  2010-09-04  7:43 [PATCH] Mockup an svndiff version 0 parser Ramkumar Ramachandra
@ 2010-09-04  7:43 ` Ramkumar Ramachandra
  2010-09-04 11:58   ` [PATCH] vcs-svn: Avoid %z in format string David Barr
  0 siblings, 1 reply; 4+ messages in thread
From: Ramkumar Ramachandra @ 2010-09-04  7:43 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Jonathan Nieder, David Michael Barr, Sverre Rabbelier

Add a parser to apply one window from an svndiff version 0 stream to a
given source. svndiff version 0 is used by dumpfile version 3; this is
one step towards getting svn-fe to parse the output produced by
svnrdump. The svndiff format is described in $SVN_TRUNK/notes/svndiff.

Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
---
 vcs-svn/Makefile      |    2 +
 vcs-svn/line_buffer.c |    7 +-
 vcs-svn/line_buffer.h |    2 +-
 vcs-svn/svndiff.c     |  240 +++++++++++++++++++++++++++++++++++++++++++++++++
 vcs-svn/svndiff.h     |   41 +++++++++
 5 files changed, 288 insertions(+), 4 deletions(-)
 create mode 100644 vcs-svn/Makefile
 create mode 100644 vcs-svn/svndiff.c
 create mode 100644 vcs-svn/svndiff.h

diff --git a/vcs-svn/Makefile b/vcs-svn/Makefile
new file mode 100644
index 0000000..62cf093
--- /dev/null
+++ b/vcs-svn/Makefile
@@ -0,0 +1,2 @@
+default: svndiff.c line_buffer.c
+	$(CC) -I../ -g3 -O0 -o svndiff svndiff.c line_buffer.c
diff --git a/vcs-svn/line_buffer.c b/vcs-svn/line_buffer.c
index 1543567..17618c0 100644
--- a/vcs-svn/line_buffer.c
+++ b/vcs-svn/line_buffer.c
@@ -66,15 +66,16 @@ char *buffer_read_string(uint32_t len)
 	return ferror(infile) ? NULL : s;
 }
 
-void buffer_copy_bytes(uint32_t len)
+void buffer_copy_bytes(uint32_t len, FILE *dst)
 {
 	uint32_t in;
+	dst = dst ? dst : stdout;
 	while (len > 0 && !feof(infile) && !ferror(infile)) {
 		in = len < COPY_BUFFER_LEN ? len : COPY_BUFFER_LEN;
 		in = fread(byte_buffer, 1, in, infile);
 		len -= in;
-		fwrite(byte_buffer, 1, in, stdout);
-		if (ferror(stdout)) {
+		fwrite(byte_buffer, 1, in, dst);
+		if (ferror(dst)) {
 			buffer_skip_bytes(len);
 			return;
 		}
diff --git a/vcs-svn/line_buffer.h b/vcs-svn/line_buffer.h
index 9c78ae1..fe67a22 100644
--- a/vcs-svn/line_buffer.h
+++ b/vcs-svn/line_buffer.h
@@ -5,7 +5,7 @@ int buffer_init(const char *filename);
 int buffer_deinit(void);
 char *buffer_read_line(void);
 char *buffer_read_string(uint32_t len);
-void buffer_copy_bytes(uint32_t len);
+void buffer_copy_bytes(uint32_t len, FILE *dst);
 void buffer_skip_bytes(uint32_t len);
 void buffer_reset(void);
 
diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
new file mode 100644
index 0000000..2d66dce
--- /dev/null
+++ b/vcs-svn/svndiff.c
@@ -0,0 +1,241 @@
+#include "git-compat-util.h"
+#include "line_buffer.h"
+#include "svndiff.h"
+
+#define DEBUG 1
+
+#define SVN_DELTA_WINDOW_SIZE 102400
+#define MAX_ENCODED_INT_LEN 10
+#define MAX_INSTRUCTION_LEN (2*MAX_ENCODED_INT_LEN+1)
+#define MAX_INSTRUCTION_SECTION_LEN (SVN_DELTA_WINDOW_SIZE*MAX_INSTRUCTION_LEN)
+static char buf[SVN_DELTA_WINDOW_SIZE];
+
+/* Remove when linking to gitcore */
+void die(const char *err, ...)
+{
+	va_list params;
+	va_start(params, err);
+	vfprintf(stderr, err, params);
+	printf("\n");
+	va_end(params);
+	exit(128);
+}
+
+/* Return the number of bytes read */
+size_t read_one_size(size_t *size)
+{
+	unsigned char c;
+	size_t result, bsize;
+	result = 0;
+	bsize = 0;
+
+	while (1)
+	{
+		fread(&c, 1, 1, stdin);
+		result = (result << 7) | (c & 127);
+		bsize ++;
+		if (!(c & 128))
+			/* No continuation bit */
+			break;
+	}
+	*size = result;
+	return bsize;
+}
+
+/* Return the number of bytes read */
+size_t read_one_instruction(struct svndiff_instruction *op)
+{
+	unsigned char c;
+	size_t action, bsize;
+	bsize = 0;
+
+	/* Read the 1-byte instruction-selector */
+	fread(&c, 1, 1, stdin);
+	bsize ++;
+
+	/* Decode the instruction selector from the two higher order
+	   bits; the remaining 6 bits may contain the length */
+	action = (c >> 6) & 3;
+	if (action >= 3)
+		die("Invalid instruction %d", action);
+
+	op->action_code = (enum svndiff_action)(action);
+
+	/* Attempt to extract the length length from the remaining
+	   bits */
+	op->length = c & 63;
+	if (op->length == 0)
+	{
+		bsize += read_one_size(&(op->length));
+		if (op->length == 0)
+			die("Zero length instruction");
+	}
+	/* Offset is present if action is copyfrom_source or
+	   copyfrom_target */
+	if (action != copyfrom_new)
+		bsize += read_one_size(&(op->offset));
+	return bsize;
+}
+
+size_t read_instructions(struct svndiff_window *window, size_t *ninst)
+{
+	size_t tpos = 0, npos, bsize;
+	struct svndiff_instruction *op;
+	npos = 0;
+	bsize = 0;
+	*ninst = 0;
+	
+	while (bsize < window->ins_len)
+	{
+		++(*ninst);
+		window->ops = realloc(window->ops, (*ninst) * sizeof(*op));
+		op = window->ops + (*ninst) - 1;
+		bsize += read_one_instruction(op);
+
+		if (DEBUG)
+			fprintf(stderr, "Instruction: %d %d %d (%d)\n",
+				op->action_code, op->length, op->offset, bsize);
+
+		if (op == NULL)
+			die("Invalid diff stream: "
+			    "instruction %d cannot be decoded", *ninst);
+		else if (op->length == 0)
+			die("Invalid diff stream: "
+			    "instruction %d has length zero", *ninst);
+		else if (op->length > window->tview_len - tpos)
+			die("Invalid diff stream: "
+			    "instruction %d overflows the target view", *ninst);
+
+		switch (op->action_code)
+		{
+		case copyfrom_source:
+			if (op->length > window->sview_len - op->offset ||
+			    op->offset > window->sview_len)
+				die("Invalid diff stream: "
+				    "[src] instruction %d overflows "
+				    " the source view", *ninst);
+			break;
+		case copyfrom_target:
+			if (op->offset >= tpos)
+				die("Invalid diff stream: "
+				    "[tgt] instruction %d starts "
+				    "beyond the target view position", *ninst);
+			break;
+		case copyfrom_new:
+			if (op->length > window->newdata_len - npos)
+				die("Invalid diff stream: "
+				    "[new] instruction %d overflows "
+				    "the new data section", *ninst);
+			npos += op->length;
+			break;
+		}
+		tpos += op->length;
+	}
+
+	if (tpos != window->tview_len)
+		die("Delta does not fill the target window");
+	if (npos != window->newdata_len)
+		die("Delta does not contain enough new data");
+	return bsize;
+}
+
+size_t read_window_header(struct svndiff_window *window)
+{
+	size_t bsize = 0;
+
+	/* Read five sizes; various offsets and lengths */
+	bsize += read_one_size(&(window->sview_offset));
+	bsize += read_one_size(&(window->sview_len));
+	bsize += read_one_size(&(window->tview_len));
+	bsize += read_one_size(&(window->ins_len));
+	bsize += read_one_size(&(window->newdata_len));
+
+	if (window->tview_len > SVN_DELTA_WINDOW_SIZE ||
+	    window->sview_len > SVN_DELTA_WINDOW_SIZE ||
+	    window->newdata_len > SVN_DELTA_WINDOW_SIZE + MAX_ENCODED_INT_LEN ||
+	    window->ins_len > MAX_INSTRUCTION_SECTION_LEN)
+		die("Svndiff contains a window that's too large");
+
+	/* Check for integer overflow */
+	if (window->ins_len + window->newdata_len < window->ins_len
+	    || window->sview_len + window->tview_len < window->sview_len
+	    || window->sview_offset + window->sview_len < window->sview_offset)
+		die("Svndiff contains corrupt window header");
+
+	if (DEBUG)
+		fprintf(stderr, "Window header: %d %d %d %d %d\n",
+			window->sview_offset, window->sview_len,
+			window->tview_len, window->ins_len, window->newdata_len);
+	return bsize;
+}
+
+void drive_window(struct svndiff_window *window, FILE *src_fd)
+{
+	struct svndiff_instruction *op;
+	size_t ninst;
+	FILE *target_fd;
+	long target_fd_end;
+
+	/* Populate the first five fields of the the window object
+	   with data from the stream */	
+	read_window_header(window);
+
+	/* Read instructions of length ins_len into window->ops
+	   performing memory allocations as necessary */
+	read_instructions(window, &ninst);
+
+	/* The Applier */
+	/* We're now looking at new_data; read ahead only in the
+	   copyfrom_new case */	
+	target_fd = tmpfile();
+	for (op = window->ops; ninst-- > 0; op++) {
+		switch (op->action_code) {
+		case copyfrom_source:
+			fseek(src_fd, op->offset, SEEK_SET);
+			fread(buf, op->length, 1, src_fd);
+			fwrite(buf, op->length, 1, target_fd);
+			break;
+		case copyfrom_target:
+			fseek(target_fd, op->offset, SEEK_SET);
+			fread(buf, op->length, 1, target_fd);
+			fseek(target_fd, 0, SEEK_END);
+			fwrite(buf, op->length, 1, target_fd);
+			break;
+		case copyfrom_new:
+			fseek(target_fd, 0, SEEK_END);
+			buffer_copy_bytes(op->length, target_fd);
+			break;
+		}
+	}
+	free(window->ops);
+	target_fd_end = ftell(target_fd);
+	fseek(target_fd, 0, SEEK_SET);
+	fread(buf, target_fd_end, 1, target_fd);
+	fwrite(buf, target_fd_end, 1, stdout);
+	fclose (target_fd);
+}
+
+int main(int argc, char **argv)
+{
+	int version;
+	struct svndiff_window *window;
+	FILE *src_fd;
+	buffer_init(NULL);
+
+	/* Read off the 4-byte header: "SVN\0" */
+	fread(&buf, 4, 1, stdin);
+	version = atoi(buf + 3);
+	if (version != 0)
+		die("Version %d unsupported", version);
+
+	/* Setup the source to apply windows to */
+	src_fd = fopen(argv[1], "r");
+
+	/* Read and drive the first window */
+	window = malloc(sizeof(*window));
+	drive_window(window, src_fd);
+	free(window);
+
+	buffer_deinit();
+	return 0;
+}
diff --git a/vcs-svn/svndiff.h b/vcs-svn/svndiff.h
new file mode 100644
index 0000000..6d0607f
--- /dev/null
+++ b/vcs-svn/svndiff.h
@@ -0,0 +1,41 @@
+enum svndiff_action {
+    copyfrom_source,
+    copyfrom_target,
+    copyfrom_new
+};
+
+struct svndiff_instruction
+{
+  enum svndiff_action action_code;
+  size_t offset;
+  size_t length;
+};
+
+/* An svndiff_window object describes how to reconstruct a
+ * contiguous section of the target string (the "target view") using a
+ * specified contiguous region of the source string (the "source
+ * view").  It contains a series of instructions which assemble the
+ * new target string text by pulling together substrings from:
+ *
+ *   - the source view,
+ *
+ *   - the previously constructed portion of the target view,
+ *
+ *   - a string of new data contained within the window structure
+ *
+ * The source view must always slide forward from one window to the
+ * next; that is, neither the beginning nor the end of the source view
+ * may move to the left as we read from a window stream.  This
+ * property allows us to apply deltas to non-seekable source streams
+ * without making a full copy of the source stream.
+ */
+struct svndiff_window
+{
+  size_t sview_offset;
+  size_t sview_len;
+  size_t tview_len;
+  size_t ins_len;
+  size_t newdata_len;
+  struct svndiff_instruction *ops;
+  char *newdata;
+};
-- 
1.7.2.2.409.gdbb11.dirty

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH] vcs-svn: Avoid %z in format string
  2010-09-04  7:43 ` [WIP PATCH] vcs-svn: Add an svndiff0 parser Ramkumar Ramachandra
@ 2010-09-04 11:58   ` David Barr
  2010-09-04 12:20     ` Ramkumar Ramachandra
  0 siblings, 1 reply; 4+ messages in thread
From: David Barr @ 2010-09-04 11:58 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Ramkumar Ramachandra, Jonathan Nieder, Sverre Rabbelier,
	David Barr

In the spirit of v1.6.4-rc0~124 (MinGW: Fix compiler warning in
merge-recursive, 2009-05-23), use a 64-bit integer instead.
---
 vcs-svn/svndiff.c |   41 +++++++++++++++++++++++++----------------
 1 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/vcs-svn/svndiff.c b/vcs-svn/svndiff.c
index 901fc1a..12b7459 100644
--- a/vcs-svn/svndiff.c
+++ b/vcs-svn/svndiff.c
@@ -57,7 +57,7 @@ size_t read_one_instruction(struct svndiff_instruction *op)
 	   bits; the remaining 6 bits may contain the length */
 	action = (c >> 6) & 3;
 	if (action >= 3)
-		die("Invalid instruction %d", action);
+		die("Invalid instruction %"PRIu64, (uint64_t) action);
 
 	op->action_code = (enum svndiff_action)(action);
 
@@ -93,39 +93,44 @@ size_t read_instructions(struct svndiff_window *window, size_t *ninst)
 		bsize += read_one_instruction(op);
 
 		if (DEBUG)
-			fprintf(stderr, "Instruction: %d %d %d (%d)\n",
-				op->action_code, op->length, op->offset, bsize);
+			fprintf(stderr,
+				"Instruction: %"PRIu64" %"PRIu64" %"PRIu64" (%"PRIu64")\n",
+				(uint64_t) op->action_code,
+				(uint64_t) op->length,
+				(uint64_t) op->offset,
+				(uint64_t) bsize);
 
 		if (op == NULL)
 			die("Invalid diff stream: "
-			    "instruction %d cannot be decoded", *ninst);
+				"instruction %"PRIu64" cannot be decoded", (uint64_t) *ninst);
 		else if (op->length == 0)
 			die("Invalid diff stream: "
-			    "instruction %d has length zero", *ninst);
+				"instruction %"PRIu64" has length zero", (uint64_t) *ninst);
 		else if (op->length > window->tview_len - tpos)
 			die("Invalid diff stream: "
-			    "instruction %d overflows the target view", *ninst);
+				"instruction %"PRIu64" overflows the target view",
+			(uint64_t) *ninst);
 
 		switch (op->action_code)
 		{
 		case copyfrom_source:
 			if (op->length > window->sview_len - op->offset ||
-			    op->offset > window->sview_len)
+				op->offset > window->sview_len)
 				die("Invalid diff stream: "
-				    "[src] instruction %d overflows "
-				    " the source view", *ninst);
+					"[src] instruction %"PRIu64" overflows "
+					" the source view", (uint64_t) *ninst);
 			break;
 		case copyfrom_target:
 			if (op->offset >= tpos)
 				die("Invalid diff stream: "
-				    "[tgt] instruction %d starts "
-				    "beyond the target view position", *ninst);
+					"[tgt] instruction %"PRIu64" starts "
+					"beyond the target view position", (uint64_t) *ninst);
 			break;
 		case copyfrom_new:
 			if (op->length > window->newdata_len - npos)
 				die("Invalid diff stream: "
-				    "[new] instruction %d overflows "
-				    "the new data section", *ninst);
+					"[new] instruction %"PRIu64" overflows "
+					"the new data section", (uint64_t) *ninst);
 			npos += op->length;
 			break;
 		}
@@ -163,9 +168,13 @@ size_t read_window_header(struct svndiff_window *window)
 		die("Svndiff contains corrupt window header");
 
 	if (DEBUG)
-		fprintf(stderr, "Window header: %d %d %d %d %d\n",
-			window->sview_offset, window->sview_len,
-			window->tview_len, window->ins_len, window->newdata_len);
+		fprintf(stderr,
+			"Window header: %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64" %"PRIu64"\n",
+			(uint64_t) window->sview_offset,
+			(uint64_t) window->sview_len,
+			(uint64_t) window->tview_len,
+			(uint64_t) window->ins_len,
+			(uint64_t) window->newdata_len);
 	return bsize;
 }
 
-- 
1.7.2.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] vcs-svn: Avoid %z in format string
  2010-09-04 11:58   ` [PATCH] vcs-svn: Avoid %z in format string David Barr
@ 2010-09-04 12:20     ` Ramkumar Ramachandra
  0 siblings, 0 replies; 4+ messages in thread
From: Ramkumar Ramachandra @ 2010-09-04 12:20 UTC (permalink / raw)
  To: David Barr; +Cc: Git Mailing List, Jonathan Nieder, Sverre Rabbelier

Hi David,

David Barr writes:
> In the spirit of v1.6.4-rc0~124 (MinGW: Fix compiler warning in
> merge-recursive, 2009-05-23), use a 64-bit integer instead.

Squashed. Thanks much :)

-- Ram

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-09-04 12:23 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-04  7:43 [PATCH] Mockup an svndiff version 0 parser Ramkumar Ramachandra
2010-09-04  7:43 ` [WIP PATCH] vcs-svn: Add an svndiff0 parser Ramkumar Ramachandra
2010-09-04 11:58   ` [PATCH] vcs-svn: Avoid %z in format string David Barr
2010-09-04 12:20     ` Ramkumar Ramachandra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).