git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH/RFC] Add support for subversion dump format v3
@ 2010-10-12 13:40 David Barr
  2010-10-12 13:40 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
  2010-10-12 13:40 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
  0 siblings, 2 replies; 8+ messages in thread
From: David Barr @ 2010-10-12 13:40 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier

This series follows Jonathan Nieder's svn diff applier series.

Patches 1 to 4 add the required infrastructure to fast-import.
The primary feature is the addition of the cat-file command to
fast-import. This allows access to objects written to the
the current pack prior to a checkpoint and is critical to
retrieving full-texts to drive the diff applier.

Patch 5 adds the basic parsing necessary to process the v3 format.

Patch 6 adds logic around decoding prop deltas.

Patch 7 integrates svn-fe with svn-da to decode text deltas.
It was primarily authored by Jonathan but inspired by Ram and
completed by myself.
This is probably the most sprawling of the series and likely the
heaviest target of review.


 Documentation/git-fast-import.txt |   49 +++++++++
 contrib/svn-fe/svn-fe.txt         |    6 +-
 fast-import.c                     |  147 +++++++++++++++++++++++++-
 t/t9010-svn-fe.sh                 |    6 +-
 t/t9300-fast-import.sh            |  210 +++++++++++++++++++++++++++++++++++++
 test-svn-fe.c                     |    3 +-
 vcs-svn/fast_export.c             |  187 ++++++++++++++++++++++++++++++++-
 vcs-svn/fast_export.h             |   13 ++-
 vcs-svn/repo_tree.c               |   37 +++++++-
 vcs-svn/repo_tree.h               |    3 +
 vcs-svn/svndiff.c                 |   11 +--
 vcs-svn/svndiff.h                 |    3 +-
 vcs-svn/svndump.c                 |   99 ++++++++++++++++--
 13 files changed, 739 insertions(+), 35 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/7] Teach fast-import to print the id of each imported commit
  2010-10-12 13:40 [PATCH/RFC] Add support for subversion dump format v3 David Barr
@ 2010-10-12 13:40 ` David Barr
  2010-10-12 13:40 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
  1 sibling, 0 replies; 8+ messages in thread
From: David Barr @ 2010-10-12 13:40 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce

From: Jonathan Nieder <jrnieder@gmail.com>

For the svn importer, it would be useful to build a map from
subversion revision numbers to git commits as the import takes place.
This is particularly relevant because the subversion api sometimes
represents as "copy this directory from this revision", and the
importer needs to be able to access the corresponding trees.  So
(optionally) print each commit id when the corresponding object is
written.

Unfortunately when each commit object is written, it is not yet
accessible to the caller until a checkpoint has finished.  A later
patch will teach fast-import to directly pass on the relevant data on
request, using the same channel.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt |   13 +++++++++++++
 fast-import.c                     |   18 ++++++++++++++++++
 t/t9300-fast-import.sh            |   36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 966ba4f..e217635 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -92,6 +92,18 @@ OPTIONS
 	--(no-)-relative-marks= with the --(import|export)-marks=
 	options.
 
+--report-fd=<fd>::
+	Print the 40-character object name for each commit to
+	the specified file descriptor before writing it to the
+	pack.  This information may be useful if the importer
+	needs to maintain a map from revisions in the source
+	repository to commit ids in the target repository
+	during the import.
++
+The described objects are not necessarily accessible
+using standard git plumbing tools until a little while
+after the next checkpoint.
+
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
 	<file> listing the filename of the packfile and the last
@@ -896,6 +908,7 @@ The following features are currently supported:
 * date-format
 * import-marks
 * export-marks
+* report-fd
 * relative-marks
 * no-relative-marks
 * force
diff --git a/fast-import.c b/fast-import.c
index 2317b0f..ef0cee7 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -361,6 +361,9 @@ static uintmax_t next_mark;
 static struct strbuf new_data = STRBUF_INIT;
 static int seen_data_command;
 
+/* Where to report commits */
+static int report_fd = -1;
+
 static void parse_argv(void);
 
 static void write_branch_report(FILE *rpt, struct branch *b)
@@ -2571,6 +2574,11 @@ static void parse_new_commit(void)
 
 	if (!store_object(OBJ_COMMIT, &new_data, NULL, b->sha1, next_mark))
 		b->pack_id = pack_id;
+	if (report_fd != -1) {
+		char *buf = sha1_to_hex(b->sha1);
+		buf[40] = '\n';
+		write_or_die(report_fd, buf, 41);
+	}
 	b->last_commit = object_count_by_type[OBJ_COMMIT];
 }
 
@@ -2755,6 +2763,14 @@ static void option_export_marks(const char *marks)
 	safe_create_leading_directories_const(export_marks_file);
 }
 
+static void option_report_fd(const char *fd)
+{
+	unsigned long n = strtoul(fd, NULL, 0);
+	if (n > (unsigned long) INT_MAX)
+		die("--report-fd cannot exceed %d", INT_MAX);
+	report_fd = (int) n;
+}
+
 static void option_export_pack_edges(const char *edges)
 {
 	if (pack_edges)
@@ -2808,6 +2824,8 @@ static int parse_one_feature(const char *feature, int from_stream)
 		option_import_marks(feature + 13, from_stream);
 	} else if (!prefixcmp(feature, "export-marks=")) {
 		option_export_marks(feature + 13);
+	} else if (!prefixcmp(feature, "report-fd=")) {
+		option_report_fd(feature + strlen("report-fd="));
 	} else if (!prefixcmp(feature, "relative-marks")) {
 		relative_marks_paths = 1;
 	} else if (!prefixcmp(feature, "no-relative-marks")) {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 7c05920..4371308 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1611,6 +1611,42 @@ test_expect_success 'R: feature no-relative-marks should be honoured' '
     test_cmp marks.new non-relative.out
 '
 
+test_expect_success 'setup: have pipes?' '
+	test_when_finished "rm -f frob" &&
+	if mkfifo frob
+	then
+		test_set_prereq PIPE
+	fi
+'
+
+test_expect_success PIPE 'R: fast-import --report-fd' '
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/master
+	D file3
+
+	EOF
+
+	read cid <&3 &&
+	echo "$cid" >received
+	EOF
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import --report-fd=3 3>commits &&
+	git rev-parse --verify printed >real &&
+	test_cmp real received
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 13:40 [PATCH/RFC] Add support for subversion dump format v3 David Barr
  2010-10-12 13:40 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
@ 2010-10-12 13:40 ` David Barr
  1 sibling, 0 replies; 8+ messages in thread
From: David Barr @ 2010-10-12 13:40 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, David Barr

From: Jonathan Nieder <jrnieder@gmail.com>

As the description of the "progress" option in git-fast-import.1
hints, there is no convenient way to immediately access the objects
written to a new repository through fast-import.  Until a checkpoint
has been started and finishes writing the pack index, any new blobs,
trees, and commits will not be accessible using standard git tools.

So introduce another way: a "cat" command introduced in the command
stream requests for fast-import to print an object to the same
report-fd stream used to report commits being written.

The output uses the same format as "git cat-file --batch".

Like cat-file --batch, this does not provide an option to dereference
objects to a type of the requestor's choosing.  Tags are presented
as tags, commits as commits, and trees as trees.

Objects can be specified by path within a tree as well, using a

 cat TREE "PATH"

syntax.  With this syntax, also, the tree can only be specified by
:n marker or 40-digit tree id.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: David Barr <david.barr@cordelta.com>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt |   34 +++++++++++-
 fast-import.c                     |  108 +++++++++++++++++++++++++++++++++++++
 t/t9300-fast-import.sh            |  102 ++++++++++++++++++++++++++++++++++
 3 files changed, 243 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index e217635..2cf48f5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -102,7 +102,8 @@ OPTIONS
 +
 The described objects are not necessarily accessible
 using standard git plumbing tools until a little while
-after the next checkpoint.
+after the next checkpoint.  To request access to the
+objects before then, use `cat` lines in the command stream.
 
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
@@ -332,6 +333,10 @@ and control the current import process.  More detailed discussion
 	standard output.  This command is optional and is not needed
 	to perform an import.
 
+`cat`::
+	Causes fast-import to print an object in 'cat-file --batch'
+	format to the file descriptor set with "feature report-fd".
+
 `feature`::
 	Require that fast-import supports the specified feature, or
 	abort if it does not.
@@ -888,6 +893,33 @@ Placing a `progress` command immediately after a `checkpoint` will
 inform the reader when the `checkpoint` has been completed and it
 can safely access the refs that fast-import updated.
 
+`cat`
+~~~~~
+Causes fast-import to print an object to a file descriptor
+previously arranged with the `--report-fd` option.  The command
+otherwise has no impact on the current import; its main purpose is to
+retrieve objects that may be in fast-import's memory but not
+accessible from the target repository a little quicker than by the
+method suggested by the description of the `progress` option.
+
+....
+	'cat' SP <dataref> LF
+....
+
+The `<dataref>` can be either a mark reference (`:<idnum>`)
+set previously, or a full 40-byte SHA-1 of any Git object,
+preexisting or ready to be written.
+
+If `<dataref>` refers to a tree object, it may be followed by
+a path within that tree to retrieve a subtree or blob.
+
+....
+	'cat' SP <treeref> SP <path> LF
+....
+
+A `<path>` string should be surrounded with quotation marks and
+use C-style escaping.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
diff --git a/fast-import.c b/fast-import.c
index ef0cee7..099f63e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,6 +55,9 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
+  cat_request ::= 'cat' sp (hexsha1 | idnum) lf
+    | 'cat' sp hexsha1 sp path_str lf;
+
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -2688,6 +2691,109 @@ static void parse_reset_branch(void)
 		unread_command_buf = 1;
 }
 
+static void quoted_path_sha1(unsigned char sha1[20], struct tree_entry *root,
+				const char *path, const char *line)
+{
+	struct strbuf uq = STRBUF_INIT;
+	struct tree_entry leaf = {0};
+	const char *x;
+
+	if (unquote_c_style(&uq, path, &x))
+		die("Invalid path: %s", line);
+	if (*x)
+		die("Garbage after path: %s", line);
+	tree_content_get(root, uq.buf, &leaf);
+	if (!leaf.versions[1].mode)
+		die("Path %s not in branch", uq.buf);
+	hashcpy(sha1, leaf.versions[1].sha1);
+}
+
+static void sendreport(const char *buf, unsigned long size)
+{
+	if (write_in_full(report_fd, buf, size) != size)
+		die_errno("Write to frontend failed");
+}
+
+static void cat_object(struct object_entry *oe, unsigned char sha1[20])
+{
+	struct strbuf line = STRBUF_INIT;
+	unsigned long size;
+	enum object_type type = 0;
+	char *buf;
+
+	if (report_fd < 0)
+		die("Internal error: bad report_fd %d", report_fd);
+	if (oe && oe->pack_id != MAX_PACK_ID) {
+		type = oe->type;
+		buf = gfi_unpack_entry(oe, &size);
+	} else {
+		buf = read_sha1_file(sha1, &type, &size);
+	}
+	if (!buf)
+		die("Can't read object %s", sha1_to_hex(sha1));
+
+	/*
+	 * Output based on batch_one_object() from cat-file.c.
+	 */
+	if (type <= 0) {
+		strbuf_reset(&line);
+		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
+		if (write_in_full(report_fd, line.buf, line.len) != line.len)
+			die_errno("Write to frontend failed 1");
+	}
+	strbuf_reset(&line);
+	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
+						typename(type), size);
+	sendreport(line.buf, line.len);
+	sendreport(buf, size);
+	sendreport("\n", 1);
+	free(buf);
+}
+
+
+static void parse_cat_request(void)
+{
+	const char *p;
+	struct object_entry *oe = oe;
+	unsigned char sha1[20];
+	struct tree_entry root = {0};
+
+	/* cat SP <object> */
+	p = command_buf.buf + strlen("cat ");
+	if (report_fd < 0)
+		die("The cat command requires the report-fd feature.");
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		if (x == p + 1)
+			die("Invalid mark: %s", command_buf.buf);
+		if (!oe)
+			die("Unknown mark: %s", command_buf.buf);
+		p = x;
+		hashcpy(sha1, oe->idx.sha1);
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		p += 40;
+		if (!*p)
+			oe = find_object(sha1);
+	}
+
+	/* [ SP "<path>" ] */
+	if (*p) {
+		if (*p++ != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
+
+		/* cat <tree> "<path>" form. */
+		hashcpy(root.versions[1].sha1, sha1);
+		load_tree(&root);
+		quoted_path_sha1(sha1, &root, p, command_buf.buf);
+		oe = find_object(sha1);
+	}
+
+	cat_object(oe, sha1);
+}
+
 static void parse_checkpoint(void)
 {
 	if (object_count) {
@@ -2971,6 +3077,8 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
+		else if (!prefixcmp(command_buf.buf, "cat "))
+			parse_cat_request();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 4371308..8155b85 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1647,6 +1647,108 @@ test_expect_success PIPE 'R: fast-import --report-fd' '
 	test_cmp real received
 '
 
+test_expect_success PIPE 'R: report-fd: can feed back printed tree' '
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/master
+	D file3
+
+	EOF
+
+	read commit_id <&3 &&
+	echo "$commit_id" >printed &&
+	echo "$commit_id commit" >expect.response &&
+	echo "cat $commit_id" &&
+	read cid2 type size <&3 &&
+	echo "$cid2 $type" >response &&
+	dd if=/dev/stdin of=commit bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read tree tree_id <commit &&
+
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/printed^0
+	M 040000 $tree_id old
+
+	EOF
+	read cid <&3
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import --report-fd=3 3>commits &&
+	git rev-parse printed^ >expect.printed &&
+	git cat-file commit printed^ >expect.commit &&
+
+	test_cmp expect.printed printed &&
+	test_cmp expect.response response &&
+	test_cmp expect.commit commit
+'
+
+test_expect_success PIPE 'R: report-fd: can feed back printed blob' '
+	file6_id=$(echo "$file6_data" | git hash-object --stdin) &&
+	cat >expect <<-EOF &&
+	:100755 100644 $file6_id $file6_id C100	newdir/exec.sh	file6
+	EOF
+
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+
+	branch=$(git rev-parse --verify refs/heads/branch) &&
+	cat <<EOF &&
+	feature report-fd=3
+	cat $branch
+	EOF
+
+	read commit_id type size <&3 &&
+	dd if=/dev/stdin of=commit bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read tree tree_id <commit &&
+
+	echo "cat $tree_id \"newdir/exec.sh\"" &&
+	read blob_id type size <&3 &&
+	dd if=/dev/stdin of=blob bs=1 count=$size <&3 &&
+	read newline <&3 &&
+
+	cat <<EOF &&
+	commit refs/heads/copyblob
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	copy file6 to top level
+	COMMIT
+
+	from refs/heads/branch^0
+	M 644 inline "file6"
+	data $size
+	EOF
+	cat blob &&
+	echo &&
+	echo &&
+
+	read cid <&3
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import 3>commits &&
+	git diff-tree -C --find-copies-harder -r copyblob^ copyblob >actual &&
+	compare_diff_raw expect actual
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
@ 2010-10-12 13:50 ` David Barr
  2010-10-12 22:38   ` Jonathan Nieder
  0 siblings, 1 reply; 8+ messages in thread
From: David Barr @ 2010-10-12 13:50 UTC (permalink / raw)
  To: Git Mailing List
  Cc: Jonathan Nieder, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce, David Barr

From: Jonathan Nieder <jrnieder@gmail.com>

As the description of the "progress" option in git-fast-import.1
hints, there is no convenient way to immediately access the objects
written to a new repository through fast-import.  Until a checkpoint
has been started and finishes writing the pack index, any new blobs,
trees, and commits will not be accessible using standard git tools.

So introduce another way: a "cat" command introduced in the command
stream requests for fast-import to print an object to the same
report-fd stream used to report commits being written.

The output uses the same format as "git cat-file --batch".

Like cat-file --batch, this does not provide an option to dereference
objects to a type of the requestor's choosing.  Tags are presented
as tags, commits as commits, and trees as trees.

Objects can be specified by path within a tree as well, using a

 cat TREE "PATH"

syntax.  With this syntax, also, the tree can only be specified by
:n marker or 40-digit tree id.

Cc: Shawn O. Pearce <spearce@spearce.org>
Cc: David Barr <david.barr@cordelta.com>
Cc: Ramkumar Ramachandra <artagnon@gmail.com>
Helped-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/git-fast-import.txt |   34 +++++++++++-
 fast-import.c                     |  108 +++++++++++++++++++++++++++++++++++++
 t/t9300-fast-import.sh            |  102 ++++++++++++++++++++++++++++++++++
 3 files changed, 243 insertions(+), 1 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index e217635..2cf48f5 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -102,7 +102,8 @@ OPTIONS
 +
 The described objects are not necessarily accessible
 using standard git plumbing tools until a little while
-after the next checkpoint.
+after the next checkpoint.  To request access to the
+objects before then, use `cat` lines in the command stream.
 
 --export-pack-edges=<file>::
 	After creating a packfile, print a line of data to
@@ -332,6 +333,10 @@ and control the current import process.  More detailed discussion
 	standard output.  This command is optional and is not needed
 	to perform an import.
 
+`cat`::
+	Causes fast-import to print an object in 'cat-file --batch'
+	format to the file descriptor set with "feature report-fd".
+
 `feature`::
 	Require that fast-import supports the specified feature, or
 	abort if it does not.
@@ -888,6 +893,33 @@ Placing a `progress` command immediately after a `checkpoint` will
 inform the reader when the `checkpoint` has been completed and it
 can safely access the refs that fast-import updated.
 
+`cat`
+~~~~~
+Causes fast-import to print an object to a file descriptor
+previously arranged with the `--report-fd` option.  The command
+otherwise has no impact on the current import; its main purpose is to
+retrieve objects that may be in fast-import's memory but not
+accessible from the target repository a little quicker than by the
+method suggested by the description of the `progress` option.
+
+....
+	'cat' SP <dataref> LF
+....
+
+The `<dataref>` can be either a mark reference (`:<idnum>`)
+set previously, or a full 40-byte SHA-1 of any Git object,
+preexisting or ready to be written.
+
+If `<dataref>` refers to a tree object, it may be followed by
+a path within that tree to retrieve a subtree or blob.
+
+....
+	'cat' SP <treeref> SP <path> LF
+....
+
+A `<path>` string should be surrounded with quotation marks and
+use C-style escaping.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
diff --git a/fast-import.c b/fast-import.c
index ef0cee7..099f63e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -55,6 +55,9 @@ Format of STDIN stream:
     ('from' sp committish lf)?
     lf?;
 
+  cat_request ::= 'cat' sp (hexsha1 | idnum) lf
+    | 'cat' sp hexsha1 sp path_str lf;
+
   checkpoint ::= 'checkpoint' lf
     lf?;
 
@@ -2688,6 +2691,109 @@ static void parse_reset_branch(void)
 		unread_command_buf = 1;
 }
 
+static void quoted_path_sha1(unsigned char sha1[20], struct tree_entry *root,
+				const char *path, const char *line)
+{
+	struct strbuf uq = STRBUF_INIT;
+	struct tree_entry leaf = {0};
+	const char *x;
+
+	if (unquote_c_style(&uq, path, &x))
+		die("Invalid path: %s", line);
+	if (*x)
+		die("Garbage after path: %s", line);
+	tree_content_get(root, uq.buf, &leaf);
+	if (!leaf.versions[1].mode)
+		die("Path %s not in branch", uq.buf);
+	hashcpy(sha1, leaf.versions[1].sha1);
+}
+
+static void sendreport(const char *buf, unsigned long size)
+{
+	if (write_in_full(report_fd, buf, size) != size)
+		die_errno("Write to frontend failed");
+}
+
+static void cat_object(struct object_entry *oe, unsigned char sha1[20])
+{
+	struct strbuf line = STRBUF_INIT;
+	unsigned long size;
+	enum object_type type = 0;
+	char *buf;
+
+	if (report_fd < 0)
+		die("Internal error: bad report_fd %d", report_fd);
+	if (oe && oe->pack_id != MAX_PACK_ID) {
+		type = oe->type;
+		buf = gfi_unpack_entry(oe, &size);
+	} else {
+		buf = read_sha1_file(sha1, &type, &size);
+	}
+	if (!buf)
+		die("Can't read object %s", sha1_to_hex(sha1));
+
+	/*
+	 * Output based on batch_one_object() from cat-file.c.
+	 */
+	if (type <= 0) {
+		strbuf_reset(&line);
+		strbuf_addf(&line, "%s missing\n", sha1_to_hex(sha1));
+		if (write_in_full(report_fd, line.buf, line.len) != line.len)
+			die_errno("Write to frontend failed 1");
+	}
+	strbuf_reset(&line);
+	strbuf_addf(&line, "%s %s %lu\n", sha1_to_hex(sha1),
+						typename(type), size);
+	sendreport(line.buf, line.len);
+	sendreport(buf, size);
+	sendreport("\n", 1);
+	free(buf);
+}
+
+
+static void parse_cat_request(void)
+{
+	const char *p;
+	struct object_entry *oe = oe;
+	unsigned char sha1[20];
+	struct tree_entry root = {0};
+
+	/* cat SP <object> */
+	p = command_buf.buf + strlen("cat ");
+	if (report_fd < 0)
+		die("The cat command requires the report-fd feature.");
+	if (*p == ':') {
+		char *x;
+		oe = find_mark(strtoumax(p + 1, &x, 10));
+		if (x == p + 1)
+			die("Invalid mark: %s", command_buf.buf);
+		if (!oe)
+			die("Unknown mark: %s", command_buf.buf);
+		p = x;
+		hashcpy(sha1, oe->idx.sha1);
+	} else {
+		if (get_sha1_hex(p, sha1))
+			die("Invalid SHA1: %s", command_buf.buf);
+		p += 40;
+		if (!*p)
+			oe = find_object(sha1);
+	}
+
+	/* [ SP "<path>" ] */
+	if (*p) {
+		if (*p++ != ' ')
+			die("Missing space after SHA1: %s", command_buf.buf);
+
+		/* cat <tree> "<path>" form. */
+		hashcpy(root.versions[1].sha1, sha1);
+		load_tree(&root);
+		quoted_path_sha1(sha1, &root, p, command_buf.buf);
+		oe = find_object(sha1);
+	}
+
+	cat_object(oe, sha1);
+}
+
 static void parse_checkpoint(void)
 {
 	if (object_count) {
@@ -2971,6 +3077,8 @@ int main(int argc, const char **argv)
 			parse_new_tag();
 		else if (!prefixcmp(command_buf.buf, "reset "))
 			parse_reset_branch();
+		else if (!prefixcmp(command_buf.buf, "cat "))
+			parse_cat_request();
 		else if (!strcmp("checkpoint", command_buf.buf))
 			parse_checkpoint();
 		else if (!prefixcmp(command_buf.buf, "progress "))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 4371308..8155b85 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -1647,6 +1647,108 @@ test_expect_success PIPE 'R: fast-import --report-fd' '
 	test_cmp real received
 '
 
+test_expect_success PIPE 'R: report-fd: can feed back printed tree' '
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/master
+	D file3
+
+	EOF
+
+	read commit_id <&3 &&
+	echo "$commit_id" >printed &&
+	echo "$commit_id commit" >expect.response &&
+	echo "cat $commit_id" &&
+	read cid2 type size <&3 &&
+	echo "$cid2 $type" >response &&
+	dd if=/dev/stdin of=commit bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read tree tree_id <commit &&
+
+	cat <<EOF &&
+	commit refs/heads/printed
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	to be printed
+	COMMIT
+
+	from refs/heads/printed^0
+	M 040000 $tree_id old
+
+	EOF
+	read cid <&3
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import --report-fd=3 3>commits &&
+	git rev-parse printed^ >expect.printed &&
+	git cat-file commit printed^ >expect.commit &&
+
+	test_cmp expect.printed printed &&
+	test_cmp expect.response response &&
+	test_cmp expect.commit commit
+'
+
+test_expect_success PIPE 'R: report-fd: can feed back printed blob' '
+	file6_id=$(echo "$file6_data" | git hash-object --stdin) &&
+	cat >expect <<-EOF &&
+	:100755 100644 $file6_id $file6_id C100	newdir/exec.sh	file6
+	EOF
+
+	cat >frontend <<-\FRONTEND_END &&
+	#!/bin/sh
+
+	branch=$(git rev-parse --verify refs/heads/branch) &&
+	cat <<EOF &&
+	feature report-fd=3
+	cat $branch
+	EOF
+
+	read commit_id type size <&3 &&
+	dd if=/dev/stdin of=commit bs=1 count=$size <&3 &&
+	read newline <&3 &&
+	read tree tree_id <commit &&
+
+	echo "cat $tree_id \"newdir/exec.sh\"" &&
+	read blob_id type size <&3 &&
+	dd if=/dev/stdin of=blob bs=1 count=$size <&3 &&
+	read newline <&3 &&
+
+	cat <<EOF &&
+	commit refs/heads/copyblob
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+	data <<COMMIT
+	copy file6 to top level
+	COMMIT
+
+	from refs/heads/branch^0
+	M 644 inline "file6"
+	data $size
+	EOF
+	cat blob &&
+	echo &&
+	echo &&
+
+	read cid <&3
+	FRONTEND_END
+
+	mkfifo commits &&
+	test_when_finished "rm -f commits" &&
+	sh frontend 3<commits |
+	git fast-import 3>commits &&
+	git diff-tree -C --find-copies-harder -r copyblob^ copyblob >actual &&
+	compare_diff_raw expect actual
+'
+
 cat >input << EOF
 option git quiet
 blob
-- 
1.7.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 13:50 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
@ 2010-10-12 22:38   ` Jonathan Nieder
  2010-10-12 23:07     ` Sverre Rabbelier
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Nieder @ 2010-10-12 22:38 UTC (permalink / raw)
  To: David Barr
  Cc: Git Mailing List, Ramkumar Ramachandra, Sverre Rabbelier,
	Shawn O. Pearce

David Barr wrote:

> The output uses the same format as "git cat-file --batch".

This means:

	<sha1> SP <type> SP <size> LF
	<contents> LF

where <contents> are the raw content of the object in question.

If a person just wants to read the <sha1>, the <contents> are
wasted computation and I/O.  Does that matter?

Suppose some day fast-import gains a feature to respond to requests
out of order.  The response would need to include the name of the
object as requested for the frontend to make any sense of it.  Would
it be worth preparing the format for that now, so the same frontend
code can deal with the format produced by fast-import with and without
that feature?

The format of tags, trees, and commits (though not blobs) is specific
to git: fast-import backends for other version control systems would
not be able to respond in the same way.  So maybe it would make sense
to restrict cat requests to blobs for now.

> Objects can be specified by path within a tree as well, using a
>
>  cat TREE "PATH"
>
> syntax.  With this syntax, also, the tree can only be specified by
> :n marker or 40-digit tree id.

I would prefer

	cat "<path>"

within commit commands, which would spit out the content at that
path in the currently-staged version of the commit, and a separate

	C <tree> "<path>" "<path>"

to copy data from previous commits.  So cat <tree> "<path>" would be
replaced with

	C <tree> "<path>" "dest"
	cat "dest"

when using content from tree:path to prepare dest.  Of course, that
would be less flexible, but I find it more intuitive for svn-fe at
least, perhaps because it fits better with svn's object model.

In either regime, I hear there is some demand for allowing commits to
be used in place of trees.

At any rate, that is not needed for the current version of svn-fe,
is it?  So maybe it would make sense to strip down the patch to just
allow

	cat <blob>

Thanks for resending it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 22:38   ` Jonathan Nieder
@ 2010-10-12 23:07     ` Sverre Rabbelier
  2010-10-13  0:07       ` Jonathan Nieder
  0 siblings, 1 reply; 8+ messages in thread
From: Sverre Rabbelier @ 2010-10-12 23:07 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Heya,

On Wed, Oct 13, 2010 at 00:38, Jonathan Nieder <jrnieder@gmail.com> wrote:
> At any rate, that is not needed for the current version of svn-fe,
> is it?  So maybe it would make sense to strip down the patch to just
> allow
>
>        cat <blob>

Do we want some sort of type indicator to make it easier to extend? So
"cat blob <blob>"?

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-12 23:07     ` Sverre Rabbelier
@ 2010-10-13  0:07       ` Jonathan Nieder
  2010-10-13 13:10         ` Sverre Rabbelier
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Nieder @ 2010-10-13  0:07 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Sverre Rabbelier wrote:
> On Wed, Oct 13, 2010 at 00:38, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> At any rate, that is not needed for the current version of svn-fe,
>> is it?  So maybe it would make sense to strip down the patch to just
>> allow
>>
>>        cat <blob>
>
> Do we want some sort of type indicator to make it easier to extend? So
> "cat blob <blob>"?

I'm not convinced it's needed.  cat-file does that to allow peeling
references:

	git cat-file tree v1.7.1
vs
	git cat-file tag v1.7.1

But the "cat" name might be too valuable namespace to reserve for only
this.  I have no strong preference either way.  Maybe the command
should be named "cat-blob".

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/7] fast-import: Let importers retrieve the objects being written
  2010-10-13  0:07       ` Jonathan Nieder
@ 2010-10-13 13:10         ` Sverre Rabbelier
  0 siblings, 0 replies; 8+ messages in thread
From: Sverre Rabbelier @ 2010-10-13 13:10 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: David Barr, Git Mailing List, Ramkumar Ramachandra,
	Shawn O. Pearce

Heya,

On Wed, Oct 13, 2010 at 02:07, Jonathan Nieder <jrnieder@gmail.com> wrote:
> But the "cat" name might be too valuable namespace to reserve for only
> this.  I have no strong preference either way.  Maybe the command
> should be named "cat-blob".

Let's do that, call it "cat-blob" and restrict it to blobs.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-10-13 13:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-12 13:40 [PATCH/RFC] Add support for subversion dump format v3 David Barr
2010-10-12 13:40 ` [PATCH 1/7] Teach fast-import to print the id of each imported commit David Barr
2010-10-12 13:40 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
  -- strict thread matches above, loose matches on Subject: below --
2010-10-12 13:50 [PATCH/RFC] Add support for subversion dump format v3 David Barr
2010-10-12 13:50 ` [PATCH 2/7] fast-import: Let importers retrieve the objects being written David Barr
2010-10-12 22:38   ` Jonathan Nieder
2010-10-12 23:07     ` Sverre Rabbelier
2010-10-13  0:07       ` Jonathan Nieder
2010-10-13 13:10         ` Sverre Rabbelier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).