git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/9] Add tests for git cat-file
  2007-10-23  5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
@ 2007-10-23  5:46 ` Adam Roben
  2007-10-23  6:59   ` Johannes Sixt
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Roben @ 2007-10-23  5:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Adam Roben


Signed-off-by: Adam Roben <aroben@apple.com>
---
 t/t1005-cat-file.sh |   91 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 91 insertions(+), 0 deletions(-)
 create mode 100755 t/t1005-cat-file.sh

diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
new file mode 100755
index 0000000..2fdc446
--- /dev/null
+++ b/t/t1005-cat-file.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='git cat-file'
+
+. ./test-lib.sh
+
+function maybe_remove_timestamp()
+{
+    if test -z "$2"; then
+        echo "$1"
+    else
+        echo "$1" | sed -e 's/ [0-9]\{10\} [+-][0-9]\{4\}$//'
+    fi
+}
+
+function run_tests()
+{
+    type=$1
+    sha1=$2
+    size=$3
+    content=$4
+    pretty_content=$5
+    no_timestamp=$6
+
+    test_expect_success \
+        "$type exists" \
+        "git cat-file -e $hello_sha1"
+    test_expect_success \
+        "Type of $type is correct" \
+        "test $type = \"$(git cat-file -t $sha1)\""
+    test_expect_success \
+        "Size of $type is correct" \
+        "test $size = \"$(git cat-file -s $sha1)\""
+    test -z "$content" || test_expect_success \
+        "Content of $type is correct" \
+        "test \"$(maybe_remove_timestamp "$content" $no_timestamp)\" = \"$(maybe_remove_timestamp "$(git cat-file $type $sha1)" $no_timestamp)\""
+    test_expect_success \
+        "Pretty content of $type is correct" \
+        "test \"$(maybe_remove_timestamp "$pretty_content" $no_timestamp)\" = \"$(maybe_remove_timestamp "$(git cat-file -p $sha1)" $no_timestamp)\""
+}
+
+hello_content="Hello World"
+hello_size=$(echo "$hello_content" | wc -c)
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+
+echo "$hello_content" > hello
+
+git update-index --add hello
+
+run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
+
+tree_sha1=$(git write-tree)
+tree_size=33
+tree_pretty_content="100644 blob $hello_sha1	hello"
+
+run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+
+commit_message="Intial commit"
+commit_sha1=$(echo "$commit_message" | git commit-tree $tree_sha1)
+commit_size=177
+commit_content="tree $tree_sha1
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
+
+$commit_message"
+
+run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
+
+tag_header="object $hello_sha1
+type blob
+tag hellotag
+tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_description="This is a tag"
+tag_content="$tag_header
+
+$tag_description"
+tag_pretty_content="$tag_header
+Thu Jan 1 00:00:00 1970 +0000
+
+$tag_description"
+
+tag_sha1=$(echo "$tag_content" | git mktag)
+tag_size=$(echo "$tag_content" | wc -c)
+
+run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_pretty_content"
+
+test_expect_success \
+    "Reach a blob from a tag pointing to it" \
+    "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
+
+test_done
-- 
1.5.3.4.1333.ga2f32

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/9] Add tests for git cat-file
  2007-10-23  5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
@ 2007-10-23  6:59   ` Johannes Sixt
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Sixt @ 2007-10-23  6:59 UTC (permalink / raw)
  To: Adam Roben; +Cc: git, Junio C Hamano

Adam Roben schrieb:
> +    test_expect_success \
> +        "$type exists" \
> +        "git cat-file -e $hello_sha1"

You mean $sha1 here, right?

> +    test_expect_success \
> +        "Type of $type is correct" \
> +        "test $type = \"$(git cat-file -t $sha1)\""

This should escape the $(...) in all the tests. Like this:

         "test $type = \"\$(git cat-file -t $sha1)\""

> +test_expect_success \
> +    "Reach a blob from a tag pointing to it" \
> +    "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""

And use single quotes without escaping the double-quotes here.

-- Hannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RESEND PATCH 0/9] Make git-svn fetch ~1.7x faster
@ 2007-10-25 10:25 Adam Roben
  2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano


This is a resend of my previous patch series to speed up git-svn, taking into
account comments from Eric, Johannes, and Brian.

--
 Documentation/git-cat-file.txt    |    6 +-
 Documentation/git-hash-object.txt |    5 +-
 builtin-cat-file.c                |   87 +++++++++++++++++----
 git-svn.perl                      |   40 +++++-----
 hash-object.c                     |   29 +++++++-
 perl/Git.pm                       |  153 ++++++++++++++++++++++++++++++++++++-
 t/t1005-cat-file.sh               |  126 ++++++++++++++++++++++++++++++
 t/t1006-hash-object.sh            |   49 ++++++++++++
 8 files changed, 452 insertions(+), 43 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/9] Add tests for git cat-file
  2007-10-25 10:25 [RESEND PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
@ 2007-10-25 10:25 ` Adam Roben
  2007-10-25 10:25   ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben, Johannes Sixt


Signed-off-by: Adam Roben <aroben@apple.com>
---
Johannes Sixt wrote:
> Adam Roben schrieb:
> > +    test_expect_success \
> > +        "$type exists" \
> > +        "git cat-file -e $hello_sha1"
> 
> You mean $sha1 here, right?

I most definitely did!

> > +    test_expect_success \
> > +        "Type of $type is correct" \
> > +        "test $type = \"$(git cat-file -t $sha1)\""
> 
> This should escape the $(...) in all the tests. Like this:
> 
>         "test $type = \"\$(git cat-file -t $sha1)\""
> 
> > +test_expect_success \
> > +    "Reach a blob from a tag pointing to it" \
> > +    "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
> 
> And use single quotes without escaping the double-quotes here. 

Done.

 t/t1005-cat-file.sh |   91 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 91 insertions(+), 0 deletions(-)
 create mode 100755 t/t1005-cat-file.sh

diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
new file mode 100755
index 0000000..697354d
--- /dev/null
+++ b/t/t1005-cat-file.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='git cat-file'
+
+. ./test-lib.sh
+
+function maybe_remove_timestamp()
+{
+    if test -z "$2"; then
+        echo "$1"
+    else
+        echo "$1" | sed -e 's/ [0-9]\{10\} [+-][0-9]\{4\}$//'
+    fi
+}
+
+function run_tests()
+{
+    type=$1
+    sha1=$2
+    size=$3
+    content=$4
+    pretty_content=$5
+    no_timestamp=$6
+
+    test_expect_success \
+        "$type exists" \
+        "git cat-file -e $sha1"
+    test_expect_success \
+        "Type of $type is correct" \
+        "test $type = \"\$(git cat-file -t $sha1)\""
+    test_expect_success \
+        "Size of $type is correct" \
+        "test $size = \"\$(git cat-file -s $sha1)\""
+    test -z "$content" || test_expect_success \
+        "Content of $type is correct" \
+        "test \"\$(maybe_remove_timestamp '$content' $no_timestamp)\" = \"\$(maybe_remove_timestamp \"\$(git cat-file $type $sha1)\" $no_timestamp)\""
+    test_expect_success \
+        "Pretty content of $type is correct" \
+        "test \"\$(maybe_remove_timestamp '$pretty_content' $no_timestamp)\" = \"\$(maybe_remove_timestamp \"\$(git cat-file -p $sha1)\" $no_timestamp)\""
+}
+
+hello_content="Hello World"
+hello_size=$(echo "$hello_content" | wc -c)
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+
+echo "$hello_content" > hello
+
+git update-index --add hello
+
+run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
+
+tree_sha1=$(git write-tree)
+tree_size=33
+tree_pretty_content="100644 blob $hello_sha1	hello"
+
+run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+
+commit_message="Intial commit"
+commit_sha1=$(echo "$commit_message" | git commit-tree $tree_sha1)
+commit_size=177
+commit_content="tree $tree_sha1
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
+
+$commit_message"
+
+run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
+
+tag_header="object $hello_sha1
+type blob
+tag hellotag
+tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_description="This is a tag"
+tag_content="$tag_header
+
+$tag_description"
+tag_pretty_content="$tag_header
+Thu Jan 1 00:00:00 1970 +0000
+
+$tag_description"
+
+tag_sha1=$(echo "$tag_content" | git mktag)
+tag_size=$(echo "$tag_content" | wc -c)
+
+run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_pretty_content"
+
+test_expect_success \
+    "Reach a blob from a tag pointing to it" \
+    "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
+
+test_done
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file
  2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
@ 2007-10-25 10:25   ` Adam Roben
  2007-10-25 10:25     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
  0 siblings, 1 reply; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben

I separated the logic of parsing the arguments from the logic of fetching and
outputting the data. cat_one_file now does the latter.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 builtin-cat-file.c |   38 ++++++++++++++++++++++----------------
 1 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index f132d58..34a63d1 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -76,31 +76,16 @@ static void pprint_tag(const unsigned char *sha1, const char *buf, unsigned long
 		write_or_die(1, cp, endp - cp);
 }
 
-int cmd_cat_file(int argc, const char **argv, const char *prefix)
+static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 {
 	unsigned char sha1[20];
 	enum object_type type;
 	void *buf;
 	unsigned long size;
-	int opt;
-	const char *exp_type, *obj_name;
-
-	git_config(git_default_config);
-	if (argc != 3)
-		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
-	exp_type = argv[1];
-	obj_name = argv[2];
 
 	if (get_sha1(obj_name, sha1))
 		die("Not a valid object name %s", obj_name);
 
-	opt = 0;
-	if ( exp_type[0] == '-' ) {
-		opt = exp_type[1];
-		if ( !opt || exp_type[2] )
-			opt = -1; /* Not a single character option */
-	}
-
 	buf = NULL;
 	switch (opt) {
 	case 't':
@@ -157,3 +142,24 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	write_or_die(1, buf, size);
 	return 0;
 }
+
+int cmd_cat_file(int argc, const char **argv, const char *prefix)
+{
+	int opt;
+	const char *exp_type, *obj_name;
+
+	git_config(git_default_config);
+	if (argc != 3)
+		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
+	exp_type = argv[1];
+	obj_name = argv[2];
+
+	opt = 0;
+	if ( exp_type[0] == '-' ) {
+		opt = exp_type[1];
+		if ( !opt || exp_type[2] )
+			opt = -1; /* Not a single character option */
+	}
+
+	return cat_one_file(opt, exp_type, obj_name);
+}
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/9] git-cat-file: Make option parsing a little more flexible
  2007-10-25 10:25   ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
@ 2007-10-25 10:25     ` Adam Roben
  2007-10-25 10:25       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
  2007-10-26 20:56       ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben

This will make it easier to add newer options later.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 builtin-cat-file.c |   42 ++++++++++++++++++++++++++++++------------
 1 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 34a63d1..3a0be4a 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -143,23 +143,41 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	return 0;
 }
 
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
+
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
-	int opt;
-	const char *exp_type, *obj_name;
+	int i, opt = 0;
+	const char *exp_type = 0, *obj_name = 0;
 
 	git_config(git_default_config);
-	if (argc != 3)
-		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
-	exp_type = argv[1];
-	obj_name = argv[2];
-
-	opt = 0;
-	if ( exp_type[0] == '-' ) {
-		opt = exp_type[1];
-		if ( !opt || exp_type[2] )
-			opt = -1; /* Not a single character option */
+
+	for (i = 1; i < argc; ++i) {
+		const char *arg = argv[i];
+
+		if (!strcmp(arg, "-t") || !strcmp(arg, "-s") || !strcmp(arg, "-e") || !strcmp(arg, "-p")) {
+			exp_type = arg;
+			opt = exp_type[1];
+			continue;
+		}
+
+		if (arg[0] == '-')
+			usage(cat_file_usage);
+
+		if (!exp_type) {
+			exp_type = arg;
+			continue;
+		}
+
+		if (obj_name)
+			usage(cat_file_usage);
+
+		obj_name = arg;
+		break;
 	}
 
+	if (!exp_type || !obj_name)
+		usage(cat_file_usage);
+
 	return cat_one_file(opt, exp_type, obj_name);
 }
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/9] git-cat-file: Add --stdin option
  2007-10-25 10:25     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
@ 2007-10-25 10:25       ` Adam Roben
  2007-10-25 10:25         ` [PATCH 5/9] Add tests for git hash-object Adam Roben
  2007-10-26 20:59         ` [PATCH 4/9] git-cat-file: Add --stdin option Junio C Hamano
  2007-10-26 20:56       ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Junio C Hamano
  1 sibling, 2 replies; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben, Brian Downing

This lets you specify object names on stdin instead of on the command line.
When printing object contents or pretty-printing, objects will be printed
preceded by their size:

<size>LF
<content>LF

Signed-off-by: Adam Roben <aroben@apple.com>
---
Brian Downing wrote:
> I think a far more reasonable output format for multiple objects would
> be something like:
> 
> <count> LF
> <raw data> LF
> 
> Where <count> is the number of bytes in the <raw data> as an ASCII
> decimal integer.

Agreed.

 Documentation/git-cat-file.txt |    6 ++++-
 builtin-cat-file.c             |   43 ++++++++++++++++++++++++++++++++++-----
 t/t1005-cat-file.sh            |   35 ++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index afa095c..588d71a 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -8,7 +8,7 @@ git-cat-file - Provide content or type/size information for repository objects
 
 SYNOPSIS
 --------
-'git-cat-file' [-t | -s | -e | -p | <type>] <object>
+'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin | <object>]
 
 DESCRIPTION
 -----------
@@ -23,6 +23,10 @@ OPTIONS
 	For a more complete list of ways to spell object names, see
 	"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
 
+--stdin::
+	Read object names from stdin instead of specifying one on the
+	command line.
+
 -t::
 	Instead of the content, show the object type identified by
 	<object>.
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 3a0be4a..ee46ba4 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -76,7 +76,7 @@ static void pprint_tag(const unsigned char *sha1, const char *buf, unsigned long
 		write_or_die(1, cp, endp - cp);
 }
 
-static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
+static int cat_one_file(int opt, const char *exp_type, const char *obj_name, int print_size)
 {
 	unsigned char sha1[20];
 	enum object_type type;
@@ -139,16 +139,26 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	if (!buf)
 		die("git-cat-file %s: bad file", obj_name);
 
+	if (print_size) {
+		printf("%lu\n", size);
+		fflush(stdout);
+	}
 	write_or_die(1, buf, size);
+	if (print_size) {
+		printf("\n");
+		fflush(stdout);
+	}
 	return 0;
 }
 
-static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin | <sha1>]";
 
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
-	int i, opt = 0;
+	int i, opt = 0, print_size = 0;
+	int read_stdin = 0;
 	const char *exp_type = 0, *obj_name = 0;
+	struct strbuf buf;
 
 	git_config(git_default_config);
 
@@ -161,6 +171,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
+		if (!strcmp(arg, "--stdin")) {
+		    read_stdin = 1;
+		    continue;
+		}
+
 		if (arg[0] == '-')
 			usage(cat_file_usage);
 
@@ -169,15 +184,31 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (obj_name)
+		if (obj_name || read_stdin)
 			usage(cat_file_usage);
 
 		obj_name = arg;
 		break;
 	}
 
-	if (!exp_type || !obj_name)
+	if (!exp_type)
 		usage(cat_file_usage);
 
-	return cat_one_file(opt, exp_type, obj_name);
+	if (!read_stdin) {
+		if (!obj_name)
+			usage(cat_file_usage);
+		return cat_one_file(opt, exp_type, obj_name, 0);
+	}
+
+	print_size = !opt || opt == 'p';
+
+	strbuf_init(&buf, 0);
+	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+		int error = cat_one_file(opt, exp_type, buf.buf, print_size);
+		if (error)
+			return error;
+	}
+	strbuf_release(&buf);
+
+	return 0;
 }
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
index 697354d..2b2d386 100755
--- a/t/t1005-cat-file.sh
+++ b/t/t1005-cat-file.sh
@@ -88,4 +88,39 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
+sha1s="$hello_sha1
+$tree_sha1
+$commit_sha1
+$tag_sha1"
+
+sizes="$hello_size
+$tree_size
+$commit_size
+$tag_size"
+
+test_expect_success \
+    "Pass object hashes on stdin to retrieve sizes" \
+    "test '$sizes' = \"\$(echo '$sha1s' | git cat-file -s --stdin)\""
+
+example_content="Silly example"
+example_size=$(echo "$example_content" | wc -c)
+example_sha1=f24c74a2e500f5ee1332c86b94199f52b1d1d962
+
+echo "$example_content" > example
+
+git update-index --add example
+
+sha1s="$hello_sha1
+$example_sha1"
+
+contents="$hello_size
+$hello_content
+
+$example_size
+$example_content"
+
+test_expect_success \
+    "Pass object hashes on stdin to retrieve contents" \
+    "test '$contents' = \"\$(echo '$sha1s' | git cat-file blob --stdin)\""
+
 test_done
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/9] Add tests for git hash-object
  2007-10-25 10:25       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
@ 2007-10-25 10:25         ` Adam Roben
  2007-10-25 10:25           ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
  2007-10-26 21:00           ` [PATCH 5/9] Add tests for git hash-object Junio C Hamano
  2007-10-26 20:59         ` [PATCH 4/9] git-cat-file: Add --stdin option Junio C Hamano
  1 sibling, 2 replies; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben, Johannes Sixt


Signed-off-by: Adam Roben <aroben@apple.com>
---
Johannes Sixt wrote:
> Adam Roben schrieb:
> > +test_expect_success \
> > +    'hash a file' \
> > +    "test $hello_sha1 = $(git hash-object hello)"
> 
> Put tests in double-quotes; otherwise, the substitutions happen before the test begins, and not as part of the test. 

I think escaping the $(...) is enough to delay command execution.

 t/t1006-hash-object.sh |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)
 create mode 100755 t/t1006-hash-object.sh

diff --git a/t/t1006-hash-object.sh b/t/t1006-hash-object.sh
new file mode 100755
index 0000000..12f95f0
--- /dev/null
+++ b/t/t1006-hash-object.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+test_description='git hash-object'
+
+. ./test-lib.sh
+
+hello_content="Hello World"
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+echo "$hello_content" > hello
+
+test_expect_success \
+    'hash a file' \
+    "test $hello_sha1 = \$(git hash-object hello)"
+
+test_expect_success \
+    'hash from stdin' \
+    "test $hello_sha1 = \$(echo '$hello_content' | git hash-object --stdin)"
+
+test_expect_success \
+    'hash a file and write to database' \
+    "test $hello_sha1 = \$(git hash-object -w hello)"
+
+test_expect_success \
+    'hash from stdin and write to database' \
+    "test $hello_sha1 = \$(echo '$hello_content' | git hash-object -w --stdin)"
+
+test_done
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/9] git-hash-object: Add --stdin-paths option
  2007-10-25 10:25         ` [PATCH 5/9] Add tests for git hash-object Adam Roben
@ 2007-10-25 10:25           ` Adam Roben
  2007-10-25 10:25             ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
  2007-10-26 21:00             ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Junio C Hamano
  2007-10-26 21:00           ` [PATCH 5/9] Add tests for git hash-object Junio C Hamano
  1 sibling, 2 replies; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben

This allows multiple paths to be specified on stdin.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 Documentation/git-hash-object.txt |    5 ++++-
 hash-object.c                     |   29 ++++++++++++++++++++++++++++-
 t/t1006-hash-object.sh            |   22 ++++++++++++++++++++++
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-hash-object.txt b/Documentation/git-hash-object.txt
index 616f196..50fc401 100644
--- a/Documentation/git-hash-object.txt
+++ b/Documentation/git-hash-object.txt
@@ -8,7 +8,7 @@ git-hash-object - Compute object ID and optionally creates a blob from a file
 
 SYNOPSIS
 --------
-'git-hash-object' [-t <type>] [-w] [--stdin] [--] <file>...
+'git-hash-object' [-t <type>] [-w] [--stdin | --stdin-paths] [--] <file>...
 
 DESCRIPTION
 -----------
@@ -32,6 +32,9 @@ OPTIONS
 --stdin::
 	Read the object from standard input instead of from a file.
 
+--stdin-paths::
+	Read file names from stdin instead of from the command-line.
+
 Author
 ------
 Written by Junio C Hamano <junkio@cox.net>
diff --git a/hash-object.c b/hash-object.c
index 18f5017..fd96d50 100644
--- a/hash-object.c
+++ b/hash-object.c
@@ -20,6 +20,7 @@ static void hash_object(const char *path, enum object_type type, int write_objec
 		    ? "Unable to add %s to database"
 		    : "Unable to hash %s", path);
 	printf("%s\n", sha1_to_hex(sha1));
+	maybe_flush_or_die(stdout, "hash to stdout");
 }
 
 static void hash_stdin(const char *type, int write_object)
@@ -31,7 +32,7 @@ static void hash_stdin(const char *type, int write_object)
 }
 
 static const char hash_object_usage[] =
-"git-hash-object [-t <type>] [-w] [--stdin] <file>...";
+"git-hash-object [-t <type>] [-w] [--stdin | --stdin-paths] <file>...";
 
 int main(int argc, char **argv)
 {
@@ -41,6 +42,7 @@ int main(int argc, char **argv)
 	const char *prefix = NULL;
 	int prefix_length = -1;
 	int no_more_flags = 0;
+	int found_stdin_flag = 0;
 
 	for (i = 1 ; i < argc; i++) {
 		if (!no_more_flags && argv[i][0] == '-') {
@@ -62,7 +64,32 @@ int main(int argc, char **argv)
 			}
 			else if (!strcmp(argv[i], "--help"))
 				usage(hash_object_usage);
+			else if (!strcmp(argv[i], "--stdin-paths")) {
+				struct strbuf buf, nbuf;
+
+				if (found_stdin_flag)
+					die("Can't use both --stdin and --stdin-paths");
+				found_stdin_flag = 1;
+
+				strbuf_init(&buf, 0);
+				strbuf_init(&nbuf, 0);
+				while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+					if (buf.buf[0] == '"') {
+						strbuf_reset(&nbuf);
+						if (unquote_c_style(&nbuf, buf.buf, NULL))
+							die("line is badly quoted");
+						strbuf_swap(&buf, &nbuf);
+					}
+					hash_object(buf.buf, type_from_string(type), write_object);
+				}
+				strbuf_release(&buf);
+				strbuf_release(&nbuf);
+			}
 			else if (!strcmp(argv[i], "--stdin")) {
+				if (found_stdin_flag)
+					die("Can't use both --stdin and --stdin-paths");
+				found_stdin_flag = 1;
+
 				hash_stdin(type, write_object);
 			}
 			else
diff --git a/t/t1006-hash-object.sh b/t/t1006-hash-object.sh
index 12f95f0..e747004 100755
--- a/t/t1006-hash-object.sh
+++ b/t/t1006-hash-object.sh
@@ -24,4 +24,26 @@ test_expect_success \
     'hash from stdin and write to database' \
     "test $hello_sha1 = \$(echo '$hello_content' | git hash-object -w --stdin)"
 
+example_content="Silly example"
+example_sha1=f24c74a2e500f5ee1332c86b94199f52b1d1d962
+echo "$example_content" > example
+
+filenames="hello
+example"
+
+sha1s="$hello_sha1
+$example_sha1"
+
+test_expect_success \
+    'hash two files with names on stdin' \
+    "test '$sha1s' = \"\$(echo '$filenames' | git hash-object --stdin-paths)\""
+
+test_expect_success \
+    'hash two files with names on stdin and write to database' \
+    "test '$sha1s' = \"\$(echo '$filenames' | git hash-object --stdin-paths)\""
+
+test_expect_failure \
+    "Can't use --stdin and --stdin-paths together" \
+    "echo '$filenames' | git hash-object --stdin --stdin-paths"
+
 test_done
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe
  2007-10-25 10:25           ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
@ 2007-10-25 10:25             ` Adam Roben
  2007-10-25 10:25               ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
  2007-10-26 21:00             ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben

command_bidi_pipe hands back the stdin and stdout file handles from the
executed command. command_close_bidi_pipe closes these handles and terminates
the process.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 perl/Git.pm |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index 3f4080c..46c5d10 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -51,6 +51,7 @@ require Exporter;
 # Methods which can be called as standalone functions as well:
 @EXPORT_OK = qw(command command_oneline command_noisy
                 command_output_pipe command_input_pipe command_close_pipe
+                command_bidi_pipe command_close_bidi_pipe
                 version exec_path hash_object git_cmd_try);
 
 
@@ -92,6 +93,7 @@ increate nonwithstanding).
 use Carp qw(carp croak); # but croak is bad - throw instead
 use Error qw(:try);
 use Cwd qw(abs_path);
+use IPC::Open2 qw(open2);
 
 }
 
@@ -375,6 +377,60 @@ sub command_close_pipe {
 	_cmd_close($fh, $ctx);
 }
 
+=item command_bidi_pipe ( COMMAND [, ARGUMENTS... ] )
+
+Execute the given C<COMMAND> in the same way as command_output_pipe()
+does but return both an input pipe filehandle and an output pipe filehandle.
+
+The function will return return C<($pid, $pipe_in, $pipe_out, $ctx)>.
+See C<command_close_bidi_pipe()> for details.
+
+=cut
+
+sub command_bidi_pipe {
+	my ($pid, $in, $out);
+	$pid = open2($in, $out, 'git', @_);
+	return ($pid, $in, $out, join(' ', @_));
+}
+
+=item command_close_bidi_pipe ( PID, PIPE_IN, PIPE_OUT [, CTX] )
+
+Close the C<PIPE_IN> and C<PIPE_OUT> as returned from C<command_bidi_pipe()>,
+checking whether the command finished successfully. The optional C<CTX>
+argument is required if you want to see the command name in the error message,
+and it is the fourth value returned by C<command_bidi_pipe()>.  The call idiom
+is:
+
+	my ($pid, $in, $out, $ctx) = $r->command_bidi_pipe('cat-file --stdin');
+	print "000000000\n" $out;
+	while (<$in>) { ... }
+	$r->command_close_bidi_pipe($pid, $in, $out, $ctx);
+
+Note that you should not rely on whatever actually is in C<CTX>;
+currently it is simply the command name but in future the context might
+have more complicated structure.
+
+=cut
+
+sub command_close_bidi_pipe {
+	my ($pid, $in, $out, $ctx) = @_;
+	foreach my $fh ($in, $out) {
+		if (not close $fh) {
+			if ($!) {
+				carp "error closing pipe: $!";
+			} elsif ($? >> 8) {
+				throw Git::Error::Command($ctx, $? >>8);
+			}
+		}
+	}
+
+	waitpid $pid, 0;
+
+	if ($? >> 8) {
+		throw Git::Error::Command($ctx, $? >>8);
+	}
+}
+
 
 =item command_noisy ( COMMAND [, ARGUMENTS... ] )
 
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob
  2007-10-25 10:25             ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
@ 2007-10-25 10:25               ` Adam Roben
  2007-10-25 10:25                 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
  2007-10-26 15:11                 ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Eric Wong
  0 siblings, 2 replies; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben, Eric Wong

These functions are more efficient ways of executing `git hash-object -w` and
`git cat-file blob` when you are dealing with many files/objects.

Signed-off-by: Adam Roben <aroben@apple.com>
---
Eric Wong wrote:
> > +package Git::Commands;
> 
> Can this be a separate file, or a part of Git.pm?  I'm sure other
> scripts can eventually use this and I've been meaning to split
> git-svn.perl into separate files so it's easier to follow.

I ended up making it part of Git.pm, because I realized that made far more
sense than splitting it into a separate file.

 perl/Git.pm |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index 46c5d10..f23edef 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -39,6 +39,9 @@ $VERSION = '0.01';
   my $lastrev = $repo->command_oneline( [ 'rev-list', '--all' ],
                                         STDERR => 0 );
 
+  my $sha1 = $repo->hash_and_insert_object('file.txt');
+  my $contents = $repo->cat_blob($sha1);
+
 =cut
 
 
@@ -218,7 +221,6 @@ sub repository {
 	bless $self, $class;
 }
 
-
 =back
 
 =head1 METHODS
@@ -675,6 +677,93 @@ sub hash_object {
 }
 
 
+=item hash_and_insert_object ( FILENAME )
+
+Compute the SHA1 object id of the given C<FILENAME> and add the object to the
+object database.
+
+The function returns the SHA1 hash.
+
+=cut
+
+# TODO: Support for passing FILEHANDLE instead of FILENAME
+sub hash_and_insert_object {
+	my ($self, $filename) = @_;
+
+	$self->_open_hash_and_insert_object_if_needed();
+	my ($in, $out) = ($self->{hash_object_in}, $self->{hash_object_out});
+
+	print $out $filename, "\n";
+	chomp(my $hash = <$in>);
+	return $hash;
+}
+
+sub _open_hash_and_insert_object_if_needed {
+	my ($self) = @_;
+
+	return if defined($self->{hash_object_pid});
+
+	($self->{hash_object_pid}, $self->{hash_object_in},
+	 $self->{hash_object_out}, $self->{hash_object_ctx}) =
+		command_bidi_pipe(qw(hash-object -w --stdin-paths));
+}
+
+sub _close_hash_and_insert_object {
+	my ($self) = @_;
+
+	return unless defined($self->{hash_object_pid});
+
+	my @vars = map { 'hash_object' . $_ } qw(pid in out ctx);
+
+	command_close_bidi_pipe($self->{@vars});
+	delete $self->{@vars};
+}
+
+=item cat_blob ( SHA1 )
+
+Returns the contents of the blob identified by C<SHA1>.
+
+=cut
+
+sub cat_blob {
+	my ($self, $sha1) = @_;
+
+	$self->_open_cat_blob_if_needed();
+	my ($in, $out) = ($self->{cat_blob_in}, $self->{cat_blob_out});
+
+	print $out $sha1, "\n";
+	chomp(my $size = <$in>);
+
+	my $blob;
+	my $result = read($in, $blob, $size);
+	defined $result or carp $!;
+
+	# Skip past the trailing newline.
+	read($in, my $newline, 1);
+
+	return $blob;
+}
+
+sub _open_cat_blob_if_needed {
+	my ($self) = @_;
+
+	return if defined($self->{cat_blob_pid});
+
+	($self->{cat_blob_pid}, $self->{cat_blob_in},
+	 $self->{cat_blob_out}, $self->{cat_blob_ctx}) =
+		command_bidi_pipe(qw(cat-file blob --stdin));
+}
+
+sub _close_cat_blob {
+	my ($self) = @_;
+
+	return unless defined($self->{cat_blob_pid});
+
+	my @vars = map { 'cat_blob' . $_ } qw(pid in out ctx);
+
+	command_close_bidi_pipe($self->{@vars});
+	delete $self->{@vars};
+}
 
 =back
 
@@ -892,7 +981,11 @@ sub _cmd_close {
 }
 
 
-sub DESTROY { }
+sub DESTROY {
+	my ($self) = @_;
+	$self->_close_hash_and_insert_object();
+	$self->_close_cat_blob();
+}
 
 
 # Pipe implementation for ActiveState Perl.
-- 
1.5.3.4.1342.g32de

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 9/9] git-svn: Make fetch ~1.7x faster
  2007-10-25 10:25               ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
@ 2007-10-25 10:25                 ` Adam Roben
  2007-10-26 15:11                 ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Eric Wong
  1 sibling, 0 replies; 19+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
  To: git; +Cc: Junio Hamano, Adam Roben, Eric Wong

We were spending a lot of time forking/execing git-cat-file and
git-hash-object. We now maintain a global Git repository object in order to use
Git.pm's more efficient hash_and_insert_object and cat_blob methods.

Signed-off-by: Adam Roben <aroben@apple.com>
---
Eric Wong wrote:
> > +sub hash_object {
> > +   my (undef, $fh) = @_;
> > +
> > +   my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
> > +   while (my $line = <$fh>) {
> > +           print $tmp_fh $line;
> > +   }
> > +   close($tmp_fh);
> 
> Related to the above.  It's better to sysread()/syswrite() or
> read()/print() in a loop with a predefined buffer size rather than to
> use a readline() since you could be dealing with files with very long
> lines or binaries with no newline characters in them at all.

Fixed.

> > +   _open_hash_object_if_needed();
> > +   print $_hash_object_out $tmp_filename . "\n";
> 
> Minor, but
> 
>         print $_hash_object_out $tmp_filename, "\n";
> 
> avoids creating a new string.

Fixed.

 git-svn.perl |   40 ++++++++++++++++++----------------------
 1 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 22bb47b..fcb07f5 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -4,7 +4,7 @@
 use warnings;
 use strict;
 use vars qw/	$AUTHOR $VERSION
-		$sha1 $sha1_short $_revision
+		$sha1 $sha1_short $_revision $_repository
 		$_q $_authors %users/;
 $AUTHOR = 'Eric Wong <normalperson@yhbt.net>';
 $VERSION = '@@GIT_VERSION@@';
@@ -225,6 +225,7 @@ unless ($cmd =~ /(?:clone|init|multi-init)$/) {
 		}
 		$ENV{GIT_DIR} = $git_dir;
 	}
+	$_repository = Git->repository(Repository => $ENV{GIT_DIR});
 }
 unless ($cmd =~ /^(?:clone|init|multi-init|commit-diff)$/) {
 	Git::SVN::Migration::migration_check();
@@ -332,6 +333,7 @@ sub cmd_init {
 	                       "as a command-line argument\n";
 	init_subdir(@_);
 	do_git_init_db();
+	$_repository = Git->repository(Repository => $ENV{GIT_DIR});
 
 	Git::SVN->init($url);
 }
@@ -2541,6 +2543,7 @@ use vars qw/@ISA/;
 use strict;
 use warnings;
 use Carp qw/croak/;
+use File::Temp qw/tempfile/;
 use IO::File qw//;
 use Digest::MD5;
 
@@ -2683,14 +2686,8 @@ sub apply_textdelta {
 	my $base = IO::File->new_tmpfile;
 	$base->autoflush(1);
 	if ($fb->{blob}) {
-		defined (my $pid = fork) or croak $!;
-		if (!$pid) {
-			open STDOUT, '>&', $base or croak $!;
-			print STDOUT 'link ' if ($fb->{mode_a} == 120000);
-			exec qw/git-cat-file blob/, $fb->{blob} or croak $!;
-		}
-		waitpid $pid, 0;
-		croak $? if $?;
+		my $contents = $::_repository->cat_blob($fb->{blob});
+		print $base $contents;
 
 		if (defined $exp) {
 			seek $base, 0, 0 or croak $!;
@@ -2729,14 +2726,18 @@ sub close_file {
 			$buf eq 'link ' or die "$path has mode 120000",
 			                       "but is not a link\n";
 		}
-		defined(my $pid = open my $out,'-|') or die "Can't fork: $!\n";
-		if (!$pid) {
-			open STDIN, '<&', $fh or croak $!;
-			exec qw/git-hash-object -w --stdin/ or croak $!;
+
+		my ($tmp_fh, $tmp_filename) = File::Temp::tempfile(UNLINK => 1);
+		my $result;
+		while ($result = sysread($fh, my $string, 1024)) {
+			syswrite($tmp_fh, $string, $result);
 		}
-		chomp($hash = do { local $/; <$out> });
-		close $out or croak $!;
+		defined $result or croak $!;
+		close $tmp_fh or croak $!;
+
 		close $fh or croak $!;
+
+		$hash = $::_repository->hash_and_insert_object($tmp_filename);
 		$hash =~ /^[a-f\d]{40}$/ or die "not a sha1: $hash\n";
 		close $fb->{base} or croak $!;
 	} else {
@@ -3063,13 +3064,8 @@ sub chg_file {
 	} elsif ($m->{mode_a} =~ /^120/ && $m->{mode_b} !~ /^120/) {
 		$self->change_file_prop($fbat,'svn:special',undef);
 	}
-	defined(my $pid = fork) or croak $!;
-	if (!$pid) {
-		open STDOUT, '>&', $fh or croak $!;
-		exec qw/git-cat-file blob/, $m->{sha1_b} or croak $!;
-	}
-	waitpid $pid, 0;
-	croak $? if $?;
+	my $blob = $::_repository->cat_blob($m->{sha1_b});
+	print $fh $blob;
 	$fh->flush == 0 or croak $!;
 	seek $fh, 0, 0 or croak $!;
 
-- 
1.5.3.4.1337.g8e67d-dirty

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob
  2007-10-25 10:25               ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
  2007-10-25 10:25                 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
@ 2007-10-26 15:11                 ` Eric Wong
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Wong @ 2007-10-26 15:11 UTC (permalink / raw)
  To: Adam Roben; +Cc: git, Junio Hamano

Adam Roben <aroben@apple.com> wrote:
> These functions are more efficient ways of executing `git hash-object -w` and
> `git cat-file blob` when you are dealing with many files/objects.
> 
> Signed-off-by: Adam Roben <aroben@apple.com>
> ---
> Eric Wong wrote:
> > > +package Git::Commands;
> > 
> > Can this be a separate file, or a part of Git.pm?  I'm sure other
> > scripts can eventually use this and I've been meaning to split
> > git-svn.perl into separate files so it's easier to follow.
> 
> I ended up making it part of Git.pm, because I realized that made far more
> sense than splitting it into a separate file.
> 
>  perl/Git.pm |   97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 95 insertions(+), 2 deletions(-)

Hi Adam,

Thanks.

> diff --git a/perl/Git.pm b/perl/Git.pm
> index 46c5d10..f23edef 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -39,6 +39,9 @@ $VERSION = '0.01';
>    my $lastrev = $repo->command_oneline( [ 'rev-list', '--all' ],
>                                          STDERR => 0 );
>  
> +  my $sha1 = $repo->hash_and_insert_object('file.txt');
> +  my $contents = $repo->cat_blob($sha1);

I missed this the first time around.  But I'd rather be able to pass a
file handle to cat_blob for writing, instead of returning a potentially
huge string in memory.

> @@ -675,6 +677,93 @@ sub hash_object {
>  }
>  
>  
> +=item hash_and_insert_object ( FILENAME )
> +
> +Compute the SHA1 object id of the given C<FILENAME> and add the object to the
> +object database.
> +
> +The function returns the SHA1 hash.
> +
> +=cut
> +
> +# TODO: Support for passing FILEHANDLE instead of FILENAME

Filenames are fine for this input since they (are/should be) generated
by File::Temp and not from an untrusted repo.

We should, however assert that the caller of this function
isn't using a stupid filename with "\n" in it.

> +sub hash_and_insert_object {
> +	my ($self, $filename) = @_;
> +
> +	$self->_open_hash_and_insert_object_if_needed();
> +	my ($in, $out) = ($self->{hash_object_in}, $self->{hash_object_out});
> +
> +	print $out $filename, "\n";
> +	chomp(my $hash = <$in>);
> +	return $hash;
> +}
> +
> +sub _open_hash_and_insert_object_if_needed {
> +	my ($self) = @_;
> +
> +	return if defined($self->{hash_object_pid});
> +
> +	($self->{hash_object_pid}, $self->{hash_object_in},
> +	 $self->{hash_object_out}, $self->{hash_object_ctx}) =
> +		command_bidi_pipe(qw(hash-object -w --stdin-paths));
> +}
> +
> +sub _close_hash_and_insert_object {
> +	my ($self) = @_;
> +
> +	return unless defined($self->{hash_object_pid});
> +
> +	my @vars = map { 'hash_object' . $_ } qw(pid in out ctx);

It looks like you're missing a '_' in there.

> +	command_close_bidi_pipe($self->{@vars});
> +	delete $self->{@vars};
> +}
> +


> +=item cat_blob ( SHA1 )
> +
> +Returns the contents of the blob identified by C<SHA1>.
> +
> +=cut
> +
> +sub cat_blob {
> +	my ($self, $sha1) = @_;
> +
> +	$self->_open_cat_blob_if_needed();
> +	my ($in, $out) = ($self->{cat_blob_in}, $self->{cat_blob_out});
> +
> +	print $out $sha1, "\n";
> +	chomp(my $size = <$in>);
> +
> +	my $blob;
> +	my $result = read($in, $blob, $size);
> +	defined $result or carp $!;
> +
> +	# Skip past the trailing newline.
> +	read($in, my $newline, 1);
> +
> +	return $blob;
> +}

However, I'd very much like to be able to pass a file handle to this
function.  This should read()/print() to a file handle passed to it in a
loop rather than slurping all of $size at once, since the files we're
receiving can be huge.

I'd also be happier if we checked that we actually read $size bytes in
the loop, and that $newline is actually "\n" to safeguard against bugs
in cat-blob.

> +sub _open_cat_blob_if_needed {
> +	my ($self) = @_;
> +
> +	return if defined($self->{cat_blob_pid});
> +
> +	($self->{cat_blob_pid}, $self->{cat_blob_in},
> +	 $self->{cat_blob_out}, $self->{cat_blob_ctx}) =
> +		command_bidi_pipe(qw(cat-file blob --stdin));
> +}
> +
> +sub _close_cat_blob {
> +	my ($self) = @_;
> +
> +	return unless defined($self->{cat_blob_pid});
> +
> +	my @vars = map { 'cat_blob' . $_ } qw(pid in out ctx);

It looks like you're missing a '_' here, too.

> +	command_close_bidi_pipe($self->{@vars});
> +	delete $self->{@vars};
> +}
>  

One more nit, I'm a bit paranoid, but I personally like to die/croak if
the result of every print()/syswrite() to make sure the pipe we're
writing to didn't die or if there were other error indicators.

Hopefully that's the last of tweaks I'd like to see :)

-- 
Eric Wong

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/9] git-cat-file: Make option parsing a little more flexible
  2007-10-25 10:25     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
  2007-10-25 10:25       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
@ 2007-10-26 20:56       ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2007-10-26 20:56 UTC (permalink / raw)
  To: Adam Roben; +Cc: git

Adam Roben <aroben@apple.com> writes:

> This will make it easier to add newer options later.

A good change in principle.

> diff --git a/builtin-cat-file.c b/builtin-cat-file.c
> index 34a63d1..3a0be4a 100644
> --- a/builtin-cat-file.c
> +++ b/builtin-cat-file.c
> @@ -143,23 +143,41 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
>  	return 0;
>  }
>  
> +static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
> +
>  int cmd_cat_file(int argc, const char **argv, const char *prefix)
>  {
> -	int opt;
> -	const char *exp_type, *obj_name;
> +	int i, opt = 0;
> +	const char *exp_type = 0, *obj_name = 0;

NULL pointer constants in git sources are spelled "NULL", not
"0".

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/9] git-cat-file: Add --stdin option
  2007-10-25 10:25       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
  2007-10-25 10:25         ` [PATCH 5/9] Add tests for git hash-object Adam Roben
@ 2007-10-26 20:59         ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2007-10-26 20:59 UTC (permalink / raw)
  To: Adam Roben; +Cc: git, Brian Downing

Adam Roben <aroben@apple.com> writes:

> @@ -23,6 +23,10 @@ OPTIONS
>  	For a more complete list of ways to spell object names, see
>  	"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
>  
> +--stdin::
> +	Read object names from stdin instead of specifying one on the
> +	command line.
> +

This does not talk about modified output format: what the format
is, nor when that modified format is used.

> @@ -139,16 +139,26 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
>  	if (!buf)
>  		die("git-cat-file %s: bad file", obj_name);
>  
> +	if (print_size) {
> +		printf("%lu\n", size);
> +		fflush(stdout);
> +	}
>  	write_or_die(1, buf, size);
> +	if (print_size) {
> +		printf("\n");
> +		fflush(stdout);
> +	}
>  	return 0;
>  }
>  

Not that I object strongly to it, but do we need extra LF after
the contents?

  - "It would help readers written in typical scripting
    languages" is an acceptable answer, but I doubt that is the
    case --- the reader is given the number of bytes and is
    going to "read($pipe, $buf, $that_size)" anyway.

  - "The reader can assert that one-byte past the content is a
    LF to catch errors, and this LF would help re-synchronize
    after such an error" would be another acceptable answer, but
    for the re-synchronization to work, the output needs to tell
    which record each chunk is about (i.e. if the output were
    "<type> <sha1> <size>LF<contents>LF", the "re-sync" argument
    would make a bit more sense).

> +	print_size = !opt || opt == 'p';

Needs a bit of comment here, and in the documentation.  E.g.

	git-cat-file --stdin -t <list-of-sha1
        git-cat-file --stdin -s <list-of-sha1

	are ways to check types and sizes of the objects in the
	list.

How does --stdin interact with -e?

How does --stdin interact with -p when printing a tree or a tag
object?

How does "blob --stdin" do when input sequence contains a non
blob SHA1?

It almost feels that --stdin should be named something else,
such as --batch or --bulk, as it is not just affecting the
input.

Here is an alternative suggestion.

   Two new options, --batch and --batch-check, are introduced.
   These options are incompatible with -[tsep] or an object type
   given as the first parameter to git-cat-file.

   * git-cat-file --batch-check <list-of-sha1

     outputs a record of this form

          <sha1> SP <type> SP <size> LF

     for each of the input lines.

   * git-cat-file --batch <list-of-sha1

     outputs a record of this form

          <sha1> SP <type> SP <size> LF <contents> LF

     for each of the input lines.

  For a missing object, either option gives a record of form:

          <sha1> SP missing LF

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/9] Add tests for git hash-object
  2007-10-25 10:25         ` [PATCH 5/9] Add tests for git hash-object Adam Roben
  2007-10-25 10:25           ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
@ 2007-10-26 21:00           ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2007-10-26 21:00 UTC (permalink / raw)
  To: Adam Roben; +Cc: git, Johannes Sixt

Adam Roben <aroben@apple.com> writes:

> +test_expect_success \
> +    'hash a file' \
> +    "test $hello_sha1 = \$(git hash-object hello)"
> +
> +test_expect_success \
> +    'hash from stdin' \
> +    "test $hello_sha1 = \$(echo '$hello_content' | git hash-object --stdin)"

Needs to make sure no object has been written to the object
database at this point?

> +test_expect_success \
> +    'hash a file and write to database' \
> +    "test $hello_sha1 = \$(git hash-object -w hello)"

... and make sure the objectis written here?

> +test_expect_success \
> +    'hash from stdin and write to database' \
> +    "test $hello_sha1 = \$(echo '$hello_content' | git hash-object -w --stdin)"
> +
> +test_done

... and/or here?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/9] git-hash-object: Add --stdin-paths option
  2007-10-25 10:25           ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
  2007-10-25 10:25             ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
@ 2007-10-26 21:00             ` Junio C Hamano
  2007-10-26 23:19               ` Brian Downing
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2007-10-26 21:00 UTC (permalink / raw)
  To: Adam Roben; +Cc: git

Adam Roben <aroben@apple.com> writes:

> This allows multiple paths to be specified on stdin.

Ok.  List of paths is certainly a good thing to have.

In addition, if you are enhancing cat-file to spew chunked
output out, I suspect that there should be a mode of operation
for hash-object that eats that data format.  IOW, this pipe

	git-cat-file --batch <list-of-sha1 |
        git-hash-object --batch

should be an intuitive no-op, shouldn't it?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/9] git-hash-object: Add --stdin-paths option
  2007-10-26 21:00             ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Junio C Hamano
@ 2007-10-26 23:19               ` Brian Downing
  2007-10-27  1:02                 ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Downing @ 2007-10-26 23:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Adam Roben, git

On Fri, Oct 26, 2007 at 02:00:47PM -0700, Junio C Hamano wrote:
> In addition, if you are enhancing cat-file to spew chunked
> output out, I suspect that there should be a mode of operation
> for hash-object that eats that data format.  IOW, this pipe
> 
> 	git-cat-file --batch <list-of-sha1 |
>         git-hash-object --batch
> 
> should be an intuitive no-op, shouldn't it?

I think that's an obviously good thing to do.  However, given your
suggested output format (which I also like):

>    * git-cat-file --batch <list-of-sha1
> 
>      outputs a record of this form
> 
>           <sha1> SP <type> SP <size> LF <contents> LF
> 
>      for each of the input lines.

What should the input behavior be?  Obviously the sha1 will probably
not be known on the input side.  Should that simply be optional (i.e.
it will accept either "<sha1> SP <type> SP <size>" or "<type> SP <size>"
or should it only accept the latter, and a dummy sha1 will need to be
filled in if the sha1 is not known (presumably "000...000")?

-bcd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/9] git-hash-object: Add --stdin-paths option
  2007-10-26 23:19               ` Brian Downing
@ 2007-10-27  1:02                 ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2007-10-27  1:02 UTC (permalink / raw)
  To: Brian Downing; +Cc: Adam Roben, git

bdowning@lavos.net (Brian Downing) writes:

> On Fri, Oct 26, 2007 at 02:00:47PM -0700, Junio C Hamano wrote:
>> In addition, if you are enhancing cat-file to spew chunked
>> output out, I suspect that there should be a mode of operation
>> for hash-object that eats that data format.  IOW, this pipe
>> 
>> 	git-cat-file --batch <list-of-sha1 |
>>         git-hash-object --batch
>> 
>> should be an intuitive no-op, shouldn't it?
>
> I think that's an obviously good thing to do.  However, given your
> suggested output format (which I also like):
>
>>    * git-cat-file --batch <list-of-sha1
>> 
>>      outputs a record of this form
>> 
>>           <sha1> SP <type> SP <size> LF <contents> LF
>> 
>>      for each of the input lines.
>
> What should the input behavior be?  Obviously the sha1 will probably
> not be known on the input side.  Should that simply be optional (i.e.
> it will accept either "<sha1> SP <type> SP <size>" or "<type> SP <size>"
> or should it only accept the latter, and a dummy sha1 will need to be
> filled in if the sha1 is not known (presumably "000...000")?

Yeah, you caught me ;-)

Either making it optional or requiring a dummy value would work.
If a non-dummy value is given, we could use it to validate it.

But that would not be a useful application anyway.  So perhaps
just the sequence of "<type> SP <size> LF <contents> LF" would
be the most sensible thing to do.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2007-10-27  1:02 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-25 10:25 [RESEND PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-25 10:25   ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-25 10:25     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-25 10:25       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-25 10:25         ` [PATCH 5/9] Add tests for git hash-object Adam Roben
2007-10-25 10:25           ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-25 10:25             ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-25 10:25               ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
2007-10-25 10:25                 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-26 15:11                 ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Eric Wong
2007-10-26 21:00             ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Junio C Hamano
2007-10-26 23:19               ` Brian Downing
2007-10-27  1:02                 ` Junio C Hamano
2007-10-26 21:00           ` [PATCH 5/9] Add tests for git hash-object Junio C Hamano
2007-10-26 20:59         ` [PATCH 4/9] git-cat-file: Add --stdin option Junio C Hamano
2007-10-26 20:56       ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2007-10-23  5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-23  5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-23  6:59   ` Johannes Sixt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).