* [PATCH 0/9] Make git-svn fetch ~1.7x faster
@ 2007-10-23 5:46 Adam Roben
2007-10-23 5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-23 6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
0 siblings, 2 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
This patch series makes git-svn fetch about 1.7x faster by reducing the number
of forks/execs that occur for each file retrieved from Subversion. To do so, a
few new options are added to git-cat-file and git-hash-object to allow
continuous input on stdin and continuous output on stdout, so that one instance
of each of these commands can be kept running for the duration of the fetch.
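As a rough illustration of the "one long-running instance" idea (not the code in this series), a filter that answers one request per input line might be sketched as follows. The name serve_requests and the "reply with the line length" behaviour are made up for the example; the point is the flush after every reply, without which a caller waiting on the other end of a pipe would block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical request loop: read one request per line from `in`, write
 * one reply line to `out`, and flush so a caller blocked on a pipe sees
 * the reply immediately. Returns the number of requests served.
 */
static int serve_requests(FILE *in, FILE *out)
{
	char line[4096];
	int served = 0;

	assert(in && out);
	while (fgets(line, sizeof(line), in)) {
		line[strcspn(line, "\n")] = '\0';  /* drop the trailing newline */
		/* Stand-in for real work: reply with the request's length. */
		fprintf(out, "%lu\n", (unsigned long)strlen(line));
		fflush(out);
		served++;
	}
	return served;
}
```

A driver would call serve_requests(stdin, stdout) once and keep the process alive until its stdin is closed.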
The series is based on top of next. I considered basing it on top of the
parse_options work since I touch the option parsing in these two commands, but
I didn't know how wise it would be to base a patch series on something in pu.
I tried to add some new tests for cat-file and hash-object to ensure that I
didn't break old behavior, but I'm not very experienced with the git test suite
and I'm sure my tests could use some improvement. This is the most invasive
change I've yet made to git, so comments are more than welcome.
-Adam
--
Documentation/git-cat-file.txt | 11 +++-
Documentation/git-hash-object.txt | 5 +-
builtin-cat-file.c | 96 +++++++++++++++++++++----
git-svn.perl | 94 +++++++++++++++++++------
hash-object.c | 29 ++++++++-
perl/Git.pm | 56 +++++++++++++++
t/t1005-cat-file.sh | 139 +++++++++++++++++++++++++++++++++++++
t/t1006-hash-object.sh | 49 +++++++++++++
8 files changed, 438 insertions(+), 41 deletions(-)
--
* [PATCH 1/9] Add tests for git cat-file
2007-10-23 5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-23 6:59 ` [PATCH 1/9] Add tests for git cat-file Johannes Sixt
2007-10-23 6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
1 sibling, 2 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben
Signed-off-by: Adam Roben <aroben@apple.com>
---
t/t1005-cat-file.sh | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 91 insertions(+), 0 deletions(-)
create mode 100755 t/t1005-cat-file.sh
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
new file mode 100755
index 0000000..2fdc446
--- /dev/null
+++ b/t/t1005-cat-file.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='git cat-file'
+
+. ./test-lib.sh
+
+maybe_remove_timestamp()
+{
+ if test -z "$2"; then
+ echo "$1"
+ else
+ echo "$1" | sed -e 's/ [0-9]\{10\} [+-][0-9]\{4\}$//'
+ fi
+}
+
+run_tests()
+{
+ type=$1
+ sha1=$2
+ size=$3
+ content=$4
+ pretty_content=$5
+ no_timestamp=$6
+
+ test_expect_success \
+ "$type exists" \
+ "git cat-file -e $sha1"
+ test_expect_success \
+ "Type of $type is correct" \
+ "test $type = \"$(git cat-file -t $sha1)\""
+ test_expect_success \
+ "Size of $type is correct" \
+ "test $size = \"$(git cat-file -s $sha1)\""
+ test -z "$content" || test_expect_success \
+ "Content of $type is correct" \
+ "test \"$(maybe_remove_timestamp "$content" $no_timestamp)\" = \"$(maybe_remove_timestamp "$(git cat-file $type $sha1)" $no_timestamp)\""
+ test_expect_success \
+ "Pretty content of $type is correct" \
+ "test \"$(maybe_remove_timestamp "$pretty_content" $no_timestamp)\" = \"$(maybe_remove_timestamp "$(git cat-file -p $sha1)" $no_timestamp)\""
+}
+
+hello_content="Hello World"
+hello_size=$(echo "$hello_content" | wc -c)
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+
+echo "$hello_content" > hello
+
+git update-index --add hello
+
+run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
+
+tree_sha1=$(git write-tree)
+tree_size=33
+tree_pretty_content="100644 blob $hello_sha1 hello"
+
+run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+
+commit_message="Intial commit"
+commit_sha1=$(echo "$commit_message" | git commit-tree $tree_sha1)
+commit_size=177
+commit_content="tree $tree_sha1
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
+
+$commit_message"
+
+run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
+
+tag_header="object $hello_sha1
+type blob
+tag hellotag
+tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_description="This is a tag"
+tag_content="$tag_header
+
+$tag_description"
+tag_pretty_content="$tag_header
+Thu Jan 1 00:00:00 1970 +0000
+
+$tag_description"
+
+tag_sha1=$(echo "$tag_content" | git mktag)
+tag_size=$(echo "$tag_content" | wc -c)
+
+run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_pretty_content"
+
+test_expect_success \
+ "Reach a blob from a tag pointing to it" \
+ "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
+
+test_done
--
1.5.3.4.1333.ga2f32
* [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file
2007-10-23 5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-23 6:59 ` [PATCH 1/9] Add tests for git cat-file Johannes Sixt
1 sibling, 1 reply; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben
I separated the logic of parsing the arguments from the logic of fetching and
outputting the data. cat_one_file now does the latter.
Signed-off-by: Adam Roben <aroben@apple.com>
---
builtin-cat-file.c | 38 ++++++++++++++++++++++----------------
1 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index f132d58..34a63d1 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -76,31 +76,16 @@ static void pprint_tag(const unsigned char *sha1, const char *buf, unsigned long
write_or_die(1, cp, endp - cp);
}
-int cmd_cat_file(int argc, const char **argv, const char *prefix)
+static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
{
unsigned char sha1[20];
enum object_type type;
void *buf;
unsigned long size;
- int opt;
- const char *exp_type, *obj_name;
-
- git_config(git_default_config);
- if (argc != 3)
- usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
- exp_type = argv[1];
- obj_name = argv[2];
if (get_sha1(obj_name, sha1))
die("Not a valid object name %s", obj_name);
- opt = 0;
- if ( exp_type[0] == '-' ) {
- opt = exp_type[1];
- if ( !opt || exp_type[2] )
- opt = -1; /* Not a single character option */
- }
-
buf = NULL;
switch (opt) {
case 't':
@@ -157,3 +142,24 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
write_or_die(1, buf, size);
return 0;
}
+
+int cmd_cat_file(int argc, const char **argv, const char *prefix)
+{
+ int opt;
+ const char *exp_type, *obj_name;
+
+ git_config(git_default_config);
+ if (argc != 3)
+ usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
+ exp_type = argv[1];
+ obj_name = argv[2];
+
+ opt = 0;
+ if ( exp_type[0] == '-' ) {
+ opt = exp_type[1];
+ if ( !opt || exp_type[2] )
+ opt = -1; /* Not a single character option */
+ }
+
+ return cat_one_file(opt, exp_type, obj_name);
+}
--
1.5.3.4.1333.ga2f32
* [PATCH 3/9] git-cat-file: Make option parsing a little more flexible
2007-10-23 5:46 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
0 siblings, 1 reply; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben
This will make it easier to add newer options later.
Signed-off-by: Adam Roben <aroben@apple.com>
---
builtin-cat-file.c | 42 ++++++++++++++++++++++++++++++------------
1 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 34a63d1..3a0be4a 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -143,23 +143,41 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
return 0;
}
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
+
int cmd_cat_file(int argc, const char **argv, const char *prefix)
{
- int opt;
- const char *exp_type, *obj_name;
+ int i, opt = 0;
+ const char *exp_type = 0, *obj_name = 0;
git_config(git_default_config);
- if (argc != 3)
- usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
- exp_type = argv[1];
- obj_name = argv[2];
-
- opt = 0;
- if ( exp_type[0] == '-' ) {
- opt = exp_type[1];
- if ( !opt || exp_type[2] )
- opt = -1; /* Not a single character option */
+
+ for (i = 1; i < argc; ++i) {
+ const char *arg = argv[i];
+
+ if (!strcmp(arg, "-t") || !strcmp(arg, "-s") || !strcmp(arg, "-e") || !strcmp(arg, "-p")) {
+ exp_type = arg;
+ opt = exp_type[1];
+ continue;
+ }
+
+ if (arg[0] == '-')
+ usage(cat_file_usage);
+
+ if (!exp_type) {
+ exp_type = arg;
+ continue;
+ }
+
+ if (obj_name)
+ usage(cat_file_usage);
+
+ obj_name = arg;
+ break;
}
+ if (!exp_type || !obj_name)
+ usage(cat_file_usage);
+
return cat_one_file(opt, exp_type, obj_name);
}
--
1.5.3.4.1333.ga2f32
* [PATCH 4/9] git-cat-file: Add --stdin option
2007-10-23 5:46 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 5/9] git-cat-file: Add --separator option Adam Roben
0 siblings, 1 reply; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben
This lets you specify object names on stdin instead of on the command line.
Signed-off-by: Adam Roben <aroben@apple.com>
---
Documentation/git-cat-file.txt | 6 +++++-
builtin-cat-file.c | 26 ++++++++++++++++++++++----
t/t1005-cat-file.sh | 14 ++++++++++++++
3 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index afa095c..588d71a 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -8,7 +8,7 @@ git-cat-file - Provide content or type/size information for repository objects
SYNOPSIS
--------
-'git-cat-file' [-t | -s | -e | -p | <type>] <object>
+'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin | <object>]
DESCRIPTION
-----------
@@ -23,6 +23,10 @@ OPTIONS
For a more complete list of ways to spell object names, see
"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
+--stdin::
+ Read object names from stdin instead of specifying one on the
+ command line.
+
-t::
Instead of the content, show the object type identified by
<object>.
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 3a0be4a..0f1ffe5 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -143,12 +143,14 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
return 0;
}
-static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin | <sha1>]";
int cmd_cat_file(int argc, const char **argv, const char *prefix)
{
int i, opt = 0;
+ int read_stdin = 0;
const char *exp_type = 0, *obj_name = 0;
+ struct strbuf buf;
git_config(git_default_config);
@@ -161,6 +163,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
continue;
}
+ if (!strcmp(arg, "--stdin")) {
+ read_stdin = 1;
+ continue;
+ }
+
if (arg[0] == '-')
usage(cat_file_usage);
@@ -169,15 +176,26 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
continue;
}
- if (obj_name)
+ if (obj_name || read_stdin)
usage(cat_file_usage);
obj_name = arg;
break;
}
- if (!exp_type || !obj_name)
+ if (!read_stdin) {
+ if (!exp_type || !obj_name)
usage(cat_file_usage);
+ return cat_one_file(opt, exp_type, obj_name);
+ }
- return cat_one_file(opt, exp_type, obj_name);
+ strbuf_init(&buf, 0);
+ while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+ int error = cat_one_file(opt, exp_type, buf.buf);
+ if (error)
+ return error;
+ }
+ strbuf_release(&buf);
+
+ return 0;
}
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
index 2fdc446..49eb89d 100755
--- a/t/t1005-cat-file.sh
+++ b/t/t1005-cat-file.sh
@@ -88,4 +88,18 @@ test_expect_success \
"Reach a blob from a tag pointing to it" \
"test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
+sha1s="$hello_sha1
+$tree_sha1
+$commit_sha1
+$tag_sha1"
+
+sizes="$hello_size
+$tree_size
+$commit_size
+$tag_size"
+
+test_expect_success \
+ "Pass object hashes on stdin" \
+ "test \"$sizes\" = \"$(echo "$sha1s" | git cat-file -s --stdin)\""
+
test_done
--
1.5.3.4.1333.ga2f32
* [PATCH 5/9] git-cat-file: Add --separator option
2007-10-23 5:46 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 6/9] Add tests for git hash-object Adam Roben
2007-10-24 3:43 ` [PATCH 5/9] git-cat-file: Add --separator option Brian Downing
0 siblings, 2 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben
This lets the user specify a string to be printed in between the output from
each object passed on stdin.
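For readers wondering how a consumer would use the separator, here is a minimal sketch of the reading side in C. It is only an illustration: read_record is a hypothetical helper, it assumes text content with lines shorter than the buffer, and it shares the weakness of any in-band separator (a content line that happens to end with the separator string is mistaken for the end of the record).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read one record from `in` into `dst`, stopping at a line that ends with
 * `sep` followed by a newline. Returns the record length, or -1 on EOF
 * before a separator was seen or if `dst` is too small.
 */
static long read_record(FILE *in, const char *sep, char *dst, size_t cap)
{
	char line[4096];
	size_t used = 0, seplen = strlen(sep);

	assert(in && dst && seplen > 0);
	while (fgets(line, sizeof(line), in)) {
		size_t len = strlen(line);
		int last = 0;

		/* The separator is glued to the content if it had no final newline. */
		if (len > seplen && line[len - 1] == '\n' &&
		    !strncmp(line + len - 1 - seplen, sep, seplen)) {
			len -= seplen + 1;  /* strip separator and its newline */
			last = 1;
		}
		if (used + len >= cap)
			return -1;
		memcpy(dst + used, line, len);
		used += len;
		if (last) {
			dst[used] = '\0';
			return (long)used;
		}
	}
	return -1;
}
```

Calling read_record repeatedly on the same stream yields one object's output per call.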
Signed-off-by: Adam Roben <aroben@apple.com>
---
Documentation/git-cat-file.txt | 7 ++++++-
builtin-cat-file.c | 28 +++++++++++++++++++++++++---
t/t1005-cat-file.sh | 36 +++++++++++++++++++++++++++++++++++-
3 files changed, 66 insertions(+), 5 deletions(-)
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 588d71a..7a59a5e 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -8,7 +8,7 @@ git-cat-file - Provide content or type/size information for repository objects
SYNOPSIS
--------
-'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin | <object>]
+'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin [--separator <string>] | <object>]
DESCRIPTION
-----------
@@ -27,6 +27,11 @@ OPTIONS
Read object names from stdin instead of specifying one on the
command line.
+--separator::
+ A string to print in between the output for each object passed on
+ stdin. A newline will be appended to the separator each time it is
+ printed.
+
-t::
Instead of the content, show the object type identified by
<object>.
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 0f1ffe5..9ae3184 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -92,6 +92,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
type = sha1_object_info(sha1, NULL);
if (type > 0) {
printf("%s\n", typename(type));
+ fflush(stdout);
return 0;
}
break;
@@ -100,6 +101,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
type = sha1_object_info(sha1, &size);
if (type > 0) {
printf("%lu\n", size);
+ fflush(stdout);
return 0;
}
break;
@@ -143,14 +145,16 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
return 0;
}
-static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin | <sha1>]";
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin [--separator <string>] | <sha1>]";
+
+static const char *separator;
int cmd_cat_file(int argc, const char **argv, const char *prefix)
{
int i, opt = 0;
int read_stdin = 0;
const char *exp_type = 0, *obj_name = 0;
- struct strbuf buf;
+ struct strbuf buf, sbuf;
git_config(git_default_config);
@@ -168,6 +172,13 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
continue;
}
+ if (!strcmp(arg, "--separator")) {
+ if (++i == argc)
+ usage(cat_file_usage);
+ separator = argv[i];
+ continue;
+ }
+
if (arg[0] == '-')
usage(cat_file_usage);
@@ -184,18 +195,29 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
}
if (!read_stdin) {
- if (!exp_type || !obj_name)
+ if (!exp_type || !obj_name || separator)
usage(cat_file_usage);
return cat_one_file(opt, exp_type, obj_name);
}
+ if (separator) {
+ strbuf_init(&sbuf, 0);
+ strbuf_addstr(&sbuf, separator);
+ strbuf_addch(&sbuf, '\n');
+ }
+
strbuf_init(&buf, 0);
while (strbuf_getline(&buf, stdin, '\n') != EOF) {
int error = cat_one_file(opt, exp_type, buf.buf);
if (error)
return error;
+ if (separator)
+ write_or_die(1, sbuf.buf, sbuf.len);
}
strbuf_release(&buf);
+ if (separator)
+ strbuf_release(&sbuf);
+
return 0;
}
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
index 49eb89d..52a3efd 100755
--- a/t/t1005-cat-file.sh
+++ b/t/t1005-cat-file.sh
@@ -99,7 +99,41 @@ $commit_size
$tag_size"
test_expect_success \
- "Pass object hashes on stdin" \
+ "Print sizes for object hashes on stdin" \
"test \"$sizes\" = \"$(echo "$sha1s" | git cat-file -s --stdin)\""
+separator="TESTSEPARATOR"
+
+separated_sizes="$hello_size
+$separator
+$tree_size
+$separator
+$commit_size
+$separator
+$tag_size
+$separator"
+
+test_expect_success \
+ "Print sizes for object hashes on stdin with --separator" \
+ "test \"$separated_sizes\" = \"$(echo "$sha1s" | git cat-file -s --stdin --separator $separator)\""
+
+sha1s="$hello_sha1
+$hello_sha1"
+
+contents="$hello_content
+$hello_content"
+
+separated_contents="$hello_content
+$separator
+$hello_content
+$separator"
+
+test_expect_success \
+ "Print objects for object hashes on stdin" \
+ "test \"$contents\" = \"$(echo "$sha1s" | git cat-file blob --stdin)\""
+
+test_expect_success \
+ "Print objects for object hashes on stdin with --separator" \
+ "test \"$separated_contents\" = \"$(echo "$sha1s" | git cat-file blob --stdin --separator $separator)\""
+
test_done
--
1.5.3.4.1333.ga2f32
* [PATCH 6/9] Add tests for git hash-object
2007-10-23 5:46 ` [PATCH 5/9] git-cat-file: Add --separator option Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-23 6:59 ` [PATCH 6/9] Add tests for git hash-object Johannes Sixt
2007-10-24 3:43 ` [PATCH 5/9] git-cat-file: Add --separator option Brian Downing
1 sibling, 2 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben
Signed-off-by: Adam Roben <aroben@apple.com>
---
t/t1006-hash-object.sh | 27 +++++++++++++++++++++++++++
1 files changed, 27 insertions(+), 0 deletions(-)
create mode 100755 t/t1006-hash-object.sh
diff --git a/t/t1006-hash-object.sh b/t/t1006-hash-object.sh
new file mode 100755
index 0000000..77b8eca
--- /dev/null
+++ b/t/t1006-hash-object.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+test_description='git hash-object'
+
+. ./test-lib.sh
+
+hello_content="Hello World"
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+echo "$hello_content" > hello
+
+test_expect_success \
+ 'hash a file' \
+ "test $hello_sha1 = $(git hash-object hello)"
+
+test_expect_success \
+ 'hash from stdin' \
+ "test $hello_sha1 = $(echo "$hello_content" | git hash-object --stdin)"
+
+test_expect_success \
+ 'hash a file and write to database' \
+ "test $hello_sha1 = $(git hash-object -w hello)"
+
+test_expect_success \
+ 'hash from stdin and write to database' \
+ "test $hello_sha1 = $(echo "$hello_content" | git hash-object -w --stdin)"
+
+test_done
--
1.5.3.4.1333.ga2f32
* [PATCH 7/9] git-hash-object: Add --stdin-paths option
2007-10-23 5:46 ` [PATCH 6/9] Add tests for git hash-object Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-23 5:53 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Shawn O. Pearce
2007-10-23 6:59 ` [PATCH 6/9] Add tests for git hash-object Johannes Sixt
1 sibling, 2 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben
This allows multiple paths to be specified on stdin.
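Lines that begin with a double quote are treated as C-style quoted paths and passed through unquote_c_style() in the patch. As a rough idea of what that decoding involves, here is a simplified stand-in; unquote_path is a hypothetical name, and unlike git's real function it handles only a few escapes and no octal sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for git's unquote_c_style(): decode a "..." path
 * with \\, \", \n and \t escapes into dst. Returns 0 on success, -1 if
 * the line is badly quoted or dst is too small.
 */
static int unquote_path(char *dst, size_t cap, const char *src)
{
	size_t n = 0;

	assert(dst && src && cap > 0);
	if (*src++ != '"')
		return -1;
	while (*src && *src != '"') {
		char c = *src++;

		if (c == '\\') {
			switch (*src++) {
			case '\\': c = '\\'; break;
			case '"':  c = '"';  break;
			case 'n':  c = '\n'; break;
			case 't':  c = '\t'; break;
			default:   return -1;  /* unknown escape or truncated input */
			}
		}
		if (n + 1 >= cap)
			return -1;
		dst[n++] = c;
	}
	if (src[0] != '"' || src[1] != '\0')
		return -1;  /* missing closing quote or trailing garbage */
	dst[n] = '\0';
	return 0;
}
```

Unquoted lines would be used as paths verbatim, as the patch does.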
Signed-off-by: Adam Roben <aroben@apple.com>
---
Documentation/git-hash-object.txt | 5 ++++-
hash-object.c | 29 ++++++++++++++++++++++++++++-
t/t1006-hash-object.sh | 22 ++++++++++++++++++++++
3 files changed, 54 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-hash-object.txt b/Documentation/git-hash-object.txt
index 616f196..50fc401 100644
--- a/Documentation/git-hash-object.txt
+++ b/Documentation/git-hash-object.txt
@@ -8,7 +8,7 @@ git-hash-object - Compute object ID and optionally creates a blob from a file
SYNOPSIS
--------
-'git-hash-object' [-t <type>] [-w] [--stdin] [--] <file>...
+'git-hash-object' [-t <type>] [-w] [--stdin | --stdin-paths] [--] <file>...
DESCRIPTION
-----------
@@ -32,6 +32,9 @@ OPTIONS
--stdin::
Read the object from standard input instead of from a file.
+--stdin-paths::
+ Read file names from stdin instead of from the command-line.
+
Author
------
Written by Junio C Hamano <junkio@cox.net>
diff --git a/hash-object.c b/hash-object.c
index 18f5017..fd96d50 100644
--- a/hash-object.c
+++ b/hash-object.c
@@ -20,6 +20,7 @@ static void hash_object(const char *path, enum object_type type, int write_objec
? "Unable to add %s to database"
: "Unable to hash %s", path);
printf("%s\n", sha1_to_hex(sha1));
+ maybe_flush_or_die(stdout, "hash to stdout");
}
static void hash_stdin(const char *type, int write_object)
@@ -31,7 +32,7 @@ static void hash_stdin(const char *type, int write_object)
}
static const char hash_object_usage[] =
-"git-hash-object [-t <type>] [-w] [--stdin] <file>...";
+"git-hash-object [-t <type>] [-w] [--stdin | --stdin-paths] <file>...";
int main(int argc, char **argv)
{
@@ -41,6 +42,7 @@ int main(int argc, char **argv)
const char *prefix = NULL;
int prefix_length = -1;
int no_more_flags = 0;
+ int found_stdin_flag = 0;
for (i = 1 ; i < argc; i++) {
if (!no_more_flags && argv[i][0] == '-') {
@@ -62,7 +64,32 @@ int main(int argc, char **argv)
}
else if (!strcmp(argv[i], "--help"))
usage(hash_object_usage);
+ else if (!strcmp(argv[i], "--stdin-paths")) {
+ struct strbuf buf, nbuf;
+
+ if (found_stdin_flag)
+ die("Can't use both --stdin and --stdin-paths");
+ found_stdin_flag = 1;
+
+ strbuf_init(&buf, 0);
+ strbuf_init(&nbuf, 0);
+ while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+ if (buf.buf[0] == '"') {
+ strbuf_reset(&nbuf);
+ if (unquote_c_style(&nbuf, buf.buf, NULL))
+ die("line is badly quoted");
+ strbuf_swap(&buf, &nbuf);
+ }
+ hash_object(buf.buf, type_from_string(type), write_object);
+ }
+ strbuf_release(&buf);
+ strbuf_release(&nbuf);
+ }
else if (!strcmp(argv[i], "--stdin")) {
+ if (found_stdin_flag)
+ die("Can't use both --stdin and --stdin-paths");
+ found_stdin_flag = 1;
+
hash_stdin(type, write_object);
}
else
diff --git a/t/t1006-hash-object.sh b/t/t1006-hash-object.sh
index 77b8eca..e6da1c1 100755
--- a/t/t1006-hash-object.sh
+++ b/t/t1006-hash-object.sh
@@ -24,4 +24,26 @@ test_expect_success \
'hash from stdin and write to database' \
"test $hello_sha1 = $(echo "$hello_content" | git hash-object -w --stdin)"
+example_content="Silly example"
+example_sha1=f24c74a2e500f5ee1332c86b94199f52b1d1d962
+echo "$example_content" > example
+
+filenames="hello
+example"
+
+sha1s="$hello_sha1
+$example_sha1"
+
+test_expect_success \
+ 'hash two files with names on stdin' \
+ "test \"$sha1s\" = \"$(echo "$filenames" | git hash-object --stdin-paths)\""
+
+test_expect_success \
+ 'hash two files with names on stdin and write to database' \
+ "test \"$sha1s\" = \"$(echo "$filenames" | git hash-object -w --stdin-paths)\""
+
+test_expect_failure \
+ "Can't use --stdin and --stdin-paths together" \
+ "echo \"$filenames\" | git hash-object --stdin --stdin-paths"
+
test_done
--
1.5.3.4.1333.ga2f32
* [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe
2007-10-23 5:46 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 5:46 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-23 5:53 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Shawn O. Pearce
1 sibling, 1 reply; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben, Petr Baudis
command_bidi_pipe hands back the stdin and stdout file handles from the
executed command. command_close_bidi_pipe closes these handles and terminates
the process.
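For comparison, the same open/close pair can be sketched in C on a POSIX system with pipe(), fork() and execvp(). This is an illustration of the technique only, not part of the series; struct bidi_pipe, bidi_open and bidi_close are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct bidi_pipe {
	pid_t pid;
	FILE *to_child;    /* write requests here */
	FILE *from_child;  /* read replies here */
};

/* Start argv[0] with its stdin and stdout connected to us. 0 on success. */
static int bidi_open(struct bidi_pipe *bp, char *const argv[])
{
	int in_fd[2], out_fd[2];

	assert(bp && argv && argv[0]);
	if (pipe(in_fd) < 0)
		return -1;
	if (pipe(out_fd) < 0) {
		close(in_fd[0]);
		close(in_fd[1]);
		return -1;
	}
	bp->pid = fork();
	if (bp->pid < 0) {
		close(in_fd[0]);
		close(in_fd[1]);
		close(out_fd[0]);
		close(out_fd[1]);
		return -1;
	}
	if (bp->pid == 0) {
		/* Child: wire the pipe ends to stdin/stdout, then exec. */
		dup2(in_fd[0], STDIN_FILENO);
		dup2(out_fd[1], STDOUT_FILENO);
		close(in_fd[0]);
		close(in_fd[1]);
		close(out_fd[0]);
		close(out_fd[1]);
		execvp(argv[0], argv);
		_exit(127);
	}
	/* Parent: keep only the ends it uses. */
	close(in_fd[0]);
	close(out_fd[1]);
	bp->to_child = fdopen(in_fd[1], "w");
	bp->from_child = fdopen(out_fd[0], "r");
	return (bp->to_child && bp->from_child) ? 0 : -1;
}

/* Close both handles, reap the child, return its exit status (or -1). */
static int bidi_close(struct bidi_pipe *bp)
{
	int status;

	fclose(bp->to_child);   /* EOF on the child's stdin lets it exit */
	fclose(bp->from_child);
	if (waitpid(bp->pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The caller must fflush() bp->to_child after each request, and the child must flush its replies as well, or both sides end up waiting on each other.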
Signed-off-by: Adam Roben <aroben@apple.com>
---
perl/Git.pm | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 56 insertions(+), 0 deletions(-)
diff --git a/perl/Git.pm b/perl/Git.pm
index 3f4080c..eb699ff 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -51,6 +51,7 @@ require Exporter;
# Methods which can be called as standalone functions as well:
@EXPORT_OK = qw(command command_oneline command_noisy
command_output_pipe command_input_pipe command_close_pipe
+ command_bidi_pipe command_close_bidi_pipe
version exec_path hash_object git_cmd_try);
@@ -92,6 +93,7 @@ increate nonwithstanding).
use Carp qw(carp croak); # but croak is bad - throw instead
use Error qw(:try);
use Cwd qw(abs_path);
+use IPC::Open2 qw(open2);
}
@@ -375,6 +377,60 @@ sub command_close_pipe {
_cmd_close($fh, $ctx);
}
+=item command_bidi_pipe ( COMMAND [, ARGUMENTS... ] )
+
+Execute the given C<COMMAND> in the same way as command_output_pipe()
+does but return both an input pipe filehandle and an output pipe filehandle.
+
+The function will return C<($pid, $pipe_in, $pipe_out, $ctx)>.
+See C<command_close_bidi_pipe()> for details.
+
+=cut
+
+sub command_bidi_pipe {
+ my ($pid, $in, $out);
+ $pid = open2($in, $out, @_);
+ return ($pid, $in, $out, join(' ', @_));
+}
+
+=item command_close_bidi_pipe ( PID, PIPE_IN, PIPE_OUT [, CTX] )
+
+Close the C<PIPE_IN> and C<PIPE_OUT> as returned from C<command_bidi_pipe()>,
+checking whether the command finished successfully. The optional C<CTX>
+argument is required if you want to see the command name in the error message,
+and it is the fourth value returned by C<command_bidi_pipe()>. The call idiom
+is:
+
+ my ($pid, $in, $out, $ctx) = $r->command_bidi_pipe('cat-file --stdin');
+ print $out "000000000\n";
+ while (<$in>) { ... }
+ $r->command_close_bidi_pipe($pid, $in, $out, $ctx);
+
+Note that you should not rely on whatever actually is in C<CTX>;
+currently it is simply the command name but in future the context might
+have more complicated structure.
+
+=cut
+
+sub command_close_bidi_pipe {
+ my ($pid, $in, $out, $ctx) = @_;
+ foreach my $fh ($in, $out) {
+ if (not close $fh) {
+ if ($!) {
+ carp "error closing pipe: $!";
+ } elsif ($? >> 8) {
+ throw Git::Error::Command($ctx, $? >>8);
+ }
+ }
+ }
+
+ waitpid $pid, 0;
+
+ if ($? >> 8) {
+ throw Git::Error::Command($ctx, $? >>8);
+ }
+}
+
=item command_noisy ( COMMAND [, ARGUMENTS... ] )
--
1.5.3.4.1333.ga2f32
* [PATCH 9/9] git-svn: Make fetch ~1.7x faster
2007-10-23 5:46 ` [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
@ 2007-10-23 5:46 ` Adam Roben
2007-10-23 7:01 ` Johannes Sixt
2007-10-24 6:34 ` Eric Wong
0 siblings, 2 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:46 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Adam Roben, Eric Wong
We were spending a lot of time forking/execing git-cat-file and
git-hash-object. We now use command_bidi_pipe to keep one instance of each
running and feed it input on stdin.
Signed-off-by: Adam Roben <aroben@apple.com>
---
git-svn.perl | 94 ++++++++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 72 insertions(+), 22 deletions(-)
diff --git a/git-svn.perl b/git-svn.perl
index 22bb47b..8b72046 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -236,6 +236,8 @@ eval {
};
fatal $@ if $@;
post_fetch_checkout();
+Git::Commands->close_cat_blob();
+Git::Commands->close_hash_object();
exit 0;
####################### primary functions ######################
@@ -2683,14 +2685,8 @@ sub apply_textdelta {
my $base = IO::File->new_tmpfile;
$base->autoflush(1);
if ($fb->{blob}) {
- defined (my $pid = fork) or croak $!;
- if (!$pid) {
- open STDOUT, '>&', $base or croak $!;
- print STDOUT 'link ' if ($fb->{mode_a} == 120000);
- exec qw/git-cat-file blob/, $fb->{blob} or croak $!;
- }
- waitpid $pid, 0;
- croak $? if $?;
+ my $contents = Git::Commands->cat_blob($fb->{blob});
+ print $base $contents;
if (defined $exp) {
seek $base, 0, 0 or croak $!;
@@ -2729,13 +2725,7 @@ sub close_file {
$buf eq 'link ' or die "$path has mode 120000",
"but is not a link\n";
}
- defined(my $pid = open my $out,'-|') or die "Can't fork: $!\n";
- if (!$pid) {
- open STDIN, '<&', $fh or croak $!;
- exec qw/git-hash-object -w --stdin/ or croak $!;
- }
- chomp($hash = do { local $/; <$out> });
- close $out or croak $!;
+ $hash = Git::Commands->hash_object($fh);
close $fh or croak $!;
$hash =~ /^[a-f\d]{40}$/ or die "not a sha1: $hash\n";
close $fb->{base} or croak $!;
@@ -3063,13 +3053,8 @@ sub chg_file {
} elsif ($m->{mode_a} =~ /^120/ && $m->{mode_b} !~ /^120/) {
$self->change_file_prop($fbat,'svn:special',undef);
}
- defined(my $pid = fork) or croak $!;
- if (!$pid) {
- open STDOUT, '>&', $fh or croak $!;
- exec qw/git-cat-file blob/, $m->{sha1_b} or croak $!;
- }
- waitpid $pid, 0;
- croak $? if $?;
+ my $blob = Git::Commands->cat_blob($m->{sha1_b});
+ print $fh $blob;
$fh->flush == 0 or croak $!;
seek $fh, 0, 0 or croak $!;
@@ -4272,6 +4257,71 @@ sub full_path {
$path . (length $self->{right} ? "/$self->{right}" : '');
}
+package Git::Commands;
+use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
+ $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;
+use strict;
+use warnings;
+use File::Temp qw/tempfile/;
+use Git qw/command_bidi_pipe command_close_bidi_pipe/;
+
+sub _open_cat_blob_if_needed {
+ return if defined($_cat_blob_pid);
+ $_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";
+
+ ($_cat_blob_pid, $_cat_blob_in, $_cat_blob_out, $_cat_blob_ctx) = command_bidi_pipe(qw(git-cat-file blob --stdin --separator), $_cat_blob_separator);
+}
+
+sub close_cat_blob {
+ return unless defined($_cat_blob_pid);
+
+ command_close_bidi_pipe($_cat_blob_pid, $_cat_blob_in, $_cat_blob_out, $_cat_blob_ctx);
+}
+
+sub cat_blob {
+ my (undef, $sha1) = @_;
+
+ _open_cat_blob_if_needed();
+ print $_cat_blob_out "$sha1\n";
+ my @file = ();
+ while (my $line = <$_cat_blob_in>) {
+ my $last = 0;
+ if ($line =~ s/\Q$_cat_blob_separator\E$//) {
+ chomp($line);
+ $last = 1;
+ }
+ push(@file, $line);
+ last if $last;
+ }
+ return join('', @file);
+}
+
+sub _open_hash_object_if_needed {
+ return if defined($_hash_object_pid);
+
+ ($_hash_object_pid, $_hash_object_in, $_hash_object_out, $_hash_object_ctx) = command_bidi_pipe(qw(git-hash-object -w --stdin-paths));
+}
+
+sub close_hash_object {
+ return unless defined($_hash_object_pid);
+
+ command_close_bidi_pipe($_hash_object_pid, $_hash_object_in, $_hash_object_out, $_hash_object_ctx);
+}
+
+sub hash_object {
+ my (undef, $fh) = @_;
+
+ my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
+ while (my $line = <$fh>) {
+ print $tmp_fh $line;
+ }
+ close($tmp_fh);
+ _open_hash_object_if_needed();
+ print $_hash_object_out $tmp_filename . "\n";
+ chomp(my $hash = <$_hash_object_in>);
+ return $hash;
+}
+
__END__
Data structures:
--
1.5.3.4.1333.ga2f32
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
2007-10-23 5:46 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-23 5:46 ` [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
@ 2007-10-23 5:53 ` Shawn O. Pearce
2007-10-23 5:57 ` Adam Roben
1 sibling, 1 reply; 25+ messages in thread
From: Shawn O. Pearce @ 2007-10-23 5:53 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano
Adam Roben <aroben@apple.com> wrote:
> This allows multiple paths to be specified on stdin.
git-fast-import wasn't suited to the task?
--
Shawn.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
2007-10-23 5:53 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Shawn O. Pearce
@ 2007-10-23 5:57 ` Adam Roben
2007-10-23 6:10 ` Shawn O. Pearce
0 siblings, 1 reply; 25+ messages in thread
From: Adam Roben @ 2007-10-23 5:57 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git, Junio C Hamano
Shawn O. Pearce wrote:
> Adam Roben <aroben@apple.com> wrote:
>
>> This allows multiple paths to be specified on stdin.
>>
>
> git-fast-import wasn't suited to the task?
>
I actually considered using fast-import for the whole shebang, but
decided that I don't yet understand the workings and structure of
git-svn well enough to make such a big change.
git-svn uses git-hash-object to both determine a file's hash and insert
it into the index in one go -- can fast-import do this? Or will it just
put it in the index and not give you the hash back? The latter was my
impression.
-Adam
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/9] Make git-svn fetch ~1.7x faster
2007-10-23 5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-23 5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
@ 2007-10-23 6:08 ` Mike Hommey
2007-10-23 6:13 ` Adam Roben
2007-10-24 0:43 ` Sam Vilain
1 sibling, 2 replies; 25+ messages in thread
From: Mike Hommey @ 2007-10-23 6:08 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano
On Mon, Oct 22, 2007 at 10:46:28PM -0700, Adam Roben wrote:
>
> This patch series makes git-svn fetch about 1.7x faster by reducing the number
> of forks/execs that occur for each file retrieved from Subversion. To do so, a
> few new options are added to git-cat-file and git-hash-object to allow
> continuous input on stdin and continuous output on stdout, so that one instance
> of each of these commands can be kept running for the duration of the fetch.
You don't need to do this to avoid forks. Just use git-fast-import
instead.
Mike
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
2007-10-23 5:57 ` Adam Roben
@ 2007-10-23 6:10 ` Shawn O. Pearce
2007-10-24 6:11 ` Eric Wong
0 siblings, 1 reply; 25+ messages in thread
From: Shawn O. Pearce @ 2007-10-23 6:10 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano
Adam Roben <aroben@apple.com> wrote:
> Shawn O. Pearce wrote:
> >Adam Roben <aroben@apple.com> wrote:
> >
> >>This allows multiple paths to be specified on stdin.
> >
> >git-fast-import wasn't suited to the task?
>
> I actually considered using fast-import for the whole shebang, but
> decided that I don't yet understand the workings and structure of
> git-svn well enough to make such a big change.
>
> git-svn uses git-hash-object to both determine a file's hash and insert
> it into the index in one go -- can fast-import do this? Or will it just
> put it in the index and not give you the hash back? The latter was my
> impression.
It doesn't currently give you the hash back. You can sort of get
to it by marking the blob then using the 'checkpoint' command to
dump the marks to a file, which you can read in. Not good.
It probably wouldn't be very difficult to give fast-import a way
to dump marks back on stdout as they are assigned, so long as the
frontend either locksteps with fast-import or is willing to monitor
it with a select/poll type of arrangement and read from stdout as
soon as it's ready.
Probably a 5 line code change to fast-import. Like this. Only Git
won't recognize that object SHA-1, as it's in a packfile that has
no index. You'd need to 'checkpoint' to flush the object out, or
just use all of fast-import for the processing. So yeah, I guess
I can see now how it's not suited to this.
--8>--
diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index d511967..7fd8b2c 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -67,6 +67,10 @@ OPTIONS
at checkpoint (or completion) the same path can also be
safely given to \--import-marks.
+--export-marks-to-stdout::
+ Dumps marks to stdout as soon as they are assigned.
+ Marks are written one per line as `:markid SHA-1`.
+
--import-marks=<file>::
Before processing any input, load the marks specified in
<file>. The input file must exist, must be readable, and
diff --git a/fast-import.c b/fast-import.c
index 6f888f6..619ed05 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -272,6 +272,7 @@ struct recent_command
static unsigned long max_depth = 10;
static off_t max_packsize = (1LL << 32) - 1;
static int force_update;
+static int marks_to_stdout;
/* Stats and misc. counters */
static uintmax_t alloc_count;
@@ -561,6 +562,7 @@ static char *pool_strdup(const char *s)
static void insert_mark(uintmax_t idnum, struct object_entry *oe)
{
+ uintmax_t orig_idnum = idnum;
struct mark_set *s = marks;
while ((idnum >> s->shift) >= 1024) {
s = pool_calloc(1, sizeof(struct mark_set));
@@ -580,6 +582,8 @@ static void insert_mark(uintmax_t idnum, struct object_entry *oe)
if (!s->data.marked[idnum])
marks_set_count++;
s->data.marked[idnum] = oe;
+ if (marks_to_stdout)
+ printf(":%" PRIuMAX " %s\n", orig_idnum, sha1_to_hex(oe->sha1));
}
static struct object_entry *find_mark(uintmax_t idnum)
@@ -2294,6 +2298,8 @@ int main(int argc, const char **argv)
max_active_branches = strtoul(a + 18, NULL, 0);
else if (!prefixcmp(a, "--import-marks="))
import_marks(a + 15);
+ else if (!prefixcmp(a, "--export-marks-to-stdout"))
+ marks_to_stdout = 1;
else if (!prefixcmp(a, "--export-marks="))
mark_file = a + 15;
else if (!prefixcmp(a, "--export-pack-edges=")) {
--
Shawn.
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 0/9] Make git-svn fetch ~1.7x faster
2007-10-23 6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
@ 2007-10-23 6:13 ` Adam Roben
2007-10-24 0:43 ` Sam Vilain
1 sibling, 0 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-23 6:13 UTC (permalink / raw)
To: Mike Hommey; +Cc: git, Junio C Hamano
Mike Hommey wrote:
> On Mon, Oct 22, 2007 at 10:46:28PM -0700, Adam Roben wrote:
>
>> This patch series makes git-svn fetch about 1.7x faster by reducing the number
>> of forks/execs that occur for each file retrieved from Subversion. To do so, a
>> few new options are added to git-cat-file and git-hash-object to allow
>> continuous input on stdin and continuous output on stdout, so that one instance
>> of each of these commands can be kept running for the duration of the fetch.
>>
>
> You don't need to do this to avoid forks. Just use git-fast-import
> instead.
>
I agree that fast-import is probably ultimately a better solution for
this, but given that git-svn currently uses the output of every command
it forks off and that fast-import doesn't seem to give the same output,
changing git-svn to use fast-import would be a fairly sweeping change
that I didn't feel comfortable making without a better understanding of
git-svn.
-Adam
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/9] Add tests for git cat-file
2007-10-23 5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-23 5:46 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
@ 2007-10-23 6:59 ` Johannes Sixt
1 sibling, 0 replies; 25+ messages in thread
From: Johannes Sixt @ 2007-10-23 6:59 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano
Adam Roben schrieb:
> + test_expect_success \
> + "$type exists" \
> + "git cat-file -e $hello_sha1"
You mean $sha1 here, right?
> + test_expect_success \
> + "Type of $type is correct" \
> + "test $type = \"$(git cat-file -t $sha1)\""
This should escape the $(...) in all the tests. Like this:
"test $type = \"\$(git cat-file -t $sha1)\""
> +test_expect_success \
> + "Reach a blob from a tag pointing to it" \
> + "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
And use single quotes without escaping the double-quotes here.
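The effect of the quoting advice above can be seen in a small sketch. It assumes two made-up helpers that are not part of the test suite: run_later, standing in for test_expect_success (it takes a script as a string and evaluates it afterwards), and current_type, standing in for "git cat-file -t".

```shell
#!/bin/sh
# run_later is a hypothetical stand-in for test_expect_success:
# it receives a script as a string and evaluates it afterwards.
run_later() {
	eval "$1"
}

# current_type is a made-up helper standing in for "git cat-file -t".
current_type() {
	echo "$stored_type"
}

type=blob
stored_type=blob

# Unescaped: $(current_type) runs while the string is being built.
early="test $type = \"$(current_type)\""
# Escaped: $type is filled in now, but \$(...) stays literal until eval.
late="test $type = \"\$(current_type)\""

stored_type=tree	# simulate the state changing before the test runs

run_later "$early" && echo "early: passes on a stale result"
run_later "$late" || echo "late: fails because it sees the current state"
```

With the unescaped form the command has already run by the time the test framework sees the string, so the test only compares two fixed words. The escaped form (or single quotes, when nothing needs to be expanded up front) keeps the command inside the test itself.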
-- Hannes
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 6/9] Add tests for git hash-object
2007-10-23 5:46 ` [PATCH 6/9] Add tests for git hash-object Adam Roben
2007-10-23 5:46 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
@ 2007-10-23 6:59 ` Johannes Sixt
1 sibling, 0 replies; 25+ messages in thread
From: Johannes Sixt @ 2007-10-23 6:59 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano
Adam Roben schrieb:
> +test_expect_success \
> + 'hash a file' \
> + "test $hello_sha1 = $(git hash-object hello)"
Put tests in single-quotes; otherwise, the substitutions happen before the
test begins, and not as part of the test.
-- Hannes
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster
2007-10-23 5:46 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
@ 2007-10-23 7:01 ` Johannes Sixt
2007-10-24 6:34 ` Eric Wong
1 sibling, 0 replies; 25+ messages in thread
From: Johannes Sixt @ 2007-10-23 7:01 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano, Eric Wong
Adam Roben schrieb:
> We were spending a lot of time forking/execing git-cat-file and
> git-hash-object. We now use command_bidi_pipe to keep one instance of each
> running and feed it input on stdin.
I appreciate this. It's certainly going to be a much bigger win on Windows,
although git svn doesn't work (in the MinGW port) at this time because of
the old perl and the missing SVN module.
-- Hannes
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/9] Make git-svn fetch ~1.7x faster
2007-10-23 6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
2007-10-23 6:13 ` Adam Roben
@ 2007-10-24 0:43 ` Sam Vilain
1 sibling, 0 replies; 25+ messages in thread
From: Sam Vilain @ 2007-10-24 0:43 UTC (permalink / raw)
To: Mike Hommey; +Cc: git, aroben
Mike Hommey wrote:
> On Mon, Oct 22, 2007 at 10:46:28PM -0700, Adam Roben wrote:
>> This patch series makes git-svn fetch about 1.7x faster by reducing the number
>> of forks/execs that occur for each file retrieved from Subversion. To do so, a
>> few new options are added to git-cat-file and git-hash-object to allow
>> continuous input on stdin and continuous output on stdout, so that one instance
>> of each of these commands can be kept running for the duration of the fetch.
>
> You don't need to do this to avoid forks. Just use git-fast-import
> instead.
git-fast-import only covers the hash-object side of things, not cat-file.
git-fast-import does not currently suit 'gradual deployment' for
converters such as git-svn, because:
- it returns object IDs at the end, when you checkpoint.
This could be 'fixed' by allowing a marks log file instead of or in
addition to the current behaviour, though if the exporter is
continually waiting for the tokens rather than using marks, it will
slow it down.
- you can't use plumbing commands, such as rev-parse, cat-file, etc., on
objects which have not been checkpointed yet.
- you can't just stream a file of unknown length to it as you can to
hash-object.
These are the design trade-offs of using fast-import. Using
fast-import, you are creating a 'transaction' area which uses user
sequences instead of (git)database-issued identifiers. And this
transaction is isolated from the other concurrent users of the object
database. However the interface does not have the full git CLI
available to it, so unlike a regular database transaction, you end up
having to care.
Rewriting the importer so as to correctly deal with these problems is
quite challenging, and for slow import sources such as Subversion, of
limited merit.
Sam.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 5/9] git-cat-file: Add --separator option
2007-10-23 5:46 ` [PATCH 5/9] git-cat-file: Add --separator option Adam Roben
2007-10-23 5:46 ` [PATCH 6/9] Add tests for git hash-object Adam Roben
@ 2007-10-24 3:43 ` Brian Downing
2007-10-24 4:26 ` Adam Roben
1 sibling, 1 reply; 25+ messages in thread
From: Brian Downing @ 2007-10-24 3:43 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano
On Mon, Oct 22, 2007 at 10:46:33PM -0700, Adam Roben wrote:
> +--separator::
> + A string to print in between the output for each object passed on
> + stdin. A newline will be appended to the separator each time it is
> + printed.
Maybe I'm just unreasonably paranoid, but I don't think I could ever
trust that you'd never find an arbitrary separator in the data. I
suppose if you scanned the files beforehand you could come up with
something guaranteed to be unique, but that seems like a pain (and
doesn't happen regardless in patch 9/9; it just uses
"--------------GITCATFILESEPARATOR-----------") If I were committing to
SVN, it's sure not something I'd like to bet the integrity of my data
on.
I think a far more reasonable output format for multiple objects would
be something like:
<count> LF
<raw data> LF
Where <count> is the number of bytes in the <raw data> as an ASCII
decimal integer.
This is pretty much the spiritual analog to the fast-import "exact byte
count" data input format as well.
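A rough shell sketch of the "<count> LF <raw data> LF" framing described above, to show why the payload can contain any bytes, including a would-be separator. The function names write_record and read_record are made up for illustration and are not part of any patch in this series; dd with bs=1 is used only because it reads an exact number of bytes, not because it is fast.

```shell
#!/bin/sh
# Hypothetical helpers illustrating "<count> LF <raw data> LF" framing.

# write_record FILE: emit FILE as one length-prefixed record on stdout.
write_record() {
	# Arithmetic expansion strips any padding that wc may print.
	size=$(($(wc -c <"$1"))) &&
	printf '%d\n' "$size" &&
	cat "$1" &&
	printf '\n'
}

# read_record FILE: read one record from stdin, store its payload in FILE.
read_record() {
	# The first line carries the payload size as a decimal integer.
	IFS= read -r count || return 1
	# Copy exactly $count bytes, whatever they contain.
	dd bs=1 count="$count" 2>/dev/null >"$1" &&
	# Consume the LF that terminates the record.
	dd bs=1 count=1 2>/dev/null >/dev/null
}
```

Several write_record calls can be concatenated into one stream, and the same number of read_record calls reading from that stream will recover each payload byte for byte, because the reader never inspects the data to find where a record ends.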
-bcd
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 5/9] git-cat-file: Add --separator option
2007-10-24 3:43 ` [PATCH 5/9] git-cat-file: Add --separator option Brian Downing
@ 2007-10-24 4:26 ` Adam Roben
0 siblings, 0 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-24 4:26 UTC (permalink / raw)
To: Brian Downing; +Cc: git, Junio C Hamano
Brian Downing wrote:
> On Mon, Oct 22, 2007 at 10:46:33PM -0700, Adam Roben wrote:
>
>> +--separator::
>> + A string to print in between the output for each object passed on
>> + stdin. A newline will be appended to the separator each time it is
>> + printed.
>>
>
> Maybe I'm just unreasonably paranoid, but I don't think I could ever
> trust that you'd never find an arbitrary separator in the data. I
> suppose if you scanned the files beforehand you could come up with
> something guaranteed to be unique, but that seems like a pain (and
> doesn't happen regardless in patch 9/9; it just uses
> "--------------GITCATFILESEPARATOR-----------") If I were committing to
> SVN, it's sure not something I'd like to bet the integrity of my data
> on.
>
I had some of the same concerns.
> I think a far more reasonable output format for multiple objects would
> be something like:
>
> <count> LF
> <raw data> LF
>
> Where <count> is the number of bytes in the <raw data> as an ASCII
> decimal integer.
>
This sounds like a much better solution. I'll implement it that way and
send out a new patch. Thanks for the suggestion!
-Adam
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
2007-10-23 6:10 ` Shawn O. Pearce
@ 2007-10-24 6:11 ` Eric Wong
0 siblings, 0 replies; 25+ messages in thread
From: Eric Wong @ 2007-10-24 6:11 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Adam Roben, git, Junio C Hamano
"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Adam Roben <aroben@apple.com> wrote:
> > Shawn O. Pearce wrote:
> > >Adam Roben <aroben@apple.com> wrote:
> > >
> > >>This allows multiple paths to be specified on stdin.
> > >
> > >git-fast-import wasn't suited to the task?
> >
> > I actually considered using fast-import for the whole shebang, but
> > decided that I don't yet understand the workings and structure of
> > git-svn well enough to make such a big change.
> >
> > git-svn uses git-hash-object to both determine a file's hash and insert
> > it into the index in one go -- can fast-import do this? Or will it just
> > put it in the index and not give you the hash back? The latter was my
> > impression.
>
> It doesn't currently give you the hash back. You can sort of get
> to it by marking the blob then using the 'checkpoint' command to
> dump the marks to a file, which you can read in. Not good.
>
> It probably wouldn't be very difficult to give fast-import a way
> to dump marks back on stdout as they are assigned, so long as the
> frontend either locksteps with fast-import or is willing to monitor
> it with a select/poll type of arrangement and read from stdout as
> soon as it's ready.
>
> Probably a 5 line code change to fast-import. Like this. Only Git
> won't recognize that object SHA-1, as it's in a packfile that has
> no index. You'd need to 'checkpoint' to flush the object out, or
> just use all of fast-import for the processing. So yeah, I guess
> I can see now how it's not suited to this.
Shawn, thanks for clearing that up. I was previously considering
fast-import for git-svn, but never had time[1] to really look at it.
I guess Adam is on the right track with his patches.
[1] - Sorry to all on the list, but I've really been slacking on git-svn
work. I was going to get some stuff done this weekend but decided
to attempt to fight my nasty caffeine addiction instead :x
--
Eric Wong
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster
2007-10-23 5:46 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-23 7:01 ` Johannes Sixt
@ 2007-10-24 6:34 ` Eric Wong
2007-10-24 6:48 ` Adam Roben
1 sibling, 1 reply; 25+ messages in thread
From: Eric Wong @ 2007-10-24 6:34 UTC (permalink / raw)
To: Adam Roben; +Cc: git, Junio C Hamano
Adam Roben <aroben@apple.com> wrote:
> We were spending a lot of time forking/execing git-cat-file and
> git-hash-object. We now use command_bidi_pipe to keep one instance of each
> running and feed it input on stdin.
Nice job! I just got access to a very fast SVN repository for a project
I'm working on (not working on git-svn itself, unfortunately).
A few comments and small nitpicks below:
> Signed-off-by: Adam Roben <aroben@apple.com>
> ---
> git-svn.perl | 94 ++++++++++++++++++++++++++++++++++++++++++++-------------
> 1 files changed, 72 insertions(+), 22 deletions(-)
> +package Git::Commands;
Can this be a separate file, or a part of Git.pm? I'm sure other
scripts can eventually use this and I've been meaning to split
git-svn.perl into separate files so it's easier to follow.
> +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
> + $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;
I have trouble following long lines, and most of the git code also wraps
at 80-columns. Dead-tree publishers got this concept right a long
time ago :)
> +use strict;
> +use warnings;
> +use File::Temp qw/tempfile/;
> +use Git qw/command_bidi_pipe command_close_bidi_pipe/;
> +
> +sub _open_cat_blob_if_needed {
> + return if defined($_cat_blob_pid);
> + $_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";
Brian brought this up already, but yes, having pre-defined separators
instead of explicitly-specified sizes makes it all too easy for a
malicious user to commit code that will break things for git-svn users.
> +sub hash_object {
> + my (undef, $fh) = @_;
> +
> + my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
> + while (my $line = <$fh>) {
> + print $tmp_fh $line;
> + }
> + close($tmp_fh);
Related to the above. It's better to sysread()/syswrite() or
read()/print() in a loop with a predefined buffer size rather than to
use a readline() since you could be dealing with files with very long
lines or binaries with no newline characters in them at all.
> + _open_hash_object_if_needed();
> + print $_hash_object_out $tmp_filename . "\n";
Minor, but
print $_hash_object_out $tmp_filename, "\n";
avoids creating a new string.
--
Eric Wong
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster
2007-10-24 6:34 ` Eric Wong
@ 2007-10-24 6:48 ` Adam Roben
0 siblings, 0 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-24 6:48 UTC (permalink / raw)
To: Eric Wong; +Cc: git, Junio C Hamano
Eric Wong wrote:
> Adam Roben <aroben@apple.com> wrote:
>
>> +package Git::Commands;
>>
>
> Can this be a separate file, or a part of Git.pm? I'm sure other
> scripts can eventually use this and I've been meaning to split
> git-svn.perl into separate files so it's easier to follow.
>
I had considered doing one of the above, but decided that splitting it
out could be done if/when it was deemed useful for another script. But
I'll split it out since you think it's a good idea.
>> +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
>> + $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;
>>
>
> I have trouble following long lines, and most of the git code also wraps
> at 80-columns. Dead-tree publishers got this concept right a long
> time ago :)
>
Will fix.
>> +use strict;
>> +use warnings;
>> +use File::Temp qw/tempfile/;
>> +use Git qw/command_bidi_pipe command_close_bidi_pipe/;
>> +
>> +sub _open_cat_blob_if_needed {
>> + return if defined($_cat_blob_pid);
>> + $_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";
>>
>
> Brian brought this up already, but yes, having pre-defined separators
> instead of explicitly-specified sizes makes it all too easy for a
> malicious user to commit code that will break things for git-svn users.
>
Yup, will fix this. :-)
>> +sub hash_object {
>> + my (undef, $fh) = @_;
>> +
>> + my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
>> + while (my $line = <$fh>) {
>> + print $tmp_fh $line;
>> + }
>> + close($tmp_fh);
>>
>
> Related to the above. It's better to sysread()/syswrite() or
> read()/print() in a loop with a predefined buffer size rather than to
> use a readline() since you could be dealing with files with very long
> lines or binaries with no newline characters in them at all.
>
Hm, OK. I'll look for similar code in git-svn and follow that.
>> + _open_hash_object_if_needed();
>> + print $_hash_object_out $tmp_filename . "\n";
>>
>
> Minor, but
>
> print $_hash_object_out $tmp_filename, "\n";
>
> avoids creating a new string.
>
Good idea.
Thanks for the feedback! I'll send out some new patches sometime soon.
-Adam
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/9] Add tests for git cat-file
2007-10-25 10:25 [RESEND PATCH " Adam Roben
@ 2007-10-25 10:25 ` Adam Roben
0 siblings, 0 replies; 25+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw)
To: git; +Cc: Junio Hamano, Adam Roben, Johannes Sixt
Signed-off-by: Adam Roben <aroben@apple.com>
---
Johannes Sixt wrote:
> Adam Roben schrieb:
> > + test_expect_success \
> > + "$type exists" \
> > + "git cat-file -e $hello_sha1"
>
> You mean $sha1 here, right?
I most definitely did!
> > + test_expect_success \
> > + "Type of $type is correct" \
> > + "test $type = \"$(git cat-file -t $sha1)\""
>
> This should escape the $(...) in all the tests. Like this:
>
> "test $type = \"\$(git cat-file -t $sha1)\""
>
> > +test_expect_success \
> > + "Reach a blob from a tag pointing to it" \
> > + "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
>
> And use single quotes without escaping the double-quotes here.
Done.
t/t1005-cat-file.sh | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 91 insertions(+), 0 deletions(-)
create mode 100755 t/t1005-cat-file.sh
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
new file mode 100755
index 0000000..697354d
--- /dev/null
+++ b/t/t1005-cat-file.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='git cat-file'
+
+. ./test-lib.sh
+
+function maybe_remove_timestamp()
+{
+ if test -z "$2"; then
+ echo "$1"
+ else
+ echo "$1" | sed -e 's/ [0-9]\{10\} [+-][0-9]\{4\}$//'
+ fi
+}
+
+function run_tests()
+{
+ type=$1
+ sha1=$2
+ size=$3
+ content=$4
+ pretty_content=$5
+ no_timestamp=$6
+
+ test_expect_success \
+ "$type exists" \
+ "git cat-file -e $sha1"
+ test_expect_success \
+ "Type of $type is correct" \
+ "test $type = \"\$(git cat-file -t $sha1)\""
+ test_expect_success \
+ "Size of $type is correct" \
+ "test $size = \"\$(git cat-file -s $sha1)\""
+ test -z "$content" || test_expect_success \
+ "Content of $type is correct" \
+ "test \"\$(maybe_remove_timestamp '$content' $no_timestamp)\" = \"\$(maybe_remove_timestamp \"\$(git cat-file $type $sha1)\" $no_timestamp)\""
+ test_expect_success \
+ "Pretty content of $type is correct" \
+ "test \"\$(maybe_remove_timestamp '$pretty_content' $no_timestamp)\" = \"\$(maybe_remove_timestamp \"\$(git cat-file -p $sha1)\" $no_timestamp)\""
+}
+
+hello_content="Hello World"
+hello_size=$(echo "$hello_content" | wc -c)
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+
+echo "$hello_content" > hello
+
+git update-index --add hello
+
+run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
+
+tree_sha1=$(git write-tree)
+tree_size=33
+tree_pretty_content="100644 blob $hello_sha1 hello"
+
+run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+
+commit_message="Intial commit"
+commit_sha1=$(echo "$commit_message" | git commit-tree $tree_sha1)
+commit_size=177
+commit_content="tree $tree_sha1
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
+
+$commit_message"
+
+run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
+
+tag_header="object $hello_sha1
+type blob
+tag hellotag
+tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_description="This is a tag"
+tag_content="$tag_header
+
+$tag_description"
+tag_pretty_content="$tag_header
+Thu Jan 1 00:00:00 1970 +0000
+
+$tag_description"
+
+tag_sha1=$(echo "$tag_content" | git mktag)
+tag_size=$(echo "$tag_content" | wc -c)
+
+run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_pretty_content"
+
+test_expect_success \
+ "Reach a blob from a tag pointing to it" \
+ "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
+
+test_done
--
1.5.3.4.1337.g8e67d-dirty
^ permalink raw reply related [flat|nested] 25+ messages in thread