* [PATCH 0/9] Make git-svn fetch ~1.7x faster
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano

This patch series makes git-svn fetch about 1.7x faster by reducing the
number of forks/execs that occur for each file retrieved from Subversion.
To do so, a few new options are added to git-cat-file and git-hash-object
to allow continuous input on stdin and continuous output on stdout, so
that one instance of each of these commands can be kept running for the
duration of the fetch.

The series is based on top of next. I considered basing it on top of the
parse_options work since I touch the option parsing in these two commands,
but I didn't know how wise it would be to base a patch series on something
in pu.

I tried to add some new tests for cat-file and hash-object to ensure that
I didn't break old behavior, but I'm not very experienced with the git
test suite and I'm sure my tests could use some improvement.

This is the most invasive change I've yet made to git, so comments are
more than welcome.

-Adam

--
 Documentation/git-cat-file.txt    |   11 +++-
 Documentation/git-hash-object.txt |    5 +-
 builtin-cat-file.c                |   96 +++++++++++++++++++++----
 git-svn.perl                      |   94 +++++++++++++++++++------
 hash-object.c                     |   29 ++++++++-
 perl/Git.pm                       |   56 +++++++++++++++
 t/t1005-cat-file.sh               |  139 +++++++++++++++++++++++++++++++++++++
 t/t1006-hash-object.sh            |   49 +++++++++++++
 8 files changed, 438 insertions(+), 41 deletions(-)
* [PATCH 1/9] Add tests for git cat-file
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben

Signed-off-by: Adam Roben <aroben@apple.com>
---
 t/t1005-cat-file.sh |   91 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 91 insertions(+), 0 deletions(-)
 create mode 100755 t/t1005-cat-file.sh

diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
new file mode 100755
index 0000000..2fdc446
--- /dev/null
+++ b/t/t1005-cat-file.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='git cat-file'
+
+. ./test-lib.sh
+
+maybe_remove_timestamp ()
+{
+	if test -z "$2"; then
+		echo "$1"
+	else
+		echo "$1" | sed -e 's/ [0-9]\{10\} [+-][0-9]\{4\}$//'
+	fi
+}
+
+run_tests ()
+{
+	type=$1
+	sha1=$2
+	size=$3
+	content=$4
+	pretty_content=$5
+	no_timestamp=$6
+
+	test_expect_success \
+	    "$type exists" \
+	    "git cat-file -e $sha1"
+	test_expect_success \
+	    "Type of $type is correct" \
+	    "test $type = \"$(git cat-file -t $sha1)\""
+	test_expect_success \
+	    "Size of $type is correct" \
+	    "test $size = \"$(git cat-file -s $sha1)\""
+	test -z "$content" || test_expect_success \
+	    "Content of $type is correct" \
+	    "test \"$(maybe_remove_timestamp "$content" $no_timestamp)\" = \"$(maybe_remove_timestamp "$(git cat-file $type $sha1)" $no_timestamp)\""
+	test_expect_success \
+	    "Pretty content of $type is correct" \
+	    "test \"$(maybe_remove_timestamp "$pretty_content" $no_timestamp)\" = \"$(maybe_remove_timestamp "$(git cat-file -p $sha1)" $no_timestamp)\""
+}
+
+hello_content="Hello World"
+hello_size=$(echo "$hello_content" | wc -c)
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+
+echo "$hello_content" > hello
+
+git update-index --add hello
+
+run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
+
+tree_sha1=$(git write-tree)
+tree_size=33
+tree_pretty_content="100644 blob $hello_sha1	hello"
+
+run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
+
+commit_message="Intial commit"
+commit_sha1=$(echo "$commit_message" | git commit-tree $tree_sha1)
+commit_size=177
+commit_content="tree $tree_sha1
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000
+
+$commit_message"
+
+run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1
+
+tag_header="object $hello_sha1
+type blob
+tag hellotag
+tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>"
+tag_description="This is a tag"
+tag_content="$tag_header
+
+$tag_description"
+tag_pretty_content="$tag_header
+Thu Jan 1 00:00:00 1970 +0000
+
+$tag_description"
+
+tag_sha1=$(echo "$tag_content" | git mktag)
+tag_size=$(echo "$tag_content" | wc -c)
+
+run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_pretty_content"
+
+test_expect_success \
+    "Reach a blob from a tag pointing to it" \
+    "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
+
+test_done
--
1.5.3.4.1333.ga2f32
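The hard-coded `tree_size=33` and `commit_size=177` in the test above follow directly from the raw object formats. A small C sketch of that arithmetic, assuming a one-entry tree, a 10-digit Unix timestamp, a five-character timezone such as `+0000`, and test-lib.sh's default identities (`A U Thor <author@example.com>` and `C O Mitter <committer@example.com>`); the helper names are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Raw tree entry: "<mode> <name>\0" followed by a 20-byte binary SHA-1. */
static size_t tree_entry_size(const char *mode, const char *name)
{
	assert(mode && name);
	return strlen(mode) + 1 + strlen(name) + 1 + 20;
}

/*
 * Ident line: "<role> <name> <<email>> <timestamp> <tz>\n".
 * Assumes a 10-digit timestamp and a 5-character timezone like "+0000".
 */
static size_t ident_line_size(const char *role, const char *name,
			      const char *email)
{
	return strlen(role) + 1 + strlen(name) + 2 + strlen(email) + 2
	       + 10 + 1 + 5 + 1;
}

/* Raw commit: "tree <40 hex>\n", author, committer, blank line, message. */
static size_t commit_size(const char *author, const char *author_email,
			  const char *committer, const char *committer_email,
			  const char *message)
{
	return (5 + 40 + 1)
	       + ident_line_size("author", author, author_email)
	       + ident_line_size("committer", committer, committer_email)
	       + 1
	       + strlen(message);
}
```

With those inputs, `tree_entry_size("100644", "hello")` is 6 + 1 + 5 + 1 + 20 = 33, and the commit is 46 + 54 + 62 + 1 + 14 = 177 bytes. Any change to the message or identities shifts the expected size.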
* [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben

I separated the logic of parsing the arguments from the logic of fetching
and outputting the data. cat_one_file now does the latter.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 builtin-cat-file.c |   38 ++++++++++++++++++++++----------------
 1 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index f132d58..34a63d1 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -76,31 +76,16 @@ static void pprint_tag(const unsigned char *sha1, const char *buf, unsigned long
 	write_or_die(1, cp, endp - cp);
 }
 
-int cmd_cat_file(int argc, const char **argv, const char *prefix)
+static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 {
 	unsigned char sha1[20];
 	enum object_type type;
 	void *buf;
 	unsigned long size;
-	int opt;
-	const char *exp_type, *obj_name;
-
-	git_config(git_default_config);
-	if (argc != 3)
-		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
-	exp_type = argv[1];
-	obj_name = argv[2];
 
 	if (get_sha1(obj_name, sha1))
 		die("Not a valid object name %s", obj_name);
 
-	opt = 0;
-	if ( exp_type[0] == '-' ) {
-		opt = exp_type[1];
-		if ( !opt || exp_type[2] )
-			opt = -1; /* Not a single character option */
-	}
-
 	buf = NULL;
 	switch (opt) {
 	case 't':
@@ -157,3 +142,24 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	write_or_die(1, buf, size);
 	return 0;
 }
+
+int cmd_cat_file(int argc, const char **argv, const char *prefix)
+{
+	int opt;
+	const char *exp_type, *obj_name;
+
+	git_config(git_default_config);
+	if (argc != 3)
+		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
+	exp_type = argv[1];
+	obj_name = argv[2];
+
+	opt = 0;
+	if ( exp_type[0] == '-' ) {
+		opt = exp_type[1];
+		if ( !opt || exp_type[2] )
+			opt = -1; /* Not a single character option */
+	}
+
+	return cat_one_file(opt, exp_type, obj_name);
+}
--
1.5.3.4.1333.ga2f32
* [PATCH 3/9] git-cat-file: Make option parsing a little more flexible
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben

This will make it easier to add new options later.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 builtin-cat-file.c |   42 ++++++++++++++++++++++++++++++------------
 1 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 34a63d1..3a0be4a 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -143,23 +143,41 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	return 0;
 }
 
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
+
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
-	int opt;
-	const char *exp_type, *obj_name;
+	int i, opt = 0;
+	const char *exp_type = 0, *obj_name = 0;
 
 	git_config(git_default_config);
-	if (argc != 3)
-		usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>");
-	exp_type = argv[1];
-	obj_name = argv[2];
-
-	opt = 0;
-	if ( exp_type[0] == '-' ) {
-		opt = exp_type[1];
-		if ( !opt || exp_type[2] )
-			opt = -1; /* Not a single character option */
+
+	for (i = 1; i < argc; ++i) {
+		const char *arg = argv[i];
+
+		if (!strcmp(arg, "-t") || !strcmp(arg, "-s") || !strcmp(arg, "-e") || !strcmp(arg, "-p")) {
+			exp_type = arg;
+			opt = exp_type[1];
+			continue;
+		}
+
+		if (arg[0] == '-')
+			usage(cat_file_usage);
+
+		if (!exp_type) {
+			exp_type = arg;
+			continue;
+		}
+
+		if (obj_name)
+			usage(cat_file_usage);
+
+		obj_name = arg;
+		break;
 	}
 
+	if (!exp_type || !obj_name)
+		usage(cat_file_usage);
+
 	return cat_one_file(opt, exp_type, obj_name);
 }
--
1.5.3.4.1333.ga2f32
* [PATCH 4/9] git-cat-file: Add --stdin option
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben

This lets you specify object names on stdin instead of on the command
line.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 Documentation/git-cat-file.txt |    6 +++++-
 builtin-cat-file.c             |   26 ++++++++++++++++++++++----
 t/t1005-cat-file.sh            |   14 ++++++++++++++
 3 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index afa095c..588d71a 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -8,7 +8,7 @@ git-cat-file - Provide content or type/size information for repository objects
 
 SYNOPSIS
 --------
-'git-cat-file' [-t | -s | -e | -p | <type>] <object>
+'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin | <object>]
 
 DESCRIPTION
 -----------
@@ -23,6 +23,10 @@ OPTIONS
 	For a more complete list of ways to spell object names, see
 	"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
 
+--stdin::
+	Read object names from stdin instead of specifying one on the
+	command line.
+
 -t::
 	Instead of the content, show the object type identified by
 	<object>.
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 3a0be4a..0f1ffe5 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -143,12 +143,14 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	return 0;
 }
 
-static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin | <sha1>]";
 
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
 	int i, opt = 0;
+	int read_stdin = 0;
 	const char *exp_type = 0, *obj_name = 0;
+	struct strbuf buf;
 
 	git_config(git_default_config);
 
@@ -161,6 +163,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
+		if (!strcmp(arg, "--stdin")) {
+			read_stdin = 1;
+			continue;
+		}
+
 		if (arg[0] == '-')
 			usage(cat_file_usage);
 
@@ -169,15 +176,26 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (obj_name)
+		if (obj_name || read_stdin)
 			usage(cat_file_usage);
 
 		obj_name = arg;
 		break;
 	}
 
-	if (!exp_type || !obj_name)
+	if (!read_stdin) {
+		if (!exp_type || !obj_name)
 		usage(cat_file_usage);
+		return cat_one_file(opt, exp_type, obj_name);
+	}
 
-	return cat_one_file(opt, exp_type, obj_name);
+	strbuf_init(&buf, 0);
+	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+		int error = cat_one_file(opt, exp_type, buf.buf);
+		if (error)
+			return error;
+	}
+	strbuf_release(&buf);
+
+	return 0;
 }
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
index 2fdc446..49eb89d 100755
--- a/t/t1005-cat-file.sh
+++ b/t/t1005-cat-file.sh
@@ -88,4 +88,18 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""
 
+sha1s="$hello_sha1
+$tree_sha1
+$commit_sha1
+$tag_sha1"
+
+sizes="$hello_size
+$tree_size
+$commit_size
+$tag_size"
+
+test_expect_success \
+    "Pass object hashes on stdin" \
+    "test \"$sizes\" = \"$(echo "$sha1s" | git cat-file -s --stdin)\""
+
 test_done
--
1.5.3.4.1333.ga2f32
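The `--stdin` loop added above reads one object name per line and stops at the first failure. A standard-C sketch of the same control flow follows; a hypothetical callback stands in for `cat_one_file()`, and a fixed-size buffer stands in for `struct strbuf`, so it assumes every name fits in the buffer. Because nothing is heap-allocated, the early return on error needs no cleanup, whereas the patch's `return error;` path skips `strbuf_release(&buf)`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-name handler: returns 0 on success, nonzero to stop. */
typedef int (*name_handler)(const char *name, void *ctx);

/* Call fn for each line of in (newline stripped); stop at the first error. */
static int for_each_name(FILE *in, name_handler fn, void *ctx)
{
	char line[1024];   /* assumes every name fits in one buffer */

	assert(in && fn);
	while (fgets(line, sizeof line, in)) {
		size_t len = strlen(line);
		int err;

		if (len && line[len - 1] == '\n')
			line[len - 1] = '\0';
		err = fn(line, ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example handler: counts names and rejects an empty line as invalid. */
static int count_names(const char *name, void *ctx)
{
	if (!*name)
		return -1;
	++*(int *)ctx;
	return 0;
}
```

Feeding it `a`, an empty line, and `c` returns -1 after counting only `a`, mirroring how the patch returns the first nonzero result from `cat_one_file()`.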
* [PATCH 5/9] git-cat-file: Add --separator option
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben

This lets the user specify a string to be printed after the output for
each object passed on stdin.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 Documentation/git-cat-file.txt |    7 ++++++-
 builtin-cat-file.c             |   28 +++++++++++++++++++++++++---
 t/t1005-cat-file.sh            |   36 +++++++++++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 588d71a..7a59a5e 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -8,7 +8,7 @@ git-cat-file - Provide content or type/size information for repository objects
 
 SYNOPSIS
 --------
-'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin | <object>]
+'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin [--separator <string>] | <object>]
 
 DESCRIPTION
 -----------
@@ -27,6 +27,11 @@ OPTIONS
 	Read object names from stdin instead of specifying one on the
 	command line.
 
+--separator::
+	A string to print after the output for each object passed on
+	stdin. A newline will be appended to the separator each time it is
+	printed.
+
 -t::
 	Instead of the content, show the object type identified by
 	<object>.
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 0f1ffe5..9ae3184 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -92,6 +92,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 		type = sha1_object_info(sha1, NULL);
 		if (type > 0) {
 			printf("%s\n", typename(type));
+			fflush(stdout);
 			return 0;
 		}
 		break;
@@ -100,6 +101,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 		type = sha1_object_info(sha1, &size);
 		if (type > 0) {
 			printf("%lu\n", size);
+			fflush(stdout);
 			return 0;
 		}
 		break;
@@ -143,14 +145,16 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	return 0;
 }
 
-static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin | <sha1>]";
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin [--separator <string>] | <sha1>]";
+
+static const char *separator;
 
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
 	int i, opt = 0;
 	int read_stdin = 0;
 	const char *exp_type = 0, *obj_name = 0;
-	struct strbuf buf;
+	struct strbuf buf, sbuf;
 
 	git_config(git_default_config);
 
@@ -168,6 +172,13 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
+		if (!strcmp(arg, "--separator")) {
+			if (++i == argc)
+				usage(cat_file_usage);
+			separator = argv[i];
+			continue;
+		}
+
 		if (arg[0] == '-')
 			usage(cat_file_usage);
 
@@ -184,18 +195,29 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	}
 
 	if (!read_stdin) {
-		if (!exp_type || !obj_name)
+		if (!exp_type || !obj_name || separator)
 		usage(cat_file_usage);
 		return cat_one_file(opt, exp_type, obj_name);
 	}
 
+	if (separator) {
+		strbuf_init(&sbuf, 0);
+		strbuf_addstr(&sbuf, separator);
+		strbuf_addch(&sbuf, '\n');
+	}
+
 	strbuf_init(&buf, 0);
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
 		int error = cat_one_file(opt, exp_type, buf.buf);
 		if (error)
 			return error;
+		if (separator)
+			write_or_die(1, sbuf.buf, sbuf.len);
 	}
 	strbuf_release(&buf);
 
+	if (separator)
+		strbuf_release(&sbuf);
+
 	return 0;
 }
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
index 49eb89d..52a3efd 100755
--- a/t/t1005-cat-file.sh
+++ b/t/t1005-cat-file.sh
@@ -99,7 +99,41 @@ $commit_size
 $tag_size"
 
 test_expect_success \
-    "Pass object hashes on stdin" \
+    "Print sizes for object hashes on stdin" \
     "test \"$sizes\" = \"$(echo "$sha1s" | git cat-file -s --stdin)\""
 
+separator="TESTSEPARATOR"
+
+separated_sizes="$hello_size
+$separator
+$tree_size
+$separator
+$commit_size
+$separator
+$tag_size
+$separator"
+
+test_expect_success \
+    "Print sizes for object hashes on stdin with --separator" \
+    "test \"$separated_sizes\" = \"$(echo "$sha1s" | git cat-file -s --stdin --separator $separator)\""
+
+sha1s="$hello_sha1
+$hello_sha1"
+
+contents="$hello_content
+$hello_content"
+
+separated_contents="$hello_content
+$separator
+$hello_content
+$separator"
+
+test_expect_success \
+    "Print objects for object hashes on stdin" \
+    "test \"$contents\" = \"$(echo "$sha1s" | git cat-file blob --stdin)\""
+
+test_expect_success \
+    "Print objects for object hashes on stdin with --separator" \
+    "test \"$separated_contents\" = \"$(echo "$sha1s" | git cat-file blob --stdin --separator $separator)\""
+
 test_done
--
1.5.3.4.1333.ga2f32
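On the reading side, a consumer of `--separator` output has to decide where one object ends. Below is a sketch of the rule that the git-svn patch in this series applies: the first occurrence of the separator immediately followed by a newline ends the record. It is written as a hypothetical C helper. The last check shows the weakness raised later in this thread: a blob that itself contains `<separator>\n` gets cut short.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the length of the first record in buf[0..len), i.e. the bytes
 * before the first "<sep>\n", and store record length plus the separator
 * line in *consumed.  Return -1 if no complete record has arrived yet.
 */
static long first_record(const char *buf, size_t len, const char *sep,
			 size_t *consumed)
{
	size_t slen, i;

	assert(sep && *sep);
	slen = strlen(sep);
	for (i = 0; i + slen + 1 <= len; i++) {
		if (memcmp(buf + i, sep, slen) == 0 && buf[i + slen] == '\n') {
			*consumed = i + slen + 1;
			return (long)i;
		}
	}
	return -1;
}
```

For the stream `hello\nSEP\nworldSEP\n`, the first record is `hello\n` (6 bytes) and the second is `world` (5 bytes, no trailing newline). The stream `aSEP\nbSEP\n`, produced by a blob containing `aSEP\nb`, yields a one-byte record `a`.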
* [PATCH 6/9] Add tests for git hash-object
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben

Signed-off-by: Adam Roben <aroben@apple.com>
---
 t/t1006-hash-object.sh |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)
 create mode 100755 t/t1006-hash-object.sh

diff --git a/t/t1006-hash-object.sh b/t/t1006-hash-object.sh
new file mode 100755
index 0000000..77b8eca
--- /dev/null
+++ b/t/t1006-hash-object.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+test_description='git hash-object'
+
+. ./test-lib.sh
+
+hello_content="Hello World"
+hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238
+echo "$hello_content" > hello
+
+test_expect_success \
+    'hash a file' \
+    "test $hello_sha1 = $(git hash-object hello)"
+
+test_expect_success \
+    'hash from stdin' \
+    "test $hello_sha1 = $(echo "$hello_content" | git hash-object --stdin)"
+
+test_expect_success \
+    'hash a file and write to database' \
+    "test $hello_sha1 = $(git hash-object -w hello)"
+
+test_expect_success \
+    'hash from stdin and write to database' \
+    "test $hello_sha1 = $(echo "$hello_content" | git hash-object -w --stdin)"
+
+test_done
--
1.5.3.4.1333.ga2f32
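The hard-coded `hello_sha1` is the SHA-1 of the blob header `blob <size>` plus a NUL byte, followed by the file contents (here `Hello World` and a newline, 12 bytes). A self-contained C sketch of that computation is below. The compact SHA-1 is written for illustration only; it is not git's implementation and is not meant for production use. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* SHA-1 compression function applied to one 64-byte block. */
static void sha1_block(uint32_t h[5], const unsigned char *p)
{
	uint32_t w[80], a, b, c, d, e, f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
	for (i = 16; i < 80; i++)
		w[i] = ROL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = ROL(a, 5) + f + e + k + w[i];
		e = d; d = c; c = ROL(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* SHA-1 of an in-memory message as 40 lowercase hex digits plus NUL. */
static void sha1_hex(const unsigned char *msg, size_t len, char hex[41])
{
	uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
	unsigned char block[64];
	uint64_t bits = (uint64_t)len * 8;
	size_t i, rem;
	int j;

	for (i = 0; i + 64 <= len; i += 64)
		sha1_block(h, msg + i);
	rem = len - i;
	memset(block, 0, sizeof block);
	memcpy(block, msg + i, rem);
	block[rem] = 0x80;                 /* the padding bit */
	if (rem >= 56) {                   /* no room for the length: extra block */
		sha1_block(h, block);
		memset(block, 0, sizeof block);
	}
	for (j = 0; j < 8; j++)            /* big-endian message length in bits */
		block[63 - j] = (unsigned char)(bits >> (8 * j));
	sha1_block(h, block);
	for (j = 0; j < 20; j++)
		sprintf(hex + 2 * j, "%02x", (unsigned)((h[j / 4] >> (24 - 8 * (j % 4))) & 0xff));
}

/* Blob object ID: SHA-1 over "blob <size>\0" followed by the contents. */
static int git_blob_id(const void *data, size_t len, char hex[41])
{
	char hdr[32];
	int hlen = snprintf(hdr, sizeof hdr, "blob %lu", (unsigned long)len) + 1;
	unsigned char *buf = malloc(hlen + len);

	assert(hex != NULL);
	if (!buf)
		return -1;
	memcpy(buf, hdr, hlen);            /* includes the terminating NUL */
	memcpy(buf + hlen, data, len);
	sha1_hex(buf, hlen + len, hex);
	free(buf);
	return 0;
}
```

`git_blob_id("Hello World\n", 12, hex)` hashes the 20 bytes `blob 12\0Hello World\n`. The result should match the `hello_sha1` constant in the test.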
* [PATCH 7/9] git-hash-object: Add --stdin-paths option
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben

This allows multiple paths to be specified on stdin.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 Documentation/git-hash-object.txt |    5 ++++-
 hash-object.c                     |   29 ++++++++++++++++++++++++++++-
 t/t1006-hash-object.sh            |   22 ++++++++++++++++++++++
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-hash-object.txt b/Documentation/git-hash-object.txt
index 616f196..50fc401 100644
--- a/Documentation/git-hash-object.txt
+++ b/Documentation/git-hash-object.txt
@@ -8,7 +8,7 @@ git-hash-object - Compute object ID and optionally creates a blob from a file
 
 SYNOPSIS
 --------
-'git-hash-object' [-t <type>] [-w] [--stdin] [--] <file>...
+'git-hash-object' [-t <type>] [-w] [--stdin | --stdin-paths] [--] <file>...
 
 DESCRIPTION
 -----------
@@ -32,6 +32,9 @@ OPTIONS
 --stdin::
 	Read the object from standard input instead of from a file.
 
+--stdin-paths::
+	Read file names from stdin instead of from the command-line.
+
 Author
 ------
 Written by Junio C Hamano <junkio@cox.net>
diff --git a/hash-object.c b/hash-object.c
index 18f5017..fd96d50 100644
--- a/hash-object.c
+++ b/hash-object.c
@@ -20,6 +20,7 @@ static void hash_object(const char *path, enum object_type type, int write_objec
 		    ? "Unable to add %s to database"
 		    : "Unable to hash %s", path);
 	printf("%s\n", sha1_to_hex(sha1));
+	maybe_flush_or_die(stdout, "hash to stdout");
 }
 
 static void hash_stdin(const char *type, int write_object)
@@ -31,7 +32,7 @@ static void hash_stdin(const char *type, int write_object)
 }
 
 static const char hash_object_usage[] =
-"git-hash-object [-t <type>] [-w] [--stdin] <file>...";
+"git-hash-object [-t <type>] [-w] [--stdin | --stdin-paths] <file>...";
 
 int main(int argc, char **argv)
 {
@@ -41,6 +42,7 @@ int main(int argc, char **argv)
 	const char *prefix = NULL;
 	int prefix_length = -1;
 	int no_more_flags = 0;
+	int found_stdin_flag = 0;
 
 	for (i = 1 ; i < argc; i++) {
 		if (!no_more_flags && argv[i][0] == '-') {
@@ -62,7 +64,32 @@ int main(int argc, char **argv)
 			}
 			else if (!strcmp(argv[i], "--help"))
 				usage(hash_object_usage);
+			else if (!strcmp(argv[i], "--stdin-paths")) {
+				struct strbuf buf, nbuf;
+
+				if (found_stdin_flag)
+					die("Can't use both --stdin and --stdin-paths");
+				found_stdin_flag = 1;
+
+				strbuf_init(&buf, 0);
+				strbuf_init(&nbuf, 0);
+				while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+					if (buf.buf[0] == '"') {
+						strbuf_reset(&nbuf);
+						if (unquote_c_style(&nbuf, buf.buf, NULL))
+							die("line is badly quoted");
+						strbuf_swap(&buf, &nbuf);
+					}
+					hash_object(buf.buf, type_from_string(type), write_object);
+				}
+				strbuf_release(&buf);
+				strbuf_release(&nbuf);
+			}
 			else if (!strcmp(argv[i], "--stdin")) {
+				if (found_stdin_flag)
+					die("Can't use both --stdin and --stdin-paths");
+				found_stdin_flag = 1;
+
 				hash_stdin(type, write_object);
 			}
 			else
diff --git a/t/t1006-hash-object.sh b/t/t1006-hash-object.sh
index 77b8eca..e6da1c1 100755
--- a/t/t1006-hash-object.sh
+++ b/t/t1006-hash-object.sh
@@ -24,4 +24,26 @@ test_expect_success \
     'hash from stdin and write to database' \
     "test $hello_sha1 = $(echo "$hello_content" | git hash-object -w --stdin)"
 
+example_content="Silly example"
+example_sha1=f24c74a2e500f5ee1332c86b94199f52b1d1d962
+echo "$example_content" > example
+
+filenames="hello
+example"
+
+sha1s="$hello_sha1
+$example_sha1"
+
+test_expect_success \
+    'hash two files with names on stdin' \
+    "test \"$sha1s\" = \"$(echo "$filenames" | git hash-object --stdin-paths)\""
+
+test_expect_success \
+    'hash two files with names on stdin and write to database' \
+    "test \"$sha1s\" = \"$(echo "$filenames" | git hash-object -w --stdin-paths)\""
+
+test_expect_failure \
+    "Can't use --stdin and --stdin-paths together" \
+    "echo \"$filenames\" | git hash-object --stdin --stdin-paths"
+
 test_done
--
1.5.3.4.1333.ga2f32
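`--stdin-paths` treats a line that starts with a double quote as a C-style quoted path and passes it to `unquote_c_style()`. The sketch below is a simplified stand-in, not git's implementation. It accepts only `\\`, `\"`, `\n`, `\t`, and three-digit octal escapes (octal is typically how non-ASCII bytes appear in quoted paths). Its name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified C-style unquoting of a path written with surrounding double
 * quotes.  Handles backslash-escaped backslash, quote, n, t, and
 * three-digit octal escapes only.  Returns a malloc'd string, or NULL if
 * the input is malformed.
 */
static char *unquote_path(const char *in)
{
	size_t len;
	const char *p, *end;
	char *out, *o;

	assert(in);
	len = strlen(in);
	if (len < 2 || in[0] != '"' || in[len - 1] != '"')
		return NULL;
	end = in + len - 1;                 /* points at the closing quote */
	out = o = malloc(len);              /* the result is always shorter */
	if (!out)
		return NULL;
	for (p = in + 1; p < end; p++) {
		if (*p == '"')
			goto bad;                   /* unescaped quote inside */
		if (*p != '\\') {
			*o++ = *p;
			continue;
		}
		if (++p >= end)
			goto bad;                   /* backslash would escape the closing quote */
		switch (*p) {
		case '\\': *o++ = '\\'; break;
		case '"':  *o++ = '"';  break;
		case 'n':  *o++ = '\n'; break;
		case 't':  *o++ = '\t'; break;
		default:
			if (p + 2 < end &&
			    p[0] >= '0' && p[0] <= '3' &&
			    p[1] >= '0' && p[1] <= '7' &&
			    p[2] >= '0' && p[2] <= '7') {
				*o++ = (char)((p[0] - '0') << 6 | (p[1] - '0') << 3 | (p[2] - '0'));
				p += 2;
			} else {
				goto bad;
			}
		}
	}
	*o = '\0';
	return out;
bad:
	free(out);
	return NULL;
}
```

For example, the quoted line `"\303\251"` decodes to the two UTF-8 bytes of `é`. A line that does not start with a quote is used verbatim by the patch, so it never reaches this code.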
* [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben, Petr Baudis

command_bidi_pipe hands back the stdin and stdout file handles from the
executed command. command_close_bidi_pipe closes these handles and
terminates the process.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 perl/Git.pm |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index 3f4080c..eb699ff 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -51,6 +51,7 @@ require Exporter;
 # Methods which can be called as standalone functions as well:
 @EXPORT_OK = qw(command command_oneline command_noisy
                 command_output_pipe command_input_pipe command_close_pipe
+                command_bidi_pipe command_close_bidi_pipe
                 version exec_path hash_object git_cmd_try);
 
 
@@ -92,6 +93,7 @@ increate nonwithstanding).
 use Carp qw(carp croak); # but croak is bad - throw instead
 use Error qw(:try);
 use Cwd qw(abs_path);
+use IPC::Open2 qw(open2);
 
 }
 
@@ -375,6 +377,60 @@ sub command_close_pipe {
 	_cmd_close($fh, $ctx);
 }
 
+=item command_bidi_pipe ( COMMAND [, ARGUMENTS... ] )
+
+Execute the given C<COMMAND> in the same way as command_output_pipe()
+does but return both an input pipe filehandle and an output pipe filehandle.
+
+The function will return C<($pid, $pipe_in, $pipe_out, $ctx)>.
+See C<command_close_bidi_pipe()> for details.
+
+=cut
+
+sub command_bidi_pipe {
+	my ($pid, $in, $out);
+	$pid = open2($in, $out, @_);
+	return ($pid, $in, $out, join(' ', @_));
+}
+
+=item command_close_bidi_pipe ( PID, PIPE_IN, PIPE_OUT [, CTX] )
+
+Close the C<PIPE_IN> and C<PIPE_OUT> as returned from C<command_bidi_pipe()>,
+checking whether the command finished successfully. The optional C<CTX>
+argument is required if you want to see the command name in the error message,
+and it is the fourth value returned by C<command_bidi_pipe()>. The call idiom
+is:
+
+	my ($pid, $in, $out, $ctx) = command_bidi_pipe(qw(git-cat-file -s --stdin));
+	print $out "$sha1\n";
+	my $size = <$in>;
+	command_close_bidi_pipe($pid, $in, $out, $ctx);
+
+Note that you should not rely on whatever actually is in C<CTX>;
+currently it is simply the command name but in future the context might
+have more complicated structure.
+
+=cut
+
+sub command_close_bidi_pipe {
+	my ($pid, $in, $out, $ctx) = @_;
+	foreach my $fh ($in, $out) {
+		if (not close $fh) {
+			if ($!) {
+				carp "error closing pipe: $!";
+			} elsif ($? >> 8) {
+				throw Git::Error::Command($ctx, $? >>8);
+			}
+		}
+	}
+
+	waitpid $pid, 0;
+
+	if ($? >> 8) {
+		throw Git::Error::Command($ctx, $? >>8);
+	}
+}
+
 =item command_noisy ( COMMAND [, ARGUMENTS... ] )
--
1.5.3.4.1333.ga2f32
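`command_bidi_pipe()` builds on Perl's `IPC::Open2`. For comparison, here is a POSIX C sketch of the same idea: spawn one child with both stdin and stdout connected to pipes, then run many request/response round trips against it instead of doing one fork/exec per request. The struct and function names are hypothetical. The sketch assumes each request is a single short line and that the child answers each request before the next one is sent. Writing a large request without reading the reply could deadlock once a pipe buffer fills.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct coproc {
	pid_t pid;
	FILE *to;    /* our write end: the child's stdin */
	FILE *from;  /* our read end: the child's stdout */
};

/* Start argv[0] with both stdin and stdout redirected to pipes. */
static int coproc_start(struct coproc *cp, char *const argv[])
{
	int in[2], out[2];

	assert(cp && argv && argv[0]);
	if (pipe(in) < 0)
		return -1;
	if (pipe(out) < 0) {
		close(in[0]);
		close(in[1]);
		return -1;
	}
	cp->pid = fork();
	if (cp->pid < 0) {
		close(in[0]); close(in[1]); close(out[0]); close(out[1]);
		return -1;
	}
	if (cp->pid == 0) {
		dup2(in[0], STDIN_FILENO);
		dup2(out[1], STDOUT_FILENO);
		close(in[0]); close(in[1]); close(out[0]); close(out[1]);
		execvp(argv[0], argv);
		_exit(127);              /* exec failed */
	}
	close(in[0]);
	close(out[1]);
	cp->to = fdopen(in[1], "w");
	cp->from = fdopen(out[0], "r");
	return (cp->to && cp->from) ? 0 : -1;
}

/* One round trip: send a line, flush it, and read one line of reply. */
static char *coproc_ask(struct coproc *cp, const char *req, char *reply, int size)
{
	fprintf(cp->to, "%s\n", req);
	fflush(cp->to);
	return fgets(reply, size, cp->from);
}

/* Close our ends (EOF lets the child exit), reap it, return its exit code. */
static int coproc_finish(struct coproc *cp)
{
	int status;

	fclose(cp->to);
	fclose(cp->from);
	if (waitpid(cp->pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With `cat` as the child, every `coproc_ask()` gets its own line echoed back, and only one process is created no matter how many requests are made. That is the saving this series gets by keeping one git-cat-file and one git-hash-object running.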
* [PATCH 9/9] git-svn: Make fetch ~1.7x faster
From: Adam Roben @ 2007-10-23 5:46 UTC
To: git; +Cc: Junio C Hamano, Adam Roben, Eric Wong

We were spending a lot of time forking/execing git-cat-file and
git-hash-object. We now use command_bidi_pipe to keep one instance of each
running and feed it input on stdin.

Signed-off-by: Adam Roben <aroben@apple.com>
---
 git-svn.perl |   94 ++++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 72 insertions(+), 22 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 22bb47b..8b72046 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -236,6 +236,8 @@ eval {
 };
 fatal $@ if $@;
 post_fetch_checkout();
+Git::Commands->close_cat_blob();
+Git::Commands->close_hash_object();
 exit 0;
 
 ####################### primary functions ######################
@@ -2683,14 +2685,8 @@ sub apply_textdelta {
 	my $base = IO::File->new_tmpfile;
 	$base->autoflush(1);
 	if ($fb->{blob}) {
-		defined (my $pid = fork) or croak $!;
-		if (!$pid) {
-			open STDOUT, '>&', $base or croak $!;
-			print STDOUT 'link ' if ($fb->{mode_a} == 120000);
-			exec qw/git-cat-file blob/, $fb->{blob} or croak $!;
-		}
-		waitpid $pid, 0;
-		croak $? if $?;
+		my $contents = Git::Commands->cat_blob($fb->{blob});
+		print $base $contents;
 
 		if (defined $exp) {
 			seek $base, 0, 0 or croak $!;
@@ -2729,13 +2725,7 @@ sub close_file {
 			$buf eq 'link ' or die "$path has mode 120000",
 			                       "but is not a link\n";
 		}
-		defined(my $pid = open my $out,'-|') or die "Can't fork: $!\n";
-		if (!$pid) {
-			open STDIN, '<&', $fh or croak $!;
-			exec qw/git-hash-object -w --stdin/ or croak $!;
-		}
-		chomp($hash = do { local $/; <$out> });
-		close $out or croak $!;
+		$hash = Git::Commands->hash_object($fh);
 		close $fh or croak $!;
 		$hash =~ /^[a-f\d]{40}$/ or die "not a sha1: $hash\n";
 		close $fb->{base} or croak $!;
@@ -3063,13 +3053,8 @@ sub chg_file {
 	} elsif ($m->{mode_a} =~ /^120/ && $m->{mode_b} !~ /^120/) {
 		$self->change_file_prop($fbat,'svn:special',undef);
 	}
-	defined(my $pid = fork) or croak $!;
-	if (!$pid) {
-		open STDOUT, '>&', $fh or croak $!;
-		exec qw/git-cat-file blob/, $m->{sha1_b} or croak $!;
-	}
-	waitpid $pid, 0;
-	croak $? if $?;
+	my $blob = Git::Commands->cat_blob($m->{sha1_b});
+	print $fh $blob;
 	$fh->flush == 0 or croak $!;
 	seek $fh, 0, 0 or croak $!;
 
@@ -4272,6 +4257,71 @@ sub full_path {
 	$path . (length $self->{right} ? "/$self->{right}" : '');
 }
 
+package Git::Commands;
+use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
+	    $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;
+use strict;
+use warnings;
+use File::Temp qw/tempfile/;
+use Git qw/command_bidi_pipe command_close_bidi_pipe/;
+
+sub _open_cat_blob_if_needed {
+	return if defined($_cat_blob_pid);
+	$_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";
+
+	($_cat_blob_pid, $_cat_blob_in, $_cat_blob_out, $_cat_blob_ctx) = command_bidi_pipe(qw(git-cat-file blob --stdin --separator), $_cat_blob_separator);
+}
+
+sub close_cat_blob {
+	return unless defined($_cat_blob_pid);
+
+	command_close_bidi_pipe($_cat_blob_pid, $_cat_blob_in, $_cat_blob_out, $_cat_blob_ctx);
+}
+
+sub cat_blob {
+	my (undef, $sha1) = @_;
+
+	_open_cat_blob_if_needed();
+	print $_cat_blob_out "$sha1\n";
+	my @file = ();
+	while (my $line = <$_cat_blob_in>) {
+		my $last = 0;
+		if ($line =~ s/\Q$_cat_blob_separator\E$//) {
+			chomp($line);
+			$last = 1;
+		}
+		push(@file, $line);
+		last if $last;
+	}
+	return join('', @file);
+}
+
+sub _open_hash_object_if_needed {
+	return if defined($_hash_object_pid);
+
+	($_hash_object_pid, $_hash_object_in, $_hash_object_out, $_hash_object_ctx) = command_bidi_pipe(qw(git-hash-object -w --stdin-paths));
+}
+
+sub close_hash_object {
+	return unless defined($_hash_object_pid);
+
+	command_close_bidi_pipe($_hash_object_pid, $_hash_object_in, $_hash_object_out, $_hash_object_ctx);
+}
+
+sub hash_object {
+	my (undef, $fh) = @_;
+
+	my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
+	while (my $line = <$fh>) {
+		print $tmp_fh $line;
+	}
+	close($tmp_fh);
+	_open_hash_object_if_needed();
+	print $_hash_object_out $tmp_filename . "\n";
+	chomp(my $hash = <$_hash_object_in>);
+	return $hash;
+}
+
 __END__
 
 Data structures:
--
1.5.3.4.1333.ga2f32
* Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster
From: Johannes Sixt @ 2007-10-23 7:01 UTC
To: Adam Roben; +Cc: git, Junio C Hamano, Eric Wong

Adam Roben wrote:
> We were spending a lot of time forking/execing git-cat-file and
> git-hash-object. We now use command_bidi_pipe to keep one instance of each
> running and feed it input on stdin.

I appreciate this. It's certainly going to be a much bigger win on Windows,
although git svn doesn't work (in the MinGW port) at this time because of
the old perl and the missing SVN module.

-- Hannes
* Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster 2007-10-23 5:46 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben 2007-10-23 7:01 ` Johannes Sixt @ 2007-10-24 6:34 ` Eric Wong 2007-10-24 6:48 ` Adam Roben 1 sibling, 1 reply; 26+ messages in thread From: Eric Wong @ 2007-10-24 6:34 UTC (permalink / raw) To: Adam Roben; +Cc: git, Junio C Hamano Adam Roben <aroben@apple.com> wrote: > We were spending a lot of time forking/execing git-cat-file and > git-hash-object. We now use command_bidi_pipe to keep one instance of each > running and feed it input on stdin. Nice job! I just got access to a very fast SVN repository for a project I'm working on (not working on git-svn itself, unfortunately). A few comments and small nitpicks below: > Signed-off-by: Adam Roben <aroben@apple.com> > --- > git-svn.perl | 94 ++++++++++++++++++++++++++++++++++++++++++++------------- > 1 files changed, 72 insertions(+), 22 deletions(-) > +package Git::Commands; Can this be a separate file, or a part of Git.pm? I'm sure other scripts can eventually use this and I've been meaning to split git-svn.perl into separate files so it's easier to follow. > +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator > + $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/; I have trouble following long lines, and most of the git code also wraps at 80-columns. Dead-tree publishers got this concept right a long time ago :) > +use strict; > +use warnings; > +use File::Temp qw/tempfile/; > +use Git qw/command_bidi_pipe command_close_bidi_pipe/; > + > +sub _open_cat_blob_if_needed { > + return if defined($_cat_blob_pid); > + $_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------"; Brian brought this up already, but yes, having pre-defined separators instead of explicitly-specified sizes makes it all too easy for a malicious user to commit code that will break things for git-svn users. 
> +sub hash_object {
> +	my (undef, $fh) = @_;
> +
> +	my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
> +	while (my $line = <$fh>) {
> +		print $tmp_fh $line;
> +	}
> +	close($tmp_fh);

Related to the above. It's better to sysread()/syswrite() or read()/print() in a loop with a predefined buffer size rather than to use a readline(), since you could be dealing with files with very long lines or binaries with no newline characters in them at all.

> +	_open_hash_object_if_needed();
> +	print $_hash_object_out $tmp_filename . "\n";

Minor, but

	print $_hash_object_out $tmp_filename, "\n";

avoids creating a new string.

-- Eric Wong
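Eric's suggestion to copy in fixed-size chunks instead of line by line is language-independent. A minimal sketch in C (the language of git's builtins), not git-svn's Perl; `copy_stream` and the 8 KiB buffer size are illustrative choices, not anything taken from the patch:

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative chunk size; any fixed size works. */
#define COPY_BUF_SIZE 8192

/*
 * Copy all bytes from `in` to `out` in fixed-size chunks.
 * Memory use is bounded by COPY_BUF_SIZE however long a "line" is,
 * and binary data with no newlines (or with NUL bytes) is copied
 * exactly like text.
 * Returns the number of bytes copied, or -1 on a read/write error.
 */
static long copy_stream(FILE *in, FILE *out)
{
	char buf[COPY_BUF_SIZE];
	long total = 0;
	size_t n;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			return -1;
		total += (long)n;
	}
	return ferror(in) ? -1 : total;
}
```

A file with no newline at all goes through the same loop as any other input. In Perl, the same shape is the read()/print() or sysread()/syswrite() loop Eric describes.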
* Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster 2007-10-24 6:34 ` Eric Wong @ 2007-10-24 6:48 ` Adam Roben 0 siblings, 0 replies; 26+ messages in thread From: Adam Roben @ 2007-10-24 6:48 UTC (permalink / raw) To: Eric Wong; +Cc: git, Junio C Hamano Eric Wong wrote: > Adam Roben <aroben@apple.com> wrote: > >> +package Git::Commands; >> > > Can this be a separate file, or a part of Git.pm? I'm sure other > scripts can eventually use this and I've been meaning to split > git-svn.perl into separate files so it's easier to follow. > I had considered doing one of the above, but decided that splitting it out could be done if/when it was deemed useful for another script. But I'll split it out since you think it's a good idea. >> +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator >> + $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/; >> > > I have trouble following long lines, and most of the git code also wraps > at 80-columns. Dead-tree publishers got this concept right a long > time ago :) > Will fix. >> +use strict; >> +use warnings; >> +use File::Temp qw/tempfile/; >> +use Git qw/command_bidi_pipe command_close_bidi_pipe/; >> + >> +sub _open_cat_blob_if_needed { >> + return if defined($_cat_blob_pid); >> + $_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------"; >> > > Brian brought this up already, but yes, having pre-defined separators > instead of explicitly-specified sizes makes it all too easy for a > malicious user to commit code that will break things for git-svn users. > Yup, will fix this. :-) >> +sub hash_object { >> + my (undef, $fh) = @_; >> + >> + my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1); >> + while (my $line = <$fh>) { >> + print $tmp_fh $line; >> + } >> + close($tmp_fh); >> > > Related to the above. 
It's better to sysread()/syswrite() or > read()/print() in a loop with a predefined buffer size rather than to > use a readline() since you could be dealing with files with very long > lines or binaries with no newline characters in them at all. > Hm, OK. I'll look for similar code in git-svn and follow that. >> + _open_hash_object_if_needed(); >> + print $_hash_object_out $tmp_filename . "\n"; >> > > Minor, but > > print $_hash_object_out $tmp_filename, "\n"; > > avoids creating a new string. > Good idea. Thanks for the feedback! I'll send out some new patches sometime soon. -Adam ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
From: Shawn O. Pearce @ 2007-10-23 5:53 UTC
To: Adam Roben; +Cc: git, Junio C Hamano

Adam Roben <aroben@apple.com> wrote:
> This allows multiple paths to be specified on stdin.

git-fast-import wasn't suited to the task?

-- Shawn.
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
From: Adam Roben @ 2007-10-23 5:57 UTC
To: Shawn O. Pearce; +Cc: git, Junio C Hamano

Shawn O. Pearce wrote:
> Adam Roben <aroben@apple.com> wrote:
>
>> This allows multiple paths to be specified on stdin.
>>
>
> git-fast-import wasn't suited to the task?
>

I actually considered using fast-import for the whole shebang, but decided that I don't yet understand the workings and structure of git-svn well enough to make such a big change.

git-svn uses git-hash-object to both determine a file's hash and insert it into the index in one go -- can fast-import do this? Or will it just put it in the index and not give you the hash back? The latter was my impression.

-Adam
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
From: Shawn O. Pearce @ 2007-10-23 6:10 UTC
To: Adam Roben; +Cc: git, Junio C Hamano

Adam Roben <aroben@apple.com> wrote:
> Shawn O. Pearce wrote:
> >Adam Roben <aroben@apple.com> wrote:
> >
> >>This allows multiple paths to be specified on stdin.
> >
> >git-fast-import wasn't suited to the task?
>
> I actually considered using fast-import for the whole shebang, but
> decided that I don't yet understand the workings and structure of
> git-svn well enough to make such a big change.
>
> git-svn uses git-hash-object to both determine a file's hash and insert
> it into the index in one go -- can fast-import do this? Or will it just
> put it in the index and not give you the hash back? The latter was my
> impression.

It doesn't currently give you the hash back. You can sort of get to it by marking the blob then using the 'checkpoint' command to dump the marks to a file, which you can read in. Not good.

It probably wouldn't be very difficult to give fast-import a way to dump marks back on stdout as they are assigned. So long as the frontend either locksteps with fast-import or is willing to monitor it with a select/poll type of arrangement and read from stdout as soon as it's ready.

Probably a 5 line code change to fast-import. Like this. Only Git won't recognize that object SHA-1 as it's in a packfile that has no index. You'd need to 'checkpoint' to flush the object out, or just use all of fast-import for the processing. So yea, I guess I can see now how it's not suited to this.
--8>-- diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt index d511967..7fd8b2c 100644 --- a/Documentation/git-fast-import.txt +++ b/Documentation/git-fast-import.txt @@ -67,6 +67,10 @@ OPTIONS at checkpoint (or completion) the same path can also be safely given to \--import-marks. +--export-marks-to-stdout:: + Dumps marks to stdout as soon as they are assigned. + Marks are written one per line as `:markid SHA-1`. + --import-marks=<file>:: Before processing any input, load the marks specified in <file>. The input file must exist, must be readable, and diff --git a/fast-import.c b/fast-import.c index 6f888f6..619ed05 100644 --- a/fast-import.c +++ b/fast-import.c @@ -272,6 +272,7 @@ struct recent_command static unsigned long max_depth = 10; static off_t max_packsize = (1LL << 32) - 1; static int force_update; +static int marks_to_stdout; /* Stats and misc. counters */ static uintmax_t alloc_count; @@ -561,6 +562,7 @@ static char *pool_strdup(const char *s) static void insert_mark(uintmax_t idnum, struct object_entry *oe) { + uintmax_t orig_idnum = idnum; struct mark_set *s = marks; while ((idnum >> s->shift) >= 1024) { s = pool_calloc(1, sizeof(struct mark_set)); @@ -580,6 +582,8 @@ static void insert_mark(uintmax_t idnum, struct object_entry *oe) if (!s->data.marked[idnum]) marks_set_count++; s->data.marked[idnum] = oe; + if (marks_to_stdout) + printf(":%" PRIuMAX " %s\n", orig_idnum, sha1_to_hex(oe->sha1)); } static struct object_entry *find_mark(uintmax_t idnum) @@ -2294,6 +2298,8 @@ int main(int argc, const char **argv) max_active_branches = strtoul(a + 18, NULL, 0); else if (!prefixcmp(a, "--import-marks=")) import_marks(a + 15); + else if (!prefixcmp(a, "--export-marks-to-stdout")) + marks_to_stdout = 1; else if (!prefixcmp(a, "--export-marks=")) mark_file = a + 15; else if (!prefixcmp(a, "--export-pack-edges=")) { -- Shawn. ^ permalink raw reply related [flat|nested] 26+ messages in thread
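With Shawn's proposed `--export-marks-to-stdout` (a quick sketch posted in this thread; treat the option as hypothetical), a frontend working in lockstep would read one `:markid SHA-1` line per mark. A hypothetical parser for that line format, written in C to match the patch; `parse_mark_line` is an illustrative name:

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <string.h>

/*
 * Parse one line of the form ":<markid> <40 hex digits>", optionally
 * followed by '\n', as printed by the proposed --export-marks-to-stdout
 * option. On success, stores the mark number in *mark and the
 * NUL-terminated hex SHA-1 in hex[] and returns 0; returns -1 if the
 * line does not have that shape. (Overflow of markid is not checked.)
 */
static int parse_mark_line(const char *line, uintmax_t *mark, char hex[41])
{
	char *end;
	int i;

	if (line[0] != ':' || !isdigit((unsigned char)line[1]))
		return -1;
	*mark = strtoumax(line + 1, &end, 10);
	if (*end++ != ' ')
		return -1;
	for (i = 0; i < 40; i++) {
		/* isxdigit('\0') is false, so a short line stops here */
		if (!isxdigit((unsigned char)end[i]))
			return -1;
		hex[i] = end[i];
	}
	hex[40] = '\0';
	return (end[40] == '\0' || end[40] == '\n') ? 0 : -1;
}
```

Shawn's caveat still applies: a SHA-1 obtained this way names an object in a pack that has no index yet, so other git commands won't find it until fast-import checkpoints.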
* Re: [PATCH 7/9] git-hash-object: Add --stdin-paths option
From: Eric Wong @ 2007-10-24 6:11 UTC
To: Shawn O. Pearce; +Cc: Adam Roben, git, Junio C Hamano

"Shawn O. Pearce" <spearce@spearce.org> wrote:
> Adam Roben <aroben@apple.com> wrote:
> > Shawn O. Pearce wrote:
> > >Adam Roben <aroben@apple.com> wrote:
> > >
> > >>This allows multiple paths to be specified on stdin.
> > >
> > >git-fast-import wasn't suited to the task?
> >
> > I actually considered using fast-import for the whole shebang, but
> > decided that I don't yet understand the workings and structure of
> > git-svn well enough to make such a big change.
> >
> > git-svn uses git-hash-object to both determine a file's hash and insert
> > it into the index in one go -- can fast-import do this? Or will it just
> > put it in the index and not give you the hash back? The latter was my
> > impression.
>
> It doesn't currently give you the hash back. You can sort of get
> to it by marking the blob then using the 'checkpoint' command to
> dump the marks to a file, which you can read in. Not good.
>
> It probably wouldn't be very difficult to give fast-import a way
> to dump marks back on stdout as they are assigned. So long as the
> frontend either locksteps with fast-import or is willing to monitor
> it with a select/poll type of arrangement and read from stdout as
> soon as it's ready.
>
> Probably a 5 line code change to fast-import. Like this. Only Git
> won't recognize that object SHA-1 as it's in a packfile that has
> no index. You'd need to 'checkpoint' to flush the object out, or
> just use all of fast-import for the processing. So yea, I guess
> I can see now how it's not suited to this.

Shawn, thanks for clearing that up. I was previously considering fast-import for git-svn, but never had time[1] to really look at it. I guess Adam is on the right track with his patches.

[1] - Sorry to all on the list, but I've really been slacking on git-svn work. I was going to get some stuff done this weekend but decided to attempt to fight my nasty caffeine addiction instead :x

-- Eric Wong
* Re: [PATCH 6/9] Add tests for git hash-object
From: Johannes Sixt @ 2007-10-23 6:59 UTC
To: Adam Roben; +Cc: git, Junio C Hamano

Adam Roben wrote:
> +test_expect_success \
> +	'hash a file' \
> +	"test $hello_sha1 = $(git hash-object hello)"

Put tests in single-quotes; otherwise, the substitutions happen before the test begins, and not as part of the test.

-- Hannes
* Re: [PATCH 5/9] git-cat-file: Add --separator option
From: Brian Downing @ 2007-10-24 3:43 UTC
To: Adam Roben; +Cc: git, Junio C Hamano

On Mon, Oct 22, 2007 at 10:46:33PM -0700, Adam Roben wrote:
> +--separator::
> +	A string to print in between the output for each object passed on
> +	stdin. A newline will be appended to the separator each time it is
> +	printed.

Maybe I'm just unreasonably paranoid, but I don't think I could ever trust that you'd never find an arbitrary separator in the data. I suppose if you scanned the files beforehand you could come up with something guaranteed to be unique, but that seems like a pain (and doesn't happen regardless in patch 9/9; it just uses "--------------GITCATFILESEPARATOR-----------") If I were committing to SVN, it's sure not something I'd like to bet the integrity of my data on.

I think a far more reasonable output format for multiple objects would be something like:

    <count> LF
    <raw data> LF

Where <count> is the number of bytes in the <raw data> as an ASCII decimal integer.

This is pretty much the spiritual analog to the fast-import "exact byte count" data input format as well.

-bcd
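The difference between the two framings can be shown concretely. The sketch below (C, with hypothetical function names) contrasts a reader modeled on the separator loop in the first version of `cat_blob` with a reader for Brian's `<count> LF <raw data> LF` format. Blob content that contains the separator at the end of one of its lines cuts the first reader short, while the counted reader returns every byte:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Separator framing, modeled on the loop in the first cat_blob: collect
 * lines until one ends with `sep`, dropping the separator and its newline.
 * Blob content containing `sep` at the end of a line ends the record early.
 * (fgets/strlen also mishandle NUL bytes, another weakness of this scheme.)
 * Returns the number of bytes stored in out (at most cap).
 */
static size_t read_until_separator(FILE *in, const char *sep,
				   char *out, size_t cap)
{
	char line[256];
	size_t used = 0, seplen = strlen(sep);

	while (fgets(line, sizeof(line), in)) {
		size_t n = strlen(line);
		size_t body = (n && line[n - 1] == '\n') ? n - 1 : n;
		int last = body >= seplen &&
			   memcmp(line + body - seplen, sep, seplen) == 0;

		if (last)
			n = body - seplen;
		if (n > cap - used)
			n = cap - used;
		memcpy(out + used, line, n);
		used += n;
		if (last)
			break;
	}
	return used;
}

/*
 * Counted framing, as proposed above: "<count> LF <raw data> LF".
 * Returns a malloc'd buffer of exactly *len bytes (which may contain
 * newlines, NULs or separator-like text), or NULL on EOF/bad input.
 */
static char *read_counted(FILE *in, size_t *len)
{
	char header[32];
	char *end, *buf;
	unsigned long count;

	if (!fgets(header, sizeof(header), in) ||
	    !isdigit((unsigned char)header[0]))
		return NULL;
	count = strtoul(header, &end, 10);
	if (*end != '\n')
		return NULL;
	buf = malloc(count ? count : 1);
	if (!buf)
		return NULL;
	if (fread(buf, 1, count, in) != count || fgetc(in) != '\n') {
		free(buf);
		return NULL;
	}
	*len = count;
	return buf;
}
```

With the blob `a\nSEP\nb\n` and separator `SEP`, the first reader stops after `a\n` (2 of 8 bytes); the counted reader never inspects the payload, so no content can be mistaken for framing.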
* Re: [PATCH 5/9] git-cat-file: Add --separator option
From: Adam Roben @ 2007-10-24 4:26 UTC
To: Brian Downing; +Cc: git, Junio C Hamano

Brian Downing wrote:
> On Mon, Oct 22, 2007 at 10:46:33PM -0700, Adam Roben wrote:
>
>> +--separator::
>> +	A string to print in between the output for each object passed on
>> +	stdin. A newline will be appended to the separator each time it is
>> +	printed.
>>
>
> Maybe I'm just unreasonably paranoid, but I don't think I could ever
> trust that you'd never find an arbitrary separator in the data. I
> suppose if you scanned the files beforehand you could come up with
> something guaranteed to be unique, but that seems like a pain (and
> doesn't happen regardless in patch 9/9; it just uses
> "--------------GITCATFILESEPARATOR-----------") If I were committing to
> SVN, it's sure not something I'd like to bet the integrity of my data
> on.
>

I had some of the same concerns.

> I think a far more reasonable output format for multiple objects would
> be something like:
>
> <count> LF
> <raw data> LF
>
> Where <count> is the number of bytes in the <raw data> as an ASCII
> decimal integer.
>

This sounds like a much better solution. I'll implement it that way and send out a new patch. Thanks for the suggestion!

-Adam
* Re: [PATCH 1/9] Add tests for git cat-file
From: Johannes Sixt @ 2007-10-23 6:59 UTC
To: Adam Roben; +Cc: git, Junio C Hamano

Adam Roben wrote:
> +	test_expect_success \
> +		"$type exists" \
> +		"git cat-file -e $hello_sha1"

You mean $sha1 here, right?

> +	test_expect_success \
> +		"Type of $type is correct" \
> +		"test $type = \"$(git cat-file -t $sha1)\""

This should escape the $(...) in all the tests. Like this:

	"test $type = \"\$(git cat-file -t $sha1)\""

> +test_expect_success \
> +	"Reach a blob from a tag pointing to it" \
> +	"test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\""

And use single quotes without escaping the double-quotes here.

-- Hannes
* Re: [PATCH 0/9] Make git-svn fetch ~1.7x faster
From: Mike Hommey @ 2007-10-23 6:08 UTC
To: Adam Roben; +Cc: git, Junio C Hamano

On Mon, Oct 22, 2007 at 10:46:28PM -0700, Adam Roben wrote:
>
> This patch series makes git-svn fetch about 1.7x faster by reducing the number
> of forks/execs that occur for each file retrieved from Subversion. To do so, a
> few new options are added to git-cat-file and git-hash-object to allow
> continuous input on stdin and continuous output on stdout, so that one instance
> of each of these commands can be kept running for the duration of the fetch.

You don't need to do this to avoid forks. Just use git-fast-import instead.

Mike
* Re: [PATCH 0/9] Make git-svn fetch ~1.7x faster
From: Adam Roben @ 2007-10-23 6:13 UTC
To: Mike Hommey; +Cc: git, Junio C Hamano

Mike Hommey wrote:
> On Mon, Oct 22, 2007 at 10:46:28PM -0700, Adam Roben wrote:
>
>> This patch series makes git-svn fetch about 1.7x faster by reducing the number
>> of forks/execs that occur for each file retrieved from Subversion. To do so, a
>> few new options are added to git-cat-file and git-hash-object to allow
>> continuous input on stdin and continuous output on stdout, so that one instance
>> of each of these commands can be kept running for the duration of the fetch.
>>
>
> You don't need to do this to avoid forks. Just use git-fast-import
> instead.
>

I agree that fast-import is probably ultimately a better solution for this, but given that git-svn currently uses the output of every command it forks off and that fast-import doesn't seem to give the same output, changing git-svn to use fast-import would be a fairly sweeping change that I didn't feel comfortable making without a better understanding of git-svn.

-Adam
* Re: [PATCH 0/9] Make git-svn fetch ~1.7x faster
From: Sam Vilain @ 2007-10-24 0:43 UTC
To: Mike Hommey; +Cc: git, aroben

Mike Hommey wrote:
> On Mon, Oct 22, 2007 at 10:46:28PM -0700, Adam Roben wrote:
>> This patch series makes git-svn fetch about 1.7x faster by reducing the number
>> of forks/execs that occur for each file retrieved from Subversion. To do so, a
>> few new options are added to git-cat-file and git-hash-object to allow
>> continuous input on stdin and continuous output on stdout, so that one instance
>> of each of these commands can be kept running for the duration of the fetch.
>
> You don't need to do this to avoid forks. Just use git-fast-import
> instead.

git-fast-import only covers the hash-object side of things, not cat-file.

git-fast-import does not currently suit 'gradual deployment' for converters such as git-svn, because:

 - it returns object IDs only at the end, when you checkpoint. This could be 'fixed' by allowing a marks log file instead of, or in addition to, the current behaviour, though if the exporter is continually waiting for the tokens rather than using marks, it will slow it down.
 - you can't use plumbing commands, such as rev-parse, cat-file, etc., on objects which have not been checkpointed yet.
 - you can't just stream a file of unknown length to it, as you can to hash-object.

These are the design trade-offs of using fast-import. Using fast-import, you are creating a 'transaction' area which uses user-chosen sequences (marks) instead of identifiers issued by the git object database, and this transaction is isolated from the other concurrent users of the object database. However, the interface does not have the full git CLI available to it, so unlike a regular database transaction, you end up having to care.

Rewriting the importer to deal correctly with these problems is quite challenging, and, for slow import sources such as Subversion, of limited merit.

Sam.
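Sam's last point follows from fast-import's input format: the `data` command states the exact byte count before the bytes themselves. A minimal sketch in C of emitting a `blob` command for input of unknown length by spooling it to a temporary file first (the same trick `hash_object` in patch 9/9 uses with File::Temp). `write_blob_command` is a hypothetical helper, not part of git:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a fast-import `blob` command for everything readable from `in`:
 *
 *   blob LF
 *   mark SP :<idnum> LF
 *   data SP <count> LF
 *   <raw> LF
 *
 * <count> must be known before the data is written, so input of
 * unknown length is first spooled to a temporary file to measure it.
 * Returns 0 on success, -1 on error.
 */
static int write_blob_command(FILE *in, unsigned long mark, FILE *out)
{
	char buf[8192];
	size_t n;
	long size = 0;
	FILE *spool = tmpfile();

	if (!spool)
		return -1;
	/* Pass 1: copy the input to the spool file to learn its length. */
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, spool) != n)
			goto fail;
	}
	if (ferror(in) || (size = ftell(spool)) < 0)
		goto fail;
	rewind(spool);

	/* Pass 2: emit the header with the exact count, then the bytes. */
	fprintf(out, "blob\nmark :%lu\ndata %ld\n", mark, size);
	while ((n = fread(buf, 1, sizeof(buf), spool)) > 0) {
		if (fwrite(buf, 1, n, out) != n)
			goto fail;
	}
	fputc('\n', out);
	fclose(spool);
	return ferror(out) ? -1 : 0;

fail:
	fclose(spool);
	return -1;
}
```

For example, feeding it `hello` with mark 1 produces `blob`, `mark :1`, `data 5` and `hello`, each on its own line. hash-object with a path or on stdin has no such up-front length requirement, which is the asymmetry Sam points out.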
* [RESEND PATCH 0/9] Make git-svn fetch ~1.7x faster @ 2007-10-25 10:25 Adam Roben 2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben 0 siblings, 1 reply; 26+ messages in thread From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw) To: git; +Cc: Junio Hamano This is a resend of my previous patch series to speed up git-svn, taking into account comments from Eric, Johannes, and Brian. -- Documentation/git-cat-file.txt | 6 +- Documentation/git-hash-object.txt | 5 +- builtin-cat-file.c | 87 +++++++++++++++++---- git-svn.perl | 40 +++++----- hash-object.c | 29 +++++++- perl/Git.pm | 153 ++++++++++++++++++++++++++++++++++++- t/t1005-cat-file.sh | 126 ++++++++++++++++++++++++++++++ t/t1006-hash-object.sh | 49 ++++++++++++ 8 files changed, 452 insertions(+), 43 deletions(-) ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 1/9] Add tests for git cat-file 2007-10-25 10:25 [RESEND PATCH " Adam Roben @ 2007-10-25 10:25 ` Adam Roben 2007-10-25 10:25 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben 0 siblings, 1 reply; 26+ messages in thread From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw) To: git; +Cc: Junio Hamano, Adam Roben, Johannes Sixt Signed-off-by: Adam Roben <aroben@apple.com> --- Johannes Sixt wrote: > Adam Roben schrieb: > > + test_expect_success \ > > + "$type exists" \ > > + "git cat-file -e $hello_sha1" > > You mean $sha1 here, right? I most definitely did! > > + test_expect_success \ > > + "Type of $type is correct" \ > > + "test $type = \"$(git cat-file -t $sha1)\"" > > This should escape the $(...) in all the tests. Like this: > > "test $type = \"\$(git cat-file -t $sha1)\"" > > > +test_expect_success \ > > + "Reach a blob from a tag pointing to it" \ > > + "test \"$hello_content\" = \"$(git cat-file blob $tag_sha1)\"" > > And use single quotes without escaping the double-quotes here. Done. t/t1005-cat-file.sh | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 91 insertions(+), 0 deletions(-) create mode 100755 t/t1005-cat-file.sh diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh new file mode 100755 index 0000000..697354d --- /dev/null +++ b/t/t1005-cat-file.sh @@ -0,0 +1,91 @@ +#!/bin/sh + +test_description='git cat-file' + +. 
./test-lib.sh + +function maybe_remove_timestamp() +{ + if test -z "$2"; then + echo "$1" + else + echo "$1" | sed -e 's/ [0-9]\{10\} [+-][0-9]\{4\}$//' + fi +} + +function run_tests() +{ + type=$1 + sha1=$2 + size=$3 + content=$4 + pretty_content=$5 + no_timestamp=$6 + + test_expect_success \ + "$type exists" \ + "git cat-file -e $sha1" + test_expect_success \ + "Type of $type is correct" \ + "test $type = \"\$(git cat-file -t $sha1)\"" + test_expect_success \ + "Size of $type is correct" \ + "test $size = \"\$(git cat-file -s $sha1)\"" + test -z "$content" || test_expect_success \ + "Content of $type is correct" \ + "test \"\$(maybe_remove_timestamp '$content' $no_timestamp)\" = \"\$(maybe_remove_timestamp \"\$(git cat-file $type $sha1)\" $no_timestamp)\"" + test_expect_success \ + "Pretty content of $type is correct" \ + "test \"\$(maybe_remove_timestamp '$pretty_content' $no_timestamp)\" = \"\$(maybe_remove_timestamp \"\$(git cat-file -p $sha1)\" $no_timestamp)\"" +} + +hello_content="Hello World" +hello_size=$(echo "$hello_content" | wc -c) +hello_sha1=557db03de997c86a4a028e1ebd3a1ceb225be238 + +echo "$hello_content" > hello + +git update-index --add hello + +run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content" + +tree_sha1=$(git write-tree) +tree_size=33 +tree_pretty_content="100644 blob $hello_sha1 hello" + +run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content" + +commit_message="Intial commit" +commit_sha1=$(echo "$commit_message" | git commit-tree $tree_sha1) +commit_size=177 +commit_content="tree $tree_sha1 +author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 0000000000 +0000 +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 0000000000 +0000 + +$commit_message" + +run_tests 'commit' $commit_sha1 $commit_size "$commit_content" "$commit_content" 1 + +tag_header="object $hello_sha1 +type blob +tag hellotag +tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL>" +tag_description="This is a tag" +tag_content="$tag_header + 
+$tag_description" +tag_pretty_content="$tag_header +Thu Jan 1 00:00:00 1970 +0000 + +$tag_description" + +tag_sha1=$(echo "$tag_content" | git mktag) +tag_size=$(echo "$tag_content" | wc -c) + +run_tests 'tag' $tag_sha1 $tag_size "$tag_content" "$tag_pretty_content" + +test_expect_success \ + "Reach a blob from a tag pointing to it" \ + "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\"" + +test_done -- 1.5.3.4.1337.g8e67d-dirty ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file 2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben @ 2007-10-25 10:25 ` Adam Roben 2007-10-25 10:25 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben 0 siblings, 1 reply; 26+ messages in thread From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw) To: git; +Cc: Junio Hamano, Adam Roben I separated the logic of parsing the arguments from the logic of fetching and outputting the data. cat_one_file now does the latter. Signed-off-by: Adam Roben <aroben@apple.com> --- builtin-cat-file.c | 38 ++++++++++++++++++++++---------------- 1 files changed, 22 insertions(+), 16 deletions(-) diff --git a/builtin-cat-file.c b/builtin-cat-file.c index f132d58..34a63d1 100644 --- a/builtin-cat-file.c +++ b/builtin-cat-file.c @@ -76,31 +76,16 @@ static void pprint_tag(const unsigned char *sha1, const char *buf, unsigned long write_or_die(1, cp, endp - cp); } -int cmd_cat_file(int argc, const char **argv, const char *prefix) +static int cat_one_file(int opt, const char *exp_type, const char *obj_name) { unsigned char sha1[20]; enum object_type type; void *buf; unsigned long size; - int opt; - const char *exp_type, *obj_name; - - git_config(git_default_config); - if (argc != 3) - usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>"); - exp_type = argv[1]; - obj_name = argv[2]; if (get_sha1(obj_name, sha1)) die("Not a valid object name %s", obj_name); - opt = 0; - if ( exp_type[0] == '-' ) { - opt = exp_type[1]; - if ( !opt || exp_type[2] ) - opt = -1; /* Not a single character option */ - } - buf = NULL; switch (opt) { case 't': @@ -157,3 +142,24 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) write_or_die(1, buf, size); return 0; } + +int cmd_cat_file(int argc, const char **argv, const char *prefix) +{ + int opt; + const char *exp_type, *obj_name; + + git_config(git_default_config); + if (argc != 3) + usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>"); + 
exp_type = argv[1]; + obj_name = argv[2]; + + opt = 0; + if ( exp_type[0] == '-' ) { + opt = exp_type[1]; + if ( !opt || exp_type[2] ) + opt = -1; /* Not a single character option */ + } + + return cat_one_file(opt, exp_type, obj_name); +} -- 1.5.3.4.1337.g8e67d-dirty ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 3/9] git-cat-file: Make option parsing a little more flexible 2007-10-25 10:25 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben @ 2007-10-25 10:25 ` Adam Roben 2007-10-25 10:25 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben 0 siblings, 1 reply; 26+ messages in thread From: Adam Roben @ 2007-10-25 10:25 UTC (permalink / raw) To: git; +Cc: Junio Hamano, Adam Roben This will make it easier to add newer options later. Signed-off-by: Adam Roben <aroben@apple.com> --- builtin-cat-file.c | 42 ++++++++++++++++++++++++++++++------------ 1 files changed, 30 insertions(+), 12 deletions(-) diff --git a/builtin-cat-file.c b/builtin-cat-file.c index 34a63d1..3a0be4a 100644 --- a/builtin-cat-file.c +++ b/builtin-cat-file.c @@ -143,23 +143,41 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) return 0; } +static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>"; + int cmd_cat_file(int argc, const char **argv, const char *prefix) { - int opt; - const char *exp_type, *obj_name; + int i, opt = 0; + const char *exp_type = 0, *obj_name = 0; git_config(git_default_config); - if (argc != 3) - usage("git-cat-file [-t|-s|-e|-p|<type>] <sha1>"); - exp_type = argv[1]; - obj_name = argv[2]; - - opt = 0; - if ( exp_type[0] == '-' ) { - opt = exp_type[1]; - if ( !opt || exp_type[2] ) - opt = -1; /* Not a single character option */ + + for (i = 1; i < argc; ++i) { + const char *arg = argv[i]; + + if (!strcmp(arg, "-t") || !strcmp(arg, "-s") || !strcmp(arg, "-e") || !strcmp(arg, "-p")) { + exp_type = arg; + opt = exp_type[1]; + continue; + } + + if (arg[0] == '-') + usage(cat_file_usage); + + if (!exp_type) { + exp_type = arg; + continue; + } + + if (obj_name) + usage(cat_file_usage); + + obj_name = arg; + break; } + if (!exp_type || !obj_name) + usage(cat_file_usage); + return cat_one_file(opt, exp_type, obj_name); } -- 1.5.3.4.1337.g8e67d-dirty ^ permalink raw reply related [flat|nested] 26+ messages in 
thread
* [PATCH 4/9] git-cat-file: Add --stdin option
  2007-10-25 10:25 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
@ 2007-10-25 10:25 ` Adam Roben
  2007-10-26 20:59   ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Adam Roben @ 2007-10-25 10:25 UTC
  To: git; +Cc: Junio Hamano, Adam Roben, Brian Downing

This lets you specify object names on stdin instead of on the command
line. When printing object contents or pretty-printing, objects will be
printed preceded by their size:

<size>LF
<content>LF

Signed-off-by: Adam Roben <aroben@apple.com>
---
Brian Downing wrote:
> I think a far more reasonable output format for multiple objects would
> be something like:
>
> <count> LF
> <raw data> LF
>
> Where <count> is the number of bytes in the <raw data> as an ASCII
> decimal integer.

Agreed.

 Documentation/git-cat-file.txt |    6 ++++-
 builtin-cat-file.c             |   43 ++++++++++++++++++++++++++++++++++-----
 t/t1005-cat-file.sh            |   35 ++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index afa095c..588d71a 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -8,7 +8,7 @@ git-cat-file - Provide content or type/size information for repository objects
 
 SYNOPSIS
 --------
-'git-cat-file' [-t | -s | -e | -p | <type>] <object>
+'git-cat-file' [-t | -s | -e | -p | <type>] [--stdin | <object>]
 
 DESCRIPTION
 -----------
@@ -23,6 +23,10 @@ OPTIONS
 	For a more complete list of ways to spell object names, see
 	"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
 
+--stdin::
+	Read object names from stdin instead of specifying one on the
+	command line.
+
 -t::
 	Instead of the content, show the object type identified by
 	<object>.
diff --git a/builtin-cat-file.c b/builtin-cat-file.c
index 3a0be4a..ee46ba4 100644
--- a/builtin-cat-file.c
+++ b/builtin-cat-file.c
@@ -76,7 +76,7 @@ static void pprint_tag(const unsigned char *sha1, const char *buf, unsigned long
 	write_or_die(1, cp, endp - cp);
 }
 
-static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
+static int cat_one_file(int opt, const char *exp_type, const char *obj_name, int print_size)
 {
 	unsigned char sha1[20];
 	enum object_type type;
@@ -139,16 +139,26 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 	if (!buf)
 		die("git-cat-file %s: bad file", obj_name);
 
+	if (print_size) {
+		printf("%lu\n", size);
+		fflush(stdout);
+	}
 	write_or_die(1, buf, size);
+	if (print_size) {
+		printf("\n");
+		fflush(stdout);
+	}
 	return 0;
 }
 
-static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] <sha1>";
+static const char cat_file_usage[] = "git-cat-file [-t|-s|-e|-p|<type>] [--stdin | <sha1>]";
 
 int cmd_cat_file(int argc, const char **argv, const char *prefix)
 {
-	int i, opt = 0;
+	int i, opt = 0, print_size = 0;
+	int read_stdin = 0;
 	const char *exp_type = 0, *obj_name = 0;
+	struct strbuf buf;
 
 	git_config(git_default_config);
 
@@ -161,6 +171,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
+		if (!strcmp(arg, "--stdin")) {
+			read_stdin = 1;
+			continue;
+		}
+
 		if (arg[0] == '-')
 			usage(cat_file_usage);
 
@@ -169,15 +184,31 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 
-		if (obj_name)
+		if (obj_name || read_stdin)
 			usage(cat_file_usage);
 
 		obj_name = arg;
 		break;
 	}
 
-	if (!exp_type || !obj_name)
+	if (!exp_type)
 		usage(cat_file_usage);
 
-	return cat_one_file(opt, exp_type, obj_name);
+	if (!read_stdin) {
+		if (!obj_name)
+			usage(cat_file_usage);
+		return cat_one_file(opt, exp_type, obj_name, 0);
+	}
+
+	print_size = !opt || opt == 'p';
+
+	strbuf_init(&buf, 0);
+	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+		int error = cat_one_file(opt, exp_type, buf.buf, print_size);
+		if (error)
+			return error;
+	}
+	strbuf_release(&buf);
+
+	return 0;
 }
diff --git a/t/t1005-cat-file.sh b/t/t1005-cat-file.sh
index 697354d..2b2d386 100755
--- a/t/t1005-cat-file.sh
+++ b/t/t1005-cat-file.sh
@@ -88,4 +88,39 @@ test_expect_success \
     "Reach a blob from a tag pointing to it" \
     "test '$hello_content' = \"\$(git cat-file blob $tag_sha1)\""
 
+sha1s="$hello_sha1
+$tree_sha1
+$commit_sha1
+$tag_sha1"
+
+sizes="$hello_size
+$tree_size
+$commit_size
+$tag_size"
+
+test_expect_success \
+    "Pass object hashes on stdin to retrieve sizes" \
+    "test '$sizes' = \"\$(echo '$sha1s' | git cat-file -s --stdin)\""
+
+example_content="Silly example"
+example_size=$(echo "$example_content" | wc -c)
+example_sha1=f24c74a2e500f5ee1332c86b94199f52b1d1d962
+
+echo "$example_content" > example
+
+git update-index --add example
+
+sha1s="$hello_sha1
+$example_sha1"
+
+contents="$hello_size
+$hello_content
+
+$example_size
+$example_content"
+
+test_expect_success \
+    "Pass object hashes on stdin to retrieve contents" \
+    "test '$contents' = \"\$(echo '$sha1s' | git cat-file blob --stdin)\""
+
 test_done
-- 
1.5.3.4.1337.g8e67d-dirty
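The size prefix lets a consumer read each object with one length-bounded read, even when the content itself contains LF bytes. Below is a minimal C sketch of such a reader, assuming the whole `--stdin` output has already been read into memory; `next_record` is a hypothetical helper, not part of git, and it also checks for the LF that follows each record's content.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical reader for the "<size>LF<content>LF" records printed by
 * "git-cat-file <type> --stdin" in PATCH 4/9, working on an in-memory
 * copy of the output.  Returns 1 and sets *content/*content_len for a
 * record, 0 at the end of the buffer, -1 if the data is malformed.
 * (No overflow check on the size digits in this sketch.)
 */
static int next_record(const char *buf, size_t len, size_t *pos,
		       const char **content, size_t *content_len)
{
	size_t size = 0, p = *pos;

	assert(p <= len);
	if (p == len)
		return 0;

	/* Decimal byte count, terminated by LF. */
	if (buf[p] < '0' || buf[p] > '9')
		return -1;
	while (p < len && buf[p] >= '0' && buf[p] <= '9')
		size = size * 10 + (size_t)(buf[p++] - '0');
	if (p >= len || buf[p] != '\n')
		return -1;
	p++;

	/* Need 'size' content bytes (LFs allowed) plus the trailing LF. */
	if (size >= len - p)
		return -1;
	*content = buf + p;
	*content_len = size;
	p += size;

	/* The byte right after the content must be the separator LF. */
	if (buf[p] != '\n')
		return -1;
	*pos = p + 1;
	return 1;
}
```

For the two blobs used in the t1005 test above, the stream would be "12\nHello World\n\n14\nSilly example\n\n"; the sketch returns "Hello World\n" and "Silly example\n" in turn, then 0.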
* Re: [PATCH 4/9] git-cat-file: Add --stdin option
  2007-10-25 10:25 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
@ 2007-10-26 20:59 ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2007-10-26 20:59 UTC
  To: Adam Roben; +Cc: git, Brian Downing

Adam Roben <aroben@apple.com> writes:

> @@ -23,6 +23,10 @@ OPTIONS
>  	For a more complete list of ways to spell object names, see
>  	"SPECIFYING REVISIONS" section in gitlink:git-rev-parse[1].
>  
> +--stdin::
> +	Read object names from stdin instead of specifying one on the
> +	command line.
> +

This does not talk about modified output format: what the format is,
nor when that modified format is used.

> @@ -139,16 +139,26 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
>  	if (!buf)
>  		die("git-cat-file %s: bad file", obj_name);
>  
> +	if (print_size) {
> +		printf("%lu\n", size);
> +		fflush(stdout);
> +	}
>  	write_or_die(1, buf, size);
> +	if (print_size) {
> +		printf("\n");
> +		fflush(stdout);
> +	}
>  	return 0;
>  }
>  

Not that I object strongly to it, but do we need extra LF after the
contents?

 - "It would help readers written in typical scripting languages" is an
   acceptable answer, but I doubt that is the case --- the reader is
   given the number of bytes and is going to "read($pipe, $buf,
   $that_size)" anyway.

 - "The reader can assert that one-byte past the content is a LF to
   catch errors, and this LF would help re-synchronize after such an
   error" would be another acceptable answer, but for the
   re-synchronization to work, the output needs to tell which record
   each chunk is about (i.e. if the output were
   "<type> <sha1> <size>LF<contents>LF", the "re-sync" argument would
   make a bit more sense).

> +	print_size = !opt || opt == 'p';

Needs a bit of comment here, and in the documentation.  E.g.

	git-cat-file --stdin -t <list-of-sha1
	git-cat-file --stdin -s <list-of-sha1

are ways to check types and sizes of the objects in the list.  How does
--stdin interact with -e?  How does --stdin interact with -p when
printing a tree or a tag object?  How does "blob --stdin" do when
input sequence contains a non blob SHA1?

It almost feels that --stdin should be named something else, such as
--batch or --bulk, as it is not just affecting the input.

Here is an alternative suggestion.

Two new options, --batch and --batch-check, are introduced.  These
options are incompatible with -[tsep] or an object type given as the
first parameter to git-cat-file.

 * git-cat-file --batch-check <list-of-sha1

   outputs a record of this form

	<sha1> SP <type> SP <size> LF

   for each of the input lines.

 * git-cat-file --batch <list-of-sha1

   outputs a record of this form

	<sha1> SP <type> SP <size> LF <contents> LF

   for each of the input lines.

For a missing object, either option gives a record of form:

	<sha1> SP missing LF
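Junio's proposed `--batch-check`/`--batch` header line carries the object name, type and size, which is what makes the re-synchronization argument work. Below is a hedged sketch that parses one such header (without its LF) into a hypothetical `struct batch_header`; the 40-lowercase-hex-digit object name and the 16-byte type buffer are assumptions of this sketch, and it follows the format as proposed in this message rather than any released implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one proposed --batch/--batch-check header. */
struct batch_header {
	char sha1[41];		/* assumed: 40 lowercase hex digits + NUL */
	char type[16];		/* "blob", "tree", "commit", "tag" */
	unsigned long size;
};

/*
 * Parse "<sha1> SP <type> SP <size>" or "<sha1> SP missing" (the line
 * without its terminating LF).  Returns 0 for an existing object, 1 for
 * a missing one, -1 if the line matches neither form.
 */
static int parse_batch_header(const char *line, struct batch_header *h)
{
	const char *sp1, *sp2;
	char *end;
	size_t n;

	assert(line != NULL && h != NULL);

	/* Object name: exactly 40 hex digits followed by SP. */
	sp1 = strchr(line, ' ');
	if (!sp1 || sp1 - line != 40 ||
	    strspn(line, "0123456789abcdef") != 40)
		return -1;
	memcpy(h->sha1, line, 40);
	h->sha1[40] = '\0';

	if (!strcmp(sp1 + 1, "missing"))
		return 1;

	/* Type: non-empty word up to the second SP. */
	sp2 = strchr(sp1 + 1, ' ');
	if (!sp2)
		return -1;
	n = (size_t)(sp2 - (sp1 + 1));
	if (n == 0 || n >= sizeof(h->type))
		return -1;
	memcpy(h->type, sp1 + 1, n);
	h->type[n] = '\0';

	/* Size: decimal digits running to the end of the line. */
	if (sp2[1] < '0' || sp2[1] > '9')
		return -1;
	h->size = strtoul(sp2 + 1, &end, 10);
	if (*end != '\0')
		return -1;
	return 0;
}
```

For `--batch`, a reader would call this on each header line and then consume `h.size` bytes of content followed by one LF; for `--batch-check` the header line is the whole record.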
end of thread, other threads:[~2007-10-26 20:59 UTC | newest]

Thread overview: 26+ messages
2007-10-23 5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-23 5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-23 5:46 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-23 5:46 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-23 5:46 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-23 5:46 ` [PATCH 5/9] git-cat-file: Add --separator option Adam Roben
2007-10-23 5:46 ` [PATCH 6/9] Add tests for git hash-object Adam Roben
2007-10-23 5:46 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-23 5:46 ` [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-23 5:46 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-23 7:01 ` Johannes Sixt
2007-10-24 6:34 ` Eric Wong
2007-10-24 6:48 ` Adam Roben
2007-10-23 5:53 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Shawn O. Pearce
2007-10-23 5:57 ` Adam Roben
2007-10-23 6:10 ` Shawn O. Pearce
2007-10-24 6:11 ` Eric Wong
2007-10-23 6:59 ` [PATCH 6/9] Add tests for git hash-object Johannes Sixt
2007-10-24 3:43 ` [PATCH 5/9] git-cat-file: Add --separator option Brian Downing
2007-10-24 4:26 ` Adam Roben
2007-10-23 6:59 ` [PATCH 1/9] Add tests for git cat-file Johannes Sixt
2007-10-23 6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
2007-10-23 6:13 ` Adam Roben
2007-10-24 0:43 ` Sam Vilain

-- strict thread matches above, loose matches on Subject: below --
2007-10-25 10:25 [RESEND PATCH 
2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-25 10:25 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-25 10:25 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-25 10:25 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-26 20:59 ` Junio C Hamano