[PATCH v5 2/2] gitweb: append short hash ids to snapshot files

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v5 2/2] gitweb: append short hash ids to snapshot files
@ 2009-09-26 17:46 Mark Rada
  2009-10-05 10:06 ` Jakub Narebski
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rada @ 2009-09-26 17:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narebski, git

Teach gitweb how to produce nicer snapshot names by only using the
short hash id. If clients make requests using a tree-ish that is not a
partial or full SHA-1 hash, then the short hash will also be appended
to whatever they asked for.

This also includes tests cases for t9502-gitweb-standalone-parse-output.

Signed-off-by: Mark Rada <marada@uwaterloo.ca>
---


	Changes since v4:
		- moved git_get_full_hash into this commit
		- changed test case format, suggested by Junio
		- explicity request at least a length of 7 for short hashes


 gitweb/gitweb.perl                        |   40 +++++++++++++++--
 t/t9502-gitweb-standalone-parse-output.sh |   67 +++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+), 4 deletions(-)
 create mode 100644 t/t9502-gitweb-standalone-parse-output.sh

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 8d4a2ae..bc132a5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1983,14 +1983,39 @@ sub quote_command {
 
 # get HEAD ref of given project as hash
 sub git_get_head_hash {
+	return git_get_full_hash(shift, 'HEAD');
+}
+
+sub git_get_full_hash {
 	my $project = shift;
+	my $hash = shift;
 	my $o_git_dir = $git_dir;
 	my $retval = undef;
 	$git_dir = "$projectroot/$project";
-	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
-		my $head = <$fd>;
+	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) {
+		$hash = <$fd>;
 		close $fd;
-		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
+		if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) {
+			$retval = $1;
+		}
+	}
+	if (defined $o_git_dir) {
+		$git_dir = $o_git_dir;
+	}
+	return $retval;
+}
+
+# try and get a shorter hash id
+sub git_get_short_hash {
+	my $project = shift;
+	my $hash = shift;
+	my $o_git_dir = $git_dir;
+	my $retval = undef;
+	$git_dir = "$projectroot/$project";
+	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) {
+		$hash = <$fd>;
+		close $fd;
+		if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) {
 			$retval = $1;
 		}
 	}
@@ -5203,6 +5228,13 @@ sub git_snapshot {
 		die_error(400, 'Object is not a tree-ish');
 	}
 
+
+	my $full_hash = git_get_full_hash($project, $hash);
+	if ($full_hash =~ /^$hash/) {
+		$hash = git_get_short_hash($project, $hash);
+	} else {
+		$hash .= '-' . git_get_short_hash($project, $hash);
+	}
 	my $name = $project;
 	$name =~ s,([^/])/*\.git$,$1,;
 	$name = basename($name);
@@ -5213,7 +5245,7 @@ sub git_snapshot {
 	$cmd = quote_command(
 		git_cmd(), 'archive',
 		"--format=$known_snapshot_formats{$format}{'format'}",
-		"--prefix=$name/", $hash);
+		"--prefix=$name/", $full_hash);
 	if (exists $known_snapshot_formats{$format}{'compressor'}) {
 		$cmd .= ' | ' . quote_command(@{$known_snapshot_formats{$format}{'compressor'}});
 	}
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
new file mode 100644
index 0000000..5f2b1d5
--- /dev/null
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -0,0 +1,67 @@
+#!/bin/sh
+#
+# Copyright (c) 2009 Mark Rada
+#
+
+test_description='gitweb as standalone script (parsing script output).
+
+This test runs gitweb (git web interface) as a CGI script from the
+commandline, and checks that it produces the correct output, either
+in the HTTP header or the actual script output.'
+
+
+. ./gitweb-lib.sh
+
+# ----------------------------------------------------------------------
+# snapshot file name
+
+test_commit \
+	'SnapshotFileTests' \
+	'i can has snapshot?'
+
+test_expect_success 'snapshots: give full hash' '
+	ID=`git rev-parse --verify HEAD` &&
+	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
+	ID=`git rev-parse --short HEAD` &&
+	grep ".git-$ID.tar.gz" gitweb.output
+'
+test_debug 'cat gitweb.output'
+
+test_expect_success 'snapshots: give short hash' '
+	ID=`git rev-parse --short HEAD` &&
+	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
+	grep ".git-$ID.tar.gz" gitweb.output
+'
+test_debug 'cat gitweb.output'
+
+test_expect_success 'snapshots: give almost full hash' '
+	ID=`git rev-parse --short=30 HEAD` &&
+	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
+	ID=`git rev-parse --short HEAD` &&
+	grep ".git-$ID.tar.gz" gitweb.output
+'
+test_debug 'cat gitweb.output'
+
+test_expect_success 'snapshots: give HEAD tree-ish' '
+	gitweb_run "p=.git;a=snapshot;h=HEAD;sf=tgz" &&
+	ID=`git rev-parse --short HEAD` &&
+	grep ".git-HEAD-$ID.tar.gz" gitweb.output
+'
+test_debug 'cat gitweb.output'
+
+test_expect_success 'snapshots: give branch name tree-ish' '
+	gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" &&
+	ID=`git rev-parse --short master` &&
+	grep ".git-master-$ID.tar.gz" gitweb.output
+'
+test_debug 'cat gitweb.output'
+
+test_expect_success 'snapshots: give tag tree-ish' '
+	gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" &&
+	ID=`git rev-parse --short SnapshotFileTests` &&
+	grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output
+'
+test_debug 'cat gitweb.output'
+
+
+test_done
-- 
1.6.4.GIT

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files
  2009-09-26 17:46 [PATCH v5 2/2] gitweb: append short hash ids to snapshot files Mark Rada
@ 2009-10-05 10:06 ` Jakub Narebski
  2009-10-12 15:34   ` Mark Rada
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Narebski @ 2009-10-05 10:06 UTC (permalink / raw)
  To: Mark Rada; +Cc: Junio C Hamano, git

I am sorry for being late with review of this patch.

On Sat, 26 Sep 2009, Mark Rada wrote:

> Teach gitweb how to produce nicer snapshot names by only using the
> short hash id.

A few questions (which I think should be answered in this commit message).

First, what was original behaviour of 'snapshot' action?  Did gitweb
always convert 'h' (hash) parameter to full SHA-1?

Second, do you preserve 'snapshot' behavior that it generated archive
which unpacks to the directory with the same name as basename of (proposed)
archive name?  I mean here that "repo-2cc6859e.tar.gz" unpacks to
"repo-2cc6859e/" directory.  I guess it does, but this should be
mentioned in the commit message.

>               If clients make requests using a tree-ish that is not a
> partial or full SHA-1 hash, then the short hash will also be appended
> to whatever they asked for.

Here example would be a good idea.  I guess this means that if one is
requesting for snapshot of 'next' or 'v1.6.0' of 'repo.git' project,
one would get 'repo-next-2cc6859.tar.gz' or 'repo-v1.6.0-2cc6859.tar.gz'
as [proposed] snapshot filename.

> 
> This also includes tests cases for t9502-gitweb-standalone-parse-output.

I am not sure if it shouldn't be rather t9502-gitweb-standalone-snapshot
test; see comments below for whys.

> 
> Signed-off-by: Mark Rada <marada@uwaterloo.ca>
> ---
> 
> 
> 	Changes since v4:
> 		- moved git_get_full_hash into this commit
> 		- changed test case format, suggested by Junio
> 		- explicity request at least a length of 7 for short hashes
> 
> 
>  gitweb/gitweb.perl                        |   40 +++++++++++++++--
>  t/t9502-gitweb-standalone-parse-output.sh |   67 +++++++++++++++++++++++++++++
>  2 files changed, 103 insertions(+), 4 deletions(-)
>  create mode 100644 t/t9502-gitweb-standalone-parse-output.sh
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 8d4a2ae..bc132a5 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1983,14 +1983,39 @@ sub quote_command {
>  
>  # get HEAD ref of given project as hash
>  sub git_get_head_hash {
> +	return git_get_full_hash(shift, 'HEAD');
> +}

Good.  This means that we don't need to change code that do not use
new feature (new subroutines).  It's nice to cater to backward 
compatibility, especially if it is so low cost as this one.

> +
> +sub git_get_full_hash {
>  	my $project = shift;
> +	my $hash = shift;

I think it might be good idea here to default to 'HEAD', i.e.

  +	my $hash = shift || 'HEAD';

>  	my $o_git_dir = $git_dir;
>  	my $retval = undef;
>  	$git_dir = "$projectroot/$project";
> -	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
> -		my $head = <$fd>;
> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) {
> +		$hash = <$fd>;
>  		close $fd;
> -		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) {
> +			$retval = $1;
> +		}

I guess that you use "$retval = $1;" instead of just "$retval = $hash;"
because of similarities with git_get_short_hash, isn't it?  Or it is just
following earlier code?

> +	}
> +	if (defined $o_git_dir) {
> +		$git_dir = $o_git_dir;
> +	}
> +	return $retval;
> +}
> +
> +# try and get a shorter hash id
> +sub git_get_short_hash {
> +	my $project = shift;
> +	my $hash = shift;
> +	my $o_git_dir = $git_dir;
> +	my $retval = undef;
> +	$git_dir = "$projectroot/$project";
> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) {
> +		$hash = <$fd>;
> +		close $fd;
> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) {
>  			$retval = $1;
>  		}
>  	}

Note that git_get_full_hash (which additionally does verification) and
git_get_short_hash share much of code.  Perhaps it might be worth to
avoid code duplication somehow?  On the other hand it might be not worth
to complicate code by trying to extract common parts here...

> @@ -5203,6 +5228,13 @@ sub git_snapshot {
>  		die_error(400, 'Object is not a tree-ish');
>  	}
>  
> +
> +	my $full_hash = git_get_full_hash($project, $hash);
> +	if ($full_hash =~ /^$hash/) {
> +		$hash = git_get_short_hash($project, $hash);
> +	} else {
> +		$hash .= '-' . git_get_short_hash($project, $hash);
> +	}

I think we might want to avoid calling git_get_full_hash (and extra call
to "git rev-parse" command, which is extra fork) if we know in advance
that  $full_hash =~ /^$hash/  can't be true, i.e. if $hash doesn't match
/^[0-9a-fA-F]+$/.  That would require that we continue to use $hash
and not $full_hash, see comment for the chunk below.

BTW do you think that having better name (nicer name in the case
when $hash is full SHA-1, or name which describes exact version as 
in the case when $hash is branch name or just 'HEAD') is worth
slight extra cost of "git rev-parse --abbrev=7"?

>  	my $name = $project;
>  	$name =~ s,([^/])/*\.git$,$1,;
>  	$name = basename($name);
> @@ -5213,7 +5245,7 @@ sub git_snapshot {
>  	$cmd = quote_command(
>  		git_cmd(), 'archive',
>  		"--format=$known_snapshot_formats{$format}{'format'}",
> -		"--prefix=$name/", $hash);
> +		"--prefix=$name/", $full_hash);

Why this change?

>  	if (exists $known_snapshot_formats{$format}{'compressor'}) {
>  		$cmd .= ' | ' . quote_command(@{$known_snapshot_formats{$format}{'compressor'}});
>  	}
> diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
> new file mode 100644
> index 0000000..5f2b1d5
> --- /dev/null
> +++ b/t/t9502-gitweb-standalone-parse-output.sh
> @@ -0,0 +1,67 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2009 Mark Rada
> +#
> +
> +test_description='gitweb as standalone script (parsing script output).
> +
> +This test runs gitweb (git web interface) as a CGI script from the
> +commandline, and checks that it produces the correct output, either
> +in the HTTP header or the actual script output.'

Currently all tests here are about 'snapshot' action.  They are quite
specific, and they do require some knowledge about chosen archive format.
I think it would be better to put snapshot test into separate test,
i.e. in 't/t9502-gitweb-standalone-snapshot.sh'.

> +
> +
> +. ./gitweb-lib.sh
> +
> +# ----------------------------------------------------------------------
> +# snapshot file name
> +
> +test_commit \
> +	'SnapshotFileTests' \
> +	'i can has snapshot?'

Errr... with filename [cutely] called 'i can has snapshot?' you would
have, I guess, problems with tests on MS Windows, where IIRC '?' is
forbidden in filenames.

Perhaps

  +test_commit \
  +	'Initial commit' \
  +	'foo'

or

  +test_commit \
  +	'SnapshotFileTests' \
  +	'foo' 'i can has snapshot?'

> +

In the test below you use "git rev-parse --verify HEAD" and
"git rev-parse --short HEAD" over and over.  I think it would be better
to calculate them upfront:

  +test_expect_success 'calculate full and short ids' '
  +	FULLID= $(git rev-parse --verify  HEAD) &&
  +	SHORTID=$(git rev-parse --short=7 HEAD)
  +'

> +test_expect_success 'snapshots: give full hash' '

You test here that giving full has works, and that gitweb uses short hash
in file name.  Better name would therefore be something like

  +test_expect_success 'snapshots: give full hash, get short hash' '

> +	ID=`git rev-parse --verify HEAD` &&
> +	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&

Here I had to remember that 'tgz' snapshot format is enabled by default.
I think it could be better to explicitly enable it in preparation step.

> +	ID=`git rev-parse --short HEAD` &&
> +	grep ".git-$ID.tar.gz" gitweb.output

Here had to think a bit that gitweb.output consists both of HTTP headers,
and of response body, and you are grepping here in the HTTP headers part.
It would be better solution for gitweb_run to split gitweb.output into
gitweb.headers and gitweb.body (perhaps if requested by setting some
variable, e.g. GITWEB_SPLIT_OUTPUT).

It can be done using the following lines:

	sed    -e '/^\r$/'      <gitweb.output >gitweb.headers
	sed -n -e '0,/^\r$/!p'  <gitweb.output >gitweb.body

	# gitweb.headers is used to parse http headers
	# gitweb.body is response without http headers

But the second one uses GNU sed extension; I don't know how to write
it in more portable way.

Then

> +	grep ".git-$ID.tar.gz" gitweb.output

would be

  +	grep ".git-$ID.tar.gz" gitweb.headers

Note that this would mean that t/t9501-gitweb-standalone-http-status.sh
should also be updated to use gitweb.headers and gitweb.body

> +'
> +test_debug 'cat gitweb.output'

Not a good idea in current state.  gitweb.output contains binary part,
and generally it is not a good idea to output binary files (which can
contain ANSI escape sequences) to terminal.

> +
> +test_expect_success 'snapshots: give short hash' '

  +test_expect_success 'snapshots: give short hash, get short hash' '

> +	ID=`git rev-parse --short HEAD` &&

Gitweb uses '--short=7'.  Shouldn't you use the same option here?

> +	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
> +	grep ".git-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give almost full hash' '

  +test_expect_success 'snapshots: give almost full hash, get short hash' '

> +	ID=`git rev-parse --short=30 HEAD` &&
> +	gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" &&
> +	ID=`git rev-parse --short HEAD` &&
> +	grep ".git-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give HEAD tree-ish' '

  +test_expect_success 'snapshots: give HEAD, get HEAD-<short hash>' '

> +	gitweb_run "p=.git;a=snapshot;h=HEAD;sf=tgz" &&
> +	ID=`git rev-parse --short HEAD` &&
> +	grep ".git-HEAD-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give branch name tree-ish' '

  +test_expect_success 'snapshots: give branch name, get <branch name>-<short hash>' '

> +	gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" &&
> +	ID=`git rev-parse --short master` &&
> +	grep ".git-master-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'
> +
> +test_expect_success 'snapshots: give tag tree-ish' '

  +test_expect_success 'snapshots: give tag name, get <tag name>-<short hash>' '

> +	gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" &&
> +	ID=`git rev-parse --short SnapshotFileTests` &&
> +	grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output
> +'
> +test_debug 'cat gitweb.output'

Note that to avoid ambiguities currently gitweb uses refs/heads/master
and refs/tags/SnapshotFileTests... but dealing with this issue should be
left, I think, for separate commit.

> +
> +
> +test_done
> -- 
> 1.6.4.GIT
> 
> 
> 

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files
  2009-10-05 10:06 ` Jakub Narebski
@ 2009-10-12 15:34   ` Mark Rada
  2009-10-13 23:46     ` Jakub Narebski
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rada @ 2009-10-12 15:34 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Junio C Hamano, git

On 09-10-05 6:06 AM, Jakub Narebski wrote:

>>  	my $o_git_dir = $git_dir;
>>  	my $retval = undef;
>>  	$git_dir = "$projectroot/$project";
>> -	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
>> -		my $head = <$fd>;
>> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) {
>> +		$hash = <$fd>;
>>  		close $fd;
>> -		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
>> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) {
>> +			$retval = $1;
>> +		}
> 
> I guess that you use "$retval = $1;" instead of just "$retval = $hash;"
> because of similarities with git_get_short_hash, isn't it?  Or it is just
> following earlier code?

Yeah, it is following earlier code, I did not change it, though the diff
seems to think I added it, perhaps this is a bug with diff?

>> +	}
>> +	if (defined $o_git_dir) {
>> +		$git_dir = $o_git_dir;
>> +	}
>> +	return $retval;
>> +}
>> +
>> +# try and get a shorter hash id
>> +sub git_get_short_hash {
>> +	my $project = shift;
>> +	my $hash = shift;
>> +	my $o_git_dir = $git_dir;
>> +	my $retval = undef;
>> +	$git_dir = "$projectroot/$project";
>> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) {
>> +		$hash = <$fd>;
>> +		close $fd;
>> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) {
>>  			$retval = $1;
>>  		}
>>  	}
> 
> Note that git_get_full_hash (which additionally does verification) and
> git_get_short_hash share much of code.  Perhaps it might be worth to
> avoid code duplication somehow?  On the other hand it might be not worth
> to complicate code by trying to extract common parts here...

Hmm, I think it might be a good idea to just write a generic routine
that takes a hash length as an extra parameter. Then the short and full
hash fetching routines can just acts as wrappers.

>> @@ -5203,6 +5228,13 @@ sub git_snapshot {
>>  		die_error(400, 'Object is not a tree-ish');
>>  	}
>>  
>> +
>> +	my $full_hash = git_get_full_hash($project, $hash);
>> +	if ($full_hash =~ /^$hash/) {
>> +		$hash = git_get_short_hash($project, $hash);
>> +	} else {
>> +		$hash .= '-' . git_get_short_hash($project, $hash);
>> +	}
> 
> I think we might want to avoid calling git_get_full_hash (and extra call
> to "git rev-parse" command, which is extra fork) if we know in advance
> that  $full_hash =~ /^$hash/  can't be true, i.e. if $hash doesn't match
> /^[0-9a-fA-F]+$/.  That would require that we continue to use $hash
> and not $full_hash, see comment for the chunk below.
> 
> BTW do you think that having better name (nicer name in the case
> when $hash is full SHA-1, or name which describes exact version as 
> in the case when $hash is branch name or just 'HEAD') is worth
> slight extra cost of "git rev-parse --abbrev=7"?

Hmm, yeah, some optimization will have to occur in that block of
code. Though, my reason for that extra call to rev-parse to get the
short hash is so I can get git to find the shortest unique SHA-1,
instead of just assuming that it will always be of length 7. I think
the cost is not too bad considering a snapshot will have to be generated
and probably take way more time. Though, warthog9 has some caching
patches that work, so maybe it isn't worth it. Hmm...

>>  	my $name = $project;
>>  	$name =~ s,([^/])/*\.git$,$1,;
>>  	$name = basename($name);
>> @@ -5213,7 +5245,7 @@ sub git_snapshot {
>>  	$cmd = quote_command(
>>  		git_cmd(), 'archive',
>>  		"--format=$known_snapshot_formats{$format}{'format'}",
>> -		"--prefix=$name/", $hash);
>> +		"--prefix=$name/", $full_hash);
> 
> Why this change?

Since $hash can change by becoming something like 'HEAD-43ab5f2c' due to
the process of creating the better name we need to pass something to
`archive' that will be valid, and $full_hash will be valid.

>> +test_description='gitweb as standalone script (parsing script output).
>> +
>> +This test runs gitweb (git web interface) as a CGI script from the
>> +commandline, and checks that it produces the correct output, either
>> +in the HTTP header or the actual script output.'
> 
> Currently all tests here are about 'snapshot' action.  They are quite
> specific, and they do require some knowledge about chosen archive format.
> I think it would be better to put snapshot test into separate test,
> i.e. in 't/t9502-gitweb-standalone-snapshot.sh'.
> 

Ok.

>> +test_commit \
>> +	'SnapshotFileTests' \
>> +	'i can has snapshot?'
> 
> Errr... with filename [cutely] called 'i can has snapshot?' you would
> have, I guess, problems with tests on MS Windows, where IIRC '?' is
> forbidden in filenames.

I was able to confirm `?' as a forbidden file name character in Windows 7,
so I will have to change that...

> In the test below you use "git rev-parse --verify HEAD" and
> "git rev-parse --short HEAD" over and over.  I think it would be better
> to calculate them upfront:
> 
>   +test_expect_success 'calculate full and short ids' '
>   +	FULLID= $(git rev-parse --verify  HEAD) &&
>   +	SHORTID=$(git rev-parse --short=7 HEAD)
>   +'
> 

Ok.

>> +	ID=`git rev-parse --short HEAD` &&
>> +	grep ".git-$ID.tar.gz" gitweb.output
> 
> Here had to think a bit that gitweb.output consists both of HTTP headers,
> and of response body, and you are grepping here in the HTTP headers part.
> It would be better solution for gitweb_run to split gitweb.output into
> gitweb.headers and gitweb.body (perhaps if requested by setting some
> variable, e.g. GITWEB_SPLIT_OUTPUT).
> 
> It can be done using the following lines:
> 
> 	sed    -e '/^\r$/'      <gitweb.output >gitweb.headers
> 	sed -n -e '0,/^\r$/!p'  <gitweb.output >gitweb.body
> 
> 	# gitweb.headers is used to parse http headers
> 	# gitweb.body is response without http headers
> 
> But the second one uses GNU sed extension; I don't know how to write
> it in more portable way.

I like this and will try to find a way of setting this up without using
GNU extensions.

> Note that this would mean that t/t9501-gitweb-standalone-http-status.sh
> should also be updated to use gitweb.headers and gitweb.body

Yeah, now I have a few things mentioned by either you or Junio that I
should probably fix in the test cases I have submitted. I will clean
them up in a separate patch once I finish with this patch.

>> +	gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" &&
>> +	ID=`git rev-parse --short SnapshotFileTests` &&
>> +	grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output
>> +'
>> +test_debug 'cat gitweb.output'
> 
> Note that to avoid ambiguities currently gitweb uses refs/heads/master
> and refs/tags/SnapshotFileTests... but dealing with this issue should be
> left, I think, for separate commit.
> 

I do not understand what ambiguity exists, can you please explain this?


-- 
Mark Rada (ferrous26)
marada@uwaterloo.ca

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files
  2009-10-12 15:34   ` Mark Rada
@ 2009-10-13 23:46     ` Jakub Narebski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2009-10-13 23:46 UTC (permalink / raw)
  To: Mark Rada; +Cc: Junio C Hamano, git

On Mon, 12 Oct 2009, Mark Rada wrote:
> On 09-10-05 6:06 AM, Jakub Narebski wrote:
> 
>>>  	my $o_git_dir = $git_dir;
>>>  	my $retval = undef;
>>>  	$git_dir = "$projectroot/$project";
>>> -	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
>>> -		my $head = <$fd>;
>>> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) {
>>> +		$hash = <$fd>;
>>>  		close $fd;
>>> -		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
>>> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) {
>>> +			$retval = $1;
>>> +		}
>> 
>> I guess that you use "$retval = $1;" instead of just "$retval = $hash;"
>> because of similarities with git_get_short_hash, isn't it?  Or it is just
>> following earlier code?
> 
> Yeah, it is following earlier code, I did not change it, 

Ah, that's O.K.

Although if you plan refactoring this, you might fix this bit of
inefficiency (no need for capturing).

> though the diff seems to think I added it, perhaps this is a bug with
> diff? 

No, that is just diff being ambiguous, as there is more than one way
to generate diff of changes.  Perhaps patience diff would produce
better results, perhaps not.  It might mean that refactoring common
code is needed ;-)))))

>>> +	}
>>> +	if (defined $o_git_dir) {
>>> +		$git_dir = $o_git_dir;
>>> +	}
>>> +	return $retval;
>>> +}
>>> +
>>> +# try and get a shorter hash id
>>> +sub git_get_short_hash {
>>> +	my $project = shift;
>>> +	my $hash = shift;
>>> +	my $o_git_dir = $git_dir;
>>> +	my $retval = undef;
>>> +	$git_dir = "$projectroot/$project";
>>> +	if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) {
>>> +		$hash = <$fd>;
>>> +		close $fd;
>>> +		if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) {
>>>  			$retval = $1;
>>>  		}
>>>  	}
>> 
>> Note that git_get_full_hash (which additionally does verification) and
>> git_get_short_hash share much of code.  Perhaps it might be worth to
>> avoid code duplication somehow?  On the other hand it might be not worth
>> to complicate code by trying to extract common parts here...
> 
> Hmm, I think it might be a good idea to just write a generic routine
> that takes a hash length as an extra parameter. Then the short and full
> hash fetching routines can just acts as wrappers.

Well, git_get_full_hash uses --verify, git_get_short_hash uses --short=7
(but perhaps it should also use --verify).

BTW. I think that checking that output of git-rev-parse is (shortened)
SHA-1 predates usage of --verify; with --verify is, I think, not
necessary:

 --verify
     The parameter given must be usable as a single, valid object name.
     Otherwise barf and abort.

>>> @@ -5203,6 +5228,13 @@ sub git_snapshot {
>>>  		die_error(400, 'Object is not a tree-ish');
>>>  	}
>>>  
>>> +
>>> +	my $full_hash = git_get_full_hash($project, $hash);
>>> +	if ($full_hash =~ /^$hash/) {

BTW, we can use  

        if (index($full_hash, $hash) == 0) {

instead.  BTW, $hash could contain regexp metacharacters like '.'
('dead.beef' is a valid branch name), so it should be

	if ($full_hash =~ /^\Q$hash/) {

if you want to use regexp (it might be easier to read).  

Or you can encapsulate this into is_substring() subroutine, but that
might be (well, almost surely is) overkill...

>>> +		$hash = git_get_short_hash($project, $hash);
>>> +	} else {
>>> +		$hash .= '-' . git_get_short_hash($project, $hash);
>>> +	}
>> 
>> I think we might want to avoid calling git_get_full_hash (and extra call
>> to "git rev-parse" command, which is extra fork) if we know in advance
>> that  $full_hash =~ /^$hash/  can't be true, i.e. if $hash doesn't match
>> /^[0-9a-fA-F]+$/.  That would require that we continue to use $hash
>> and not $full_hash, see comment for the chunk below.
>> 
>> BTW do you think that having better name (nicer name in the case
>> when $hash is full SHA-1, or name which describes exact version as 
>> in the case when $hash is branch name or just 'HEAD') is worth
>> slight extra cost of "git rev-parse --abbrev=7"?
> 
> Hmm, yeah, some optimization will have to occur in that block of
> code. Though, my reason for that extra call to rev-parse to get the
> short hash is so I can get git to find the shortest unique SHA-1,
> instead of just assuming that it will always be of length 7. I think
> the cost is not too bad considering a snapshot will have to be generated
> and probably take way more time. Though, warthog9 has some caching
> patches that work, so maybe it isn't worth it. Hmm...

What I meant here that unless $hash =~ /^[0-9a-fA-F]{7,}$/ then we 
always use git_get_short_hash, as $full_hash wouldn't match /^$hash/
($hash wouldn't be a prefix of $full_hash).  We don't need to
calculate git_get_full_hash which wouldn't be used (see also comment
below, though).

> 
>>>  	my $name = $project;
>>>  	$name =~ s,([^/])/*\.git$,$1,;
>>>  	$name = basename($name);
>>> @@ -5213,7 +5245,7 @@ sub git_snapshot {
>>>  	$cmd = quote_command(
>>>  		git_cmd(), 'archive',
>>>  		"--format=$known_snapshot_formats{$format}{'format'}",
>>> -		"--prefix=$name/", $hash);
>>> +		"--prefix=$name/", $full_hash);
>> 
>> Why this change?
> 
> Since $hash can change by becoming something like 'HEAD-43ab5f2c' due to
> the process of creating the better name we need to pass something to
> `archive' that will be valid, and $full_hash will be valid.

Errr... why it is called _$hash_ then, if it can be not hash?  Wouldn't
it be better to manipulate $name here?

I think this fragment should be extracted into snapshot_name() subroutine,
which result would be used both as proposed snapshot name, and as prefix
to be used.

> 
>>> +test_description='gitweb as standalone script (parsing script output).
>>> +
>>> +This test runs gitweb (git web interface) as a CGI script from the
>>> +commandline, and checks that it produces the correct output, either
>>> +in the HTTP header or the actual script output.'
>> 
>> Currently all tests here are about 'snapshot' action.  They are quite
>> specific, and they do require some knowledge about chosen archive format.

That is not true, as I haven't noticed at this point that you are 
examining only HTTP headers... but not the HTTP status but other headers.

[...]
>>> +	grep ".git-$ID.tar.gz" gitweb.output
>> 
>> Here had to think a bit that gitweb.output consists both of HTTP headers,
>> and of response body, and you are grepping here in the HTTP headers part.
>> It would be better solution for gitweb_run to split gitweb.output into
>> gitweb.headers and gitweb.body (perhaps if requested by setting some
>> variable, e.g. GITWEB_SPLIT_OUTPUT).
>> 
>> It can be done using the following lines:
>> 
>> 	sed    -e '/^\r$/'      <gitweb.output>gitweb.headers

That was meant to be

>> 	sed    -e '/^\r$/q'     <gitweb.output >gitweb.headers

which means print (the default action) until single empty CRLF terminated
line, which ends HTTP headers.

>> 	sed -n -e '0,/^\r$/!p'  <gitweb.output>gitweb.body
>> 
>> 	# gitweb.headers is used to parse http headers
>> 	# gitweb.body is response without http headers
>> 
>> But the second one uses GNU sed extension; I don't know how to write
>> it in more portable way.
> 
> I like this and will try to find a way of setting this up without using
> GNU extensions.

Well, we do know that there always would be at least one header, so we
can use:

 	sed -n -e '1,/^\r$/!p'  <gitweb.output >gitweb.body

But I'd prefer that somebody better versed in sed would come up with
solution to extract everything up to first empty CRLF terminated line,
and everything from such line till the end of file.

>> Note that to avoid ambiguities currently gitweb uses refs/heads/master
>> and refs/tags/SnapshotFileTests... but dealing with this issue should be
>> left, I think, for separate commit.
>> 
> 
> I do not understand what ambiguity exists, can you please explain this?

The problem I was thinking about is the following.

In commit bf901f8 (gitweb: disambiguate heads and tags withs the same
name, 2007-12-15) started to use refs/heads/<branch> and refs/tags/<tag>
instead of <branch> and <tag> because there was problem when there were
tag and branch with the same name.

The problem is that we can't use '/' in proposed snapshot file name,
and we shouldn't use '/' in git-archive prefix.  So we can't simply
use (as you proposed) 

  $hash . '-' . git_get_short_hash($project, $hash);

as a snapshot basename suffix, because $hash can be 'refs/heads/master',
or it can be 'mr/gitweb-snapshot'.  What to do, what to do...


Also if $hash is refs/tags/v1.6.0, we don't really need shortened SHA-1
suffix.  

Alternative to checking for refs/tags/ prefix would be to use
git-describe output... perhaps.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-10-13 23:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-26 17:46 [PATCH v5 2/2] gitweb: append short hash ids to snapshot files Mark Rada
2009-10-05 10:06 ` Jakub Narebski
2009-10-12 15:34   ` Mark Rada
2009-10-13 23:46     ` Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).