* [PATCH v5 2/2] gitweb: append short hash ids to snapshot files @ 2009-09-26 17:46 Mark Rada 2009-10-05 10:06 ` Jakub Narebski 0 siblings, 1 reply; 4+ messages in thread From: Mark Rada @ 2009-09-26 17:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jakub Narebski, git Teach gitweb how to produce nicer snapshot names by only using the short hash id. If clients make requests using a tree-ish that is not a partial or full SHA-1 hash, then the short hash will also be appended to whatever they asked for. This also includes tests cases for t9502-gitweb-standalone-parse-output. Signed-off-by: Mark Rada <marada@uwaterloo.ca> --- Changes since v4: - moved git_get_full_hash into this commit - changed test case format, suggested by Junio - explicity request at least a length of 7 for short hashes gitweb/gitweb.perl | 40 +++++++++++++++-- t/t9502-gitweb-standalone-parse-output.sh | 67 +++++++++++++++++++++++++++++ 2 files changed, 103 insertions(+), 4 deletions(-) create mode 100644 t/t9502-gitweb-standalone-parse-output.sh diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 8d4a2ae..bc132a5 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -1983,14 +1983,39 @@ sub quote_command { # get HEAD ref of given project as hash sub git_get_head_hash { + return git_get_full_hash(shift, 'HEAD'); +} + +sub git_get_full_hash { my $project = shift; + my $hash = shift; my $o_git_dir = $git_dir; my $retval = undef; $git_dir = "$projectroot/$project"; - if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") { - my $head = <$fd>; + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) { + $hash = <$fd>; close $fd; - if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) { + if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) { + $retval = $1; + } + } + if (defined $o_git_dir) { + $git_dir = $o_git_dir; + } + return $retval; +} + +# try and get a shorter hash id +sub git_get_short_hash { + my $project = shift; + my $hash = shift; + my $o_git_dir = $git_dir; + my $retval = undef; + $git_dir = "$projectroot/$project"; + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) { + $hash = <$fd>; + close $fd; + if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) { $retval = $1; } } @@ -5203,6 +5228,13 @@ sub git_snapshot { die_error(400, 'Object is not a tree-ish'); } + + my $full_hash = git_get_full_hash($project, $hash); + if ($full_hash =~ /^$hash/) { + $hash = git_get_short_hash($project, $hash); + } else { + $hash .= '-' . git_get_short_hash($project, $hash); + } my $name = $project; $name =~ s,([^/])/*\.git$,$1,; $name = basename($name); @@ -5213,7 +5245,7 @@ sub git_snapshot { $cmd = quote_command( git_cmd(), 'archive', "--format=$known_snapshot_formats{$format}{'format'}", - "--prefix=$name/", $hash); + "--prefix=$name/", $full_hash); if (exists $known_snapshot_formats{$format}{'compressor'}) { $cmd .= ' | ' . quote_command(@{$known_snapshot_formats{$format}{'compressor'}}); } diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh new file mode 100644 index 0000000..5f2b1d5 --- /dev/null +++ b/t/t9502-gitweb-standalone-parse-output.sh @@ -0,0 +1,67 @@ +#!/bin/sh +# +# Copyright (c) 2009 Mark Rada +# + +test_description='gitweb as standalone script (parsing script output). + +This test runs gitweb (git web interface) as a CGI script from the +commandline, and checks that it produces the correct output, either +in the HTTP header or the actual script output.' + + +. ./gitweb-lib.sh + +# ---------------------------------------------------------------------- +# snapshot file name + +test_commit \ + 'SnapshotFileTests' \ + 'i can has snapshot?' + +test_expect_success 'snapshots: give full hash' ' + ID=`git rev-parse --verify HEAD` && + gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" && + ID=`git rev-parse --short HEAD` && + grep ".git-$ID.tar.gz" gitweb.output +' +test_debug 'cat gitweb.output' + +test_expect_success 'snapshots: give short hash' ' + ID=`git rev-parse --short HEAD` && + gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" && + grep ".git-$ID.tar.gz" gitweb.output +' +test_debug 'cat gitweb.output' + +test_expect_success 'snapshots: give almost full hash' ' + ID=`git rev-parse --short=30 HEAD` && + gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" && + ID=`git rev-parse --short HEAD` && + grep ".git-$ID.tar.gz" gitweb.output +' +test_debug 'cat gitweb.output' + +test_expect_success 'snapshots: give HEAD tree-ish' ' + gitweb_run "p=.git;a=snapshot;h=HEAD;sf=tgz" && + ID=`git rev-parse --short HEAD` && + grep ".git-HEAD-$ID.tar.gz" gitweb.output +' +test_debug 'cat gitweb.output' + +test_expect_success 'snapshots: give branch name tree-ish' ' + gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" && + ID=`git rev-parse --short master` && + grep ".git-master-$ID.tar.gz" gitweb.output +' +test_debug 'cat gitweb.output' + +test_expect_success 'snapshots: give tag tree-ish' ' + gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" && + ID=`git rev-parse --short SnapshotFileTests` && + grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output +' +test_debug 'cat gitweb.output' + + +test_done -- 1.6.4.GIT ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files 2009-09-26 17:46 [PATCH v5 2/2] gitweb: append short hash ids to snapshot files Mark Rada @ 2009-10-05 10:06 ` Jakub Narebski 2009-10-12 15:34 ` Mark Rada 0 siblings, 1 reply; 4+ messages in thread From: Jakub Narebski @ 2009-10-05 10:06 UTC (permalink / raw) To: Mark Rada; +Cc: Junio C Hamano, git I am sorry for being late with review of this patch. On Sat, 26 Sep 2009, Mark Rada wrote: > Teach gitweb how to produce nicer snapshot names by only using the > short hash id. A few questions (which I think should be answered in this commit message). First, what was original behaviour of 'snapshot' action? Did gitweb always convert 'h' (hash) parameter to full SHA-1? Second, do you preserve 'snapshot' behavior that it generated archive which unpacks to the directory with the same name as basename of (proposed) archive name? I mean here that "repo-2cc6859e.tar.gz" unpacks to "repo-2cc6859e/" directory. I guess it does, but this should be mentioned in the commit message. > If clients make requests using a tree-ish that is not a > partial or full SHA-1 hash, then the short hash will also be appended > to whatever they asked for. Here example would be a good idea. I guess this means that if one is requesting for snapshot of 'next' or 'v1.6.0' of 'repo.git' project, one would get 'repo-next-2cc6859.tar.gz' or 'repo-v1.6.0-2cc6859.tar.gz' as [proposed] snapshot filename. > > This also includes tests cases for t9502-gitweb-standalone-parse-output. I am not sure if it shouldn't be rather t9502-gitweb-standalone-snapshot test; see comments below for whys. > > Signed-off-by: Mark Rada <marada@uwaterloo.ca> > --- > > > Changes since v4: > - moved git_get_full_hash into this commit > - changed test case format, suggested by Junio > - explicity request at least a length of 7 for short hashes > > > gitweb/gitweb.perl | 40 +++++++++++++++-- > t/t9502-gitweb-standalone-parse-output.sh | 67 +++++++++++++++++++++++++++++ > 2 files changed, 103 insertions(+), 4 deletions(-) > create mode 100644 t/t9502-gitweb-standalone-parse-output.sh > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 8d4a2ae..bc132a5 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -1983,14 +1983,39 @@ sub quote_command { > > # get HEAD ref of given project as hash > sub git_get_head_hash { > + return git_get_full_hash(shift, 'HEAD'); > +} Good. This means that we don't need to change code that do not use new feature (new subroutines). It's nice to cater to backward compatibility, especially if it is so low cost as this one. > + > +sub git_get_full_hash { > my $project = shift; > + my $hash = shift; I think it might be good idea here to default to 'HEAD', i.e. + my $hash = shift || 'HEAD'; > my $o_git_dir = $git_dir; > my $retval = undef; > $git_dir = "$projectroot/$project"; > - if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") { > - my $head = <$fd>; > + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) { > + $hash = <$fd>; > close $fd; > - if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) { > + if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) { > + $retval = $1; > + } I guess that you use "$retval = $1;" instead of just "$retval = $hash;" because of similarities with git_get_short_hash, isn't it? Or it is just following earlier code? > + } > + if (defined $o_git_dir) { > + $git_dir = $o_git_dir; > + } > + return $retval; > +} > + > +# try and get a shorter hash id > +sub git_get_short_hash { > + my $project = shift; > + my $hash = shift; > + my $o_git_dir = $git_dir; > + my $retval = undef; > + $git_dir = "$projectroot/$project"; > + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) { > + $hash = <$fd>; > + close $fd; > + if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) { > $retval = $1; > } > } Note that git_get_full_hash (which additionally does verification) and git_get_short_hash share much of code. Perhaps it might be worth to avoid code duplication somehow? On the other hand it might be not worth to complicate code by trying to extract common parts here... > @@ -5203,6 +5228,13 @@ sub git_snapshot { > die_error(400, 'Object is not a tree-ish'); > } > > + > + my $full_hash = git_get_full_hash($project, $hash); > + if ($full_hash =~ /^$hash/) { > + $hash = git_get_short_hash($project, $hash); > + } else { > + $hash .= '-' . git_get_short_hash($project, $hash); > + } I think we might want to avoid calling git_get_full_hash (and extra call to "git rev-parse" command, which is extra fork) if we know in advance that $full_hash =~ /^$hash/ can't be true, i.e. if $hash doesn't match /^[0-9a-fA-F]+$/. That would require that we continue to use $hash and not $full_hash, see comment for the chunk below. BTW do you think that having better name (nicer name in the case when $hash is full SHA-1, or name which describes exact version as in the case when $hash is branch name or just 'HEAD') is worth slight extra cost of "git rev-parse --abbrev=7"? > my $name = $project; > $name =~ s,([^/])/*\.git$,$1,; > $name = basename($name); > @@ -5213,7 +5245,7 @@ sub git_snapshot { > $cmd = quote_command( > git_cmd(), 'archive', > "--format=$known_snapshot_formats{$format}{'format'}", > - "--prefix=$name/", $hash); > + "--prefix=$name/", $full_hash); Why this change? > if (exists $known_snapshot_formats{$format}{'compressor'}) { > $cmd .= ' | ' . quote_command(@{$known_snapshot_formats{$format}{'compressor'}}); > } > diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh > new file mode 100644 > index 0000000..5f2b1d5 > --- /dev/null > +++ b/t/t9502-gitweb-standalone-parse-output.sh > @@ -0,0 +1,67 @@ > +#!/bin/sh > +# > +# Copyright (c) 2009 Mark Rada > +# > + > +test_description='gitweb as standalone script (parsing script output). > + > +This test runs gitweb (git web interface) as a CGI script from the > +commandline, and checks that it produces the correct output, either > +in the HTTP header or the actual script output.' Currently all tests here are about 'snapshot' action. They are quite specific, and they do require some knowledge about chosen archive format. I think it would be better to put snapshot test into separate test, i.e. in 't/t9502-gitweb-standalone-snapshot.sh'. > + > + > +. ./gitweb-lib.sh > + > +# ---------------------------------------------------------------------- > +# snapshot file name > + > +test_commit \ > + 'SnapshotFileTests' \ > + 'i can has snapshot?' Errr... with filename [cutely] called 'i can has snapshot?' you would have, I guess, problems with tests on MS Windows, where IIRC '?' is forbidden in filenames. Perhaps +test_commit \ + 'Initial commit' \ + 'foo' or +test_commit \ + 'SnapshotFileTests' \ + 'foo' 'i can has snapshot?' > + In the test below you use "git rev-parse --verify HEAD" and "git rev-parse --short HEAD" over and over. I think it would be better to calculate them upfront: +test_expect_success 'calculate full and short ids' ' + FULLID= $(git rev-parse --verify HEAD) && + SHORTID=$(git rev-parse --short=7 HEAD) +' > +test_expect_success 'snapshots: give full hash' ' You test here that giving full has works, and that gitweb uses short hash in file name. Better name would therefore be something like +test_expect_success 'snapshots: give full hash, get short hash' ' > + ID=`git rev-parse --verify HEAD` && > + gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" && Here I had to remember that 'tgz' snapshot format is enabled by default. I think it could be better to explicitly enable it in preparation step. > + ID=`git rev-parse --short HEAD` && > + grep ".git-$ID.tar.gz" gitweb.output Here had to think a bit that gitweb.output consists both of HTTP headers, and of response body, and you are grepping here in the HTTP headers part. It would be better solution for gitweb_run to split gitweb.output into gitweb.headers and gitweb.body (perhaps if requested by setting some variable, e.g. GITWEB_SPLIT_OUTPUT). It can be done using the following lines: sed -e '/^\r$/' <gitweb.output >gitweb.headers sed -n -e '0,/^\r$/!p' <gitweb.output >gitweb.body # gitweb.headers is used to parse http headers # gitweb.body is response without http headers But the second one uses GNU sed extension; I don't know how to write it in more portable way. Then > + grep ".git-$ID.tar.gz" gitweb.output would be + grep ".git-$ID.tar.gz" gitweb.headers Note that this would mean that t/t9501-gitweb-standalone-http-status.sh should also be updated to use gitweb.headers and gitweb.body > +' > +test_debug 'cat gitweb.output' Not a good idea in current state. gitweb.output contains binary part, and generally it is not a good idea to output binary files (which can contain ANSI escape sequences) to terminal. > + > +test_expect_success 'snapshots: give short hash' ' +test_expect_success 'snapshots: give short hash, get short hash' ' > + ID=`git rev-parse --short HEAD` && Gitweb uses '--short=7'. Shouldn't you use the same option here? > + gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" && > + grep ".git-$ID.tar.gz" gitweb.output > +' > +test_debug 'cat gitweb.output' > + > +test_expect_success 'snapshots: give almost full hash' ' +test_expect_success 'snapshots: give almost full hash, get short hash' ' > + ID=`git rev-parse --short=30 HEAD` && > + gitweb_run "p=.git;a=snapshot;h=$ID;sf=tgz" && > + ID=`git rev-parse --short HEAD` && > + grep ".git-$ID.tar.gz" gitweb.output > +' > +test_debug 'cat gitweb.output' > + > +test_expect_success 'snapshots: give HEAD tree-ish' ' +test_expect_success 'snapshots: give HEAD, get HEAD-<short hash>' ' > + gitweb_run "p=.git;a=snapshot;h=HEAD;sf=tgz" && > + ID=`git rev-parse --short HEAD` && > + grep ".git-HEAD-$ID.tar.gz" gitweb.output > +' > +test_debug 'cat gitweb.output' > + > +test_expect_success 'snapshots: give branch name tree-ish' ' +test_expect_success 'snapshots: give branch name, get <branch name>-<short hash>' ' > + gitweb_run "p=.git;a=snapshot;h=master;sf=tgz" && > + ID=`git rev-parse --short master` && > + grep ".git-master-$ID.tar.gz" gitweb.output > +' > +test_debug 'cat gitweb.output' > + > +test_expect_success 'snapshots: give tag tree-ish' ' +test_expect_success 'snapshots: give tag name, get <tag name>-<short hash>' ' > + gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" && > + ID=`git rev-parse --short SnapshotFileTests` && > + grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output > +' > +test_debug 'cat gitweb.output' Note that to avoid ambiguities currently gitweb uses refs/heads/master and refs/tags/SnapshotFileTests... but dealing with this issue should be left, I think, for separate commit. > + > + > +test_done > -- > 1.6.4.GIT > > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files 2009-10-05 10:06 ` Jakub Narebski @ 2009-10-12 15:34 ` Mark Rada 2009-10-13 23:46 ` Jakub Narebski 0 siblings, 1 reply; 4+ messages in thread From: Mark Rada @ 2009-10-12 15:34 UTC (permalink / raw) To: Jakub Narebski; +Cc: Junio C Hamano, git On 09-10-05 6:06 AM, Jakub Narebski wrote: >> my $o_git_dir = $git_dir; >> my $retval = undef; >> $git_dir = "$projectroot/$project"; >> - if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") { >> - my $head = <$fd>; >> + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) { >> + $hash = <$fd>; >> close $fd; >> - if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) { >> + if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) { >> + $retval = $1; >> + } > > I guess that you use "$retval = $1;" instead of just "$retval = $hash;" > because of similarities with git_get_short_hash, isn't it? Or it is just > following earlier code? Yeah, it is following earlier code, I did not change it, though the diff seems to think I added it, perhaps this is a bug with diff? >> + } >> + if (defined $o_git_dir) { >> + $git_dir = $o_git_dir; >> + } >> + return $retval; >> +} >> + >> +# try and get a shorter hash id >> +sub git_get_short_hash { >> + my $project = shift; >> + my $hash = shift; >> + my $o_git_dir = $git_dir; >> + my $retval = undef; >> + $git_dir = "$projectroot/$project"; >> + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) { >> + $hash = <$fd>; >> + close $fd; >> + if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) { >> $retval = $1; >> } >> } > > Note that git_get_full_hash (which additionally does verification) and > git_get_short_hash share much of code. Perhaps it might be worth to > avoid code duplication somehow? On the other hand it might be not worth > to complicate code by trying to extract common parts here... Hmm, I think it might be a good idea to just write a generic routine that takes a hash length as an extra parameter. Then the short and full hash fetching routines can just acts as wrappers. >> @@ -5203,6 +5228,13 @@ sub git_snapshot { >> die_error(400, 'Object is not a tree-ish'); >> } >> >> + >> + my $full_hash = git_get_full_hash($project, $hash); >> + if ($full_hash =~ /^$hash/) { >> + $hash = git_get_short_hash($project, $hash); >> + } else { >> + $hash .= '-' . git_get_short_hash($project, $hash); >> + } > > I think we might want to avoid calling git_get_full_hash (and extra call > to "git rev-parse" command, which is extra fork) if we know in advance > that $full_hash =~ /^$hash/ can't be true, i.e. if $hash doesn't match > /^[0-9a-fA-F]+$/. That would require that we continue to use $hash > and not $full_hash, see comment for the chunk below. > > BTW do you think that having better name (nicer name in the case > when $hash is full SHA-1, or name which describes exact version as > in the case when $hash is branch name or just 'HEAD') is worth > slight extra cost of "git rev-parse --abbrev=7"? Hmm, yeah, some optimization will have to occur in that block of code. Though, my reason for that extra call to rev-parse to get the short hash is so I can get git to find the shortest unique SHA-1, instead of just assuming that it will always be of length 7. I think the cost is not too bad considering a snapshot will have to be generated and probably take way more time. Though, warthog9 has some caching patches that work, so maybe it isn't worth it. Hmm... >> my $name = $project; >> $name =~ s,([^/])/*\.git$,$1,; >> $name = basename($name); >> @@ -5213,7 +5245,7 @@ sub git_snapshot { >> $cmd = quote_command( >> git_cmd(), 'archive', >> "--format=$known_snapshot_formats{$format}{'format'}", >> - "--prefix=$name/", $hash); >> + "--prefix=$name/", $full_hash); > > Why this change? Since $hash can change by becoming something like 'HEAD-43ab5f2c' due to the process of creating the better name we need to pass something to `archive' that will be valid, and $full_hash will be valid. >> +test_description='gitweb as standalone script (parsing script output). >> + >> +This test runs gitweb (git web interface) as a CGI script from the >> +commandline, and checks that it produces the correct output, either >> +in the HTTP header or the actual script output.' > > Currently all tests here are about 'snapshot' action. They are quite > specific, and they do require some knowledge about chosen archive format. > I think it would be better to put snapshot test into separate test, > i.e. in 't/t9502-gitweb-standalone-snapshot.sh'. > Ok. >> +test_commit \ >> + 'SnapshotFileTests' \ >> + 'i can has snapshot?' > > Errr... with filename [cutely] called 'i can has snapshot?' you would > have, I guess, problems with tests on MS Windows, where IIRC '?' is > forbidden in filenames. I was able to confirm `?' as a forbidden file name character in Windows 7, so I will have to change that... > In the test below you use "git rev-parse --verify HEAD" and > "git rev-parse --short HEAD" over and over. I think it would be better > to calculate them upfront: > > +test_expect_success 'calculate full and short ids' ' > + FULLID= $(git rev-parse --verify HEAD) && > + SHORTID=$(git rev-parse --short=7 HEAD) > +' > Ok. >> + ID=`git rev-parse --short HEAD` && >> + grep ".git-$ID.tar.gz" gitweb.output > > Here had to think a bit that gitweb.output consists both of HTTP headers, > and of response body, and you are grepping here in the HTTP headers part. > It would be better solution for gitweb_run to split gitweb.output into > gitweb.headers and gitweb.body (perhaps if requested by setting some > variable, e.g. GITWEB_SPLIT_OUTPUT). > > It can be done using the following lines: > > sed -e '/^\r$/' <gitweb.output >gitweb.headers > sed -n -e '0,/^\r$/!p' <gitweb.output >gitweb.body > > # gitweb.headers is used to parse http headers > # gitweb.body is response without http headers > > But the second one uses GNU sed extension; I don't know how to write > it in more portable way. I like this and will try to find a way of setting this up without using GNU extensions. > Note that this would mean that t/t9501-gitweb-standalone-http-status.sh > should also be updated to use gitweb.headers and gitweb.body Yeah, now I have a few things mentioned by either you or Junio that I should probably fix in the test cases I have submitted. I will clean them up in a separate patch once I finish with this patch. >> + gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" && >> + ID=`git rev-parse --short SnapshotFileTests` && >> + grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output >> +' >> +test_debug 'cat gitweb.output' > > Note that to avoid ambiguities currently gitweb uses refs/heads/master > and refs/tags/SnapshotFileTests... but dealing with this issue should be > left, I think, for separate commit. > I do not understand what ambiguity exists, can you please explain this? -- Mark Rada (ferrous26) marada@uwaterloo.ca ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files 2009-10-12 15:34 ` Mark Rada @ 2009-10-13 23:46 ` Jakub Narebski 0 siblings, 0 replies; 4+ messages in thread From: Jakub Narebski @ 2009-10-13 23:46 UTC (permalink / raw) To: Mark Rada; +Cc: Junio C Hamano, git On Mon, 12 Oct 2009, Mark Rada wrote: > On 09-10-05 6:06 AM, Jakub Narebski wrote: > >>> my $o_git_dir = $git_dir; >>> my $retval = undef; >>> $git_dir = "$projectroot/$project"; >>> - if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") { >>> - my $head = <$fd>; >>> + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) { >>> + $hash = <$fd>; >>> close $fd; >>> - if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) { >>> + if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) { >>> + $retval = $1; >>> + } >> >> I guess that you use "$retval = $1;" instead of just "$retval = $hash;" >> because of similarities with git_get_short_hash, isn't it? Or it is just >> following earlier code? > > Yeah, it is following earlier code, I did not change it, Ah, that's O.K. Although if you plan refactoring this, you might fix this bit of inefficiency (no need for capturing). > though the diff seems to think I added it, perhaps this is a bug with > diff? No, that is just diff being ambiguous, as there is more than one way to generate diff of changes. Perhaps patience diff would produce better results, perhaps not. It might mean that refactoring common code is needed ;-))))) >>> + } >>> + if (defined $o_git_dir) { >>> + $git_dir = $o_git_dir; >>> + } >>> + return $retval; >>> +} >>> + >>> +# try and get a shorter hash id >>> +sub git_get_short_hash { >>> + my $project = shift; >>> + my $hash = shift; >>> + my $o_git_dir = $git_dir; >>> + my $retval = undef; >>> + $git_dir = "$projectroot/$project"; >>> + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) { >>> + $hash = <$fd>; >>> + close $fd; >>> + if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) { >>> $retval = $1; >>> } >>> } >> >> Note that git_get_full_hash (which additionally does verification) and >> git_get_short_hash share much of code. Perhaps it might be worth to >> avoid code duplication somehow? On the other hand it might be not worth >> to complicate code by trying to extract common parts here... > > Hmm, I think it might be a good idea to just write a generic routine > that takes a hash length as an extra parameter. Then the short and full > hash fetching routines can just acts as wrappers. Well, git_get_full_hash uses --verify, git_get_short_hash uses --short=7 (but perhaps it should also use --verify). BTW. I think that checking that output of git-rev-parse is (shortened) SHA-1 predates usage of --verify; with --verify is, I think, not necessary: --verify The parameter given must be usable as a single, valid object name. Otherwise barf and abort. >>> @@ -5203,6 +5228,13 @@ sub git_snapshot { >>> die_error(400, 'Object is not a tree-ish'); >>> } >>> >>> + >>> + my $full_hash = git_get_full_hash($project, $hash); >>> + if ($full_hash =~ /^$hash/) { BTW, we can use if (index($full_hash, $hash) == 0) { instead. BTW, $hash could contain regexp metacharacters like '.' ('dead.beef' is a valid branch name), so it should be if ($full_hash =~ /^\Q$hash/) { if you want to use regexp (it might be easier to read). Or you can encapsulate this into is_substring() subroutine, but that might be (well, almost surely is) overkill... >>> + $hash = git_get_short_hash($project, $hash); >>> + } else { >>> + $hash .= '-' . git_get_short_hash($project, $hash); >>> + } >> >> I think we might want to avoid calling git_get_full_hash (and extra call >> to "git rev-parse" command, which is extra fork) if we know in advance >> that $full_hash =~ /^$hash/ can't be true, i.e. if $hash doesn't match >> /^[0-9a-fA-F]+$/. That would require that we continue to use $hash >> and not $full_hash, see comment for the chunk below. >> >> BTW do you think that having better name (nicer name in the case >> when $hash is full SHA-1, or name which describes exact version as >> in the case when $hash is branch name or just 'HEAD') is worth >> slight extra cost of "git rev-parse --abbrev=7"? > > Hmm, yeah, some optimization will have to occur in that block of > code. Though, my reason for that extra call to rev-parse to get the > short hash is so I can get git to find the shortest unique SHA-1, > instead of just assuming that it will always be of length 7. I think > the cost is not too bad considering a snapshot will have to be generated > and probably take way more time. Though, warthog9 has some caching > patches that work, so maybe it isn't worth it. Hmm... What I meant here that unless $hash =~ /^[0-9a-fA-F]{7,}$/ then we always use git_get_short_hash, as $full_hash wouldn't match /^$hash/ ($hash wouldn't be a prefix of $full_hash). We don't need to calculate git_get_full_hash which wouldn't be used (see also comment below, though). > >>> my $name = $project; >>> $name =~ s,([^/])/*\.git$,$1,; >>> $name = basename($name); >>> @@ -5213,7 +5245,7 @@ sub git_snapshot { >>> $cmd = quote_command( >>> git_cmd(), 'archive', >>> "--format=$known_snapshot_formats{$format}{'format'}", >>> - "--prefix=$name/", $hash); >>> + "--prefix=$name/", $full_hash); >> >> Why this change? > > Since $hash can change by becoming something like 'HEAD-43ab5f2c' due to > the process of creating the better name we need to pass something to > `archive' that will be valid, and $full_hash will be valid. Errr... why it is called _$hash_ then, if it can be not hash? Wouldn't it be better to manipulate $name here? I think this fragment should be extracted into snapshot_name() subroutine, which result would be used both as proposed snapshot name, and as prefix to be used. > >>> +test_description='gitweb as standalone script (parsing script output). >>> + >>> +This test runs gitweb (git web interface) as a CGI script from the >>> +commandline, and checks that it produces the correct output, either >>> +in the HTTP header or the actual script output.' >> >> Currently all tests here are about 'snapshot' action. They are quite >> specific, and they do require some knowledge about chosen archive format. That is not true, as I haven't noticed at this point that you are examining only HTTP headers... but not the HTTP status but other headers. [...] >>> + grep ".git-$ID.tar.gz" gitweb.output >> >> Here had to think a bit that gitweb.output consists both of HTTP headers, >> and of response body, and you are grepping here in the HTTP headers part. >> It would be better solution for gitweb_run to split gitweb.output into >> gitweb.headers and gitweb.body (perhaps if requested by setting some >> variable, e.g. GITWEB_SPLIT_OUTPUT). >> >> It can be done using the following lines: >> >> sed -e '/^\r$/' <gitweb.output>gitweb.headers That was meant to be >> sed -e '/^\r$/q' <gitweb.output >gitweb.headers which means print (the default action) until single empty CRLF terminated line, which ends HTTP headers. >> sed -n -e '0,/^\r$/!p' <gitweb.output>gitweb.body >> >> # gitweb.headers is used to parse http headers >> # gitweb.body is response without http headers >> >> But the second one uses GNU sed extension; I don't know how to write >> it in more portable way. > > I like this and will try to find a way of setting this up without using > GNU extensions. Well, we do know that there always would be at least one header, so we can use: sed -n -e '1,/^\r$/!p' <gitweb.output >gitweb.body But I'd prefer that somebody better versed in sed would come up with solution to extract everything up to first empty CRLF terminated line, and everything from such line till the end of file. >> Note that to avoid ambiguities currently gitweb uses refs/heads/master >> and refs/tags/SnapshotFileTests... but dealing with this issue should be >> left, I think, for separate commit. >> > > I do not understand what ambiguity exists, can you please explain this? The problem I was thinking about is the following. In commit bf901f8 (gitweb: disambiguate heads and tags withs the same name, 2007-12-15) started to use refs/heads/<branch> and refs/tags/<tag> instead of <branch> and <tag> because there was problem when there were tag and branch with the same name. The problem is that we can't use '/' in proposed snapshot file name, and we shouldn't use '/' in git-archive prefix. So we can't simply use (as you proposed) $hash . '-' . git_get_short_hash($project, $hash); as a snapshot basename suffix, because $hash can be 'refs/heads/master', or it can be 'mr/gitweb-snapshot'. What to do, what to do... Also if $hash is refs/tags/v1.6.0, we don't really need shortened SHA-1 suffix. Alternative to checking for refs/tags/ prefix would be to use git-describe output... perhaps. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-10-13 23:52 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-09-26 17:46 [PATCH v5 2/2] gitweb: append short hash ids to snapshot files Mark Rada 2009-10-05 10:06 ` Jakub Narebski 2009-10-12 15:34 ` Mark Rada 2009-10-13 23:46 ` Jakub Narebski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).