From: Mark Rada <marada@uwaterloo.ca>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH v5 2/2] gitweb: append short hash ids to snapshot files
Date: Mon, 12 Oct 2009 11:34:43 -0400 [thread overview]
Message-ID: <4AD34C93.20605@mailservices.uwaterloo.ca> (raw)
In-Reply-To: <200910051206.18943.jnareb@gmail.com>
On 09-10-05 6:06 AM, Jakub Narebski wrote:
>> my $o_git_dir = $git_dir;
>> my $retval = undef;
>> $git_dir = "$projectroot/$project";
>> - if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
>> - my $head = <$fd>;
>> + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--verify', $hash) {
>> + $hash = <$fd>;
>> close $fd;
>> - if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
>> + if (defined $hash && $hash =~ /^([0-9a-fA-F]{40})$/) {
>> + $retval = $1;
>> + }
>
> I guess that you use "$retval = $1;" instead of just "$retval = $hash;"
> because of similarities with git_get_short_hash, isn't it? Or it is just
> following earlier code?
Yeah, it is following earlier code, I did not change it, though the diff
seems to think I added it, perhaps this is a bug with diff?
>> + }
>> + if (defined $o_git_dir) {
>> + $git_dir = $o_git_dir;
>> + }
>> + return $retval;
>> +}
>> +
>> +# try and get a shorter hash id
>> +sub git_get_short_hash {
>> + my $project = shift;
>> + my $hash = shift;
>> + my $o_git_dir = $git_dir;
>> + my $retval = undef;
>> + $git_dir = "$projectroot/$project";
>> + if (open my $fd, '-|', git_cmd(), 'rev-parse', '--short=7', $hash) {
>> + $hash = <$fd>;
>> + close $fd;
>> + if (defined $hash && $hash =~ /^([0-9a-fA-F]{7,})$/) {
>> $retval = $1;
>> }
>> }
>
> Note that git_get_full_hash (which additionally does verification) and
> git_get_short_hash share much of code. Perhaps it might be worth to
> avoid code duplication somehow? On the other hand it might be not worth
> to complicate code by trying to extract common parts here...
Hmm, I think it might be a good idea to just write a generic routine
that takes a hash length as an extra parameter. Then the short and full
hash fetching routines can just acts as wrappers.
>> @@ -5203,6 +5228,13 @@ sub git_snapshot {
>> die_error(400, 'Object is not a tree-ish');
>> }
>>
>> +
>> + my $full_hash = git_get_full_hash($project, $hash);
>> + if ($full_hash =~ /^$hash/) {
>> + $hash = git_get_short_hash($project, $hash);
>> + } else {
>> + $hash .= '-' . git_get_short_hash($project, $hash);
>> + }
>
> I think we might want to avoid calling git_get_full_hash (and extra call
> to "git rev-parse" command, which is extra fork) if we know in advance
> that $full_hash =~ /^$hash/ can't be true, i.e. if $hash doesn't match
> /^[0-9a-fA-F]+$/. That would require that we continue to use $hash
> and not $full_hash, see comment for the chunk below.
>
> BTW do you think that having better name (nicer name in the case
> when $hash is full SHA-1, or name which describes exact version as
> in the case when $hash is branch name or just 'HEAD') is worth
> slight extra cost of "git rev-parse --abbrev=7"?
Hmm, yeah, some optimization will have to occur in that block of
code. Though, my reason for that extra call to rev-parse to get the
short hash is so I can get git to find the shortest unique SHA-1,
instead of just assuming that it will always be of length 7. I think
the cost is not too bad considering a snapshot will have to be generated
and probably take way more time. Though, warthog9 has some caching
patches that work, so maybe it isn't worth it. Hmm...
>> my $name = $project;
>> $name =~ s,([^/])/*\.git$,$1,;
>> $name = basename($name);
>> @@ -5213,7 +5245,7 @@ sub git_snapshot {
>> $cmd = quote_command(
>> git_cmd(), 'archive',
>> "--format=$known_snapshot_formats{$format}{'format'}",
>> - "--prefix=$name/", $hash);
>> + "--prefix=$name/", $full_hash);
>
> Why this change?
Since $hash can change by becoming something like 'HEAD-43ab5f2c' due to
the process of creating the better name we need to pass something to
`archive' that will be valid, and $full_hash will be valid.
>> +test_description='gitweb as standalone script (parsing script output).
>> +
>> +This test runs gitweb (git web interface) as a CGI script from the
>> +commandline, and checks that it produces the correct output, either
>> +in the HTTP header or the actual script output.'
>
> Currently all tests here are about 'snapshot' action. They are quite
> specific, and they do require some knowledge about chosen archive format.
> I think it would be better to put snapshot test into separate test,
> i.e. in 't/t9502-gitweb-standalone-snapshot.sh'.
>
Ok.
>> +test_commit \
>> + 'SnapshotFileTests' \
>> + 'i can has snapshot?'
>
> Errr... with filename [cutely] called 'i can has snapshot?' you would
> have, I guess, problems with tests on MS Windows, where IIRC '?' is
> forbidden in filenames.
I was able to confirm `?' as a forbidden file name character in Windows 7,
so I will have to change that...
> In the test below you use "git rev-parse --verify HEAD" and
> "git rev-parse --short HEAD" over and over. I think it would be better
> to calculate them upfront:
>
> +test_expect_success 'calculate full and short ids' '
> + FULLID= $(git rev-parse --verify HEAD) &&
> + SHORTID=$(git rev-parse --short=7 HEAD)
> +'
>
Ok.
>> + ID=`git rev-parse --short HEAD` &&
>> + grep ".git-$ID.tar.gz" gitweb.output
>
> Here had to think a bit that gitweb.output consists both of HTTP headers,
> and of response body, and you are grepping here in the HTTP headers part.
> It would be better solution for gitweb_run to split gitweb.output into
> gitweb.headers and gitweb.body (perhaps if requested by setting some
> variable, e.g. GITWEB_SPLIT_OUTPUT).
>
> It can be done using the following lines:
>
> sed -e '/^\r$/' <gitweb.output >gitweb.headers
> sed -n -e '0,/^\r$/!p' <gitweb.output >gitweb.body
>
> # gitweb.headers is used to parse http headers
> # gitweb.body is response without http headers
>
> But the second one uses GNU sed extension; I don't know how to write
> it in more portable way.
I like this and will try to find a way of setting this up without using
GNU extensions.
> Note that this would mean that t/t9501-gitweb-standalone-http-status.sh
> should also be updated to use gitweb.headers and gitweb.body
Yeah, now I have a few things mentioned by either you or Junio that I
should probably fix in the test cases I have submitted. I will clean
them up in a separate patch once I finish with this patch.
>> + gitweb_run "p=.git;a=snapshot;h=SnapshotFileTests;sf=tgz" &&
>> + ID=`git rev-parse --short SnapshotFileTests` &&
>> + grep ".git-SnapshotFileTests-$ID.tar.gz" gitweb.output
>> +'
>> +test_debug 'cat gitweb.output'
>
> Note that to avoid ambiguities currently gitweb uses refs/heads/master
> and refs/tags/SnapshotFileTests... but dealing with this issue should be
> left, I think, for separate commit.
>
I do not understand what ambiguity exists, can you please explain this?
--
Mark Rada (ferrous26)
marada@uwaterloo.ca
next prev parent reply other threads:[~2009-10-12 15:41 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-26 17:46 [PATCH v5 2/2] gitweb: append short hash ids to snapshot files Mark Rada
2009-10-05 10:06 ` Jakub Narebski
2009-10-12 15:34 ` Mark Rada [this message]
2009-10-13 23:46 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AD34C93.20605@mailservices.uwaterloo.ca \
--to=marada@uwaterloo.ca \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.