git.vger.kernel.org archive mirror
From: "Giuseppe Bilotta" <giuseppe.bilotta@gmail.com>
To: "Jakub Narebski" <jnareb@gmail.com>
Cc: git@vger.kernel.org, "Petr Baudis" <pasky@suse.cz>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO
Date: Tue, 21 Oct 2008 20:36:41 +0200
Message-ID: <cb7bb73a0810211136n452ac8bdp7814ff09749b3142@mail.gmail.com>
In-Reply-To: <200810211844.35714.jnareb@gmail.com>

On Tue, Oct 21, 2008 at 6:44 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> I like the idea behind this patch, to enable using path_info for as
> many gitweb parameters as possible.  After this patch series the only
> parameters which wouldn't be possible to represent in path_info would
> be:
>  * @extra_options ('opt') multi-valued parameter, used to pass
>   things like '--no-merges', which cannot be fit in the "simplified"
>   list-like (as opposed to hash-like query string) path_info URL.
>  * $searchtype ('st') and $searchtext ('s') etc. parameters, which
>   are generated by HTML form, and are naturally generated in query
>   string format.
>  * $page ('pg') parameter, which could theoretically be added as last
>   part of path_info URL, for example $project/next/2/... if not for
>   pesky $project/history/next:/Documentation/2/ where you cannot be
>   sure that having /<number>/ at the end is rare.
>  * $order ('o') parameter, which would be hard to fit in path_info,
>   with its limitation of parameters being specified by position.
>   Or even next to impossible.
>  * 'by_tag'...
>
> But I'd rather have this patch series to be in separate thread...

Yes, a posteriori I think it's better too. I'll resend the 5 path_info
patches with the minor stylistic corrections you suggested, and send
these 3 separately.


> On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
>
>> We parse requests for $project/snapshot/$head.$sfx as equivalent to
>> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
>> (although not necessarily supported) snapshot formats (or its default
>> suffix).
>>
>> The filename for the resulting package preserves the requested
>> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
>> gives a .tar.gz), although for obvious reasons it doesn't preserve the
>> basename (git/snapshot/next.tgz returns a file named git-next.tgz).
>
> That is a bit of difference from sf=<format> in CGI query string, where
> <format> is always a name of a format (for example 'tgz' or 'tbz2'),
> and actual suffix is defined in %known_snapshot_formats (for example
> '.tar.gz' and '.tar.bz2' respectively).  Now you can specify snapshot
> format either by its name, for example 'tgz' (which is a simple
> lookup in the hash), which results in a proposed filename with a '.tgz'
> suffix, or you can specify the suffix, for example 'tar.gz' (which
> requires searching through the whole hash), which results in a proposed
> filename with a '.tar.gz' suffix.
>
> This is a bit of inconsistency; to be consistent with how we handle
> 'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
> snapshot filename.  This would also cover currently purely theoretical
> case when different snapshot formats (for example 'tgz' and 'tgz9')
> would use the same snapshot suffix (extension), but differ for example
> in parameters passed to compressor (for example '-9' or '--best' in
> the 'tgz9' case).
>
> On the other hand one would expect that when URL which looks like
> URL to snapshot ends with '.$sfx', then filename for snapshot would
> also end with '.$sfx'.
>
> This certainly requires some further thoughts.

What I decided was to set gitweb to always produce links with the
suffix (e.g. .tar.gz), but I saw no particular reason not to accept
the shorter version which is (1) commonly used as a suffix as well and
(2) happens to be the actual format key used by gitweb.

A different, possibly cleaner approach, though a more extensive change,
would be to have each format describe a list of suffixes: the first one
would be the default used when creating a snapshot, while all of them
would be recognized in URLs. This is more invasive because all of the
uses of {'suffix'} have to be replaced with {'suffix'}[0], or something
like that (maybe we could add a separate key 'other_suffixes' instead?)
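
To make the 'other_suffixes' variant a bit more concrete, here is a
rough sketch only. The 'other_suffixes' key and the helper name
split_snapshot_suffix are made up for illustration, and the table is a
cut-down stand-in for %known_snapshot_formats, not the real one:

```perl
use strict;
use warnings;

# Cut-down stand-in for gitweb's %known_snapshot_formats; the
# 'other_suffixes' key is hypothetical and does not exist in gitweb.
our %known_snapshot_formats = (
	'tgz'  => { 'suffix' => '.tar.gz',  'other_suffixes' => ['.tgz'] },
	'tbz2' => { 'suffix' => '.tar.bz2', 'other_suffixes' => ['.tbz2'] },
	'zip'  => { 'suffix' => '.zip',     'other_suffixes' => [] },
);

# Hypothetical helper: split "$head$sfx" into ($head, $fmt, $sfx), or
# return an empty list if no known suffix matches.
sub split_snapshot_suffix {
	my ($refname) = @_;
	# sorted keys keep the lookup order deterministic
	foreach my $fmt (sort keys %known_snapshot_formats) {
		my $opt = $known_snapshot_formats{$fmt};
		foreach my $sfx ($opt->{'suffix'}, @{$opt->{'other_suffixes'}}) {
			my $head = $refname;
			# the suffix must match at the end and leave a non-empty head
			next unless $head =~ s/\Q$sfx\E$// && length $head;
			return ($head, $fmt, $sfx);
		}
	}
	return;
}
```

With something like this the default {'suffix'} stays a plain string, so
existing users of {'suffix'} would not need to change, and the requested
suffix is returned to the caller instead of being written back into the
global table.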


>> This introduces a potential case for ambiguity if a project has a head
>> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
>> sf CGI parameter is not present; however, gitweb only produces URLs with
>> the sf parameter, so this is only a potential issue for hand-coded URLs
>> for extremely unusual projects.
>
> I think you wanted to say here "_currently_ produces URLs with the 'sf'
> parameter" as the next patch in series changes this.

Ah yes, good point.

>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
>> ---
>>
>> I had second thoughts on this. Now we always look for the snapshot extension if
>> the sf CGI parameter is missing, even if the project has a head that matches
>> the full pseudo-refname $head.$sfx.
>>
>> The reason for this is that (1) there is no ambiguity for gitweb-generated
>> URLs (2) the only URLs that could fail are hand-made URLs for extremely
>> unusual projects and (3) it allows us to set gitweb up to generate
>> (unambiguous) URLs without the sf CGI parameter.
>
> This is also simpler and cheaper solution.

That, too 8-)

>> This also means that I can add 3 patches to the series, instead of just one:
>> * patch #6 that parses the new format
>> * patch #7 that generates the new URLs
>> * patch #8 for some code refactoring
>
> Now, I haven't yet read the last patch in series, so I don't know if
> it is independent refactoring, making sense even before patches named
> #6 and #7 here, or is it connected with searching for snapshot format
> by suffix it uses.  If the former, it should be done upfront, as it
> shouldn't need discussion, and would be easier to get accepted into git.git.
> If the latter, then it should probably be folded (squashed) into #6,
> first patch in the series.

In fact, patch #8 can be written independently of the other two, and
would provide a significant speed benefit for generation of pages with
lots of 'snapshot' links: what it does is just to make the 'supported
formats' array global, preparing it only once instead of re-preparing
it every time a snapshot link is created.
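
Just to illustrate the "prepare once, reuse" idea (this is not the code
from patch #8), a minimal sketch could look like the following. The
names @configured_formats, $supported_formats, $prepare_count and
supported_snapshot_formats are invented for the example, and the format
table is again a cut-down stand-in:

```perl
use strict;
use warnings;

# Cut-down stand-in for gitweb's %known_snapshot_formats.
our %known_snapshot_formats = (
	'tgz'  => { 'suffix' => '.tar.gz' },
	'tbz2' => { 'suffix' => '.tar.bz2', 'disabled' => 1 },
	'zip'  => { 'suffix' => '.zip' },
);

# Hypothetical configuration: the formats the admin asked for, in order.
our @configured_formats = ('tgz', 'tbz2', 'zip', 'bogus');

# Counter kept only to show how often the list is (re)built.
our $prepare_count = 0;
our $supported_formats;    # array ref, filled on first use

# Hypothetical accessor: filter the configured formats once, then reuse
# the cached result for every later snapshot link.
sub supported_snapshot_formats {
	unless (defined $supported_formats) {
		$prepare_count++;
		$supported_formats = [
			grep {
				exists $known_snapshot_formats{$_} &&
				!$known_snapshot_formats{$_}{'disabled'}
			} @configured_formats
		];
	}
	return @$supported_formats;
}
```

However many snapshot links a page emits, the filtering runs once per
process in this sketch; unknown and disabled formats are dropped at that
point.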

>>  gitweb/gitweb.perl |   34 ++++++++++++++++++++++++++++++++++
>>  1 files changed, 34 insertions(+), 0 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 99c8c20..e9e9e60 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -609,6 +609,40 @@ sub evaluate_path_info {
>>                       $input_params{'hash_parent'} ||= $parentrefname;
>>               }
>>       }
>> +
>> +     # for the snapshot action, we allow URLs in the form
>> +     # $project/snapshot/$hash.ext
>> +     # where .ext determines the snapshot and gets removed from the
>> +     # passed $refname to provide the $hash.
>> +     #
>> +     # To be able to tell that $refname includes the format extension, we
>> +     # require the following two conditions to be satisfied:
>> +     # - the hash input parameter MUST have been set from the $refname part
>> +     #   of the URL (i.e. they must be equal)
>
> This means no "$project/.tgz?h=next", isn't it?

Right.

>> +     # - the snapshot format MUST NOT have been defined already
>
> I would add "which means that 'sf' parameter is not set in URL", or
> something like that as the last line of above comment.

Good idea. I'll make it an 'e.g.' to keep the comment valid for future
additional parameter evaluation such as command-line input.

> I like that the code is so well commented, by the way.

Thanks.

>> +     if ($input_params{'action'} eq 'snapshot' && defined $refname &&
>> +             $refname eq $input_params{'hash'} &&
>
> Minor nit.
>
> I would use here (the question of style / better readability):
>
> +       if ($input_params{'action'} eq 'snapshot' &&
> +                defined $refname && $refname eq $input_params{'hash'} &&
>
> to have both conditions about $refname in the same line.

Yes, it'd look much better.

>> +             !defined $input_params{'snapshot_format'}) {
>> +             # We loop over the known snapshot formats, checking for
>> +             # extensions. Allowed extensions are both the defined suffix
>> +             # (which includes the initial dot already) and the snapshot
>> +             # format key itself, with a prepended dot
>> +             while (my ($fmt, $opt) = each %known_snapshot_formats) {
>> +                     my $hash = $refname;
>> +                     my $sfx;
>> +                     $hash =~ s/(\Q$opt->{'suffix'}\E|\Q.$fmt\E)$//;
>> +                     next unless $sfx = $1;
>> +                     # a valid suffix was found, so set the snapshot format
>> +                     # and reset the hash parameter
>> +                     $input_params{'snapshot_format'} = $fmt;
>> +                     $input_params{'hash'} = $hash;
>> +                     # we also set the format suffix to the one requested
>> +                     # in the URL: this way a request for e.g. .tgz returns
>> +                     # a .tgz instead of a .tar.gz
>> +                     $known_snapshot_formats{$fmt}{'suffix'} = $sfx;
>> +                     last;
>> +             }
>
> I'm not sure if it is worth (see comment at the beginning of this mail)
> adding this code, or just allow $sfx to be snapshot _name_ (key in
> %known_snapshot_formats hash).
>
> Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
> exists (assuming that snapshot format names do not contain '.').
>
> If we decide to go more complicated route, then refactoring it in such
> a way that suffixes are also keys to %known_snapshot_formats would be
> preferred... err, sorry, not so simple.  But refactoring this check
> into separate subroutine (as I think last patch in series does) would
> be good idea.

See comments above.

> Also, I'd rather you checked if the $refname part contains '.' for it
> to even consider that it can be suffix.

Ah, good idea.
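
Something along these lines could guard the loop; the helper name
may_have_snapshot_suffix is only a placeholder, and requiring the dot to
come after the first character is my own assumption (so that the
remaining $hash cannot be empty):

```perl
use strict;
use warnings;

# Hypothetical helper: true only if $refname contains a dot after its
# first character, i.e. it could be "$head.$sfx" with a non-empty $head.
sub may_have_snapshot_suffix {
	my ($refname) = @_;
	return (defined $refname && index($refname, '.') > 0) ? 1 : 0;
}
```

The loop over the known formats would then only run when this returns
true.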

-- 
Giuseppe "Oblomov" Bilotta

