From: "Giuseppe Bilotta" <giuseppe.bilotta@gmail.com>
To: "Jakub Narebski" <jnareb@gmail.com>
Cc: git@vger.kernel.org, "Petr Baudis" <pasky@suse.cz>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO
Date: Tue, 21 Oct 2008 20:36:41 +0200 [thread overview]
Message-ID: <cb7bb73a0810211136n452ac8bdp7814ff09749b3142@mail.gmail.com> (raw)
In-Reply-To: <200810211844.35714.jnareb@gmail.com>
On Tue, Oct 21, 2008 at 6:44 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> I like the idea behind this patch, to enable to use path_info for as
> much gitweb parameters as possible. After this patch series the only
> parameters which wouldn't be possible to represent in path_info would
> be:
> * @extra_options ('opt') multi-valued parameter, used to pass
> thinks like '--no-merges', which cannot be fit in the "simplified"
> list-like (as opposed to hash-like query string) path_info URL.
> * $searchtype ('st') and $searchtext ('s') etc. parameters, which
> are generated by HTML form, and are naturally generated in query
> string format.
> * $page ('pg') parameter, which could theoretically be added as last
> part of path_info URL, for example $project/next/2/... if not for
> pesky $project/history/next:/Documentation/2/ where you cannot be
> sure that having /<number>/ at the end is rare.
> * $order ('o') parameter, which would be hard to fit in path_info,
> with its limitation of parameters being specified by position.
> Or even next to impossible.
> * 'by_tag'...
>
> But I'd rather have this patch series to be in separate thread...
Yes, a posteriori I think it's better too. I'll resend the 5 path_info
patches with the minor stylistic corrections you suggested, and send
these 3 separately.
> On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
>
>> We parse requests for $project/snapshot/$head.$sfx as equivalent to
>> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
>> (although not necessarily supported) snapshot formats (or its default
>> suffix).
>>
>> The filename for the resulting package preserves the requested
>> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
>> gives a .tar.gz), although for obvious reasons it doesn't preserve the
>> basename (git/snapshot/next.tgz returns a file names git-next.tgz).
>
> That is a bit of difference from sf=<format> in CGI query string, where
> <format> is always a name of a format (for example 'tgz' or 'tbz2'),
> and actual suffix is defined in %known_snapshot_formats (for example
> '.tar.gz' and '.tar.bz2' respectively). Now you can specify snapshot
> format either either by its name, for example 'tgz' (which is simple
> lookup in hash) which result in proposed filename with '.tgz' suffix,
> or you can specify suffix, for example 'tar.gz' (which requires
> searching through all hash) which result in proposed filename with
> '.tar.gz' suffix.
>
> This is a bit of inconsistency; to be consistent with how we handle
> 'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
> snapshot filename. This would also cover currently purely theoretical
> case when different snapshot formats (for example 'tgz' and 'tgz9')
> would use the same snapshot suffix (extension), but differ for example
> in parameters passed to compressor (for example '-9' or '--best' in
> the 'tgz9' case).
>
> On the other hand one would expect that when URL which looks like
> URL to snapshot ends with '.$sfx', then filename for snapshot would
> also end with '.$sfx'.
>
> This certainly requires some further thoughts.
What I decided was to set gitweb to always produce links with the
suffix (.e.g .tar.gz), but I saw no particular reason not to accept
the shorter version which is (1) commonly used as a suffix as well and
(2) happens to be the actual format key used by gitweb.
A different, possibly cleaner approach, but a more extensive change,
would be to have each format describe a list of suffixes, defaulting
to the first one on creation by identifying all of them. This is more
invasive because all of the uses of {'suffix'} have to be replaced
with {'suffix'}[0], or something like that (maybe we could add a
separate key 'other_suffixes' instead?)
>> This introduces a potential case for ambiguity if a project has a head
>> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
>> sf CGI parameter is not present; however, gitweb only produces URLs with
>> the sf parameter, so this is only a potential issue for hand-coded URLs
>> for extremely unusual project.
>
> I think you wanted to say here "_currently_ produces URLs with the 'sf'
> parameter" as the next patch in series changes this.
Ah yes, good point.
>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
>> ---
>>
>> I had second thoughts on this. Now we always look for the snapshot extension if
>> the sf CGI parameter is missing, even if the project has a head that matches
>> the full pseudo-refname $head.$sfx.
>>
>> The reason for this is that (1) there is no ambiguity for gitweb-generated
>> URLs (2) the only URLs that could fail are hand-made URLs for extremely
>> unusual projects and (3) it allows us to set gitweb up to generate
>> (unambiguous) URLs without the sf CGI parameter.
>
> This is also simpler and cheaper solution.
That, too 8-)
>> This also means that I can add 3 patches to the series, instead of just one:
>> * patch #6 that parses the new format
>> * patch #7 that generates the new URLs
>> * patch #8 for some code refactoring
>
> Now, I haven't yet read the last patch in series, so I don't know if
> it is independent refactoring, making sense even before patches named
> #6 and #7 here, or is it connected with searching for snapshot format
> by suffix it uses. If the former, it should be done upfront, as it
> shouldn't need discussion, and being easier to be accepted into git.git.
> If the latter, then it should probably be folded (squashed) into #6,
> first patch in the series.
In fact, patch #8 can be written independently of the other too, and
would provide a significant speed benefit for generation of pages with
lots of 'snapshot' links: what it does is just to make the 'supported
formats' array global, preparing it only once instead of re-preparing
it every time a snapshot link is created.
>> gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++
>> 1 files changed, 34 insertions(+), 0 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 99c8c20..e9e9e60 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -609,6 +609,40 @@ sub evaluate_path_info {
>> $input_params{'hash_parent'} ||= $parentrefname;
>> }
>> }
>> +
>> + # for the snapshot action, we allow URLs in the form
>> + # $project/snapshot/$hash.ext
>> + # where .ext determines the snapshot and gets removed from the
>> + # passed $refname to provide the $hash.
>> + #
>> + # To be able to tell that $refname includes the format extension, we
>> + # require the following two conditions to be satisfied:
>> + # - the hash input parameter MUST have been set from the $refname part
>> + # of the URL (i.e. they must be equal)
>
> This means no "$project/.tgz?h=next", isn't it?
Right.
>> + # - the snapshot format MUST NOT have been defined already
>
> I would add "which means that 'sf' parameter is not set in URL", or
> something like that as the last line of above comment.
Good idea. I'll make it an 'e.g.' to keep the comment valid for future
additional parameter evaluation such as command-line input.
> I like that the code is so well commented, by the way.
Thanks.
>> + if ($input_params{'action'} eq 'snapshot' && defined $refname &&
>> + $refname eq $input_params{'hash'} &&
>
> Minor nit.
>
> I would use here (the question of style / better readability):
>
> + if ($input_params{'action'} eq 'snapshot' &&
> + defined $refname && $refname eq $input_params{'hash'} &&
>
> to have both conditions about $refname in the same line.
Yes, it'd look much better.
>> + !defined $input_params{'snapshot_format'}) {
>> + # We loop over the known snapshot formats, checking for
>> + # extensions. Allowed extensions are both the defined suffix
>> + # (which includes the initial dot already) and the snapshot
>> + # format key itself, with a prepended dot
>> + while (my ($fmt, %opt) = each %known_snapshot_formats) {
>> + my $hash = $refname;
>> + my $sfx;
>> + $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
>> + next unless $sfx = $1;
>> + # a valid suffix was found, so set the snapshot format
>> + # and reset the hash parameter
>> + $input_params{'snapshot_format'} = $fmt;
>> + $input_params{'hash'} = $hash;
>> + # we also set the format suffix to the one requested
>> + # in the URL: this way a request for e.g. .tgz returns
>> + # a .tgz instead of a .tar.gz
>> + $known_snapshot_formats{$fmt}{'suffix'} = $sfx;
>> + last;
>> + }
>
> I'm not sure if it worth (see comment at the beginning of this mail)
> adding this code, or just allow $sfx to be snapshot _name_ (key in
> %known_snapshot_formats hash).
>
> Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
> exists (assuming that snapshot format names does not contain '.').
>
> If we decide to go more complicated route, then refactoring it in such
> a way that suffixes are also keys to %known_snapshot_formats would be
> preferred... err, sorry, not so simple. But refactoring this check
> into separate subroutine (as I think last patch in series does) would
> be good idea.
See comments above.
> Also, I'd rather you checked if the $refname part contains '.' for it
> to even consider that it can be suffix.
Ah, good idea.
--
Giuseppe "Oblomov" Bilotta
next prev parent reply other threads:[~2008-10-21 18:38 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-16 20:27 [PATCHv6 0/5] gitweb: PATH_INFO enhancement Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta
2008-11-02 1:54 ` Jakub Narebski
2008-11-02 8:50 ` Junio C Hamano
2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski
2008-11-01 12:57 ` Giuseppe Bilotta
2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski
2008-10-21 18:36 ` Giuseppe Bilotta [this message]
2008-10-21 19:09 ` Junio C Hamano
2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski
2008-10-20 14:57 ` Giuseppe Bilotta
2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski
2008-10-20 14:52 ` Giuseppe Bilotta
2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-18 23:57 ` Giuseppe Bilotta
2008-10-19 8:43 ` Jakub Narebski
2008-10-18 23:14 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski
2008-10-18 22:41 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cb7bb73a0810211136n452ac8bdp7814ff09749b3142@mail.gmail.com \
--to=giuseppe.bilotta@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
--cc=pasky@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).