git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Cc: git@vger.kernel.org, Petr Baudis <pasky@suse.cz>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO
Date: Tue, 21 Oct 2008 18:44:34 +0200	[thread overview]
Message-ID: <200810211844.35714.jnareb@gmail.com> (raw)
In-Reply-To: <1224426270-27755-1-git-send-email-giuseppe.bilotta@gmail.com>

I like the idea behind this patch, to enable to use path_info for as
much gitweb parameters as possible.  After this patch series the only
parameters which wouldn't be possible to represent in path_info would
be: 
 * @extra_options ('opt') multi-valued parameter, used to pass
   thinks like '--no-merges', which cannot be fit in the "simplified"
   list-like (as opposed to hash-like query string) path_info URL.
 * $searchtype ('st') and $searchtext ('s') etc. parameters, which
   are generated by HTML form, and are naturally generated in query
   string format.
 * $page ('pg') parameter, which could theoretically be added as last
   part of path_info URL, for example $project/next/2/... if not for
   pesky $project/history/next:/Documentation/2/ where you cannot be
   sure that having /<number>/ at the end is rare.
 * $order ('o') parameter, which would be hard to fit in path_info,
   with its limitation of parameters being specified by position.
   Or even next to impossible.
 * 'by_tag'...

But I'd rather have this patch series to be in separate thread...


On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:

> We parse requests for $project/snapshot/$head.$sfx as equivalent to
> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
> (although not necessarily supported) snapshot formats (or its default
> suffix).
> 
> The filename for the resulting package preserves the requested
> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
> gives a .tar.gz), although for obvious reasons it doesn't preserve the
> basename (git/snapshot/next.tgz returns a file names git-next.tgz).

That is a bit of difference from sf=<format> in CGI query string, where
<format> is always a name of a format (for example 'tgz' or 'tbz2'), 
and actual suffix is defined in %known_snapshot_formats (for example
'.tar.gz' and '.tar.bz2' respectively).  Now you can specify snapshot
format either either by its name, for example 'tgz' (which is simple
lookup in hash) which result in proposed filename with '.tgz' suffix,
or you can specify suffix, for example 'tar.gz' (which requires
searching through all hash) which result in proposed filename with
'.tar.gz' suffix.

This is a bit of inconsistency; to be consistent with how we handle
'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
snapshot filename.  This would also cover currently purely theoretical
case when different snapshot formats (for example 'tgz' and 'tgz9')
would use the same snapshot suffix (extension), but differ for example
in parameters passed to compressor (for example '-9' or '--best' in
the 'tgz9' case).

On the other hand one would expect that when URL which looks like
URL to snapshot ends with '.$sfx', then filename for snapshot would
also end with '.$sfx'.

This certainly requires some further thoughts.

> 
> This introduces a potential case for ambiguity if a project has a head
> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
> sf CGI parameter is not present; however, gitweb only produces URLs with
> the sf parameter, so this is only a potential issue for hand-coded URLs
> for extremely unusual project.

I think you wanted to say here "_currently_ produces URLs with the 'sf'
parameter" as the next patch in series changes this.

> 
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
> 
> I had second thoughts on this. Now we always look for the snapshot extension if
> the sf CGI parameter is missing, even if the project has a head that matches
> the full pseudo-refname $head.$sfx.
> 
> The reason for this is that (1) there is no ambiguity for gitweb-generated
> URLs (2) the only URLs that could fail are hand-made URLs for extremely
> unusual projects and (3) it allows us to set gitweb up to generate
> (unambiguous) URLs without the sf CGI parameter.

This is also simpler and cheaper solution.

> 
> This also means that I can add 3 patches to the series, instead of just one:
> * patch #6 that parses the new format
> * patch #7 that generates the new URLs
> * patch #8 for some code refactoring

Now, I haven't yet read the last patch in series, so I don't know if
it is independent refactoring, making sense even before patches named
#6 and #7 here, or is it connected with searching for snapshot format
by suffix it uses.  If the former, it should be done upfront, as it
shouldn't need discussion, and being easier to be accepted into git.git.
If the latter, then it should probably be folded (squashed) into #6,
first patch in the series.

> 
>  gitweb/gitweb.perl |   34 ++++++++++++++++++++++++++++++++++
>  1 files changed, 34 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99c8c20..e9e9e60 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -609,6 +609,40 @@ sub evaluate_path_info {
>  			$input_params{'hash_parent'} ||= $parentrefname;
>  		}
>  	}
> +
> +	# for the snapshot action, we allow URLs in the form
> +	# $project/snapshot/$hash.ext
> +	# where .ext determines the snapshot and gets removed from the
> +	# passed $refname to provide the $hash.
> +	#
> +	# To be able to tell that $refname includes the format extension, we
> +	# require the following two conditions to be satisfied:
> +	# - the hash input parameter MUST have been set from the $refname part
> +	#   of the URL (i.e. they must be equal)

This means no "$project/.tgz?h=next", isn't it?

> +	# - the snapshot format MUST NOT have been defined already

I would add "which means that 'sf' parameter is not set in URL", or
something like that as the last line of above comment.

I like that the code is so well commented, by the way.

> +	if ($input_params{'action'} eq 'snapshot' && defined $refname &&
> +		$refname eq $input_params{'hash'} &&

Minor nit.

I would use here (the question of style / better readability):

+	if ($input_params{'action'} eq 'snapshot' &&
+		 defined $refname && $refname eq $input_params{'hash'} &&

to have both conditions about $refname in the same line.

> +		!defined $input_params{'snapshot_format'}) {
> +		# We loop over the known snapshot formats, checking for
> +		# extensions. Allowed extensions are both the defined suffix
> +		# (which includes the initial dot already) and the snapshot
> +		# format key itself, with a prepended dot
> +		while (my ($fmt, %opt) = each %known_snapshot_formats) {
> +			my $hash = $refname;
> +			my $sfx;
> +			$hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
> +			next unless $sfx = $1;
> +			# a valid suffix was found, so set the snapshot format
> +			# and reset the hash parameter
> +			$input_params{'snapshot_format'} = $fmt;
> +			$input_params{'hash'} = $hash;
> +			# we also set the format suffix to the one requested
> +			# in the URL: this way a request for e.g. .tgz returns
> +			# a .tgz instead of a .tar.gz
> +			$known_snapshot_formats{$fmt}{'suffix'} = $sfx;
> +			last;
> +		}

I'm not sure if it worth (see comment at the beginning of this mail)
adding this code, or just allow $sfx to be snapshot _name_ (key in
%known_snapshot_formats hash).

Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
exists (assuming that snapshot format names does not contain '.').

If we decide to go more complicated route, then refactoring it in such
a way that suffixes are also keys to %known_snapshot_formats would be
preferred... err, sorry, not so simple.  But refactoring this check
into separate subroutine (as I think last patch in series does) would
be good idea.


Also, I'd rather you checked if the $refname part contains '.' for it
to even consider that it can be suffix.

> +	}
>  }
>  evaluate_path_info();
>  
> -- 
> 1.5.6.5
> 
> 

-- 
Jakub Narebski
Poland

  parent reply	other threads:[~2008-10-21 16:46 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-16 20:27 [PATCHv6 0/5] gitweb: PATH_INFO enhancement Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-16 20:27   ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-16 20:27     ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-16 20:27       ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta
2008-10-16 20:27         ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-19 12:11           ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta
2008-10-19 14:24           ` [PATHv2 6/8] " Giuseppe Bilotta
2008-10-19 14:24             ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta
2008-10-19 14:24               ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta
2008-11-02  1:54                 ` Jakub Narebski
2008-11-02  8:50                   ` Junio C Hamano
2008-11-01  0:18               ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski
2008-11-01 12:57                 ` Giuseppe Bilotta
2008-10-21 16:44             ` Jakub Narebski [this message]
2008-10-21 18:36               ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta
2008-10-21 19:09                 ` Junio C Hamano
2008-10-20 10:49           ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski
2008-10-20 14:57             ` Giuseppe Bilotta
2008-10-20  9:18         ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski
2008-10-20 14:52           ` Giuseppe Bilotta
2008-10-18 23:26       ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-18 23:57         ` Giuseppe Bilotta
2008-10-19  8:43           ` Jakub Narebski
2008-10-18 23:14     ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski
2008-10-18 22:41   ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200810211844.35714.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=giuseppe.bilotta@gmail.com \
    --cc=pasky@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).