Re: [PATCHv4] gitweb: refactor input parameters parse/validation

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jakub Narebski <jnareb@gmail.com>
To: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Cc: git@vger.kernel.org, Petr Baudis <pasky@suse.cz>,
	Junio C Hamano <gitster@pobox.com>,
	"Shawn O. Pearce" <spearce@spearce.org>
Subject: Re: [PATCHv4] gitweb: refactor input parameters parse/validation
Date: Fri, 3 Oct 2008 03:36:30 +0200	[thread overview]
Message-ID: <200810030336.31469.jnareb@gmail.com> (raw)
In-Reply-To: <1222906234-8182-3-git-send-email-giuseppe.bilotta@gmail.com>

On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

> Since input parameters can now be obtained both from CGI parameters and
> PATH_INFO, we would like most of the code to be agnostic about the way
> parameters were retrieved.
> 

I'd prefer that this cleanup/refactoring patch was _first_ patch in
the series, as we were able to obtain parameters both from CGI query
parameters and from PATH_INFO ($project, $hash or $hash_base+$file_name)
_before_ first patch in this series.  So it correct not only issue
introduced by first patch (and fixed somewhat there), but what was
outstanding (but rare because gitweb did not generate such links)
issue.


> We thus collect all the parameters into the new %input_params hash,
> removing the need for ad-hoc kludgy code in href(). 

Alternate solution would be push data from PATH_INFO in query params
data using (for example)

  $cgi->param('a', $action);

or, naming parameters explicitely

  $cgi->param(-name=>'a', -value=>$action);

This avoids need for additional variable, reuses current code, and
nicely sidesteps issue whether to use long names for keys ('action',
'file_name') or short ones from CGI query ('a', 'f').


It would probably has to be at least partially written to check which
of those two solutions (%input_params or $cgi->param('a', $a)))
is better.

> As much of the 
> parameters validation code is now shared between CGI and PATH_INFO,
> although this requires PATH_INFO parsing to be interleaved into the main
> code instead of being factored out into its own routine.

I'm not sure if it is worth it then to unify parameters validation,
with such drawback.

> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>  gitweb/gitweb.perl |  336 ++++++++++++++++++++++++++++------------------------
>  1 files changed, 183 insertions(+), 153 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index f088681..ec4326f 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -28,8 +28,10 @@ our $my_url = $cgi->url();
>  our $my_uri = $cgi->url(-absolute => 1);
>  
>  # if we're called with PATH_INFO, we have to strip that
> -# from the URL to find our real URL
> -if (my $path_info = $ENV{"PATH_INFO"}) {
> +# from the URL to find our real URL. PATH_INFO is kept
> +# as it's used later on for parameter extraction
> +my $path_info = $ENV{"PATH_INFO"};
> +if ($path_info) {
>  	$my_url =~ s,\Q$path_info\E$,,;
>  	$my_uri =~ s,\Q$path_info\E$,,;
>  }

This might be separate patch, if you wanted to increase your commit
count ;-)  More seriously I think it should be at least briefly
mentioned in commit message (make $path_info global).

> @@ -390,15 +392,111 @@ $projects_list ||= $projectroot;
>  
>  # ======================================================================
>  # input validation and dispatch
> -our $action = $cgi->param('a');
> -if (defined $action) {
> -	if ($action =~ m/[^0-9a-zA-Z\.\-_]/) {
> -		die_error(400, "Invalid action parameter");
> +
> +# input parameters can be collected from a variety of sources (presently, CGI
> +# and PATH_INFO), so we define an %input_params hash that collects them all
> +# together during validation: this allows subsequent uses (e.g. href()) to be
> +# agnostic of the parameter origin
> +
> +my %input_params = ();
> +
> +# input parameters are stored with the long parameter name as key, so we need
> +# the name -> CGI key mapping here. This will also be used in the href
> +# subroutine to convert parameters to their CGI equivalent.
> +#
> +# XXX: Warning: If you touch this, check the search form for updating,
> +# too.
> +
> +my @cgi_param_mapping = (
> +	project => "p",
> +	action => "a",
> +	file_name => "f",
> +	file_parent => "fp",
> +	hash => "h",
> +	hash_parent => "hp",
> +	hash_base => "hb",
> +	hash_parent_base => "hpb",
> +	page => "pg",
> +	order => "o",
> +	searchtext => "s",
> +	searchtype => "st",
> +	snapshot_format => "sf",
> +	extra_options => "opt",
> +	search_use_regexp => "sr",
> +);
> +my %cgi_param_mapping = @cgi_param_mapping;
> +
> +# we will also need to know the possible actions, for validation
> +my %actions = (
> +	"blame" => \&git_blame,
> +	"blobdiff" => \&git_blobdiff,
> +	"blobdiff_plain" => \&git_blobdiff_plain,
> +	"blob" => \&git_blob,
> +	"blob_plain" => \&git_blob_plain,
> +	"commitdiff" => \&git_commitdiff,
> +	"commitdiff_plain" => \&git_commitdiff_plain,
> +	"commit" => \&git_commit,
> +	"forks" => \&git_forks,
> +	"heads" => \&git_heads,
> +	"history" => \&git_history,
> +	"log" => \&git_log,
> +	"rss" => \&git_rss,
> +	"atom" => \&git_atom,
> +	"search" => \&git_search,
> +	"search_help" => \&git_search_help,
> +	"shortlog" => \&git_shortlog,
> +	"summary" => \&git_summary,
> +	"tag" => \&git_tag,
> +	"tags" => \&git_tags,
> +	"tree" => \&git_tree,
> +	"snapshot" => \&git_snapshot,
> +	"object" => \&git_object,
> +	# those below don't need $project
> +	"opml" => \&git_opml,
> +	"project_list" => \&git_project_list,
> +	"project_index" => \&git_project_index,
> +);

Nice, although I'm not sure if [%@]cgi_param_mapping has to global.
If we use long parameters names as keys, I think it has to, somewhat.
See also comment below.

> +
> +# fill %input_params with the CGI parameters. All values except for 'opt'
> +# should be single values, but opt can be an array. We should probably
> +# build an array of parameters that can be multi-valued, but since for the time
> +# being it's only this one, we just single it out
> +while (my ($name, $symbol) = each %cgi_param_mapping) {
> +	if ($symbol eq 'opt') {
> +		$input_params{$name} = [$cgi->param($symbol)];
> +	} else {
> +		$input_params{$name} = $cgi->param($symbol);
>  	}
>  }

If it was chosen to use short (CGI query) parameter names, the above
loop could be replaced by simple

  %input_params = $cgi->Vars;

or to be more exact, if we want to have multi-valued parameters stored
as array ref

  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;


See CGI(3pm):

    When using this [Vars], the thing you must watch out for are multivalued CGI
    parameters.  Because a hash cannot distinguish between scalar and list con-
    text, multivalued parameters will be returned as a packed string, separated
    by the "\0" (null) character.  You must split this packed string in order
    to get at the individual values.
  
> -# parameters which are pathnames
> -our $project = $cgi->param('p');
> +# next, we want to parse PATH_INFO (which was already stored in $path_info at
> +# the beginning). This is a little hairy because PATH_INFO parsing needs some
> +# form of parameter validation, so we interleave parsing and validation.

I don't think it is a good idea. In my opinion, for my taste, it would
be better to separate evaluating path_info from the rest.

We could instead introduce convention that if variable (like $project)
is set, then it is assumed to be validated; if it is present only in
the %input_params hash, then it has to be validated.


On the other hand this ordering, first by parameter, then by method
of extraction could be seem quite equally valid.  Nevertheless I think
that previous flow with separate evaluate_path_info() and what should
be evaluate_CGI_query() has better encapsulation.

> +#
> +# The accepted PATH_INFO syntax is $project/$action/$hash or
> +# $project/$action/$hash_base:$file_name, where $action may be missing (mostly for
> +# backwards compatibility), so we need to parse and validate the parameters in
> +# this same order.
> +
> +# clear $path_info of any leading /
> +$path_info =~ s,^/+,,;
> +
> +our $project = $input_params{'project'};
> +if ($path_info && !defined $project) {
> +	# if $project was not defined by CGI, we try to extract it from
> +	# $path_info
> +	$project = $path_info;
> +	$project =~ s,/+$,,;
> +	while ($project && !check_head_link("$projectroot/$project")) {
> +		$project =~ s,/*[^/]*$,,;
> +	}
> +	$input_params{'project'} = $project;
> +} else {
> +	# otherwise, we suppress $path_info parsing altogether
> +	$path_info = undef;
> +}
> +
> +# project name validation
>  if (defined $project) {
>  	if (!validate_pathname($project) ||
>  	    !(-d "$projectroot/$project") ||

Note that this code does less checking if $project is in path_info than
for the case where it is set by CGI query. Perhaps there should be base
fast check in a loop, and more extensive test later.

> @@ -408,16 +506,66 @@ if (defined $project) {
>  		undef $project;
>  		die_error(404, "No such project");
>  	}
> +
> +	# we purge the $project name from the $path_info, preparing it for
> +	# subsequent parameters extraction
> +	$path_info =~ s,^\Q$project\E/*,, if defined $path_info;
> +} else {
> +	# we also suppress $path_info parsing if no project was defined
> +	$path_info = undef;
> +}

In evaluate_path_info it was simply 'return if...'; here with mentioned
interleaving it is harder and a bit hacky.

[cut]

> @@ -615,43 +676,12 @@ sub href (%) {
>  	# default is to use -absolute url() i.e. $my_uri
>  	my $href = $params{-full} ? $my_url : $my_uri;

[cut removed, or rather moved to beginning, @mapping]
  
>  	$params{'project'} = $project unless exists $params{'project'};
>  
>  	if ($params{-replay}) {
> -		while (my ($name, $symbol) = each %mapping) {
> -			if (!exists $params{$name}) {
> -				# the parameter we want to recycle may be either part of the
> -				# list of CGI parameter, or recovered from PATH_INFO
> -				if ($cgi->param($symbol)) {
> -					# to allow for multivalued params we use arrayref form
> -					$params{$name} = [ $cgi->param($symbol) ];
> -				} else {
> -					no strict 'refs';
> -					$params{$name} = $$name if $$name;
> -				}
> -			}
> +		while (my ($name, $val) = each %input_params) {
> +			$params{$name} = $input_params{$name}
> +				unless (exists $params{$name});

Very nice, short code.  Should be something like that from
the very beginning.

-- 
Jakub Narebski
Poland

next prev parent reply	other threads:[~2008-10-03  1:50 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-02  0:10 [PATCHv4] gitweb: PATH_INFO support improvements Giuseppe Bilotta
2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
2008-10-02  0:10           ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-06  0:17             ` Jakub Narebski
2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
2008-10-04  7:24             ` Giuseppe Bilotta
2008-10-04  7:48               ` Jakub Narebski
2008-10-05  8:19             ` Jakub Narebski
2008-10-03 11:28         ` [PATCHv4] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-03  1:48       ` [PATCHv4] gitweb: generate project/action/hash URLs Jakub Narebski
     [not found]         ` <cb7bb73a0810022330l498bdb20h703dec7833a443e@mail.gmail.com>
2008-10-03 11:24           ` Jakub Narebski
2008-10-04  1:15       ` Jakub Narebski
2008-10-03  1:36     ` Jakub Narebski [this message]
2008-10-03  7:24       ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
2008-10-03 11:20         ` Jakub Narebski
2008-10-02  8:59   ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
2008-10-02  9:43     ` Giuseppe Bilotta
2008-10-03  0:48       ` Jakub Narebski
2008-10-03  6:04         ` Giuseppe Bilotta
2008-10-03 10:31           ` Jakub Narebski
2008-10-02 15:34   ` Petr Baudis
2008-10-02 19:30     ` Giuseppe Bilotta
2008-10-02 20:56       ` Petr Baudis
2008-10-02 21:05         ` Giuseppe Bilotta
2008-10-02 22:04           ` Petr Baudis
2008-10-02 22:41             ` Jakub Narebski
2008-10-03  5:54               ` Giuseppe Bilotta
2008-10-02  8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
2008-10-02  8:49   ` Giuseppe Bilotta
2008-10-02 10:16     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200810030336.31469.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=giuseppe.bilotta@gmail.com \
    --cc=pasky@suse.cz \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).