Re: [PATCHv4] gitweb: refactor input parameters parse/validation

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Giuseppe Bilotta" <giuseppe.bilotta@gmail.com>
To: "Jakub Narebski" <jnareb@gmail.com>
Cc: git@vger.kernel.org, "Petr Baudis" <pasky@suse.cz>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Shawn O. Pearce" <spearce@spearce.org>
Subject: Re: [PATCHv4] gitweb: refactor input parameters parse/validation
Date: Fri, 3 Oct 2008 09:24:35 +0200	[thread overview]
Message-ID: <cb7bb73a0810030024t663940dbqa23f2e1bb75e23bc@mail.gmail.com> (raw)
In-Reply-To: <200810030336.31469.jnareb@gmail.com>

On Fri, Oct 3, 2008 at 3:36 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
>
>> Since input parameters can now be obtained both from CGI parameters and
>> PATH_INFO, we would like most of the code to be agnostic about the way
>> parameters were retrieved.
>
> I'd prefer that this cleanup/refactoring patch was _first_ patch in
> the series, as we were able to obtain parameters both from CGI query
> parameters and from PATH_INFO ($project, $hash or $hash_base+$file_name)
> _before_ first patch in this series.  So it correct not only issue
> introduced by first patch (and fixed somewhat there), but what was
> outstanding (but rare because gitweb did not generate such links)
> issue.

It's true that the patch makes sense regardless of the rest of the
path_info patch, indeed.

>> We thus collect all the parameters into the new %input_params hash,
>> removing the need for ad-hoc kludgy code in href().
>
> Alternate solution would be push data from PATH_INFO in query params
> data using (for example)
>
>  $cgi->param('a', $action);
>
> or, naming parameters explicitely
>
>  $cgi->param(-name=>'a', -value=>$action);
>
> This avoids need for additional variable, reuses current code, and
> nicely sidesteps issue whether to use long names for keys ('action',
> 'file_name') or short ones from CGI query ('a', 'f').
>
>
> It would probably has to be at least partially written to check which
> of those two solutions (%input_params or $cgi->param('a', $a)))
> is better.

If we really don't want to care where the parameters came from *at
all*, writing on $cgi->param is likely to be the cleanest solution.

>> As much of the
>> parameters validation code is now shared between CGI and PATH_INFO,
>> although this requires PATH_INFO parsing to be interleaved into the main
>> code instead of being factored out into its own routine.
>
> I'm not sure if it is worth it then to unify parameters validation,
> with such drawback.

Yeah, I don't like it very much either. But it does spare a little on
the validation. OTOH, the amount we spare is not extraordinary, so
it's probably not worth the spaghettization of the code ...

>>  # if we're called with PATH_INFO, we have to strip that
>> -# from the URL to find our real URL
>> -if (my $path_info = $ENV{"PATH_INFO"}) {
>> +# from the URL to find our real URL. PATH_INFO is kept
>> +# as it's used later on for parameter extraction
>> +my $path_info = $ENV{"PATH_INFO"};
>> +if ($path_info) {
>>       $my_url =~ s,\Q$path_info\E$,,;
>>       $my_uri =~ s,\Q$path_info\E$,,;
>>  }
>
> This might be separate patch, if you wanted to increase your commit
> count ;-)  More seriously I think it should be at least briefly
> mentioned in commit message (make $path_info global).

Note however that the path_info evaluation is _destructive_, so after
evaluation is complete we don't have much of it left.

[snip]

> Nice, although I'm not sure if [%@]cgi_param_mapping has to global.
> If we use long parameters names as keys, I think it has to, somewhat.
> See also comment below.

>> +# fill %input_params with the CGI parameters. All values except for 'opt'
>> +# should be single values, but opt can be an array. We should probably
>> +# build an array of parameters that can be multi-valued, but since for the time
>> +# being it's only this one, we just single it out
>> +while (my ($name, $symbol) = each %cgi_param_mapping) {
>> +     if ($symbol eq 'opt') {
>> +             $input_params{$name} = [$cgi->param($symbol)];
>> +     } else {
>> +             $input_params{$name} = $cgi->param($symbol);
>>       }
>>  }
>
> If it was chosen to use short (CGI query) parameter names, the above
> loop could be replaced by simple
>
>  %input_params = $cgi->Vars;
>
> or to be more exact, if we want to have multi-valued parameters stored
> as array ref
>
>  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;
>
>
> See CGI(3pm):
>
>    When using this [Vars], the thing you must watch out for are multivalued CGI
>    parameters.  Because a hash cannot distinguish between scalar and list con-
>    text, multivalued parameters will be returned as a packed string, separated
>    by the "\0" (null) character.  You must split this packed string in order
>    to get at the individual values.

Ah, an interesting alternative. This would make parameter copying a
one-liner, almost as good as just using $cgi->param for everything :)

>> -# parameters which are pathnames
>> -our $project = $cgi->param('p');
>> +# next, we want to parse PATH_INFO (which was already stored in $path_info at
>> +# the beginning). This is a little hairy because PATH_INFO parsing needs some
>> +# form of parameter validation, so we interleave parsing and validation.
>
> I don't think it is a good idea. In my opinion, for my taste, it would
> be better to separate evaluating path_info from the rest.
>
> We could instead introduce convention that if variable (like $project)
> is set, then it is assumed to be validated; if it is present only in
> the %input_params hash, then it has to be validated.
>
>
> On the other hand this ordering, first by parameter, then by method
> of extraction could be seem quite equally valid.  Nevertheless I think
> that previous flow with separate evaluate_path_info() and what should
> be evaluate_CGI_query() has better encapsulation.
>
>> +#
>> +# The accepted PATH_INFO syntax is $project/$action/$hash or
>> +# $project/$action/$hash_base:$file_name, where $action may be missing (mostly for
>> +# backwards compatibility), so we need to parse and validate the parameters in
>> +# this same order.
>> +
>> +# clear $path_info of any leading /
>> +$path_info =~ s,^/+,,;
>> +
>> +our $project = $input_params{'project'};
>> +if ($path_info && !defined $project) {
>> +     # if $project was not defined by CGI, we try to extract it from
>> +     # $path_info
>> +     $project = $path_info;
>> +     $project =~ s,/+$,,;
>> +     while ($project && !check_head_link("$projectroot/$project")) {
>> +             $project =~ s,/*[^/]*$,,;
>> +     }
>> +     $input_params{'project'} = $project;
>> +} else {
>> +     # otherwise, we suppress $path_info parsing altogether
>> +     $path_info = undef;
>> +}
>> +
>> +# project name validation
>>  if (defined $project) {
>>       if (!validate_pathname($project) ||
>>           !(-d "$projectroot/$project") ||
>
> Note that this code does less checking if $project is in path_info than
> for the case where it is set by CGI query. Perhaps there should be base
> fast check in a loop, and more extensive test later.

Uh ... isn't this exactly what's happening? In the loop we just gobble
until we find a git dir. Validation is then done, and it's the _same_
validation for both cases. Why do you say that path_info $project is
less checked?

>> @@ -408,16 +506,66 @@ if (defined $project) {
>>               undef $project;
>>               die_error(404, "No such project");
>>       }
>> +
>> +     # we purge the $project name from the $path_info, preparing it for
>> +     # subsequent parameters extraction
>> +     $path_info =~ s,^\Q$project\E/*,, if defined $path_info;
>> +} else {
>> +     # we also suppress $path_info parsing if no project was defined
>> +     $path_info = undef;
>> +}
>
> In evaluate_path_info it was simply 'return if...'; here with mentioned
> interleaving it is harder and a bit hacky.

I know.

>>       $params{'project'} = $project unless exists $params{'project'};
>>
>>       if ($params{-replay}) {
>> -             while (my ($name, $symbol) = each %mapping) {
>> -                     if (!exists $params{$name}) {
>> -                             # the parameter we want to recycle may be either part of the
>> -                             # list of CGI parameter, or recovered from PATH_INFO
>> -                             if ($cgi->param($symbol)) {
>> -                                     # to allow for multivalued params we use arrayref form
>> -                                     $params{$name} = [ $cgi->param($symbol) ];
>> -                             } else {
>> -                                     no strict 'refs';
>> -                                     $params{$name} = $$name if $$name;
>> -                             }
>> -                     }
>> +             while (my ($name, $val) = each %input_params) {
>> +                     $params{$name} = $input_params{$name}
>> +                             unless (exists $params{$name});
>
> Very nice, short code.  Should be something like that from
> the very beginning.

Ok, I'll try working up a patch for params merging before any
path_info extensions.

-- 
Giuseppe "Oblomov" Bilotta

next prev parent reply	other threads:[~2008-10-03  7:25 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-02  0:10 [PATCHv4] gitweb: PATH_INFO support improvements Giuseppe Bilotta
2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
2008-10-02  0:10           ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-06  0:17             ` Jakub Narebski
2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
2008-10-04  7:24             ` Giuseppe Bilotta
2008-10-04  7:48               ` Jakub Narebski
2008-10-05  8:19             ` Jakub Narebski
2008-10-03 11:28         ` [PATCHv4] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-03  1:48       ` [PATCHv4] gitweb: generate project/action/hash URLs Jakub Narebski
     [not found]         ` <cb7bb73a0810022330l498bdb20h703dec7833a443e@mail.gmail.com>
2008-10-03 11:24           ` Jakub Narebski
2008-10-04  1:15       ` Jakub Narebski
2008-10-03  1:36     ` [PATCHv4] gitweb: refactor input parameters parse/validation Jakub Narebski
2008-10-03  7:24       ` Giuseppe Bilotta [this message]
2008-10-03 11:20         ` Jakub Narebski
2008-10-02  8:59   ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
2008-10-02  9:43     ` Giuseppe Bilotta
2008-10-03  0:48       ` Jakub Narebski
2008-10-03  6:04         ` Giuseppe Bilotta
2008-10-03 10:31           ` Jakub Narebski
2008-10-02 15:34   ` Petr Baudis
2008-10-02 19:30     ` Giuseppe Bilotta
2008-10-02 20:56       ` Petr Baudis
2008-10-02 21:05         ` Giuseppe Bilotta
2008-10-02 22:04           ` Petr Baudis
2008-10-02 22:41             ` Jakub Narebski
2008-10-03  5:54               ` Giuseppe Bilotta
2008-10-02  8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
2008-10-02  8:49   ` Giuseppe Bilotta
2008-10-02 10:16     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cb7bb73a0810030024t663940dbqa23f2e1bb75e23bc@mail.gmail.com \
    --to=giuseppe.bilotta@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=pasky@suse.cz \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).