git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: "Giuseppe Bilotta" <giuseppe.bilotta@gmail.com>
Cc: git@vger.kernel.org, "Petr Baudis" <pasky@suse.cz>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Shawn O. Pearce" <spearce@spearce.org>
Subject: Re: [PATCHv4] gitweb: refactor input parameters parse/validation
Date: Fri, 3 Oct 2008 13:20:28 +0200	[thread overview]
Message-ID: <200810031320.28843.jnareb@gmail.com> (raw)
In-Reply-To: <cb7bb73a0810030024t663940dbqa23f2e1bb75e23bc@mail.gmail.com>

On Fri, 3 Oct 2008, Giuseppe Bilotta wrote:
> On Fri, Oct 3, 2008 at 3:36 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

>>> We thus collect all the parameters into the new %input_params hash,
>>> removing the need for ad-hoc kludgy code in href().
>>
>> Alternate solution would be push data from PATH_INFO in query params
>> data using (for example)
>>
>>  $cgi->param('a', $action);
>>
>> or, naming parameters explicitely
>>
>>  $cgi->param(-name=>'a', -value=>$action);
>>
>> This avoids need for additional variable, reuses current code, and
>> nicely sidesteps issue whether to use long names for keys ('action',
>> 'file_name') or short ones from CGI query ('a', 'f').
>>
>>
>> It would probably has to be at least partially written to check which
>> of those two solutions (%input_params or $cgi->param('a', $a)))
>> is better.
> 
> If we really don't want to care where the parameters came from *at
> all*, writing on $cgi->param is likely to be the cleanest solution.

I now think that %input_params is a better solution, even if we won't
be implementing -true_replay / -faithful_replay option, where options
gotten in path_info would be passed in path_info, and options passed
in query string would be passed as CGI params (even if those params
could be encoded in path_info).
 
One of the advantages is that having @cgi_param_mapping near the top
provides opportunity to describe short CGI query options.

Another advantage is that with %input_params it is easy to find if
parameter is multivalued, or is meant to be multivalued: just check
ref($input_params{long_name}); well, perhaps not _that_ easy, but...

>>> As much of the
>>> parameters validation code is now shared between CGI and PATH_INFO,
>>> although this requires PATH_INFO parsing to be interleaved into the main
>>> code instead of being factored out into its own routine.
>>
>> I'm not sure if it is worth it then to unify parameters validation,
>> with such drawback.
> 
> Yeah, I don't like it very much either. But it does spare a little on
> the validation. OTOH, the amount we spare is not extraordinary, so
> it's probably not worth the spaghettization of the code ...

Avoiding repetition can be done in different ways, for example we can
have gitweb assume that if value is already in params variable
($project, $action, $hash,...) it is already validated, and if it is
only in %input_params it is not.  Add to that for example putting
common part of project path checks into validate_project and you have
all the advantages (no code repetition, spared validation) without
drawback of harder to maintain spaghetti-like code.

>>> +# fill %input_params with the CGI parameters. All values except for 'opt'
>>> +# should be single values, but opt can be an array. We should probably
>>> +# build an array of parameters that can be multi-valued, but since for the time
>>> +# being it's only this one, we just single it out
>>> +while (my ($name, $symbol) = each %cgi_param_mapping) {
>>> +     if ($symbol eq 'opt') {
>>> +             $input_params{$name} = [$cgi->param($symbol)];
>>> +     } else {
>>> +             $input_params{$name} = $cgi->param($symbol);
>>>       }
>>>  }
>>
>> If it was chosen to use short (CGI query) parameter names, the above
>> loop could be replaced by simple
>>
>>  %input_params = $cgi->Vars;
>>
>> or to be more exact, if we want to have multi-valued parameters stored
>> as array ref
>>
>>  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;
>>
>>
>> See CGI(3pm):
>>
>>    When using this [Vars], the thing you must watch out for are multivalued CGI
>>    parameters.  Because a hash cannot distinguish between scalar and list con-
>>    text, multivalued parameters will be returned as a packed string, separated
>>    by the "\0" (null) character.  You must split this packed string in order
>>    to get at the individual values.
> 
> Ah, an interesting alternative. This would make parameter copying a
> one-liner, almost as good as just using $cgi->param for everything :)

Hmmmm...
 

[cut]
>> Note that this code does less checking if $project is in path_info than
>> for the case where it is set by CGI query. Perhaps there should be base
>> fast check in a loop, and more extensive test later.
> 
> Uh ... isn't this exactly what's happening? In the loop we just gobble
> until we find a git dir. Validation is then done, and it's the _same_
> validation for both cases. Why do you say that path_info $project is
> less checked?

For project from query string we now have:

  !validate_pathname($project) ||
  !(-d "$projectroot/$project") ||
  !check_head_link("$projectroot/$project") ||
  ($export_ok && !(-e "$projectroot/$project/$export_ok")) ||
  ($strict_export && !project_in_list($project))

For project from path_info we now have:

  while ($project && !check_head_link("$projectroot/$project")) {
	$project =~ s,/*[^/]*$,,;
  }
  ...
  !validate_pathname($project) ||
  ($export_ok && !-e "$projectroot/$project/$export_ok") ||
  ($strict_export && !project_in_list($project))
  
It is almost the same; I have thought that they differ more. Perhaps
some of above code could be refactored into validate_project(), or
project_ok() subroutine.

>>>       $params{'project'} = $project unless exists $params{'project'};
>>>
>>>       if ($params{-replay}) {
>>> -             while (my ($name, $symbol) = each %mapping) {
>>> -                     if (!exists $params{$name}) {
>>> -                             # the parameter we want to recycle may be either part of the
>>> -                             # list of CGI parameter, or recovered from PATH_INFO
>>> -                             if ($cgi->param($symbol)) {
>>> -                                     # to allow for multivalued params we use arrayref form
>>> -                                     $params{$name} = [ $cgi->param($symbol) ];
>>> -                             } else {
>>> -                                     no strict 'refs';
>>> -                                     $params{$name} = $$name if $$name;
>>> -                             }
>>> -                     }
>>> +             while (my ($name, $val) = each %input_params) {
>>> +                     $params{$name} = $input_params{$name}
>>> +                             unless (exists $params{$name});
>>
>> Very nice, short code.  Should be something like that from
>> the very beginning.
> 
> Ok, I'll try working up a patch for params merging before any
> path_info extensions.

Perhaps also put query string handling into evaluate_CGI_query(), or
evaluate_query_string(), similar to evaluate_path_info()?

-- 
Jakub Narebski
Poland

  reply	other threads:[~2008-10-03 11:30 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-02  0:10 [PATCHv4] gitweb: PATH_INFO support improvements Giuseppe Bilotta
2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
2008-10-02  0:10           ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-06  0:17             ` Jakub Narebski
2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
2008-10-04  7:24             ` Giuseppe Bilotta
2008-10-04  7:48               ` Jakub Narebski
2008-10-05  8:19             ` Jakub Narebski
2008-10-03 11:28         ` [PATCHv4] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-03  1:48       ` [PATCHv4] gitweb: generate project/action/hash URLs Jakub Narebski
     [not found]         ` <cb7bb73a0810022330l498bdb20h703dec7833a443e@mail.gmail.com>
2008-10-03 11:24           ` Jakub Narebski
2008-10-04  1:15       ` Jakub Narebski
2008-10-03  1:36     ` [PATCHv4] gitweb: refactor input parameters parse/validation Jakub Narebski
2008-10-03  7:24       ` Giuseppe Bilotta
2008-10-03 11:20         ` Jakub Narebski [this message]
2008-10-02  8:59   ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
2008-10-02  9:43     ` Giuseppe Bilotta
2008-10-03  0:48       ` Jakub Narebski
2008-10-03  6:04         ` Giuseppe Bilotta
2008-10-03 10:31           ` Jakub Narebski
2008-10-02 15:34   ` Petr Baudis
2008-10-02 19:30     ` Giuseppe Bilotta
2008-10-02 20:56       ` Petr Baudis
2008-10-02 21:05         ` Giuseppe Bilotta
2008-10-02 22:04           ` Petr Baudis
2008-10-02 22:41             ` Jakub Narebski
2008-10-03  5:54               ` Giuseppe Bilotta
2008-10-02  8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
2008-10-02  8:49   ` Giuseppe Bilotta
2008-10-02 10:16     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200810031320.28843.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=giuseppe.bilotta@gmail.com \
    --cc=pasky@suse.cz \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).