From: "Giuseppe Bilotta" <giuseppe.bilotta@gmail.com>
To: "Jakub Narebski" <jnareb@gmail.com>
Cc: git@vger.kernel.org, "Petr Baudis" <pasky@suse.cz>,
"Junio C Hamano" <gitster@pobox.com>,
"Shawn O. Pearce" <spearce@spearce.org>
Subject: Re: [PATCHv4] gitweb: refactor input parameters parse/validation
Date: Fri, 3 Oct 2008 09:24:35 +0200 [thread overview]
Message-ID: <cb7bb73a0810030024t663940dbqa23f2e1bb75e23bc@mail.gmail.com> (raw)
In-Reply-To: <200810030336.31469.jnareb@gmail.com>
On Fri, Oct 3, 2008 at 3:36 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
>
>> Since input parameters can now be obtained both from CGI parameters and
>> PATH_INFO, we would like most of the code to be agnostic about the way
>> parameters were retrieved.
>
> I'd prefer that this cleanup/refactoring patch was _first_ patch in
> the series, as we were able to obtain parameters both from CGI query
> parameters and from PATH_INFO ($project, $hash or $hash_base+$file_name)
> _before_ first patch in this series. So it correct not only issue
> introduced by first patch (and fixed somewhat there), but what was
> outstanding (but rare because gitweb did not generate such links)
> issue.
It's true that the patch makes sense regardless of the rest of the
path_info patch, indeed.
>> We thus collect all the parameters into the new %input_params hash,
>> removing the need for ad-hoc kludgy code in href().
>
> Alternate solution would be push data from PATH_INFO in query params
> data using (for example)
>
> $cgi->param('a', $action);
>
> or, naming parameters explicitely
>
> $cgi->param(-name=>'a', -value=>$action);
>
> This avoids need for additional variable, reuses current code, and
> nicely sidesteps issue whether to use long names for keys ('action',
> 'file_name') or short ones from CGI query ('a', 'f').
>
>
> It would probably has to be at least partially written to check which
> of those two solutions (%input_params or $cgi->param('a', $a)))
> is better.
If we really don't want to care where the parameters came from *at
all*, writing on $cgi->param is likely to be the cleanest solution.
>> As much of the
>> parameters validation code is now shared between CGI and PATH_INFO,
>> although this requires PATH_INFO parsing to be interleaved into the main
>> code instead of being factored out into its own routine.
>
> I'm not sure if it is worth it then to unify parameters validation,
> with such drawback.
Yeah, I don't like it very much either. But it does spare a little on
the validation. OTOH, the amount we spare is not extraordinary, so
it's probably not worth the spaghettization of the code ...
>> # if we're called with PATH_INFO, we have to strip that
>> -# from the URL to find our real URL
>> -if (my $path_info = $ENV{"PATH_INFO"}) {
>> +# from the URL to find our real URL. PATH_INFO is kept
>> +# as it's used later on for parameter extraction
>> +my $path_info = $ENV{"PATH_INFO"};
>> +if ($path_info) {
>> $my_url =~ s,\Q$path_info\E$,,;
>> $my_uri =~ s,\Q$path_info\E$,,;
>> }
>
> This might be separate patch, if you wanted to increase your commit
> count ;-) More seriously I think it should be at least briefly
> mentioned in commit message (make $path_info global).
Note however that the path_info evaluation is _destructive_, so after
evaluation is complete we don't have much of it left.
[snip]
> Nice, although I'm not sure if [%@]cgi_param_mapping has to global.
> If we use long parameters names as keys, I think it has to, somewhat.
> See also comment below.
>> +# fill %input_params with the CGI parameters. All values except for 'opt'
>> +# should be single values, but opt can be an array. We should probably
>> +# build an array of parameters that can be multi-valued, but since for the time
>> +# being it's only this one, we just single it out
>> +while (my ($name, $symbol) = each %cgi_param_mapping) {
>> + if ($symbol eq 'opt') {
>> + $input_params{$name} = [$cgi->param($symbol)];
>> + } else {
>> + $input_params{$name} = $cgi->param($symbol);
>> }
>> }
>
> If it was chosen to use short (CGI query) parameter names, the above
> loop could be replaced by simple
>
> %input_params = $cgi->Vars;
>
> or to be more exact, if we want to have multi-valued parameters stored
> as array ref
>
> %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;
>
>
> See CGI(3pm):
>
> When using this [Vars], the thing you must watch out for are multivalued CGI
> parameters. Because a hash cannot distinguish between scalar and list con-
> text, multivalued parameters will be returned as a packed string, separated
> by the "\0" (null) character. You must split this packed string in order
> to get at the individual values.
Ah, an interesting alternative. This would make parameter copying a
one-liner, almost as good as just using $cgi->param for everything :)
>> -# parameters which are pathnames
>> -our $project = $cgi->param('p');
>> +# next, we want to parse PATH_INFO (which was already stored in $path_info at
>> +# the beginning). This is a little hairy because PATH_INFO parsing needs some
>> +# form of parameter validation, so we interleave parsing and validation.
>
> I don't think it is a good idea. In my opinion, for my taste, it would
> be better to separate evaluating path_info from the rest.
>
> We could instead introduce convention that if variable (like $project)
> is set, then it is assumed to be validated; if it is present only in
> the %input_params hash, then it has to be validated.
>
>
> On the other hand this ordering, first by parameter, then by method
> of extraction could be seem quite equally valid. Nevertheless I think
> that previous flow with separate evaluate_path_info() and what should
> be evaluate_CGI_query() has better encapsulation.
>
>> +#
>> +# The accepted PATH_INFO syntax is $project/$action/$hash or
>> +# $project/$action/$hash_base:$file_name, where $action may be missing (mostly for
>> +# backwards compatibility), so we need to parse and validate the parameters in
>> +# this same order.
>> +
>> +# clear $path_info of any leading /
>> +$path_info =~ s,^/+,,;
>> +
>> +our $project = $input_params{'project'};
>> +if ($path_info && !defined $project) {
>> + # if $project was not defined by CGI, we try to extract it from
>> + # $path_info
>> + $project = $path_info;
>> + $project =~ s,/+$,,;
>> + while ($project && !check_head_link("$projectroot/$project")) {
>> + $project =~ s,/*[^/]*$,,;
>> + }
>> + $input_params{'project'} = $project;
>> +} else {
>> + # otherwise, we suppress $path_info parsing altogether
>> + $path_info = undef;
>> +}
>> +
>> +# project name validation
>> if (defined $project) {
>> if (!validate_pathname($project) ||
>> !(-d "$projectroot/$project") ||
>
> Note that this code does less checking if $project is in path_info than
> for the case where it is set by CGI query. Perhaps there should be base
> fast check in a loop, and more extensive test later.
Uh ... isn't this exactly what's happening? In the loop we just gobble
until we find a git dir. Validation is then done, and it's the _same_
validation for both cases. Why do you say that path_info $project is
less checked?
>> @@ -408,16 +506,66 @@ if (defined $project) {
>> undef $project;
>> die_error(404, "No such project");
>> }
>> +
>> + # we purge the $project name from the $path_info, preparing it for
>> + # subsequent parameters extraction
>> + $path_info =~ s,^\Q$project\E/*,, if defined $path_info;
>> +} else {
>> + # we also suppress $path_info parsing if no project was defined
>> + $path_info = undef;
>> +}
>
> In evaluate_path_info it was simply 'return if...'; here with mentioned
> interleaving it is harder and a bit hacky.
I know.
>> $params{'project'} = $project unless exists $params{'project'};
>>
>> if ($params{-replay}) {
>> - while (my ($name, $symbol) = each %mapping) {
>> - if (!exists $params{$name}) {
>> - # the parameter we want to recycle may be either part of the
>> - # list of CGI parameter, or recovered from PATH_INFO
>> - if ($cgi->param($symbol)) {
>> - # to allow for multivalued params we use arrayref form
>> - $params{$name} = [ $cgi->param($symbol) ];
>> - } else {
>> - no strict 'refs';
>> - $params{$name} = $$name if $$name;
>> - }
>> - }
>> + while (my ($name, $val) = each %input_params) {
>> + $params{$name} = $input_params{$name}
>> + unless (exists $params{$name});
>
> Very nice, short code. Should be something like that from
> the very beginning.
Ok, I'll try working up a patch for params merging before any
path_info extensions.
--
Giuseppe "Oblomov" Bilotta
next prev parent reply other threads:[~2008-10-03 7:25 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-10-02 0:10 [PATCHv4] gitweb: PATH_INFO support improvements Giuseppe Bilotta
2008-10-02 0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-02 0:10 ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
2008-10-02 0:10 ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-02 0:10 ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-02 0:10 ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
2008-10-02 0:10 ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-06 0:17 ` Jakub Narebski
2008-10-04 1:31 ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
2008-10-04 7:24 ` Giuseppe Bilotta
2008-10-04 7:48 ` Jakub Narebski
2008-10-05 8:19 ` Jakub Narebski
2008-10-03 11:28 ` [PATCHv4] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-03 1:48 ` [PATCHv4] gitweb: generate project/action/hash URLs Jakub Narebski
[not found] ` <cb7bb73a0810022330l498bdb20h703dec7833a443e@mail.gmail.com>
2008-10-03 11:24 ` Jakub Narebski
2008-10-04 1:15 ` Jakub Narebski
2008-10-03 1:36 ` [PATCHv4] gitweb: refactor input parameters parse/validation Jakub Narebski
2008-10-03 7:24 ` Giuseppe Bilotta [this message]
2008-10-03 11:20 ` Jakub Narebski
2008-10-02 8:59 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
2008-10-02 9:43 ` Giuseppe Bilotta
2008-10-03 0:48 ` Jakub Narebski
2008-10-03 6:04 ` Giuseppe Bilotta
2008-10-03 10:31 ` Jakub Narebski
2008-10-02 15:34 ` Petr Baudis
2008-10-02 19:30 ` Giuseppe Bilotta
2008-10-02 20:56 ` Petr Baudis
2008-10-02 21:05 ` Giuseppe Bilotta
2008-10-02 22:04 ` Petr Baudis
2008-10-02 22:41 ` Jakub Narebski
2008-10-03 5:54 ` Giuseppe Bilotta
2008-10-02 8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
2008-10-02 8:49 ` Giuseppe Bilotta
2008-10-02 10:16 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cb7bb73a0810030024t663940dbqa23f2e1bb75e23bc@mail.gmail.com \
--to=giuseppe.bilotta@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
--cc=pasky@suse.cz \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).