git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4] gitweb: PATH_INFO support improvements
@ 2008-10-02  0:10 Giuseppe Bilotta
  2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
  2008-10-02  8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
  0 siblings, 2 replies; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  0:10 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Shawn O. Pearce,
	Giuseppe Bilotta

Fourth version of my gitweb PATH_INFO patchset, whose purpose is to
reduce the use of CGI parameters by embedding as many parameters as
possible in the URL path itself, provided the pathinfo feature is
enabled.

The new typical gitweb URL is therefore in the form

$project/$action/$parent:$file..$hash:$file

(with useless parts stripped). Backwards compatibility for old-style
$project/$hash URLs is kept, as long as $hash is not a refname whose
name happens to match a git action.

The main implementation is provided by paired patches (#1#3, #5#6)
that implement parsing and generation of the new style URLs.

Patch #2 deals with a refactoring of the input parameters parsing and
validation, so that the rest of gitweb can be agnostic wrt to the
parameters' origin (CGI vs PATH_INFO vs possible other future inputs
such as CLI).

Patch #4 is a minor improvement to the URL syntax that allows web
documents to be properly browsable in raw mode.

Giuseppe Bilotta (6):
  gitweb: parse project/action/hash_base:filename PATH_INFO
  gitweb: refactor input parameters parse/validation
  gitweb: generate project/action/hash URLs
  gitweb: use_pathinfo filenames start with /
  gitweb: parse parent..current syntax from pathinfo
  gitweb: generate parent..current URLs

 gitweb/gitweb.perl |  392 ++++++++++++++++++++++++++++++++++------------------
 1 files changed, 254 insertions(+), 138 deletions(-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02  0:10 [PATCHv4] gitweb: PATH_INFO support improvements Giuseppe Bilotta
@ 2008-10-02  0:10 ` Giuseppe Bilotta
  2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
                     ` (2 more replies)
  2008-10-02  8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
  1 sibling, 3 replies; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  0:10 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Shawn O. Pearce,
	Giuseppe Bilotta

This patch enables gitweb to parse URLs with more information embedded
in PATH_INFO, reducing the need for CGI parameters. The typical gitweb
path is now $project/$action/$hash_base:$file_name or
$project/$action/$hash

This is mostly backwards compatible with the old-style gitweb paths,
except when $project/$branch was used to access a branch whose name
matches a gitweb action.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |   90 +++++++++++++++++++++++++++++++---------------------
 1 files changed, 54 insertions(+), 36 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e7e4d6b..f088681 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -495,6 +495,37 @@ if (defined $searchtext) {
 	$search_regexp = $search_use_regexp ? $searchtext : quotemeta $searchtext;
 }
 
+# dispatch
+my %actions = (
+	"blame" => \&git_blame,
+	"blobdiff" => \&git_blobdiff,
+	"blobdiff_plain" => \&git_blobdiff_plain,
+	"blob" => \&git_blob,
+	"blob_plain" => \&git_blob_plain,
+	"commitdiff" => \&git_commitdiff,
+	"commitdiff_plain" => \&git_commitdiff_plain,
+	"commit" => \&git_commit,
+	"forks" => \&git_forks,
+	"heads" => \&git_heads,
+	"history" => \&git_history,
+	"log" => \&git_log,
+	"rss" => \&git_rss,
+	"atom" => \&git_atom,
+	"search" => \&git_search,
+	"search_help" => \&git_search_help,
+	"shortlog" => \&git_shortlog,
+	"summary" => \&git_summary,
+	"tag" => \&git_tag,
+	"tags" => \&git_tags,
+	"tree" => \&git_tree,
+	"snapshot" => \&git_snapshot,
+	"object" => \&git_object,
+	# those below don't need $project
+	"opml" => \&git_opml,
+	"project_list" => \&git_project_list,
+	"project_index" => \&git_project_index,
+);
+
 # now read PATH_INFO and use it as alternative to parameters
 sub evaluate_path_info {
 	return if defined $project;
@@ -519,9 +550,19 @@ sub evaluate_path_info {
 	# do not change any parameters if an action is given using the query string
 	return if $action;
 	$path_info =~ s,^\Q$project\E/*,,;
+
+	# next comes the action
+	$action = $path_info;
+	$action =~ s,/.*$,,;
+	if (exists $actions{$action}) {
+		$path_info =~ s,^$action/*,,;
+	} else {
+		$action  = undef;
+	}
+
 	my ($refname, $pathname) = split(/:/, $path_info, 2);
 	if (defined $pathname) {
-		# we got "project.git/branch:filename" or "project.git/branch:dir/"
+		# we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
 		# we could use git_get_type(branch:pathname), but it needs $git_dir
 		$pathname =~ s,^/+,,;
 		if (!$pathname || substr($pathname, -1) eq "/") {
@@ -534,8 +575,9 @@ sub evaluate_path_info {
 		$file_name ||= validate_pathname($pathname);
 	} elsif (defined $refname) {
 		# we got "project.git/branch"
-		$action ||= "shortlog";
-		$hash   ||= validate_refname($refname);
+		$action    ||= "shortlog";
+		$hash      ||= validate_refname($refname);
+		$hash_base ||= validate_refname($refname);
 	}
 }
 evaluate_path_info();
@@ -544,37 +586,6 @@ evaluate_path_info();
 our $git_dir;
 $git_dir = "$projectroot/$project" if $project;
 
-# dispatch
-my %actions = (
-	"blame" => \&git_blame,
-	"blobdiff" => \&git_blobdiff,
-	"blobdiff_plain" => \&git_blobdiff_plain,
-	"blob" => \&git_blob,
-	"blob_plain" => \&git_blob_plain,
-	"commitdiff" => \&git_commitdiff,
-	"commitdiff_plain" => \&git_commitdiff_plain,
-	"commit" => \&git_commit,
-	"forks" => \&git_forks,
-	"heads" => \&git_heads,
-	"history" => \&git_history,
-	"log" => \&git_log,
-	"rss" => \&git_rss,
-	"atom" => \&git_atom,
-	"search" => \&git_search,
-	"search_help" => \&git_search_help,
-	"shortlog" => \&git_shortlog,
-	"summary" => \&git_summary,
-	"tag" => \&git_tag,
-	"tags" => \&git_tags,
-	"tree" => \&git_tree,
-	"snapshot" => \&git_snapshot,
-	"object" => \&git_object,
-	# those below don't need $project
-	"opml" => \&git_opml,
-	"project_list" => \&git_project_list,
-	"project_index" => \&git_project_index,
-);
-
 if (!defined $action) {
 	if (defined $hash) {
 		$action = git_get_type($hash);
@@ -631,8 +642,15 @@ sub href (%) {
 	if ($params{-replay}) {
 		while (my ($name, $symbol) = each %mapping) {
 			if (!exists $params{$name}) {
-				# to allow for multivalued params we use arrayref form
-				$params{$name} = [ $cgi->param($symbol) ];
+				# the parameter we want to recycle may be either part of the
+				# list of CGI parameter, or recovered from PATH_INFO
+				if ($cgi->param($symbol)) {
+					# to allow for multivalued params we use arrayref form
+					$params{$name} = [ $cgi->param($symbol) ];
+				} else {
+					no strict 'refs';
+					$params{$name} = $$name if $$name;
+				}
 			}
 		}
 	}
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCHv4] gitweb: refactor input parameters parse/validation
  2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
@ 2008-10-02  0:10   ` Giuseppe Bilotta
  2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
  2008-10-03  1:36     ` [PATCHv4] gitweb: refactor input parameters parse/validation Jakub Narebski
  2008-10-02  8:59   ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
  2008-10-02 15:34   ` Petr Baudis
  2 siblings, 2 replies; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  0:10 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Shawn O. Pearce,
	Giuseppe Bilotta

Since input parameters can now be obtained both from CGI parameters and
PATH_INFO, we would like most of the code to be agnostic about the way
parameters were retrieved.

We thus collect all the parameters into the new %input_params hash,
removing the need for ad-hoc kludgy code in href(). As much of the
parameters validation code is now shared between CGI and PATH_INFO,
although this requires PATH_INFO parsing to be interleaved into the main
code instead of being factored out into its own routine.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |  336 ++++++++++++++++++++++++++++------------------------
 1 files changed, 183 insertions(+), 153 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f088681..ec4326f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -28,8 +28,10 @@ our $my_url = $cgi->url();
 our $my_uri = $cgi->url(-absolute => 1);
 
 # if we're called with PATH_INFO, we have to strip that
-# from the URL to find our real URL
-if (my $path_info = $ENV{"PATH_INFO"}) {
+# from the URL to find our real URL. PATH_INFO is kept
+# as it's used later on for parameter extraction
+my $path_info = $ENV{"PATH_INFO"};
+if ($path_info) {
 	$my_url =~ s,\Q$path_info\E$,,;
 	$my_uri =~ s,\Q$path_info\E$,,;
 }
@@ -390,15 +392,111 @@ $projects_list ||= $projectroot;
 
 # ======================================================================
 # input validation and dispatch
-our $action = $cgi->param('a');
-if (defined $action) {
-	if ($action =~ m/[^0-9a-zA-Z\.\-_]/) {
-		die_error(400, "Invalid action parameter");
+
+# input parameters can be collected from a variety of sources (presently, CGI
+# and PATH_INFO), so we define an %input_params hash that collects them all
+# together during validation: this allows subsequent uses (e.g. href()) to be
+# agnostic of the parameter origin
+
+my %input_params = ();
+
+# input parameters are stored with the long parameter name as key, so we need
+# the name -> CGI key mapping here. This will also be used in the href
+# subroutine to convert parameters to their CGI equivalent.
+#
+# XXX: Warning: If you touch this, check the search form for updating,
+# too.
+
+my @cgi_param_mapping = (
+	project => "p",
+	action => "a",
+	file_name => "f",
+	file_parent => "fp",
+	hash => "h",
+	hash_parent => "hp",
+	hash_base => "hb",
+	hash_parent_base => "hpb",
+	page => "pg",
+	order => "o",
+	searchtext => "s",
+	searchtype => "st",
+	snapshot_format => "sf",
+	extra_options => "opt",
+	search_use_regexp => "sr",
+);
+my %cgi_param_mapping = @cgi_param_mapping;
+
+# we will also need to know the possible actions, for validation
+my %actions = (
+	"blame" => \&git_blame,
+	"blobdiff" => \&git_blobdiff,
+	"blobdiff_plain" => \&git_blobdiff_plain,
+	"blob" => \&git_blob,
+	"blob_plain" => \&git_blob_plain,
+	"commitdiff" => \&git_commitdiff,
+	"commitdiff_plain" => \&git_commitdiff_plain,
+	"commit" => \&git_commit,
+	"forks" => \&git_forks,
+	"heads" => \&git_heads,
+	"history" => \&git_history,
+	"log" => \&git_log,
+	"rss" => \&git_rss,
+	"atom" => \&git_atom,
+	"search" => \&git_search,
+	"search_help" => \&git_search_help,
+	"shortlog" => \&git_shortlog,
+	"summary" => \&git_summary,
+	"tag" => \&git_tag,
+	"tags" => \&git_tags,
+	"tree" => \&git_tree,
+	"snapshot" => \&git_snapshot,
+	"object" => \&git_object,
+	# those below don't need $project
+	"opml" => \&git_opml,
+	"project_list" => \&git_project_list,
+	"project_index" => \&git_project_index,
+);
+
+# fill %input_params with the CGI parameters. All values except for 'opt'
+# should be single values, but opt can be an array. We should probably
+# build an array of parameters that can be multi-valued, but since for the time
+# being it's only this one, we just single it out
+while (my ($name, $symbol) = each %cgi_param_mapping) {
+	if ($symbol eq 'opt') {
+		$input_params{$name} = [$cgi->param($symbol)];
+	} else {
+		$input_params{$name} = $cgi->param($symbol);
 	}
 }
 
-# parameters which are pathnames
-our $project = $cgi->param('p');
+# next, we want to parse PATH_INFO (which was already stored in $path_info at
+# the beginning). This is a little hairy because PATH_INFO parsing needs some
+# form of parameter validation, so we interleave parsing and validation.
+#
+# The accepted PATH_INFO syntax is $project/$action/$hash or
+# $project/$action/$hash_base:$file_name, where $action may be missing (mostly for
+# backwards compatibility), so we need to parse and validate the parameters in
+# this same order.
+
+# clear $path_info of any leading /
+$path_info =~ s,^/+,,;
+
+our $project = $input_params{'project'};
+if ($path_info && !defined $project) {
+	# if $project was not defined by CGI, we try to extract it from
+	# $path_info
+	$project = $path_info;
+	$project =~ s,/+$,,;
+	while ($project && !check_head_link("$projectroot/$project")) {
+		$project =~ s,/*[^/]*$,,;
+	}
+	$input_params{'project'} = $project;
+} else {
+	# otherwise, we suppress $path_info parsing altogether
+	$path_info = undef;
+}
+
+# project name validation
 if (defined $project) {
 	if (!validate_pathname($project) ||
 	    !(-d "$projectroot/$project") ||
@@ -408,16 +506,66 @@ if (defined $project) {
 		undef $project;
 		die_error(404, "No such project");
 	}
+
+	# we purge the $project name from the $path_info, preparing it for
+	# subsequent parameters extraction
+	$path_info =~ s,^\Q$project\E/*,, if defined $path_info;
+} else {
+	# we also suppress $path_info parsing if no project was defined
+	$path_info = undef;
+}
+
+# action parsing/validation
+our $action = $input_params{'action'};
+if ($path_info && !defined $action) {
+	# next comes the action
+	$action = $path_info;
+	$action =~ s,/.*$,,;
+	if (exists $actions{$action}) {
+		# remove the action from $path_info, and sync %input_params
+		$path_info =~ s,^$action/*,,;
+		$input_params{'action'} = $action;
+	} else {
+		$action  = undef;
+	}
+} elsif (defined $action && !exists $actions{$action}) {
+	die_error(400, "Invalid action parameter");
+}
+
+# we can now parse ref and pathnames in PATH_INFO
+if ($path_info) {
+	my ($refname, $pathname) = split(/:/, $path_info, 2);
+	if (defined $pathname) {
+		# we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
+		# we could use git_get_type(branch:pathname), but it needs $git_dir
+		$pathname =~ s,^/+,,;
+		if (!$pathname || substr($pathname, -1) eq "/") {
+			$input_params{'action'} ||= "tree";
+			$pathname =~ s,/$,,;
+		} else {
+			$input_params{'action'}  ||= "blob_plain";
+		}
+		$input_params{'hash_base'} ||= $refname;
+		$input_params{'file_name'} ||= $pathname;
+	} elsif (defined $refname) {
+		# we got "project.git/branch"
+		$input_params{'action'}    ||= "shortlog";
+		$input_params{'hash'}      ||= $refname;
+		$input_params{'hash_base'} ||= $refname;
+	}
 }
 
-our $file_name = $cgi->param('f');
+# and now the rest of the validation
+
+# parameters which are pathnames
+our $file_name = $input_params{'file_name'};
 if (defined $file_name) {
 	if (!validate_pathname($file_name)) {
 		die_error(400, "Invalid file parameter");
 	}
 }
 
-our $file_parent = $cgi->param('fp');
+our $file_parent = $input_params{'file_parent'};
 if (defined $file_parent) {
 	if (!validate_pathname($file_parent)) {
 		die_error(400, "Invalid file parent parameter");
@@ -425,21 +573,21 @@ if (defined $file_parent) {
 }
 
 # parameters which are refnames
-our $hash = $cgi->param('h');
+our $hash = $input_params{'hash'};
 if (defined $hash) {
 	if (!validate_refname($hash)) {
 		die_error(400, "Invalid hash parameter");
 	}
 }
 
-our $hash_parent = $cgi->param('hp');
+our $hash_parent = $input_params{'hash_parent'};
 if (defined $hash_parent) {
 	if (!validate_refname($hash_parent)) {
 		die_error(400, "Invalid hash parent parameter");
 	}
 }
 
-our $hash_base = $cgi->param('hb');
+our $hash_base = $input_params{'hash_base'};
 if (defined $hash_base) {
 	if (!validate_refname($hash_base)) {
 		die_error(400, "Invalid hash base parameter");
@@ -450,19 +598,19 @@ my %allowed_options = (
 	"--no-merges" => [ qw(rss atom log shortlog history) ],
 );
 
-our @extra_options = $cgi->param('opt');
-if (defined @extra_options) {
-	foreach my $opt (@extra_options) {
-		if (not exists $allowed_options{$opt}) {
-			die_error(400, "Invalid option parameter");
-		}
-		if (not grep(/^$action$/, @{$allowed_options{$opt}})) {
-			die_error(400, "Invalid option parameter for this action");
-		}
+our @extra_options = @{$input_params{'extra_options'}};
+# @extra_options is always defined, since it can only be (currently) set from
+# CGI, and $cgi->param() returns the empty array in array context
+foreach my $opt (@extra_options) {
+	if (not exists $allowed_options{$opt}) {
+		die_error(400, "Invalid option parameter");
+	}
+	if (not grep(/^$action$/, @{$allowed_options{$opt}})) {
+		die_error(400, "Invalid option parameter for this action");
 	}
 }
 
-our $hash_parent_base = $cgi->param('hpb');
+our $hash_parent_base = $input_params{'hash_parent_base'};
 if (defined $hash_parent_base) {
 	if (!validate_refname($hash_parent_base)) {
 		die_error(400, "Invalid hash parent base parameter");
@@ -470,23 +618,23 @@ if (defined $hash_parent_base) {
 }
 
 # other parameters
-our $page = $cgi->param('pg');
+our $page = $input_params{'page'};
 if (defined $page) {
 	if ($page =~ m/[^0-9]/) {
 		die_error(400, "Invalid page parameter");
 	}
 }
 
-our $searchtype = $cgi->param('st');
+our $searchtype = $input_params{'searchtype'};
 if (defined $searchtype) {
 	if ($searchtype =~ m/[^a-z]/) {
 		die_error(400, "Invalid searchtype parameter");
 	}
 }
 
-our $search_use_regexp = $cgi->param('sr');
+our $search_use_regexp = $input_params{'search_use_regexp'};
 
-our $searchtext = $cgi->param('s');
+our $searchtext = $input_params{'searchtext'};
 our $search_regexp;
 if (defined $searchtext) {
 	if (length($searchtext) < 2) {
@@ -495,93 +643,6 @@ if (defined $searchtext) {
 	$search_regexp = $search_use_regexp ? $searchtext : quotemeta $searchtext;
 }
 
-# dispatch
-my %actions = (
-	"blame" => \&git_blame,
-	"blobdiff" => \&git_blobdiff,
-	"blobdiff_plain" => \&git_blobdiff_plain,
-	"blob" => \&git_blob,
-	"blob_plain" => \&git_blob_plain,
-	"commitdiff" => \&git_commitdiff,
-	"commitdiff_plain" => \&git_commitdiff_plain,
-	"commit" => \&git_commit,
-	"forks" => \&git_forks,
-	"heads" => \&git_heads,
-	"history" => \&git_history,
-	"log" => \&git_log,
-	"rss" => \&git_rss,
-	"atom" => \&git_atom,
-	"search" => \&git_search,
-	"search_help" => \&git_search_help,
-	"shortlog" => \&git_shortlog,
-	"summary" => \&git_summary,
-	"tag" => \&git_tag,
-	"tags" => \&git_tags,
-	"tree" => \&git_tree,
-	"snapshot" => \&git_snapshot,
-	"object" => \&git_object,
-	# those below don't need $project
-	"opml" => \&git_opml,
-	"project_list" => \&git_project_list,
-	"project_index" => \&git_project_index,
-);
-
-# now read PATH_INFO and use it as alternative to parameters
-sub evaluate_path_info {
-	return if defined $project;
-	my $path_info = $ENV{"PATH_INFO"};
-	return if !$path_info;
-	$path_info =~ s,^/+,,;
-	return if !$path_info;
-	# find which part of PATH_INFO is project
-	$project = $path_info;
-	$project =~ s,/+$,,;
-	while ($project && !check_head_link("$projectroot/$project")) {
-		$project =~ s,/*[^/]*$,,;
-	}
-	# validate project
-	$project = validate_pathname($project);
-	if (!$project ||
-	    ($export_ok && !-e "$projectroot/$project/$export_ok") ||
-	    ($strict_export && !project_in_list($project))) {
-		undef $project;
-		return;
-	}
-	# do not change any parameters if an action is given using the query string
-	return if $action;
-	$path_info =~ s,^\Q$project\E/*,,;
-
-	# next comes the action
-	$action = $path_info;
-	$action =~ s,/.*$,,;
-	if (exists $actions{$action}) {
-		$path_info =~ s,^$action/*,,;
-	} else {
-		$action  = undef;
-	}
-
-	my ($refname, $pathname) = split(/:/, $path_info, 2);
-	if (defined $pathname) {
-		# we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
-		# we could use git_get_type(branch:pathname), but it needs $git_dir
-		$pathname =~ s,^/+,,;
-		if (!$pathname || substr($pathname, -1) eq "/") {
-			$action  ||= "tree";
-			$pathname =~ s,/$,,;
-		} else {
-			$action  ||= "blob_plain";
-		}
-		$hash_base ||= validate_refname($refname);
-		$file_name ||= validate_pathname($pathname);
-	} elsif (defined $refname) {
-		# we got "project.git/branch"
-		$action    ||= "shortlog";
-		$hash      ||= validate_refname($refname);
-		$hash_base ||= validate_refname($refname);
-	}
-}
-evaluate_path_info();
-
 # path to the current git repository
 our $git_dir;
 $git_dir = "$projectroot/$project" if $project;
@@ -615,43 +676,12 @@ sub href (%) {
 	# default is to use -absolute url() i.e. $my_uri
 	my $href = $params{-full} ? $my_url : $my_uri;
 
-	# XXX: Warning: If you touch this, check the search form for updating,
-	# too.
-
-	my @mapping = (
-		project => "p",
-		action => "a",
-		file_name => "f",
-		file_parent => "fp",
-		hash => "h",
-		hash_parent => "hp",
-		hash_base => "hb",
-		hash_parent_base => "hpb",
-		page => "pg",
-		order => "o",
-		searchtext => "s",
-		searchtype => "st",
-		snapshot_format => "sf",
-		extra_options => "opt",
-		search_use_regexp => "sr",
-	);
-	my %mapping = @mapping;
-
 	$params{'project'} = $project unless exists $params{'project'};
 
 	if ($params{-replay}) {
-		while (my ($name, $symbol) = each %mapping) {
-			if (!exists $params{$name}) {
-				# the parameter we want to recycle may be either part of the
-				# list of CGI parameter, or recovered from PATH_INFO
-				if ($cgi->param($symbol)) {
-					# to allow for multivalued params we use arrayref form
-					$params{$name} = [ $cgi->param($symbol) ];
-				} else {
-					no strict 'refs';
-					$params{$name} = $$name if $$name;
-				}
-			}
+		while (my ($name, $val) = each %input_params) {
+			$params{$name} = $input_params{$name}
+				unless (exists $params{$name});
 		}
 	}
 
@@ -669,8 +699,8 @@ sub href (%) {
 
 	# now encode the parameters explicitly
 	my @result = ();
-	for (my $i = 0; $i < @mapping; $i += 2) {
-		my ($name, $symbol) = ($mapping[$i], $mapping[$i+1]);
+	for (my $i = 0; $i < @cgi_param_mapping; $i += 2) {
+		my ($name, $symbol) = ($cgi_param_mapping[$i], $cgi_param_mapping[$i+1]);
 		if (defined $params{$name}) {
 			if (ref($params{$name}) eq "ARRAY") {
 				foreach my $par (@{$params{$name}}) {
@@ -4006,7 +4036,7 @@ sub git_search_grep_body {
 ## actions
 
 sub git_project_list {
-	my $order = $cgi->param('o');
+	my $order = $input_params{'order'};
 	if (defined $order && $order !~ m/none|project|descr|owner|age/) {
 		die_error(400, "Unknown order parameter");
 	}
@@ -4029,7 +4059,7 @@ sub git_project_list {
 }
 
 sub git_forks {
-	my $order = $cgi->param('o');
+	my $order = $input_params{'order'};
 	if (defined $order && $order !~ m/none|project|descr|owner|age/) {
 		die_error(400, "Unknown order parameter");
 	}
@@ -4562,7 +4592,7 @@ sub git_snapshot {
 	my @supported_fmts = gitweb_check_feature('snapshot');
 	@supported_fmts = filter_snapshot_fmts(@supported_fmts);
 
-	my $format = $cgi->param('sf');
+	my $format = $input_params{'snapshot_format'};
 	if (!@supported_fmts) {
 		die_error(403, "Snapshots not allowed");
 	}
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCHv4] gitweb: generate project/action/hash URLs
  2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
@ 2008-10-02  0:10     ` Giuseppe Bilotta
  2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
                         ` (2 more replies)
  2008-10-03  1:36     ` [PATCHv4] gitweb: refactor input parameters parse/validation Jakub Narebski
  1 sibling, 3 replies; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  0:10 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Shawn O. Pearce,
	Giuseppe Bilotta

When generating path info URLs, reduce the number of CGI parameters by
embedding action and hash_parent:filename or hash in the path.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |   32 +++++++++++++++++++++++++++++---
 1 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index ec4326f..2c380ac 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -687,14 +687,40 @@ sub href (%) {
 
 	my ($use_pathinfo) = gitweb_check_feature('pathinfo');
 	if ($use_pathinfo) {
-		# use PATH_INFO for project name
+		# try to put as many parameters as possible in PATH_INFO:
+		#   - project name
+		#   - action
+		#   - hash or hash_base:filename
+
+		# Strip any trailing / from $href, or we might get double
+		# slashes when the script is the DirectoryIndex
+		#
+		$href =~ s,/$,,;
+
+		# Then add the project name, if present
 		$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
 		delete $params{'project'};
 
-		# Summary just uses the project path URL
-		if (defined $params{'action'} && $params{'action'} eq 'summary') {
+		# Summary just uses the project path URL, any other action is
+		# added to the URL
+		if (defined $params{'action'}) {
+			$href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary';
 			delete $params{'action'};
 		}
+
+		# Finally, we put either hash_base:file_name or hash
+		if (defined $params{'hash_base'}) {
+			$href .= "/".esc_url($params{'hash_base'});
+			if (defined $params{'file_name'}) {
+				$href .= ":".esc_url($params{'file_name'});
+				delete $params{'file_name'};
+			}
+			delete $params{'hash'};
+			delete $params{'hash_base'};
+		} elsif (defined $params{'hash'}) {
+			$href .= "/".esc_url($params{'hash'});
+			delete $params{'hash'};
+		}
 	}
 
 	# now encode the parameters explicitly
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCHv4] gitweb: use_pathinfo filenames start with /
  2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
@ 2008-10-02  0:10       ` Giuseppe Bilotta
  2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
  2008-10-03 11:28         ` [PATCHv4] gitweb: use_pathinfo filenames start with / Jakub Narebski
  2008-10-03  1:48       ` [PATCHv4] gitweb: generate project/action/hash URLs Jakub Narebski
  2008-10-04  1:15       ` Jakub Narebski
  2 siblings, 2 replies; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  0:10 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Shawn O. Pearce,
	Giuseppe Bilotta

When using path info, make filenames start with a / (right after the :
that separates them from the hash base). This minimal change allows
relative navigation to work properly when viewing HTML files.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2c380ac..3e5b2b7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -690,7 +690,7 @@ sub href (%) {
 		# try to put as many parameters as possible in PATH_INFO:
 		#   - project name
 		#   - action
-		#   - hash or hash_base:filename
+		#   - hash or hash_base:/filename
 
 		# Strip any trailing / from $href, or we might get double
 		# slashes when the script is the DirectoryIndex
@@ -708,11 +708,11 @@ sub href (%) {
 			delete $params{'action'};
 		}
 
-		# Finally, we put either hash_base:file_name or hash
+		# Finally, we put either hash_base:/file_name or hash
 		if (defined $params{'hash_base'}) {
 			$href .= "/".esc_url($params{'hash_base'});
 			if (defined $params{'file_name'}) {
-				$href .= ":".esc_url($params{'file_name'});
+				$href .= ":/".esc_url($params{'file_name'});
 				delete $params{'file_name'};
 			}
 			delete $params{'hash'};
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCHv4] gitweb: parse parent..current syntax from pathinfo
  2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
@ 2008-10-02  0:10         ` Giuseppe Bilotta
  2008-10-02  0:10           ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
  2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
  2008-10-03 11:28         ` [PATCHv4] gitweb: use_pathinfo filenames start with / Jakub Narebski
  1 sibling, 2 replies; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  0:10 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Shawn O. Pearce,
	Giuseppe Bilotta

This makes it possible to use an URL such as
$project/somebranch..otherbranch:/filename to get a diff between
different version of a file. Paths like
$project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
as well.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |   26 ++++++++++++++++++++++++--
 1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 3e5b2b7..89e360f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -534,7 +534,9 @@ if ($path_info && !defined $action) {
 
 # we can now parse ref and pathnames in PATH_INFO
 if ($path_info) {
-	my ($refname, $pathname) = split(/:/, $path_info, 2);
+	$path_info =~ /^((.+?)(:(.+))?\.\.)?(.+?)(:(.+))?$/;
+	my ($parentrefname, $parentpathname, $refname, $pathname) = (
+		$2, $4, $5, $7);
 	if (defined $pathname) {
 		# we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
 		# we could use git_get_type(branch:pathname), but it needs $git_dir
@@ -543,7 +545,11 @@ if ($path_info) {
 			$input_params{'action'} ||= "tree";
 			$pathname =~ s,/$,,;
 		} else {
-			$input_params{'action'}  ||= "blob_plain";
+			if ($parentrefname) {
+				$input_params{'action'} ||= "blobdiff_plain";
+			} else {
+				$input_params{'action'} ||= "blob_plain";
+			}
 		}
 		$input_params{'hash_base'} ||= $refname;
 		$input_params{'file_name'} ||= $pathname;
@@ -553,6 +559,22 @@ if ($path_info) {
 		$input_params{'hash'}      ||= $refname;
 		$input_params{'hash_base'} ||= $refname;
 	}
+	# the parent part might be missing the pathname, in which case we use the $file_name, if present
+	if (defined $parentrefname) {
+		$input_params{'hash_parent_base'} ||= $parentrefname;
+		if ($parentpathname) {
+			$parentpathname =~ s,^/+,,;
+			$parentpathname =~ s,/$,,;
+			$input_params{'file_parent'} ||= $parentpathname;
+		} else {
+			$input_params{'file_parent'} ||= $input_params{'file_name'};
+		}
+		if (defined $input_params{'file_parent'}) {
+			$input_params{'hash_parent'} ||= git_get_hash_by_path($input_params{'hash_parent_base'}, $input_params{'file_parent'});
+		} else {
+			$input_params{'hash_parent'} ||= $parentrefname;
+		}
+	}
 }
 
 # and now the rest of the validation
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCHv4] gitweb: generate parent..current URLs
  2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
@ 2008-10-02  0:10           ` Giuseppe Bilotta
  2008-10-06  0:17             ` Jakub Narebski
  2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
  1 sibling, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  0:10 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Shawn O. Pearce,
	Giuseppe Bilotta

If use_pathinfo is enabled, href now creates links that contain paths in
the form $project/$action/oldhash:/oldname..newhash:/newname for actions
that use hash_parent etc.

If any of the filename contains two consecutive dots, it's kept as a CGI
parameter since the resulting path would otherwise be ambiguous.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |   30 +++++++++++++++++++++++++-----
 1 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 89e360f..d863ef7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -712,7 +712,8 @@ sub href (%) {
 		# try to put as many parameters as possible in PATH_INFO:
 		#   - project name
 		#   - action
-		#   - hash or hash_base:/filename
+		#   - hash_parent or hash_parent_base:/file_parent
+		#   - hash or hash_base:/file_name
 
 		# Strip any trailing / from $href, or we might get double
 		# slashes when the script is the DirectoryIndex
@@ -730,17 +731,36 @@ sub href (%) {
 			delete $params{'action'};
 		}
 
-		# Finally, we put either hash_base:/file_name or hash
+		# Next, we put hash_parent_base:/file_parent..hash_base:/file_name,
+		# stripping nonexistent or useless pieces
+		$href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'}
+			|| $params{'hash_parent'} || $params{'hash'});
 		if (defined $params{'hash_base'}) {
-			$href .= "/".esc_url($params{'hash_base'});
-			if (defined $params{'file_name'}) {
+			if (defined $params{'hash_parent_base'}) {
+				$href .= esc_url($params{'hash_parent_base'});
+				# skip the file_parent if it's the same as the file_name
+				delete $params{'file_parent'} if $params{'file_parent'} eq $params{'file_name'};
+				if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) {
+					$href .= ":/".esc_url($params{'file_parent'});
+					delete $params{'file_parent'};
+				}
+				$href .= "..";
+				delete $params{'hash_parent'};
+				delete $params{'hash_parent_base'};
+			} elsif (defined $params{'hash_parent'}) {
+				$href .= esc_url($params{'hash_parent'}). "..";
+				delete $params{'hash_parent'};
+			}
+
+			$href .= esc_url($params{'hash_base'});
+			if (defined $params{'file_name'} && $params{'file_name'} !~ /\.\./) {
 				$href .= ":/".esc_url($params{'file_name'});
 				delete $params{'file_name'};
 			}
 			delete $params{'hash'};
 			delete $params{'hash_base'};
 		} elsif (defined $params{'hash'}) {
-			$href .= "/".esc_url($params{'hash'});
+			$href .= esc_url($params{'hash'});
 			delete $params{'hash'};
 		}
 	}
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: PATH_INFO support improvements
  2008-10-02  0:10 [PATCHv4] gitweb: PATH_INFO support improvements Giuseppe Bilotta
  2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
@ 2008-10-02  8:19 ` Jakub Narebski
  2008-10-02  8:49   ` Giuseppe Bilotta
  1 sibling, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2008-10-02  8:19 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

Giuseppe Bilotta wrote:

> Fourth version of my gitweb PATH_INFO patchset, whose purpose is to
> reduce the use of CGI parameters by embedding as many parameters as
> possible in the URL path itself, provided the pathinfo feature is
> enabled.

A nit: when sending longer patch series you should use numbered
format in the form of [PATCH m/n] or [PATCH m/n vX] prefix.

> 
> The new typical gitweb URL is therefore in the form
> 
> $project/$action/$parent:$file..$hash:$file
> 
> (with useless parts stripped). Backwards compatibility for old-style
> $project/$hash URLs is kept, as long as $hash is not a refname whose
> name happens to match a git action.

Minor nit: there was also old-style $project/$hash_base:$file_name
path_info format.

>
> The main implementation is provided by paired patches (#1#3, #5#6)
> that implement parsing and generation of the new style URLs.
> 
> Patch #2 deals with a refactoring of the input parameters parsing and
> validation, so that the rest of gitweb can be agnostic wrt to the
> parameters' origin (CGI vs PATH_INFO vs possible other future inputs
> such as CLI).
> 
> Patch #4 is a minor improvement to the URL syntax that allows web
> documents to be properly browsable in raw mode.

Very nice summary of patchset and patch  coverage in this cover letter.

>
> Giuseppe Bilotta (6):
>   gitweb: parse project/action/hash_base:filename PATH_INFO
>   gitweb: refactor input parameters parse/validation
>   gitweb: generate project/action/hash URLs
>   gitweb: use_pathinfo filenames start with /
>   gitweb: parse parent..current syntax from pathinfo
>   gitweb: generate parent..current URLs

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: PATH_INFO support improvements
  2008-10-02  8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
@ 2008-10-02  8:49   ` Giuseppe Bilotta
  2008-10-02 10:16     ` Jakub Narebski
  0 siblings, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  8:49 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, Oct 2, 2008 at 10:19 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> Giuseppe Bilotta wrote:
>
>> Fourth version of my gitweb PATH_INFO patchset, whose purpose is to
>> reduce the use of CGI parameters by embedding as many parameters as
>> possible in the URL path itself, provided the pathinfo feature is
>> enabled.
>
> A nit: when sending longer patch series you should use numbered
> format in the form of [PATCH m/n] or [PATCH m/n vX] prefix.

W00t, I still manage to get this wrong. Kudos to me 8-/

I wonder why these options are not the default when there is more than
one patch, btw?

(And yes, I tried looking into the builtin-log.c code but making it
automatic is somewhat less trivial than I can dedicate time to.)

>> The new typical gitweb URL is therefore in the form
>>
>> $project/$action/$parent:$file..$hash:$file
>>
>> (with useless parts stripped). Backwards compatibility for old-style
>> $project/$hash URLs is kept, as long as $hash is not a refname whose
>> name happens to match a git action.
>
> Minor nit: there was also old-style $project/$hash_base:$file_name
> path_info format.

Right, forgot about that.

>> The main implementation is provided by paired patches (#1#3, #5#6)
>> that implement parsing and generation of the new style URLs.
>>
>> Patch #2 deals with a refactoring of the input parameters parsing and
>> validation, so that the rest of gitweb can be agnostic wrt to the
>> parameters' origin (CGI vs PATH_INFO vs possible other future inputs
>> such as CLI).
>>
>> Patch #4 is a minor improvement to the URL syntax that allows web
>> documents to be properly browsable in raw mode.
>
> Very nice summary of patchset and patch  coverage in this cover letter.

Thanks. At least I'm learning from my past errors. I'll manage to send
the perfect patchset sooner or later ;)


-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
  2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
@ 2008-10-02  8:59   ` Jakub Narebski
  2008-10-02  9:43     ` Giuseppe Bilotta
  2008-10-02 15:34   ` Petr Baudis
  2 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2008-10-02  8:59 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

Giuseppe Bilotta wrote:

> This patch enables gitweb to parse URLs with more information embedded
> in PATH_INFO, reducing the need for CGI parameters. The typical gitweb
> path is now $project/$action/$hash_base:$file_name or
> $project/$action/$hash
> 
> This is mostly backwards compatible with the old-style gitweb paths,
> except when $project/$branch was used to access a branch whose name
> matches a gitweb action.

Nice summary.

> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>  gitweb/gitweb.perl |   90 +++++++++++++++++++++++++++++++---------------------
>  1 files changed, 54 insertions(+), 36 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index e7e4d6b..f088681 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -495,6 +495,37 @@ if (defined $searchtext) {
>  	$search_regexp = $search_use_regexp ? $searchtext : quotemeta $searchtext;
>  }
>  
> +# dispatch
> +my %actions = (
> +	"blame" => \&git_blame,
[...]
> +);

I'm not sure if the '# dispatch' comment is correct here now that
%actions hash is moved away from actual dispatch (selecting action
to run)

> @@ -519,9 +550,19 @@ sub evaluate_path_info {
>  	# do not change any parameters if an action is given using the query string
>  	return if $action;
>  	$path_info =~ s,^\Q$project\E/*,,;
> +
> +	# next comes the action
> +	$action = $path_info;
> +	$action =~ s,/.*$,,;

I would use here pattern matching, but your code is also good and
doesn't need changing; just for completeness below there is alternate
solution:

+	$path_info =~ m,^(.*?)/,;
+	$action = $1;

> +	if (exists $actions{$action}) {
> +		$path_info =~ s,^$action/*,,;
> +	} else {
> +		$action  = undef;
> +	}
> +

[...]
> @@ -534,8 +575,9 @@ sub evaluate_path_info {
>  		$file_name ||= validate_pathname($pathname);
>  	} elsif (defined $refname) {
>  		# we got "project.git/branch"

You meant here

  		# we got "project.git/branch" or "project.git/action/branch"

> -		$action ||= "shortlog";
> -		$hash   ||= validate_refname($refname);
> +		$action    ||= "shortlog";
> +		$hash      ||= validate_refname($refname);
> +		$hash_base ||= validate_refname($refname);
>  	}
>  }

This hunk is IMHO incorrect.  First, $refname is _either_ $hash, or
$hash_base; it cannot be both.  Second, in most cases (like the case
of 'shortlog' action, either explicit or implicit) it is simply $hash;
I think it can be $hash_base when $file_name is not set only in
singular exception case of 'tree' view for the top tree (lack of
filename is not an error, but is equivalent to $file_name='/').

> @@ -544,37 +586,6 @@ evaluate_path_info();
>  our $git_dir;
>  $git_dir = "$projectroot/$project" if $project;
>  
> -# dispatch
> -my %actions = (
[...]
> -);
> -
>  if (!defined $action) {
>  	if (defined $hash) {
>  		$action = git_get_type($hash);

I _think_ the '# dispatch' comment should be left here, and not moved
with the %actions hash.

> @@ -631,8 +642,15 @@ sub href (%) {
>  	if ($params{-replay}) {
>  		while (my ($name, $symbol) = each %mapping) {
>  			if (!exists $params{$name}) {
> -				# to allow for multivalued params we use arrayref form
> -				$params{$name} = [ $cgi->param($symbol) ];
> +				# the parameter we want to recycle may be either part of the
> +				# list of CGI parameter, or recovered from PATH_INFO
> +				if ($cgi->param($symbol)) {
> +					# to allow for multivalued params we use arrayref form
> +					$params{$name} = [ $cgi->param($symbol) ];
> +				} else {
> +					no strict 'refs';
> +					$params{$name} = $$name if $$name;

I would _perhaps_ add here comment that multivalued parameters can come
only from CGI query string, so there is no need for something like:

+					$params{$name} = (ref($$name) ? @$name : $$name) if $$name;

> +				}
>  			}
>  		}
>  	}

This fragment is a bit of ugly code, hopefully corrected in later patch.
I think it would be better to have 'refactor parsing/validation of input
parameters' to be very fist patch in series; I am not sure but I suspect
that is a kind of bugfix for current "$project/$hash" ('shortlog' view)
and "$project/$hash_base:$file_name" ('blob_plain' and 'tree' view)
path_info.

P.S. It is a bit of pity that Mechanize test from Lea Wiemann caching
gitweb code is not in the 'master' or at least 'pu'.  Using big, single,
monolithic patch instead of patch series of small, easy reviewable
commits strikes again... ;-(

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02  8:59   ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
@ 2008-10-02  9:43     ` Giuseppe Bilotta
  2008-10-03  0:48       ` Jakub Narebski
  0 siblings, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02  9:43 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, Oct 2, 2008 at 10:59 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> Giuseppe Bilotta wrote:
>>
>> +# dispatch
>> +my %actions = (
>> +     "blame" => \&git_blame,
> [...]
>> +);
>
> I'm not sure if the '# dispatch' comment is correct here now that
> %actions hash is moved away from actual dispatch (selecting action
> to run)

Bingo.

>> @@ -519,9 +550,19 @@ sub evaluate_path_info {
>>       # do not change any parameters if an action is given using the query string
>>       return if $action;
>>       $path_info =~ s,^\Q$project\E/*,,;
>> +
>> +     # next comes the action
>> +     $action = $path_info;
>> +     $action =~ s,/.*$,,;
>
> I would use here pattern matching, but your code is also good and
> doesn't need changing; just for completeness below there is alternate
> solution:
>
> +       $path_info =~ m,^(.*?)/,;
> +       $action = $1;


Yeah, I just followed the existing code style.


>> @@ -534,8 +575,9 @@ sub evaluate_path_info {
>>               $file_name ||= validate_pathname($pathname);
>>       } elsif (defined $refname) {
>>               # we got "project.git/branch"
>
> You meant here
>
>                # we got "project.git/branch" or "project.git/action/branch"

Yes I do.

>> -             $action ||= "shortlog";
>> -             $hash   ||= validate_refname($refname);
>> +             $action    ||= "shortlog";
>> +             $hash      ||= validate_refname($refname);
>> +             $hash_base ||= validate_refname($refname);
>>       }
>>  }
>
> This hunk is IMHO incorrect.  First, $refname is _either_ $hash, or
> $hash_base; it cannot be both.  Second, in most cases (like the case
> of 'shortlog' action, either explicit or implicit) it is simply $hash;
> I think it can be $hash_base when $file_name is not set only in
> singular exception case of 'tree' view for the top tree (lack of
> filename is not an error, but is equivalent to $file_name='/').

OTOH, while setting both $hash and $hash_base has worked fine for me
so far (because the right one is automatically used and apparently
setting the other doesn't hurt), choosing which one to set is a much
hairier case. Do you have suggestions for a better way to always make
it work?

>> @@ -544,37 +586,6 @@ evaluate_path_info();
>>  our $git_dir;
>>  $git_dir = "$projectroot/$project" if $project;
>>
>> -# dispatch
>> -my %actions = (
> [...]
>> -);
>> -
>>  if (!defined $action) {
>>       if (defined $hash) {
>>               $action = git_get_type($hash);
>
> I _think_ the '# dispatch' comment should be left here, and not moved
> with the %actions hash.

I agree.

>> @@ -631,8 +642,15 @@ sub href (%) {
>>       if ($params{-replay}) {
>>               while (my ($name, $symbol) = each %mapping) {
>>                       if (!exists $params{$name}) {
>> -                             # to allow for multivalued params we use arrayref form
>> -                             $params{$name} = [ $cgi->param($symbol) ];
>> +                             # the parameter we want to recycle may be either part of the
>> +                             # list of CGI parameter, or recovered from PATH_INFO
>> +                             if ($cgi->param($symbol)) {
>> +                                     # to allow for multivalued params we use arrayref form
>> +                                     $params{$name} = [ $cgi->param($symbol) ];
>> +                             } else {
>> +                                     no strict 'refs';
>> +                                     $params{$name} = $$name if $$name;
>
> I would _perhaps_ add here comment that multivalued parameters can come
> only from CGI query string, so there is no need for something like:
>
> +                                       $params{$name} = (ref($$name) ? @$name : $$name) if $$name;
>
>> +                             }
>>                       }
>>               }
>>       }
>
> This fragment is a bit of ugly code, hopefully corrected in later patch.
> I think it would be better to have 'refactor parsing/validation of input
> parameters' to be very fist patch in series; I am not sure but I suspect
> that is a kind of bugfix for current "$project/$hash" ('shortlog' view)
> and "$project/$hash_base:$file_name" ('blob_plain' and 'tree' view)
> path_info.

But implementing the path_info parsing first makes the input param
refactoring SO much nicer that I would rather put a comment here
saying "this code sucks: we should rather collect all input
parameters" and then clean it up on the subsequent patch.


-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: PATH_INFO support improvements
  2008-10-02  8:49   ` Giuseppe Bilotta
@ 2008-10-02 10:16     ` Jakub Narebski
  0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-02 10:16 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

Giuseppe Bilotta wrote:
> On Thu, Oct 2, 2008 at 10:19 AM, Jakub Narebski wrote:

> >
> > A nit: when sending longer patch series you should use numbered
> > format in the form of [PATCH m/n] or [PATCH m/n vX] prefix.
> 
> W00t, I still manage to get this wrong. Kudos to me 8-/
> 
> I wonder why these options are not the default when there is more than
> one patch, btw?
> 
> (And yes, I tried looking into the builtin-log.c code but making it
> automatic is somewhat less trivial than I can dedicate time to.)

Hmmm... I thought that format.numbered config variable is 'auto' by
default; I guess it isn't (just so you know where to look to change
it ;-)).

I have the following in the .git/config / ~/.gitconfig:

  [format]
          numbered = auto
          suffix = .txt

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
  2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
  2008-10-02  8:59   ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
@ 2008-10-02 15:34   ` Petr Baudis
  2008-10-02 19:30     ` Giuseppe Bilotta
  2 siblings, 1 reply; 34+ messages in thread
From: Petr Baudis @ 2008-10-02 15:34 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Jakub Narebski, Junio C Hamano, Shawn O. Pearce

On Thu, Oct 02, 2008 at 02:10:29AM +0200, Giuseppe Bilotta wrote:
> This patch enables gitweb to parse URLs with more information embedded
> in PATH_INFO, reducing the need for CGI parameters. The typical gitweb
> path is now $project/$action/$hash_base:$file_name or
> $project/$action/$hash
> 
> This is mostly backwards compatible with the old-style gitweb paths,
> except when $project/$branch was used to access a branch whose name
> matches a gitweb action.
> 
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

Just nits...

> ---
>  gitweb/gitweb.perl |   90 +++++++++++++++++++++++++++++++---------------------
>  1 files changed, 54 insertions(+), 36 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index e7e4d6b..f088681 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -519,9 +550,19 @@ sub evaluate_path_info {
>  	# do not change any parameters if an action is given using the query string
>  	return if $action;
>  	$path_info =~ s,^\Q$project\E/*,,;
> +
> +	# next comes the action
> +	$action = $path_info;
> +	$action =~ s,/.*$,,;
> +	if (exists $actions{$action}) {
> +		$path_info =~ s,^$action/*,,;
> +	} else {
> +		$action  = undef;

Extra whitespace.

> +	}
> +
>  	my ($refname, $pathname) = split(/:/, $path_info, 2);
>  	if (defined $pathname) {
> -		# we got "project.git/branch:filename" or "project.git/branch:dir/"
> +		# we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
>  		# we could use git_get_type(branch:pathname), but it needs $git_dir
>  		$pathname =~ s,^/+,,;
>  		if (!$pathname || substr($pathname, -1) eq "/") {

But the old variant is still possible, maybe the comments should
indicate that the action/ part is optional.

> @@ -534,8 +575,9 @@ sub evaluate_path_info {
>  		$file_name ||= validate_pathname($pathname);
>  	} elsif (defined $refname) {
>  		# we got "project.git/branch"
> -		$action ||= "shortlog";
> -		$hash   ||= validate_refname($refname);
> +		$action    ||= "shortlog";
> +		$hash      ||= validate_refname($refname);
> +		$hash_base ||= validate_refname($refname);
>  	}
>  }
>  evaluate_path_info();

What is this good for?

-- 
				Petr "Pasky" Baudis
People who take cold baths never have rheumatism, but they have
cold baths.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02 15:34   ` Petr Baudis
@ 2008-10-02 19:30     ` Giuseppe Bilotta
  2008-10-02 20:56       ` Petr Baudis
  0 siblings, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02 19:30 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, Jakub Narebski, Junio C Hamano, Shawn O. Pearce

On Thu, Oct 2, 2008 at 5:34 PM, Petr Baudis <pasky@suse.cz> wrote:

> Just nits...
>> +             $action  = undef;
>
> Extra whitespace.

Right, fixed.

>> +     }
>> +
>>       my ($refname, $pathname) = split(/:/, $path_info, 2);
>>       if (defined $pathname) {
>> -             # we got "project.git/branch:filename" or "project.git/branch:dir/"
>> +             # we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
>>               # we could use git_get_type(branch:pathname), but it needs $git_dir
>>               $pathname =~ s,^/+,,;
>>               if (!$pathname || substr($pathname, -1) eq "/") {
>
> But the old variant is still possible, maybe the comments should
> indicate that the action/ part is optional.

I put the action/ part in square brackets. (e.g.
project.git/[action/]branch:filename). Is this good enough?

>> @@ -534,8 +575,9 @@ sub evaluate_path_info {
>>               $file_name ||= validate_pathname($pathname);
>>       } elsif (defined $refname) {
>>               # we got "project.git/branch"
>> -             $action ||= "shortlog";
>> -             $hash   ||= validate_refname($refname);
>> +             $action    ||= "shortlog";
>> +             $hash      ||= validate_refname($refname);
>> +             $hash_base ||= validate_refname($refname);
>>       }
>>  }
>>  evaluate_path_info();
>
> What is this good for?

The purpose of what? setting both $hash and $hash_base was something
that I found was needed in some extreme cases, as discussed with
Jakub. Proposals for recommended cleaner but equally fast way to
handle it. If you're referring to the whitespace, I was just lining up
the entries. Should I do it in a separate patch?

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02 19:30     ` Giuseppe Bilotta
@ 2008-10-02 20:56       ` Petr Baudis
  2008-10-02 21:05         ` Giuseppe Bilotta
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Baudis @ 2008-10-02 20:56 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Jakub Narebski, Junio C Hamano, Shawn O. Pearce

On Thu, Oct 02, 2008 at 09:30:42PM +0200, Giuseppe Bilotta wrote:
> On Thu, Oct 2, 2008 at 5:34 PM, Petr Baudis <pasky@suse.cz> wrote:
> 
> > Just nits...
> >> +     }
> >> +
> >>       my ($refname, $pathname) = split(/:/, $path_info, 2);
> >>       if (defined $pathname) {
> >> -             # we got "project.git/branch:filename" or "project.git/branch:dir/"
> >> +             # we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
> >>               # we could use git_get_type(branch:pathname), but it needs $git_dir
> >>               $pathname =~ s,^/+,,;
> >>               if (!$pathname || substr($pathname, -1) eq "/") {
> >
> > But the old variant is still possible, maybe the comments should
> > indicate that the action/ part is optional.
> 
> I put the action/ part in square brackets. (e.g.
> project.git/[action/]branch:filename). Is this good enough?

That's perfect.

> >> @@ -534,8 +575,9 @@ sub evaluate_path_info {
> >>               $file_name ||= validate_pathname($pathname);
> >>       } elsif (defined $refname) {
> >>               # we got "project.git/branch"
> >> -             $action ||= "shortlog";
> >> -             $hash   ||= validate_refname($refname);
> >> +             $action    ||= "shortlog";
> >> +             $hash      ||= validate_refname($refname);
> >> +             $hash_base ||= validate_refname($refname);
> >>       }
> >>  }
> >>  evaluate_path_info();
> >
> > What is this good for?
> 
> The purpose of what? setting both $hash and $hash_base was something
> that I found was needed in some extreme cases, as discussed with
> Jakub. Proposals for recommended cleaner but equally fast way to
> handle it. If you're referring to the whitespace, I was just lining up
> the entries. Should I do it in a separate patch?

I refer to the setting of $hash_base (I'm not huge fan of the whitespace
aligning, but I don't really care). What extreme cases are these?
I think you should describe that in the code since it's not really
obvious. Maybe I could find it in older threads but I should understand
the code just from reading it. :-)

-- 
				Petr "Pasky" Baudis
People who take cold baths never have rheumatism, but they have
cold baths.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02 20:56       ` Petr Baudis
@ 2008-10-02 21:05         ` Giuseppe Bilotta
  2008-10-02 22:04           ` Petr Baudis
  0 siblings, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-02 21:05 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, Jakub Narebski, Junio C Hamano, Shawn O. Pearce

>> >> @@ -534,8 +575,9 @@ sub evaluate_path_info {
>> >>               $file_name ||= validate_pathname($pathname);
>> >>       } elsif (defined $refname) {
>> >>               # we got "project.git/branch"
>> >> -             $action ||= "shortlog";
>> >> -             $hash   ||= validate_refname($refname);
>> >> +             $action    ||= "shortlog";
>> >> +             $hash      ||= validate_refname($refname);
>> >> +             $hash_base ||= validate_refname($refname);
>> >>       }
>> >>  }
>> >>  evaluate_path_info();
>> >
>> > What is this good for?
>>
>> The purpose of what? setting both $hash and $hash_base was something
>> that I found was needed in some extreme cases, as discussed with
>> Jakub. Proposals for recommended cleaner but equally fast way to
>> handle it. If you're referring to the whitespace, I was just lining up
>> the entries. Should I do it in a separate patch?
>
> I refer to the setting of $hash_base (I'm not huge fan of the whitespace
> aligning, but I don't really care). What extreme cases are these?
> I think you should describe that in the code since it's not really
> obvious. Maybe I could find it in older threads but I should understand
> the code just from reading it. :-)

In preparing the new patchset, I've put a big comment block explaining
why we need to set both $hash and $hash_base in this code-path:

# we got "project.git/[action/]branch". in this case
# we set both $hash and $hash_base because different actions
# need one or the other to be set to behave correctly.
#
# For example, if $hash_base is not set then the blob and
# history links on the page project.git/tree/somebranch will
# assume a $hash_base of HEAD instead of the correct
# somebranch.
# Conversely, not setting $hash will make URLs such as
# project.git/shortlog/somebranch display the wrong page.
#
# Since it also turns out that the unused one is properly
# overwritten as needed, setting both is quite safe.


-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02 21:05         ` Giuseppe Bilotta
@ 2008-10-02 22:04           ` Petr Baudis
  2008-10-02 22:41             ` Jakub Narebski
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Baudis @ 2008-10-02 22:04 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Jakub Narebski, Junio C Hamano, Shawn O. Pearce

On Thu, Oct 02, 2008 at 11:05:18PM +0200, Giuseppe Bilotta wrote:
> In preparing the new patchset, I've put a big comment block explaining
> why we need to set both $hash and $hash_base in this code-path:
> 
> # we got "project.git/[action/]branch". in this case
> # we set both $hash and $hash_base because different actions
> # need one or the other to be set to behave correctly.
> #
> # For example, if $hash_base is not set then the blob and
> # history links on the page project.git/tree/somebranch will
> # assume a $hash_base of HEAD instead of the correct
> # somebranch.
> # Conversely, not setting $hash will make URLs such as
> # project.git/shortlog/somebranch display the wrong page.
> #
> # Since it also turns out that the unused one is properly
> # overwritten as needed, setting both is quite safe.

  Ok, but is this related to the pathinfo changes? Or is this fixing a
separate bug? (Not that I would want to bother you with splitting this,
but you should at least mention it in the commit message... and it's
fairly isolated anyway.)

-- 
				Petr "Pasky" Baudis
People who take cold baths never have rheumatism, but they have
cold baths.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02 22:04           ` Petr Baudis
@ 2008-10-02 22:41             ` Jakub Narebski
  2008-10-03  5:54               ` Giuseppe Bilotta
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2008-10-02 22:41 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Giuseppe Bilotta, git, Junio C Hamano, Shawn O. Pearce

Petr Baudis wrote:
> On Thu, Oct 02, 2008 at 11:05:18PM +0200, Giuseppe Bilotta wrote:
> >
> > In preparing the new patchset, I've put a big comment block explaining
> > why we need to set both $hash and $hash_base in this code-path:
> > 
> > # we got "project.git/[action/]branch". in this case
> > # we set both $hash and $hash_base because different actions
> > # need one or the other to be set to behave correctly.
> > #
> > # For example, if $hash_base is not set then the blob and

'blob' without $file_name doesn't have sense, but 'tree' does.
Probably should be s/blob/tree/ above.

> > # history links on the page project.git/tree/somebranch will
> > # assume a $hash_base of HEAD instead of the correct
> > # somebranch.
> > # Conversely, not setting $hash will make URLs such as
> > # project.git/shortlog/somebranch display the wrong page.
> > #
> > # Since it also turns out that the unused one is properly
> > # overwritten as needed, setting both is quite safe.
> 
>   Ok, but is this related to the pathinfo changes? Or is this fixing a
> separate bug? (Not that I would want to bother you with splitting this,
> but you should at least mention it in the commit message... and it's
> fairly isolated anyway.)

Yes, it related to path_info changes. With current code the only way
to get to no $file_name branch of evaluate_path_info() code was to have
URL which looks like project/branch, for which 'shortlog' action was
chosen (if not specified by CGI query parameters), and for 'shortlog'
action the correct way was to use $hash. Now there can be
project/tree/branch to show top directory of given branch, and this
as described required $hash_base set.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02  9:43     ` Giuseppe Bilotta
@ 2008-10-03  0:48       ` Jakub Narebski
  2008-10-03  6:04         ` Giuseppe Bilotta
  0 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2008-10-03  0:48 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
> On Thu, Oct 2, 2008 at 10:59 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> Giuseppe Bilotta wrote:

>>> -             $action ||= "shortlog";
>>> -             $hash   ||= validate_refname($refname);
>>> +             $action    ||= "shortlog";
>>> +             $hash      ||= validate_refname($refname);
>>> +             $hash_base ||= validate_refname($refname);
>>>       }
>>>  }
>>
>> This hunk is IMHO incorrect.  First, $refname is _either_ $hash, or
>> $hash_base; it cannot be both.  Second, in most cases (like the case
>> of 'shortlog' action, either explicit or implicit) it is simply $hash;
>> I think it can be $hash_base when $file_name is not set only in
>> singular exception case of 'tree' view for the top tree (lack of
>> filename is not an error, but is equivalent to $file_name='/').
> 
> OTOH, while setting both $hash and $hash_base has worked fine for me
> so far (because the right one is automatically used and apparently
> setting the other doesn't hurt), choosing which one to set is a much
> hairier case. Do you have suggestions for a better way to always make
> it work?

Well, it is either checking $action and setting either $hash or
$hash_base, or setting both, with some comments on why and when it is
needed (as discussed on #git). IIUC $hash_base is needed only for
filename-taking tree actions which acts on top-tree, and therefore
don't need $file_name, like 'project/tree/branch' or related 
'project/history/branch' (the latter is practically almost equivalent
to 'project/shortlog/branch' or 'project/branch').

I'm not sure if it wouldn't be better to call validate_refname($refname)
once, either as:

  $hash_base ||= $hash ||= validate_refname($refname);
 
but that might be incorrect in the obscure case of setting $hash via 'h'
CGI query parameter, and letting gitweb to set-up $hash_base via
path_info, so perhaps ($refname is local to evaluate_path_info, IIRC)

  $refname = validate_refname($refname);
  $hash      ||= $refname;
  $hash_base ||= $refname;

But that is just nitpicking this fragment of code to death. In short:
either check which of $hash and $hash_base to set in this branch of
conditional, or explain why setting both $hash and $hash_base is needed,
and why it is acceptable, either as comments, or in commit message.

>>> @@ -631,8 +642,15 @@ sub href (%) {
>>>       if ($params{-replay}) {
>>>               while (my ($name, $symbol) = each %mapping) {
>>>                       if (!exists $params{$name}) {
>>> -                             # to allow for multivalued params we use arrayref form
>>> -                             $params{$name} = [ $cgi->param($symbol) ];
>>> +                             # the parameter we want to recycle may be either part of the
>>> +                             # list of CGI parameter, or recovered from PATH_INFO
>>> +                             if ($cgi->param($symbol)) {
>>> +                                     # to allow for multivalued params we use arrayref form
>>> +                                     $params{$name} = [ $cgi->param($symbol) ];
>>> +                             } else {
>>> +                                     no strict 'refs';
>>> +                                     $params{$name} = $$name if $$name;
>>
>> I would _perhaps_ add here comment that multivalued parameters can come
>> only from CGI query string, so there is no need for something like:
>>
>> +                                       $params{$name} = (ref($$name) ? @$name : $$name) if $$name;
>>
>>> +                             }
>>>                       }
>>>               }
>>>       }
>>
>> This fragment is a bit of ugly code, hopefully corrected in later patch.
>> I think it would be better to have 'refactor parsing/validation of input
>> parameters' to be very fist patch in series; I am not sure but I suspect
>> that is a kind of bugfix for current "$project/$hash" ('shortlog' view)
>> and "$project/$hash_base:$file_name" ('blob_plain' and 'tree' view)
>> path_info.
> 
> But implementing the path_info parsing first makes the input param
> refactoring SO much nicer that I would rather put a comment here
> saying "this code sucks: we should rather collect all input
> parameters" and then clean it up on the subsequent patch.

Why not cleanup first?

When implementing href(..., -replay=>1) I have forgot that some of
gitweb parameters are implicitly passed ($project, because it is needed
in most gitweb links), and some can be passed via path_info ($hash 
and/or $hash_base, $file_name). Your code adds $action to the mix, but
it doesn't change the fact that 1.) even before your code -replay case
was incorrect for some path_info links (handcrafted, as gitweb generates
only $project via path_info); 2.) code you have added is a bit ugly.

Besides using variables change a little meaning of -replay, namely
in your code gitweb always sets action, even for non-path_name links
when we started from "default action" (i.e. without action set) links.
I guess this is mainly theoretical issue, as I don't think that default
views use many -replay links.


P.S. with the idea of pushing parameters obtained not from CGI query
string to $cgi->param() via "$cgi->param($name, $value);" or in named
params form "$cgi->(-name=>$name, -value=>$value);" you would not need
to change (a bit hacky, admittedly) href(...,-replay=>1) code.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: refactor input parameters parse/validation
  2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
  2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
@ 2008-10-03  1:36     ` Jakub Narebski
  2008-10-03  7:24       ` Giuseppe Bilotta
  1 sibling, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2008-10-03  1:36 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

> Since input parameters can now be obtained both from CGI parameters and
> PATH_INFO, we would like most of the code to be agnostic about the way
> parameters were retrieved.
> 

I'd prefer that this cleanup/refactoring patch was _first_ patch in
the series, as we were able to obtain parameters both from CGI query
parameters and from PATH_INFO ($project, $hash or $hash_base+$file_name)
_before_ first patch in this series.  So it correct not only issue
introduced by first patch (and fixed somewhat there), but what was
outstanding (but rare because gitweb did not generate such links)
issue.


> We thus collect all the parameters into the new %input_params hash,
> removing the need for ad-hoc kludgy code in href(). 

Alternate solution would be push data from PATH_INFO in query params
data using (for example)

  $cgi->param('a', $action);

or, naming parameters explicitely

  $cgi->param(-name=>'a', -value=>$action);

This avoids need for additional variable, reuses current code, and
nicely sidesteps issue whether to use long names for keys ('action',
'file_name') or short ones from CGI query ('a', 'f').


It would probably has to be at least partially written to check which
of those two solutions (%input_params or $cgi->param('a', $a)))
is better.

> As much of the 
> parameters validation code is now shared between CGI and PATH_INFO,
> although this requires PATH_INFO parsing to be interleaved into the main
> code instead of being factored out into its own routine.

I'm not sure if it is worth it then to unify parameters validation,
with such drawback.

> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>  gitweb/gitweb.perl |  336 ++++++++++++++++++++++++++++------------------------
>  1 files changed, 183 insertions(+), 153 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index f088681..ec4326f 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -28,8 +28,10 @@ our $my_url = $cgi->url();
>  our $my_uri = $cgi->url(-absolute => 1);
>  
>  # if we're called with PATH_INFO, we have to strip that
> -# from the URL to find our real URL
> -if (my $path_info = $ENV{"PATH_INFO"}) {
> +# from the URL to find our real URL. PATH_INFO is kept
> +# as it's used later on for parameter extraction
> +my $path_info = $ENV{"PATH_INFO"};
> +if ($path_info) {
>  	$my_url =~ s,\Q$path_info\E$,,;
>  	$my_uri =~ s,\Q$path_info\E$,,;
>  }

This might be separate patch, if you wanted to increase your commit
count ;-)  More seriously I think it should be at least briefly
mentioned in commit message (make $path_info global).

> @@ -390,15 +392,111 @@ $projects_list ||= $projectroot;
>  
>  # ======================================================================
>  # input validation and dispatch
> -our $action = $cgi->param('a');
> -if (defined $action) {
> -	if ($action =~ m/[^0-9a-zA-Z\.\-_]/) {
> -		die_error(400, "Invalid action parameter");
> +
> +# input parameters can be collected from a variety of sources (presently, CGI
> +# and PATH_INFO), so we define an %input_params hash that collects them all
> +# together during validation: this allows subsequent uses (e.g. href()) to be
> +# agnostic of the parameter origin
> +
> +my %input_params = ();
> +
> +# input parameters are stored with the long parameter name as key, so we need
> +# the name -> CGI key mapping here. This will also be used in the href
> +# subroutine to convert parameters to their CGI equivalent.
> +#
> +# XXX: Warning: If you touch this, check the search form for updating,
> +# too.
> +
> +my @cgi_param_mapping = (
> +	project => "p",
> +	action => "a",
> +	file_name => "f",
> +	file_parent => "fp",
> +	hash => "h",
> +	hash_parent => "hp",
> +	hash_base => "hb",
> +	hash_parent_base => "hpb",
> +	page => "pg",
> +	order => "o",
> +	searchtext => "s",
> +	searchtype => "st",
> +	snapshot_format => "sf",
> +	extra_options => "opt",
> +	search_use_regexp => "sr",
> +);
> +my %cgi_param_mapping = @cgi_param_mapping;
> +
> +# we will also need to know the possible actions, for validation
> +my %actions = (
> +	"blame" => \&git_blame,
> +	"blobdiff" => \&git_blobdiff,
> +	"blobdiff_plain" => \&git_blobdiff_plain,
> +	"blob" => \&git_blob,
> +	"blob_plain" => \&git_blob_plain,
> +	"commitdiff" => \&git_commitdiff,
> +	"commitdiff_plain" => \&git_commitdiff_plain,
> +	"commit" => \&git_commit,
> +	"forks" => \&git_forks,
> +	"heads" => \&git_heads,
> +	"history" => \&git_history,
> +	"log" => \&git_log,
> +	"rss" => \&git_rss,
> +	"atom" => \&git_atom,
> +	"search" => \&git_search,
> +	"search_help" => \&git_search_help,
> +	"shortlog" => \&git_shortlog,
> +	"summary" => \&git_summary,
> +	"tag" => \&git_tag,
> +	"tags" => \&git_tags,
> +	"tree" => \&git_tree,
> +	"snapshot" => \&git_snapshot,
> +	"object" => \&git_object,
> +	# those below don't need $project
> +	"opml" => \&git_opml,
> +	"project_list" => \&git_project_list,
> +	"project_index" => \&git_project_index,
> +);

Nice, although I'm not sure if [%@]cgi_param_mapping has to global.
If we use long parameters names as keys, I think it has to, somewhat.
See also comment below.

> +
> +# fill %input_params with the CGI parameters. All values except for 'opt'
> +# should be single values, but opt can be an array. We should probably
> +# build an array of parameters that can be multi-valued, but since for the time
> +# being it's only this one, we just single it out
> +while (my ($name, $symbol) = each %cgi_param_mapping) {
> +	if ($symbol eq 'opt') {
> +		$input_params{$name} = [$cgi->param($symbol)];
> +	} else {
> +		$input_params{$name} = $cgi->param($symbol);
>  	}
>  }

If it was chosen to use short (CGI query) parameter names, the above
loop could be replaced by simple

  %input_params = $cgi->Vars;

or to be more exact, if we want to have multi-valued parameters stored
as array ref

  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;


See CGI(3pm):

    When using this [Vars], the thing you must watch out for are multivalued CGI
    parameters.  Because a hash cannot distinguish between scalar and list con-
    text, multivalued parameters will be returned as a packed string, separated
    by the "\0" (null) character.  You must split this packed string in order
    to get at the individual values.
  
> -# parameters which are pathnames
> -our $project = $cgi->param('p');
> +# next, we want to parse PATH_INFO (which was already stored in $path_info at
> +# the beginning). This is a little hairy because PATH_INFO parsing needs some
> +# form of parameter validation, so we interleave parsing and validation.

I don't think it is a good idea. In my opinion, for my taste, it would
be better to separate evaluating path_info from the rest.

We could instead introduce convention that if variable (like $project)
is set, then it is assumed to be validated; if it is present only in
the %input_params hash, then it has to be validated.


On the other hand this ordering, first by parameter, then by method
of extraction could be seem quite equally valid.  Nevertheless I think
that previous flow with separate evaluate_path_info() and what should
be evaluate_CGI_query() has better encapsulation.

> +#
> +# The accepted PATH_INFO syntax is $project/$action/$hash or
> +# $project/$action/$hash_base:$file_name, where $action may be missing (mostly for
> +# backwards compatibility), so we need to parse and validate the parameters in
> +# this same order.
> +
> +# clear $path_info of any leading /
> +$path_info =~ s,^/+,,;
> +
> +our $project = $input_params{'project'};
> +if ($path_info && !defined $project) {
> +	# if $project was not defined by CGI, we try to extract it from
> +	# $path_info
> +	$project = $path_info;
> +	$project =~ s,/+$,,;
> +	while ($project && !check_head_link("$projectroot/$project")) {
> +		$project =~ s,/*[^/]*$,,;
> +	}
> +	$input_params{'project'} = $project;
> +} else {
> +	# otherwise, we suppress $path_info parsing altogether
> +	$path_info = undef;
> +}
> +
> +# project name validation
>  if (defined $project) {
>  	if (!validate_pathname($project) ||
>  	    !(-d "$projectroot/$project") ||

Note that this code does less checking if $project is in path_info than
for the case where it is set by CGI query. Perhaps there should be base
fast check in a loop, and more extensive test later.

> @@ -408,16 +506,66 @@ if (defined $project) {
>  		undef $project;
>  		die_error(404, "No such project");
>  	}
> +
> +	# we purge the $project name from the $path_info, preparing it for
> +	# subsequent parameters extraction
> +	$path_info =~ s,^\Q$project\E/*,, if defined $path_info;
> +} else {
> +	# we also suppress $path_info parsing if no project was defined
> +	$path_info = undef;
> +}

In evaluate_path_info it was simply 'return if...'; here with mentioned
interleaving it is harder and a bit hacky.

[cut]

> @@ -615,43 +676,12 @@ sub href (%) {
>  	# default is to use -absolute url() i.e. $my_uri
>  	my $href = $params{-full} ? $my_url : $my_uri;

[cut removed, or rather moved to beginning, @mapping]
  
>  	$params{'project'} = $project unless exists $params{'project'};
>  
>  	if ($params{-replay}) {
> -		while (my ($name, $symbol) = each %mapping) {
> -			if (!exists $params{$name}) {
> -				# the parameter we want to recycle may be either part of the
> -				# list of CGI parameter, or recovered from PATH_INFO
> -				if ($cgi->param($symbol)) {
> -					# to allow for multivalued params we use arrayref form
> -					$params{$name} = [ $cgi->param($symbol) ];
> -				} else {
> -					no strict 'refs';
> -					$params{$name} = $$name if $$name;
> -				}
> -			}
> +		while (my ($name, $val) = each %input_params) {
> +			$params{$name} = $input_params{$name}
> +				unless (exists $params{$name});

Very nice, short code.  Should be something like that from
the very beginning.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: generate project/action/hash URLs
  2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
  2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
@ 2008-10-03  1:48       ` Jakub Narebski
       [not found]         ` <cb7bb73a0810022330l498bdb20h703dec7833a443e@mail.gmail.com>
  2008-10-04  1:15       ` Jakub Narebski
  2 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2008-10-03  1:48 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Tue, 2 Oct 2008, Giuseppe Bilotta wrote:

> When generating path info URLs, reduce the number of CGI parameters by
> embedding action and hash_parent:filename or hash in the path.

_Perhaps_ it should be noted that even though gitweb accepted
'project/hash' and 'project/hash_base:file_name' path_info URLs, it
generated links with only 'project/' in path_info.

> 
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>  gitweb/gitweb.perl |   32 +++++++++++++++++++++++++++++---
>  1 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index ec4326f..2c380ac 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -687,14 +687,40 @@ sub href (%) {
>  
>  	my ($use_pathinfo) = gitweb_check_feature('pathinfo');
>  	if ($use_pathinfo) {
> -		# use PATH_INFO for project name
> +		# try to put as many parameters as possible in PATH_INFO:
> +		#   - project name
> +		#   - action
> +		#   - hash or hash_base:filename
> +
> +		# Strip any trailing / from $href, or we might get double
> +		# slashes when the script is the DirectoryIndex

Perhaps example, like $href='gitweb.example.com/', could be put here.

> +		#

I think that we can lose this empty line comment here.

> +		$href =~ s,/$,,;
> +
> +		# Then add the project name, if present
>  		$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
>  		delete $params{'project'};
>  
> -		# Summary just uses the project path URL
> -		if (defined $params{'action'} && $params{'action'} eq 'summary') {
> +		# Summary just uses the project path URL, any other action is
> +		# added to the URL
> +		if (defined $params{'action'}) {
> +			$href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary';
>  			delete $params{'action'};
>  		}

Good.

> +
> +		# Finally, we put either hash_base:file_name or hash
> +		if (defined $params{'hash_base'}) {
> +			$href .= "/".esc_url($params{'hash_base'});
> +			if (defined $params{'file_name'}) {
> +				$href .= ":".esc_url($params{'file_name'});
> +				delete $params{'file_name'};
> +			}
> +			delete $params{'hash'};
> +			delete $params{'hash_base'};
> +		} elsif (defined $params{'hash'}) {
> +			$href .= "/".esc_url($params{'hash'});
> +			delete $params{'hash'};
> +		}

Hmmmm...

Shouldn't the code first check for $file_name, then add either 
"$hash_base:$file_name" (url-escaped), or "$hash" (not "$hash_base")?

>  	}
>  
>  	# now encode the parameters explicitly

Thank you very much for work on improving gitweb.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-02 22:41             ` Jakub Narebski
@ 2008-10-03  5:54               ` Giuseppe Bilotta
  0 siblings, 0 replies; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-03  5:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Petr Baudis, git, Junio C Hamano, Shawn O. Pearce

On Fri, Oct 3, 2008 at 12:41 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> Petr Baudis wrote:
>> On Thu, Oct 02, 2008 at 11:05:18PM +0200, Giuseppe Bilotta wrote:
>> >
>> > In preparing the new patchset, I've put a big comment block explaining
>> > why we need to set both $hash and $hash_base in this code-path:
>> >
>> > # we got "project.git/[action/]branch". in this case
>> > # we set both $hash and $hash_base because different actions
>> > # need one or the other to be set to behave correctly.
>> > #
>> > # For example, if $hash_base is not set then the blob and
>
> 'blob' without $file_name doesn't have sense, but 'tree' does.
> Probably should be s/blob/tree/ above.
>
>> > # history links on the page project.git/tree/somebranch will
>> > # assume a $hash_base of HEAD instead of the correct
>> > # somebranch.

Note: the blob and history links _in a tree page_ are wrong. The page
itself is correct.
What this means is that if you go to project.git/tree/somebranch you
get the correct list of files and directories for the project at
somebranch's HEAD, but all the links IN that page will point to the
wrong version, because they will have no $hash_base set and will thus
default to HEAD.

You can see this effect for example by going to
http://repo.or.cz/w/git.git/v1.4.0?a=tree <- this hand-crafted path
would never be generated by the old pathinfo code, but it show exactly
what would happen with my pahtinfo code on
http://repo.or.cz/w/git.git/tree/v1.4.0 if $hash_base wasn't set.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-03  0:48       ` Jakub Narebski
@ 2008-10-03  6:04         ` Giuseppe Bilotta
  2008-10-03 10:31           ` Jakub Narebski
  0 siblings, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-03  6:04 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Fri, Oct 3, 2008 at 2:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
>> OTOH, while setting both $hash and $hash_base has worked fine for me
>> so far (because the right one is automatically used and apparently
>> setting the other doesn't hurt), choosing which one to set is a much
>> hairier case. Do you have suggestions for a better way to always make
>> it work?
>
> Well, it is either checking $action and setting either $hash or
> $hash_base, or setting both, with some comments on why and when it is
> needed (as discussed on #git). IIUC $hash_base is needed only for
> filename-taking tree actions which acts on top-tree, and therefore
> don't need $file_name, like 'project/tree/branch' or related
> 'project/history/branch' (the latter is practically almost equivalent
> to 'project/shortlog/branch' or 'project/branch').
>
> I'm not sure if it wouldn't be better to call validate_refname($refname)
> once, either as:
>
>  $hash_base ||= $hash ||= validate_refname($refname);
>
> but that might be incorrect in the obscure case of setting $hash via 'h'
> CGI query parameter, and letting gitweb to set-up $hash_base via
> path_info, so perhaps ($refname is local to evaluate_path_info, IIRC)
>
>  $refname = validate_refname($refname);
>  $hash      ||= $refname;
>  $hash_base ||= $refname;

I'll go with this.

> But that is just nitpicking this fragment of code to death. In short:
> either check which of $hash and $hash_base to set in this branch of
> conditional, or explain why setting both $hash and $hash_base is needed,
> and why it is acceptable, either as comments, or in commit message.

Comment is probably better, as long as I remember to move it with the
code it belongs to ;)

>>>> @@ -631,8 +642,15 @@ sub href (%) {
>>>>       if ($params{-replay}) {
>>>>               while (my ($name, $symbol) = each %mapping) {
>>>>                       if (!exists $params{$name}) {
>>>> -                             # to allow for multivalued params we use arrayref form
>>>> -                             $params{$name} = [ $cgi->param($symbol) ];
>>>> +                             # the parameter we want to recycle may be either part of the
>>>> +                             # list of CGI parameter, or recovered from PATH_INFO
>>>> +                             if ($cgi->param($symbol)) {
>>>> +                                     # to allow for multivalued params we use arrayref form
>>>> +                                     $params{$name} = [ $cgi->param($symbol) ];
>>>> +                             } else {
>>>> +                                     no strict 'refs';
>>>> +                                     $params{$name} = $$name if $$name;
>>>
>>> I would _perhaps_ add here comment that multivalued parameters can come
>>> only from CGI query string, so there is no need for something like:
>>>
>>> +                                       $params{$name} = (ref($$name) ? @$name : $$name) if $$name;
>>>
>>>> +                             }
>>>>                       }
>>>>               }
>>>>       }
>>>
>>> This fragment is a bit of ugly code, hopefully corrected in later patch.
>>> I think it would be better to have 'refactor parsing/validation of input
>>> parameters' to be very fist patch in series; I am not sure but I suspect
>>> that is a kind of bugfix for current "$project/$hash" ('shortlog' view)
>>> and "$project/$hash_base:$file_name" ('blob_plain' and 'tree' view)
>>> path_info.
>>
>> But implementing the path_info parsing first makes the input param
>> refactoring SO much nicer that I would rather put a comment here
>> saying "this code sucks: we should rather collect all input
>> parameters" and then clean it up on the subsequent patch.
>
> Why not cleanup first?

Because cleaning it up depends on the refactoring, and the refactoring
is much cleaner when path_info already handles $action too.

> When implementing href(..., -replay=>1) I have forgot that some of
> gitweb parameters are implicitly passed ($project, because it is needed
> in most gitweb links), and some can be passed via path_info ($hash
> and/or $hash_base, $file_name). Your code adds $action to the mix, but
> it doesn't change the fact that 1.) even before your code -replay case
> was incorrect for some path_info links (handcrafted, as gitweb generates
> only $project via path_info); 2.) code you have added is a bit ugly.
>
> Besides using variables change a little meaning of -replay, namely
> in your code gitweb always sets action, even for non-path_name links
> when we started from "default action" (i.e. without action set) links.
> I guess this is mainly theoretical issue, as I don't think that default
> views use many -replay links.

Ah the issue of the default action is something I hadn't taken into
consideration, actually. Now the question is, should replay keep
default -> default, or should it go with default -> last incantation?

> P.S. with the idea of pushing parameters obtained not from CGI query
> string to $cgi->param() via "$cgi->param($name, $value);" or in named
> params form "$cgi->(-name=>$name, -value=>$value);" you would not need
> to change (a bit hacky, admittedly) href(...,-replay=>1) code.

Yes, but it would muddy the waters about 'where did this parameter
come from' in case we ever need to know that.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: refactor input parameters parse/validation
  2008-10-03  1:36     ` [PATCHv4] gitweb: refactor input parameters parse/validation Jakub Narebski
@ 2008-10-03  7:24       ` Giuseppe Bilotta
  2008-10-03 11:20         ` Jakub Narebski
  0 siblings, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-03  7:24 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Fri, Oct 3, 2008 at 3:36 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
>
>> Since input parameters can now be obtained both from CGI parameters and
>> PATH_INFO, we would like most of the code to be agnostic about the way
>> parameters were retrieved.
>
> I'd prefer that this cleanup/refactoring patch was _first_ patch in
> the series, as we were able to obtain parameters both from CGI query
> parameters and from PATH_INFO ($project, $hash or $hash_base+$file_name)
> _before_ first patch in this series.  So it correct not only issue
> introduced by first patch (and fixed somewhat there), but what was
> outstanding (but rare because gitweb did not generate such links)
> issue.

It's true that the patch makes sense regardless of the rest of the
path_info patch, indeed.

>> We thus collect all the parameters into the new %input_params hash,
>> removing the need for ad-hoc kludgy code in href().
>
> Alternate solution would be push data from PATH_INFO in query params
> data using (for example)
>
>  $cgi->param('a', $action);
>
> or, naming parameters explicitely
>
>  $cgi->param(-name=>'a', -value=>$action);
>
> This avoids need for additional variable, reuses current code, and
> nicely sidesteps issue whether to use long names for keys ('action',
> 'file_name') or short ones from CGI query ('a', 'f').
>
>
> It would probably has to be at least partially written to check which
> of those two solutions (%input_params or $cgi->param('a', $a)))
> is better.

If we really don't want to care where the parameters came from *at
all*, writing on $cgi->param is likely to be the cleanest solution.

>> As much of the
>> parameters validation code is now shared between CGI and PATH_INFO,
>> although this requires PATH_INFO parsing to be interleaved into the main
>> code instead of being factored out into its own routine.
>
> I'm not sure if it is worth it then to unify parameters validation,
> with such drawback.

Yeah, I don't like it very much either. But it does spare a little on
the validation. OTOH, the amount we spare is not extraordinary, so
it's probably not worth the spaghettization of the code ...

>>  # if we're called with PATH_INFO, we have to strip that
>> -# from the URL to find our real URL
>> -if (my $path_info = $ENV{"PATH_INFO"}) {
>> +# from the URL to find our real URL. PATH_INFO is kept
>> +# as it's used later on for parameter extraction
>> +my $path_info = $ENV{"PATH_INFO"};
>> +if ($path_info) {
>>       $my_url =~ s,\Q$path_info\E$,,;
>>       $my_uri =~ s,\Q$path_info\E$,,;
>>  }
>
> This might be separate patch, if you wanted to increase your commit
> count ;-)  More seriously I think it should be at least briefly
> mentioned in commit message (make $path_info global).

Note however that the path_info evaluation is _destructive_, so after
evaluation is complete we don't have much of it left.

[snip]

> Nice, although I'm not sure if [%@]cgi_param_mapping has to global.
> If we use long parameters names as keys, I think it has to, somewhat.
> See also comment below.

>> +# fill %input_params with the CGI parameters. All values except for 'opt'
>> +# should be single values, but opt can be an array. We should probably
>> +# build an array of parameters that can be multi-valued, but since for the time
>> +# being it's only this one, we just single it out
>> +while (my ($name, $symbol) = each %cgi_param_mapping) {
>> +     if ($symbol eq 'opt') {
>> +             $input_params{$name} = [$cgi->param($symbol)];
>> +     } else {
>> +             $input_params{$name} = $cgi->param($symbol);
>>       }
>>  }
>
> If it was chosen to use short (CGI query) parameter names, the above
> loop could be replaced by simple
>
>  %input_params = $cgi->Vars;
>
> or to be more exact, if we want to have multi-valued parameters stored
> as array ref
>
>  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;
>
>
> See CGI(3pm):
>
>    When using this [Vars], the thing you must watch out for are multivalued CGI
>    parameters.  Because a hash cannot distinguish between scalar and list con-
>    text, multivalued parameters will be returned as a packed string, separated
>    by the "\0" (null) character.  You must split this packed string in order
>    to get at the individual values.

Ah, an interesting alternative. This would make parameter copying a
one-liner, almost as good as just using $cgi->param for everything :)

>> -# parameters which are pathnames
>> -our $project = $cgi->param('p');
>> +# next, we want to parse PATH_INFO (which was already stored in $path_info at
>> +# the beginning). This is a little hairy because PATH_INFO parsing needs some
>> +# form of parameter validation, so we interleave parsing and validation.
>
> I don't think it is a good idea. In my opinion, for my taste, it would
> be better to separate evaluating path_info from the rest.
>
> We could instead introduce convention that if variable (like $project)
> is set, then it is assumed to be validated; if it is present only in
> the %input_params hash, then it has to be validated.
>
>
> On the other hand this ordering, first by parameter, then by method
> of extraction could be seem quite equally valid.  Nevertheless I think
> that previous flow with separate evaluate_path_info() and what should
> be evaluate_CGI_query() has better encapsulation.
>
>> +#
>> +# The accepted PATH_INFO syntax is $project/$action/$hash or
>> +# $project/$action/$hash_base:$file_name, where $action may be missing (mostly for
>> +# backwards compatibility), so we need to parse and validate the parameters in
>> +# this same order.
>> +
>> +# clear $path_info of any leading /
>> +$path_info =~ s,^/+,,;
>> +
>> +our $project = $input_params{'project'};
>> +if ($path_info && !defined $project) {
>> +     # if $project was not defined by CGI, we try to extract it from
>> +     # $path_info
>> +     $project = $path_info;
>> +     $project =~ s,/+$,,;
>> +     while ($project && !check_head_link("$projectroot/$project")) {
>> +             $project =~ s,/*[^/]*$,,;
>> +     }
>> +     $input_params{'project'} = $project;
>> +} else {
>> +     # otherwise, we suppress $path_info parsing altogether
>> +     $path_info = undef;
>> +}
>> +
>> +# project name validation
>>  if (defined $project) {
>>       if (!validate_pathname($project) ||
>>           !(-d "$projectroot/$project") ||
>
> Note that this code does less checking if $project is in path_info than
> for the case where it is set by CGI query. Perhaps there should be base
> fast check in a loop, and more extensive test later.

Uh ... isn't this exactly what's happening? In the loop we just gobble
until we find a git dir. Validation is then done, and it's the _same_
validation for both cases. Why do you say that path_info $project is
less checked?

>> @@ -408,16 +506,66 @@ if (defined $project) {
>>               undef $project;
>>               die_error(404, "No such project");
>>       }
>> +
>> +     # we purge the $project name from the $path_info, preparing it for
>> +     # subsequent parameters extraction
>> +     $path_info =~ s,^\Q$project\E/*,, if defined $path_info;
>> +} else {
>> +     # we also suppress $path_info parsing if no project was defined
>> +     $path_info = undef;
>> +}
>
> In evaluate_path_info it was simply 'return if...'; here with mentioned
> interleaving it is harder and a bit hacky.

I know.

>>       $params{'project'} = $project unless exists $params{'project'};
>>
>>       if ($params{-replay}) {
>> -             while (my ($name, $symbol) = each %mapping) {
>> -                     if (!exists $params{$name}) {
>> -                             # the parameter we want to recycle may be either part of the
>> -                             # list of CGI parameter, or recovered from PATH_INFO
>> -                             if ($cgi->param($symbol)) {
>> -                                     # to allow for multivalued params we use arrayref form
>> -                                     $params{$name} = [ $cgi->param($symbol) ];
>> -                             } else {
>> -                                     no strict 'refs';
>> -                                     $params{$name} = $$name if $$name;
>> -                             }
>> -                     }
>> +             while (my ($name, $val) = each %input_params) {
>> +                     $params{$name} = $input_params{$name}
>> +                             unless (exists $params{$name});
>
> Very nice, short code.  Should be something like that from
> the very beginning.

Ok, I'll try working up a patch for params merging before any
path_info extensions.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO
  2008-10-03  6:04         ` Giuseppe Bilotta
@ 2008-10-03 10:31           ` Jakub Narebski
  0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-03 10:31 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Fri, 3 Oct 2008, Giuseppe Bilotta wrote:
> On Fri, Oct 3, 2008 at 2:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

>>> But implementing the path_info parsing first makes the input param
>>> refactoring SO much nicer that I would rather put a comment here
>>> saying "this code sucks: we should rather collect all input
>>> parameters" and then clean it up on the subsequent patch.
>>
>> Why not cleanup first?
> 
> Because cleaning it up depends on the refactoring, and the refactoring
> is much cleaner when path_info already handles $action too.

Hmmm... it looks like after refactoring you don't have to change href()
at all when adding $action to parameters gitweb can get from path_info;
all that having refactoring (cleanup) upfront changes in/for this patch
is a bit different saving action (also in %input_params) and lack of
this ugly and I think subtly wrong added code for href(...,-replay=>1).
 
>> When implementing href(..., -replay=>1) I have forgot that some of
>> gitweb parameters are implicitly passed ($project, because it is needed
>> in most gitweb links), and some can be passed via path_info ($hash
>> and/or $hash_base, $file_name). Your code adds $action to the mix, but
>> it doesn't change the fact that 1.) even before your code -replay case
>> was incorrect for some path_info links (handcrafted, as gitweb generates
>> only $project via path_info); 2.) code you have added is a bit ugly.
>>
>> Besides using variables change a little meaning of -replay, namely
>> in your code gitweb always sets action, even for non-path_name links
>> when we started from "default action" (i.e. without action set) links.
>> I guess this is mainly theoretical issue, as I don't think that default
>> views use many -replay links.
> 
> Ah the issue of the default action is something I hadn't taken into
> consideration, actually. Now the question is, should replay keep
> default -> default, or should it go with default -> last incantation?

I think we should use default -> default.

Besides I think there can also be an issue ("can" because I am not sure
if in practice it is a problem) the fact that gitweb sometimes expand
parameters to sha-1, for example setting $head to git_get_head_hash()
if it is not set (default param value), and not to 'HEAD'. This perhaps
should be changed, but to be on safer side better not to use 'action
variables' because some code treats them as temporary variables.

>> P.S. with the idea of pushing parameters obtained not from CGI query
>> string to $cgi->param() via "$cgi->param($name, $value);" or in named
>> params form "$cgi->(-name=>$name, -value=>$value);" you would not need
>> to change (a bit hacky, admittedly) href(...,-replay=>1) code.
> 
> Yes, but it would muddy the waters about 'where did this parameter
> come from' in case we ever need to know that.

True. Like for example implementing -faithful_replay where if parameter
was passed through path_info it is replayed through path_info, and if
it was passed through query string it is replayed as CGI query param.

After thinking a bit about that I think that %input_params idea is
superior to both $cgi->params(-name=>..., -value=>...) and to have
either no strict refs $$name or name to action variable ref hash.
See also my comment for the refactoring patch.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: refactor input parameters parse/validation
  2008-10-03  7:24       ` Giuseppe Bilotta
@ 2008-10-03 11:20         ` Jakub Narebski
  0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-03 11:20 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Fri, 3 Oct 2008, Giuseppe Bilotta wrote:
> On Fri, Oct 3, 2008 at 3:36 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

>>> We thus collect all the parameters into the new %input_params hash,
>>> removing the need for ad-hoc kludgy code in href().
>>
>> Alternate solution would be push data from PATH_INFO in query params
>> data using (for example)
>>
>>  $cgi->param('a', $action);
>>
>> or, naming parameters explicitely
>>
>>  $cgi->param(-name=>'a', -value=>$action);
>>
>> This avoids need for additional variable, reuses current code, and
>> nicely sidesteps issue whether to use long names for keys ('action',
>> 'file_name') or short ones from CGI query ('a', 'f').
>>
>>
>> It would probably has to be at least partially written to check which
>> of those two solutions (%input_params or $cgi->param('a', $a)))
>> is better.
> 
> If we really don't want to care where the parameters came from *at
> all*, writing on $cgi->param is likely to be the cleanest solution.

I now think that %input_params is a better solution, even if we won't
be implementing -true_replay / -faithful_replay option, where options
gotten in path_info would be passed in path_info, and options passed
in query string would be passed as CGI params (even if those params
could be encoded in path_info).
 
One of the advantages is that having @cgi_param_mapping near the top
provides opportunity to describe short CGI query options.

Another advantage is that with %input_params it is easy to find if
parameter is multivalued, or is meant to be multivalued: just check
ref($input_params{long_name}); well, perhaps not _that_ easy, but...

>>> As much of the
>>> parameters validation code is now shared between CGI and PATH_INFO,
>>> although this requires PATH_INFO parsing to be interleaved into the main
>>> code instead of being factored out into its own routine.
>>
>> I'm not sure if it is worth it then to unify parameters validation,
>> with such drawback.
> 
> Yeah, I don't like it very much either. But it does spare a little on
> the validation. OTOH, the amount we spare is not extraordinary, so
> it's probably not worth the spaghettization of the code ...

Avoiding repetition can be done in different ways, for example we can
have gitweb assume that if value is already in params variable
($project, $action, $hash,...) it is already validated, and if it is
only in %input_params it is not.  Add to that for example putting
common part of project path checks into validate_project and you have
all the advantages (no code repetition, spared validation) without
drawback of harder to maintain spaghetti-like code.

>>> +# fill %input_params with the CGI parameters. All values except for 'opt'
>>> +# should be single values, but opt can be an array. We should probably
>>> +# build an array of parameters that can be multi-valued, but since for the time
>>> +# being it's only this one, we just single it out
>>> +while (my ($name, $symbol) = each %cgi_param_mapping) {
>>> +     if ($symbol eq 'opt') {
>>> +             $input_params{$name} = [$cgi->param($symbol)];
>>> +     } else {
>>> +             $input_params{$name} = $cgi->param($symbol);
>>>       }
>>>  }
>>
>> If it was chosen to use short (CGI query) parameter names, the above
>> loop could be replaced by simple
>>
>>  %input_params = $cgi->Vars;
>>
>> or to be more exact, if we want to have multi-valued parameters stored
>> as array ref
>>
>>  %input_params = map { [ split /\0/ ] if /\0/; } $cgi->Vars;
>>
>>
>> See CGI(3pm):
>>
>>    When using this [Vars], the thing you must watch out for are multivalued CGI
>>    parameters.  Because a hash cannot distinguish between scalar and list con-
>>    text, multivalued parameters will be returned as a packed string, separated
>>    by the "\0" (null) character.  You must split this packed string in order
>>    to get at the individual values.
> 
> Ah, an interesting alternative. This would make parameter copying a
> one-liner, almost as good as just using $cgi->param for everything :)

Hmmmm...
 

[cut]
>> Note that this code does less checking if $project is in path_info than
>> for the case where it is set by CGI query. Perhaps there should be base
>> fast check in a loop, and more extensive test later.
> 
> Uh ... isn't this exactly what's happening? In the loop we just gobble
> until we find a git dir. Validation is then done, and it's the _same_
> validation for both cases. Why do you say that path_info $project is
> less checked?

For project from query string we now have:

  !validate_pathname($project) ||
  !(-d "$projectroot/$project") ||
  !check_head_link("$projectroot/$project") ||
  ($export_ok && !(-e "$projectroot/$project/$export_ok")) ||
  ($strict_export && !project_in_list($project))

For project from path_info we now have:

  while ($project && !check_head_link("$projectroot/$project")) {
	$project =~ s,/*[^/]*$,,;
  }
  ...
  !validate_pathname($project) ||
  ($export_ok && !-e "$projectroot/$project/$export_ok") ||
  ($strict_export && !project_in_list($project))
  
It is almost the same; I have thought that they differ more. Perhaps
some of above code could be refactored into validate_project(), or
project_ok() subroutine.

>>>       $params{'project'} = $project unless exists $params{'project'};
>>>
>>>       if ($params{-replay}) {
>>> -             while (my ($name, $symbol) = each %mapping) {
>>> -                     if (!exists $params{$name}) {
>>> -                             # the parameter we want to recycle may be either part of the
>>> -                             # list of CGI parameter, or recovered from PATH_INFO
>>> -                             if ($cgi->param($symbol)) {
>>> -                                     # to allow for multivalued params we use arrayref form
>>> -                                     $params{$name} = [ $cgi->param($symbol) ];
>>> -                             } else {
>>> -                                     no strict 'refs';
>>> -                                     $params{$name} = $$name if $$name;
>>> -                             }
>>> -                     }
>>> +             while (my ($name, $val) = each %input_params) {
>>> +                     $params{$name} = $input_params{$name}
>>> +                             unless (exists $params{$name});
>>
>> Very nice, short code.  Should be something like that from
>> the very beginning.
> 
> Ok, I'll try working up a patch for params merging before any
> path_info extensions.

Perhaps also put query string handling into evaluate_CGI_query(), or
evaluate_query_string(), similar to evaluate_path_info()?

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: generate project/action/hash URLs
       [not found]         ` <cb7bb73a0810022330l498bdb20h703dec7833a443e@mail.gmail.com>
@ 2008-10-03 11:24           ` Jakub Narebski
  0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-03 11:24 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git

On Fri, 3 October 2008, Giuseppe Bilotta wrote:
> On Fri, Oct 3, 2008 at 3:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Tue, 2 Oct 2008, Giuseppe Bilotta wrote:

>>> +             # Finally, we put either hash_base:file_name or hash
>>> +             if (defined $params{'hash_base'}) {
>>> +                     $href .= "/".esc_url($params{'hash_base'});
>>> +                     if (defined $params{'file_name'}) {
>>> +                             $href .= ":".esc_url($params{'file_name'});
>>> +                             delete $params{'file_name'};
>>> +                     }
>>> +                     delete $params{'hash'};
>>> +                     delete $params{'hash_base'};
>>> +             } elsif (defined $params{'hash'}) {
>>> +                     $href .= "/".esc_url($params{'hash'});
>>> +                     delete $params{'hash'};
>>> +             }
>>
>> Hmmmm...
>>
>> Shouldn't the code first check for $file_name, then add either
>> "$hash_base:$file_name" (url-escaped), or "$hash" (not "$hash_base")?
> 
> Hm, your idea is probably better indeed, if we can ensure that
> $file_name is always set for generated links that need $hash_base (I
> mean, as opposed to the root tree case we were discussing for the
> first patch).

Hmm... I'm just worrying here about diluting meaning of params passed
via path_info.  We had either project/hash, or project/hash_base:file_name;
now the hashy parameter in path_info can be hash or hash_base and we
don't know which.  But perhaps I am worrying over nothing...

A short comment though would be nice (that we can have $hash or
$hash_base for case without $file_name)
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: use_pathinfo filenames start with /
  2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
  2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
@ 2008-10-03 11:28         ` Jakub Narebski
  1 sibling, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-03 11:28 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

> When using path info, make filenames start with a / (right after the :
> that separates them from the hash base). This minimal change allows
> relative navigation to work properly when viewing HTML files.

...in 'raw' mode/'blob_plain' view.

This allows to for example view web site as it was at some revision,
following relative links and checking if they are broken.

> 
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

Acked-by: Jakub Narebski <jnareb@gmail.com>

(Note that it depends on previous patch, and without it doesn't make
sense, so this Ack doesn't matter much now!)

> ---
>  gitweb/gitweb.perl |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 2c380ac..3e5b2b7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -690,7 +690,7 @@ sub href (%) {
>  		# try to put as many parameters as possible in PATH_INFO:
>  		#   - project name
>  		#   - action
> -		#   - hash or hash_base:filename
> +		#   - hash or hash_base:/filename
>  
>  		# Strip any trailing / from $href, or we might get double
>  		# slashes when the script is the DirectoryIndex
> @@ -708,11 +708,11 @@ sub href (%) {
>  			delete $params{'action'};
>  		}
>  
> -		# Finally, we put either hash_base:file_name or hash
> +		# Finally, we put either hash_base:/file_name or hash
>  		if (defined $params{'hash_base'}) {
>  			$href .= "/".esc_url($params{'hash_base'});
>  			if (defined $params{'file_name'}) {
> -				$href .= ":".esc_url($params{'file_name'});
> +				$href .= ":/".esc_url($params{'file_name'});
>  				delete $params{'file_name'};
>  			}
>  			delete $params{'hash'};
> -- 
> 1.5.6.5
> 
> 

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: generate project/action/hash URLs
  2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
  2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
  2008-10-03  1:48       ` [PATCHv4] gitweb: generate project/action/hash URLs Jakub Narebski
@ 2008-10-04  1:15       ` Jakub Narebski
  2 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-04  1:15 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

> When generating path info URLs, reduce the number of CGI parameters by
> embedding action and hash_parent:filename or hash in the path.

I think this is good.

>---

> +		# Finally, we put either hash_base:file_name or hash
> +		if (defined $params{'hash_base'}) {
> +			$href .= "/".esc_url($params{'hash_base'});
> +			if (defined $params{'file_name'}) {
> +				$href .= ":".esc_url($params{'file_name'});
> +				delete $params{'file_name'};
> +			}
> +			delete $params{'hash'};
> +			delete $params{'hash_base'};
> +		} elsif (defined $params{'hash'}) {
> +			$href .= "/".esc_url($params{'hash'});
> +			delete $params{'hash'};
> +		}
>  	}

That I'm not sure about, both the layout of conditional (shouldn't
we check $file_name first), and the fact that we remove parameter
which is not passed, and can be even not recoverable (for example
both 'hash' and 'hash_base' set, 'hash' != 'hash_base', and
'file_name' not set).  So the code above probably has some strange
corner cases, but I guess it wouldn't be triggered by links generated
by gitweb.

But I guess that is "good enough", especially that 'tree' and 'history'
action links can have 'file_name' unset if they refer to top tree, and
they still need 'hash_base'.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse parent..current syntax from pathinfo
  2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
  2008-10-02  0:10           ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
@ 2008-10-04  1:31           ` Jakub Narebski
  2008-10-04  7:24             ` Giuseppe Bilotta
  2008-10-05  8:19             ` Jakub Narebski
  1 sibling, 2 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-04  1:31 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

> This makes it possible to use an URL such as
> $project/somebranch..otherbranch:/filename to get a diff between
> different version of a file. Paths like
> $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
> as well.
>

In short, it allows to have link to '*diff' views using path_info URL,
or in general to pass $hash_[parent_]base and $file_parent using
path_info.
 
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>  gitweb/gitweb.perl |   26 ++++++++++++++++++++++++--
>  1 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 3e5b2b7..89e360f 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -534,7 +534,9 @@ if ($path_info && !defined $action) {
>  
>  # we can now parse ref and pathnames in PATH_INFO
>  if ($path_info) {
> -	my ($refname, $pathname) = split(/:/, $path_info, 2);
> +	$path_info =~ /^((.+?)(:(.+))?\.\.)?(.+?)(:(.+))?$/;
> +	my ($parentrefname, $parentpathname, $refname, $pathname) = (
> +		$2, $4, $5, $7);

Style: I would use (but that is perhaps matter of taste)

+	my ($parentrefname, $parentpathname, $refname, $pathname) =
+		($2, $4, $5, $7);

Also it would be I think simpler to use instead non-catching grouping,
i.e. (?: xxx ) extended pattern (see perlre(1)), and use 
($1, $2, $3, $4), or even simpler  'list = (string =~ regexp)'  form.  


I also think that the situation is more complicated than that, if we
want to be more correct.  

The following path_info layouts with '..' make sense:

  hpb:fp..hb:f
  hpb..hb:f     == hpb:f..hb:f
  hp..h

And the layout below can be though to make sense, but it is just
plain weird.

  hpb:fp..f     == hpb:fp..HEAD:f
  

>  	if (defined $pathname) {
>  		# we got "project.git/action/branch:filename" or "project.git/action/branch:dir/"
>  		# we could use git_get_type(branch:pathname), but it needs $git_dir
> @@ -543,7 +545,11 @@ if ($path_info) {
>  			$input_params{'action'} ||= "tree";
>  			$pathname =~ s,/$,,;
>  		} else {
> -			$input_params{'action'}  ||= "blob_plain";
> +			if ($parentrefname) {
> +				$input_params{'action'} ||= "blobdiff_plain";
> +			} else {
> +				$input_params{'action'} ||= "blob_plain";
> +			}

Good catch.

>  		}
>  		$input_params{'hash_base'} ||= $refname;
>  		$input_params{'file_name'} ||= $pathname;
> @@ -553,6 +559,22 @@ if ($path_info) {
>  		$input_params{'hash'}      ||= $refname;
>  		$input_params{'hash_base'} ||= $refname;
>  	}
> +	# the parent part might be missing the pathname, in which case we use the $file_name, if present
> +	if (defined $parentrefname) {
> +		$input_params{'hash_parent_base'} ||= $parentrefname;
> +		if ($parentpathname) {
> +			$parentpathname =~ s,^/+,,;
> +			$parentpathname =~ s,/$,,;
> +			$input_params{'file_parent'} ||= $parentpathname;
> +		} else {
> +			$input_params{'file_parent'} ||= $input_params{'file_name'};
> +		}
> +		if (defined $input_params{'file_parent'}) {
> +			$input_params{'hash_parent'} ||= git_get_hash_by_path($input_params{'hash_parent_base'}, $input_params{'file_parent'});

This line is bit long, and I think it should be wrapped..

> +		} else {
> +			$input_params{'hash_parent'} ||= $parentrefname;
> +		}
> +	}
>  }

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse parent..current syntax from pathinfo
  2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
@ 2008-10-04  7:24             ` Giuseppe Bilotta
  2008-10-04  7:48               ` Jakub Narebski
  2008-10-05  8:19             ` Jakub Narebski
  1 sibling, 1 reply; 34+ messages in thread
From: Giuseppe Bilotta @ 2008-10-04  7:24 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Sat, Oct 4, 2008 at 3:31 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
>
>> This makes it possible to use an URL such as
>> $project/somebranch..otherbranch:/filename to get a diff between
>> different version of a file. Paths like
>> $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
>> as well.
>>
>
> In short, it allows to have link to '*diff' views using path_info URL,
> or in general to pass $hash_[parent_]base and $file_parent using
> path_info.

Yes, that's probably a better form for the commit message.

>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
>> ---
>>  gitweb/gitweb.perl |   26 ++++++++++++++++++++++++--
>>  1 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 3e5b2b7..89e360f 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -534,7 +534,9 @@ if ($path_info && !defined $action) {
>>
>>  # we can now parse ref and pathnames in PATH_INFO
>>  if ($path_info) {
>> -     my ($refname, $pathname) = split(/:/, $path_info, 2);
>> +     $path_info =~ /^((.+?)(:(.+))?\.\.)?(.+?)(:(.+))?$/;
>> +     my ($parentrefname, $parentpathname, $refname, $pathname) = (
>> +             $2, $4, $5, $7);
>
> Style: I would use (but that is perhaps matter of taste)
>
> +       my ($parentrefname, $parentpathname, $refname, $pathname) =
> +               ($2, $4, $5, $7);

Right, I'm not sure why I put the ( on the previous line.

> Also it would be I think simpler to use instead non-catching grouping,
> i.e. (?: xxx ) extended pattern (see perlre(1)), and use
> ($1, $2, $3, $4), or even simpler  'list = (string =~ regexp)'  form.

Good idea, I'll rework it in that sense.

> I also think that the situation is more complicated than that, if we
> want to be more correct.
>
> The following path_info layouts with '..' make sense:
>
>  hpb:fp..hb:f
>  hpb..hb:f     == hpb:f..hb:f
>  hp..h

And these are matched by the above regexp

> And the layout below can be though to make sense, but it is just
> plain weird.
>
>  hpb:fp..f     == hpb:fp..HEAD:f

I'm afraid I'm not going to support that, although it's probably easy
to support hpb:fp..:f (i.e. accept a missing refname but on condition
of having a : in front of the file spec).

>> +             if (defined $input_params{'file_parent'}) {
>> +                     $input_params{'hash_parent'} ||= git_get_hash_by_path($input_params{'hash_parent_base'}, $input_params{'file_parent'});
>
> This line is bit long, and I think it should be wrapped..

By the way, on the first revision of the path_info patchset, you had me discard

$hash      ||= git_get_hash_by_path($hash_base, $file_name);

in the simple case on the basis that it was an extra call to external git.

I actually forgot to remove it from this part of the patchset too at
the time, so this gets me wondering about this: should I put it back
in place in the simple case, or remove it from here too?


-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse parent..current syntax from pathinfo
  2008-10-04  7:24             ` Giuseppe Bilotta
@ 2008-10-04  7:48               ` Jakub Narebski
  0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-04  7:48 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

Giuseppe Bilotta wrote:
> On Sat, Oct 4, 2008 at 3:31 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
>>
>>> This makes it possible to use an URL such as
>>> $project/somebranch..otherbranch:/filename to get a diff between
>>> different version of a file. Paths like
>>> $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
>>> as well.
>>>
>>
>> In short, it allows to have link to '*diff' views using path_info URL,
>> or in general to pass $hash_[parent_]base and $file_parent using
>> path_info.
> 
> Yes, that's probably a better form for the commit message.

I have thought about this rather as supplement (addition) to the
current commit message (which states explicitly new form of supported
path_info URL), than replacing it.
 
>> The following path_info layouts with '..' make sense:
>>
>>  hpb:fp..hb:f
>>  hpb..hb:f     == hpb:f..hb:f
>>  hp..h
> 
> And these are matched by the above regexp
> 
>> And the layout below can be though to make sense, but it is just
>> plain weird.
>>
>>  hpb:fp..f     == hpb:fp..HEAD:f
> 
> I'm afraid I'm not going to support that, although it's probably easy
> to support hpb:fp..:f (i.e. accept a missing refname but on condition
> of having a : in front of the file spec).

No, not supporting this form is just fine.
 
>>> +             if (defined $input_params{'file_parent'}) {
>>> +                     $input_params{'hash_parent'} ||= git_get_hash_by_path($input_params{'hash_parent_base'}, $input_params{'file_parent'});
>>
>> This line is bit long, and I think it should be wrapped..
> 
> By the way, on the first revision of the path_info patchset, you had me discard
> 
> $hash      ||= git_get_hash_by_path($hash_base, $file_name);
> 
> in the simple case on the basis that it was an extra call to external git.
> 
> I actually forgot to remove it from this part of the patchset too at
> the time, so this gets me wondering about this: should I put it back
> in place in the simple case, or remove it from here too?

I think you should remove it here too.  IMHO if needed, it should be
dealt with (and I think is dealt with) in appropriate action subroutine.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: parse parent..current syntax from pathinfo
  2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
  2008-10-04  7:24             ` Giuseppe Bilotta
@ 2008-10-05  8:19             ` Jakub Narebski
  1 sibling, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-05  8:19 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Sun, 4 Oct 2008, Jakub Narebski wrote:
> On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:
 
> > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> > index 3e5b2b7..89e360f 100755
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -534,7 +534,9 @@ if ($path_info && !defined $action) {
> >  
> >  # we can now parse ref and pathnames in PATH_INFO
> >  if ($path_info) {
> > -     my ($refname, $pathname) = split(/:/, $path_info, 2);
> > +     $path_info =~ /^((.+?)(:(.+))?\.\.)?(.+?)(:(.+))?$/;
> > +     my ($parentrefname, $parentpathname, $refname, $pathname) = (
> > +             $2, $4, $5, $7);
> 
> Style: I would use (but that is perhaps matter of taste)
> 
> +       my ($parentrefname, $parentpathname, $refname, $pathname) =
> +               ($2, $4, $5, $7);
> 
> Also it would be I think simpler to use instead non-catching grouping,
> i.e. (?: xxx ) extended pattern (see perlre(1)), and use 
> ($1, $2, $3, $4), or even simpler  'list = (string =~ regexp)'  form.  

Alternate solution would be to use split(..., ..., 2), but I think you
got the regular expression right.  Just mentioning...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCHv4] gitweb: generate parent..current URLs
  2008-10-02  0:10           ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
@ 2008-10-06  0:17             ` Jakub Narebski
  0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2008-10-06  0:17 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano, Shawn O. Pearce

On Thu, 2 Oct 2008, Giuseppe Bilotta wrote:

> If use_pathinfo is enabled, href now creates links that contain paths in
> the form $project/$action/oldhash:/oldname..newhash:/newname for actions
> that use hash_parent etc.
[cut]

>From the first glance, it looks good. I just worry a bit about
complicated issue of hash_parent vs hash_parent_base etc.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2008-10-06  0:18 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-02  0:10 [PATCHv4] gitweb: PATH_INFO support improvements Giuseppe Bilotta
2008-10-02  0:10 ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-02  0:10   ` [PATCHv4] gitweb: refactor input parameters parse/validation Giuseppe Bilotta
2008-10-02  0:10     ` [PATCHv4] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-02  0:10       ` [PATCHv4] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-02  0:10         ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
2008-10-02  0:10           ` [PATCHv4] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-06  0:17             ` Jakub Narebski
2008-10-04  1:31           ` [PATCHv4] gitweb: parse parent..current syntax from pathinfo Jakub Narebski
2008-10-04  7:24             ` Giuseppe Bilotta
2008-10-04  7:48               ` Jakub Narebski
2008-10-05  8:19             ` Jakub Narebski
2008-10-03 11:28         ` [PATCHv4] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-03  1:48       ` [PATCHv4] gitweb: generate project/action/hash URLs Jakub Narebski
     [not found]         ` <cb7bb73a0810022330l498bdb20h703dec7833a443e@mail.gmail.com>
2008-10-03 11:24           ` Jakub Narebski
2008-10-04  1:15       ` Jakub Narebski
2008-10-03  1:36     ` [PATCHv4] gitweb: refactor input parameters parse/validation Jakub Narebski
2008-10-03  7:24       ` Giuseppe Bilotta
2008-10-03 11:20         ` Jakub Narebski
2008-10-02  8:59   ` [PATCHv4] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
2008-10-02  9:43     ` Giuseppe Bilotta
2008-10-03  0:48       ` Jakub Narebski
2008-10-03  6:04         ` Giuseppe Bilotta
2008-10-03 10:31           ` Jakub Narebski
2008-10-02 15:34   ` Petr Baudis
2008-10-02 19:30     ` Giuseppe Bilotta
2008-10-02 20:56       ` Petr Baudis
2008-10-02 21:05         ` Giuseppe Bilotta
2008-10-02 22:04           ` Petr Baudis
2008-10-02 22:41             ` Jakub Narebski
2008-10-03  5:54               ` Giuseppe Bilotta
2008-10-02  8:19 ` [PATCHv4] gitweb: PATH_INFO support improvements Jakub Narebski
2008-10-02  8:49   ` Giuseppe Bilotta
2008-10-02 10:16     ` Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).