git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 0/6] gitweb pathinfo improvements
@ 2008-09-21 20:57 Giuseppe Bilotta
  2008-09-21 20:57 ` [PATCH 1/6] gitweb: action in path with use_pathinfo Giuseppe Bilotta
  0 siblings, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-21 20:57 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Lea Wiemann, Junio C Hamano,
	Giuseppe Bilotta

This is a resend, with some improvements and a proper cover
letter, of my patchset to extend PATH_INFO support in gitweb.
Hopefully I'm doing it the right way this time :)

The basic idea is to have gitweb support paths in the form of
project/action/parent:/filename..hash:/filename
(modulo missing parameters) and to generate them when use_pathinfo
is enabled.

For backwards compatibility, old-style urls $project/$hash are
still supported (unless $hash is a named ref that happens to
coincide with a gitweb action). Also, CGI parameters are still
used when the path_info form would be ambiguous (e.g. filenames
with two consecutive dots in their name).

Giuseppe Bilotta (6):
  gitweb: action in path with use_pathinfo
  gitweb: use_pathinfo filenames start with /
  gitweb: parse parent..current syntax from pathinfo
  gitweb: use_pathinfo creates parent..current paths
  gitweb: remove PATH_INFO from $my_url and $my_uri
  gitweb: prevent double slashes in PATH_INFO hrefs

 gitweb/gitweb.perl |  161 +++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 122 insertions(+), 39 deletions(-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-21 20:57 [PATCHv2 0/6] gitweb pathinfo improvements Giuseppe Bilotta
@ 2008-09-21 20:57 ` Giuseppe Bilotta
  2008-09-21 20:57   ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
  2008-09-29  1:03   ` [PATCH 1/6] gitweb: action in path with use_pathinfo Jakub Narebski
  0 siblings, 2 replies; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-21 20:57 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Lea Wiemann, Junio C Hamano,
	Giuseppe Bilotta

When using path info, reduce the number of CGI parameters by embedding
the action in the path. The typicial cgiweb path is thus
$project/$action/$hash_base:$file_name or $project/$action/$hash

This is mostly backwards compatible with the old-style gitweb paths,
except when $project/$branch was used to access a branch whose name
matches a gitweb action.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |  109 ++++++++++++++++++++++++++++++++++------------------
 1 files changed, 72 insertions(+), 37 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index da474d0..e783d12 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -488,6 +488,37 @@ if (defined $searchtext) {
 	$search_regexp = $search_use_regexp ? $searchtext : quotemeta $searchtext;
 }
 
+# dispatch
+my %actions = (
+	"blame" => \&git_blame,
+	"blobdiff" => \&git_blobdiff,
+	"blobdiff_plain" => \&git_blobdiff_plain,
+	"blob" => \&git_blob,
+	"blob_plain" => \&git_blob_plain,
+	"commitdiff" => \&git_commitdiff,
+	"commitdiff_plain" => \&git_commitdiff_plain,
+	"commit" => \&git_commit,
+	"forks" => \&git_forks,
+	"heads" => \&git_heads,
+	"history" => \&git_history,
+	"log" => \&git_log,
+	"rss" => \&git_rss,
+	"atom" => \&git_atom,
+	"search" => \&git_search,
+	"search_help" => \&git_search_help,
+	"shortlog" => \&git_shortlog,
+	"summary" => \&git_summary,
+	"tag" => \&git_tag,
+	"tags" => \&git_tags,
+	"tree" => \&git_tree,
+	"snapshot" => \&git_snapshot,
+	"object" => \&git_object,
+	# those below don't need $project
+	"opml" => \&git_opml,
+	"project_list" => \&git_project_list,
+	"project_index" => \&git_project_index,
+);
+
 # now read PATH_INFO and use it as alternative to parameters
 sub evaluate_path_info {
 	return if defined $project;
@@ -512,6 +543,16 @@ sub evaluate_path_info {
 	# do not change any parameters if an action is given using the query string
 	return if $action;
 	$path_info =~ s,^\Q$project\E/*,,;
+
+	# next comes the action
+	$action = $path_info;
+	$action =~ s,/.*$,,;
+	if (exists $actions{$action}) {
+		$path_info =~ s,^\Q$action\E/*,,;
+	} else {
+		$action  = undef;
+	}
+
 	my ($refname, $pathname) = split(/:/, $path_info, 2);
 	if (defined $pathname) {
 		# we got "project.git/branch:filename" or "project.git/branch:dir/"
@@ -525,10 +566,12 @@ sub evaluate_path_info {
 		}
 		$hash_base ||= validate_refname($refname);
 		$file_name ||= validate_pathname($pathname);
+		$hash      ||= git_get_hash_by_path($hash_base, $file_name);
 	} elsif (defined $refname) {
 		# we got "project.git/branch"
-		$action ||= "shortlog";
-		$hash   ||= validate_refname($refname);
+		$action    ||= "shortlog";
+		$hash      ||= validate_refname($refname);
+		$hash_base ||= validate_refname($refname);
 	}
 }
 evaluate_path_info();
@@ -537,37 +580,6 @@ evaluate_path_info();
 our $git_dir;
 $git_dir = "$projectroot/$project" if $project;
 
-# dispatch
-my %actions = (
-	"blame" => \&git_blame,
-	"blobdiff" => \&git_blobdiff,
-	"blobdiff_plain" => \&git_blobdiff_plain,
-	"blob" => \&git_blob,
-	"blob_plain" => \&git_blob_plain,
-	"commitdiff" => \&git_commitdiff,
-	"commitdiff_plain" => \&git_commitdiff_plain,
-	"commit" => \&git_commit,
-	"forks" => \&git_forks,
-	"heads" => \&git_heads,
-	"history" => \&git_history,
-	"log" => \&git_log,
-	"rss" => \&git_rss,
-	"atom" => \&git_atom,
-	"search" => \&git_search,
-	"search_help" => \&git_search_help,
-	"shortlog" => \&git_shortlog,
-	"summary" => \&git_summary,
-	"tag" => \&git_tag,
-	"tags" => \&git_tags,
-	"tree" => \&git_tree,
-	"snapshot" => \&git_snapshot,
-	"object" => \&git_object,
-	# those below don't need $project
-	"opml" => \&git_opml,
-	"project_list" => \&git_project_list,
-	"project_index" => \&git_project_index,
-);
-
 if (!defined $action) {
 	if (defined $hash) {
 		$action = git_get_type($hash);
@@ -624,8 +636,13 @@ sub href (%) {
 	if ($params{-replay}) {
 		while (my ($name, $symbol) = each %mapping) {
 			if (!exists $params{$name}) {
-				# to allow for multivalued params we use arrayref form
-				$params{$name} = [ $cgi->param($symbol) ];
+				if ($cgi->param($symbol)) {
+					# to allow for multivalued params we use arrayref form
+					$params{$name} = [ $cgi->param($symbol) ];
+				} else {
+					no strict 'refs';
+					$params{$name} = $$name if $$name;
+				}
 			}
 		}
 	}
@@ -636,10 +653,28 @@ sub href (%) {
 		$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
 		delete $params{'project'};
 
-		# Summary just uses the project path URL
-		if (defined $params{'action'} && $params{'action'} eq 'summary') {
+		# Summary just uses the project path URL, any other action come next
+		# in the URL
+		if (defined $params{'action'}) {
+			$href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary';
 			delete $params{'action'};
 		}
+
+		# next, we put either hash_base:file_name or hash
+		if (defined $params{'hash_base'}) {
+			$href .= "/".esc_url($params{'hash_base'});
+			if (defined $params{'file_name'}) {
+				$href .= ":".esc_url($params{'file_name'});
+				delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
+				delete $params{'file_name'};
+			} else {
+				delete $params{'hash'} if $params{'hash'} eq $params{'hash_base'};
+			}
+			delete $params{'hash_base'};
+		} elsif (defined $params{'hash'}) {
+			$href .= "/".esc_url($params{'hash'});
+			delete $params{'hash'};
+		}
 	}
 
 	# now encode the parameters explicitly
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/6] gitweb: use_pathinfo filenames start with /
  2008-09-21 20:57 ` [PATCH 1/6] gitweb: action in path with use_pathinfo Giuseppe Bilotta
@ 2008-09-21 20:57   ` Giuseppe Bilotta
  2008-09-21 20:57     ` [PATCH 3/6] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
  2008-09-29  1:08     ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Jakub Narebski
  2008-09-29  1:03   ` [PATCH 1/6] gitweb: action in path with use_pathinfo Jakub Narebski
  1 sibling, 2 replies; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-21 20:57 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Lea Wiemann, Junio C Hamano,
	Giuseppe Bilotta

When using path info, make filenames start with a / (right after the :
that separates them from the hash base). This minimal change allows
relative navigation to work properly when viewing HTML files.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e783d12..18da484 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -664,7 +664,7 @@ sub href (%) {
 		if (defined $params{'hash_base'}) {
 			$href .= "/".esc_url($params{'hash_base'});
 			if (defined $params{'file_name'}) {
-				$href .= ":".esc_url($params{'file_name'});
+				$href .= ":/".esc_url($params{'file_name'});
 				delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
 				delete $params{'file_name'};
 			} else {
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/6] gitweb: parse parent..current syntax from pathinfo
  2008-09-21 20:57   ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
@ 2008-09-21 20:57     ` Giuseppe Bilotta
  2008-09-21 20:57       ` [PATCH 4/6] gitweb: use_pathinfo creates parent..current paths Giuseppe Bilotta
  2008-09-29  1:08     ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Jakub Narebski
  1 sibling, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-21 20:57 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Lea Wiemann, Junio C Hamano,
	Giuseppe Bilotta

This makes it possible to use an URL such as
$project/somebranch..otherbranch:/filename to get a diff between
different version of a file. Paths like
$project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
as well.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |   26 ++++++++++++++++++++++++--
 1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 18da484..9868bf4 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -553,7 +553,9 @@ sub evaluate_path_info {
 		$action  = undef;
 	}
 
-	my ($refname, $pathname) = split(/:/, $path_info, 2);
+	$path_info =~ /^((.+?)(:(.+))?\.\.)?(.+?)(:(.+))?$/;
+	my ($parentrefname, $parentpathname, $refname, $pathname) = (
+		$2, $4, $5, $7);
 	if (defined $pathname) {
 		# we got "project.git/branch:filename" or "project.git/branch:dir/"
 		# we could use git_get_type(branch:pathname), but it needs $git_dir
@@ -562,7 +564,11 @@ sub evaluate_path_info {
 			$action  ||= "tree";
 			$pathname =~ s,/$,,;
 		} else {
-			$action  ||= "blob_plain";
+			if ($parentrefname) {
+				$action ||= "blobdiff_plain";
+			} else {
+				$action  ||= "blob_plain";
+			}
 		}
 		$hash_base ||= validate_refname($refname);
 		$file_name ||= validate_pathname($pathname);
@@ -573,6 +579,22 @@ sub evaluate_path_info {
 		$hash      ||= validate_refname($refname);
 		$hash_base ||= validate_refname($refname);
 	}
+	# the parent part might be missing the pathname, in which case we use the $file_name, if present
+	if (defined $parentrefname) {
+		$hash_parent_base ||= validate_refname($parentrefname);
+		if ($parentpathname) {
+			$parentpathname =~ s,^/+,,;
+			$parentpathname =~ s,/$,,;
+			$file_parent ||= validate_pathname($parentpathname);
+		} else {
+			$file_parent ||= $file_name
+		}
+		if (defined $file_parent) {
+			$hash_parent ||= git_get_hash_by_path($hash_parent_base, $file_parent);
+		} else {
+			$hash_parent ||= validate_refname($parentrefname);
+		}
+	}
 }
 evaluate_path_info();
 
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/6] gitweb: use_pathinfo creates parent..current paths
  2008-09-21 20:57     ` [PATCH 3/6] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
@ 2008-09-21 20:57       ` Giuseppe Bilotta
  2008-09-21 20:57         ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Giuseppe Bilotta
  0 siblings, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-21 20:57 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Lea Wiemann, Junio C Hamano,
	Giuseppe Bilotta

If use_pathinfo is enabled, href now creates links that contain paths in
the form $project/$action/oldhash:/oldname..newhash:/newname for actions
that use hash_parent etc, unless oldname contains two consecutive dots, since
this would cause ambiguity when parsing the resulting path.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |   26 +++++++++++++++++++++++---
 1 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 9868bf4..0dd2526 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -682,9 +682,29 @@ sub href (%) {
 			delete $params{'action'};
 		}
 
-		# next, we put either hash_base:file_name or hash
+		# next, we put hash_parent_base:/file_parent..hash_base:/file_name, stripping nonexistent or useless pieces
+		$href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'} || $params{'hash_parent'} || $params{'hash'});
 		if (defined $params{'hash_base'}) {
-			$href .= "/".esc_url($params{'hash_base'});
+			if (defined $params{'hash_parent_base'}) {
+				$href .= esc_url($params{'hash_parent_base'});
+				if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) {
+					$href .= ":/".esc_url($params{'file_parent'}) unless $params{'file_parent'} eq $params{'file_name'};
+					delete $params{'hash_parent'} if $params{'hash_parent'} eq git_get_hash_by_path($params{'hash_parent_base'},$params{'file_parent'});
+					delete $params{'file_parent'};
+				} else {
+					delete $params{'hash_parent'} if $params{'hash_parent'} eq $params{'hash_parent_base'};
+					if ($params{'file_name'}) {
+						delete $params{'hash_parent'} if $params{'hash_parent'} eq git_get_hash_by_path($params{'hash_parent_base'},$params{'file_name'});
+					}
+				}
+				$href .= "..";
+				delete $params{'hash_parent_base'};
+			} elsif (defined $params{'hash_parent'}) {
+				$href .= esc_url($params{'hash_parent'}). "..";
+				delete $params{'hash_parent'};
+			}
+
+			$href .= esc_url($params{'hash_base'});
-			if (defined $params{'file_name'}) {
+			if (defined $params{'file_name'} && $params{'file_name'} !~ /\.\./) {
 				$href .= ":/".esc_url($params{'file_name'});
 				delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
@@ -694,7 +714,7 @@ sub href (%) {
 			}
 			delete $params{'hash_base'};
 		} elsif (defined $params{'hash'}) {
-			$href .= "/".esc_url($params{'hash'});
+			$href .= esc_url($params{'hash'});
 			delete $params{'hash'};
 		}
 	}
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri
  2008-09-21 20:57       ` [PATCH 4/6] gitweb: use_pathinfo creates parent..current paths Giuseppe Bilotta
@ 2008-09-21 20:57         ` Giuseppe Bilotta
  2008-09-21 20:57           ` [PATCH 6/6] gitweb: prevent double slashes in PATH_INFO hrefs Giuseppe Bilotta
  2008-09-29  8:33           ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Jakub Narebski
  0 siblings, 2 replies; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-21 20:57 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Lea Wiemann, Junio C Hamano,
	Giuseppe Bilotta

This patch (combined with appropriate server configuration) allows usage
of the gitweb CGI script as DirectoryIndex for the server root even when
the pathinfo feature is enabled.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0dd2526..4a91d07 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -26,6 +26,10 @@ our $cgi = new CGI;
 our $version = "++GIT_VERSION++";
 our $my_url = $cgi->url();
 our $my_uri = $cgi->url(-absolute => 1);
+if (my $path_info = $ENV{"PATH_INFO"}) {
+	$my_url =~ s,$path_info$,,;
+	$my_uri =~ s,$path_info$,,;
+}
 
 # core git executable to use
 # this can just be "git" if your webserver has a sensible PATH
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 6/6] gitweb: prevent double slashes in PATH_INFO hrefs
  2008-09-21 20:57         ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Giuseppe Bilotta
@ 2008-09-21 20:57           ` Giuseppe Bilotta
  2008-09-29 18:12             ` Jakub Narebski
  2008-09-29  8:33           ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Jakub Narebski
  1 sibling, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-21 20:57 UTC (permalink / raw)
  To: git
  Cc: Jakub Narebski, Petr Baudis, Lea Wiemann, Junio C Hamano,
	Giuseppe Bilotta

When using PATH_INFO in combination with a rewrite rule that hides the
cgi script name, links to projects and/or actions without projects might
be generated with a double slash.

Fix by removing the trailing slash (if present) from $href before
appending PATH_INFO data.

Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
 gitweb/gitweb.perl |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4a91d07..ebab86b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -675,6 +675,8 @@ sub href (%) {
 
 	my ($use_pathinfo) = gitweb_check_feature('pathinfo');
 	if ($use_pathinfo) {
+		$href =~ s,/$,,;
+
 		# use PATH_INFO for project name
 		$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
 		delete $params{'project'};
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-21 20:57 ` [PATCH 1/6] gitweb: action in path with use_pathinfo Giuseppe Bilotta
  2008-09-21 20:57   ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
@ 2008-09-29  1:03   ` Jakub Narebski
  2008-09-29 14:22     ` Giuseppe Bilotta
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2008-09-29  1:03 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann, Junio C Hamano

I'm sorry for the delay reviewing those patches.

On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:

> When using path info, reduce the number of CGI parameters by embedding
> the action in the path. The typicial cgiweb path is thus
> $project/$action/$hash_base:$file_name or $project/$action/$hash

cgiweb?

> 
> This is mostly backwards compatible with the old-style gitweb paths,
> except when $project/$branch was used to access a branch whose name
> matches a gitweb action.

I would also state that gitweb _generates_ such pathinfo links if
configured (or if coming from pahinfo URL), and that this change
allow to represent wider number of gitweb links (gitweb URLs) in
pathinfo form.

Also, from what I understand, generated pathinfo links now always
use action, so they are a tiny little bit longer.

> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>  gitweb/gitweb.perl |  109 ++++++++++++++++++++++++++++++++++------------------
>  1 files changed, 72 insertions(+), 37 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index da474d0..e783d12 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -488,6 +488,37 @@ if (defined $searchtext) {
>  	$search_regexp = $search_use_regexp ? $searchtext : quotemeta $searchtext;
>  }
>  
> +# dispatch
> +my %actions = (
> +	"blame" => \&git_blame,
> +	"blobdiff" => \&git_blobdiff,
> +	"blobdiff_plain" => \&git_blobdiff_plain,
> +	"blob" => \&git_blob,
> +	"blob_plain" => \&git_blob_plain,
> +	"commitdiff" => \&git_commitdiff,
> +	"commitdiff_plain" => \&git_commitdiff_plain,
> +	"commit" => \&git_commit,
> +	"forks" => \&git_forks,
> +	"heads" => \&git_heads,
> +	"history" => \&git_history,
> +	"log" => \&git_log,
> +	"rss" => \&git_rss,
> +	"atom" => \&git_atom,
> +	"search" => \&git_search,
> +	"search_help" => \&git_search_help,
> +	"shortlog" => \&git_shortlog,
> +	"summary" => \&git_summary,
> +	"tag" => \&git_tag,
> +	"tags" => \&git_tags,
> +	"tree" => \&git_tree,
> +	"snapshot" => \&git_snapshot,
> +	"object" => \&git_object,
> +	# those below don't need $project
> +	"opml" => \&git_opml,
> +	"project_list" => \&git_project_list,
> +	"project_index" => \&git_project_index,
> +);
> +

This is as I understand simply moving %actions hash around?
Well, you could instead split hash declaration from defining it,
in the form of

   my %actions = ();
   ...
   %actions = (
        ...
   );

but I guess moving declaration earlier is good solution.

>  # now read PATH_INFO and use it as alternative to parameters
>  sub evaluate_path_info {
>  	return if defined $project;
> @@ -512,6 +543,16 @@ sub evaluate_path_info {
>  	# do not change any parameters if an action is given using the query string
>  	return if $action;
>  	$path_info =~ s,^\Q$project\E/*,,;
> +
> +	# next comes the action
> +	$action = $path_info;
> +	$action =~ s,/.*$,,;

I would use perhaps "$action = ($path_info =~ m!^([^/]+)/!;"
But that is Perl, so TIMTOWDI.

> +	if (exists $actions{$action}) {
> +		$path_info =~ s,^\Q$action\E/*,,;
> +	} else {
> +		$action  = undef;
> +	}
> +

I don't think you need quoting pattern metacharacters; valid actions
should not contain regexp metacharacters.  Defensive programming?

>  	my ($refname, $pathname) = split(/:/, $path_info, 2);
>  	if (defined $pathname) {
>  		# we got "project.git/branch:filename" or "project.git/branch:dir/"
> @@ -525,10 +566,12 @@ sub evaluate_path_info {
>  		}
>  		$hash_base ||= validate_refname($refname);
>  		$file_name ||= validate_pathname($pathname);
> +		$hash      ||= git_get_hash_by_path($hash_base, $file_name);

I don't understand why you feel that you need to do this (this is
additional git command fork, as git_get_hash_by_path calls Git, to
be more exact it calls git-ls-tree (it could call git-rev-parse
instead).  Moreover, I don't understand why you need to do this _here_,
instead of just before where you would have to have $hash variable set.

>  	} elsif (defined $refname) {
>  		# we got "project.git/branch"
> -		$action ||= "shortlog";
> -		$hash   ||= validate_refname($refname);
> +		$action    ||= "shortlog";
> +		$hash      ||= validate_refname($refname);
> +		$hash_base ||= validate_refname($refname);
>  	}
>  }
>  evaluate_path_info();
> @@ -624,8 +636,13 @@ sub href (%) {
>  	if ($params{-replay}) {
>  		while (my ($name, $symbol) = each %mapping) {
>  			if (!exists $params{$name}) {
> -				# to allow for multivalued params we use arrayref form
> -				$params{$name} = [ $cgi->param($symbol) ];
> +				if ($cgi->param($symbol)) {
> +					# to allow for multivalued params we use arrayref form
> +					$params{$name} = [ $cgi->param($symbol) ];
> +				} else {
> +					no strict 'refs';
> +					$params{$name} = $$name if $$name;
> +				}
>  			}
>  		}
>  	}

What this change is about? And why this change is _here_, in this
commit? It is I think unrelated, and wrong change. 

href(..., -replay=>1) is all about reusing current URL, perhaps with
a few parameters changed, like for example pagination links differ only
in page number param change.  For example if we had only hb= and f=
parameters, -replay=>1 links should use only those, and not add h=
parameter because somewhere we felt that we need $hash to be calculated.

> @@ -636,10 +653,28 @@ sub href (%) {

This hunk is about generating pathinfo URL, isn't it?

You probably would want to change comment at the top of this part
of href() subroutine, namely

  	if ($use_pathinfo) {
		# use PATH_INFO for project name

as you now try to use PATH_INFO for more than project name. Please do
not leave comments to get out of sync with the code.

>  		$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
>  		delete $params{'project'};
>  
> -		# Summary just uses the project path URL
> -		if (defined $params{'action'} && $params{'action'} eq 'summary') {
> +		# Summary just uses the project path URL, any other action come next
> +		# in the URL
> +		if (defined $params{'action'}) {
> +			$href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary';
>  			delete $params{'action'};
>  		}
> +
> +		# next, we put either hash_base:file_name or hash
> +		if (defined $params{'hash_base'}) {
> +			$href .= "/".esc_url($params{'hash_base'});
> +			if (defined $params{'file_name'}) {
> +				$href .= ":".esc_url($params{'file_name'});
> +				delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});

First, this page has around 140 characters. That is too long, much too long.
Please try to wrap code around 80-characters.

Second, following Petr 'Pasky' Baudis suggestion of reducing complexity
and shortening gitweb URLs, we could unconditionally remove redundant
'hash' parameter if we have both 'hash_base' and 'file_name'
parameters.  This would also simplify and speed up (lack of extra fork)
gitweb code.

> +				delete $params{'file_name'};
> +			} else {
> +				delete $params{'hash'} if $params{'hash'} eq $params{'hash_base'};
> +			}
> +			delete $params{'hash_base'};
> +		} elsif (defined $params{'hash'}) {
> +			$href .= "/".esc_url($params{'hash'});
> +			delete $params{'hash'};
> +		}

O.K.... I think.

Did you test this, preferably also using Mechanize test, with gitweb
configuration turning path_info on by default.?

>  	}
>  
>  	# now encode the parameters explicitly

Hmmm... now I am not so sure if it wouldn't be better to split this
patch in pathinfo parsing and pathinfo generation. The first part
would be obvious, the second part would be smaller and easier to review.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] gitweb: use_pathinfo filenames start with /
  2008-09-21 20:57   ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
  2008-09-21 20:57     ` [PATCH 3/6] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
@ 2008-09-29  1:08     ` Jakub Narebski
  2008-09-29 14:12       ` Giuseppe Bilotta
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2008-09-29  1:08 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann, Junio C Hamano

On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:

> When using path info, make filenames start with a / (right after the :
> that separates them from the hash base). This minimal change allows
> relative navigation to work properly when viewing HTML files.

Good idea. Nitpick: instead of "using path info", perhaps "generating
path info URL"; this change is about gitweb link generation...

Did you check if gitweb strips leading '/' from filename?

> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index e783d12..18da484 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -664,7 +664,7 @@ sub href (%) {
>  		if (defined $params{'hash_base'}) {
>  			$href .= "/".esc_url($params{'hash_base'});
>  			if (defined $params{'file_name'}) {
> -				$href .= ":".esc_url($params{'file_name'});
> +				$href .= ":/".esc_url($params{'file_name'});
>  				delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
>  				delete $params{'file_name'};
>  			} else {
> -- 
> 1.5.6.5

Is there reason why this change is separate (not squashed) from
previous commit?

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri
  2008-09-21 20:57         ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Giuseppe Bilotta
  2008-09-21 20:57           ` [PATCH 6/6] gitweb: prevent double slashes in PATH_INFO hrefs Giuseppe Bilotta
@ 2008-09-29  8:33           ` Jakub Narebski
  2008-09-29 13:05             ` Giuseppe Bilotta
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2008-09-29  8:33 UTC (permalink / raw)
  To: Giuseppe Bilotta
  Cc: git, Petr Baudis, Lea Wiemann,
	Shawn O. Pearce (interim Git maintainer)

On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:

> This patch (combined with appropriate server configuration) allows usage
> of the gitweb CGI script as DirectoryIndex for the server root even when
> the pathinfo feature is enabled.
>

This is IMHO a bugfix for a path_info handling bug, for which there
was an ugly workaround of specifying base URL ($my_url and $my_uri)
explicitly in gitweb configuration (GITWEB_CONFIG).

Therefore I think that this patch should have been sent outside of
the rest of "new path_info features" series, as a separate single
patch, and now that it is there it really should be applied, perhaps
even to the 'maint' branch.

> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

Acked-by: Jakub Narebski <jnareb@gmail.com>
(for what is worth)

> ---
>  gitweb/gitweb.perl |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 0dd2526..4a91d07 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -26,6 +26,10 @@ our $cgi = new CGI;
>  our $version = "++GIT_VERSION++";
>  our $my_url = $cgi->url();
>  our $my_uri = $cgi->url(-absolute => 1);

Perhaps some comment here?

> +if (my $path_info = $ENV{"PATH_INFO"}) {
> +	$my_url =~ s,$path_info$,,;
> +	$my_uri =~ s,$path_info$,,;

+	$my_url =~ s,\Q$path_info\E$,,;
+	$my_uri =~ s,\Q$path_info\E$,,;

Just in case.

> +}
>  
>  # core git executable to use
>  # this can just be "git" if your webserver has a sensible PATH

I was wondering if $path_info should be global variable, but then
I checked that $path_info is local to evaluate_path_info() subroutine.
So it is good as it is now, but with quoting regular expression
metacharacters.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri
  2008-09-29  8:33           ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Jakub Narebski
@ 2008-09-29 13:05             ` Giuseppe Bilotta
  0 siblings, 0 replies; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-29 13:05 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: git, Petr Baudis, Lea Wiemann,
	Shawn O. Pearce (interim Git maintainer)

On Mon, Sep 29, 2008 at 10:33 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:
>
>> This patch (combined with appropriate server configuration) allows usage
>> of the gitweb CGI script as DirectoryIndex for the server root even when
>> the pathinfo feature is enabled.
>>
>
> This is IMHO a bugfix for a path_info handling bug, for which there
> was an ugly workaround of specifying base URL ($my_url and $my_uri)
> explicitly in gitweb configuration (GITWEB_CONFIG).
>
> Therefore I think that this patch should have been sent outside of
> the rest of "new path_info features" series, as a separate single
> patch, and now that it is there it really should be applied, perhaps
> even to the 'maint' branch.

Ok, I'll resend it separately, with the suggested fix.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] gitweb: use_pathinfo filenames start with /
  2008-09-29  1:08     ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Jakub Narebski
@ 2008-09-29 14:12       ` Giuseppe Bilotta
  2008-09-29 23:20         ` Jakub Narebski
  0 siblings, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-29 14:12 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Lea Wiemann, Junio C Hamano

On Mon, Sep 29, 2008 at 3:08 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:
>
>> When using path info, make filenames start with a / (right after the :
>> that separates them from the hash base). This minimal change allows
>> relative navigation to work properly when viewing HTML files.
>
> Good idea. Nitpick: instead of "using path info", perhaps "generating
> path info URL"; this change is about gitweb link generation...

Right.

> Did you check if gitweb strips leading '/' from filename?

Yes.

>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
>> ---
>>  gitweb/gitweb.perl |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index e783d12..18da484 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -664,7 +664,7 @@ sub href (%) {
>>               if (defined $params{'hash_base'}) {
>>                       $href .= "/".esc_url($params{'hash_base'});
>>                       if (defined $params{'file_name'}) {
>> -                             $href .= ":".esc_url($params{'file_name'});
>> +                             $href .= ":/".esc_url($params{'file_name'});
>>                               delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
>>                               delete $params{'file_name'};
>>                       } else {
>> --
>> 1.5.6.5
>
> Is there reason why this change is separate (not squashed) from
> previous commit?

Historical reason (i.e. I came up with the idea later on). I'll squash it.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-29  1:03   ` [PATCH 1/6] gitweb: action in path with use_pathinfo Jakub Narebski
@ 2008-09-29 14:22     ` Giuseppe Bilotta
  2008-09-30  0:21       ` Jakub Narebski
  0 siblings, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-29 14:22 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Lea Wiemann, Junio C Hamano

On Mon, Sep 29, 2008 at 3:03 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> I'm sorry for the delay reviewing those patches.
>
> On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:
>
>> When using path info, reduce the number of CGI parameters by embedding
>> the action in the path. The typicial cgiweb path is thus
>> $project/$action/$hash_base:$file_name or $project/$action/$hash
>
> cgiweb?

Good question. I have no idea why I wrote that instad of gitweb.

>> This is mostly backwards compatible with the old-style gitweb paths,
>> except when $project/$branch was used to access a branch whose name
>> matches a gitweb action.
>
> I would also state that gitweb _generates_ such pathinfo links if
> configured (or if coming from pahinfo URL), and that this change
> allow to represent wider number of gitweb links (gitweb URLs) in
> pathinfo form.

Can do.

> Also, from what I understand, generated pathinfo links now always
> use action, so they are a tiny little bit longer.

Is that a problem, by the way? I've had half-thoughts about making the
action implicit when possible, but I'm afraid that's prone to make the
code way more complex and the path info handling much less robust.

>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
>> ---
>>  gitweb/gitweb.perl |  109 ++++++++++++++++++++++++++++++++++------------------
>>  1 files changed, 72 insertions(+), 37 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index da474d0..e783d12 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -488,6 +488,37 @@ if (defined $searchtext) {
>>       $search_regexp = $search_use_regexp ? $searchtext : quotemeta $searchtext;
>>  }
>>
>> +# dispatch
>> +my %actions = (
>> +     "blame" => \&git_blame,
>> +     "blobdiff" => \&git_blobdiff,
>> +     "blobdiff_plain" => \&git_blobdiff_plain,
>> +     "blob" => \&git_blob,
>> +     "blob_plain" => \&git_blob_plain,
>> +     "commitdiff" => \&git_commitdiff,
>> +     "commitdiff_plain" => \&git_commitdiff_plain,
>> +     "commit" => \&git_commit,
>> +     "forks" => \&git_forks,
>> +     "heads" => \&git_heads,
>> +     "history" => \&git_history,
>> +     "log" => \&git_log,
>> +     "rss" => \&git_rss,
>> +     "atom" => \&git_atom,
>> +     "search" => \&git_search,
>> +     "search_help" => \&git_search_help,
>> +     "shortlog" => \&git_shortlog,
>> +     "summary" => \&git_summary,
>> +     "tag" => \&git_tag,
>> +     "tags" => \&git_tags,
>> +     "tree" => \&git_tree,
>> +     "snapshot" => \&git_snapshot,
>> +     "object" => \&git_object,
>> +     # those below don't need $project
>> +     "opml" => \&git_opml,
>> +     "project_list" => \&git_project_list,
>> +     "project_index" => \&git_project_index,
>> +);
>> +
>
> This is as I understand simply moving %actions hash around?

Yes.

> Well, you could instead split hash declaration from defining it,
> in the form of
>
>   my %actions = ();
>   ...
>   %actions = (
>        ...
>   );
>
> but I guess moving declaration earlier is good solution.

Is there some coding style recommendation wrt this situations, or is
it just a matter of making the patch smaller?

>>  # now read PATH_INFO and use it as alternative to parameters
>>  sub evaluate_path_info {
>>       return if defined $project;
>> @@ -512,6 +543,16 @@ sub evaluate_path_info {
>>       # do not change any parameters if an action is given using the query string
>>       return if $action;
>>       $path_info =~ s,^\Q$project\E/*,,;
>> +
>> +     # next comes the action
>> +     $action = $path_info;
>> +     $action =~ s,/.*$,,;
>
> I would use perhaps "$action = ($path_info =~ m!^([^/]+)/!;"
> But that is Perl, so TIMTOWDI.

Well, Perl is not my native language so I tend to stay away from
complex expressions if possible ;-)

>> +     if (exists $actions{$action}) {
>> +             $path_info =~ s,^\Q$action\E/*,,;
>> +     } else {
>> +             $action  = undef;
>> +     }
>> +
>
> I don't think you need quoting pattern metacharacters; valid actions
> should not contain regexp metacharacters.  Defensive programming?

You're right, in this case it's probably excessive. I'll strip it out.

>>       my ($refname, $pathname) = split(/:/, $path_info, 2);
>>       if (defined $pathname) {
>>               # we got "project.git/branch:filename" or "project.git/branch:dir/"
>> @@ -525,10 +566,12 @@ sub evaluate_path_info {
>>               }
>>               $hash_base ||= validate_refname($refname);
>>               $file_name ||= validate_pathname($pathname);
>> +             $hash      ||= git_get_hash_by_path($hash_base, $file_name);
>
> I don't understand why you feel that you need to do this (this is
> additional git command fork, as git_get_hash_by_path calls Git, to
> be more exact it calls git-ls-tree (it could call git-rev-parse
> instead).  Moreover, I don't understand why you need to do this _here_,
> instead of just before where you would have to have $hash variable set.

Hm. I must confess that I honestly don't remember. The same holds for
the other chunks you have perplexities on. When I started writing
these patches I came across a few situations where $hash wouldn't
carry over properly, but now I can't seem to recreate those issues
anymore, which leads me to suspect it was a problem with hand-crafted
links (i.e. before I coded the link generation part too). I'll resend
without these chunks.


> > @@ -624,8 +636,13 @@ sub href (%) {
> >       if ($params{-replay}) {
> >               while (my ($name, $symbol) = each %mapping) {
> >                       if (!exists $params{$name}) {
> > -                             # to allow for multivalued params we use arrayref form
> > -                             $params{$name} = [ $cgi->param($symbol) ];
> > +                             if ($cgi->param($symbol)) {
> > +                                     # to allow for multivalued params we use arrayref form
> > +                                     $params{$name} = [ $cgi->param($symbol) ];
> > +                             } else {
> > +                                     no strict 'refs';
> > +                                     $params{$name} = $$name if $$name;
> > +                             }
> >                       }
> >               }
> >       }
>
> What this change is about? And why this change is _here_, in this
> commit? It is I think unrelated, and wrong change.

This is about being able to recycle CGI parameters that came through
as part of path_info instead of the CGI parameter list. It might not
be the best way to recover it, though. I *did* have a few thoughts
about an alternative way that consisted of build a parameter list
merging CGI and path-info parameter, but since this approach seemed to
work, I went with it.

> href(..., -replay=>1) is all about reusing current URL, perhaps with
> a few parameters changed, like for example pagination links differ only
> in page number param change.  For example if we had only hb= and f=
> parameters, -replay=>1 links should use only those, and not add h=
> parameter because somewhere we felt that we need $hash to be calculated.

Assume for example that you are to an url such as

http://git.oblomov.eu/git/tree/refs/remotes/origin/master:gitweb

Without this patch, the 'history' link on the second header line links
to ARRAY(0xblah)ARRAY(0xblah). With this patch, it shows the proper
link. So either replay is being abused somewhere in the link
generation code, or this CGI+path_info parameter retrieval is
necessary, one way or the other.

>> @@ -636,10 +653,28 @@ sub href (%) {
>
> This hunk is about generating pathinfo URL, isn't it?

Yes.

> You probably would want to change comment at the top of this part
> of href() subroutine, namely
>
>        if ($use_pathinfo) {
>                # use PATH_INFO for project name
>
> as you now try to use PATH_INFO for more than project name. Please do
> not leave comments to get out of sync with the code.

Right.

>>               $href .= "/".esc_url($params{'project'}) if defined $params{'project'};
>>               delete $params{'project'};
>>
>> -             # Summary just uses the project path URL
>> -             if (defined $params{'action'} && $params{'action'} eq 'summary') {
>> +             # Summary just uses the project path URL, any other action come next
>> +             # in the URL
>> +             if (defined $params{'action'}) {
>> +                     $href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary';
>>                       delete $params{'action'};
>>               }
>> +
>> +             # next, we put either hash_base:file_name or hash
>> +             if (defined $params{'hash_base'}) {
>> +                     $href .= "/".esc_url($params{'hash_base'});
>> +                     if (defined $params{'file_name'}) {
>> +                             $href .= ":".esc_url($params{'file_name'});
>> +                             delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
>
> First, this page has around 140 characters. That is too long, much too long.
> Please try to wrap code around 80-characters.
>
> Second, following Petr 'Pasky' Baudis suggestion of reducing complexity
> and shortening gitweb URLs, we could unconditionally remove redundant
> 'hash' parameter if we have both 'hash_base' and 'file_name'
> parameters.  This would also simplify and speed up (lack of extra fork)
> gitweb code.

If it's indeed guaranteed that hash is not needed in these cases, it's
surely the best course of action. I changed the code to that effect.

>> +                             delete $params{'file_name'};
>> +                     } else {
>> +                             delete $params{'hash'} if $params{'hash'} eq $params{'hash_base'};
>> +                     }
>> +                     delete $params{'hash_base'};
>> +             } elsif (defined $params{'hash'}) {
>> +                     $href .= "/".esc_url($params{'hash'});
>> +                     delete $params{'hash'};
>> +             }
>
> O.K.... I think.
>
> Did you test this, preferably also using Mechanize test, with gitweb
> configuration turning path_info on by default.?

The whole patchset is running both on my machine on
http://git.oblomov.eu and on the rbot repository
http://ruby-rbot.org/gitweb (older version there, though). I have no
idea how to run automated Mechanize tests though.

>>       }
>>
>>       # now encode the parameters explicitly
>
> Hmmm... now I am not so sure if it wouldn't be better to split this
> patch in pathinfo parsing and pathinfo generation. The first part
> would be obvious, the second part would be smaller and easier to review.

Ok, I'll split parsing from generation. Since it's what I did for
subsequent extensions (such as the parent..current thing) it also fits
nicely with the patchflow.



-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/6] gitweb: prevent double slashes in PATH_INFO hrefs
  2008-09-21 20:57           ` [PATCH 6/6] gitweb: prevent double slashes in PATH_INFO hrefs Giuseppe Bilotta
@ 2008-09-29 18:12             ` Jakub Narebski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2008-09-29 18:12 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann, Junio C Hamano

On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:

> When using PATH_INFO in combination with a rewrite rule that hides the
> cgi script name, links to projects and/or actions without projects might
> be generated with a double slash.
> 

You mean here that base URL ends with '/'?

> Fix by removing the trailing slash (if present) from $href before
> appending PATH_INFO data.
> 
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

Acked-by: Jakub Narebski <jnareb@gmail.com>

This is a good change, and worth applying even before the rest of
series (which probably would go through a few rounds of review).
I'm not sure if it applies cleanly, but conceptually it does not
depend on the rest of patches in this series.

> ---
>  gitweb/gitweb.perl |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 4a91d07..ebab86b 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -675,6 +675,8 @@ sub href (%) {
>  
>  	my ($use_pathinfo) = gitweb_check_feature('pathinfo');
>  	if ($use_pathinfo) {
> +		$href =~ s,/$,,;
> +
>  		# use PATH_INFO for project name
>  		$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
>  		delete $params{'project'};
> -- 
> 1.5.6.5
> 
> 

Should not go wrong...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] gitweb: use_pathinfo filenames start with /
  2008-09-29 14:12       ` Giuseppe Bilotta
@ 2008-09-29 23:20         ` Jakub Narebski
  2008-09-30  7:48           ` Giuseppe Bilotta
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2008-09-29 23:20 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann

On Mon, 29 Sep 2008, Giuseppe Bilotta wrote:
> On Mon, Sep 29, 2008 at 3:08 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:

>>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>>> index e783d12..18da484 100755
>>> --- a/gitweb/gitweb.perl
>>> +++ b/gitweb/gitweb.perl
>>> @@ -664,7 +664,7 @@ sub href (%) {
>>>               if (defined $params{'hash_base'}) {
>>>                       $href .= "/".esc_url($params{'hash_base'});
>>>                       if (defined $params{'file_name'}) {
>>> -                             $href .= ":".esc_url($params{'file_name'});
>>> +                             $href .= ":/".esc_url($params{'file_name'});
>>>                               delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
>>>                               delete $params{'file_name'};
>>>                       } else {
>>> --
>>> 1.5.6.5
>>
>> Is there reason why this change is separate (not squashed) from
>> previous commit?
> 
> Historical reason (i.e. I came up with the idea later on). I'll squash it.

Hn. Now I am not sure if it should be squashed, or should be separate.


Good change.

Please don't forget to describe this decision, i.e. why gitweb is
using "branch:/filename" when creating path_info links instead of
simple "branch:name" for relative URLs in HTML in 'raw' ('blob_plain')
view, in the commit message.  Thanks in advance.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-29 14:22     ` Giuseppe Bilotta
@ 2008-09-30  0:21       ` Jakub Narebski
  2008-09-30  8:05         ` Giuseppe Bilotta
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2008-09-30  0:21 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann

On Mon, 29 Sep 2008, Giuseppe Bilotta wrote:
> On Mon, Sep 29, 2008 at 3:03 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Sun, 21 Sep 2008, Giuseppe Bilotta wrote:

>> Also, from what I understand, generated pathinfo links now always
>> use action, so they are a tiny little bit longer.
> 
> Is that a problem, by the way? I've had half-thoughts about making the
> action implicit when possible, but I'm afraid that's prone to make the
> code way more complex and the path info handling much less robust.

No, I don't think there is a problem; if I remember correctly action
is omitted for default actions with and without project, i.e. for
projects list, and for 'summary' view for a project, which is default
view in absence of other parameters.

I would explain difference between then and now in the patch adding
support for _creating_ wider range of path_info links (I don't know,
perhaps you did that in new version):
 * path_info URL were always without action, and were possible only
   for the case of default action (projects list, summary), and in the
   case of implicit action ('tree' for trees i.e. filename ending in
   '/'; 'blob_plain' for ordinary files i.e. filename but no '/' at end;
   'shortlog' for bare ref, assuming branch).
 * now that pathinfo can contain action, wider range of URL can be done
   as purely path_info links; in other way more of parameters can be put
   in path_info part.

Perhaps not in so large and detailed form... I guess explanation of
using ':/' as separator should be put there as well, if you plan to
squash those patches.
 
>>> ---
>>>  gitweb/gitweb.perl |  109 ++++++++++++++++++++++++++++++++++------------------
>>>  1 files changed, 72 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>>> index da474d0..e783d12 100755
>>> --- a/gitweb/gitweb.perl
>>> +++ b/gitweb/gitweb.perl

>> Well, you could instead split hash declaration from defining it,
>> in the form of
>>
>>   my %actions = ();
>>   ...
>>   %actions = (
>>        ...
>>   );
>>
>> but I guess moving declaration earlier is good solution.
> 
> Is there some coding style recommendation wrt this situations, or is
> it just a matter of making the patch smaller?

I think that moving %actions earlier is a better solution.
 
>>>  # now read PATH_INFO and use it as alternative to parameters
>>>  sub evaluate_path_info {
>>>       return if defined $project;
>>> @@ -512,6 +543,16 @@ sub evaluate_path_info {
>>>       # do not change any parameters if an action is given using the query string
>>>       return if $action;
>>>       $path_info =~ s,^\Q$project\E/*,,;
>>> +
>>> +     # next comes the action
>>> +     $action = $path_info;
>>> +     $action =~ s,/.*$,,;
>>
>> I would use perhaps "($action) = ($path_info =~ m!^([^/]+)!);"
>> But that is Perl, so TIMTOWDI.
> 
> Well, Perl is not my native language so I tend to stay away from
> complex expressions if possible ;-)

What I meant is instead of "copy and strip" use "find match".
I tried to use one-liner, but it could be written instead as:

+     # next comes the action
+     if ($path_info =~ m!^([^/]+)!) {;
+     	   $action = $1;
+     }

But I guess your approach is equally valid.  I don't think of myself
being a Perl expert, either.

>>> @@ -525,10 +566,12 @@ sub evaluate_path_info {
>>>               }
>>>               $hash_base ||= validate_refname($refname);
>>>               $file_name ||= validate_pathname($pathname);
>>> +             $hash      ||= git_get_hash_by_path($hash_base, $file_name);
>>
>> I don't understand why you feel that you need to do this (this is
>> additional git command fork, as git_get_hash_by_path calls Git, to
>> be more exact it calls git-ls-tree (it could call git-rev-parse
>> instead).  Moreover, I don't understand why you need to do this _here_,
>> instead of just before where you would have to have $hash variable set.
> 
> Hm. I must confess that I honestly don't remember. The same holds for
> the other chunks you have perplexities on. When I started writing
> these patches I came across a few situations where $hash wouldn't
> carry over properly, but now I can't seem to recreate those issues
> anymore, which leads me to suspect it was a problem with hand-crafted
> links (i.e. before I coded the link generation part too). I'll resend
> without these chunks.

That is the problem, but not as large a problem as having similar code
calling git_get_hash_by_path() during link generation, in href(...)
subroutine (at least without Lea's gitweb caching, the part that reuses
"git cat-file --batch").  This is called once per view (page); that
was called once per generated gitweb link...
 
>>> @@ -624,8 +636,13 @@ sub href (%) {
>>>       if ($params{-replay}) {
>>>               while (my ($name, $symbol) = each %mapping) {
>>>                       if (!exists $params{$name}) {
>>> -                             # to allow for multivalued params we use arrayref form
>>> -                             $params{$name} = [ $cgi->param($symbol) ];
>>> +                             if ($cgi->param($symbol)) {
>>> +                                     # to allow for multivalued params we use arrayref form
>>> +                                     $params{$name} = [ $cgi->param($symbol) ];
>>> +                             } else {
>>> +                                     no strict 'refs';
>>> +                                     $params{$name} = $$name if $$name;
>>> +                             }
>>>                       }
>>>               }
>>>       }
>>
>> What this change is about? And why this change is _here_, in this
>> commit? It is I think unrelated, and wrong change.
> 
> This is about being able to recycle CGI parameters that came through
> as part of path_info instead of the CGI parameter list. It might not
> be the best way to recover it, though. I *did* have a few thoughts
> about an alternative way that consisted of build a parameter list
> merging CGI and path-info parameter, but since this approach seemed to
> work, I went with it.

Fact, I have totally forgot about this.

>> href(..., -replay=>1) is all about reusing current URL, perhaps with
>> a few parameters changed, like for example pagination links differ only
>> in page number param change.  For example if we had only hb= and f=
>> parameters, -replay=>1 links should use only those, and not add h=
>> parameter because somewhere we felt that we need $hash to be calculated.
> 
> Assume for example that you are to an url such as
> 
> http://git.oblomov.eu/git/tree/refs/remotes/origin/master:gitweb
> 
> Without this patch, the 'history' link on the second header line links
> to ARRAY(0xblah)ARRAY(0xblah). With this patch, it shows the proper
> link. So either replay is being abused somewhere in the link
> generation code, or this CGI+path_info parameter retrieval is
> necessary, one way or the other.

Ah.  Now I understand.

When creating code for href(..., -replay=>1), which by the way I thought
would be more useful than actually is, I have forgot that parameters to
gitweb could be passed in other way that through CGI parameters
(CGI query)[1].

Using

	$params{$name} = [ $cgi->param($symbol) ];

is a cute hack, but it doesn't work for arguments passed via path_info
(was: project, hash_base and file_name; while now it is project, action,
hash_base (in full) and file_name).


The solution I thought about and abandoned in favor of this cute hack
was to have additional hash (in addition to %mapping), which would map
action names to references to variables holding the value for parameter.

This has the same problem as your proposed solution of putting some
parameters which didn't come from URL but were filled from other info.
$hash parameter is most likely to be culprit here.

On the other hand it is more generic and doesn't rely on knowledge that
there is no multi-valued parameter which can be expressed in path_info
(currently only 'opt' parameter can be multi-valued, and requires
arrayref, but also 'opt' parameter is and cannot be put in path_info).

I am talking there about the following solution:

	my %action_vars = (
		project => \$project,
		action => \$action,
                # ...
		extra_options => \@extra_options,
	);
        # ...
        while (my ($name, $symbol) = each %mapping) {
                if (!exists $params{$name}) {
                          $params{$name} = ${$action_vars{$name}};
                }
        }


This avoids cure hack of (from your code)

                } else {
                           no strict 'refs';
                           $params{$name} = $$name if $$name;
                }

I think that gitweb should use single source, not CGI query parameters
or variable saving [sanitized] value.


[*1*] Currently parameters can be passed either as CGI query parameters
      (which I remember about), but also (with some restrictions) in the
      path_info part of gitweb URLs.  If we implement command line
      switches (for example to generate directory listing in format
      expected by gitweb for file $projects_list, or for off-line
      generation of RSS feed), there could be yet another source.

>>> @@ -636,10 +653,28 @@ sub href (%) {

>>> +             # next, we put either hash_base:file_name or hash
>>> +             if (defined $params{'hash_base'}) {
>>> +                     $href .= "/".esc_url($params{'hash_base'});
>>> +                     if (defined $params{'file_name'}) {
>>> +                             $href .= ":".esc_url($params{'file_name'});
>>> +                             delete $params{'hash'} if $params{'hash'} eq git_get_hash_by_path($params{'hash_base'},$params{'file_name'});
>>
>> First, this page has around 140 characters. That is too long, much too long.
>> Please try to wrap code around 80-characters.
>>
>> Second, following Petr 'Pasky' Baudis suggestion of reducing complexity
>> and shortening gitweb URLs, we could unconditionally remove redundant
>> 'hash' parameter if we have both 'hash_base' and 'file_name'
>> parameters.  This would also simplify and speed up (lack of extra fork)
>> gitweb code.
> 
> If it's indeed guaranteed that hash is not needed in these cases, it's
> surely the best course of action. I changed the code to that effect.

Hash is not needed for hash_base:file_name, unless you messed up
something (by hand-crafting URL).  If they do not much, you have more
problems than that...
 
>> Hmmm... now I am not so sure if it wouldn't be better to split this
>> patch in pathinfo parsing and pathinfo generation. The first part
>> would be obvious, the second part would be smaller and easier to review.
> 
> Ok, I'll split parsing from generation. Since it's what I did for
> subsequent extensions (such as the parent..current thing) it also fits
> nicely with the patchflow.

Thanks.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] gitweb: use_pathinfo filenames start with /
  2008-09-29 23:20         ` Jakub Narebski
@ 2008-09-30  7:48           ` Giuseppe Bilotta
  2008-09-30 23:49             ` Jakub Narebski
  0 siblings, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-30  7:48 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, Sep 30, 2008 at 1:20 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>
> Hn. Now I am not sure if it should be squashed, or should be separate.

Yeah. The fact that it's *specifically* to allow web docs to be used
in raw view makes it count towarda a feature in itself, even if the
patch by itself is trivial ...

So. Squashed or separate? :)

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-30  0:21       ` Jakub Narebski
@ 2008-09-30  8:05         ` Giuseppe Bilotta
  2008-09-30  8:48           ` Jakub Narebski
  0 siblings, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-30  8:05 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, Sep 30, 2008 at 2:21 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> Perhaps not in so large and detailed form... I guess explanation of
> using ':/' as separator should be put there as well, if you plan to
> squash those patches.

I will add the explanation for the :/ separator in the patch for URL
generation (forgot it in v3)

>>>> @@ -624,8 +636,13 @@ sub href (%) {
>>>>       if ($params{-replay}) {
>>>>               while (my ($name, $symbol) = each %mapping) {
>>>>                       if (!exists $params{$name}) {
>>>> -                             # to allow for multivalued params we use arrayref form
>>>> -                             $params{$name} = [ $cgi->param($symbol) ];
>>>> +                             if ($cgi->param($symbol)) {
>>>> +                                     # to allow for multivalued params we use arrayref form
>>>> +                                     $params{$name} = [ $cgi->param($symbol) ];
>>>> +                             } else {
>>>> +                                     no strict 'refs';
>>>> +                                     $params{$name} = $$name if $$name;
>>>> +                             }
>>>>                       }
>>>>               }
>>>>       }
>>>
>>> What this change is about? And why this change is _here_, in this
>>> commit? It is I think unrelated, and wrong change.
>>
>> This is about being able to recycle CGI parameters that came through
>> as part of path_info instead of the CGI parameter list. It might not
>> be the best way to recover it, though. I *did* have a few thoughts
>> about an alternative way that consisted of build a parameter list
>> merging CGI and path-info parameter, but since this approach seemed to
>> work, I went with it.
>
> Fact, I have totally forgot about this.

Me too actually (the first version of this patch is almost a year old,
and since I forgot to write down why I did a few things ... taught me
that I should start putting more comments, though 8-D)

>>> href(..., -replay=>1) is all about reusing current URL, perhaps with
>>> a few parameters changed, like for example pagination links differ only
>>> in page number param change.  For example if we had only hb= and f=
>>> parameters, -replay=>1 links should use only those, and not add h=
>>> parameter because somewhere we felt that we need $hash to be calculated.
>>
>> Assume for example that you are to an url such as
>>
>> http://git.oblomov.eu/git/tree/refs/remotes/origin/master:gitweb
>>
>> Without this patch, the 'history' link on the second header line links
>> to ARRAY(0xblah)ARRAY(0xblah). With this patch, it shows the proper
>> link. So either replay is being abused somewhere in the link
>> generation code, or this CGI+path_info parameter retrieval is
>> necessary, one way or the other.
>
> Ah.  Now I understand.
>
> When creating code for href(..., -replay=>1), which by the way I thought
> would be more useful than actually is, I have forgot that parameters to
> gitweb could be passed in other way that through CGI parameters
> (CGI query)[1].
>
> Using
>
>        $params{$name} = [ $cgi->param($symbol) ];
>
> is a cute hack, but it doesn't work for arguments passed via path_info
> (was: project, hash_base and file_name; while now it is project, action,
> hash_base (in full) and file_name).

Exactly.

> The solution I thought about and abandoned in favor of this cute hack
> was to have additional hash (in addition to %mapping), which would map
> action names to references to variables holding the value for parameter.
>
> This has the same problem as your proposed solution of putting some
> parameters which didn't come from URL but were filled from other info.
> $hash parameter is most likely to be culprit here.
>
> On the other hand it is more generic and doesn't rely on knowledge that
> there is no multi-valued parameter which can be expressed in path_info
> (currently only 'opt' parameter can be multi-valued, and requires
> arrayref, but also 'opt' parameter is and cannot be put in path_info).
>
> I am talking there about the following solution:
>
>        my %action_vars = (
>                project => \$project,
>                action => \$action,
>                # ...
>                extra_options => \@extra_options,
>        );
>        # ...
>        while (my ($name, $symbol) = each %mapping) {
>                if (!exists $params{$name}) {
>                          $params{$name} = ${$action_vars{$name}};
>                }
>        }
>
>
> This avoids cure hack of (from your code)
>
>                } else {
>                           no strict 'refs';
>                           $params{$name} = $$name if $$name;
>                }
>
> I think that gitweb should use single source, not CGI query parameters
> or variable saving [sanitized] value.

The alternative I've been thinking about would be to have an
%input_parameters hash that holds all input parameters regardless of
hash; thus CGI query parameters and data extracted from PATH_INFO,
presently, but also command line options in the future, or whatever
else.

This is somewhat different from your %action_vars alternative, in the
sense that it isolates _input_ data, whereas if I understand correctly
the approach you suggest would isolate _output_ data (in the sense of
data to be used during link creation and whatnot).

Presently, the gitweb code defines some $variables from the input
parameters, and then overwrites them for output. Keeping the input
stuff clearly separate from the output stuff would mean that any
routine can retrieve the input data regardless of the subsequent
mangling and without any need to make ad-hoc backups or other tricks.

So my proposal is that I implement this %input_params stuff as the
first patch for the pathinfo series, and use %input_params all around
where cgi parameters are used currently (of course, %input_params is
initialized with the CGI parameters at first). The next patch would be
the extraction of parameters from PATH_INFO. And thirdly the PATH_INFO
URL generation (with or without the /-before-filename thing, at your
preference)

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-30  8:05         ` Giuseppe Bilotta
@ 2008-09-30  8:48           ` Jakub Narebski
  2008-09-30 10:40             ` Giuseppe Bilotta
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2008-09-30  8:48 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, 30 Sep 2008, Giuseppe Bilotta wrote:
> On Tue, Sep 30, 2008 at 2:21 AM, Jakub Narebski <jnareb@gmail.com> wrote:

[...]
>> Ah.  Now I understand.
>>
>> When creating code for href(..., -replay=>1), which by the way I thought
>> would be more useful than actually is, I have forgot that parameters to
>> gitweb could be passed in other way that through CGI parameters
>> (CGI query)[1].
>>
>> Using
>>
>>        $params{$name} = [ $cgi->param($symbol) ];
>>
>> is a cute hack, but it doesn't work for arguments passed via path_info
>> (was: project, hash_base and file_name; while now it is project, action,
>> hash_base (in full) and file_name).
[...]

>> The solution I thought about and abandoned in favor of this cute hack
>> was to have additional hash (in addition to %mapping), which would map
>> action names to references to variables holding the value for parameter.
[...]

>> I am talking there about the following solution:
>>
>>        my %action_vars = (
>>                project => \$project,
>>                action => \$action,
>>                # ...
>>                extra_options => \@extra_options,
>>        );
>>        # ...
>>        while (my ($name, $symbol) = each %mapping) {
>>                if (!exists $params{$name}) {
>>                          $params{$name} = ${$action_vars{$name}};
>>                }
>>        }
>>
>>
>> This avoids cure hack of (from your code)
>>
>>                } else {
>>                           no strict 'refs';
>>                           $params{$name} = $$name if $$name;
>>                }
>>
>> I think that gitweb should use single source, not CGI query parameters
>> or variable saving [sanitized] value.
> 
> The alternative I've been thinking about would be to have an
> %input_parameters hash that holds all input parameters regardless of
> hash; thus CGI query parameters and data extracted from PATH_INFO,
> presently, but also command line options in the future, or whatever
> else.
> 
> This is somewhat different from your %action_vars alternative, in the
> sense that it isolates _input_ data, whereas if I understand correctly
> the approach you suggest would isolate _output_ data (in the sense of
> data to be used during link creation and whatnot).
> 
> Presently, the gitweb code defines some $variables from the input
> parameters, and then overwrites them for output. Keeping the input
> stuff clearly separate from the output stuff would mean that any
> routine can retrieve the input data regardless of the subsequent
> mangling and without any need to make ad-hoc backups or other tricks.
> 
> So my proposal is that I implement this %input_params stuff as the
> first patch for the pathinfo series, and use %input_params all around
> where cgi parameters are used currently (of course, %input_params is
> initialized with the CGI parameters at first). The next patch would be
> the extraction of parameters from PATH_INFO. And thirdly the PATH_INFO
> URL generation (with or without the /-before-filename thing, at your
> preference)

I presume that you would want to replace for example $hash_base
everywhere by %input_params{'hash_base'}?


I can think of yet another solution, namely to abstract getting
parameters from CGI query string, from path_info, and possibly in the
future also from command line options, and use this mechanism in
the getting parameters and validation part.

The %params hash would be filled from CGI parameters by using simply
"%params = $cgi->Vars;", then added to in evaluate_path_info instead
of directly modifying global parameters variables.  The input validation
and dispatch part would be modified to use %params (taking care of
multivalued parameters as described in CGI(3pm)), like below:

  our $action = $params{'a'} || $params{'action'};
  if (defined $action) {
  	if ($action =~ m/[^0-9a-zA-Z\.\-_]/) {
  		die_error(400, "Invalid action parameter");
  	}
  }

That is just for consideration: each approach has its advantages and
disadvantages.  Your proposal, as I understand it, is similar to the
way described in "Storing options in a hash" subsection of 
Getopt::Long(3pm) manpage.


Or we could just scrap and revert adding href(..., -replay=>1).
There is much trouble with getting it right and performing well,
and it is less useful than I thought (at least now).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-30  8:48           ` Jakub Narebski
@ 2008-09-30 10:40             ` Giuseppe Bilotta
  2008-09-30 11:22               ` Jakub Narebski
  2008-09-30 23:24               ` Jakub Narebski
  0 siblings, 2 replies; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-30 10:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, Sep 30, 2008 at 10:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Tue, 30 Sep 2008, Giuseppe Bilotta wrote:
>> On Tue, Sep 30, 2008 at 2:21 AM, Jakub Narebski <jnareb@gmail.com> wrote:

>>> I think that gitweb should use single source, not CGI query parameters
>>> or variable saving [sanitized] value.
>>
>> The alternative I've been thinking about would be to have an
>> %input_parameters hash that holds all input parameters regardless of
>> hash; thus CGI query parameters and data extracted from PATH_INFO,
>> presently, but also command line options in the future, or whatever
>> else.
>>
>> This is somewhat different from your %action_vars alternative, in the
>> sense that it isolates _input_ data, whereas if I understand correctly
>> the approach you suggest would isolate _output_ data (in the sense of
>> data to be used during link creation and whatnot).
>>
>> Presently, the gitweb code defines some $variables from the input
>> parameters, and then overwrites them for output. Keeping the input
>> stuff clearly separate from the output stuff would mean that any
>> routine can retrieve the input data regardless of the subsequent
>> mangling and without any need to make ad-hoc backups or other tricks.
>>
>> So my proposal is that I implement this %input_params stuff as the
>> first patch for the pathinfo series, and use %input_params all around
>> where cgi parameters are used currently (of course, %input_params is
>> initialized with the CGI parameters at first). The next patch would be
>> the extraction of parameters from PATH_INFO. And thirdly the PATH_INFO
>> URL generation (with or without the /-before-filename thing, at your
>> preference)
>
> I presume that you would want to replace for example $hash_base
> everywhere by %input_params{'hash_base'}?

No. %input_params{'hash_base'} would only be the _input_ hash base.
$hash_base would be kept if it's supposed to indicate the value of
hash base that is being manipulated.

> I can think of yet another solution, namely to abstract getting
> parameters from CGI query string, from path_info, and possibly in the
> future also from command line options, and use this mechanism in
> the getting parameters and validation part.
>
> The %params hash would be filled from CGI parameters by using simply
> "%params = $cgi->Vars;", then added to in evaluate_path_info instead
> of directly modifying global parameters variables.

So far I agree.

> The input validation
> and dispatch part would be modified to use %params (taking care of
> multivalued parameters as described in CGI(3pm)), like below:
>
>  our $action = $params{'a'} || $params{'action'};

Not too sure about that. The path_info (or whatever)-derived params
should be converted to use the same name as the CGI params. Or
conversely, CGI params should be mapped to the corresponding
full-form.

> That is just for consideration: each approach has its advantages and
> disadvantages.  Your proposal, as I understand it, is similar to the
> way described in "Storing options in a hash" subsection of
> Getopt::Long(3pm) manpage.

I'll read that, although it probably is.

> Or we could just scrap and revert adding href(..., -replay=>1).
> There is much trouble with getting it right and performing well,
> and it is less useful than I thought (at least now).

Dunno, the idea in itself is not bad. We just have to get it right ;)

In a way, I actually think that -replay=>1 should be the default, I
suspect it makes sense in most cases.

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-30 10:40             ` Giuseppe Bilotta
@ 2008-09-30 11:22               ` Jakub Narebski
  2008-09-30 12:53                 ` Giuseppe Bilotta
  2008-09-30 23:24               ` Jakub Narebski
  1 sibling, 1 reply; 25+ messages in thread
From: Jakub Narebski @ 2008-09-30 11:22 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, 30 Sep 2008, Giuseppe "Oblomov" Bilotta wrote:
> On Tue, Sep 30, 2008 at 10:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:

> > Or we could just scrap and revert adding href(..., -replay=>1).
> > There is much trouble with getting it right and performing well,
> > and it is less useful than I thought (at least now).
> 
> Dunno, the idea in itself is not bad. We just have to get it right ;)

It is not easy to get it right, especially that there are multivalued
parameters like @extra_options, see e.g. commit 7863c612
 
> In a way, I actually think that -replay=>1 should be the default, I
> suspect it makes sense in most cases.

Well, -replay=>1 was meant to be used for "alternate view" links, like
for example 'next page' link, or 'raw view' link, or 'sorted by' link;
it would be stretch and feature abuse to use it for "item" links, like
entries in 'tree' view, or commits in log-like views, or changed files
links in 'commitdiff' and 'blobdiff'  views.

I guess it would be half the cases, not most cases.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-30 11:22               ` Jakub Narebski
@ 2008-09-30 12:53                 ` Giuseppe Bilotta
  2008-09-30 21:00                   ` Jakub Narebski
  0 siblings, 1 reply; 25+ messages in thread
From: Giuseppe Bilotta @ 2008-09-30 12:53 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, Sep 30, 2008 at 1:22 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Tue, 30 Sep 2008, Giuseppe "Oblomov" Bilotta wrote:
>> On Tue, Sep 30, 2008 at 10:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>
>> > Or we could just scrap and revert adding href(..., -replay=>1).
>> > There is much trouble with getting it right and performing well,
>> > and it is less useful than I thought (at least now).
>>
>> Dunno, the idea in itself is not bad. We just have to get it right ;)
>
> It is not easy to get it right, especially that there are multivalued
> parameters like @extra_options, see e.g. commit 7863c612

So we just let values be arrays. Or isn't this enough?

-- 
Giuseppe "Oblomov" Bilotta

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-30 12:53                 ` Giuseppe Bilotta
@ 2008-09-30 21:00                   ` Jakub Narebski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2008-09-30 21:00 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, 30 September 2008, Giuseppe Bilotta wrote:
> On Tue, Sep 30, 2008 at 1:22 PM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Tue, 30 Sep 2008, Giuseppe "Oblomov" Bilotta wrote:
>>> On Tue, Sep 30, 2008 at 10:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:

>>>> Or we could just scrap and revert adding href(..., -replay=>1).
>>>> There is much trouble with getting it right and performing well,
>>>> and it is less useful than I thought (at least now).
>>>
>>> Dunno, the idea in itself is not bad. We just have to get it right ;)
>>
>> It is not easy to get it right, especially that there are multivalued
>> parameters like @extra_options, see e.g. commit 7863c612
> 
> So we just let values be arrays. Or isn't this enough?

I don't think it would be that simple.


Let me elaborate a bit about complications and difficulties
within href() code.

First there are two names of parameters: the short name like 'a',
or 'h', or 'f' which is used in links to make them short (and
which is a bit historical legacy), and the long names like 'action',
'hash' or 'file_name' which are used for readability; then there are
also variables which hold values of parameters, like $action,
$hash and $file_name (which were source of long names for parameters).
href() has to provide translation between those two (well, three)
names; long names are used as names of "named parameters" to href(),
while short names are used when generating URL; %mapping provides
mapping between those two.

Second, href() must distinguish between gitweb options/parameters
like 'file_name'=>$file_name and extra options like '-full'=>1
or '-replay'=>1; additionally we want to have options output in
some definite order, with more important options first.  This
is provided by %mapping (to filter out) and @mapping (to sort).

Third, there are various sources of values of parameters, and
parameters used.  There are parameters specified explicitly in
href() call, and there are also implicit parameters: 'project'
parameter is implicitly added as it is needed in almost all
gitweb links (you can override it and generate projectless link by
using 'project'=>undef), and for '-replay'=>1 all parameters from
current URL are added.  Parameters specified explicitly have preference
over implicit or '-replay' ones.  

Now for '-replay' parameters might come from CGI query string, from
path_info part of URL, and perhaps in the future also from command
line.  To avoid duplicated code and other problems we should either
get parameters from variables (like in your code, but which doesn't
cover @extra_options well, but it could; or using "long name" to
variable ref mapping), or have some place where we save parameters
as we extract it from CGI query string, from path_info part of URL,
and in the future probably also from command line options/parameters.

Fourth, there is a little matter of _some_ parameters be multivalued;
currently it is only @extra_options / 'opt', but this may change in
the future, while _most_ are ordinary scalar values.  Printing arrayref
isn't something we want to do...


That is third and fourth which caused problems in the past with
href(..., -replay=>1)...

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/6] gitweb: action in path with use_pathinfo
  2008-09-30 10:40             ` Giuseppe Bilotta
  2008-09-30 11:22               ` Jakub Narebski
@ 2008-09-30 23:24               ` Jakub Narebski
  1 sibling, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2008-09-30 23:24 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, 30 Sep 2008, Giuseppe Bilotta wrote:
> On Tue, Sep 30, 2008 at 10:48 AM, Jakub Narebski <jnareb@gmail.com> wrote:

>> I presume that you would want to replace for example $hash_base
>> everywhere by %input_params{'hash_base'}?
> 
> No. %input_params{'hash_base'} would only be the _input_ hash base.
> $hash_base would be kept if it's supposed to indicate the value of
> hash base that is being manipulated.

Ah, sorry, I misunderstood. Then your idea is the same as one of mine
(except perhaps some details).
 
>> I can think of yet another solution, namely to abstract getting
>> parameters from CGI query string, from path_info, and possibly in the
>> future also from command line options, and use this mechanism in
>> the getting parameters and validation part.
>>
>> The %params hash would be filled from CGI parameters by using simply
>> "%params = $cgi->Vars;", then added to in evaluate_path_info instead
>> of directly modifying global parameters variables.
> 
> So far I agree.

Using "%input_params = %cgi->Vars;" has consequence of using short
parameter names for keys (and also a bit strange syntax for multivalue
options, see CGI(3pm)).

>> The input validation
>> and dispatch part would be modified to use %params (taking care of
>> multivalued parameters as described in CGI(3pm)), like below:

This has the additional advantage of doing gitweb parameter validation
_once_, and not like it is now done for example first in the "input
validation and dispatch" section, and then in evaluate_path_info()
subroutine.

On the other hand $project is checked _already_ in evaluate_path_info(),
because it has to to find where project name ends, so this part would
get duplicated, unless something smart is done.

>>
>>  our $action = $params{'a'} || $params{'action'};
>
> Not too sure about that. The path_info (or whatever)-derived params
> should be converted to use the same name as the CGI params. Or
> conversely, CGI params should be mapped to the corresponding
> full-form.

After thinking about it a little, I agree with above paragraph.

>> That is just for consideration: each approach has its advantages and
>> disadvantages.  Your proposal, as I understand it, is similar to the
>> way described in "Storing options in a hash" subsection of
>> Getopt::Long(3pm) manpage.
> 
> I'll read that, although it probably is.

Perhaps gitweb should have implement something like GetOptions?

>> Or we could just scrap and revert adding href(..., -replay=>1).
>> There is much trouble with getting it right and performing well,
>> and it is less useful than I thought (at least now).
> 
> Dunno, the idea in itself is not bad. We just have to get it right ;)
> 
> In a way, I actually think that -replay=>1 should be the default, I
> suspect it makes sense in most cases.

PITA but useful. Hmmm....

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/6] gitweb: use_pathinfo filenames start with /
  2008-09-30  7:48           ` Giuseppe Bilotta
@ 2008-09-30 23:49             ` Jakub Narebski
  0 siblings, 0 replies; 25+ messages in thread
From: Jakub Narebski @ 2008-09-30 23:49 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Lea Wiemann

On Tue, 30 Sep 2008, Giuseppe Bilotta wrote:
> On Tue, Sep 30, 2008 at 1:20 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> >
> > Hn. Now I am not sure if it should be squashed, or should be separate.
> 
> Yeah. The fact that it's *specifically* to allow web docs to be used
> in raw view makes it count towarda a feature in itself, even if the
> patch by itself is trivial ...
> 
> So. Squashed or separate? :)

I'm slighty for separate, if only to avoid overly long commit message.
<commit-ish>:<filename> is understandable and expected because it is
what one uses for git-show; making <commit-ish>:/<filename> the default
because of relative links in 'blob_plain' view of HTML file requires
some further explanation.

But is is IMVHO finally your call here.
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2008-09-30 23:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-21 20:57 [PATCHv2 0/6] gitweb pathinfo improvements Giuseppe Bilotta
2008-09-21 20:57 ` [PATCH 1/6] gitweb: action in path with use_pathinfo Giuseppe Bilotta
2008-09-21 20:57   ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-09-21 20:57     ` [PATCH 3/6] gitweb: parse parent..current syntax from pathinfo Giuseppe Bilotta
2008-09-21 20:57       ` [PATCH 4/6] gitweb: use_pathinfo creates parent..current paths Giuseppe Bilotta
2008-09-21 20:57         ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Giuseppe Bilotta
2008-09-21 20:57           ` [PATCH 6/6] gitweb: prevent double slashes in PATH_INFO hrefs Giuseppe Bilotta
2008-09-29 18:12             ` Jakub Narebski
2008-09-29  8:33           ` [PATCH 5/6] gitweb: remove PATH_INFO from $my_url and $my_uri Jakub Narebski
2008-09-29 13:05             ` Giuseppe Bilotta
2008-09-29  1:08     ` [PATCH 2/6] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-09-29 14:12       ` Giuseppe Bilotta
2008-09-29 23:20         ` Jakub Narebski
2008-09-30  7:48           ` Giuseppe Bilotta
2008-09-30 23:49             ` Jakub Narebski
2008-09-29  1:03   ` [PATCH 1/6] gitweb: action in path with use_pathinfo Jakub Narebski
2008-09-29 14:22     ` Giuseppe Bilotta
2008-09-30  0:21       ` Jakub Narebski
2008-09-30  8:05         ` Giuseppe Bilotta
2008-09-30  8:48           ` Jakub Narebski
2008-09-30 10:40             ` Giuseppe Bilotta
2008-09-30 11:22               ` Jakub Narebski
2008-09-30 12:53                 ` Giuseppe Bilotta
2008-09-30 21:00                   ` Jakub Narebski
2008-09-30 23:24               ` Jakub Narebski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).