* [PATCHv6 0/5] gitweb: PATH_INFO enhancement
@ 2008-10-16 20:27 Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
0 siblings, 1 reply; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
Sixth attempt for my gitweb PATH_INFO patchset, whose purpose is to
reduce the use of CGI parameters by embedding as many parameters as
possible in the URL path itself, provided the pathinfo feature is
enabled.
The new typical gitweb URL is therefore in the form
$project/$action/$parent:$file..$hash:$file
(with useless parts stripped). Backwards compatibility for old-style
$project/$hash[:$file] URLs is kept, as long as $hash is not a refname
whose name happens to match a git action.
The main implementation is provided by paired patches (#1#2, #4#5)
that implement parsing and generation of the new style URLs.
Patch #3 is a minor improvement to the URL syntax that allows web
sites to be properly browsable in raw mode.
The patchset depends on my previous input parameter handling patch
currently waiting in 'next'. This resend fixes a couple of problems
I found in v5.
Giuseppe Bilotta (5):
gitweb: parse project/action/hash_base:filename PATH_INFO
gitweb: generate project/action/hash URLs
gitweb: use_pathinfo filenames start with /
gitweb: parse parent..current syntax from PATH_INFO
gitweb: generate parent..current URLs
gitweb/gitweb.perl | 124 +++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 114 insertions(+), 10 deletions(-)
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO
2008-10-16 20:27 [PATCHv6 0/5] gitweb: PATH_INFO enhancement Giuseppe Bilotta
@ 2008-10-16 20:27 ` Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-18 22:41 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
0 siblings, 2 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
This patch enables gitweb to parse URLs with more information embedded
in PATH_INFO, reducing the need for CGI parameters. The typical gitweb
path is now $project/$action/$hash_base:$file_name or
$project/$action/$hash
This is mostly backwards compatible with the old-style gitweb paths,
$project/$branch[:$filename], except when it was used to access a branch
whose name matches a gitweb action.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
gitweb/gitweb.perl | 37 +++++++++++++++++++++++++++++++------
1 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c5254af..6d0dc26 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -534,23 +534,48 @@ sub evaluate_path_info {
return if $input_params{'action'};
$path_info =~ s,^\Q$project\E/*,,;
+ # next, check if we have an action
+ my $action = $path_info;
+ $action =~ s,/.*$,,;
+ if (exists $actions{$action}) {
+ $path_info =~ s,^$action/*,,;
+ $input_params{'action'} = $action;
+ }
+
+ # list of actions that want hash_base instead of hash
+ my @wants_base = (
+ 'tree',
+ 'history',
+ );
+
my ($refname, $pathname) = split(/:/, $path_info, 2);
if (defined $pathname) {
- # we got "project.git/branch:filename" or "project.git/branch:dir/"
+ # we got "branch:filename" or "branch:dir/"
# we could use git_get_type(branch:pathname), but it needs $git_dir
$pathname =~ s,^/+,,;
if (!$pathname || substr($pathname, -1) eq "/") {
- $input_params{'action'} = "tree";
+ $input_params{'action'} ||= "tree";
$pathname =~ s,/$,,;
} else {
- $input_params{'action'} = "blob_plain";
+ $input_params{'action'} ||= "blob_plain";
}
$input_params{'hash_base'} ||= $refname;
$input_params{'file_name'} ||= $pathname;
} elsif (defined $refname) {
- # we got "project.git/branch"
- $input_params{'action'} = "shortlog";
- $input_params{'hash'} ||= $refname;
+ # we got "branch". in this case we have to choose if we have to
+ # set hash or hash_base.
+ #
+ # Most of the actions without a pathname only want hash to be
+ # set, except for the ones specified in @wants_base that want
+ # hash_base instead. It should also be noted that hand-crafted
+ # links having 'history' as an action and no pathname or hash
+ # set will fail, but that happens regardless of PATH_INFO.
+ $input_params{'action'} ||= "shortlog";
+ if (grep {$input_params{'action'} eq $_} @wants_base) {
+ $input_params{'hash_base'} ||= $refname;
+ } else {
+ $input_params{'hash'} ||= $refname;
+ }
}
}
evaluate_path_info();
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 2/5] gitweb: generate project/action/hash URLs
2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
@ 2008-10-16 20:27 ` Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-18 23:14 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski
2008-10-18 22:41 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
1 sibling, 2 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
When generating path info URLs, reduce the number of CGI parameters by
embedding action and hash_parent:filename or hash in the path.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
gitweb/gitweb.perl | 33 ++++++++++++++++++++++++++++++---
1 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6d0dc26..5337d40 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -724,14 +724,41 @@ sub href (%) {
my ($use_pathinfo) = gitweb_check_feature('pathinfo');
if ($use_pathinfo) {
- # use PATH_INFO for project name
+ # try to put as many parameters as possible in PATH_INFO:
+ # - project name
+ # - action
+ # - hash or hash_base:filename
+
+ # When the script is the root DirectoryIndex for the domain,
+ # $href here would be something like http://gitweb.example.com/
+ # Thus, we strip any trailing / from $href, to spare us double
+ # slashes in the final URL
+ $href =~ s,/$,,;
+
+ # Then add the project name, if present
$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
delete $params{'project'};
- # Summary just uses the project path URL
- if (defined $params{'action'} && $params{'action'} eq 'summary') {
+ # Summary just uses the project path URL, any other action is
+ # added to the URL
+ if (defined $params{'action'}) {
+ $href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary';
delete $params{'action'};
}
+
+ # Finally, we put either hash_base:file_name or hash
+ if (defined $params{'hash_base'}) {
+ $href .= "/".esc_url($params{'hash_base'});
+ if (defined $params{'file_name'}) {
+ $href .= ":".esc_url($params{'file_name'});
+ delete $params{'file_name'};
+ }
+ delete $params{'hash'};
+ delete $params{'hash_base'};
+ } elsif (defined $params{'hash'}) {
+ $href .= "/".esc_url($params{'hash'});
+ delete $params{'hash'};
+ }
}
# now encode the parameters explicitly
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 3/5] gitweb: use_pathinfo filenames start with /
2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta
@ 2008-10-16 20:27 ` Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta
2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-18 23:14 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski
1 sibling, 2 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
When using path info, make filenames start with a / (right after the :
that separates them from the hash base). This minimal change allows
relative navigation to work properly when viewing HTML files in raw
('blob_plain') mode.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
gitweb/gitweb.perl | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5337d40..49730f3 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -727,7 +727,7 @@ sub href (%) {
# try to put as many parameters as possible in PATH_INFO:
# - project name
# - action
- # - hash or hash_base:filename
+ # - hash or hash_base:/filename
# When the script is the root DirectoryIndex for the domain,
# $href here would be something like http://gitweb.example.com/
@@ -746,11 +746,11 @@ sub href (%) {
delete $params{'action'};
}
- # Finally, we put either hash_base:file_name or hash
+ # Finally, we put either hash_base:/file_name or hash
if (defined $params{'hash_base'}) {
$href .= "/".esc_url($params{'hash_base'});
if (defined $params{'file_name'}) {
- $href .= ":".esc_url($params{'file_name'});
+ $href .= ":/".esc_url($params{'file_name'});
delete $params{'file_name'};
}
delete $params{'hash'};
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO
2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
@ 2008-10-16 20:27 ` Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski
2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski
1 sibling, 2 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
This patch makes it possible to use an URL such as
$project/somebranch..otherbranch:/filename to get a diff between
different version of a file. Paths like
$project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
as well.
All '*diff' actions and in general actions that use $hash_parent[_base]
and $file_parent can now get all of their parameters from PATH_INFO
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
gitweb/gitweb.perl | 36 ++++++++++++++++++++++++++++++++++--
1 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 49730f3..1a7b0b9 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -548,7 +548,12 @@ sub evaluate_path_info {
'history',
);
- my ($refname, $pathname) = split(/:/, $path_info, 2);
+ # horrible regexp to catch
+ # [$hash_parent_base[:$file_parent]..]$hash_parent[:$file_name]
+ my ($parentrefname, $parentpathname, $refname, $pathname) =
+ ($path_info =~ /^(?:(.+?)(?::(.+))?\.\.)?(.+?)(?::(.+))?$/);
+
+ # first, analyze the 'current' part
if (defined $pathname) {
# we got "branch:filename" or "branch:dir/"
# we could use git_get_type(branch:pathname), but it needs $git_dir
@@ -557,7 +562,13 @@ sub evaluate_path_info {
$input_params{'action'} ||= "tree";
$pathname =~ s,/$,,;
} else {
- $input_params{'action'} ||= "blob_plain";
+ # the default action depends on whether we had parent info
+ # or not
+ if ($parentrefname) {
+ $input_params{'action'} ||= "blobdiff_plain";
+ } else {
+ $input_params{'action'} ||= "blob_plain";
+ }
}
$input_params{'hash_base'} ||= $refname;
$input_params{'file_name'} ||= $pathname;
@@ -577,6 +588,27 @@ sub evaluate_path_info {
$input_params{'hash'} ||= $refname;
}
}
+
+ # next, handle the 'parent' part, if present
+ if (defined $parentrefname) {
+ # a missing pathspec defaults to the 'current' filename, allowing e.g.
+ # someproject/blobdiff/oldrev..newrev:/filename
+ if ($parentpathname) {
+ $parentpathname =~ s,^/+,,;
+ $parentpathname =~ s,/$,,;
+ $input_params{'file_parent'} ||= $parentpathname;
+ } else {
+ $input_params{'file_parent'} ||= $input_params{'file_name'};
+ }
+ # we assume that hash_parent_base is wanted if a path was specified,
+ # or if the action wants hash_base instead of hash
+ if (defined $input_params{'file_parent'} ||
+ grep {$input_params{'action'} eq $_} @wants_base) {
+ $input_params{'hash_parent_base'} ||= $parentrefname;
+ } else {
+ $input_params{'hash_parent'} ||= $parentrefname;
+ }
+ }
}
evaluate_path_info();
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 5/5] gitweb: generate parent..current URLs
2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta
@ 2008-10-16 20:27 ` Giuseppe Bilotta
2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta
` (2 more replies)
2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski
1 sibling, 3 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
If use_pathinfo is enabled, href now creates links that contain paths in
the form $project/$action/oldhash:/oldname..newhash:/newname for actions
that use hash_parent etc.
If any of the filename contains two consecutive dots, it's kept as a CGI
parameter since the resulting path would otherwise be ambiguous.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
gitweb/gitweb.perl | 30 +++++++++++++++++++++++++-----
1 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1a7b0b9..f4642e7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -759,7 +759,8 @@ sub href (%) {
# try to put as many parameters as possible in PATH_INFO:
# - project name
# - action
- # - hash or hash_base:/filename
+ # - hash_parent or hash_parent_base:/file_parent
+ # - hash or hash_base:/file_name
# When the script is the root DirectoryIndex for the domain,
# $href here would be something like http://gitweb.example.com/
@@ -778,17 +779,36 @@ sub href (%) {
delete $params{'action'};
}
- # Finally, we put either hash_base:/file_name or hash
+ # Next, we put hash_parent_base:/file_parent..hash_base:/file_name,
+ # stripping nonexistent or useless pieces
+ $href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'}
+ || $params{'hash_parent'} || $params{'hash'});
if (defined $params{'hash_base'}) {
- $href .= "/".esc_url($params{'hash_base'});
- if (defined $params{'file_name'}) {
+ if (defined $params{'hash_parent_base'}) {
+ $href .= esc_url($params{'hash_parent_base'});
+ # skip the file_parent if it's the same as the file_name
+ delete $params{'file_parent'} if $params{'file_parent'} eq $params{'file_name'};
+ if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) {
+ $href .= ":/".esc_url($params{'file_parent'});
+ delete $params{'file_parent'};
+ }
+ $href .= "..";
+ delete $params{'hash_parent'};
+ delete $params{'hash_parent_base'};
+ } elsif (defined $params{'hash_parent'}) {
+ $href .= esc_url($params{'hash_parent'}). "..";
+ delete $params{'hash_parent'};
+ }
+
+ $href .= esc_url($params{'hash_base'});
+ if (defined $params{'file_name'} && $params{'file_name'} !~ /\.\./) {
$href .= ":/".esc_url($params{'file_name'});
delete $params{'file_name'};
}
delete $params{'hash'};
delete $params{'hash_base'};
} elsif (defined $params{'hash'}) {
- $href .= "/".esc_url($params{'hash'});
+ $href .= esc_url($params{'hash'});
delete $params{'hash'};
}
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO
2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta
@ 2008-10-18 22:41 ` Jakub Narebski
1 sibling, 0 replies; 26+ messages in thread
From: Jakub Narebski @ 2008-10-18 22:41 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
> This patch enables gitweb to parse URLs with more information embedded
> in PATH_INFO, reducing the need for CGI parameters. The typical gitweb
> path is now $project/$action/$hash_base:$file_name or
> $project/$action/$hash
>
> This is mostly backwards compatible with the old-style gitweb paths,
> $project/$branch[:$filename], except when it was used to access a branch
> whose name matches a gitweb action.
...in which case old code would access branch, and new code would treat
name as action. This shouldn't matter in practice, unless there are
branches named exactly as gitweb actions.
[But I think current commit description is enough. Just in case...]
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
I like it. For what it is worth:
Acked-by: Jakub Narebski <jnareb@gmail.com>
> ---
> gitweb/gitweb.perl | 37 +++++++++++++++++++++++++++++++------
> 1 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index c5254af..6d0dc26 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -534,23 +534,48 @@ sub evaluate_path_info {
> return if $input_params{'action'};
> $path_info =~ s,^\Q$project\E/*,,;
>
> + # next, check if we have an action
> + my $action = $path_info;
> + $action =~ s,/.*$,,;
> + if (exists $actions{$action}) {
> + $path_info =~ s,^$action/*,,;
> + $input_params{'action'} = $action;
> + }
> +
> + # list of actions that want hash_base instead of hash
...and can have no file_name ('f') parameter.
> + my @wants_base = (
> + 'tree',
> + 'history',
> + );
I like this solution. It makes path_info URL unambiguous: it is
<project>/log/<hash> (where <hash> is usually refname), and
<project>/tree/<hash_base> which is just shortcut for <project>/tree/<hash_base>:/
It means that the @wants_base list contains actions which require / use
$hash_base and $file_name, but $file_name might be not set, which denotes
top directory (root tree).
> +
> my ($refname, $pathname) = split(/:/, $path_info, 2);
> if (defined $pathname) {
> - # we got "project.git/branch:filename" or "project.git/branch:dir/"
> + # we got "branch:filename" or "branch:dir/"
> # we could use git_get_type(branch:pathname), but it needs $git_dir
Well, this comment was there... but actually I think we avoid
git_get_type("branch:pathname") because currently it is additional fork.
And with convention that filename part of path_info ends in '/' for
directories (quite sensible I think) it is not needed. Moreso after
next patch in series, when we state action explicitly, and the only
difference _might_ be for 'history' view (which is for directories
and for files both).
> $pathname =~ s,^/+,,;
> if (!$pathname || substr($pathname, -1) eq "/") {
> - $input_params{'action'} = "tree";
> + $input_params{'action'} ||= "tree";
> $pathname =~ s,/$,,;
> } else {
> - $input_params{'action'} = "blob_plain";
> + $input_params{'action'} ||= "blob_plain";
> }
> $input_params{'hash_base'} ||= $refname;
> $input_params{'file_name'} ||= $pathname;
> } elsif (defined $refname) {
> - # we got "project.git/branch"
> - $input_params{'action'} = "shortlog";
> - $input_params{'hash'} ||= $refname;
> + # we got "branch". in this case we have to choose if we have to
^ ^
|-|-- ??? (typo?)
> + # set hash or hash_base.
> + #
> + # Most of the actions without a pathname only want hash to be
> + # set, except for the ones specified in @wants_base that want
> + # hash_base instead. It should also be noted that hand-crafted
> + # links having 'history' as an action and no pathname or hash
> + # set will fail, but that happens regardless of PATH_INFO.
Ah, I see that the comment about what makes action to be put into
@wants_base is here.
> + $input_params{'action'} ||= "shortlog";
> + if (grep {$input_params{'action'} eq $_} @wants_base) {
Style: I would use here (but perhaps it is just me) for better
readibility
+ if (grep { $input_params{'action'} eq $_ } @wants_base) {
> + $input_params{'hash_base'} ||= $refname;
> + } else {
> + $input_params{'hash'} ||= $refname;
> + }
> }
> }
> evaluate_path_info();
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 2/5] gitweb: generate project/action/hash URLs
2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
@ 2008-10-18 23:14 ` Jakub Narebski
1 sibling, 0 replies; 26+ messages in thread
From: Jakub Narebski @ 2008-10-18 23:14 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
> When generating path info URLs, reduce the number of CGI parameters by
> embedding action and hash_parent:filename or hash in the path.
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Nice, well commented, and clean.
Acked-by: Jakub Narebski <jnareb@gmail.com>
> ---
> gitweb/gitweb.perl | 33 ++++++++++++++++++++++++++++++---
> 1 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 6d0dc26..5337d40 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -724,14 +724,41 @@ sub href (%) {
>
> my ($use_pathinfo) = gitweb_check_feature('pathinfo');
> if ($use_pathinfo) {
> - # use PATH_INFO for project name
> + # try to put as many parameters as possible in PATH_INFO:
> + # - project name
> + # - action
> + # - hash or hash_base:filename
> +
> + # When the script is the root DirectoryIndex for the domain,
> + # $href here would be something like http://gitweb.example.com/
> + # Thus, we strip any trailing / from $href, to spare us double
> + # slashes in the final URL
> + $href =~ s,/$,,;
I like having this comment here.
> +
> + # Then add the project name, if present
> $href .= "/".esc_url($params{'project'}) if defined $params{'project'};
> delete $params{'project'};
>
> - # Summary just uses the project path URL
> - if (defined $params{'action'} && $params{'action'} eq 'summary') {
> + # Summary just uses the project path URL, any other action is
> + # added to the URL
Which meant that we prefer http://base_url/project over http://base_url/project/summary
> + if (defined $params{'action'}) {
> + $href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary';
> delete $params{'action'};
> }
> +
> + # Finally, we put either hash_base:file_name or hash
> + if (defined $params{'hash_base'}) {
> + $href .= "/".esc_url($params{'hash_base'});
> + if (defined $params{'file_name'}) {
> + $href .= ":".esc_url($params{'file_name'});
> + delete $params{'file_name'};
> + }
> + delete $params{'hash'};
> + delete $params{'hash_base'};
> + } elsif (defined $params{'hash'}) {
> + $href .= "/".esc_url($params{'hash'});
> + delete $params{'hash'};
> + }
> }
>
> # now encode the parameters explicitly
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 3/5] gitweb: use_pathinfo filenames start with /
2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta
@ 2008-10-18 23:26 ` Jakub Narebski
2008-10-18 23:57 ` Giuseppe Bilotta
1 sibling, 1 reply; 26+ messages in thread
From: Jakub Narebski @ 2008-10-18 23:26 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
> When using path info, make filenames start with a / (right after the :
> that separates them from the hash base). This minimal change allows
> relative navigation to work properly when viewing HTML files in raw
> ('blob_plain') mode.
This means generating project/action/hash_base:/filename rather than
earlier project/action/hash_base:filename, isn't it?
For example for http://repo.or.cz/w/git.git/html:/git.html links leads
to correct HTML pages, while for http://repo.or.cz/w/git.git/html:git.html
they lead to empty gitweb page (no errors, so link checker would be
fooled).
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Acked-by: Jakub Narebski <jnareb@gmail.com>
> ---
> gitweb/gitweb.perl | 6 +++---
> 1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 5337d40..49730f3 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -727,7 +727,7 @@ sub href (%) {
> # try to put as many parameters as possible in PATH_INFO:
> # - project name
> # - action
> - # - hash or hash_base:filename
> + # - hash or hash_base:/filename
>
> # When the script is the root DirectoryIndex for the domain,
> # $href here would be something like http://gitweb.example.com/
> @@ -746,11 +746,11 @@ sub href (%) {
> delete $params{'action'};
> }
>
> - # Finally, we put either hash_base:file_name or hash
> + # Finally, we put either hash_base:/file_name or hash
> if (defined $params{'hash_base'}) {
> $href .= "/".esc_url($params{'hash_base'});
> if (defined $params{'file_name'}) {
> - $href .= ":".esc_url($params{'file_name'});
> + $href .= ":/".esc_url($params{'file_name'});
> delete $params{'file_name'};
> }
> delete $params{'hash'};
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 3/5] gitweb: use_pathinfo filenames start with /
2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski
@ 2008-10-18 23:57 ` Giuseppe Bilotta
2008-10-19 8:43 ` Jakub Narebski
0 siblings, 1 reply; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-18 23:57 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano
On Sun, Oct 19, 2008 at 1:26 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
>
>> When using path info, make filenames start with a / (right after the :
>> that separates them from the hash base). This minimal change allows
>> relative navigation to work properly when viewing HTML files in raw
>> ('blob_plain') mode.
>
> This means generating project/action/hash_base:/filename rather than
> earlier project/action/hash_base:filename, isn't it?
Exactly. We _accept_ paths with or without the /, but we always
generate paths with it.
> For example for http://repo.or.cz/w/git.git/html:/git.html links leads
> to correct HTML pages, while for http://repo.or.cz/w/git.git/html:git.html
> they lead to empty gitweb page (no errors, so link checker would be
> fooled).
An idea that could be taken into consideration, if the ability to
navigate web documents is deemed of primary importance, would be a
redirect from the no-slash URL (a hand-coded one, given that with this
patch we only generate slashed URLs) to the slashed URL. Not sure it's
worth the effort (and reparsing) though: it would obviously be MUCH
nicer if we could change the URL without having to actually reload the
document ...
--
Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 3/5] gitweb: use_pathinfo filenames start with /
2008-10-18 23:57 ` Giuseppe Bilotta
@ 2008-10-19 8:43 ` Jakub Narebski
0 siblings, 0 replies; 26+ messages in thread
From: Jakub Narebski @ 2008-10-19 8:43 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
> On Sun, Oct 19, 2008 at 1:26 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>> On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
>>
>>> When using path info, make filenames start with a / (right after the :
>>> that separates them from the hash base). This minimal change allows
>>> relative navigation to work properly when viewing HTML files in raw
>>> ('blob_plain') mode.
[...]
>> For example for http://repo.or.cz/w/git.git/html:/git.html links leads
>> to correct HTML pages, while for http://repo.or.cz/w/git.git/html:git.html
>> they lead to empty gitweb page (no errors, so link checker would be
>> fooled).
>
> An idea that could be taken into consideration, if the ability to
> navigate web documents is deemed of primary importance, would be a
> redirect from the no-slash URL (a hand-coded one, given that with this
> patch we only generate slashed URLs) to the slashed URL. Not sure it's
> worth the effort (and reparsing) though: it would obviously be MUCH
> nicer if we could change the URL without having to actually reload the
> document ...
I think that changing URL without reloading is impossible because of
security reasons. For example if you change document.location in
JavaScript changing the URL (you can add links to non-existing anchors
without reloading) then web browser reloads the page from new URL.
What you can do is early redirect using 301 Moved Permanently (or
similar) and Location: redirect, using $cgi->redirect() like in
git_object().
Also not sure if it is worth the effort...
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO
2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
@ 2008-10-19 12:11 ` Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta
2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski
2 siblings, 0 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-19 12:11 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
We parse requests for $project/snapshot/$head.$sfx as equivalent to
$project/snapshot/$head?sf=$sfx, where $sfx is any of the known
(although not necessarily supported) snapshot formats (or its default
suffix).
The filename for the resulting package preserves the requested
extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
gives a .tar.gz), although for obvious reasons it doesn't preserve the
basename (git/snapshot/next.tgz returns a file names git-next.tgz).
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
Ok, I lied, there's a 6th patch ready. Its purpose is to allow embedding of
the snapshot format parameter in the URL, in the form of an extension
appended to the refname.
I have not prepared the corresponding URL generation code yet, because of
potential ambiguity that might arise: even though at URL generation time
the project does not have a head named $head.$sfx, it is possible (although
extremely unlikely) that such head is created later on, and the PATH_INFO
generation code cannot therefore guarantee the permalink nature of the
embedded snapshot format URL.
gitweb/gitweb.perl | 35 +++++++++++++++++++++++++++++++++++
1 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 99c8c20..7ffd10b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -609,6 +609,41 @@ sub evaluate_path_info {
$input_params{'hash_parent'} ||= $parentrefname;
}
}
+
+ # for the snapshot action, we allow URLs in the form
+ # $project/snapshot/$hash.ext
+ # where .ext determines the snapshot and gets removed from the
+ # passed $refname to provide the $hash.
+ #
+ # To be able to tell that $refname includes the format extension, we
+ # require some conditions to be satisfied:
+ # - the hash input parameter MUST have been set from the $refname part
+ # of the URL (i.e. they must be equal)
+ # - the snapshot format MUST NOT have been defined already
+ # - the $refname itself MUST NOT be a valid refname
+ if ($input_params{'action'} eq 'snapshot' && defined $refname &&
+ $refname eq $input_params{'hash'} &&
+ !defined $input_params{'snapshot_format'} &&
+ !parse_commit($refname)) {
+ # We loop over the known snapshot formats, checking for
+ # extensions. Allowed extensions are both the defined suffix
+ # (which includes the initial dot already) and the snapshot
+ # format key itself, with a prepended dot
+ while (my ($fmt, %opt) = each %known_snapshot_formats) {
+ my $hash = $refname;
+ my $sfx;
+ $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
+ next unless $sfx = $1;
+ # a valid suffix was found, so set the snapshot format
+ # and reset the hash parameter
+ $input_params{'snapshot_format'} = $fmt;
+ $input_params{'hash'} = $hash;
+ # we also set the format suffix to the one requested
+ # in the URL: this way a request for e.g. .tgz returns
+ # a .tgz instead of a .tar.gz
+ $known_snapshot_formats{$fmt}{'suffix'} = $sfx;
+ }
+ }
}
evaluate_path_info();
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO
2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta
@ 2008-10-19 14:24 ` Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta
2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski
2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski
2 siblings, 2 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-19 14:24 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
We parse requests for $project/snapshot/$head.$sfx as equivalent to
$project/snapshot/$head?sf=$sfx, where $sfx is any of the known
(although not necessarily supported) snapshot formats (or its default
suffix).
The filename for the resulting package preserves the requested
extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
gives a .tar.gz), although for obvious reasons it doesn't preserve the
basename (git/snapshot/next.tgz returns a file names git-next.tgz).
This introduces a potential case for ambiguity if a project has a head
that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
sf CGI parameter is not present; however, gitweb only produces URLs with
the sf parameter, so this is only a potential issue for hand-coded URLs
for extremely unusual project.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
I had second thoughts on this. Now we always look for the snapshot extension if
the sf CGI parameter is missing, even if the project has a head that matches
the full pseudo-refname $head.$sfx.
The reason for this is that (1) there is no ambiguity for gitweb-generated
URLs (2) the only URLs that could fail are hand-made URLs for extremely
unusual projects and (3) it allows us to set gitweb up to generate
(unambiguous) URLs without the sf CGI parameter.
This also means that I can add 3 patches to the series, instead of just one:
* patch #6 that parses the new format
* patch #7 that generates the new URLs
* patch #8 for some code refactoring
gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++
1 files changed, 34 insertions(+), 0 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 99c8c20..e9e9e60 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -609,6 +609,40 @@ sub evaluate_path_info {
$input_params{'hash_parent'} ||= $parentrefname;
}
}
+
+ # for the snapshot action, we allow URLs in the form
+ # $project/snapshot/$hash.ext
+ # where .ext determines the snapshot and gets removed from the
+ # passed $refname to provide the $hash.
+ #
+ # To be able to tell that $refname includes the format extension, we
+ # require the following two conditions to be satisfied:
+ # - the hash input parameter MUST have been set from the $refname part
+ # of the URL (i.e. they must be equal)
+ # - the snapshot format MUST NOT have been defined already
+ if ($input_params{'action'} eq 'snapshot' && defined $refname &&
+ $refname eq $input_params{'hash'} &&
+ !defined $input_params{'snapshot_format'}) {
+ # We loop over the known snapshot formats, checking for
+ # extensions. Allowed extensions are both the defined suffix
+ # (which includes the initial dot already) and the snapshot
+ # format key itself, with a prepended dot
+ while (my ($fmt, %opt) = each %known_snapshot_formats) {
+ my $hash = $refname;
+ my $sfx;
+ $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
+ next unless $sfx = $1;
+ # a valid suffix was found, so set the snapshot format
+ # and reset the hash parameter
+ $input_params{'snapshot_format'} = $fmt;
+ $input_params{'hash'} = $hash;
+ # we also set the format suffix to the one requested
+ # in the URL: this way a request for e.g. .tgz returns
+ # a .tgz instead of a .tar.gz
+ $known_snapshot_formats{$fmt}{'suffix'} = $sfx;
+ last;
+ }
+ }
}
evaluate_path_info();
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO
2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta
@ 2008-10-19 14:24 ` Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta
2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski
2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski
1 sibling, 2 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-19 14:24 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
When PATH_INFO is active, get rid of the sf CGI parameter by embedding
the snapshot format information in the PATH_INFO URL, in the form of an
appropriate extension.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
gitweb/gitweb.perl | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e9e9e60..5fd5a1f 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -795,6 +795,7 @@ sub href (%) {
# - action
# - hash_parent or hash_parent_base:/file_parent
# - hash or hash_base:/file_name
+ # - the snapshot_format as an appropriate suffix
# When the script is the root DirectoryIndex for the domain,
# $href here would be something like http://gitweb.example.com/
@@ -806,6 +807,10 @@ sub href (%) {
$href .= "/".esc_url($params{'project'}) if defined $params{'project'};
delete $params{'project'};
+ # since we destructively absorb parameters, we keep this
+ # boolean that remembers if we're handling a snapshot
+ my $is_snapshot = $params{'action'} eq 'snapshot';
+
# Summary just uses the project path URL, any other action is
# added to the URL
if (defined $params{'action'}) {
@@ -845,6 +850,22 @@ sub href (%) {
$href .= esc_url($params{'hash'});
delete $params{'hash'};
}
+
+ # If the action was a snapshot, we can absorb the
+ # snapshot_format parameter too
+ if ($is_snapshot) {
+ my $fmt = $params{'snapshot_format'};
+ # snapshot_format should always be defined when href()
+ # is called, but just in case some code forgets, we
+ # fall back to the default
+ if (!$fmt) {
+ my @snapshot_fmts = gitweb_check_feature('snapshot');
+ @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
+ $fmt = $snapshot_fmts[0];
+ }
+ $href .= $known_snapshot_formats{$fmt}{'suffix'};
+ delete $params{'snapshot_format'};
+ }
}
# now encode the parameters explicitly
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATHv2 8/8] gitweb: make the supported snapshot formats array global
2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta
@ 2008-10-19 14:24 ` Giuseppe Bilotta
2008-11-02 1:54 ` Jakub Narebski
2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski
1 sibling, 1 reply; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-19 14:24 UTC (permalink / raw)
To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta
The @snapshot_fmts array, containing the list of the supported snapshot
formats, was recreated locally in three different routines (with the
different name @supported_fmts in one of them).
Its local generation is particularly expensive because two of the
callers, href() and format_snapshot_links(), are often called many times
in a single page.
Simplify code and speed up page generation by making the array global.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
---
gitweb/gitweb.perl | 21 ++++++++-------------
1 files changed, 8 insertions(+), 13 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5fd5a1f..d1475b7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -748,6 +748,10 @@ if (defined $searchtext) {
our $git_dir;
$git_dir = "$projectroot/$project" if $project;
+# list of supported snapshot formats
+our @snapshot_fmts = gitweb_check_feature('snapshot');
+@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
+
# dispatch
if (!defined $action) {
if (defined $hash) {
@@ -858,11 +862,7 @@ sub href (%) {
# snapshot_format should always be defined when href()
# is called, but just in case some code forgets, we
# fall back to the default
- if (!$fmt) {
- my @snapshot_fmts = gitweb_check_feature('snapshot');
- @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
- $fmt = $snapshot_fmts[0];
- }
+ $fmt ||= $snapshot_fmts[0];
$href .= $known_snapshot_formats{$fmt}{'suffix'};
delete $params{'snapshot_format'};
}
@@ -1695,8 +1695,6 @@ sub format_diff_line {
# linked. Pass the hash of the tree/commit to snapshot.
sub format_snapshot_links {
my ($hash) = @_;
- my @snapshot_fmts = gitweb_check_feature('snapshot');
- @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
my $num_fmts = @snapshot_fmts;
if ($num_fmts > 1) {
# A parenthesized list of links bearing format names.
@@ -4898,20 +4896,17 @@ sub git_tree {
}
sub git_snapshot {
- my @supported_fmts = gitweb_check_feature('snapshot');
- @supported_fmts = filter_snapshot_fmts(@supported_fmts);
-
my $format = $input_params{'snapshot_format'};
- if (!@supported_fmts) {
+ if (!@snapshot_fmts) {
die_error(403, "Snapshots not allowed");
}
# default to first supported snapshot format
- $format ||= $supported_fmts[0];
+ $format ||= $snapshot_fmts[0];
if ($format !~ m/^[a-z0-9]+$/) {
die_error(400, "Invalid snapshot format parameter");
} elsif (!exists($known_snapshot_formats{$format})) {
die_error(400, "Unknown snapshot format");
- } elsif (!grep($_ eq $format, @supported_fmts)) {
+ } elsif (!grep($_ eq $format, @snapshot_fmts)) {
die_error(403, "Unsupported snapshot format");
}
--
1.5.6.5
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO
2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
@ 2008-10-20 9:18 ` Jakub Narebski
2008-10-20 14:52 ` Giuseppe Bilotta
1 sibling, 1 reply; 26+ messages in thread
From: Jakub Narebski @ 2008-10-20 9:18 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
> This patch makes it possible to use an URL such as
> $project/somebranch..otherbranch:/filename to get a diff between
> different version of a file. Paths like
> $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
> as well.
Just a nitpick: why '$project' and '$action', but 'somebranch',
'otherbranch' and 'filename'?
>
> All '*diff' actions and in general actions that use $hash_parent[_base]
> and $file_parent can now get all of their parameters from PATH_INFO
Which currently mean 'shortlog', and I guess in the future would mean
also all other log-like views: 'log', 'history', 'search' (the commit
message kind, not the pickaxe kind), and perhaps also 'rss'/'atom'.
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
For what is worth:
Acked-by: Jakub Narebski <jnareb@gmail.com>
> ---
> gitweb/gitweb.perl | 36 ++++++++++++++++++++++++++++++++++--
> 1 files changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 49730f3..1a7b0b9 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -548,7 +548,12 @@ sub evaluate_path_info {
> 'history',
> );
>
> - my ($refname, $pathname) = split(/:/, $path_info, 2);
> + # horrible regexp to catch
> + # [$hash_parent_base[:$file_parent]..]$hash_parent[:$file_name]
This comment is really nice here, to explain what we try to catch.
Is it really "horrible"... let others decide.
Note: this (as also previous code) makes use of the fact that refname
cannot contain ':' as per git-check-ref-format(1), and the fact that
gitweb limits $hash* parameters to simple revision syntax, allowing
only SHA1 and refnames, and not allowing (see validate_refname() used
in $hash* validation) extended SHA1 syntax like <commit-ish>:<path>
for $hash ('h') param for 'blob' view: gitweb uses $hash_base and
$file_name for that.
(But I guess it is too long explanation to put it in comment)
Side note: the regexp below allow for $parentpathname to contain
'..', but we don't want to rely on such minute detail of implementation
detail (because it depends on whether we use greedy or non-greedy
matching there).
> + my ($parentrefname, $parentpathname, $refname, $pathname) =
> + ($path_info =~ /^(?:(.+?)(?::(.+))?\.\.)?(.+?)(?::(.+))?$/);
> +
> + # first, analyze the 'current' part
> if (defined $pathname) {
> # we got "branch:filename" or "branch:dir/"
> # we could use git_get_type(branch:pathname), but it needs $git_dir
> @@ -557,7 +562,13 @@ sub evaluate_path_info {
> $input_params{'action'} ||= "tree";
> $pathname =~ s,/$,,;
> } else {
> - $input_params{'action'} ||= "blob_plain";
> + # the default action depends on whether we had parent info
> + # or not
> + if ($parentrefname) {
> + $input_params{'action'} ||= "blobdiff_plain";
> + } else {
> + $input_params{'action'} ||= "blob_plain";
> + }
Nice.
I was wondering about 'project/hash_parent..hash' syntax, but then I have
realized that it doesn't change action (as in 'blob_plain' -> 'blobdiff_plain'),
but is always 'shortlog'.
By the way, I wonder if it should be 'blobdiff_plain' or just 'blobdiff'.
the 'blob_plain' was here to use gitweb as a kind of versioned web
server; there is no such equivalent for 'p/hbp..hb:f' syntax. On the
other hand it is consistent behavior, always using *_plain...
> }
> $input_params{'hash_base'} ||= $refname;
> $input_params{'file_name'} ||= $pathname;
> @@ -577,6 +588,27 @@ sub evaluate_path_info {
> $input_params{'hash'} ||= $refname;
> }
> }
> +
> + # next, handle the 'parent' part, if present
> + if (defined $parentrefname) {
> + # a missing pathspec defaults to the 'current' filename, allowing e.g.
> + # someproject/blobdiff/oldrev..newrev:/filename
> + if ($parentpathname) {
> + $parentpathname =~ s,^/+,,;
> + $parentpathname =~ s,/$,,;
Hmmm... should we strip trailing '/' here?
> + $input_params{'file_parent'} ||= $parentpathname;
> + } else {
> + $input_params{'file_parent'} ||= $input_params{'file_name'};
> + }
> + # we assume that hash_parent_base is wanted if a path was specified,
> + # or if the action wants hash_base instead of hash
Nice comment.
> + if (defined $input_params{'file_parent'} ||
> + grep {$input_params{'action'} eq $_} @wants_base) {
My preference of style would be to use here:
+ grep { $input_params{'action'} eq $_ } @wants_base) {
> + $input_params{'hash_parent_base'} ||= $parentrefname;
> + } else {
> + $input_params{'hash_parent'} ||= $parentrefname;
> + }
> + }
> }
> evaluate_path_info();
>
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 5/5] gitweb: generate parent..current URLs
2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta
@ 2008-10-20 10:49 ` Jakub Narebski
2008-10-20 14:57 ` Giuseppe Bilotta
2 siblings, 1 reply; 26+ messages in thread
From: Jakub Narebski @ 2008-10-20 10:49 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
> If use_pathinfo is enabled, href now creates links that contain paths in
> the form $project/$action/oldhash:/oldname..newhash:/newname for actions
> that use hash_parent etc.
>
> If any of the filename contains two consecutive dots, it's kept as a CGI
> parameter since the resulting path would otherwise be ambiguous.
Note that this part is _required_ after previous patch, because
otherwise gitweb could generate path_info links which point to
different view than intended, when filename contains '..' in it,
like for example some test vectors in git test suite, for example
files in t/t5515
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
For what is worth:
Acked-by: Jakub Narebski <jnareb@gmail.com>
BTW. I wanted to test this using gitweb-Mechanize test from Lea Wiemann
GSoC 2008 work on gitweb caching, but I couldn't make it run as single
individual test (cd t && ./t9503-gitweb-Mechanize.sh); it requires to
run as "make test" and I didn't want that. I could have resurrected my
initial version of Test::WWW::Mechanize::CGI test...
> ---
> gitweb/gitweb.perl | 30 +++++++++++++++++++++++++-----
> 1 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 1a7b0b9..f4642e7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -759,7 +759,8 @@ sub href (%) {
> # try to put as many parameters as possible in PATH_INFO:
> # - project name
> # - action
> - # - hash or hash_base:/filename
> + # - hash_parent or hash_parent_base:/file_parent
> + # - hash or hash_base:/file_name
Minor nit: this contain independent change 'filename' -> 'file_name',
but I think it is not worth separating...
>
> # When the script is the root DirectoryIndex for the domain,
> # $href here would be something like http://gitweb.example.com/
> @@ -778,17 +779,36 @@ sub href (%) {
> delete $params{'action'};
> }
>
> - # Finally, we put either hash_base:/file_name or hash
> + # Next, we put hash_parent_base:/file_parent..hash_base:/file_name,
> + # stripping nonexistent or useless pieces
> + $href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'}
> + || $params{'hash_parent'} || $params{'hash'});
Nice trick (and required change).
> if (defined $params{'hash_base'}) {
> - $href .= "/".esc_url($params{'hash_base'});
> - if (defined $params{'file_name'}) {
> + if (defined $params{'hash_parent_base'}) {
> + $href .= esc_url($params{'hash_parent_base'});
> + # skip the file_parent if it's the same as the file_name
> + delete $params{'file_parent'} if $params{'file_parent'} eq $params{'file_name'};
> + if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) {
> + $href .= ":/".esc_url($params{'file_parent'});
> + delete $params{'file_parent'};
> + }
Side note: I wonder if we should use esc_url or esc_param here...
> + $href .= "..";
> + delete $params{'hash_parent'};
> + delete $params{'hash_parent_base'};
> + } elsif (defined $params{'hash_parent'}) {
> + $href .= esc_url($params{'hash_parent'}). "..";
> + delete $params{'hash_parent'};
> + }
> +
> + $href .= esc_url($params{'hash_base'});
> + if (defined $params{'file_name'} && $params{'file_name'} !~ /\.\./) {
> $href .= ":/".esc_url($params{'file_name'});
> delete $params{'file_name'};
> }
> delete $params{'hash'};
> delete $params{'hash_base'};
> } elsif (defined $params{'hash'}) {
> - $href .= "/".esc_url($params{'hash'});
> + $href .= esc_url($params{'hash'});
> delete $params{'hash'};
> }
> }
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO
2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski
@ 2008-10-20 14:52 ` Giuseppe Bilotta
0 siblings, 0 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-20 14:52 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano
On Mon, Oct 20, 2008 at 11:18 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Thu, 16 Oct 2008, Giuseppe Bilotta wrote:
>
>> This patch makes it possible to use an URL such as
>> $project/somebranch..otherbranch:/filename to get a diff between
>> different version of a file. Paths like
>> $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed
>> as well.
>
> Just a nitpick: why '$project' and '$action', but 'somebranch',
> 'otherbranch' and 'filename'?
Good question ... I think I got distracted along the way.
>> All '*diff' actions and in general actions that use $hash_parent[_base]
>> and $file_parent can now get all of their parameters from PATH_INFO
>
> Which currently mean 'shortlog', and I guess in the future would mean
> also all other log-like views: 'log', 'history', 'search' (the commit
> message kind, not the pickaxe kind), and perhaps also 'rss'/'atom'.
I'm not sure rss/atom makes sense, but the others were already in my
todo list after the shortlog patch, so I'll try to get them ready as
soon as I have the time to refactor them as we discussed on IRC.
> Side note: the regexp below allow for $parentpathname to contain
> '..', but we don't want to rely on such minute detail of implementation
> detail (because it depends on whether we use greedy or non-greedy
> matching there).
>
>> + my ($parentrefname, $parentpathname, $refname, $pathname) =
>> + ($path_info =~ /^(?:(.+?)(?::(.+))?\.\.)?(.+?)(?::(.+))?$/);
Hm, actually it might be a better idea to make the first pathname
match non-greedy.
>> - $input_params{'action'} ||= "blob_plain";
>> + # the default action depends on whether we had parent info
>> + # or not
>> + if ($parentrefname) {
>> + $input_params{'action'} ||= "blobdiff_plain";
>> + } else {
>> + $input_params{'action'} ||= "blob_plain";
>> + }
>
> Nice.
>
> I was wondering about 'project/hash_parent..hash' syntax, but then I have
> realized that it doesn't change action (as in 'blob_plain' -> 'blobdiff_plain'),
> but is always 'shortlog'.
>
> By the way, I wonder if it should be 'blobdiff_plain' or just 'blobdiff'.
> the 'blob_plain' was here to use gitweb as a kind of versioned web
> server; there is no such equivalent for 'p/hbp..hb:f' syntax. On the
> other hand it is consistent behavior, always using *_plain...
Moreover, it allows sending shorter URLs for patches, which are the
ones you usually write by hand.
>> +
>> + # next, handle the 'parent' part, if present
>> + if (defined $parentrefname) {
>> + # a missing pathspec defaults to the 'current' filename, allowing e.g.
>> + # someproject/blobdiff/oldrev..newrev:/filename
>> + if ($parentpathname) {
>> + $parentpathname =~ s,^/+,,;
>> + $parentpathname =~ s,/$,,;
>
> Hmmm... should we strip trailing '/' here?
I must confess I don't remember why I decided that was needed.
--
Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 5/5] gitweb: generate parent..current URLs
2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski
@ 2008-10-20 14:57 ` Giuseppe Bilotta
0 siblings, 0 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-20 14:57 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano
On Mon, Oct 20, 2008 at 12:49 PM, Jakub Narebski <jnareb@gmail.com> wrote:
>> - # - hash or hash_base:/filename
>> + # - hash_parent or hash_parent_base:/file_parent
>> + # - hash or hash_base:/file_name
>
> Minor nit: this contain independent change 'filename' -> 'file_name',
> but I think it is not worth separating...
Oopsie. Oh well, I was getting so used to all those _ that it felt
strange without
>> # When the script is the root DirectoryIndex for the domain,
>> # $href here would be something like http://gitweb.example.com/
>> @@ -778,17 +779,36 @@ sub href (%) {
>> delete $params{'action'};
>> }
>>
>> - # Finally, we put either hash_base:/file_name or hash
>> + # Next, we put hash_parent_base:/file_parent..hash_base:/file_name,
>> + # stripping nonexistent or useless pieces
>> + $href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'}
>> + || $params{'hash_parent'} || $params{'hash'});
>
> Nice trick (and required change).
>
>> if (defined $params{'hash_base'}) {
>> - $href .= "/".esc_url($params{'hash_base'});
>> - if (defined $params{'file_name'}) {
>> + if (defined $params{'hash_parent_base'}) {
>> + $href .= esc_url($params{'hash_parent_base'});
>> + # skip the file_parent if it's the same as the file_name
>> + delete $params{'file_parent'} if $params{'file_parent'} eq $params{'file_name'};
>> + if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) {
>> + $href .= ":/".esc_url($params{'file_parent'});
>> + delete $params{'file_parent'};
>> + }
>
> Side note: I wonder if we should use esc_url or esc_param here...
esc_url, I would say, allowing us to build RFC-compliant URLs. Isn't
esc_param for CGI?
--
Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO
2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta
@ 2008-10-21 16:44 ` Jakub Narebski
2008-10-21 18:36 ` Giuseppe Bilotta
1 sibling, 1 reply; 26+ messages in thread
From: Jakub Narebski @ 2008-10-21 16:44 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
I like the idea behind this patch, to enable to use path_info for as
much gitweb parameters as possible. After this patch series the only
parameters which wouldn't be possible to represent in path_info would
be:
* @extra_options ('opt') multi-valued parameter, used to pass
thinks like '--no-merges', which cannot be fit in the "simplified"
list-like (as opposed to hash-like query string) path_info URL.
* $searchtype ('st') and $searchtext ('s') etc. parameters, which
are generated by HTML form, and are naturally generated in query
string format.
* $page ('pg') parameter, which could theoretically be added as last
part of path_info URL, for example $project/next/2/... if not for
pesky $project/history/next:/Documentation/2/ where you cannot be
sure that having /<number>/ at the end is rare.
* $order ('o') parameter, which would be hard to fit in path_info,
with its limitation of parameters being specified by position.
Or even next to impossible.
* 'by_tag'...
But I'd rather have this patch series to be in separate thread...
On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
> We parse requests for $project/snapshot/$head.$sfx as equivalent to
> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
> (although not necessarily supported) snapshot formats (or its default
> suffix).
>
> The filename for the resulting package preserves the requested
> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
> gives a .tar.gz), although for obvious reasons it doesn't preserve the
> basename (git/snapshot/next.tgz returns a file names git-next.tgz).
That is a bit of difference from sf=<format> in CGI query string, where
<format> is always a name of a format (for example 'tgz' or 'tbz2'),
and actual suffix is defined in %known_snapshot_formats (for example
'.tar.gz' and '.tar.bz2' respectively). Now you can specify snapshot
format either either by its name, for example 'tgz' (which is simple
lookup in hash) which result in proposed filename with '.tgz' suffix,
or you can specify suffix, for example 'tar.gz' (which requires
searching through all hash) which result in proposed filename with
'.tar.gz' suffix.
This is a bit of inconsistency; to be consistent with how we handle
'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
snapshot filename. This would also cover currently purely theoretical
case when different snapshot formats (for example 'tgz' and 'tgz9')
would use the same snapshot suffix (extension), but differ for example
in parameters passed to compressor (for example '-9' or '--best' in
the 'tgz9' case).
On the other hand one would expect that when URL which looks like
URL to snapshot ends with '.$sfx', then filename for snapshot would
also end with '.$sfx'.
This certainly requires some further thoughts.
>
> This introduces a potential case for ambiguity if a project has a head
> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
> sf CGI parameter is not present; however, gitweb only produces URLs with
> the sf parameter, so this is only a potential issue for hand-coded URLs
> for extremely unusual project.
I think you wanted to say here "_currently_ produces URLs with the 'sf'
parameter" as the next patch in series changes this.
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
> ---
>
> I had second thoughts on this. Now we always look for the snapshot extension if
> the sf CGI parameter is missing, even if the project has a head that matches
> the full pseudo-refname $head.$sfx.
>
> The reason for this is that (1) there is no ambiguity for gitweb-generated
> URLs (2) the only URLs that could fail are hand-made URLs for extremely
> unusual projects and (3) it allows us to set gitweb up to generate
> (unambiguous) URLs without the sf CGI parameter.
This is also simpler and cheaper solution.
>
> This also means that I can add 3 patches to the series, instead of just one:
> * patch #6 that parses the new format
> * patch #7 that generates the new URLs
> * patch #8 for some code refactoring
Now, I haven't yet read the last patch in series, so I don't know if
it is independent refactoring, making sense even before patches named
#6 and #7 here, or is it connected with searching for snapshot format
by suffix it uses. If the former, it should be done upfront, as it
shouldn't need discussion, and being easier to be accepted into git.git.
If the latter, then it should probably be folded (squashed) into #6,
first patch in the series.
>
> gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++
> 1 files changed, 34 insertions(+), 0 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99c8c20..e9e9e60 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -609,6 +609,40 @@ sub evaluate_path_info {
> $input_params{'hash_parent'} ||= $parentrefname;
> }
> }
> +
> + # for the snapshot action, we allow URLs in the form
> + # $project/snapshot/$hash.ext
> + # where .ext determines the snapshot and gets removed from the
> + # passed $refname to provide the $hash.
> + #
> + # To be able to tell that $refname includes the format extension, we
> + # require the following two conditions to be satisfied:
> + # - the hash input parameter MUST have been set from the $refname part
> + # of the URL (i.e. they must be equal)
This means no "$project/.tgz?h=next", isn't it?
> + # - the snapshot format MUST NOT have been defined already
I would add "which means that 'sf' parameter is not set in URL", or
something like that as the last line of above comment.
I like that the code is so well commented, by the way.
> + if ($input_params{'action'} eq 'snapshot' && defined $refname &&
> + $refname eq $input_params{'hash'} &&
Minor nit.
I would use here (the question of style / better readability):
+ if ($input_params{'action'} eq 'snapshot' &&
+ defined $refname && $refname eq $input_params{'hash'} &&
to have both conditions about $refname in the same line.
> + !defined $input_params{'snapshot_format'}) {
> + # We loop over the known snapshot formats, checking for
> + # extensions. Allowed extensions are both the defined suffix
> + # (which includes the initial dot already) and the snapshot
> + # format key itself, with a prepended dot
> + while (my ($fmt, %opt) = each %known_snapshot_formats) {
> + my $hash = $refname;
> + my $sfx;
> + $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
> + next unless $sfx = $1;
> + # a valid suffix was found, so set the snapshot format
> + # and reset the hash parameter
> + $input_params{'snapshot_format'} = $fmt;
> + $input_params{'hash'} = $hash;
> + # we also set the format suffix to the one requested
> + # in the URL: this way a request for e.g. .tgz returns
> + # a .tgz instead of a .tar.gz
> + $known_snapshot_formats{$fmt}{'suffix'} = $sfx;
> + last;
> + }
I'm not sure if it worth (see comment at the beginning of this mail)
adding this code, or just allow $sfx to be snapshot _name_ (key in
%known_snapshot_formats hash).
Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
exists (assuming that snapshot format names does not contain '.').
If we decide to go more complicated route, then refactoring it in such
a way that suffixes are also keys to %known_snapshot_formats would be
preferred... err, sorry, not so simple. But refactoring this check
into separate subroutine (as I think last patch in series does) would
be good idea.
Also, I'd rather you checked if the $refname part contains '.' for it
to even consider that it can be suffix.
> + }
> }
> evaluate_path_info();
>
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO
2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski
@ 2008-10-21 18:36 ` Giuseppe Bilotta
2008-10-21 19:09 ` Junio C Hamano
0 siblings, 1 reply; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-10-21 18:36 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano
On Tue, Oct 21, 2008 at 6:44 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> I like the idea behind this patch, to enable to use path_info for as
> much gitweb parameters as possible. After this patch series the only
> parameters which wouldn't be possible to represent in path_info would
> be:
> * @extra_options ('opt') multi-valued parameter, used to pass
> thinks like '--no-merges', which cannot be fit in the "simplified"
> list-like (as opposed to hash-like query string) path_info URL.
> * $searchtype ('st') and $searchtext ('s') etc. parameters, which
> are generated by HTML form, and are naturally generated in query
> string format.
> * $page ('pg') parameter, which could theoretically be added as last
> part of path_info URL, for example $project/next/2/... if not for
> pesky $project/history/next:/Documentation/2/ where you cannot be
> sure that having /<number>/ at the end is rare.
> * $order ('o') parameter, which would be hard to fit in path_info,
> with its limitation of parameters being specified by position.
> Or even next to impossible.
> * 'by_tag'...
>
> But I'd rather have this patch series to be in separate thread...
Yes, a posteriori I think it's better too. I'll resend the 5 path_info
patches with the minor stylistic corrections you suggested, and send
these 3 separately.
> On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
>
>> We parse requests for $project/snapshot/$head.$sfx as equivalent to
>> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known
>> (although not necessarily supported) snapshot formats (or its default
>> suffix).
>>
>> The filename for the resulting package preserves the requested
>> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz
>> gives a .tar.gz), although for obvious reasons it doesn't preserve the
>> basename (git/snapshot/next.tgz returns a file names git-next.tgz).
>
> That is a bit of difference from sf=<format> in CGI query string, where
> <format> is always a name of a format (for example 'tgz' or 'tbz2'),
> and actual suffix is defined in %known_snapshot_formats (for example
> '.tar.gz' and '.tar.bz2' respectively). Now you can specify snapshot
> format either either by its name, for example 'tgz' (which is simple
> lookup in hash) which result in proposed filename with '.tgz' suffix,
> or you can specify suffix, for example 'tar.gz' (which requires
> searching through all hash) which result in proposed filename with
> '.tar.gz' suffix.
>
> This is a bit of inconsistency; to be consistent with how we handle
> 'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in
> snapshot filename. This would also cover currently purely theoretical
> case when different snapshot formats (for example 'tgz' and 'tgz9')
> would use the same snapshot suffix (extension), but differ for example
> in parameters passed to compressor (for example '-9' or '--best' in
> the 'tgz9' case).
>
> On the other hand one would expect that when URL which looks like
> URL to snapshot ends with '.$sfx', then filename for snapshot would
> also end with '.$sfx'.
>
> This certainly requires some further thoughts.
What I decided was to set gitweb to always produce links with the
suffix (.e.g .tar.gz), but I saw no particular reason not to accept
the shorter version which is (1) commonly used as a suffix as well and
(2) happens to be the actual format key used by gitweb.
A different, possibly cleaner approach, but a more extensive change,
would be to have each format describe a list of suffixes, defaulting
to the first one on creation by identifying all of them. This is more
invasive because all of the uses of {'suffix'} have to be replaced
with {'suffix'}[0], or something like that (maybe we could add a
separate key 'other_suffixes' instead?)
>> This introduces a potential case for ambiguity if a project has a head
>> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the
>> sf CGI parameter is not present; however, gitweb only produces URLs with
>> the sf parameter, so this is only a potential issue for hand-coded URLs
>> for extremely unusual project.
>
> I think you wanted to say here "_currently_ produces URLs with the 'sf'
> parameter" as the next patch in series changes this.
Ah yes, good point.
>> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
>> ---
>>
>> I had second thoughts on this. Now we always look for the snapshot extension if
>> the sf CGI parameter is missing, even if the project has a head that matches
>> the full pseudo-refname $head.$sfx.
>>
>> The reason for this is that (1) there is no ambiguity for gitweb-generated
>> URLs (2) the only URLs that could fail are hand-made URLs for extremely
>> unusual projects and (3) it allows us to set gitweb up to generate
>> (unambiguous) URLs without the sf CGI parameter.
>
> This is also simpler and cheaper solution.
That, too 8-)
>> This also means that I can add 3 patches to the series, instead of just one:
>> * patch #6 that parses the new format
>> * patch #7 that generates the new URLs
>> * patch #8 for some code refactoring
>
> Now, I haven't yet read the last patch in series, so I don't know if
> it is independent refactoring, making sense even before patches named
> #6 and #7 here, or is it connected with searching for snapshot format
> by suffix it uses. If the former, it should be done upfront, as it
> shouldn't need discussion, and being easier to be accepted into git.git.
> If the latter, then it should probably be folded (squashed) into #6,
> first patch in the series.
In fact, patch #8 can be written independently of the other too, and
would provide a significant speed benefit for generation of pages with
lots of 'snapshot' links: what it does is just to make the 'supported
formats' array global, preparing it only once instead of re-preparing
it every time a snapshot link is created.
>> gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++
>> 1 files changed, 34 insertions(+), 0 deletions(-)
>>
>> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
>> index 99c8c20..e9e9e60 100755
>> --- a/gitweb/gitweb.perl
>> +++ b/gitweb/gitweb.perl
>> @@ -609,6 +609,40 @@ sub evaluate_path_info {
>> $input_params{'hash_parent'} ||= $parentrefname;
>> }
>> }
>> +
>> + # for the snapshot action, we allow URLs in the form
>> + # $project/snapshot/$hash.ext
>> + # where .ext determines the snapshot and gets removed from the
>> + # passed $refname to provide the $hash.
>> + #
>> + # To be able to tell that $refname includes the format extension, we
>> + # require the following two conditions to be satisfied:
>> + # - the hash input parameter MUST have been set from the $refname part
>> + # of the URL (i.e. they must be equal)
>
> This means no "$project/.tgz?h=next", isn't it?
Right.
>> + # - the snapshot format MUST NOT have been defined already
>
> I would add "which means that 'sf' parameter is not set in URL", or
> something like that as the last line of above comment.
Good idea. I'll make it an 'e.g.' to keep the comment valid for future
additional parameter evaluation such as command-line input.
> I like that the code is so well commented, by the way.
Thanks.
>> + if ($input_params{'action'} eq 'snapshot' && defined $refname &&
>> + $refname eq $input_params{'hash'} &&
>
> Minor nit.
>
> I would use here (the question of style / better readability):
>
> + if ($input_params{'action'} eq 'snapshot' &&
> + defined $refname && $refname eq $input_params{'hash'} &&
>
> to have both conditions about $refname in the same line.
Yes, it'd look much better.
>> + !defined $input_params{'snapshot_format'}) {
>> + # We loop over the known snapshot formats, checking for
>> + # extensions. Allowed extensions are both the defined suffix
>> + # (which includes the initial dot already) and the snapshot
>> + # format key itself, with a prepended dot
>> + while (my ($fmt, %opt) = each %known_snapshot_formats) {
>> + my $hash = $refname;
>> + my $sfx;
>> + $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//;
>> + next unless $sfx = $1;
>> + # a valid suffix was found, so set the snapshot format
>> + # and reset the hash parameter
>> + $input_params{'snapshot_format'} = $fmt;
>> + $input_params{'hash'} = $hash;
>> + # we also set the format suffix to the one requested
>> + # in the URL: this way a request for e.g. .tgz returns
>> + # a .tgz instead of a .tar.gz
>> + $known_snapshot_formats{$fmt}{'suffix'} = $sfx;
>> + last;
>> + }
>
> I'm not sure if it worth (see comment at the beginning of this mail)
> adding this code, or just allow $sfx to be snapshot _name_ (key in
> %known_snapshot_formats hash).
>
> Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx}
> exists (assuming that snapshot format names does not contain '.').
>
> If we decide to go more complicated route, then refactoring it in such
> a way that suffixes are also keys to %known_snapshot_formats would be
> preferred... err, sorry, not so simple. But refactoring this check
> into separate subroutine (as I think last patch in series does) would
> be good idea.
See comments above.
> Also, I'd rather you checked if the $refname part contains '.' for it
> to even consider that it can be suffix.
Ah, good idea.
--
Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO
2008-10-21 18:36 ` Giuseppe Bilotta
@ 2008-10-21 19:09 ` Junio C Hamano
0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2008-10-21 19:09 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: Jakub Narebski, git, Petr Baudis, Junio C Hamano
"Giuseppe Bilotta" <giuseppe.bilotta@gmail.com> writes:
>> But I'd rather have this patch series to be in separate thread...
>
> Yes, a posteriori I think it's better too. I'll resend the 5 path_info
> patches with the minor stylistic corrections you suggested, and send
> these 3 separately.
I've only been watching from the sidelines, but the discussion seemed
sensible. With the resend with Acks I think they are already 'next'
material.
Thanks, both of you.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO
2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta
@ 2008-11-01 0:18 ` Jakub Narebski
2008-11-01 12:57 ` Giuseppe Bilotta
1 sibling, 1 reply; 26+ messages in thread
From: Jakub Narebski @ 2008-11-01 0:18 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
I'm sorry for the delay.
> When PATH_INFO is active, get rid of the sf CGI parameter by embedding
> the snapshot format information in the PATH_INFO URL, in the form of an
> appropriate extension.
The question is: should we use format suffix (e.g. 'tar.gz'),
or format name (e.g. 'tgz')? The latter is later easier to parse,
see comments to first patch in better snapshot support for path_info.
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index e9e9e60..5fd5a1f 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -795,6 +795,7 @@ sub href (%) {
> # - action
> # - hash_parent or hash_parent_base:/file_parent
> # - hash or hash_base:/file_name
> + # - the snapshot_format as an appropriate suffix
>
> # When the script is the root DirectoryIndex for the domain,
> # $href here would be something like http://gitweb.example.com/
Good.
> @@ -806,6 +807,10 @@ sub href (%) {
> $href .= "/".esc_url($params{'project'}) if defined $params{'project'};
> delete $params{'project'};
>
> + # since we destructively absorb parameters, we keep this
> + # boolean that remembers if we're handling a snapshot
> + my $is_snapshot = $params{'action'} eq 'snapshot';
> +
Side note: we destructively absorb parameters, because parameters
which are not absorbed are then used to generate query string part
of URL.
Deleting parameter but remembering the fact that it was used is one
(but not only) solution.
> # Summary just uses the project path URL, any other action is
> # added to the URL
> if (defined $params{'action'}) {
> @@ -845,6 +850,22 @@ sub href (%) {
> $href .= esc_url($params{'hash'});
> delete $params{'hash'};
> }
> +
> + # If the action was a snapshot, we can absorb the
> + # snapshot_format parameter too
> + if ($is_snapshot) {
> + my $fmt = $params{'snapshot_format'};
> + # snapshot_format should always be defined when href()
> + # is called, but just in case some code forgets, we
> + # fall back to the default
> + if (!$fmt) {
> + my @snapshot_fmts = gitweb_check_feature('snapshot');
> + @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
> + $fmt = $snapshot_fmts[0];
> + }
I anderstand that the above code is improved with new patch?
> + $href .= $known_snapshot_formats{$fmt}{'suffix'};
Again: should we use snapshot prefix, or snapshot name, which means here
do we use $known_snapshot_formats{$fmt}{'suffix'}; or just $fmt; ?
> + delete $params{'snapshot_format'};
> + }
> }
>
> # now encode the parameters explicitly
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO
2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski
@ 2008-11-01 12:57 ` Giuseppe Bilotta
0 siblings, 0 replies; 26+ messages in thread
From: Giuseppe Bilotta @ 2008-11-01 12:57 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano
On Sat, Nov 1, 2008 at 1:18 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
>
> I'm sorry for the delay.
No problem.
>> When PATH_INFO is active, get rid of the sf CGI parameter by embedding
>> the snapshot format information in the PATH_INFO URL, in the form of an
>> appropriate extension.
>
> The question is: should we use format suffix (e.g. 'tar.gz'),
> or format name (e.g. 'tgz')? The latter is later easier to parse,
> see comments to first patch in better snapshot support for path_info.
I think it's essentially a matter of deciding which extension to use
by default on output, and accepting either extension on input.
>> + # since we destructively absorb parameters, we keep this
>> + # boolean that remembers if we're handling a snapshot
>> + my $is_snapshot = $params{'action'} eq 'snapshot';
>> +
>
> Side note: we destructively absorb parameters, because parameters
> which are not absorbed are then used to generate query string part
> of URL.
>
> Deleting parameter but remembering the fact that it was used is one
> (but not only) solution.
Well, in the specific case of the 'action' parameter we could just
keep it around and only discard it at the end.
>> + if (!$fmt) {
>> + my @snapshot_fmts = gitweb_check_feature('snapshot');
>> + @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>> + $fmt = $snapshot_fmts[0];
>> + }
>
> I anderstand that the above code is improved with new patch?
Yup, I followed your suggestion of putting the refactoring earlier.
>> + $href .= $known_snapshot_formats{$fmt}{'suffix'};
>
> Again: should we use snapshot prefix, or snapshot name, which means here
> do we use $known_snapshot_formats{$fmt}{'suffix'}; or just $fmt; ?
I really don't know, either is fine for me.
Maybe using $fmt has the benefit that it copes better with some buggy
versions of some browser (like Opera) that tend to strip the .gz from
.tar.gz files ...
--
Giuseppe "Oblomov" Bilotta
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 8/8] gitweb: make the supported snapshot formats array global
2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta
@ 2008-11-02 1:54 ` Jakub Narebski
2008-11-02 8:50 ` Junio C Hamano
0 siblings, 1 reply; 26+ messages in thread
From: Jakub Narebski @ 2008-11-02 1:54 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano
On Sun, 19 Oct 2008, Giuseppe Bilotta wrote:
> The @snapshot_fmts array, containing the list of the supported snapshot
> formats, was recreated locally in three different routines (with the
> different name @supported_fmts in one of them).
>
> Its local generation is particularly expensive because two of the
> callers, href() and format_snapshot_links(), are often called many times
> in a single page.
>
> Simplify code and speed up page generation by making the array global.
>
> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Very good idea, although I'd prefer for this patch to come first;
it is not controversial contrary to other patches in this (sub)series
which IMHO needs some thought first.
Acked-by: Jakub Narebski <jnareb@gmail.com>
> ---
> gitweb/gitweb.perl | 21 ++++++++-------------
> 1 files changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 5fd5a1f..d1475b7 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -748,6 +748,10 @@ if (defined $searchtext) {
> our $git_dir;
> $git_dir = "$projectroot/$project" if $project;
>
> +# list of supported snapshot formats
> +our @snapshot_fmts = gitweb_check_feature('snapshot');
> +@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
> +
> # dispatch
> if (!defined $action) {
> if (defined $hash) {
> @@ -858,11 +862,7 @@ sub href (%) {
> # snapshot_format should always be defined when href()
> # is called, but just in case some code forgets, we
> # fall back to the default
> - if (!$fmt) {
> - my @snapshot_fmts = gitweb_check_feature('snapshot');
> - @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
> - $fmt = $snapshot_fmts[0];
> - }
> + $fmt ||= $snapshot_fmts[0];
> $href .= $known_snapshot_formats{$fmt}{'suffix'};
> delete $params{'snapshot_format'};
> }
> @@ -1695,8 +1695,6 @@ sub format_diff_line {
> # linked. Pass the hash of the tree/commit to snapshot.
> sub format_snapshot_links {
> my ($hash) = @_;
> - my @snapshot_fmts = gitweb_check_feature('snapshot');
> - @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
> my $num_fmts = @snapshot_fmts;
> if ($num_fmts > 1) {
> # A parenthesized list of links bearing format names.
> @@ -4898,20 +4896,17 @@ sub git_tree {
> }
>
> sub git_snapshot {
> - my @supported_fmts = gitweb_check_feature('snapshot');
> - @supported_fmts = filter_snapshot_fmts(@supported_fmts);
> -
> my $format = $input_params{'snapshot_format'};
> - if (!@supported_fmts) {
> + if (!@snapshot_fmts) {
> die_error(403, "Snapshots not allowed");
> }
> # default to first supported snapshot format
> - $format ||= $supported_fmts[0];
> + $format ||= $snapshot_fmts[0];
> if ($format !~ m/^[a-z0-9]+$/) {
> die_error(400, "Invalid snapshot format parameter");
> } elsif (!exists($known_snapshot_formats{$format})) {
> die_error(400, "Unknown snapshot format");
> - } elsif (!grep($_ eq $format, @supported_fmts)) {
> + } elsif (!grep($_ eq $format, @snapshot_fmts)) {
> die_error(403, "Unsupported snapshot format");
> }
>
> --
> 1.5.6.5
>
>
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 8/8] gitweb: make the supported snapshot formats array global
2008-11-02 1:54 ` Jakub Narebski
@ 2008-11-02 8:50 ` Junio C Hamano
0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2008-11-02 8:50 UTC (permalink / raw)
To: Giuseppe Bilotta; +Cc: Jakub Narebski, git, Petr Baudis
Jakub Narebski <jnareb@gmail.com> writes:
> Very good idea, although I'd prefer for this patch to come first;
> it is not controversial contrary to other patches in this (sub)series
> which IMHO needs some thought first.
>
> Acked-by: Jakub Narebski <jnareb@gmail.com>
I've dismissed the three patches 6-8 (without prejudice, as I lost track
of them); please resubmit and re-ack.
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2008-11-02 8:51 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-16 20:27 [PATCHv6 0/5] gitweb: PATH_INFO enhancement Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta
2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta
2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta
2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta
2008-11-02 1:54 ` Jakub Narebski
2008-11-02 8:50 ` Junio C Hamano
2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski
2008-11-01 12:57 ` Giuseppe Bilotta
2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski
2008-10-21 18:36 ` Giuseppe Bilotta
2008-10-21 19:09 ` Junio C Hamano
2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski
2008-10-20 14:57 ` Giuseppe Bilotta
2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski
2008-10-20 14:52 ` Giuseppe Bilotta
2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski
2008-10-18 23:57 ` Giuseppe Bilotta
2008-10-19 8:43 ` Jakub Narebski
2008-10-18 23:14 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski
2008-10-18 22:41 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).