* [PATCHv6 0/5] gitweb: PATH_INFO enhancement @ 2008-10-16 20:27 Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta 0 siblings, 1 reply; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta Sixth attempt for my gitweb PATH_INFO patchset, whose purpose is to reduce the use of CGI parameters by embedding as many parameters as possible in the URL path itself, provided the pathinfo feature is enabled. The new typical gitweb URL is therefore in the form $project/$action/$parent:$file..$hash:$file (with useless parts stripped). Backwards compatibility for old-style $project/$hash[:$file] URLs is kept, as long as $hash is not a refname whose name happens to match a git action. The main implementation is provided by paired patches (#1#2, #4#5) that implement parsing and generation of the new style URLs. Patch #3 is a minor improvement to the URL syntax that allows web sites to be properly browsable in raw mode. The patchset depends on my previous input parameter handling patch currently waiting in 'next'. This resend fixes a couple of problems I found in v5. Giuseppe Bilotta (5): gitweb: parse project/action/hash_base:filename PATH_INFO gitweb: generate project/action/hash URLs gitweb: use_pathinfo filenames start with / gitweb: parse parent..current syntax from PATH_INFO gitweb: generate parent..current URLs gitweb/gitweb.perl | 124 +++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 114 insertions(+), 10 deletions(-) ^ permalink raw reply [flat|nested] 26+ messages in thread
* [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO 2008-10-16 20:27 [PATCHv6 0/5] gitweb: PATH_INFO enhancement Giuseppe Bilotta @ 2008-10-16 20:27 ` Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta 2008-10-18 22:41 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski 0 siblings, 2 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta This patch enables gitweb to parse URLs with more information embedded in PATH_INFO, reducing the need for CGI parameters. The typical gitweb path is now $project/$action/$hash_base:$file_name or $project/$action/$hash This is mostly backwards compatible with the old-style gitweb paths, $project/$branch[:$filename], except when it was used to access a branch whose name matches a gitweb action. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- gitweb/gitweb.perl | 37 +++++++++++++++++++++++++++++++------ 1 files changed, 31 insertions(+), 6 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index c5254af..6d0dc26 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -534,23 +534,48 @@ sub evaluate_path_info { return if $input_params{'action'}; $path_info =~ s,^\Q$project\E/*,,; + # next, check if we have an action + my $action = $path_info; + $action =~ s,/.*$,,; + if (exists $actions{$action}) { + $path_info =~ s,^$action/*,,; + $input_params{'action'} = $action; + } + + # list of actions that want hash_base instead of hash + my @wants_base = ( + 'tree', + 'history', + ); + my ($refname, $pathname) = split(/:/, $path_info, 2); if (defined $pathname) { - # we got "project.git/branch:filename" or "project.git/branch:dir/" + # we got "branch:filename" or "branch:dir/" # we could use git_get_type(branch:pathname), but it needs $git_dir $pathname =~ s,^/+,,; if (!$pathname || substr($pathname, -1) eq "/") { - $input_params{'action'} = "tree"; + $input_params{'action'} ||= "tree"; $pathname =~ s,/$,,; } else { - $input_params{'action'} = "blob_plain"; + $input_params{'action'} ||= "blob_plain"; } $input_params{'hash_base'} ||= $refname; $input_params{'file_name'} ||= $pathname; } elsif (defined $refname) { - # we got "project.git/branch" - $input_params{'action'} = "shortlog"; - $input_params{'hash'} ||= $refname; + # we got "branch". in this case we have to choose if we have to + # set hash or hash_base. + # + # Most of the actions without a pathname only want hash to be + # set, except for the ones specified in @wants_base that want + # hash_base instead. It should also be noted that hand-crafted + # links having 'history' as an action and no pathname or hash + # set will fail, but that happens regardless of PATH_INFO. + $input_params{'action'} ||= "shortlog"; + if (grep {$input_params{'action'} eq $_} @wants_base) { + $input_params{'hash_base'} ||= $refname; + } else { + $input_params{'hash'} ||= $refname; + } } } evaluate_path_info(); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 2/5] gitweb: generate project/action/hash URLs 2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta @ 2008-10-16 20:27 ` Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta 2008-10-18 23:14 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski 2008-10-18 22:41 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski 1 sibling, 2 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta When generating path info URLs, reduce the number of CGI parameters by embedding action and hash_parent:filename or hash in the path. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- gitweb/gitweb.perl | 33 ++++++++++++++++++++++++++++++--- 1 files changed, 30 insertions(+), 3 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 6d0dc26..5337d40 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -724,14 +724,41 @@ sub href (%) { my ($use_pathinfo) = gitweb_check_feature('pathinfo'); if ($use_pathinfo) { - # use PATH_INFO for project name + # try to put as many parameters as possible in PATH_INFO: + # - project name + # - action + # - hash or hash_base:filename + + # When the script is the root DirectoryIndex for the domain, + # $href here would be something like http://gitweb.example.com/ + # Thus, we strip any trailing / from $href, to spare us double + # slashes in the final URL + $href =~ s,/$,,; + + # Then add the project name, if present $href .= "/".esc_url($params{'project'}) if defined $params{'project'}; delete $params{'project'}; - # Summary just uses the project path URL - if (defined $params{'action'} && $params{'action'} eq 'summary') { + # Summary just uses the project path URL, any other action is + # added to the URL + if (defined $params{'action'}) { + $href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary'; delete $params{'action'}; } + + # Finally, we put either hash_base:file_name or hash + if (defined $params{'hash_base'}) { + $href .= "/".esc_url($params{'hash_base'}); + if (defined $params{'file_name'}) { + $href .= ":".esc_url($params{'file_name'}); + delete $params{'file_name'}; + } + delete $params{'hash'}; + delete $params{'hash_base'}; + } elsif (defined $params{'hash'}) { + $href .= "/".esc_url($params{'hash'}); + delete $params{'hash'}; + } } # now encode the parameters explicitly -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / 2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta @ 2008-10-16 20:27 ` Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta 2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski 2008-10-18 23:14 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski 1 sibling, 2 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta When using path info, make filenames start with a / (right after the : that separates them from the hash base). This minimal change allows relative navigation to work properly when viewing HTML files in raw ('blob_plain') mode. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- gitweb/gitweb.perl | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 5337d40..49730f3 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -727,7 +727,7 @@ sub href (%) { # try to put as many parameters as possible in PATH_INFO: # - project name # - action - # - hash or hash_base:filename + # - hash or hash_base:/filename # When the script is the root DirectoryIndex for the domain, # $href here would be something like http://gitweb.example.com/ @@ -746,11 +746,11 @@ sub href (%) { delete $params{'action'}; } - # Finally, we put either hash_base:file_name or hash + # Finally, we put either hash_base:/file_name or hash if (defined $params{'hash_base'}) { $href .= "/".esc_url($params{'hash_base'}); if (defined $params{'file_name'}) { - $href .= ":".esc_url($params{'file_name'}); + $href .= ":/".esc_url($params{'file_name'}); delete $params{'file_name'}; } delete $params{'hash'}; -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO 2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta @ 2008-10-16 20:27 ` Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta 2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski 2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski 1 sibling, 2 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta This patch makes it possible to use an URL such as $project/somebranch..otherbranch:/filename to get a diff between different version of a file. Paths like $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed as well. All '*diff' actions and in general actions that use $hash_parent[_base] and $file_parent can now get all of their parameters from PATH_INFO Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- gitweb/gitweb.perl | 36 ++++++++++++++++++++++++++++++++++-- 1 files changed, 34 insertions(+), 2 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 49730f3..1a7b0b9 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -548,7 +548,12 @@ sub evaluate_path_info { 'history', ); - my ($refname, $pathname) = split(/:/, $path_info, 2); + # horrible regexp to catch + # [$hash_parent_base[:$file_parent]..]$hash_parent[:$file_name] + my ($parentrefname, $parentpathname, $refname, $pathname) = + ($path_info =~ /^(?:(.+?)(?::(.+))?\.\.)?(.+?)(?::(.+))?$/); + + # first, analyze the 'current' part if (defined $pathname) { # we got "branch:filename" or "branch:dir/" # we could use git_get_type(branch:pathname), but it needs $git_dir @@ -557,7 +562,13 @@ sub evaluate_path_info { $input_params{'action'} ||= "tree"; $pathname =~ s,/$,,; } else { - $input_params{'action'} ||= "blob_plain"; + # the default action depends on whether we had parent info + # or not + if ($parentrefname) { + $input_params{'action'} ||= "blobdiff_plain"; + } else { + $input_params{'action'} ||= "blob_plain"; + } } $input_params{'hash_base'} ||= $refname; $input_params{'file_name'} ||= $pathname; @@ -577,6 +588,27 @@ sub evaluate_path_info { $input_params{'hash'} ||= $refname; } } + + # next, handle the 'parent' part, if present + if (defined $parentrefname) { + # a missing pathspec defaults to the 'current' filename, allowing e.g. + # someproject/blobdiff/oldrev..newrev:/filename + if ($parentpathname) { + $parentpathname =~ s,^/+,,; + $parentpathname =~ s,/$,,; + $input_params{'file_parent'} ||= $parentpathname; + } else { + $input_params{'file_parent'} ||= $input_params{'file_name'}; + } + # we assume that hash_parent_base is wanted if a path was specified, + # or if the action wants hash_base instead of hash + if (defined $input_params{'file_parent'} || + grep {$input_params{'action'} eq $_} @wants_base) { + $input_params{'hash_parent_base'} ||= $parentrefname; + } else { + $input_params{'hash_parent'} ||= $parentrefname; + } + } } evaluate_path_info(); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCHv6 5/5] gitweb: generate parent..current URLs 2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta @ 2008-10-16 20:27 ` Giuseppe Bilotta 2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta ` (2 more replies) 2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski 1 sibling, 3 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-16 20:27 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta If use_pathinfo is enabled, href now creates links that contain paths in the form $project/$action/oldhash:/oldname..newhash:/newname for actions that use hash_parent etc. If any of the filename contains two consecutive dots, it's kept as a CGI parameter since the resulting path would otherwise be ambiguous. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- gitweb/gitweb.perl | 30 +++++++++++++++++++++++++----- 1 files changed, 25 insertions(+), 5 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 1a7b0b9..f4642e7 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -759,7 +759,8 @@ sub href (%) { # try to put as many parameters as possible in PATH_INFO: # - project name # - action - # - hash or hash_base:/filename + # - hash_parent or hash_parent_base:/file_parent + # - hash or hash_base:/file_name # When the script is the root DirectoryIndex for the domain, # $href here would be something like http://gitweb.example.com/ @@ -778,17 +779,36 @@ sub href (%) { delete $params{'action'}; } - # Finally, we put either hash_base:/file_name or hash + # Next, we put hash_parent_base:/file_parent..hash_base:/file_name, + # stripping nonexistent or useless pieces + $href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'} + || $params{'hash_parent'} || $params{'hash'}); if (defined $params{'hash_base'}) { - $href .= "/".esc_url($params{'hash_base'}); - if (defined $params{'file_name'}) { + if (defined $params{'hash_parent_base'}) { + $href .= esc_url($params{'hash_parent_base'}); + # skip the file_parent if it's the same as the file_name + delete $params{'file_parent'} if $params{'file_parent'} eq $params{'file_name'}; + if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) { + $href .= ":/".esc_url($params{'file_parent'}); + delete $params{'file_parent'}; + } + $href .= ".."; + delete $params{'hash_parent'}; + delete $params{'hash_parent_base'}; + } elsif (defined $params{'hash_parent'}) { + $href .= esc_url($params{'hash_parent'}). ".."; + delete $params{'hash_parent'}; + } + + $href .= esc_url($params{'hash_base'}); + if (defined $params{'file_name'} && $params{'file_name'} !~ /\.\./) { $href .= ":/".esc_url($params{'file_name'}); delete $params{'file_name'}; } delete $params{'hash'}; delete $params{'hash_base'}; } elsif (defined $params{'hash'}) { - $href .= "/".esc_url($params{'hash'}); + $href .= esc_url($params{'hash'}); delete $params{'hash'}; } } -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO 2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta @ 2008-10-19 12:11 ` Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta 2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski 2 siblings, 0 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-19 12:11 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta We parse requests for $project/snapshot/$head.$sfx as equivalent to $project/snapshot/$head?sf=$sfx, where $sfx is any of the known (although not necessarily supported) snapshot formats (or its default suffix). The filename for the resulting package preserves the requested extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz gives a .tar.gz), although for obvious reasons it doesn't preserve the basename (git/snapshot/next.tgz returns a file names git-next.tgz). Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- Ok, I lied, there's a 6th patch ready. Its purpose is to allow embedding of the snapshot format parameter in the URL, in the form of an extension appended to the refname. I have not prepared the corresponding URL generation code yet, because of potential ambiguity that might arise: even though at URL generation time the project does not have a head named $head.$sfx, it is possible (although extremely unlikely) that such head is created later on, and the PATH_INFO generation code cannot therefore guarantee the permalink nature of the embedded snapshot format URL. gitweb/gitweb.perl | 35 +++++++++++++++++++++++++++++++++++ 1 files changed, 35 insertions(+), 0 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 99c8c20..7ffd10b 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -609,6 +609,41 @@ sub evaluate_path_info { $input_params{'hash_parent'} ||= $parentrefname; } } + + # for the snapshot action, we allow URLs in the form + # $project/snapshot/$hash.ext + # where .ext determines the snapshot and gets removed from the + # passed $refname to provide the $hash. + # + # To be able to tell that $refname includes the format extension, we + # require some conditions to be satisfied: + # - the hash input parameter MUST have been set from the $refname part + # of the URL (i.e. they must be equal) + # - the snapshot format MUST NOT have been defined already + # - the $refname itself MUST NOT be a valid refname + if ($input_params{'action'} eq 'snapshot' && defined $refname && + $refname eq $input_params{'hash'} && + !defined $input_params{'snapshot_format'} && + !parse_commit($refname)) { + # We loop over the known snapshot formats, checking for + # extensions. Allowed extensions are both the defined suffix + # (which includes the initial dot already) and the snapshot + # format key itself, with a prepended dot + while (my ($fmt, %opt) = each %known_snapshot_formats) { + my $hash = $refname; + my $sfx; + $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//; + next unless $sfx = $1; + # a valid suffix was found, so set the snapshot format + # and reset the hash parameter + $input_params{'snapshot_format'} = $fmt; + $input_params{'hash'} = $hash; + # we also set the format suffix to the one requested + # in the URL: this way a request for e.g. .tgz returns + # a .tgz instead of a .tar.gz + $known_snapshot_formats{$fmt}{'suffix'} = $sfx; + } + } } evaluate_path_info(); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO 2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta 2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta @ 2008-10-19 14:24 ` Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta 2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski 2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski 2 siblings, 2 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-19 14:24 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta We parse requests for $project/snapshot/$head.$sfx as equivalent to $project/snapshot/$head?sf=$sfx, where $sfx is any of the known (although not necessarily supported) snapshot formats (or its default suffix). The filename for the resulting package preserves the requested extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz gives a .tar.gz), although for obvious reasons it doesn't preserve the basename (git/snapshot/next.tgz returns a file names git-next.tgz). This introduces a potential case for ambiguity if a project has a head that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the sf CGI parameter is not present; however, gitweb only produces URLs with the sf parameter, so this is only a potential issue for hand-coded URLs for extremely unusual project. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- I had second thoughts on this. Now we always look for the snapshot extension if the sf CGI parameter is missing, even if the project has a head that matches the full pseudo-refname $head.$sfx. The reason for this is that (1) there is no ambiguity for gitweb-generated URLs (2) the only URLs that could fail are hand-made URLs for extremely unusual projects and (3) it allows us to set gitweb up to generate (unambiguous) URLs without the sf CGI parameter. This also means that I can add 3 patches to the series, instead of just one: * patch #6 that parses the new format * patch #7 that generates the new URLs * patch #8 for some code refactoring gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++ 1 files changed, 34 insertions(+), 0 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 99c8c20..e9e9e60 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -609,6 +609,40 @@ sub evaluate_path_info { $input_params{'hash_parent'} ||= $parentrefname; } } + + # for the snapshot action, we allow URLs in the form + # $project/snapshot/$hash.ext + # where .ext determines the snapshot and gets removed from the + # passed $refname to provide the $hash. + # + # To be able to tell that $refname includes the format extension, we + # require the following two conditions to be satisfied: + # - the hash input parameter MUST have been set from the $refname part + # of the URL (i.e. they must be equal) + # - the snapshot format MUST NOT have been defined already + if ($input_params{'action'} eq 'snapshot' && defined $refname && + $refname eq $input_params{'hash'} && + !defined $input_params{'snapshot_format'}) { + # We loop over the known snapshot formats, checking for + # extensions. Allowed extensions are both the defined suffix + # (which includes the initial dot already) and the snapshot + # format key itself, with a prepended dot + while (my ($fmt, %opt) = each %known_snapshot_formats) { + my $hash = $refname; + my $sfx; + $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//; + next unless $sfx = $1; + # a valid suffix was found, so set the snapshot format + # and reset the hash parameter + $input_params{'snapshot_format'} = $fmt; + $input_params{'hash'} = $hash; + # we also set the format suffix to the one requested + # in the URL: this way a request for e.g. .tgz returns + # a .tgz instead of a .tar.gz + $known_snapshot_formats{$fmt}{'suffix'} = $sfx; + last; + } + } } evaluate_path_info(); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO 2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta @ 2008-10-19 14:24 ` Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta 2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski 2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski 1 sibling, 2 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-19 14:24 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta When PATH_INFO is active, get rid of the sf CGI parameter by embedding the snapshot format information in the PATH_INFO URL, in the form of an appropriate extension. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- gitweb/gitweb.perl | 21 +++++++++++++++++++++ 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index e9e9e60..5fd5a1f 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -795,6 +795,7 @@ sub href (%) { # - action # - hash_parent or hash_parent_base:/file_parent # - hash or hash_base:/file_name + # - the snapshot_format as an appropriate suffix # When the script is the root DirectoryIndex for the domain, # $href here would be something like http://gitweb.example.com/ @@ -806,6 +807,10 @@ sub href (%) { $href .= "/".esc_url($params{'project'}) if defined $params{'project'}; delete $params{'project'}; + # since we destructively absorb parameters, we keep this + # boolean that remembers if we're handling a snapshot + my $is_snapshot = $params{'action'} eq 'snapshot'; + # Summary just uses the project path URL, any other action is # added to the URL if (defined $params{'action'}) { @@ -845,6 +850,22 @@ sub href (%) { $href .= esc_url($params{'hash'}); delete $params{'hash'}; } + + # If the action was a snapshot, we can absorb the + # snapshot_format parameter too + if ($is_snapshot) { + my $fmt = $params{'snapshot_format'}; + # snapshot_format should always be defined when href() + # is called, but just in case some code forgets, we + # fall back to the default + if (!$fmt) { + my @snapshot_fmts = gitweb_check_feature('snapshot'); + @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); + $fmt = $snapshot_fmts[0]; + } + $href .= $known_snapshot_formats{$fmt}{'suffix'}; + delete $params{'snapshot_format'}; + } } # now encode the parameters explicitly -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATHv2 8/8] gitweb: make the supported snapshot formats array global 2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta @ 2008-10-19 14:24 ` Giuseppe Bilotta 2008-11-02 1:54 ` Jakub Narebski 2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski 1 sibling, 1 reply; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-19 14:24 UTC (permalink / raw) To: git; +Cc: Jakub Narebski, Petr Baudis, Junio C Hamano, Giuseppe Bilotta The @snapshot_fmts array, containing the list of the supported snapshot formats, was recreated locally in three different routines (with the different name @supported_fmts in one of them). Its local generation is particularly expensive because two of the callers, href() and format_snapshot_links(), are often called many times in a single page. Simplify code and speed up page generation by making the array global. Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> --- gitweb/gitweb.perl | 21 ++++++++------------- 1 files changed, 8 insertions(+), 13 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 5fd5a1f..d1475b7 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -748,6 +748,10 @@ if (defined $searchtext) { our $git_dir; $git_dir = "$projectroot/$project" if $project; +# list of supported snapshot formats +our @snapshot_fmts = gitweb_check_feature('snapshot'); +@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); + # dispatch if (!defined $action) { if (defined $hash) { @@ -858,11 +862,7 @@ sub href (%) { # snapshot_format should always be defined when href() # is called, but just in case some code forgets, we # fall back to the default - if (!$fmt) { - my @snapshot_fmts = gitweb_check_feature('snapshot'); - @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); - $fmt = $snapshot_fmts[0]; - } + $fmt ||= $snapshot_fmts[0]; $href .= $known_snapshot_formats{$fmt}{'suffix'}; delete $params{'snapshot_format'}; } @@ -1695,8 +1695,6 @@ sub format_diff_line { # linked. Pass the hash of the tree/commit to snapshot. sub format_snapshot_links { my ($hash) = @_; - my @snapshot_fmts = gitweb_check_feature('snapshot'); - @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); my $num_fmts = @snapshot_fmts; if ($num_fmts > 1) { # A parenthesized list of links bearing format names. @@ -4898,20 +4896,17 @@ sub git_tree { } sub git_snapshot { - my @supported_fmts = gitweb_check_feature('snapshot'); - @supported_fmts = filter_snapshot_fmts(@supported_fmts); - my $format = $input_params{'snapshot_format'}; - if (!@supported_fmts) { + if (!@snapshot_fmts) { die_error(403, "Snapshots not allowed"); } # default to first supported snapshot format - $format ||= $supported_fmts[0]; + $format ||= $snapshot_fmts[0]; if ($format !~ m/^[a-z0-9]+$/) { die_error(400, "Invalid snapshot format parameter"); } elsif (!exists($known_snapshot_formats{$format})) { die_error(400, "Unknown snapshot format"); - } elsif (!grep($_ eq $format, @supported_fmts)) { + } elsif (!grep($_ eq $format, @snapshot_fmts)) { die_error(403, "Unsupported snapshot format"); } -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATHv2 8/8] gitweb: make the supported snapshot formats array global 2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta @ 2008-11-02 1:54 ` Jakub Narebski 2008-11-02 8:50 ` Junio C Hamano 0 siblings, 1 reply; 26+ messages in thread From: Jakub Narebski @ 2008-11-02 1:54 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Sun, 19 Oct 2008, Giuseppe Bilotta wrote: > The @snapshot_fmts array, containing the list of the supported snapshot > formats, was recreated locally in three different routines (with the > different name @supported_fmts in one of them). > > Its local generation is particularly expensive because two of the > callers, href() and format_snapshot_links(), are often called many times > in a single page. > > Simplify code and speed up page generation by making the array global. > > Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Very good idea, although I'd prefer for this patch to come first; it is not controversial contrary to other patches in this (sub)series which IMHO needs some thought first. Acked-by: Jakub Narebski <jnareb@gmail.com> > --- > gitweb/gitweb.perl | 21 ++++++++------------- > 1 files changed, 8 insertions(+), 13 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 5fd5a1f..d1475b7 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -748,6 +748,10 @@ if (defined $searchtext) { > our $git_dir; > $git_dir = "$projectroot/$project" if $project; > > +# list of supported snapshot formats > +our @snapshot_fmts = gitweb_check_feature('snapshot'); > +@snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); > + > # dispatch > if (!defined $action) { > if (defined $hash) { > @@ -858,11 +862,7 @@ sub href (%) { > # snapshot_format should always be defined when href() > # is called, but just in case some code forgets, we > # fall back to the default > - if (!$fmt) { > - my @snapshot_fmts = gitweb_check_feature('snapshot'); > - @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); > - $fmt = $snapshot_fmts[0]; > - } > + $fmt ||= $snapshot_fmts[0]; > $href .= $known_snapshot_formats{$fmt}{'suffix'}; > delete $params{'snapshot_format'}; > } > @@ -1695,8 +1695,6 @@ sub format_diff_line { > # linked. Pass the hash of the tree/commit to snapshot. > sub format_snapshot_links { > my ($hash) = @_; > - my @snapshot_fmts = gitweb_check_feature('snapshot'); > - @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); > my $num_fmts = @snapshot_fmts; > if ($num_fmts > 1) { > # A parenthesized list of links bearing format names. > @@ -4898,20 +4896,17 @@ sub git_tree { > } > > sub git_snapshot { > - my @supported_fmts = gitweb_check_feature('snapshot'); > - @supported_fmts = filter_snapshot_fmts(@supported_fmts); > - > my $format = $input_params{'snapshot_format'}; > - if (!@supported_fmts) { > + if (!@snapshot_fmts) { > die_error(403, "Snapshots not allowed"); > } > # default to first supported snapshot format > - $format ||= $supported_fmts[0]; > + $format ||= $snapshot_fmts[0]; > if ($format !~ m/^[a-z0-9]+$/) { > die_error(400, "Invalid snapshot format parameter"); > } elsif (!exists($known_snapshot_formats{$format})) { > die_error(400, "Unknown snapshot format"); > - } elsif (!grep($_ eq $format, @supported_fmts)) { > + } elsif (!grep($_ eq $format, @snapshot_fmts)) { > die_error(403, "Unsupported snapshot format"); > } > > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 8/8] gitweb: make the supported snapshot formats array global 2008-11-02 1:54 ` Jakub Narebski @ 2008-11-02 8:50 ` Junio C Hamano 0 siblings, 0 replies; 26+ messages in thread From: Junio C Hamano @ 2008-11-02 8:50 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: Jakub Narebski, git, Petr Baudis Jakub Narebski <jnareb@gmail.com> writes: > Very good idea, although I'd prefer for this patch to come first; > it is not controversial contrary to other patches in this (sub)series > which IMHO needs some thought first. > > Acked-by: Jakub Narebski <jnareb@gmail.com> I've dismissed the three patches 6-8 (without prejudice, as I lost track of them); please resubmit and re-ack. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO 2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta @ 2008-11-01 0:18 ` Jakub Narebski 2008-11-01 12:57 ` Giuseppe Bilotta 1 sibling, 1 reply; 26+ messages in thread From: Jakub Narebski @ 2008-11-01 0:18 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Sun, 19 Oct 2008, Giuseppe Bilotta wrote: I'm sorry for the delay. > When PATH_INFO is active, get rid of the sf CGI parameter by embedding > the snapshot format information in the PATH_INFO URL, in the form of an > appropriate extension. The question is: should we use format suffix (e.g. 'tar.gz'), or format name (e.g. 'tgz')? The latter is later easier to parse, see comments to first patch in better snapshot support for path_info. > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index e9e9e60..5fd5a1f 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -795,6 +795,7 @@ sub href (%) { > # - action > # - hash_parent or hash_parent_base:/file_parent > # - hash or hash_base:/file_name > + # - the snapshot_format as an appropriate suffix > > # When the script is the root DirectoryIndex for the domain, > # $href here would be something like http://gitweb.example.com/ Good. > @@ -806,6 +807,10 @@ sub href (%) { > $href .= "/".esc_url($params{'project'}) if defined $params{'project'}; > delete $params{'project'}; > > + # since we destructively absorb parameters, we keep this > + # boolean that remembers if we're handling a snapshot > + my $is_snapshot = $params{'action'} eq 'snapshot'; > + Side note: we destructively absorb parameters, because parameters which are not absorbed are then used to generate query string part of URL. Deleting parameter but remembering the fact that it was used is one (but not only) solution. > # Summary just uses the project path URL, any other action is > # added to the URL > if (defined $params{'action'}) { > @@ -845,6 +850,22 @@ sub href (%) { > $href .= esc_url($params{'hash'}); > delete $params{'hash'}; > } > + > + # If the action was a snapshot, we can absorb the > + # snapshot_format parameter too > + if ($is_snapshot) { > + my $fmt = $params{'snapshot_format'}; > + # snapshot_format should always be defined when href() > + # is called, but just in case some code forgets, we > + # fall back to the default > + if (!$fmt) { > + my @snapshot_fmts = gitweb_check_feature('snapshot'); > + @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); > + $fmt = $snapshot_fmts[0]; > + } I anderstand that the above code is improved with new patch? > + $href .= $known_snapshot_formats{$fmt}{'suffix'}; Again: should we use snapshot prefix, or snapshot name, which means here do we use $known_snapshot_formats{$fmt}{'suffix'}; or just $fmt; ? > + delete $params{'snapshot_format'}; > + } > } > > # now encode the parameters explicitly > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO 2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski @ 2008-11-01 12:57 ` Giuseppe Bilotta 0 siblings, 0 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-11-01 12:57 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano On Sat, Nov 1, 2008 at 1:18 AM, Jakub Narebski <jnareb@gmail.com> wrote: > On Sun, 19 Oct 2008, Giuseppe Bilotta wrote: > > I'm sorry for the delay. No problem. >> When PATH_INFO is active, get rid of the sf CGI parameter by embedding >> the snapshot format information in the PATH_INFO URL, in the form of an >> appropriate extension. > > The question is: should we use format suffix (e.g. 'tar.gz'), > or format name (e.g. 'tgz')? The latter is later easier to parse, > see comments to first patch in better snapshot support for path_info. I think it's essentially a matter of deciding which extension to use by default on output, and accepting either extension on input. >> + # since we destructively absorb parameters, we keep this >> + # boolean that remembers if we're handling a snapshot >> + my $is_snapshot = $params{'action'} eq 'snapshot'; >> + > > Side note: we destructively absorb parameters, because parameters > which are not absorbed are then used to generate query string part > of URL. > > Deleting parameter but remembering the fact that it was used is one > (but not only) solution. Well, in the specific case of the 'action' parameter we could just keep it around and only discard it at the end. >> + if (!$fmt) { >> + my @snapshot_fmts = gitweb_check_feature('snapshot'); >> + @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts); >> + $fmt = $snapshot_fmts[0]; >> + } > > I anderstand that the above code is improved with new patch? Yup, I followed your suggestion of putting the refactoring earlier. >> + $href .= $known_snapshot_formats{$fmt}{'suffix'}; > > Again: should we use snapshot prefix, or snapshot name, which means here > do we use $known_snapshot_formats{$fmt}{'suffix'}; or just $fmt; ? I really don't know, either is fine for me. Maybe using $fmt has the benefit that it copes better with some buggy versions of some browser (like Opera) that tend to strip the .gz from .tar.gz files ... -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO 2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta @ 2008-10-21 16:44 ` Jakub Narebski 2008-10-21 18:36 ` Giuseppe Bilotta 1 sibling, 1 reply; 26+ messages in thread From: Jakub Narebski @ 2008-10-21 16:44 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano I like the idea behind this patch, to enable to use path_info for as much gitweb parameters as possible. After this patch series the only parameters which wouldn't be possible to represent in path_info would be: * @extra_options ('opt') multi-valued parameter, used to pass thinks like '--no-merges', which cannot be fit in the "simplified" list-like (as opposed to hash-like query string) path_info URL. * $searchtype ('st') and $searchtext ('s') etc. parameters, which are generated by HTML form, and are naturally generated in query string format. * $page ('pg') parameter, which could theoretically be added as last part of path_info URL, for example $project/next/2/... if not for pesky $project/history/next:/Documentation/2/ where you cannot be sure that having /<number>/ at the end is rare. * $order ('o') parameter, which would be hard to fit in path_info, with its limitation of parameters being specified by position. Or even next to impossible. * 'by_tag'... But I'd rather have this patch series to be in separate thread... On Sun, 19 Oct 2008, Giuseppe Bilotta wrote: > We parse requests for $project/snapshot/$head.$sfx as equivalent to > $project/snapshot/$head?sf=$sfx, where $sfx is any of the known > (although not necessarily supported) snapshot formats (or its default > suffix). > > The filename for the resulting package preserves the requested > extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz > gives a .tar.gz), although for obvious reasons it doesn't preserve the > basename (git/snapshot/next.tgz returns a file names git-next.tgz). That is a bit of difference from sf=<format> in CGI query string, where <format> is always a name of a format (for example 'tgz' or 'tbz2'), and actual suffix is defined in %known_snapshot_formats (for example '.tar.gz' and '.tar.bz2' respectively). Now you can specify snapshot format either either by its name, for example 'tgz' (which is simple lookup in hash) which result in proposed filename with '.tgz' suffix, or you can specify suffix, for example 'tar.gz' (which requires searching through all hash) which result in proposed filename with '.tar.gz' suffix. This is a bit of inconsistency; to be consistent with how we handle 'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in snapshot filename. This would also cover currently purely theoretical case when different snapshot formats (for example 'tgz' and 'tgz9') would use the same snapshot suffix (extension), but differ for example in parameters passed to compressor (for example '-9' or '--best' in the 'tgz9' case). On the other hand one would expect that when URL which looks like URL to snapshot ends with '.$sfx', then filename for snapshot would also end with '.$sfx'. This certainly requires some further thoughts. > > This introduces a potential case for ambiguity if a project has a head > that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the > sf CGI parameter is not present; however, gitweb only produces URLs with > the sf parameter, so this is only a potential issue for hand-coded URLs > for extremely unusual project. I think you wanted to say here "_currently_ produces URLs with the 'sf' parameter" as the next patch in series changes this. > > Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> > --- > > I had second thoughts on this. Now we always look for the snapshot extension if > the sf CGI parameter is missing, even if the project has a head that matches > the full pseudo-refname $head.$sfx. > > The reason for this is that (1) there is no ambiguity for gitweb-generated > URLs (2) the only URLs that could fail are hand-made URLs for extremely > unusual projects and (3) it allows us to set gitweb up to generate > (unambiguous) URLs without the sf CGI parameter. This is also simpler and cheaper solution. > > This also means that I can add 3 patches to the series, instead of just one: > * patch #6 that parses the new format > * patch #7 that generates the new URLs > * patch #8 for some code refactoring Now, I haven't yet read the last patch in series, so I don't know if it is independent refactoring, making sense even before patches named #6 and #7 here, or is it connected with searching for snapshot format by suffix it uses. If the former, it should be done upfront, as it shouldn't need discussion, and being easier to be accepted into git.git. If the latter, then it should probably be folded (squashed) into #6, first patch in the series. > > gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++ > 1 files changed, 34 insertions(+), 0 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 99c8c20..e9e9e60 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -609,6 +609,40 @@ sub evaluate_path_info { > $input_params{'hash_parent'} ||= $parentrefname; > } > } > + > + # for the snapshot action, we allow URLs in the form > + # $project/snapshot/$hash.ext > + # where .ext determines the snapshot and gets removed from the > + # passed $refname to provide the $hash. > + # > + # To be able to tell that $refname includes the format extension, we > + # require the following two conditions to be satisfied: > + # - the hash input parameter MUST have been set from the $refname part > + # of the URL (i.e. they must be equal) This means no "$project/.tgz?h=next", isn't it? > + # - the snapshot format MUST NOT have been defined already I would add "which means that 'sf' parameter is not set in URL", or something like that as the last line of above comment. I like that the code is so well commented, by the way. > + if ($input_params{'action'} eq 'snapshot' && defined $refname && > + $refname eq $input_params{'hash'} && Minor nit. I would use here (the question of style / better readability): + if ($input_params{'action'} eq 'snapshot' && + defined $refname && $refname eq $input_params{'hash'} && to have both conditions about $refname in the same line. > + !defined $input_params{'snapshot_format'}) { > + # We loop over the known snapshot formats, checking for > + # extensions. Allowed extensions are both the defined suffix > + # (which includes the initial dot already) and the snapshot > + # format key itself, with a prepended dot > + while (my ($fmt, %opt) = each %known_snapshot_formats) { > + my $hash = $refname; > + my $sfx; > + $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//; > + next unless $sfx = $1; > + # a valid suffix was found, so set the snapshot format > + # and reset the hash parameter > + $input_params{'snapshot_format'} = $fmt; > + $input_params{'hash'} = $hash; > + # we also set the format suffix to the one requested > + # in the URL: this way a request for e.g. .tgz returns > + # a .tgz instead of a .tar.gz > + $known_snapshot_formats{$fmt}{'suffix'} = $sfx; > + last; > + } I'm not sure if it worth (see comment at the beginning of this mail) adding this code, or just allow $sfx to be snapshot _name_ (key in %known_snapshot_formats hash). Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx} exists (assuming that snapshot format names does not contain '.'). If we decide to go more complicated route, then refactoring it in such a way that suffixes are also keys to %known_snapshot_formats would be preferred... err, sorry, not so simple. But refactoring this check into separate subroutine (as I think last patch in series does) would be good idea. Also, I'd rather you checked if the $refname part contains '.' for it to even consider that it can be suffix. > + } > } > evaluate_path_info(); > > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO 2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski @ 2008-10-21 18:36 ` Giuseppe Bilotta 2008-10-21 19:09 ` Junio C Hamano 0 siblings, 1 reply; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-21 18:36 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano On Tue, Oct 21, 2008 at 6:44 PM, Jakub Narebski <jnareb@gmail.com> wrote: > I like the idea behind this patch, to enable to use path_info for as > much gitweb parameters as possible. After this patch series the only > parameters which wouldn't be possible to represent in path_info would > be: > * @extra_options ('opt') multi-valued parameter, used to pass > thinks like '--no-merges', which cannot be fit in the "simplified" > list-like (as opposed to hash-like query string) path_info URL. > * $searchtype ('st') and $searchtext ('s') etc. parameters, which > are generated by HTML form, and are naturally generated in query > string format. > * $page ('pg') parameter, which could theoretically be added as last > part of path_info URL, for example $project/next/2/... if not for > pesky $project/history/next:/Documentation/2/ where you cannot be > sure that having /<number>/ at the end is rare. > * $order ('o') parameter, which would be hard to fit in path_info, > with its limitation of parameters being specified by position. > Or even next to impossible. > * 'by_tag'... > > But I'd rather have this patch series to be in separate thread... Yes, a posteriori I think it's better too. I'll resend the 5 path_info patches with the minor stylistic corrections you suggested, and send these 3 separately. > On Sun, 19 Oct 2008, Giuseppe Bilotta wrote: > >> We parse requests for $project/snapshot/$head.$sfx as equivalent to >> $project/snapshot/$head?sf=$sfx, where $sfx is any of the known >> (although not necessarily supported) snapshot formats (or its default >> suffix). >> >> The filename for the resulting package preserves the requested >> extensions (so asking for a .tgz gives a .tgz, and asking for a .tar.gz >> gives a .tar.gz), although for obvious reasons it doesn't preserve the >> basename (git/snapshot/next.tgz returns a file names git-next.tgz). > > That is a bit of difference from sf=<format> in CGI query string, where > <format> is always a name of a format (for example 'tgz' or 'tbz2'), > and actual suffix is defined in %known_snapshot_formats (for example > '.tar.gz' and '.tar.bz2' respectively). Now you can specify snapshot > format either either by its name, for example 'tgz' (which is simple > lookup in hash) which result in proposed filename with '.tgz' suffix, > or you can specify suffix, for example 'tar.gz' (which requires > searching through all hash) which result in proposed filename with > '.tar.gz' suffix. > > This is a bit of inconsistency; to be consistent with how we handle > 'sf' CGI parameter we would translate 'tgz' $sfx into 'tar.gz' in > snapshot filename. This would also cover currently purely theoretical > case when different snapshot formats (for example 'tgz' and 'tgz9') > would use the same snapshot suffix (extension), but differ for example > in parameters passed to compressor (for example '-9' or '--best' in > the 'tgz9' case). > > On the other hand one would expect that when URL which looks like > URL to snapshot ends with '.$sfx', then filename for snapshot would > also end with '.$sfx'. > > This certainly requires some further thoughts. What I decided was to set gitweb to always produce links with the suffix (.e.g .tar.gz), but I saw no particular reason not to accept the shorter version which is (1) commonly used as a suffix as well and (2) happens to be the actual format key used by gitweb. A different, possibly cleaner approach, but a more extensive change, would be to have each format describe a list of suffixes, defaulting to the first one on creation by identifying all of them. This is more invasive because all of the uses of {'suffix'} have to be replaced with {'suffix'}[0], or something like that (maybe we could add a separate key 'other_suffixes' instead?) >> This introduces a potential case for ambiguity if a project has a head >> that ends with a snapshot-like suffix (.zip, .tgz, .tar.gz, etc) and the >> sf CGI parameter is not present; however, gitweb only produces URLs with >> the sf parameter, so this is only a potential issue for hand-coded URLs >> for extremely unusual project. > > I think you wanted to say here "_currently_ produces URLs with the 'sf' > parameter" as the next patch in series changes this. Ah yes, good point. >> Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> >> --- >> >> I had second thoughts on this. Now we always look for the snapshot extension if >> the sf CGI parameter is missing, even if the project has a head that matches >> the full pseudo-refname $head.$sfx. >> >> The reason for this is that (1) there is no ambiguity for gitweb-generated >> URLs (2) the only URLs that could fail are hand-made URLs for extremely >> unusual projects and (3) it allows us to set gitweb up to generate >> (unambiguous) URLs without the sf CGI parameter. > > This is also simpler and cheaper solution. That, too 8-) >> This also means that I can add 3 patches to the series, instead of just one: >> * patch #6 that parses the new format >> * patch #7 that generates the new URLs >> * patch #8 for some code refactoring > > Now, I haven't yet read the last patch in series, so I don't know if > it is independent refactoring, making sense even before patches named > #6 and #7 here, or is it connected with searching for snapshot format > by suffix it uses. If the former, it should be done upfront, as it > shouldn't need discussion, and being easier to be accepted into git.git. > If the latter, then it should probably be folded (squashed) into #6, > first patch in the series. In fact, patch #8 can be written independently of the other too, and would provide a significant speed benefit for generation of pages with lots of 'snapshot' links: what it does is just to make the 'supported formats' array global, preparing it only once instead of re-preparing it every time a snapshot link is created. >> gitweb/gitweb.perl | 34 ++++++++++++++++++++++++++++++++++ >> 1 files changed, 34 insertions(+), 0 deletions(-) >> >> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl >> index 99c8c20..e9e9e60 100755 >> --- a/gitweb/gitweb.perl >> +++ b/gitweb/gitweb.perl >> @@ -609,6 +609,40 @@ sub evaluate_path_info { >> $input_params{'hash_parent'} ||= $parentrefname; >> } >> } >> + >> + # for the snapshot action, we allow URLs in the form >> + # $project/snapshot/$hash.ext >> + # where .ext determines the snapshot and gets removed from the >> + # passed $refname to provide the $hash. >> + # >> + # To be able to tell that $refname includes the format extension, we >> + # require the following two conditions to be satisfied: >> + # - the hash input parameter MUST have been set from the $refname part >> + # of the URL (i.e. they must be equal) > > This means no "$project/.tgz?h=next", isn't it? Right. >> + # - the snapshot format MUST NOT have been defined already > > I would add "which means that 'sf' parameter is not set in URL", or > something like that as the last line of above comment. Good idea. I'll make it an 'e.g.' to keep the comment valid for future additional parameter evaluation such as command-line input. > I like that the code is so well commented, by the way. Thanks. >> + if ($input_params{'action'} eq 'snapshot' && defined $refname && >> + $refname eq $input_params{'hash'} && > > Minor nit. > > I would use here (the question of style / better readability): > > + if ($input_params{'action'} eq 'snapshot' && > + defined $refname && $refname eq $input_params{'hash'} && > > to have both conditions about $refname in the same line. Yes, it'd look much better. >> + !defined $input_params{'snapshot_format'}) { >> + # We loop over the known snapshot formats, checking for >> + # extensions. Allowed extensions are both the defined suffix >> + # (which includes the initial dot already) and the snapshot >> + # format key itself, with a prepended dot >> + while (my ($fmt, %opt) = each %known_snapshot_formats) { >> + my $hash = $refname; >> + my $sfx; >> + $hash =~ s/(\Q$opt{'suffix'}\E|\Q.$fmt\E)$//; >> + next unless $sfx = $1; >> + # a valid suffix was found, so set the snapshot format >> + # and reset the hash parameter >> + $input_params{'snapshot_format'} = $fmt; >> + $input_params{'hash'} = $hash; >> + # we also set the format suffix to the one requested >> + # in the URL: this way a request for e.g. .tgz returns >> + # a .tgz instead of a .tar.gz >> + $known_snapshot_formats{$fmt}{'suffix'} = $sfx; >> + last; >> + } > > I'm not sure if it worth (see comment at the beginning of this mail) > adding this code, or just allow $sfx to be snapshot _name_ (key in > %known_snapshot_formats hash). > > Otherwise it would be as simple as checking if $known_snapshot_formats{$sfx} > exists (assuming that snapshot format names does not contain '.'). > > If we decide to go more complicated route, then refactoring it in such > a way that suffixes are also keys to %known_snapshot_formats would be > preferred... err, sorry, not so simple. But refactoring this check > into separate subroutine (as I think last patch in series does) would > be good idea. See comments above. > Also, I'd rather you checked if the $refname part contains '.' for it > to even consider that it can be suffix. Ah, good idea. -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO 2008-10-21 18:36 ` Giuseppe Bilotta @ 2008-10-21 19:09 ` Junio C Hamano 0 siblings, 0 replies; 26+ messages in thread From: Junio C Hamano @ 2008-10-21 19:09 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: Jakub Narebski, git, Petr Baudis, Junio C Hamano "Giuseppe Bilotta" <giuseppe.bilotta@gmail.com> writes: >> But I'd rather have this patch series to be in separate thread... > > Yes, a posteriori I think it's better too. I'll resend the 5 path_info > patches with the minor stylistic corrections you suggested, and send > these 3 separately. I've only been watching from the sidelines, but the discussion seemed sensible. With the resend with Acks I think they are already 'next' material. Thanks, both of you. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 5/5] gitweb: generate parent..current URLs 2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta 2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta @ 2008-10-20 10:49 ` Jakub Narebski 2008-10-20 14:57 ` Giuseppe Bilotta 2 siblings, 1 reply; 26+ messages in thread From: Jakub Narebski @ 2008-10-20 10:49 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: > If use_pathinfo is enabled, href now creates links that contain paths in > the form $project/$action/oldhash:/oldname..newhash:/newname for actions > that use hash_parent etc. > > If any of the filename contains two consecutive dots, it's kept as a CGI > parameter since the resulting path would otherwise be ambiguous. Note that this part is _required_ after previous patch, because otherwise gitweb could generate path_info links which point to different view than intended, when filename contains '..' in it, like for example some test vectors in git test suite, for example files in t/t5515 > > Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> For what is worth: Acked-by: Jakub Narebski <jnareb@gmail.com> BTW. I wanted to test this using gitweb-Mechanize test from Lea Wiemann GSoC 2008 work on gitweb caching, but I couldn't make it run as single individual test (cd t && ./t9503-gitweb-Mechanize.sh); it requires to run as "make test" and I didn't want that. I could have resurrected my initial version of Test::WWW::Mechanize::CGI test... > --- > gitweb/gitweb.perl | 30 +++++++++++++++++++++++++----- > 1 files changed, 25 insertions(+), 5 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 1a7b0b9..f4642e7 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -759,7 +759,8 @@ sub href (%) { > # try to put as many parameters as possible in PATH_INFO: > # - project name > # - action > - # - hash or hash_base:/filename > + # - hash_parent or hash_parent_base:/file_parent > + # - hash or hash_base:/file_name Minor nit: this contain independent change 'filename' -> 'file_name', but I think it is not worth separating... > > # When the script is the root DirectoryIndex for the domain, > # $href here would be something like http://gitweb.example.com/ > @@ -778,17 +779,36 @@ sub href (%) { > delete $params{'action'}; > } > > - # Finally, we put either hash_base:/file_name or hash > + # Next, we put hash_parent_base:/file_parent..hash_base:/file_name, > + # stripping nonexistent or useless pieces > + $href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'} > + || $params{'hash_parent'} || $params{'hash'}); Nice trick (and required change). > if (defined $params{'hash_base'}) { > - $href .= "/".esc_url($params{'hash_base'}); > - if (defined $params{'file_name'}) { > + if (defined $params{'hash_parent_base'}) { > + $href .= esc_url($params{'hash_parent_base'}); > + # skip the file_parent if it's the same as the file_name > + delete $params{'file_parent'} if $params{'file_parent'} eq $params{'file_name'}; > + if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) { > + $href .= ":/".esc_url($params{'file_parent'}); > + delete $params{'file_parent'}; > + } Side note: I wonder if we should use esc_url or esc_param here... > + $href .= ".."; > + delete $params{'hash_parent'}; > + delete $params{'hash_parent_base'}; > + } elsif (defined $params{'hash_parent'}) { > + $href .= esc_url($params{'hash_parent'}). ".."; > + delete $params{'hash_parent'}; > + } > + > + $href .= esc_url($params{'hash_base'}); > + if (defined $params{'file_name'} && $params{'file_name'} !~ /\.\./) { > $href .= ":/".esc_url($params{'file_name'}); > delete $params{'file_name'}; > } > delete $params{'hash'}; > delete $params{'hash_base'}; > } elsif (defined $params{'hash'}) { > - $href .= "/".esc_url($params{'hash'}); > + $href .= esc_url($params{'hash'}); > delete $params{'hash'}; > } > } > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 5/5] gitweb: generate parent..current URLs 2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski @ 2008-10-20 14:57 ` Giuseppe Bilotta 0 siblings, 0 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-20 14:57 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano On Mon, Oct 20, 2008 at 12:49 PM, Jakub Narebski <jnareb@gmail.com> wrote: >> - # - hash or hash_base:/filename >> + # - hash_parent or hash_parent_base:/file_parent >> + # - hash or hash_base:/file_name > > Minor nit: this contain independent change 'filename' -> 'file_name', > but I think it is not worth separating... Oopsie. Oh well, I was getting so used to all those _ that it felt strange without >> # When the script is the root DirectoryIndex for the domain, >> # $href here would be something like http://gitweb.example.com/ >> @@ -778,17 +779,36 @@ sub href (%) { >> delete $params{'action'}; >> } >> >> - # Finally, we put either hash_base:/file_name or hash >> + # Next, we put hash_parent_base:/file_parent..hash_base:/file_name, >> + # stripping nonexistent or useless pieces >> + $href .= "/" if ($params{'hash_base'} || $params{'hash_parent_base'} >> + || $params{'hash_parent'} || $params{'hash'}); > > Nice trick (and required change). > >> if (defined $params{'hash_base'}) { >> - $href .= "/".esc_url($params{'hash_base'}); >> - if (defined $params{'file_name'}) { >> + if (defined $params{'hash_parent_base'}) { >> + $href .= esc_url($params{'hash_parent_base'}); >> + # skip the file_parent if it's the same as the file_name >> + delete $params{'file_parent'} if $params{'file_parent'} eq $params{'file_name'}; >> + if (defined $params{'file_parent'} && $params{'file_parent'} !~ /\.\./) { >> + $href .= ":/".esc_url($params{'file_parent'}); >> + delete $params{'file_parent'}; >> + } > > Side note: I wonder if we should use esc_url or esc_param here... esc_url, I would say, allowing us to build RFC-compliant URLs. Isn't esc_param for CGI? -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO 2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta @ 2008-10-20 9:18 ` Jakub Narebski 2008-10-20 14:52 ` Giuseppe Bilotta 1 sibling, 1 reply; 26+ messages in thread From: Jakub Narebski @ 2008-10-20 9:18 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: > This patch makes it possible to use an URL such as > $project/somebranch..otherbranch:/filename to get a diff between > different version of a file. Paths like > $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed > as well. Just a nitpick: why '$project' and '$action', but 'somebranch', 'otherbranch' and 'filename'? > > All '*diff' actions and in general actions that use $hash_parent[_base] > and $file_parent can now get all of their parameters from PATH_INFO Which currently mean 'shortlog', and I guess in the future would mean also all other log-like views: 'log', 'history', 'search' (the commit message kind, not the pickaxe kind), and perhaps also 'rss'/'atom'. > > Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> For what is worth: Acked-by: Jakub Narebski <jnareb@gmail.com> > --- > gitweb/gitweb.perl | 36 ++++++++++++++++++++++++++++++++++-- > 1 files changed, 34 insertions(+), 2 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 49730f3..1a7b0b9 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -548,7 +548,12 @@ sub evaluate_path_info { > 'history', > ); > > - my ($refname, $pathname) = split(/:/, $path_info, 2); > + # horrible regexp to catch > + # [$hash_parent_base[:$file_parent]..]$hash_parent[:$file_name] This comment is really nice here, to explain what we try to catch. Is it really "horrible"... let others decide. Note: this (as also previous code) makes use of the fact that refname cannot contain ':' as per git-check-ref-format(1), and the fact that gitweb limits $hash* parameters to simple revision syntax, allowing only SHA1 and refnames, and not allowing (see validate_refname() used in $hash* validation) extended SHA1 syntax like <commit-ish>:<path> for $hash ('h') param for 'blob' view: gitweb uses $hash_base and $file_name for that. (But I guess it is too long explanation to put it in comment) Side note: the regexp below allow for $parentpathname to contain '..', but we don't want to rely on such minute detail of implementation detail (because it depends on whether we use greedy or non-greedy matching there). > + my ($parentrefname, $parentpathname, $refname, $pathname) = > + ($path_info =~ /^(?:(.+?)(?::(.+))?\.\.)?(.+?)(?::(.+))?$/); > + > + # first, analyze the 'current' part > if (defined $pathname) { > # we got "branch:filename" or "branch:dir/" > # we could use git_get_type(branch:pathname), but it needs $git_dir > @@ -557,7 +562,13 @@ sub evaluate_path_info { > $input_params{'action'} ||= "tree"; > $pathname =~ s,/$,,; > } else { > - $input_params{'action'} ||= "blob_plain"; > + # the default action depends on whether we had parent info > + # or not > + if ($parentrefname) { > + $input_params{'action'} ||= "blobdiff_plain"; > + } else { > + $input_params{'action'} ||= "blob_plain"; > + } Nice. I was wondering about 'project/hash_parent..hash' syntax, but then I have realized that it doesn't change action (as in 'blob_plain' -> 'blobdiff_plain'), but is always 'shortlog'. By the way, I wonder if it should be 'blobdiff_plain' or just 'blobdiff'. the 'blob_plain' was here to use gitweb as a kind of versioned web server; there is no such equivalent for 'p/hbp..hb:f' syntax. On the other hand it is consistent behavior, always using *_plain... > } > $input_params{'hash_base'} ||= $refname; > $input_params{'file_name'} ||= $pathname; > @@ -577,6 +588,27 @@ sub evaluate_path_info { > $input_params{'hash'} ||= $refname; > } > } > + > + # next, handle the 'parent' part, if present > + if (defined $parentrefname) { > + # a missing pathspec defaults to the 'current' filename, allowing e.g. > + # someproject/blobdiff/oldrev..newrev:/filename > + if ($parentpathname) { > + $parentpathname =~ s,^/+,,; > + $parentpathname =~ s,/$,,; Hmmm... should we strip trailing '/' here? > + $input_params{'file_parent'} ||= $parentpathname; > + } else { > + $input_params{'file_parent'} ||= $input_params{'file_name'}; > + } > + # we assume that hash_parent_base is wanted if a path was specified, > + # or if the action wants hash_base instead of hash Nice comment. > + if (defined $input_params{'file_parent'} || > + grep {$input_params{'action'} eq $_} @wants_base) { My preference of style would be to use here: + grep { $input_params{'action'} eq $_ } @wants_base) { > + $input_params{'hash_parent_base'} ||= $parentrefname; > + } else { > + $input_params{'hash_parent'} ||= $parentrefname; > + } > + } > } > evaluate_path_info(); > > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO 2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski @ 2008-10-20 14:52 ` Giuseppe Bilotta 0 siblings, 0 replies; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-20 14:52 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano On Mon, Oct 20, 2008 at 11:18 AM, Jakub Narebski <jnareb@gmail.com> wrote: > On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: > >> This patch makes it possible to use an URL such as >> $project/somebranch..otherbranch:/filename to get a diff between >> different version of a file. Paths like >> $project/$action/somebranch:/somefile..otherbranch:/otherfile are parsed >> as well. > > Just a nitpick: why '$project' and '$action', but 'somebranch', > 'otherbranch' and 'filename'? Good question ... I think I got distracted along the way. >> All '*diff' actions and in general actions that use $hash_parent[_base] >> and $file_parent can now get all of their parameters from PATH_INFO > > Which currently mean 'shortlog', and I guess in the future would mean > also all other log-like views: 'log', 'history', 'search' (the commit > message kind, not the pickaxe kind), and perhaps also 'rss'/'atom'. I'm not sure rss/atom makes sense, but the others were already in my todo list after the shortlog patch, so I'll try to get them ready as soon as I have the time to refactor them as we discussed on IRC. > Side note: the regexp below allow for $parentpathname to contain > '..', but we don't want to rely on such minute detail of implementation > detail (because it depends on whether we use greedy or non-greedy > matching there). > >> + my ($parentrefname, $parentpathname, $refname, $pathname) = >> + ($path_info =~ /^(?:(.+?)(?::(.+))?\.\.)?(.+?)(?::(.+))?$/); Hm, actually it might be a better idea to make the first pathname match non-greedy. >> - $input_params{'action'} ||= "blob_plain"; >> + # the default action depends on whether we had parent info >> + # or not >> + if ($parentrefname) { >> + $input_params{'action'} ||= "blobdiff_plain"; >> + } else { >> + $input_params{'action'} ||= "blob_plain"; >> + } > > Nice. > > I was wondering about 'project/hash_parent..hash' syntax, but then I have > realized that it doesn't change action (as in 'blob_plain' -> 'blobdiff_plain'), > but is always 'shortlog'. > > By the way, I wonder if it should be 'blobdiff_plain' or just 'blobdiff'. > the 'blob_plain' was here to use gitweb as a kind of versioned web > server; there is no such equivalent for 'p/hbp..hb:f' syntax. On the > other hand it is consistent behavior, always using *_plain... Moreover, it allows sending shorter URLs for patches, which are the ones you usually write by hand. >> + >> + # next, handle the 'parent' part, if present >> + if (defined $parentrefname) { >> + # a missing pathspec defaults to the 'current' filename, allowing e.g. >> + # someproject/blobdiff/oldrev..newrev:/filename >> + if ($parentpathname) { >> + $parentpathname =~ s,^/+,,; >> + $parentpathname =~ s,/$,,; > > Hmmm... should we strip trailing '/' here? I must confess I don't remember why I decided that was needed. -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / 2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta @ 2008-10-18 23:26 ` Jakub Narebski 2008-10-18 23:57 ` Giuseppe Bilotta 1 sibling, 1 reply; 26+ messages in thread From: Jakub Narebski @ 2008-10-18 23:26 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: > When using path info, make filenames start with a / (right after the : > that separates them from the hash base). This minimal change allows > relative navigation to work properly when viewing HTML files in raw > ('blob_plain') mode. This means generating project/action/hash_base:/filename rather than earlier project/action/hash_base:filename, isn't it? For example for http://repo.or.cz/w/git.git/html:/git.html links leads to correct HTML pages, while for http://repo.or.cz/w/git.git/html:git.html they lead to empty gitweb page (no errors, so link checker would be fooled). > > Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Acked-by: Jakub Narebski <jnareb@gmail.com> > --- > gitweb/gitweb.perl | 6 +++--- > 1 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 5337d40..49730f3 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -727,7 +727,7 @@ sub href (%) { > # try to put as many parameters as possible in PATH_INFO: > # - project name > # - action > - # - hash or hash_base:filename > + # - hash or hash_base:/filename > > # When the script is the root DirectoryIndex for the domain, > # $href here would be something like http://gitweb.example.com/ > @@ -746,11 +746,11 @@ sub href (%) { > delete $params{'action'}; > } > > - # Finally, we put either hash_base:file_name or hash > + # Finally, we put either hash_base:/file_name or hash > if (defined $params{'hash_base'}) { > $href .= "/".esc_url($params{'hash_base'}); > if (defined $params{'file_name'}) { > - $href .= ":".esc_url($params{'file_name'}); > + $href .= ":/".esc_url($params{'file_name'}); > delete $params{'file_name'}; > } > delete $params{'hash'}; > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / 2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski @ 2008-10-18 23:57 ` Giuseppe Bilotta 2008-10-19 8:43 ` Jakub Narebski 0 siblings, 1 reply; 26+ messages in thread From: Giuseppe Bilotta @ 2008-10-18 23:57 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, Petr Baudis, Junio C Hamano On Sun, Oct 19, 2008 at 1:26 AM, Jakub Narebski <jnareb@gmail.com> wrote: > On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: > >> When using path info, make filenames start with a / (right after the : >> that separates them from the hash base). This minimal change allows >> relative navigation to work properly when viewing HTML files in raw >> ('blob_plain') mode. > > This means generating project/action/hash_base:/filename rather than > earlier project/action/hash_base:filename, isn't it? Exactly. We _accept_ paths with or without the /, but we always generate paths with it. > For example for http://repo.or.cz/w/git.git/html:/git.html links leads > to correct HTML pages, while for http://repo.or.cz/w/git.git/html:git.html > they lead to empty gitweb page (no errors, so link checker would be > fooled). An idea that could be taken into consideration, if the ability to navigate web documents is deemed of primary importance, would be a redirect from the no-slash URL (a hand-coded one, given that with this patch we only generate slashed URLs) to the slashed URL. Not sure it's worth the effort (and reparsing) though: it would obviously be MUCH nicer if we could change the URL without having to actually reload the document ... -- Giuseppe "Oblomov" Bilotta ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / 2008-10-18 23:57 ` Giuseppe Bilotta @ 2008-10-19 8:43 ` Jakub Narebski 0 siblings, 0 replies; 26+ messages in thread From: Jakub Narebski @ 2008-10-19 8:43 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Sun, 19 Oct 2008, Giuseppe Bilotta wrote: > On Sun, Oct 19, 2008 at 1:26 AM, Jakub Narebski <jnareb@gmail.com> wrote: >> On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: >> >>> When using path info, make filenames start with a / (right after the : >>> that separates them from the hash base). This minimal change allows >>> relative navigation to work properly when viewing HTML files in raw >>> ('blob_plain') mode. [...] >> For example for http://repo.or.cz/w/git.git/html:/git.html links leads >> to correct HTML pages, while for http://repo.or.cz/w/git.git/html:git.html >> they lead to empty gitweb page (no errors, so link checker would be >> fooled). > > An idea that could be taken into consideration, if the ability to > navigate web documents is deemed of primary importance, would be a > redirect from the no-slash URL (a hand-coded one, given that with this > patch we only generate slashed URLs) to the slashed URL. Not sure it's > worth the effort (and reparsing) though: it would obviously be MUCH > nicer if we could change the URL without having to actually reload the > document ... I think that changing URL without reloading is impossible because of security reasons. For example if you change document.location in JavaScript changing the URL (you can add links to non-existing anchors without reloading) then web browser reloads the page from new URL. What you can do is early redirect using 301 Moved Permanently (or similar) and Location: redirect, using $cgi->redirect() like in git_object(). Also not sure if it is worth the effort... -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 2/5] gitweb: generate project/action/hash URLs 2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta @ 2008-10-18 23:14 ` Jakub Narebski 1 sibling, 0 replies; 26+ messages in thread From: Jakub Narebski @ 2008-10-18 23:14 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: > When generating path info URLs, reduce the number of CGI parameters by > embedding action and hash_parent:filename or hash in the path. > > Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> Nice, well commented, and clean. Acked-by: Jakub Narebski <jnareb@gmail.com> > --- > gitweb/gitweb.perl | 33 ++++++++++++++++++++++++++++++--- > 1 files changed, 30 insertions(+), 3 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 6d0dc26..5337d40 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -724,14 +724,41 @@ sub href (%) { > > my ($use_pathinfo) = gitweb_check_feature('pathinfo'); > if ($use_pathinfo) { > - # use PATH_INFO for project name > + # try to put as many parameters as possible in PATH_INFO: > + # - project name > + # - action > + # - hash or hash_base:filename > + > + # When the script is the root DirectoryIndex for the domain, > + # $href here would be something like http://gitweb.example.com/ > + # Thus, we strip any trailing / from $href, to spare us double > + # slashes in the final URL > + $href =~ s,/$,,; I like having this comment here. > + > + # Then add the project name, if present > $href .= "/".esc_url($params{'project'}) if defined $params{'project'}; > delete $params{'project'}; > > - # Summary just uses the project path URL > - if (defined $params{'action'} && $params{'action'} eq 'summary') { > + # Summary just uses the project path URL, any other action is > + # added to the URL Which meant that we prefer http://base_url/project over http://base_url/project/summary > + if (defined $params{'action'}) { > + $href .= "/".esc_url($params{'action'}) unless $params{'action'} eq 'summary'; > delete $params{'action'}; > } > + > + # Finally, we put either hash_base:file_name or hash > + if (defined $params{'hash_base'}) { > + $href .= "/".esc_url($params{'hash_base'}); > + if (defined $params{'file_name'}) { > + $href .= ":".esc_url($params{'file_name'}); > + delete $params{'file_name'}; > + } > + delete $params{'hash'}; > + delete $params{'hash_base'}; > + } elsif (defined $params{'hash'}) { > + $href .= "/".esc_url($params{'hash'}); > + delete $params{'hash'}; > + } > } > > # now encode the parameters explicitly > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO 2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta @ 2008-10-18 22:41 ` Jakub Narebski 1 sibling, 0 replies; 26+ messages in thread From: Jakub Narebski @ 2008-10-18 22:41 UTC (permalink / raw) To: Giuseppe Bilotta; +Cc: git, Petr Baudis, Junio C Hamano On Thu, 16 Oct 2008, Giuseppe Bilotta wrote: > This patch enables gitweb to parse URLs with more information embedded > in PATH_INFO, reducing the need for CGI parameters. The typical gitweb > path is now $project/$action/$hash_base:$file_name or > $project/$action/$hash > > This is mostly backwards compatible with the old-style gitweb paths, > $project/$branch[:$filename], except when it was used to access a branch > whose name matches a gitweb action. ...in which case old code would access branch, and new code would treat name as action. This shouldn't matter in practice, unless there are branches named exactly as gitweb actions. [But I think current commit description is enough. Just in case...] > > Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com> I like it. For what it is worth: Acked-by: Jakub Narebski <jnareb@gmail.com> > --- > gitweb/gitweb.perl | 37 +++++++++++++++++++++++++++++++------ > 1 files changed, 31 insertions(+), 6 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index c5254af..6d0dc26 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -534,23 +534,48 @@ sub evaluate_path_info { > return if $input_params{'action'}; > $path_info =~ s,^\Q$project\E/*,,; > > + # next, check if we have an action > + my $action = $path_info; > + $action =~ s,/.*$,,; > + if (exists $actions{$action}) { > + $path_info =~ s,^$action/*,,; > + $input_params{'action'} = $action; > + } > + > + # list of actions that want hash_base instead of hash ...and can have no file_name ('f') parameter. > + my @wants_base = ( > + 'tree', > + 'history', > + ); I like this solution. It makes path_info URL unambiguous: it is <project>/log/<hash> (where <hash> is usually refname), and <project>/tree/<hash_base> which is just shortcut for <project>/tree/<hash_base>:/ It means that the @wants_base list contains actions which require / use $hash_base and $file_name, but $file_name might be not set, which denotes top directory (root tree). > + > my ($refname, $pathname) = split(/:/, $path_info, 2); > if (defined $pathname) { > - # we got "project.git/branch:filename" or "project.git/branch:dir/" > + # we got "branch:filename" or "branch:dir/" > # we could use git_get_type(branch:pathname), but it needs $git_dir Well, this comment was there... but actually I think we avoid git_get_type("branch:pathname") because currently it is additional fork. And with convention that filename part of path_info ends in '/' for directories (quite sensible I think) it is not needed. Moreso after next patch in series, when we state action explicitly, and the only difference _might_ be for 'history' view (which is for directories and for files both). > $pathname =~ s,^/+,,; > if (!$pathname || substr($pathname, -1) eq "/") { > - $input_params{'action'} = "tree"; > + $input_params{'action'} ||= "tree"; > $pathname =~ s,/$,,; > } else { > - $input_params{'action'} = "blob_plain"; > + $input_params{'action'} ||= "blob_plain"; > } > $input_params{'hash_base'} ||= $refname; > $input_params{'file_name'} ||= $pathname; > } elsif (defined $refname) { > - # we got "project.git/branch" > - $input_params{'action'} = "shortlog"; > - $input_params{'hash'} ||= $refname; > + # we got "branch". in this case we have to choose if we have to ^ ^ |-|-- ??? (typo?) > + # set hash or hash_base. > + # > + # Most of the actions without a pathname only want hash to be > + # set, except for the ones specified in @wants_base that want > + # hash_base instead. It should also be noted that hand-crafted > + # links having 'history' as an action and no pathname or hash > + # set will fail, but that happens regardless of PATH_INFO. Ah, I see that the comment about what makes action to be put into @wants_base is here. > + $input_params{'action'} ||= "shortlog"; > + if (grep {$input_params{'action'} eq $_} @wants_base) { Style: I would use here (but perhaps it is just me) for better readibility + if (grep { $input_params{'action'} eq $_ } @wants_base) { > + $input_params{'hash_base'} ||= $refname; > + } else { > + $input_params{'hash'} ||= $refname; > + } > } > } > evaluate_path_info(); > -- > 1.5.6.5 > > -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2008-11-02 8:51 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-10-16 20:27 [PATCHv6 0/5] gitweb: PATH_INFO enhancement Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Giuseppe Bilotta 2008-10-16 20:27 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Giuseppe Bilotta 2008-10-19 12:11 ` [PATCH 6/6] gitweb: retrieve snapshot format from PATH_INFO Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 6/8] " Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Giuseppe Bilotta 2008-10-19 14:24 ` [PATHv2 8/8] gitweb: make the supported snapshot formats array global Giuseppe Bilotta 2008-11-02 1:54 ` Jakub Narebski 2008-11-02 8:50 ` Junio C Hamano 2008-11-01 0:18 ` [PATHv2 7/8] gitweb: embed snapshot format parameter in PATH_INFO Jakub Narebski 2008-11-01 12:57 ` Giuseppe Bilotta 2008-10-21 16:44 ` [PATHv2 6/8] gitweb: retrieve snapshot format from PATH_INFO Jakub Narebski 2008-10-21 18:36 ` Giuseppe Bilotta 2008-10-21 19:09 ` Junio C Hamano 2008-10-20 10:49 ` [PATCHv6 5/5] gitweb: generate parent..current URLs Jakub Narebski 2008-10-20 14:57 ` Giuseppe Bilotta 2008-10-20 9:18 ` [PATCHv6 4/5] gitweb: parse parent..current syntax from PATH_INFO Jakub Narebski 2008-10-20 14:52 ` Giuseppe Bilotta 2008-10-18 23:26 ` [PATCHv6 3/5] gitweb: use_pathinfo filenames start with / Jakub Narebski 2008-10-18 23:57 ` Giuseppe Bilotta 2008-10-19 8:43 ` Jakub Narebski 2008-10-18 23:14 ` [PATCHv6 2/5] gitweb: generate project/action/hash URLs Jakub Narebski 2008-10-18 22:41 ` [PATCHv6 1/5] gitweb: parse project/action/hash_base:filename PATH_INFO Jakub Narebski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).