* [PATCH 0/9] Gitweb caching v5
From: John 'Warthog9' Hawley @ 2010-01-14 1:22 UTC
To: git

Afternoon everyone,

This is the latest incarnation of gitweb w/ caching. This is finally at the point where it should probably start either being considered for inclusion in mainline, or I need to accept that this will never get in and more permanently fork (as is the case with Fedora, where this is going in as gitweb-caching as a parallel rpm package).

That said, this brings the base up to mainline (again), it updates a number of elements in the caching engine, and this is a much cleaner break-out of the tree vs. what I am currently developing against.

v5:
- Missed a couple of things that were in my local tree, and added them back in.
- Split up the die_error and the version matching patch
- Set version matching to be on by default - otherwise this really is code that will never get checked, or at best enabled by default by distributions
- Added a minor code cleanup with respect to $site_header that was already in my tree
- Applied against a more recent git tree vs. 1.6.6-rc2
- Removed breakout patch for now (did that in v4 actually) and will deal with that separately

http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5

v4:
- major re-working of the caching layer to use file handle redirection instead of buffering output
- other minor improvements

http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v4

v3:
- various minor re-works based on mailing list feedback, this series was not sent to the mailing list.

v2:
- Better breakout
- You can actually disable the cache now

- John 'Warthog9' Hawley

John 'Warthog9' Hawley (9):
  gitweb: Load checking
  gitweb: change die_error to take "extra" argument for extended die information
  gitweb: Add option to force version match
  gitweb: Makefile improvements
  gitweb: add a get function to compliment print_local_time
  gitweb: add a get function to compliment print_sort_th
  gitweb: cleanup error message produced by undefined $site_header
  gitweb: Convert output to using indirect file handle
  gitweb: File based caching layer (from git.kernel.org)

 Makefile           |   65 +---
 gitweb/Makefile    |  129 +++++++
 gitweb/README      |   10 +-
 gitweb/cache.pm    |  283 ++++++++++++++
 gitweb/gitweb.css  |    6 +
 gitweb/gitweb.perl | 1034 ++++++++++++++++++++++++++++++----------------------
 6 files changed, 1030 insertions(+), 497 deletions(-)
 create mode 100644 gitweb/Makefile
 create mode 100644 gitweb/cache.pm

* [PATCH 1/9] gitweb: Load checking
From: John 'Warthog9' Hawley @ 2010-01-14 1:22 UTC
To: git

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This slightly changes the behavior of gitweb, so that it verifies that the box isn't overloaded before attempting to serve gitweb. If the box is overloaded, it returns a 503 Service Unavailable until the load falls below the defined threshold. This helps dramatically if you have a box that's I/O bound, reaches a certain load and you don't want gitweb, the I/O hog that it is, increasing the pain the server is already undergoing.

This behavior is controlled by the $maxload configuration variable. The default is a load of 300, which for most cases should never be hit. Unset it (set it to an undefined value, i.e. undef) to turn off checking.

Currently it requires that the '/proc/loadavg' file exists, otherwise the load check is bypassed (load is taken to be 0). So platforms that do not implement '/proc/loadavg' currently cannot use this feature.
(provisions are included for additional checks to be added by others) Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- gitweb/README | 7 ++++++- gitweb/gitweb.perl | 45 +++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 47 insertions(+), 5 deletions(-) diff --git a/gitweb/README b/gitweb/README index e34ee79..6c2c8e1 100644 --- a/gitweb/README +++ b/gitweb/README @@ -174,7 +174,7 @@ not include variables usually directly set during build): Base URL for relative URLs in pages generated by gitweb, (e.g. $logo, $favicon, @stylesheets if they are relative URLs), needed and used only for URLs with nonempty PATH_INFO via - <base href="$base_url>. Usually gitweb sets its value correctly, + <base href="$base_url">. Usually gitweb sets its value correctly, and there is no need to set this variable, e.g. to $my_uri or "/". * $home_link Target of the home link on top of all pages (the first part of view @@ -228,6 +228,11 @@ not include variables usually directly set during build): repositories from launching cross-site scripting (XSS) attacks. Set this to true if you don't trust the content of your repositories. The default is false. + * $maxload + Used to set the maximum load that we will still respond to gitweb queries. + If server load exceed this value then return "503 Service Unavaliable" error. + Server load is taken to be 0 if gitweb cannot determine its value. Set it to + undefined value to turn it off. The default is 300. Projects list file format diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 7e477af..0a07d3a 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -221,6 +221,12 @@ our %avatar_size = ( 'double' => 32 ); +# Used to set the maximum load that we will still respond to gitweb queries. +# If server load exceed this value then return "503 server busy" error. +# If gitweb cannot determined server load, it is taken to be 0. 
+# Leave it undefined (or set to 'undef') to turn off load checking. +our $maxload = 300; + # You define site-wide feature defaults here; override them with # $GITWEB_CONFIG as necessary. our %feature = ( @@ -551,6 +557,32 @@ if (-e $GITWEB_CONFIG) { do $GITWEB_CONFIG_SYSTEM if -e $GITWEB_CONFIG_SYSTEM; } +# Get loadavg of system, to compare against $maxload. +# Currently it requires '/proc/loadavg' present to get loadavg; +# if it is not present it returns 0, which means no load checking. +sub get_loadavg { + if( -e '/proc/loadavg' ){ + open my $fd, '<', '/proc/loadavg' + or return 0; + my @load = split(/\s+/, scalar <$fd>); + close $fd; + + # The first three columns measure CPU and IO utilization of the last one, + # five, and 10 minute periods. The fourth column shows the number of + # currently running processes and the total number of processes in the m/n + # format. The last column displays the last process ID used. + return $load[0] || 0; + } + # additional checks for load average should go here for things that don't export + # /proc/loadavg + + return 0; +} + +if (defined $maxload && get_loadavg() > $maxload) { + die_error(503, "The load average on the server is too high"); +} + # version of the core git binary our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown"; $number_of_git_cmds++; @@ -3354,14 +3386,19 @@ sub git_footer_html { # 500: The server isn't configured properly, or # an internal error occurred (e.g. failed assertions caused by bugs), or # an unknown error occurred (e.g. the git binary died unexpectedly). +# 503: The server is currently unavailable (because it is overloaded, +# or down for maintenance). Generally, this is a temporary state. 
sub die_error { my $status = shift || 500; my $error = shift || "Internal server error"; - my %http_responses = (400 => '400 Bad Request', - 403 => '403 Forbidden', - 404 => '404 Not Found', - 500 => '500 Internal Server Error'); + my %http_responses = ( + 400 => '400 Bad Request', + 403 => '403 Forbidden', + 404 => '404 Not Found', + 500 => '500 Internal Server Error', + 503 => '503 Service Unavailable', + ); git_header_html($http_responses{$status}); print <<EOF; <div class="page_body"> -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
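
For anyone who wants to poke at the load check outside of gitweb, here is a minimal standalone sketch of the same idea. The helper names parse_loadavg_line and load_too_high are made up for illustration and are not part of the patch, and the sample line is invented; the only thing taken from the patch is the rule that the first column is the load and that an undefined $maxload disables the check.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): pull the 1-minute load average
# out of a line in /proc/loadavg format, falling back to 0 when the line
# is missing or does not start with a number.
sub parse_loadavg_line {
    my ($line) = @_;
    return 0 unless defined $line;
    my ($one_min) = split ' ', $line;   # first whitespace-separated column
    return 0 unless defined $one_min && $one_min =~ /^\d+(?:\.\d+)?$/;
    return $one_min + 0;
}

# Hypothetical helper: same comparison as the patch; an undefined
# $maxload means load checking is turned off.
sub load_too_high {
    my ($load, $maxload) = @_;
    return 0 unless defined $maxload;
    return $load > $maxload ? 1 : 0;
}

my $sample = "0.42 0.36 0.30 1/123 4567\n";   # invented sample line
my $load   = parse_loadavg_line($sample);
printf "load=%.2f busy=%d\n", $load, load_too_high($load, 300);
```

Treating an unparseable line as a load of 0 matches the patch's choice to fail open: if the load cannot be determined, gitweb keeps serving.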
* [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information
From: John 'Warthog9' Hawley @ 2010-01-14 1:22 UTC
To: git

This is a small change that just adds a 3rd, optional, parameter to die_error that allows for extended error information to be output along with what the error was.

---
 gitweb/gitweb.perl |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 0a07d3a..8298de5 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3391,6 +3391,7 @@ sub git_footer_html { sub die_error { my $status = shift || 500; my $error = shift || "Internal server error"; + my $extra = shift; my %http_responses = ( 400 => '400 Bad Request', @@ -3405,8 +3406,13 @@ sub die_error { <br /><br /> $status - $error <br /> -</div> EOF + if (defined $extra) { + print "<hr />\n" . + "$extra\n"; + } + print "</div>\n"; + git_footer_html(); exit; } -- 1.6.5.2
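
To make the effect of the optional third argument concrete, here is a rough standalone sketch. format_error_body is a hypothetical helper, not gitweb code; it builds the body as a string instead of printing it, and it assumes, as the patch appears to, that $extra is already-escaped HTML supplied by gitweb itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not gitweb code): build the error body as a string.
# $extra is assumed to be trusted, pre-escaped HTML and is emitted as-is.
sub format_error_body {
    my ($status, $error, $extra) = @_;
    $status ||= 500;
    $error  ||= "Internal server error";
    my $body = "<div class=\"page_body\">\n"
             . "<br /><br />\n"
             . "$status - $error\n"
             . "<br />\n";
    # The extended block is only added when a third argument was passed.
    $body .= "<hr />\n$extra\n" if defined $extra;
    $body .= "</div>\n";
    return $body;
}

print format_error_body(503, "The load average on the server is too high");
print format_error_body(500, undef, "<p>Extended details go here.</p>");
```

Existing two-argument callers are unaffected, since the extra block is skipped whenever the third argument is undefined.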
* [PATCH 3/9] gitweb: Add option to force version match
From: John 'Warthog9' Hawley @ 2010-01-14 1:22 UTC
To: git

This adds the $git_versions_must_match variable which, when set to true, checks that we are running with the same version of git that we shipped with, and if not throws a '500 Internal Server Error' error. What is checked is the version of gitweb (embedded when building gitweb.cgi) against the version of the runtime git binary used.

Gitweb can usually run with a mismatched git install. This is here more to give an obvious warning as to what's going on vs. silently failing.

By default this feature is turned on.

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/README      |    3 +++
 gitweb/gitweb.perl |   23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/gitweb/README b/gitweb/README index 6c2c8e1..03151d2 100644 --- a/gitweb/README +++ b/gitweb/README @@ -233,6 +233,9 @@ not include variables usually directly set during build): If server load exceed this value then return "503 Service Unavaliable" error. Server load is taken to be 0 if gitweb cannot determine its value. Set it to undefined value to turn it off. The default is 300. + * $git_versions_must_match + If set, gitweb fails with 500 Internal Server Error if the version of gitweb + doesn't match version of git binary. The default is true.
Projects list file format diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 8298de5..b41bc33 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -221,6 +221,9 @@ our %avatar_size = ( 'double' => 32 ); +# If it is true, exit if gitweb version and git binary version don't match +our $git_versions_must_match = 1; + # Used to set the maximum load that we will still respond to gitweb queries. # If server load exceed this value then return "503 server busy" error. # If gitweb cannot determined server load, it is taken to be 0. @@ -587,6 +590,26 @@ if (defined $maxload && get_loadavg() > $maxload) { our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown"; $number_of_git_cmds++; +# Throw an error if git versions does not match, if $git_versions_must_match is true. +if ($git_versions_must_match && + $git_version ne $version) { + my $admin_contact = + defined $ENV{'SERVER_ADMIN'} ? ", $ENV{'SERVER_ADMIN'}," : ''; + my $err_msg = <<EOT; +<h1 align="center">*** Warning ***</h1> +<p> +This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>, +however git version <b>@{[esc_html($git_version)]}</b> was found on server, +and administrator requested strict version checking. +</p> +<p> +Please contact the server administrator${admin_contact} to either configure +gitweb to allow mismatched versions, or update git or gitweb installation. +</p> +EOT + die_error(500, 'Internal server error', $err_msg); +} + $projects_list ||= $projectroot; # ====================================================================== -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
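
A small sketch of the check itself may help when deciding whether to leave this on. versions_mismatch is a hypothetical helper that only restates the condition from the patch, and the version strings are invented for illustration. Note that the comparison is an exact string match, so any suffix on either side counts as a mismatch.

```perl
use strict;
use warnings;

# Hypothetical helper restating the patch's condition: a mismatch is only
# reported when strict checking is on and the two strings differ exactly.
sub versions_mismatch {
    my ($must_match, $gitweb_version, $git_version) = @_;
    return ($must_match && $git_version ne $gitweb_version) ? 1 : 0;
}

# Invented version strings, for illustration only.
my @cases = (
    [1, "1.6.6", "1.6.6"],
    [1, "1.6.6", "1.6.5.2"],
    [1, "1.6.6", "1.6.6.dirty"],
    [0, "1.6.6", "1.6.5.2"],
);
for my $case (@cases) {
    my ($must_match, $gitweb_version, $git_version) = @$case;
    printf "strict=%d gitweb=%s git=%s mismatch=%d\n",
        $must_match, $gitweb_version, $git_version,
        versions_mismatch($must_match, $gitweb_version, $git_version);
}
```

Going by the README text in the patch, a site that wants to allow mismatched versions would set $git_versions_must_match to a false value in its gitweb configuration file.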
* [PATCH 4/9] gitweb: Makefile improvements
From: John 'Warthog9' Hawley @ 2010-01-14 1:23 UTC
To: git

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This commit adjusts the main Makefile so you can simply run

    make gitweb

which in turn calls gitweb/Makefile. This means that in order to generate gitweb, you can simply run 'make' from the gitweb subdirectory:

    cd gitweb
    make

The targets gitweb/gitweb.cgi and (dependent on JSMIN being defined) gitweb/gitweb.min.js in the main Makefile are preserved for backward compatibility.

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 Makefile        |   65 +++++----------------------
 gitweb/Makefile |  129 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+), 53 deletions(-)
 create mode 100644 gitweb/Makefile

diff --git a/Makefile b/Makefile index be18389..9f069a8 100644 --- a/Makefile +++ b/Makefile @@ -282,29 +282,6 @@ pathsep = : # JavaScript minifier invocation that can function as filter JSMIN = -# default configuration for gitweb -GITWEB_CONFIG = gitweb_config.perl -GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf -GITWEB_HOME_LINK_STR = projects -GITWEB_SITENAME = -GITWEB_PROJECTROOT = /pub/git -GITWEB_PROJECT_MAXDEPTH = 2007 -GITWEB_EXPORT_OK = -GITWEB_STRICT_EXPORT = -GITWEB_BASE_URL = -GITWEB_LIST = -GITWEB_HOMETEXT = indextext.html -GITWEB_CSS = gitweb.css -GITWEB_LOGO = git-logo.png -GITWEB_FAVICON = git-favicon.png -ifdef JSMIN -GITWEB_JS = gitweb.min.js -else -GITWEB_JS = gitweb.js -endif -GITWEB_SITE_HEADER = 
-GITWEB_SITE_FOOTER = - export prefix bindir sharedir sysconfdir CC = gcc @@ -1535,6 +1512,11 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl chmod +x $@+ && \ mv $@+ $@ + +.PHONY: gitweb +gitweb: + $(QUIET_SUBDIR0)gitweb $(QUIET_SUBDIR1) all + ifdef JSMIN OTHER_PROGRAMS += gitweb/gitweb.cgi gitweb/gitweb.min.js gitweb/gitweb.cgi: gitweb/gitweb.perl gitweb/gitweb.min.js @@ -1542,30 +1524,13 @@ else OTHER_PROGRAMS += gitweb/gitweb.cgi gitweb/gitweb.cgi: gitweb/gitweb.perl endif - $(QUIET_GEN)$(RM) $@ $@+ && \ - sed -e '1s|#!.*perl|#!$(PERL_PATH_SQ)|' \ - -e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \ - -e 's|++GIT_BINDIR++|$(bindir)|g' \ - -e 's|++GITWEB_CONFIG++|$(GITWEB_CONFIG)|g' \ - -e 's|++GITWEB_CONFIG_SYSTEM++|$(GITWEB_CONFIG_SYSTEM)|g' \ - -e 's|++GITWEB_HOME_LINK_STR++|$(GITWEB_HOME_LINK_STR)|g' \ - -e 's|++GITWEB_SITENAME++|$(GITWEB_SITENAME)|g' \ - -e 's|++GITWEB_PROJECTROOT++|$(GITWEB_PROJECTROOT)|g' \ - -e 's|"++GITWEB_PROJECT_MAXDEPTH++"|$(GITWEB_PROJECT_MAXDEPTH)|g' \ - -e 's|++GITWEB_EXPORT_OK++|$(GITWEB_EXPORT_OK)|g' \ - -e 's|++GITWEB_STRICT_EXPORT++|$(GITWEB_STRICT_EXPORT)|g' \ - -e 's|++GITWEB_BASE_URL++|$(GITWEB_BASE_URL)|g' \ - -e 's|++GITWEB_LIST++|$(GITWEB_LIST)|g' \ - -e 's|++GITWEB_HOMETEXT++|$(GITWEB_HOMETEXT)|g' \ - -e 's|++GITWEB_CSS++|$(GITWEB_CSS)|g' \ - -e 's|++GITWEB_LOGO++|$(GITWEB_LOGO)|g' \ - -e 's|++GITWEB_FAVICON++|$(GITWEB_FAVICON)|g' \ - -e 's|++GITWEB_JS++|$(GITWEB_JS)|g' \ - -e 's|++GITWEB_SITE_HEADER++|$(GITWEB_SITE_HEADER)|g' \ - -e 's|++GITWEB_SITE_FOOTER++|$(GITWEB_SITE_FOOTER)|g' \ - $< >$@+ && \ - chmod +x $@+ && \ - mv $@+ $@ + $(QUIET_SUBDIR0)gitweb $(QUIET_SUBDIR1) $(patsubst gitweb/%,%,$@) + +ifdef JSMIN +gitweb/gitweb.min.js: gitweb/gitweb.js + $(QUIET_SUBDIR0)gitweb $(QUIET_SUBDIR1) $(patsubst gitweb/%,%,$@) +endif # JSMIN + git-instaweb: git-instaweb.sh gitweb/gitweb.cgi gitweb/gitweb.css gitweb/gitweb.js $(QUIET_GEN)$(RM) $@ $@+ && \ @@ -1592,12 +1557,6 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)) git-instaweb: % : 
unimplemented.sh mv $@+ $@ endif # NO_PERL - -ifdef JSMIN -gitweb/gitweb.min.js: gitweb/gitweb.js - $(QUIET_GEN)$(JSMIN) <$< >$@ -endif # JSMIN - ifndef NO_PYTHON $(patsubst %.py,%,$(SCRIPT_PYTHON)): GIT-CFLAGS $(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py diff --git a/gitweb/Makefile b/gitweb/Makefile new file mode 100644 index 0000000..c9eb1ee --- /dev/null +++ b/gitweb/Makefile @@ -0,0 +1,129 @@ +# The default target of this Makefile is... +all:: + +# Define V=1 to have a more verbose compile. +# +# Define JSMIN to point to JavaScript minifier that functions as +# a filter to have gitweb.js minified. +# + +prefix ?= $(HOME) +bindir ?= $(prefix)/bin +RM ?= rm -f + +# JavaScript minifier invocation that can function as filter +JSMIN ?= + +# default configuration for gitweb +GITWEB_CONFIG = gitweb_config.perl +GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf +GITWEB_HOME_LINK_STR = projects +GITWEB_SITENAME = +GITWEB_PROJECTROOT = /pub/git +GITWEB_PROJECT_MAXDEPTH = 2007 +GITWEB_EXPORT_OK = +GITWEB_STRICT_EXPORT = +GITWEB_BASE_URL = +GITWEB_LIST = +GITWEB_HOMETEXT = indextext.html +GITWEB_CSS = gitweb.css +GITWEB_LOGO = git-logo.png +GITWEB_FAVICON = git-favicon.png +ifdef JSMIN +GITWEB_JS = gitweb.min.js +else +GITWEB_JS = gitweb.js +endif +GITWEB_SITE_HEADER = +GITWEB_SITE_FOOTER = + +# include user config +-include ../config.mak.autogen +-include ../config.mak + +# determine version +../GIT-VERSION-FILE: .FORCE-GIT-VERSION-FILE + $(QUIET_SUBDIR0)../ $(QUIET_SUBDIR1) GIT-VERSION-FILE + +-include ../GIT-VERSION-FILE + +### Build rules + +SHELL_PATH ?= $(SHELL) +PERL_PATH ?= /usr/bin/perl + +# Shell quote; +bindir_SQ = $(subst ','\'',$(bindir)) #' +SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH)) #' +PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH)) #' + +# Quiet generation (unless V=1) +QUIET_SUBDIR0 = +$(MAKE) -C # space to separate -C and subdir +QUIET_SUBDIR1 = + +ifneq ($(findstring $(MAKEFLAGS),w),w) +PRINT_DIR = --no-print-directory +else # "make -w" +NO_SUBDIR = : +endif + 
+ifneq ($(findstring $(MAKEFLAGS),s),s) +ifndef V + QUIET = @ + QUIET_GEN = $(QUIET)echo ' ' GEN $@; + QUIET_SUBDIR0 = +@subdir= + QUIET_SUBDIR1 = ;$(NO_SUBDIR) echo ' ' SUBDIR $$subdir; \ + $(MAKE) $(PRINT_DIR) -C $$subdir + export V + export QUIET + export QUIET_GEN + export QUIET_SUBDIR0 + export QUIET_SUBDIR1 +endif +endif + +all:: gitweb.cgi + +ifdef JSMIN +FILES=gitweb.cgi gitweb.min.js +gitweb.cgi: gitweb.perl gitweb.min.js +else # !JSMIN +FILES=gitweb.cgi +gitweb.cgi: gitweb.perl +endif # JSMIN + +gitweb.cgi: + $(QUIET_GEN)$(RM) $@ $@+ && \ + sed -e '1s|#!.*perl|#!$(PERL_PATH_SQ)|' \ + -e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \ + -e 's|++GIT_BINDIR++|$(bindir)|g' \ + -e 's|++GITWEB_CONFIG++|$(GITWEB_CONFIG)|g' \ + -e 's|++GITWEB_CONFIG_SYSTEM++|$(GITWEB_CONFIG_SYSTEM)|g' \ + -e 's|++GITWEB_HOME_LINK_STR++|$(GITWEB_HOME_LINK_STR)|g' \ + -e 's|++GITWEB_SITENAME++|$(GITWEB_SITENAME)|g' \ + -e 's|++GITWEB_PROJECTROOT++|$(GITWEB_PROJECTROOT)|g' \ + -e 's|"++GITWEB_PROJECT_MAXDEPTH++"|$(GITWEB_PROJECT_MAXDEPTH)|g' \ + -e 's|++GITWEB_EXPORT_OK++|$(GITWEB_EXPORT_OK)|g' \ + -e 's|++GITWEB_STRICT_EXPORT++|$(GITWEB_STRICT_EXPORT)|g' \ + -e 's|++GITWEB_BASE_URL++|$(GITWEB_BASE_URL)|g' \ + -e 's|++GITWEB_LIST++|$(GITWEB_LIST)|g' \ + -e 's|++GITWEB_HOMETEXT++|$(GITWEB_HOMETEXT)|g' \ + -e 's|++GITWEB_CSS++|$(GITWEB_CSS)|g' \ + -e 's|++GITWEB_LOGO++|$(GITWEB_LOGO)|g' \ + -e 's|++GITWEB_FAVICON++|$(GITWEB_FAVICON)|g' \ + -e 's|++GITWEB_JS++|$(GITWEB_JS)|g' \ + -e 's|++GITWEB_SITE_HEADER++|$(GITWEB_SITE_HEADER)|g' \ + -e 's|++GITWEB_SITE_FOOTER++|$(GITWEB_SITE_FOOTER)|g' \ + $< >$@+ && \ + chmod +x $@+ && \ + mv $@+ $@ + +ifdef JSMIN +gitweb.min.js: gitweb.js + $(QUIET_GEN)$(JSMIN) <$< >$@ +endif # JSMIN + +clean: + $(RM) $(FILES) + +.PHONY: all clean .FORCE-GIT-VERSION-FILE -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 5/9] gitweb: add a get function to compliment print_local_time
From: John 'Warthog9' Hawley @ 2010-01-14 1:23 UTC
To: git

This adds a get function for print_local_time, so that the formatted string it produces can be used elsewhere instead of only being printed directly.

---
 gitweb/gitweb.perl |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index b41bc33..d38aad6 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3537,14 +3537,21 @@ sub git_print_header_div { } sub print_local_time { + print get_local_time(@_); +} + +sub get_local_time { + my $localtime = ""; my %date = @_; if ($date{'hour_local'} < 6) { - printf(" (<span class=\"atnight\">%02d:%02d</span> %s)", + $localtime .= sprintf(" (<span class=\"atnight\">%02d:%02d</span> %s)", $date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'}); } else { - printf(" (%02d:%02d %s)", + $localtime .= sprintf(" (%02d:%02d %s)", $date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'}); } + + return $localtime } # Outputs the author name and date in long form -- 1.6.5.2
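
As a rough illustration of why the get/print split is useful, the sketch below restates the patch's get_local_time() in a form that runs on its own (condensed, but intended to produce the same strings) and captures its result before anything is written out. The %ad values are invented sample data.

```perl
use strict;
use warnings;

# Restatement of the patch's get_local_time() so the sketch runs standalone:
# times before 06:00 local are wrapped in an "atnight" span.
sub get_local_time {
    my %date = @_;
    my $format = $date{'hour_local'} < 6
        ? " (<span class=\"atnight\">%02d:%02d</span> %s)"
        : " (%02d:%02d %s)";
    return sprintf($format,
        $date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'});
}

# The print variant becomes a thin wrapper, as in the patch.
sub print_local_time {
    print get_local_time(@_);
}

# Invented sample data for illustration.
my %ad = (hour_local => 3, minute_local => 7, tz_local => '+0100');

# The caller can now hold on to the fragment (e.g. to append it to a larger
# string) instead of having it go straight to the current output handle.
my $fragment = get_local_time(%ad);
print "captured fragment:$fragment\n";
```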
* [PATCH 6/9] gitweb: add a get function to compliment print_sort_th
From: John 'Warthog9' Hawley @ 2010-01-14 1:23 UTC
To: git

This adds a get function for print_sort_th, so that the <th> markup it produces can be used elsewhere instead of only being printed directly.

---
 gitweb/gitweb.perl |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index d38aad6..07fdeb5 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -4375,17 +4375,24 @@ sub fill_project_list_info { # print 'sort by' <th> element, generating 'sort by $name' replay link # if that order is not selected sub print_sort_th { + print get_sort_th(@_); +} + +sub get_sort_th { my ($name, $order, $header) = @_; + my $sortth = ""; $header ||= ucfirst($name); if ($order eq $name) { - print "<th>$header</th>\n"; + $sortth .= "<th>$header</th>\n"; } else { - print "<th>" . + $sortth .= "<th>" . $cgi->a({-href => href(-replay=>1, order=>$name), -class => "header"}, $header) . "</th>\n"; } + + return $sortth; } sub git_project_list_body { -- 1.6.5.2
* [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header
From: John 'Warthog9' Hawley @ 2010-01-14 1:23 UTC
To: git

If $site_header is not defined you get extraneous errors in the web logs, for example:

    [Wed Jan 13 16:55:42 2010] [error] [client ::1] [Wed Jan 13 16:55:42 2010] gitweb.cgi: Use of uninitialized value $site_header in -f at /var/www/gitweb/gitweb.cgi line 3287., referer: http://git/gitweb.cgi

This ensures that the variable is defined before trying to use it.

---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 07fdeb5..c4a177d 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3277,7 +3277,7 @@ EOF print "</head>\n" . "<body>\n"; - if (-f $site_header) { + if ($site_header && -f $site_header) { insert_file($site_header); } -- 1.6.5.2
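
The guard in this patch can be tried in isolation with the sketch below. is_usable_file is a hypothetical helper name, not gitweb code; the warning handler is only there to show that no "uninitialized value" warning is raised when the variable is undefined.

```perl
use strict;
use warnings;

# Hypothetical helper using the same guard as the patch: the -f test is
# only reached when $path holds a true value, so undef never reaches it.
sub is_usable_file {
    my ($path) = @_;
    return ($path && -f $path) ? 1 : 0;
}

# Collect warnings instead of letting them go to STDERR, so the sketch
# can report how many were raised.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $site_header;    # deliberately left undefined
print "usable: ", is_usable_file($site_header), "\n";
print "warnings raised: ", scalar(@warnings), "\n";
```

One caveat of the truthiness test is that a file literally named "0" would be treated as unset; a `defined` check would be stricter, at the cost of also running -f on an empty string.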
* [PATCH 8/9] gitweb: Convert output to using indirect file handle
From: John 'Warthog9' Hawley @ 2010-01-14 1:23 UTC
To: git

This converts the output handling of gitweb to use an indirect file handle, in preparation for adding the caching layer. This is a slight modification to the way I was originally doing it by passing the output around. This should be a no-op and shouldn't change the behavior of gitweb.

This does leave the error reporting functions (die_error specifically) continuing to output directly, as I want to guarantee those will report their errors regardless of what may be going on with respect to the rest of the output.

---
 gitweb/gitweb.perl |  880 ++++++++++++++++++++++++++--------------------------
 1 files changed, 448 insertions(+), 432 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index c4a177d..8bb323c 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -450,6 +450,13 @@ our %feature = ( 'default' => [0]}, ); +# Basic file handler for all of gitweb, there are two of them. The first +# is the basic text/html file handler which is used for everything other +# then the binary files, that uses a separate file handler though +# these are both set to STDOUT for the time being. 
+our $output_handler = *STDOUT; +our $output_handler_bin = *STDOUT; + sub gitweb_get_feature { my ($name) = @_; return unless exists $feature{$name}; @@ -3081,7 +3088,7 @@ sub insert_file { my $filename = shift; open my $fd, '<', $filename; - print map { to_utf8($_) } <$fd>; + print {$output_handler} map { to_utf8($_) } <$fd>; close $fd; } @@ -3198,10 +3205,10 @@ sub git_header_html { } else { $content_type = 'text/html'; } - print $cgi->header(-type=>$content_type, -charset => 'utf-8', + print {$output_handler} $cgi->header(-type=>$content_type, -charset => 'utf-8', -status=> $status, -expires => $expires); my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : ''; - print <<EOF; + print {$output_handler} <<EOF; <?xml version="1.0" encoding="utf-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"> @@ -3216,16 +3223,16 @@ EOF # the stylesheet, favicon etc urls won't work correctly with path_info # unless we set the appropriate base URL if ($ENV{'PATH_INFO'}) { - print "<base href=\"".esc_url($base_url)."\" />\n"; + print {$output_handler} "<base href=\"".esc_url($base_url)."\" />\n"; } # print out each stylesheet that exist, providing backwards capability # for those people who defined $stylesheet in a config file if (defined $stylesheet) { - print '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n"; + print {$output_handler} '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n"; } else { foreach my $stylesheet (@stylesheets) { next unless $stylesheet; - print '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n"; + print {$output_handler} '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n"; } } if (defined $project) { @@ -3244,7 +3251,7 @@ EOF $href_params{'action'} = $type; $link_attr{'-href'} = href(%href_params); - print "<link ". 
+ print {$output_handler} "<link ". "rel=\"$link_attr{'-rel'}\" ". "title=\"$link_attr{'-title'}\" ". "href=\"$link_attr{'-href'}\" ". @@ -3254,7 +3261,7 @@ EOF $href_params{'extra_options'} = '--no-merges'; $link_attr{'-href'} = href(%href_params); $link_attr{'-title'} .= ' (no merges)'; - print "<link ". + print {$output_handler} "<link ". "rel=\"$link_attr{'-rel'}\" ". "title=\"$link_attr{'-title'}\" ". "href=\"$link_attr{'-href'}\" ". @@ -3263,37 +3270,37 @@ EOF } } else { - printf('<link rel="alternate" title="%s projects list" '. + printf( {$output_handler} '<link rel="alternate" title="%s projects list" '. 'href="%s" type="text/plain; charset=utf-8" />'."\n", $site_name, href(project=>undef, action=>"project_index")); - printf('<link rel="alternate" title="%s projects feeds" '. + printf( {$output_handler} '<link rel="alternate" title="%s projects feeds" '. 'href="%s" type="text/x-opml" />'."\n", $site_name, href(project=>undef, action=>"opml")); } if (defined $favicon) { - print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n); + print {$output_handler} qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n); } - print "</head>\n" . + print {$output_handler} "</head>\n" . "<body>\n"; if ($site_header && -f $site_header) { insert_file($site_header); } - print "<div class=\"page_header\">\n" . + print {$output_handler} "<div class=\"page_header\">\n" . $cgi->a({-href => esc_url($logo_url), -title => $logo_label}, qq(<img src="$logo" width="72" height="27" alt="git" class="logo"/>)); - print $cgi->a({-href => esc_url($home_link)}, $home_link_str) . " / "; + print {$output_handler} $cgi->a({-href => esc_url($home_link)}, $home_link_str) . 
" / "; if (defined $project) { - print $cgi->a({-href => href(action=>"summary")}, esc_html($project)); + print {$output_handler} $cgi->a({-href => href(action=>"summary")}, esc_html($project)); if (defined $action) { - print " / $action"; + print {$output_handler} " / $action"; } - print "\n"; + print {$output_handler} "\n"; } - print "</div>\n"; + print {$output_handler} "</div>\n"; my $have_search = gitweb_check_feature('search'); if (defined $project && $have_search) { @@ -3313,7 +3320,7 @@ EOF if ($use_pathinfo) { $action .= "/".esc_url($project); } - print $cgi->startform(-method => "get", -action => $action) . + print {$output_handler} $cgi->startform(-method => "get", -action => $action) . "<div class=\"search\">\n" . (!$use_pathinfo && $cgi->input({-name=>"p", -value=>$project, -type=>"hidden"}) . "\n") . @@ -3336,11 +3343,11 @@ EOF sub git_footer_html { my $feed_class = 'rss_logo'; - print "<div class=\"page_footer\">\n"; + print {$output_handler} "<div class=\"page_footer\">\n"; if (defined $project) { my $descr = git_get_project_description($project); if (defined $descr) { - print "<div class=\"page_footer_text\">" . esc_html($descr) . "</div>\n"; + print {$output_handler} "<div class=\"page_footer_text\">" . esc_html($descr) . "</div>\n"; } my %href_params = get_feed_info(); @@ -3351,22 +3358,22 @@ sub git_footer_html { foreach my $format qw(RSS Atom) { $href_params{'action'} = lc($format); - print $cgi->a({-href => href(%href_params), + print {$output_handler} $cgi->a({-href => href(%href_params), -title => "$href_params{'-title'} $format feed", -class => $feed_class}, $format)."\n"; } } else { - print $cgi->a({-href => href(project=>undef, action=>"opml"), + print {$output_handler} $cgi->a({-href => href(project=>undef, action=>"opml"), -class => $feed_class}, "OPML") . 
" "; - print $cgi->a({-href => href(project=>undef, action=>"project_index"), + print {$output_handler} $cgi->a({-href => href(project=>undef, action=>"project_index"), -class => $feed_class}, "TXT") . "\n"; } - print "</div>\n"; # class="page_footer" + print {$output_handler} "</div>\n"; # class="page_footer" if (defined $t0 && gitweb_check_feature('timed')) { - print "<div id=\"generating_info\">\n"; - print 'This page took '. + print {$output_handler} "<div id=\"generating_info\">\n"; + print {$output_handler} 'This page took '. '<span id="generating_time" class="time_span">'. Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]). ' seconds </span>'. @@ -3375,26 +3382,26 @@ sub git_footer_html { $number_of_git_cmds. '</span> git commands '. " to generate.\n"; - print "</div>\n"; # class="page_footer" + print {$output_handler} "</div>\n"; # class="page_footer" } if (-f $site_footer) { insert_file($site_footer); } - print qq!<script type="text/javascript" src="$javascript"></script>\n!; + print {$output_handler} qq!<script type="text/javascript" src="$javascript"></script>\n!; if ($action eq 'blame_incremental') { - print qq!<script type="text/javascript">\n!. + print {$output_handler} qq!<script type="text/javascript">\n!. qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!. qq! "!. href() .qq!");\n!. qq!</script>\n!; } elsif (gitweb_check_feature('javascript-actions')) { - print qq!<script type="text/javascript">\n!. + print {$output_handler} qq!<script type="text/javascript">\n!. qq!window.onload = fixLinks;\n!. qq!</script>\n!; } - print "</body>\n" . + print {$output_handler} "</body>\n" . "</html>"; } @@ -3416,6 +3423,14 @@ sub die_error { my $error = shift || "Internal server error"; my $extra = shift; + # The output handlers for die_error need to be reset to STDOUT + # so that half the message isn't being output to random and + # half to STDOUT as expected. 
This is mainly for the benefit + # of using git_header_html() and git_footer_html() since those + # internally use the indirect print handler. + $output_handler = *STDOUT; + $output_handler_bin = *STDOUT; + my %http_responses = ( 400 => '400 Bad Request', 403 => '403 Forbidden', @@ -3484,12 +3499,12 @@ sub git_print_page_nav { $arg{$label}{'_href'} = $link; } - print "<div class=\"page_nav\">\n" . + print {$output_handler} "<div class=\"page_nav\">\n" . (join " | ", map { $_ eq $current ? $_ : $cgi->a({-href => ($arg{$_}{_href} ? $arg{$_}{_href} : href(%{$arg{$_}}))}, "$_") } @navs); - print "<br/>\n$extra<br/>\n" . + print {$output_handler} "<br/>\n$extra<br/>\n" . "</div>\n"; } @@ -3530,14 +3545,14 @@ sub git_print_header_div { $args{'hash'} = $hash if $hash; $args{'hash_base'} = $hash_base if $hash_base; - print "<div class=\"header\">\n" . + print {$output_handler} "<div class=\"header\">\n" . $cgi->a({-href => href(%args), -class => "title"}, $title ? $title : $action) . "\n</div>\n"; } sub print_local_time { - print get_local_time(@_); + print {$output_handler} get_local_time(@_); } sub get_local_time { @@ -3562,11 +3577,11 @@ sub git_print_authorship { my $author = $co->{'author_name'}; my %ad = parse_date($co->{'author_epoch'}, $co->{'author_tz'}); - print "<$tag class=\"author_date\">" . + print {$output_handler} "<$tag class=\"author_date\">" . format_search_author($author, "author", esc_html($author)) . " [$ad{'rfc2822'}"; - print_local_time(%ad) if ($opts{-localtime}); - print "]" . git_get_avatar($co->{'author_email'}, -pad_before => 1) + print {$output_handler} get_local_time(%ad) if ($opts{-localtime}); + print {$output_handler} "]" . git_get_avatar($co->{'author_email'}, -pad_before => 1) . "</$tag>\n"; } @@ -3582,7 +3597,7 @@ sub git_print_authorship_rows { @people = ('author', 'committer') unless @people; foreach my $who (@people) { my %wd = parse_date($co->{"${who}_epoch"}, $co->{"${who}_tz"}); - print "<tr><td>$who</td><td>" .
+ print {$output_handler} "<tr><td>$who</td><td>" . format_search_author($co->{"${who}_name"}, $who, esc_html($co->{"${who}_name"})) . " " . format_search_author($co->{"${who}_email"}, $who, @@ -3592,8 +3607,8 @@ sub git_print_authorship_rows { "</td></tr>\n" . "<tr>" . "<td></td><td> $wd{'rfc2822'}"; - print_local_time(%wd); - print "</td>" . + print {$output_handler} get_local_time(%wd); + print {$output_handler} "</td>" . "</tr>\n"; } } @@ -3603,11 +3618,10 @@ sub git_print_page_path { my $type = shift; my $hb = shift; - - print "<div class=\"page_path\">"; - print $cgi->a({-href => href(action=>"tree", hash_base=>$hb), + print {$output_handler} "<div class=\"page_path\">"; + print {$output_handler} $cgi->a({-href => href(action=>"tree", hash_base=>$hb), -title => 'tree root'}, to_utf8("[$project]")); - print " / "; + print {$output_handler} " / "; if (defined $name) { my @dirname = split '/', $name; my $basename = pop @dirname; @@ -3615,25 +3629,25 @@ sub git_print_page_path { foreach my $dir (@dirname) { $fullname .= ($fullname ? '/' : '') . 
$dir; - print $cgi->a({-href => href(action=>"tree", file_name=>$fullname, + print {$output_handler} $cgi->a({-href => href(action=>"tree", file_name=>$fullname, hash_base=>$hb), -title => $fullname}, esc_path($dir)); - print " / "; + print {$output_handler} " / "; } if (defined $type && $type eq 'blob') { - print $cgi->a({-href => href(action=>"blob_plain", file_name=>$file_name, + print {$output_handler} $cgi->a({-href => href(action=>"blob_plain", file_name=>$file_name, hash_base=>$hb), -title => $name}, esc_path($basename)); } elsif (defined $type && $type eq 'tree') { - print $cgi->a({-href => href(action=>"tree", file_name=>$file_name, + print {$output_handler} $cgi->a({-href => href(action=>"tree", file_name=>$file_name, hash_base=>$hb), -title => $name}, esc_path($basename)); - print " / "; + print {$output_handler} " / "; } else { - print esc_path($basename); + print {$output_handler} esc_path($basename); } } - print "<br/></div>\n"; + print {$output_handler} "<br/></div>\n"; } sub git_print_log { @@ -3657,7 +3671,7 @@ sub git_print_log { $signoff = 1; $empty = 0; if (! $opts{'-remove_signoff'}) { - print "<span class=\"signoff\">" . esc_html($line) . "</span><br/>\n"; + print {$output_handler} "<span class=\"signoff\">" . esc_html($line) . "</span><br/>\n"; next; } else { # remove signoff lines @@ -3676,12 +3690,12 @@ sub git_print_log { $empty = 0; } - print format_log_line_html($line) . "<br/>\n"; + print {$output_handler} format_log_line_html($line) . "<br/>\n"; } if ($opts{'-final_empty_line'}) { # end with single empty line - print "<br/>\n" unless $empty; + print {$output_handler} "<br/>\n" unless $empty; } } @@ -3754,12 +3768,12 @@ sub git_print_tree_entry { # the mode of the entry, list is the name of the entry, an href, # and link is the action links of the entry. - print "<td class=\"mode\">" . mode_str($t->{'mode'}) . "</td>\n"; + print {$output_handler} "<td class=\"mode\">" . mode_str($t->{'mode'}) . 
"</td>\n"; if (exists $t->{'size'}) { - print "<td class=\"size\">$t->{'size'}</td>\n"; + print {$output_handler} "<td class=\"size\">$t->{'size'}</td>\n"; } if ($t->{'type'} eq "blob") { - print "<td class=\"list\">" . + print {$output_handler} "<td class=\"list\">" . $cgi->a({-href => href(action=>"blob", hash=>$t->{'hash'}, file_name=>"$basedir$t->{'name'}", %base_key), -class => "list"}, esc_path($t->{'name'})); @@ -3768,71 +3782,71 @@ sub git_print_tree_entry { if ($link_target) { my $norm_target = normalize_link_target($link_target, $basedir); if (defined $norm_target) { - print " -> " . + print {$output_handler} " -> " . $cgi->a({-href => href(action=>"object", hash_base=>$hash_base, file_name=>$norm_target), -title => $norm_target}, esc_path($link_target)); } else { - print " -> " . esc_path($link_target); + print {$output_handler} " -> " . esc_path($link_target); } } } - print "</td>\n"; - print "<td class=\"link\">"; - print $cgi->a({-href => href(action=>"blob", hash=>$t->{'hash'}, + print {$output_handler} "</td>\n"; + print {$output_handler} "<td class=\"link\">"; + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$t->{'hash'}, file_name=>"$basedir$t->{'name'}", %base_key)}, "blob"); if ($have_blame) { - print " | " . + print {$output_handler} " | " . $cgi->a({-href => href(action=>"blame", hash=>$t->{'hash'}, file_name=>"$basedir$t->{'name'}", %base_key)}, "blame"); } if (defined $hash_base) { - print " | " . + print {$output_handler} " | " . $cgi->a({-href => href(action=>"history", hash_base=>$hash_base, hash=>$t->{'hash'}, file_name=>"$basedir$t->{'name'}")}, "history"); } - print " | " . + print {$output_handler} " | " . 
$cgi->a({-href => href(action=>"blob_plain", hash_base=>$hash_base, file_name=>"$basedir$t->{'name'}")}, "raw"); - print "</td>\n"; + print {$output_handler} "</td>\n"; } elsif ($t->{'type'} eq "tree") { - print "<td class=\"list\">"; - print $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'}, + print {$output_handler} "<td class=\"list\">"; + print {$output_handler} $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'}, file_name=>"$basedir$t->{'name'}", %base_key)}, esc_path($t->{'name'})); - print "</td>\n"; - print "<td class=\"link\">"; - print $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'}, + print {$output_handler} "</td>\n"; + print {$output_handler} "<td class=\"link\">"; + print {$output_handler} $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'}, file_name=>"$basedir$t->{'name'}", %base_key)}, "tree"); if (defined $hash_base) { - print " | " . + print {$output_handler} " | " . $cgi->a({-href => href(action=>"history", hash_base=>$hash_base, file_name=>"$basedir$t->{'name'}")}, "history"); } - print "</td>\n"; + print {$output_handler} "</td>\n"; } else { # unknown object: we can only present history for it # (this includes 'commit' object, i.e. submodule support) - print "<td class=\"list\">" . + print {$output_handler} "<td class=\"list\">" . esc_path($t->{'name'}) . 
"</td>\n"; - print "<td class=\"link\">"; + print {$output_handler} "<td class=\"link\">"; if (defined $hash_base) { - print $cgi->a({-href => href(action=>"history", + print {$output_handler} $cgi->a({-href => href(action=>"history", hash_base=>$hash_base, file_name=>"$basedir$t->{'name'}")}, "history"); } - print "</td>\n"; + print {$output_handler} "</td>\n"; } } @@ -3879,13 +3893,13 @@ sub git_difftree_body { my ($difftree, $hash, @parents) = @_; my ($parent) = $parents[0]; my $have_blame = gitweb_check_feature('blame'); - print "<div class=\"list_head\">\n"; + print {$output_handler} "<div class=\"list_head\">\n"; if ($#{$difftree} > 10) { - print(($#{$difftree} + 1) . " files changed:\n"); + print {$output_handler} (($#{$difftree} + 1) . " files changed:\n"); } - print "</div>\n"; + print {$output_handler} "</div>\n"; - print "<table class=\"" . + print {$output_handler} "<table class=\"" . (@parents > 1 ? "combined " : "") . "diff_tree\">\n"; @@ -3893,11 +3907,11 @@ sub git_difftree_body { my $has_header = @$difftree && @parents > 1 && $action eq 'commitdiff'; if ($has_header) { # table header - print "<thead><tr>\n" . + print {$output_handler} "<thead><tr>\n" . "<th></th><th></th>\n"; # filename, patchN link for (my $i = 0; $i < @parents; $i++) { my $par = $parents[$i]; - print "<th>" . + print {$output_handler} "<th>" . $cgi->a({-href => href(action=>"commitdiff", hash=>$hash, hash_parent=>$par), -title => 'commitdiff to parent number ' . @@ -3905,7 +3919,7 @@ sub git_difftree_body { $i+1) . 
" </th>\n"; } - print "</tr></thead>\n<tbody>\n"; + print {$output_handler} "</tr></thead>\n<tbody>\n"; } my $alternate = 1; @@ -3914,9 +3928,9 @@ sub git_difftree_body { my $diff = parsed_difftree_line($line); if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; @@ -3927,14 +3941,14 @@ sub git_difftree_body { if (!is_deleted($diff)) { # file exists in the result (child) commit - print "<td>" . + print {$output_handler} "<td>" . $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, file_name=>$diff->{'to_file'}, hash_base=>$hash), -class => "list"}, esc_path($diff->{'to_file'})) . "</td>\n"; } else { - print "<td>" . + print {$output_handler} "<td>" . esc_path($diff->{'to_file'}) . "</td>\n"; } @@ -3942,7 +3956,7 @@ sub git_difftree_body { if ($action eq 'commitdiff') { # link to patch $patchno++; - print "<td class=\"link\">" . + print {$output_handler} "<td class=\"link\">" . $cgi->a({-href => "#patch$patchno"}, "patch") . " | " . "</td>\n"; @@ -3960,9 +3974,9 @@ sub git_difftree_body { $not_deleted ||= ($status ne 'D'); if ($status eq 'A') { - print "<td class=\"link\" align=\"right\"> | </td>\n"; + print {$output_handler} "<td class=\"link\" align=\"right\"> | </td>\n"; } elsif ($status eq 'D') { - print "<td class=\"link\">" . + print {$output_handler} "<td class=\"link\">" . 
$cgi->a({-href => href(action=>"blob", hash_base=>$hash, hash=>$from_hash, @@ -3971,11 +3985,11 @@ sub git_difftree_body { " | </td>\n"; } else { if ($diff->{'to_id'} eq $from_hash) { - print "<td class=\"link nochange\">"; + print {$output_handler} "<td class=\"link nochange\">"; } else { - print "<td class=\"link\">"; + print {$output_handler} "<td class=\"link\">"; } - print $cgi->a({-href => href(action=>"blobdiff", + print {$output_handler} $cgi->a({-href => href(action=>"blobdiff", hash=>$diff->{'to_id'}, hash_parent=>$from_hash, hash_base=>$hash, @@ -3987,24 +4001,24 @@ sub git_difftree_body { } } - print "<td class=\"link\">"; + print {$output_handler} "<td class=\"link\">"; if ($not_deleted) { - print $cgi->a({-href => href(action=>"blob", + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, file_name=>$diff->{'to_file'}, hash_base=>$hash)}, "blob"); - print " | " if ($has_history); + print {$output_handler} " | " if ($has_history); } if ($has_history) { - print $cgi->a({-href => href(action=>"history", + print {$output_handler} $cgi->a({-href => href(action=>"history", file_name=>$diff->{'to_file'}, hash_base=>$hash)}, "history"); } - print "</td>\n"; + print {$output_handler} "</td>\n"; - print "</tr>\n"; + print {$output_handler} "</tr>\n"; next; # instead of 'else' clause, to avoid extra indent } # else ordinary diff @@ -4030,51 +4044,51 @@ sub git_difftree_body { my $mode_chng = "<span class=\"file_status new\">[new $to_file_type"; $mode_chng .= " with mode: $to_mode_str" if $to_mode_str; $mode_chng .= "]</span>"; - print "<td>"; - print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, + print {$output_handler} "<td>"; + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, hash_base=>$hash, file_name=>$diff->{'file'}), -class => "list"}, esc_path($diff->{'file'})); - print "</td>\n"; - print "<td>$mode_chng</td>\n"; - print "<td class=\"link\">"; + print 
{$output_handler} "</td>\n"; + print {$output_handler} "<td>$mode_chng</td>\n"; + print {$output_handler} "<td class=\"link\">"; if ($action eq 'commitdiff') { # link to patch $patchno++; - print $cgi->a({-href => "#patch$patchno"}, "patch"); - print " | "; + print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch"); + print {$output_handler} " | "; } - print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, hash_base=>$hash, file_name=>$diff->{'file'})}, "blob"); - print "</td>\n"; + print {$output_handler} "</td>\n"; } elsif ($diff->{'status'} eq "D") { # deleted my $mode_chng = "<span class=\"file_status deleted\">[deleted $from_file_type]</span>"; - print "<td>"; - print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'}, + print {$output_handler} "<td>"; + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'}, hash_base=>$parent, file_name=>$diff->{'file'}), -class => "list"}, esc_path($diff->{'file'})); - print "</td>\n"; - print "<td>$mode_chng</td>\n"; - print "<td class=\"link\">"; + print {$output_handler} "</td>\n"; + print {$output_handler} "<td>$mode_chng</td>\n"; + print {$output_handler} "<td class=\"link\">"; if ($action eq 'commitdiff') { # link to patch $patchno++; - print $cgi->a({-href => "#patch$patchno"}, "patch"); - print " | "; + print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch"); + print {$output_handler} " | "; } - print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'}, + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'}, hash_base=>$parent, file_name=>$diff->{'file'})}, "blob") . " | "; if ($have_blame) { - print $cgi->a({-href => href(action=>"blame", hash_base=>$parent, + print {$output_handler} $cgi->a({-href => href(action=>"blame", hash_base=>$parent, file_name=>$diff->{'file'})}, "blame") . 
" | "; } - print $cgi->a({-href => href(action=>"history", hash_base=>$parent, + print {$output_handler} $cgi->a({-href => href(action=>"history", hash_base=>$parent, file_name=>$diff->{'file'})}, "history"); - print "</td>\n"; + print {$output_handler} "</td>\n"; } elsif ($diff->{'status'} eq "M" || $diff->{'status'} eq "T") { # modified, or type changed my $mode_chnge = ""; @@ -4092,39 +4106,39 @@ sub git_difftree_body { } $mode_chnge .= "]</span>\n"; } - print "<td>"; - print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, + print {$output_handler} "<td>"; + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, hash_base=>$hash, file_name=>$diff->{'file'}), -class => "list"}, esc_path($diff->{'file'})); - print "</td>\n"; - print "<td>$mode_chnge</td>\n"; - print "<td class=\"link\">"; + print {$output_handler} "</td>\n"; + print {$output_handler} "<td>$mode_chnge</td>\n"; + print {$output_handler} "<td class=\"link\">"; if ($action eq 'commitdiff') { # link to patch $patchno++; - print $cgi->a({-href => "#patch$patchno"}, "patch") . + print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch") . " | "; } elsif ($diff->{'to_id'} ne $diff->{'from_id'}) { # "commit" view and modified file (not onlu mode changed) - print $cgi->a({-href => href(action=>"blobdiff", + print {$output_handler} $cgi->a({-href => href(action=>"blobdiff", hash=>$diff->{'to_id'}, hash_parent=>$diff->{'from_id'}, hash_base=>$hash, hash_parent_base=>$parent, file_name=>$diff->{'file'})}, "diff") . " | "; } - print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, hash_base=>$hash, file_name=>$diff->{'file'})}, "blob") . " | "; if ($have_blame) { - print $cgi->a({-href => href(action=>"blame", hash_base=>$hash, + print {$output_handler} $cgi->a({-href => href(action=>"blame", hash_base=>$hash, file_name=>$diff->{'file'})}, "blame") . 
" | "; } - print $cgi->a({-href => href(action=>"history", hash_base=>$hash, + print {$output_handler} $cgi->a({-href => href(action=>"history", hash_base=>$hash, file_name=>$diff->{'file'})}, "history"); - print "</td>\n"; + print {$output_handler} "</td>\n"; } elsif ($diff->{'status'} eq "R" || $diff->{'status'} eq "C") { # renamed or copied my %status_name = ('R' => 'moved', 'C' => 'copied'); @@ -4134,7 +4148,7 @@ sub git_difftree_body { # mode also for directories, so we cannot use $to_mode_str $mode_chng = sprintf(", mode: %04o", $to_mode_oct & 0777); } - print "<td>" . + print {$output_handler} "<td>" . $cgi->a({-href => href(action=>"blob", hash_base=>$hash, hash=>$diff->{'to_id'}, file_name=>$diff->{'to_file'}), -class => "list"}, esc_path($diff->{'to_file'})) . "</td>\n" . @@ -4147,35 +4161,35 @@ sub git_difftree_body { if ($action eq 'commitdiff') { # link to patch $patchno++; - print $cgi->a({-href => "#patch$patchno"}, "patch") . + print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch") . " | "; } elsif ($diff->{'to_id'} ne $diff->{'from_id'}) { # "commit" view and modified file (not only pure rename or copy) - print $cgi->a({-href => href(action=>"blobdiff", + print {$output_handler} $cgi->a({-href => href(action=>"blobdiff", hash=>$diff->{'to_id'}, hash_parent=>$diff->{'from_id'}, hash_base=>$hash, hash_parent_base=>$parent, file_name=>$diff->{'to_file'}, file_parent=>$diff->{'from_file'})}, "diff") . " | "; } - print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'}, hash_base=>$parent, file_name=>$diff->{'to_file'})}, "blob") . " | "; if ($have_blame) { - print $cgi->a({-href => href(action=>"blame", hash_base=>$hash, + print {$output_handler} $cgi->a({-href => href(action=>"blame", hash_base=>$hash, file_name=>$diff->{'to_file'})}, "blame") . 
" | "; } - print $cgi->a({-href => href(action=>"history", hash_base=>$hash, + print {$output_handler} $cgi->a({-href => href(action=>"history", hash_base=>$hash, file_name=>$diff->{'to_file'})}, "history"); - print "</td>\n"; + print {$output_handler} "</td>\n"; } # we should not encounter Unmerged (U) or Unknown (X) status - print "</tr>\n"; + print {$output_handler} "</tr>\n"; } - print "</tbody>" if $has_header; - print "</table>\n"; + print {$output_handler} "</tbody>" if $has_header; + print {$output_handler} "</table>\n"; } sub git_patchset_body { @@ -4190,7 +4204,7 @@ sub git_patchset_body { my $to_name; my (%from, %to); - print "<div class=\"patchset\">\n"; + print {$output_handler} "<div class=\"patchset\">\n"; # skip to first patch while ($patch_line = <$fd>) { @@ -4218,7 +4232,7 @@ sub git_patchset_body { # and parse raw git-diff line if needed if (is_patch_split($diffinfo, { 'to_file' => $to_name })) { # this is continuation of a split patch - print "<div class=\"patch cont\">\n"; + print {$output_handler} "<div class=\"patch cont\">\n"; } else { # advance raw git-diff output if needed $patch_idx++ if defined $diffinfo; @@ -4230,7 +4244,7 @@ sub git_patchset_body { # find which patch (using pathname of result) we are at now; if ($is_combined) { while ($to_name ne $diffinfo->{'to_file'}) { - print "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" . + print {$output_handler} "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" . format_diff_cc_simplified($diffinfo, @hash_parents) . "</div>\n"; # class="patch" @@ -4247,7 +4261,7 @@ sub git_patchset_body { # this is first patch for raw difftree line with $patch_idx index # we index @$difftree array from 0, but number patches from 1 - print "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n"; + print {$output_handler} "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n"; } # git diff header @@ -4255,25 +4269,25 @@ sub git_patchset_body { #assert($patch_line !~ m!$/$!) 
if DEBUG; # is chomp-ed $patch_number++; # print "git diff" header - print format_git_diff_header_line($patch_line, $diffinfo, + print {$output_handler} format_git_diff_header_line($patch_line, $diffinfo, \%from, \%to); # print extended diff header - print "<div class=\"diff extended_header\">\n"; + print {$output_handler} "<div class=\"diff extended_header\">\n"; EXTENDED_HEADER: while ($patch_line = <$fd>) { chomp $patch_line; last EXTENDED_HEADER if ($patch_line =~ m/^--- |^diff /); - print format_extended_diff_header_line($patch_line, $diffinfo, + print {$output_handler} format_extended_diff_header_line($patch_line, $diffinfo, \%from, \%to); } - print "</div>\n"; # class="diff extended_header" + print {$output_handler} "</div>\n"; # class="diff extended_header" # from-file/to-file diff header if (! $patch_line) { - print "</div>\n"; # class="patch" + print {$output_handler} "</div>\n"; # class="patch" last PATCH; } next PATCH if ($patch_line =~ m/^diff /); @@ -4284,7 +4298,7 @@ sub git_patchset_body { chomp $patch_line; #assert($patch_line =~ m/^\+\+\+/) if DEBUG; - print format_diff_from_to_header($last_patch_line, $patch_line, + print {$output_handler} format_diff_from_to_header($last_patch_line, $patch_line, $diffinfo, \%from, \%to, @hash_parents); @@ -4295,11 +4309,11 @@ sub git_patchset_body { next PATCH if ($patch_line =~ m/^diff /); - print format_diff_line($patch_line, \%from, \%to); + print {$output_handler} format_diff_line($patch_line, \%from, \%to); } } continue { - print "</div>\n"; # class="patch" + print {$output_handler} "</div>\n"; # class="patch" } # for compact combined (--cc) format, with chunk and patch simpliciaction @@ -4311,7 +4325,7 @@ sub git_patchset_body { $diffinfo = parsed_difftree_line($difftree->[$patch_idx]); # generate anchor for "patch" links in difftree / whatchanged part - print "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" . + print {$output_handler} "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" . 
format_diff_cc_simplified($diffinfo, @hash_parents) . "</div>\n"; # class="patch" @@ -4320,13 +4334,13 @@ sub git_patchset_body { if ($patch_number == 0) { if (@hash_parents > 1) { - print "<div class=\"diff nodifferences\">Trivial merge</div>\n"; + print {$output_handler} "<div class=\"diff nodifferences\">Trivial merge</div>\n"; } else { - print "<div class=\"diff nodifferences\">No differences found</div>\n"; + print {$output_handler} "<div class=\"diff nodifferences\">No differences found</div>\n"; } } - print "</div>\n"; # class="patchset" + print {$output_handler} "</div>\n"; # class="patchset" } # . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . @@ -4375,7 +4389,7 @@ sub fill_project_list_info { # print 'sort by' <th> element, generating 'sort by $name' replay link # if that order is not selected sub print_sort_th { - print get_sort_th(@_); + print {$output_handler} get_sort_th(@_); } sub get_sort_th { @@ -4428,20 +4442,20 @@ sub git_project_list_body { } } my $cloud = git_populate_project_tagcloud(\%ctags); - print git_show_project_tagcloud($cloud, 64); + print {$output_handler} git_show_project_tagcloud($cloud, 64); } - print "<table class=\"project_list\">\n"; + print {$output_handler} "<table class=\"project_list\">\n"; unless ($no_header) { - print "<tr>\n"; + print {$output_handler} "<tr>\n"; if ($check_forks) { - print "<th></th>\n"; + print {$output_handler} "<th></th>\n"; } - print_sort_th('project', $order, 'Project'); - print_sort_th('descr', $order, 'Description'); - print_sort_th('owner', $order, 'Owner'); - print_sort_th('age', $order, 'Last Change'); - print "<th></th>\n" . # for links + print {$output_handler} get_sort_th('project', $order, 'Project'); + print {$output_handler} get_sort_th('descr', $order, 'Description'); + print {$output_handler} get_sort_th('owner', $order, 'Owner'); + print {$output_handler} get_sort_th('age', $order, 'Last Change'); + print {$output_handler} "<th></th>\n" . 
# for links "</tr>\n"; } my $alternate = 1; @@ -4461,26 +4475,26 @@ sub git_project_list_body { } if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; if ($check_forks) { - print "<td>"; + print {$output_handler} "<td>"; if ($pr->{'forks'}) { - print "<!-- $pr->{'forks'} -->\n"; - print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "+"); + print {$output_handler} "<!-- $pr->{'forks'} -->\n"; + print {$output_handler} $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "+"); } - print "</td>\n"; + print {$output_handler} "</td>\n"; } - print "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"), - -class => "list"}, esc_html($pr->{'path'})) . "</td>\n" . + print {$output_handler} "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"), + -class => "list"}, esc_html($pr->{'path'})) ."</td>\n". "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"), -class => "list", -title => $pr->{'descr_long'}}, esc_html($pr->{'descr'})) . "</td>\n" . "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n"; - print "<td class=\"". age_class($pr->{'age'}) . "\">" . + print {$output_handler} "<td class=\"". age_class($pr->{'age'}) . "\">" . (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" . "<td class=\"link\">" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary") . " | " . @@ -4492,14 +4506,14 @@ sub git_project_list_body { "</tr>\n"; } if (defined $extra) { - print "<tr>\n"; + print {$output_handler} "<tr>\n"; if ($check_forks) { - print "<td></td>\n"; + print {$output_handler} "<td></td>\n"; } - print "<td colspan=\"5\">$extra</td>\n" . + print {$output_handler} "<td colspan=\"5\">$extra</td>\n" . 
"</tr>\n"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; } sub git_log_body { @@ -4519,7 +4533,7 @@ sub git_log_body { "<span class=\"age\">$co{'age_string'}</span>" . esc_html($co{'title'}) . $ref, $commit); - print "<div class=\"title_text\">\n" . + print {$output_handler} "<div class=\"title_text\">\n" . "<div class=\"log_link\">\n" . $cgi->a({-href => href(action=>"commit", hash=>$commit)}, "commit") . " | " . @@ -4529,16 +4543,16 @@ sub git_log_body { "<br/>\n" . "</div>\n"; git_print_authorship(\%co, -tag => 'span'); - print "<br/>\n</div>\n"; + print {$output_handler} "<br/>\n</div>\n"; - print "<div class=\"log_body\">\n"; + print {$output_handler} "<div class=\"log_body\">\n"; git_print_log($co{'comment'}, -final_empty_line=> 1); - print "</div>\n"; + print {$output_handler} "</div>\n"; } if ($extra) { - print "<div class=\"page_nav\">\n"; - print "$extra\n"; - print "</div>\n"; + print {$output_handler} "<div class=\"page_nav\">\n"; + print {$output_handler} "$extra\n"; + print {$output_handler} "</div>\n"; } } @@ -4549,41 +4563,41 @@ sub git_shortlog_body { $from = 0 unless defined $from; $to = $#{$commitlist} if (!defined $to || $#{$commitlist} < $to); - print "<table class=\"shortlog\">\n"; + print {$output_handler} "<table class=\"shortlog\">\n"; my $alternate = 1; for (my $i = $from; $i <= $to; $i++) { my %co = %{$commitlist->[$i]}; my $commit = $co{'id'}; my $ref = format_ref_marker($refs, $commit); if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; # git_summary() used print "<td><i>$co{'age_string'}</i></td>\n" . - print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . + print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . format_author_html('td', \%co, 10) . 
"<td>"; - print format_subject_html($co{'title'}, $co{'title_short'}, + print {$output_handler} format_subject_html($co{'title'}, $co{'title_short'}, href(action=>"commit", hash=>$commit), $ref); - print "</td>\n" . + print {$output_handler} "</td>\n" . "<td class=\"link\">" . $cgi->a({-href => href(action=>"commit", hash=>$commit)}, "commit") . " | " . $cgi->a({-href => href(action=>"commitdiff", hash=>$commit)}, "commitdiff") . " | " . $cgi->a({-href => href(action=>"tree", hash=>$commit, hash_base=>$commit)}, "tree"); my $snapshot_links = format_snapshot_links($commit); if (defined $snapshot_links) { - print " | " . $snapshot_links; + print {$output_handler} " | " . $snapshot_links; } - print "</td>\n" . + print {$output_handler} "</td>\n" . "</tr>\n"; } if (defined $extra) { - print "<tr>\n" . + print {$output_handler} "<tr>\n" . "<td colspan=\"4\">$extra</td>\n" . "</tr>\n"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; } sub git_history_body { @@ -4594,7 +4608,7 @@ sub git_history_body { $from = 0 unless defined $from; $to = $#{$commitlist} unless (defined $to && $to <= $#{$commitlist}); - print "<table class=\"history\">\n"; + print {$output_handler} "<table class=\"history\">\n"; my $alternate = 1; for (my $i = $from; $i <= $to; $i++) { my %co = %{$commitlist->[$i]}; @@ -4606,18 +4620,18 @@ sub git_history_body { my $ref = format_ref_marker($refs, $commit); if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; - print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . + print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . # shortlog: format_author_html('td', \%co, 10) format_author_html('td', \%co, 15, 3) . 
"<td>"; # originally git_history used chop_str($co{'title'}, 50) - print format_subject_html($co{'title'}, $co{'title_short'}, + print {$output_handler} format_subject_html($co{'title'}, $co{'title_short'}, href(action=>"commit", hash=>$commit), $ref); - print "</td>\n" . + print {$output_handler} "</td>\n" . "<td class=\"link\">" . $cgi->a({-href => href(action=>$ftype, hash_base=>$commit, file_name=>$file_name)}, $ftype) . " | " . $cgi->a({-href => href(action=>"commitdiff", hash=>$commit)}, "commitdiff"); @@ -4627,7 +4641,7 @@ sub git_history_body { my $blob_parent = git_get_hash_by_path($commit, $file_name); if (defined $blob_current && defined $blob_parent && $blob_current ne $blob_parent) { - print " | " . + print {$output_handler} " | " . $cgi->a({-href => href(action=>"blobdiff", hash=>$blob_current, hash_parent=>$blob_parent, hash_base=>$hash_base, hash_parent_base=>$commit, @@ -4635,15 +4649,15 @@ sub git_history_body { "diff to current"); } } - print "</td>\n" . + print {$output_handler} "</td>\n" . "</tr>\n"; } if (defined $extra) { - print "<tr>\n" . + print {$output_handler} "<tr>\n" . "<td colspan=\"4\">$extra</td>\n" . 
"</tr>\n"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; } sub git_tags_body { @@ -4652,7 +4666,7 @@ sub git_tags_body { $from = 0 unless defined $from; $to = $#{$taglist} if (!defined $to || $#{$taglist} < $to); - print "<table class=\"tags\">\n"; + print {$output_handler} "<table class=\"tags\">\n"; my $alternate = 1; for (my $i = $from; $i <= $to; $i++) { my $entry = $taglist->[$i]; @@ -4663,50 +4677,50 @@ sub git_tags_body { $comment_short = chop_str($comment, 30, 5); } if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; if (defined $tag{'age'}) { - print "<td><i>$tag{'age'}</i></td>\n"; + print {$output_handler} "<td><i>$tag{'age'}</i></td>\n"; } else { - print "<td></td>\n"; + print {$output_handler} "<td></td>\n"; } - print "<td>" . + print {$output_handler} "<td>" . $cgi->a({-href => href(action=>$tag{'reftype'}, hash=>$tag{'refid'}), -class => "list name"}, esc_html($tag{'name'})) . "</td>\n" . "<td>"; if (defined $comment) { - print format_subject_html($comment, $comment_short, + print {$output_handler} format_subject_html($comment, $comment_short, href(action=>"tag", hash=>$tag{'id'})); } - print "</td>\n" . + print {$output_handler} "</td>\n" . "<td class=\"selflink\">"; if ($tag{'type'} eq "tag") { - print $cgi->a({-href => href(action=>"tag", hash=>$tag{'id'})}, "tag"); + print {$output_handler} $cgi->a({-href => href(action=>"tag", hash=>$tag{'id'})}, "tag"); } else { - print " "; + print {$output_handler} " "; } - print "</td>\n" . + print {$output_handler} "</td>\n" . "<td class=\"link\">" . " | " . $cgi->a({-href => href(action=>$tag{'reftype'}, hash=>$tag{'refid'})}, $tag{'reftype'}); if ($tag{'reftype'} eq "commit") { - print " | " . $cgi->a({-href => href(action=>"shortlog", hash=>$tag{'fullname'})}, "shortlog") . + print {$output_handler} " | " . 
$cgi->a({-href => href(action=>"shortlog", hash=>$tag{'fullname'})}, "shortlog") . " | " . $cgi->a({-href => href(action=>"log", hash=>$tag{'fullname'})}, "log"); } elsif ($tag{'reftype'} eq "blob") { - print " | " . $cgi->a({-href => href(action=>"blob_plain", hash=>$tag{'refid'})}, "raw"); + print {$output_handler} " | " . $cgi->a({-href => href(action=>"blob_plain", hash=>$tag{'refid'})}, "raw"); } - print "</td>\n" . + print {$output_handler} "</td>\n" . "</tr>"; } if (defined $extra) { - print "<tr>\n" . + print {$output_handler} "<tr>\n" . "<td colspan=\"5\">$extra</td>\n" . "</tr>\n"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; } sub git_heads_body { @@ -4715,19 +4729,19 @@ sub git_heads_body { $from = 0 unless defined $from; $to = $#{$headlist} if (!defined $to || $#{$headlist} < $to); - print "<table class=\"heads\">\n"; + print {$output_handler} "<table class=\"heads\">\n"; my $alternate = 1; for (my $i = $from; $i <= $to; $i++) { my $entry = $headlist->[$i]; my %ref = %$entry; my $curr = $ref{'id'} eq $head; if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; - print "<td><i>$ref{'age'}</i></td>\n" . + print {$output_handler} "<td><i>$ref{'age'}</i></td>\n" . ($curr ? "<td class=\"current_head\">" : "<td>") . $cgi->a({-href => href(action=>"shortlog", hash=>$ref{'fullname'}), -class => "list name"},esc_html($ref{'name'})) . @@ -4740,11 +4754,11 @@ sub git_heads_body { "</tr>"; } if (defined $extra) { - print "<tr>\n" . + print {$output_handler} "<tr>\n" . "<td colspan=\"3\">$extra</td>\n" . 
"</tr>\n"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; } sub git_search_grep_body { @@ -4752,7 +4766,7 @@ sub git_search_grep_body { $from = 0 unless defined $from; $to = $#{$commitlist} if (!defined $to || $#{$commitlist} < $to); - print "<table class=\"commit_search\">\n"; + print {$output_handler} "<table class=\"commit_search\">\n"; my $alternate = 1; for (my $i = $from; $i <= $to; $i++) { my %co = %{$commitlist->[$i]}; @@ -4761,12 +4775,12 @@ sub git_search_grep_body { } my $commit = $co{'id'}; if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; - print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . + print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . format_author_html('td', \%co, 15, 5) . "<td>" . $cgi->a({-href => href(action=>"commit", hash=>$co{'id'}), @@ -4786,25 +4800,25 @@ sub git_search_grep_body { $match = esc_html($match); $trail = esc_html($trail); - print "$lead<span class=\"match\">$match</span>$trail<br />"; + print {$output_handler} "$lead<span class=\"match\">$match</span>$trail<br />"; } } - print "</td>\n" . + print {$output_handler} "</td>\n" . "<td class=\"link\">" . $cgi->a({-href => href(action=>"commit", hash=>$co{'id'})}, "commit") . " | " . $cgi->a({-href => href(action=>"commitdiff", hash=>$co{'id'})}, "commitdiff") . " | " . $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$co{'id'})}, "tree"); - print "</td>\n" . + print {$output_handler} "</td>\n" . "</tr>\n"; } if (defined $extra) { - print "<tr>\n" . + print {$output_handler} "<tr>\n" . "<td colspan=\"3\">$extra</td>\n" . 
"</tr>\n"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; } ## ====================================================================== @@ -4824,11 +4838,11 @@ sub git_project_list { git_header_html(); if (-f $home_text) { - print "<div class=\"index_include\">\n"; + print {$output_handler} "<div class=\"index_include\">\n"; insert_file($home_text); - print "</div>\n"; + print {$output_handler} "</div>\n"; } - print $cgi->startform(-method => "get") . + print {$output_handler} $cgi->startform(-method => "get") . "<p class=\"projsearch\">Search:\n" . $cgi->textfield(-name => "s", -value => $searchtext) . "\n" . "</p>" . @@ -4858,7 +4872,7 @@ sub git_forks { sub git_project_index { my @projects = git_get_projects_list($project); - print $cgi->header( + print {$output_handler} $cgi->header( -type => 'text/plain', -charset => 'utf-8', -content_disposition => 'inline; filename="index.aux"'); @@ -4875,7 +4889,7 @@ sub git_project_index { $path =~ s/ /\+/g; $owner =~ s/ /\+/g; - print "$path $owner\n"; + print {$output_handler} "$path $owner\n"; } } @@ -4902,12 +4916,12 @@ sub git_summary { git_header_html(); git_print_page_nav('summary','', $head); - print "<div class=\"title\"> </div>\n"; - print "<table class=\"projects_list\">\n" . + print {$output_handler} "<div class=\"title\"> </div>\n"; + print {$output_handler} "<table class=\"projects_list\">\n" . "<tr id=\"metadata_desc\"><td>description</td><td>" . esc_html($descr) . "</td></tr>\n" . "<tr id=\"metadata_owner\"><td>owner</td><td>" . esc_html($owner) . 
"</td></tr>\n"; if (defined $cd{'rfc2822'}) { - print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n"; + print {$output_handler} "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n"; } # use per project git URL list in $projectroot/$project/cloneurl @@ -4917,7 +4931,7 @@ sub git_summary { @url_list = map { "$_/$project" } @git_base_url_list unless @url_list; foreach my $git_url (@url_list) { next unless $git_url; - print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n"; + print {$output_handler} "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n"; $url_tag = ""; } @@ -4926,23 +4940,23 @@ sub git_summary { if ($show_ctags) { my $ctags = git_get_project_ctags($project); my $cloud = git_populate_project_tagcloud($ctags); - print "<tr id=\"metadata_ctags\"><td>Content tags:<br />"; - print "</td>\n<td>" unless %$ctags; - print "<form action=\"$show_ctags\" method=\"post\"><input type=\"hidden\" name=\"p\" value=\"$project\" />Add: <input type=\"text\" name=\"t\" size=\"8\" /></form>"; - print "</td>\n<td>" if %$ctags; - print git_show_project_tagcloud($cloud, 48); - print "</td></tr>"; + print {$output_handler} "<tr id=\"metadata_ctags\"><td>Content tags:<br />"; + print {$output_handler} "</td>\n<td>" unless %$ctags; + print {$output_handler} "<form action=\"$show_ctags\" method=\"post\"><input type=\"hidden\" name=\"p\" value=\"$project\" />Add: <input type=\"text\" name=\"t\" size=\"8\" /></form>"; + print {$output_handler} "</td>\n<td>" if %$ctags; + print {$output_handler} git_show_project_tagcloud($cloud, 48); + print {$output_handler} "</td></tr>"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; # If XSS prevention is on, we don't include README.html. # TODO: Allow a readme in some safe format. if (!$prevent_xss && -s "$projectroot/$project/README.html") { - print "<div class=\"title\">readme</div>\n" . 
+ print {$output_handler} "<div class=\"title\">readme</div>\n" . "<div class=\"readme\">\n"; insert_file("$projectroot/$project/README.html"); - print "\n</div>\n"; # class="readme" + print {$output_handler} "\n</div>\n"; # class="readme" } # we need to request one more than 16 (0..15) to check if @@ -4991,7 +5005,7 @@ sub git_tag { } git_print_header_div('commit', esc_html($tag{'name'}), $hash); - print "<div class=\"title_text\">\n" . + print {$output_handler} "<div class=\"title_text\">\n" . "<table class=\"object_header\">\n" . "<tr>\n" . "<td>object</td>\n" . @@ -5003,15 +5017,15 @@ sub git_tag { if (defined($tag{'author'})) { git_print_authorship_rows(\%tag, 'author'); } - print "</table>\n\n" . + print {$output_handler} "</table>\n\n" . "</div>\n"; - print "<div class=\"page_body\">"; + print {$output_handler} "<div class=\"page_body\">"; my $comment = $tag{'comment'}; foreach my $line (@$comment) { chomp $line; - print esc_html($line, -nbsp=>1) . "<br/>\n"; + print {$output_handler} esc_html($line, -nbsp=>1) . "<br/>\n"; } - print "</div>\n"; + print {$output_handler} "</div>\n"; git_footer_html(); } @@ -5062,21 +5076,23 @@ sub git_blame_common { # incremental blame data returns early if ($format eq 'data') { - print $cgi->header( + print {$output_handler} $cgi->header( -type=>"text/plain", -charset => "utf-8", -status=> "200 OK"); local $| = 1; # output autoflush - print while <$fd>; + while (<$fd>) { + print {$output_handler} $_; + } close $fd - or print "ERROR $!\n"; + or die_error(500, "ERROR $!\n"); - print 'END'; + print {$output_handler} 'END'; if (defined $t0 && gitweb_check_feature('timed')) { - print ' '. + print {$output_handler} ' '. Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]). ' '.$number_of_git_cmds; } - print "\n"; + print {$output_handler} "\n"; return; } @@ -5109,20 +5125,20 @@ sub git_blame_common { # page body if ($format eq 'incremental') { - print "<noscript>\n<div class=\"error\"><center><b>\n". 
+ print {$output_handler} "<noscript>\n<div class=\"error\"><center><b>\n". "This page requires JavaScript to run.\n Use ". $cgi->a({-href => href(action=>'blame',javascript=>0,-replay=>1)}, 'this page'). " instead.\n". "</b></center></div>\n</noscript>\n"; - print qq!<div id="progress_bar" style="width: 100%; background-color: yellow"></div>\n!; + print {$output_handler} qq!<div id="progress_bar" style="width: 100%; background-color: yellow"></div>\n!; } - print qq!<div class="page_body">\n!; - print qq!<div id="progress_info">... / ...</div>\n! + print {$output_handler} qq!<div class="page_body">\n!; + print {$output_handler} qq!<div id="progress_info">... / ...</div>\n! if ($format eq 'incremental'); - print qq!<table id="blame_table" class="blame" width="100%">\n!. + print {$output_handler} qq!<table id="blame_table" class="blame" width="100%">\n!. #qq!<col width="5.5em" /><col width="2.5em" /><col width="*" />\n!. qq!<thead>\n!. qq!<tr><th>Commit</th><th>Line</th><th>Data</th></tr>\n!. @@ -5143,12 +5159,12 @@ sub git_blame_common { chomp $line; $linenr++; - print qq!<tr id="l$linenr" class="$color_class">!. + print {$output_handler} qq!<tr id="l$linenr" class="$color_class">!. qq!<td class="sha1"><a href=""> </a></td>!. qq!<td class="linenr">!. qq!<a class="linenr" href="">$linenr</a></td>!; - print qq!<td class="pre">! . esc_html($line) . "</td>\n"; - print qq!</tr>\n!; + print {$output_handler} qq!<td class="pre">! . esc_html($line) . "</td>\n"; + print {$output_handler} qq!</tr>\n!; } } else { # porcelain, i.e. ordinary blame @@ -5189,25 +5205,25 @@ sub git_blame_common { $tr_class .= ' boundary' if (exists $meta->{'boundary'}); $tr_class .= ' no-previous' if ($meta->{'nprevious'} == 0); $tr_class .= ' multiple-previous' if ($meta->{'nprevious'} > 1); - print "<tr id=\"l$lineno\" class=\"$tr_class\">\n"; + print {$output_handler} "<tr id=\"l$lineno\" class=\"$tr_class\">\n"; if ($group_size) { - print "<td class=\"sha1\""; - print " title=\"". 
esc_html($author) . ", $date\""; - print " rowspan=\"$group_size\"" if ($group_size > 1); - print ">"; - print $cgi->a({-href => href(action=>"commit", + print {$output_handler} "<td class=\"sha1\""; + print {$output_handler} " title=\"". esc_html($author) . ", $date\""; + print {$output_handler} " rowspan=\"$group_size\"" if ($group_size > 1); + print {$output_handler} ">"; + print {$output_handler} $cgi->a({-href => href(action=>"commit", hash=>$full_rev, file_name=>$file_name)}, esc_html($short_rev)); if ($group_size >= 2) { my @author_initials = ($author =~ /\b([[:upper:]])\B/g); if (@author_initials) { - print "<br />" . + print {$output_handler} "<br />" . esc_html(join('', @author_initials)); # or join('.', ...) } } - print "</td>\n"; + print {$output_handler} "</td>\n"; } # 'previous' <sha1 of parent commit> <filename at commit> if (exists $meta->{'previous'} && @@ -5224,23 +5240,23 @@ sub git_blame_common { my $blamed = href(action => 'blame', file_name => $linenr_filename, hash_base => $linenr_commit); - print "<td class=\"linenr\">"; - print $cgi->a({ -href => "$blamed#l$orig_lineno", + print {$output_handler} "<td class=\"linenr\">"; + print {$output_handler} $cgi->a({ -href => "$blamed#l$orig_lineno", -class => "linenr" }, esc_html($lineno)); - print "</td>"; - print "<td class=\"pre\">" . esc_html($data) . "</td>\n"; - print "</tr>\n"; + print {$output_handler} "</td>"; + print {$output_handler} "<td class=\"pre\">" . esc_html($data) . "</td>\n"; + print {$output_handler} "</tr>\n"; } # end while } # footer - print "</tbody>\n". + print {$output_handler} "</tbody>\n". 
"</table>\n"; # class="blame" - print "</div>\n"; # class="blame_body" + print {$output_handler} "</div>\n"; # class="blame_body" close $fd - or print "Reading blob failed\n"; + or print {$output_handler} "Reading blob failed\n"; git_footer_html(); } @@ -5323,16 +5339,17 @@ sub git_blob_plain { my $sandbox = $prevent_xss && $type !~ m!^(?:text/plain|image/(?:gif|png|jpeg))$!; - print $cgi->header( + print {$output_handler} $cgi->header( -type => $type, -expires => $expires, -content_disposition => ($sandbox ? 'attachment' : 'inline') . '; filename="' . $save_as . '"'); local $/ = undef; - binmode STDOUT, ':raw'; - print <$fd>; - binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi + + binmode $output_handler_bin, ':raw'; + print {$output_handler_bin} <$fd>; + binmode $output_handler_bin, ':utf8'; # as set at the beginning of gitweb.cgi close $fd; } @@ -5391,18 +5408,18 @@ sub git_blob { git_print_page_nav('','', $hash_base,$co{'tree'},$hash_base, $formats_nav); git_print_header_div('commit', esc_html($co{'title'}), $hash_base); } else { - print "<div class=\"page_nav\">\n" . + print {$output_handler} "<div class=\"page_nav\">\n" . "<br/><br/></div>\n" . "<div class=\"title\">$hash</div>\n"; } git_print_page_path($file_name, "blob", $hash_base); - print "<div class=\"page_body\">\n"; + print {$output_handler} "<div class=\"page_body\">\n"; if ($mimetype =~ m!^image/!) { - print qq!<img type="$mimetype"!; + print {$output_handler} qq!<img type="$mimetype"!; if ($file_name) { - print qq! alt="$file_name" title="$file_name"!; + print {$output_handler} qq! alt="$file_name" title="$file_name"!; } - print qq! src="! . + print {$output_handler} qq! src="! . href(action=>"blob_plain", hash=>$hash, hash_base=>$hash_base, file_name=>$file_name) . qq!" />\n!; @@ -5412,14 +5429,14 @@ sub git_blob { chomp $line; $nr++; $line = untabify($line); - printf "<div class=\"pre\"><a id=\"l%i\" href=\"" . 
href(-replay => 1) + printf {$output_handler} "<div class=\"pre\"><a id=\"l%i\" href=\"" . href(-replay => 1) . "#l%i\" class=\"linenr\">%4i</a> %s</div>\n", $nr, $nr, $nr, esc_html($line, -nbsp=>1); } } close $fd - or print "Reading blob failed.\n"; - print "</div>"; + or print {$output_handler} "Reading blob failed.\n"; + print {$output_handler} "</div>"; git_footer_html(); } @@ -5474,9 +5491,9 @@ sub git_tree { git_print_header_div('commit', esc_html($co{'title'}) . $ref, $hash_base); } else { undef $hash_base; - print "<div class=\"page_nav\">\n"; - print "<br/><br/></div>\n"; - print "<div class=\"title\">$hash</div>\n"; + print {$output_handler} "<div class=\"page_nav\">\n"; + print {$output_handler} "<br/><br/></div>\n"; + print {$output_handler} "<div class=\"title\">$hash</div>\n"; } if (defined $file_name) { $basedir = $file_name; @@ -5485,16 +5502,16 @@ sub git_tree { } git_print_page_path($file_name, 'tree', $hash_base); } - print "<div class=\"page_body\">\n"; - print "<table class=\"tree\">\n"; + print {$output_handler} "<div class=\"page_body\">\n"; + print {$output_handler} "<table class=\"tree\">\n"; my $alternate = 1; # '..' (top directory) link if possible if (defined $hash_base && defined $file_name && $file_name =~ m![^/]+$!) { if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; @@ -5502,33 +5519,33 @@ sub git_tree { $up =~ s!/?[^/]+$!!; undef $up unless $up; # based on git_print_tree_entry - print '<td class="mode">' . mode_str('040000') . "</td>\n"; - print '<td class="size"> </td>'."\n" if $show_sizes; - print '<td class="list">'; - print $cgi->a({-href => href(action=>"tree", + print {$output_handler} '<td class="mode">' . mode_str('040000') . 
"</td>\n"; + print {$output_handler} '<td class="size"> </td>'."\n" if $show_sizes; + print {$output_handler} '<td class="list">'; + print {$output_handler} $cgi->a({-href => href(action=>"tree", hash_base=>$hash_base, file_name=>$up)}, ".."); - print "</td>\n"; - print "<td class=\"link\"></td>\n"; + print {$output_handler} "</td>\n"; + print {$output_handler} "<td class=\"link\"></td>\n"; - print "</tr>\n"; + print {$output_handler} "</tr>\n"; } foreach my $line (@entries) { my %t = parse_ls_tree_line($line, -z => 1, -l => $show_sizes); if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; git_print_tree_entry(\%t, $basedir, $hash_base, $have_blame); - print "</tr>\n"; + print {$output_handler} "</tr>\n"; } - print "</table>\n" . + print {$output_handler} "</table>\n" . "</div>"; git_footer_html(); } @@ -5605,16 +5622,16 @@ sub git_snapshot { } $filename =~ s/(["\\])/\\$1/g; - print $cgi->header( + print {$output_handler} $cgi->header( -type => $known_snapshot_formats{$format}{'type'}, -content_disposition => 'inline; filename="' . $filename . '"', -status => '200 OK'); open my $fd, "-|", $cmd or die_error(500, "Execute git-archive failed"); - binmode STDOUT, ':raw'; - print <$fd>; - binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi + binmode $output_handler_bin, ':raw'; + print {$output_handler_bin} <$fd>; + binmode $output_handler_bin, ':utf8'; # as set at the beginning of gitweb.cgi close $fd; } @@ -5765,11 +5782,11 @@ sub git_commit { } else { git_print_header_div('tree', esc_html($co{'title'}) . $ref, $co{'tree'}, $hash); } - print "<div class=\"title_text\">\n" . + print {$output_handler} "<div class=\"title_text\">\n" . "<table class=\"object_header\">\n"; git_print_authorship_rows(\%co); - print "<tr><td>commit</td><td class=\"sha1\">$co{'id'}</td></tr>\n"; - print "<tr>" . 
+ print {$output_handler} "<tr><td>commit</td><td class=\"sha1\">$co{'id'}</td></tr>\n"; + print {$output_handler} "<tr>" . "<td>tree</td>" . "<td class=\"sha1\">" . $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$hash), @@ -5780,13 +5797,13 @@ sub git_commit { "tree"); my $snapshot_links = format_snapshot_links($hash); if (defined $snapshot_links) { - print " | " . $snapshot_links; + print {$output_handler} " | " . $snapshot_links; } - print "</td>" . + print {$output_handler} "</td>" . "</tr>\n"; foreach my $par (@$parents) { - print "<tr>" . + print {$output_handler} "<tr>" . "<td>parent</td>" . "<td class=\"sha1\">" . $cgi->a({-href => href(action=>"commit", hash=>$par), @@ -5799,12 +5816,12 @@ sub git_commit { "</td>" . "</tr>\n"; } - print "</table>". + print {$output_handler} "</table>". "</div>\n"; - print "<div class=\"page_body\">\n"; + print {$output_handler} "<div class=\"page_body\">\n"; git_print_log($co{'comment'}); - print "</div>\n"; + print {$output_handler} "</div>\n"; git_difftree_body(\@difftree, $hash, @$parents); @@ -5852,7 +5869,7 @@ sub git_object { die_error(400, "Not enough information to find object"); } - print $cgi->redirect(-uri => href(action=>$type, -full=>1, + print {$output_handler} $cgi->redirect(-uri => href(action=>$type, -full=>1, hash=>$hash, hash_base=>$hash_base, file_name=>$file_name), -status => '302 Found'); @@ -5943,23 +5960,23 @@ sub git_blobdiff { git_print_page_nav('','', $hash_base,$co{'tree'},$hash_base, $formats_nav); git_print_header_div('commit', esc_html($co{'title'}), $hash_base); } else { - print "<div class=\"page_nav\"><br/>$formats_nav<br/></div>\n"; - print "<div class=\"title\">$hash vs $hash_parent</div>\n"; + print {$output_handler} "<div class=\"page_nav\"><br/>$formats_nav<br/></div>\n"; + print {$output_handler} "<div class=\"title\">$hash vs $hash_parent</div>\n"; } if (defined $file_name) { git_print_page_path($file_name, "blob", $hash_base); } else { - print "<div 
class=\"page_path\"></div>\n"; + print {$output_handler} "<div class=\"page_path\"></div>\n"; } } elsif ($format eq 'plain') { - print $cgi->header( + print {$output_handler} $cgi->header( -type => 'text/plain', -charset => 'utf-8', -expires => $expires, -content_disposition => 'inline; filename="' . "$file_name" . '.patch"'); - print "X-Git-Url: " . $cgi->self_url() . "\n\n"; + print {$output_handler} "X-Git-Url: " . $cgi->self_url() . "\n\n"; } else { die_error(400, "Unknown blobdiff format"); @@ -5967,12 +5984,12 @@ sub git_blobdiff { # patch if ($format eq 'html') { - print "<div class=\"page_body\">\n"; + print {$output_handler} "<div class=\"page_body\">\n"; git_patchset_body($fd, [ \%diffinfo ], $hash_base, $hash_parent_base); close $fd; - print "</div>\n"; # class="page_body" + print {$output_handler} "</div>\n"; # class="page_body" git_footer_html(); } else { @@ -5980,12 +5997,12 @@ sub git_blobdiff { $line =~ s!a/($hash|$hash_parent)!'a/'.esc_path($diffinfo{'from_file'})!eg; $line =~ s!b/($hash|$hash_parent)!'b/'.esc_path($diffinfo{'to_file'})!eg; - print $line; + print {$output_handler} $line; last if $line =~ m!^\+\+\+!; } local $/ = undef; - print <$fd>; + print {$output_handler} <$fd>; close $fd; } } @@ -6148,16 +6165,16 @@ sub git_commitdiff { git_header_html(undef, $expires); git_print_page_nav('commitdiff','', $hash,$co{'tree'},$hash, $formats_nav); git_print_header_div('commit', esc_html($co{'title'}) . $ref, $hash); - print "<div class=\"title_text\">\n" . + print {$output_handler} "<div class=\"title_text\">\n" . "<table class=\"object_header\">\n"; git_print_authorship_rows(\%co); - print "</table>". + print {$output_handler} "</table>". 
"</div>\n"; - print "<div class=\"page_body\">\n"; + print {$output_handler} "<div class=\"page_body\">\n"; if (@{$co{'comment'}} > 1) { - print "<div class=\"log\">\n"; + print {$output_handler} "<div class=\"log\">\n"; git_print_log($co{'comment'}, -final_empty_line=> 1, -remove_title => 1); - print "</div>\n"; # class="log" + print {$output_handler} "</div>\n"; # class="log" } } elsif ($format eq 'plain') { @@ -6165,27 +6182,27 @@ sub git_commitdiff { my $tagname = git_get_rev_name_tags($hash); my $filename = basename($project) . "-$hash.patch"; - print $cgi->header( + print {$output_handler} $cgi->header( -type => 'text/plain', -charset => 'utf-8', -expires => $expires, -content_disposition => 'inline; filename="' . "$filename" . '"'); my %ad = parse_date($co{'author_epoch'}, $co{'author_tz'}); - print "From: " . to_utf8($co{'author'}) . "\n"; - print "Date: $ad{'rfc2822'} ($ad{'tz_local'})\n"; - print "Subject: " . to_utf8($co{'title'}) . "\n"; + print {$output_handler} "From: " . to_utf8($co{'author'}) . "\n"; + print {$output_handler} "Date: $ad{'rfc2822'} ($ad{'tz_local'})\n"; + print {$output_handler} "Subject: " . to_utf8($co{'title'}) . "\n"; - print "X-Git-Tag: $tagname\n" if $tagname; - print "X-Git-Url: " . $cgi->self_url() . "\n\n"; + print {$output_handler} "X-Git-Tag: $tagname\n" if $tagname; + print {$output_handler} "X-Git-Url: " . $cgi->self_url() . "\n\n"; foreach my $line (@{$co{'comment'}}) { - print to_utf8($line) . "\n"; + print {$output_handler} to_utf8($line) . "\n"; } - print "---\n\n"; + print {$output_handler} "---\n\n"; } elsif ($format eq 'patch') { my $filename = basename($project) . "-$hash.patch"; - print $cgi->header( + print {$output_handler} $cgi->header( -type => 'text/plain', -charset => 'utf-8', -expires => $expires, @@ -6198,24 +6215,24 @@ sub git_commitdiff { $hash_parent eq '-c' || $hash_parent eq '--cc'; git_difftree_body(\@difftree, $hash, $use_parents ? 
@{$co{'parents'}} : $hash_parent); - print "<br/>\n"; + print {$output_handler} "<br/>\n"; git_patchset_body($fd, \@difftree, $hash, $use_parents ? @{$co{'parents'}} : $hash_parent); close $fd; - print "</div>\n"; # class="page_body" + print {$output_handler} "</div>\n"; # class="page_body" git_footer_html(); } elsif ($format eq 'plain') { local $/ = undef; - print <$fd>; + print {$output_handler} <$fd>; close $fd - or print "Reading git-diff-tree failed\n"; + or print {$output_handler} "Reading git-diff-tree failed\n"; } elsif ($format eq 'patch') { local $/ = undef; - print <$fd>; + print {$output_handler} <$fd>; close $fd - or print "Reading git-format-patch failed\n"; + or print {$output_handler} "Reading git-format-patch failed\n"; } } @@ -6318,7 +6335,7 @@ sub git_search { git_print_page_nav('','', $hash,$co{'tree'},$hash); git_print_header_div('commit', esc_html($co{'title'}), $hash); - print "<table class=\"pickaxe search\">\n"; + print {$output_handler} "<table class=\"pickaxe search\">\n"; my $alternate = 1; local $/ = "\n"; open my $fd, '-|', git_cmd(), '--no-pager', 'log', @diff_opts, @@ -6334,24 +6351,24 @@ sub git_search { if (defined $set{'commit'}) { # finish previous commit if (%co) { - print "</td>\n" . + print {$output_handler} "</td>\n" . "<td class=\"link\">" . $cgi->a({-href => href(action=>"commit", hash=>$co{'id'})}, "commit") . " | " . $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$co{'id'})}, "tree"); - print "</td>\n" . + print {$output_handler} "</td>\n" . "</tr>\n"; } if ($alternate) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } $alternate ^= 1; %co = parse_commit($set{'commit'}); my $author = chop_and_escape_str($co{'author_name'}, 15, 5); - print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . 
+ print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" . "<td><i>$author</i></td>\n" . "<td>" . $cgi->a({-href => href(action=>"commit", hash=>$co{'id'}), @@ -6360,7 +6377,7 @@ sub git_search { } elsif (defined $set{'to_id'}) { next if ($set{'to_id'} =~ m/^0{40}$/); - print $cgi->a({-href => href(action=>"blob", hash_base=>$co{'id'}, + print {$output_handler} $cgi->a({-href => href(action=>"blob", hash_base=>$co{'id'}, hash=>$set{'to_id'}, file_name=>$set{'to_file'}), -class => "list"}, "<span class=\"match\">" . esc_path($set{'file'}) . "</span>") . @@ -6371,23 +6388,23 @@ sub git_search { # finish last commit (warning: repetition!) if (%co) { - print "</td>\n" . + print {$output_handler} "</td>\n" . "<td class=\"link\">" . $cgi->a({-href => href(action=>"commit", hash=>$co{'id'})}, "commit") . " | " . $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$co{'id'})}, "tree"); - print "</td>\n" . + print {$output_handler} "</td>\n" . "</tr>\n"; } - print "</table>\n"; + print {$output_handler} "</table>\n"; } if ($searchtype eq 'grep') { git_print_page_nav('','', $hash,$co{'tree'},$hash); git_print_header_div('commit', esc_html($co{'title'}), $hash); - print "<table class=\"grep_search\">\n"; + print {$output_handler} "<table class=\"grep_search\">\n"; my $alternate = 1; my $matches = 0; local $/ = "\n"; @@ -6406,21 +6423,21 @@ sub git_search { (undef, $file, $lno, $ltext) = split(/:/, $line, 4); } if ($file ne $lastfile) { - $lastfile and print "</td></tr>\n"; + $lastfile and print {$output_handler} "</td></tr>\n"; if ($alternate++) { - print "<tr class=\"dark\">\n"; + print {$output_handler} "<tr class=\"dark\">\n"; } else { - print "<tr class=\"light\">\n"; + print {$output_handler} "<tr class=\"light\">\n"; } - print "<td class=\"list\">". + print {$output_handler} "<td class=\"list\">". 
$cgi->a({-href => href(action=>"blob", hash=>$co{'hash'}, file_name=>"$file"), -class => "list"}, esc_path($file)); - print "</td><td>\n"; + print {$output_handler} "</td><td>\n"; $lastfile = $file; } if ($binary) { - print "<div class=\"binary\">Binary file</div>\n"; + print {$output_handler} "<div class=\"binary\">Binary file</div>\n"; } else { $ltext = untabify($ltext); if ($ltext =~ m/^(.*)($search_regexp)(.*)$/i) { @@ -6432,7 +6449,7 @@ sub git_search { } else { $ltext = esc_html($ltext, -nbsp=>1); } - print "<div class=\"pre\">" . + print {$output_handler} "<div class=\"pre\">" . $cgi->a({-href => href(action=>"blob", hash=>$co{'hash'}, file_name=>"$file").'#l'.$lno, -class => "linenr"}, sprintf('%4i', $lno)) @@ -6440,16 +6457,16 @@ sub git_search { } } if ($lastfile) { - print "</td></tr>\n"; + print {$output_handler} "</td></tr>\n"; if ($matches > 1000) { - print "<div class=\"diff nodifferences\">Too many matches, listing trimmed</div>\n"; + print {$output_handler} "<div class=\"diff nodifferences\">Too many matches, listing trimmed</div>\n"; } } else { - print "<div class=\"diff nodifferences\">No matches found</div>\n"; + print {$output_handler} "<div class=\"diff nodifferences\">No matches found</div>\n"; } close $fd; - print "</table>\n"; + print {$output_handler} "</table>\n"; } git_footer_html(); } @@ -6457,7 +6474,7 @@ sub git_search { sub git_search_help { git_header_html(); git_print_page_nav('','', $hash,$hash,$hash); - print <<EOT; + print {$output_handler} <<EOT; <p><strong>Pattern</strong> is by default a normal string that is matched precisely (but without regard to case, except in the case of pickaxe). 
However, when you check the <em>re</em> checkbox, the pattern entered is recognized as the POSIX extended @@ -6469,7 +6486,7 @@ insensitive).</p> EOT my $have_grep = gitweb_check_feature('grep'); if ($have_grep) { - print <<EOT; + print {$output_handler} <<EOT; <dt><b>grep</b></dt> <dd>All files in the currently selected tree (HEAD unless you are explicitly browsing a different one) are searched for the given pattern. On large trees, this search can take @@ -6478,7 +6495,7 @@ due to git-grep peculiarity, currently if regexp mode is turned off, the matches case-sensitive.</dd> EOT } - print <<EOT; + print {$output_handler} <<EOT; <dt><b>author</b></dt> <dd>Name and e-mail of the change author and date of birth of the patch will be scanned for the given pattern.</dd> <dt><b>committer</b></dt> @@ -6486,7 +6503,7 @@ EOT EOT my $have_pickaxe = gitweb_check_feature('pickaxe'); if ($have_pickaxe) { - print <<EOT; + print {$output_handler} <<EOT; <dt><b>pickaxe</b></dt> <dd>All commits that caused the string to appear or disappear from any file (changes that added, removed or "modified" the string) will be listed. This search can take a while and @@ -6494,7 +6511,7 @@ takes a lot of strain on the server, so please use it wisely. 
Note that since yo interested even in changes just changing the case as well, this search is case sensitive.</dd> EOT } - print "</dl>\n"; + print {$output_handler} "</dl>\n"; git_footer_html(); } @@ -6541,7 +6558,7 @@ sub git_feed { $since = Time::ParseDate::parsedate($if_modified, GMT => 1); } if (defined $since && $latest_epoch <= $since) { - print $cgi->header( + print {$output_handler} $cgi->header( -type => $content_type, -charset => 'utf-8', -last_modified => $latest_date{'rfc2822'}, @@ -6549,12 +6566,12 @@ sub git_feed { return; } } - print $cgi->header( + print {$output_handler} $cgi->header( -type => $content_type, -charset => 'utf-8', -last_modified => $latest_date{'rfc2822'}); } else { - print $cgi->header( + print {$output_handler} $cgi->header( -type => $content_type, -charset => 'utf-8'); } @@ -6598,13 +6615,13 @@ sub git_feed { } else { $alt_url = href(-full=>1, action=>"summary"); } - print qq!<?xml version="1.0" encoding="utf-8"?>\n!; + print {$output_handler} qq!<?xml version="1.0" encoding="utf-8"?>\n!; if ($format eq 'rss') { - print <<XML; + print {$output_handler} <<XML; <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"> <channel> XML - print "<title>$title</title>\n" . + print {$output_handler} "<title>$title</title>\n" . "<link>$alt_url</link>\n" . "<description>$descr</description>\n" . "<language>en</language>\n" . @@ -6614,22 +6631,22 @@ XML # prefer the logo to the favicon, since RSS # doesn't allow both my $img = esc_url($logo || $favicon); - print "<image>\n" . + print {$output_handler} "<image>\n" . "<url>$img</url>\n" . "<title>$title</title>\n" . "<link>$alt_url</link>\n" . 
"</image>\n"; } if (%latest_date) { - print "<pubDate>$latest_date{'rfc2822'}</pubDate>\n"; - print "<lastBuildDate>$latest_date{'rfc2822'}</lastBuildDate>\n"; + print {$output_handler} "<pubDate>$latest_date{'rfc2822'}</pubDate>\n"; + print {$output_handler} "<lastBuildDate>$latest_date{'rfc2822'}</lastBuildDate>\n"; } - print "<generator>gitweb v.$version/$git_version</generator>\n"; + print {$output_handler} "<generator>gitweb v.$version/$git_version</generator>\n"; } elsif ($format eq 'atom') { - print <<XML; + print {$output_handler} <<XML; <feed xmlns="http://www.w3.org/2005/Atom"> XML - print "<title>$title</title>\n" . + print {$output_handler} "<title>$title</title>\n" . "<subtitle>$descr</subtitle>\n" . '<link rel="alternate" type="text/html" href="' . $alt_url . '" />' . "\n" . @@ -6639,19 +6656,19 @@ XML # use project owner for feed author "<author><name>$owner</name></author>\n"; if (defined $favicon) { - print "<icon>" . esc_url($favicon) . "</icon>\n"; + print {$output_handler} "<icon>" . esc_url($favicon) . "</icon>\n"; } if (defined $logo_url) { # not twice as wide as tall: 72 x 27 pixels - print "<logo>" . esc_url($logo) . "</logo>\n"; + print {$output_handler} "<logo>" . esc_url($logo) . "</logo>\n"; } if (! %latest_date) { # dummy date to keep the feed valid until commits trickle in: - print "<updated>1970-01-01T00:00:00Z</updated>\n"; + print {$output_handler} "<updated>1970-01-01T00:00:00Z</updated>\n"; } else { - print "<updated>$latest_date{'iso-8601'}</updated>\n"; + print {$output_handler} "<updated>$latest_date{'iso-8601'}</updated>\n"; } - print "<generator version='$version/$git_version'>gitweb</generator>\n"; + print {$output_handler} "<generator version='$version/$git_version'>gitweb</generator>\n"; } # contents @@ -6676,7 +6693,7 @@ XML # print element (entry, item) my $co_url = href(-full=>1, action=>"commitdiff", hash=>$commit); if ($format eq 'rss') { - print "<item>\n" . + print {$output_handler} "<item>\n" . "<title>" . 
esc_html($co{'title'}) . "</title>\n" . "<author>" . esc_html($co{'author'}) . "</author>\n" . "<pubDate>$cd{'rfc2822'}</pubDate>\n" . @@ -6686,22 +6703,22 @@ XML "<content:encoded>" . "<![CDATA[\n"; } elsif ($format eq 'atom') { - print "<entry>\n" . + print {$output_handler} "<entry>\n" . "<title type=\"html\">" . esc_html($co{'title'}) . "</title>\n" . "<updated>$cd{'iso-8601'}</updated>\n" . "<author>\n" . " <name>" . esc_html($co{'author_name'}) . "</name>\n"; if ($co{'author_email'}) { - print " <email>" . esc_html($co{'author_email'}) . "</email>\n"; + print {$output_handler} " <email>" . esc_html($co{'author_email'}) . "</email>\n"; } - print "</author>\n" . + print {$output_handler} "</author>\n" . # use committer for contributor "<contributor>\n" . " <name>" . esc_html($co{'committer_name'}) . "</name>\n"; if ($co{'committer_email'}) { - print " <email>" . esc_html($co{'committer_email'}) . "</email>\n"; + print {$output_handler} " <email>" . esc_html($co{'committer_email'}) . "</email>\n"; } - print "</contributor>\n" . + print {$output_handler} "</contributor>\n" . "<published>$cd{'iso-8601'}</published>\n" . "<link rel=\"alternate\" type=\"text/html\" href=\"$co_url\" />\n" . "<id>$co_url</id>\n" . @@ -6709,19 +6726,19 @@ XML "<div xmlns=\"http://www.w3.org/1999/xhtml\">\n"; } my $comment = $co{'comment'}; - print "<pre>\n"; + print {$output_handler} "<pre>\n"; foreach my $line (@$comment) { $line = esc_html($line); - print "$line\n"; + print {$output_handler} "$line\n"; } - print "</pre><ul>\n"; + print {$output_handler} "</pre><ul>\n"; foreach my $difftree_line (@difftree) { my %difftree = parse_difftree_raw_line($difftree_line); next if !$difftree{'from_id'}; my $file = $difftree{'file'} || $difftree{'to_file'}; - print "<li>" . + print {$output_handler} "<li>" . "[" . 
$cgi->a({-href => href(-full=>1, action=>"blobdiff", hash=>$difftree{'to_id'}, hash_parent=>$difftree{'from_id'}, @@ -6729,26 +6746,26 @@ XML file_name=>$file, file_parent=>$difftree{'from_file'}), -title => "diff"}, 'D'); if ($have_blame) { - print $cgi->a({-href => href(-full=>1, action=>"blame", + print {$output_handler} $cgi->a({-href => href(-full=>1, action=>"blame", file_name=>$file, hash_base=>$commit), -title => "blame"}, 'B'); } # if this is not a feed of a file history if (!defined $file_name || $file_name ne $file) { - print $cgi->a({-href => href(-full=>1, action=>"history", + print {$output_handler} $cgi->a({-href => href(-full=>1, action=>"history", file_name=>$file, hash=>$commit), -title => "history"}, 'H'); } $file = esc_path($file); - print "] ". + print {$output_handler} "] ". "$file</li>\n"; } if ($format eq 'rss') { - print "</ul>]]>\n" . + print {$output_handler} "</ul>]]>\n" . "</content:encoded>\n" . "</item>\n"; } elsif ($format eq 'atom') { - print "</ul>\n</div>\n" . + print {$output_handler} "</ul>\n</div>\n" . "</content>\n" . 
"</entry>\n"; } @@ -6756,9 +6773,9 @@ XML # end of feed if ($format eq 'rss') { - print "</channel>\n</rss>\n"; + print {$output_handler} "</channel>\n</rss>\n"; } elsif ($format eq 'atom') { - print "</feed>\n"; + print {$output_handler} "</feed>\n"; } } @@ -6772,13 +6789,12 @@ sub git_atom { sub git_opml { my @list = git_get_projects_list(); - - print $cgi->header( + print {$output_handler} $cgi->header( -type => 'text/xml', -charset => 'utf-8', -content_disposition => 'inline; filename="opml.xml"'); - print <<XML; + print {$output_handler} <<XML; <?xml version="1.0" encoding="utf-8"?> <opml version="1.0"> <head> @@ -6803,9 +6819,9 @@ XML my $path = esc_html(chop_str($proj{'path'}, 25, 5)); my $rss = href('project' => $proj{'path'}, 'action' => 'rss', -full => 1); my $html = href('project' => $proj{'path'}, 'action' => 'summary', -full => 1); - print "<outline type=\"rss\" text=\"$path\" title=\"$path\" xmlUrl=\"$rss\" htmlUrl=\"$html\"/>\n"; + print {$output_handler} "<outline type=\"rss\" text=\"$path\" title=\"$path\" xmlUrl=\"$rss\" htmlUrl=\"$html\"/>\n"; } - print <<XML; + print {$output_handler} <<XML; </outline> </body> </opml> -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) 2010-01-14 1:23 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley @ 2010-01-14 1:23 ` John 'Warthog9' Hawley 2010-01-16 2:48 ` Jakub Narebski ` (2 more replies) 2010-01-16 0:43 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski 1 sibling, 3 replies; 60+ messages in thread From: John 'Warthog9' Hawley @ 2010-01-14 1:23 UTC (permalink / raw) To: git This is a very large patch that implements the file based caching layer that is used on such large sites as kernel.org and soon git.fedoraproject.org. This provides a simple, and straight forward caching mechanism that scales dramatically better than Gitweb by itself. The caching layer basically buffers the output that Gitweb would normally return, and saves that output to a cache file on the local disk. When the file is requested it attempts to gain a shared lock on the cache file and cat it out to the client. Should an exclusive lock be on a file (it's being updated) the code has a choice to either update in the background and go ahead and show the stale page while update is being performed, or stall the client(s) until the page is generated. There are two forms of stalling involved here, background building and non-background building, both of which are discussed in the configuration page. There are still a few known "issues" with respect to this: - Code needs to be added to be "browser" aware so that clients like wget that are trying to get a binary blob don't obtain a "Generating..." page - There is an intermittent flushing issue that has yet to be tracked down Caching is disabled by default with the $cache_enable variable, setting this to 1 will enable file based caching. It is expected that this will be extended to include additional types of caching (like memcached) in the future and should not be exclusively considered a binary value. 
--- gitweb/cache.pm | 283 ++++++++++++++++++++++++++++++++++++++++++++++++++++ gitweb/gitweb.css | 6 + gitweb/gitweb.perl | 58 ++++++++++- 3 files changed, 344 insertions(+), 3 deletions(-) create mode 100644 gitweb/cache.pm diff --git a/gitweb/cache.pm b/gitweb/cache.pm new file mode 100644 index 0000000..d08bcec --- /dev/null +++ b/gitweb/cache.pm @@ -0,0 +1,283 @@ +# gitweb - simple web interface to track changes in git repositories +# +# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net> +# +# This program is licensed under the GPLv2 + +# +# Gitweb caching engine +# + +use File::Path qw(make_path remove_tree); +use Digest::MD5 qw(md5 md5_hex md5_base64); +use Fcntl ':flock'; + +sub cache_fetch { + my ($action) = @_; + my $cacheTime = 0; + + # Deal with cache being disabled + if( $cache_enable == 0 ){ + $output_handler = *STDOUT; + $output_handler_bin = *STDOUT; + $actions{$action}->(); + return; + }elsif( $cache_enable == 1 ){ + #obviously we are using file based caching + + if(! -d $cachedir){ + print "*** Warning ***: Caching enabled but cache directory does not exsist. ($cachedir)\n"; + mkdir ("cache", 0665) || die "Cannot create cache dir - you will need to manually create"; + print "Cache directory created successfully\n"; + } + + our $full_url = "$my_url?". $ENV{'QUERY_STRING'}; + our $urlhash = md5_hex($full_url); + our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/"; + + my $numdirs = make_path( $fullhashdir, { mode => 0777, error => \my $mkdirerr, } ); + if( @$mkdirerr ){ + my $mkdirerrmsg = ""; + for my $diag (@$mkdirerr) { + my ($file, $message) = %$diag; + if($file eq '' ){ + $mkdirerrmsg .= "general error: $message\n"; + }else{ + $mkdirerrmsg .= "problem unlinking $file: $message\n"; + } + } + die_error(500, "Could not create cache directory | $mkdirerrmsg"); + } + $fullhashpath = "$fullhashdir/". substr( $urlhash, 2 ); + $fullhashbinpath = "$fullhashpath.bin"; + } # done dealing with cache enabled / disabled + + if(! 
-e "$fullhashpath" ){ + if(! defined(my $childPid = fork()) ){ + cacheUpdate($action,0); + cacheDisplay($action); + } elsif ( $childPid == 0 ){ + #run the updater + cacheUpdate($action,1); + }else{ + cacheWaitForUpdate($action); + } + }else{ + #if cache is out dated, update + #else displayCache(); + open(cacheFile, '<', "$fullhashpath"); + stat(cacheFile); + close(cacheFile); + $cacheTime = get_loadavg() * 60; + if( $cacheTime > $maxCacheTime ){ + $cacheTime = $maxCacheTime; + } + if( $cacheTime < $minCacheTime ){ + $cacheTime = $minCacheTime; + } + if( (stat(_))[9] < (time - $cacheTime) ){ + if( ! defined(my $childPid = fork()) ){ + cacheUpdate($action,0); + cacheDisplay($action); + } elsif ( $childPid == 0 ){ + #run the updater + #print "Running updater\n"; + cacheUpdate($action,1); + }else{ + #print "Waiting for update\n"; + cacheWaitForUpdate($action); + } + } else { + cacheDisplay($action); + } + + + } + + # + # If all of the caching failes - lets go ahead and press on without it and fall back to 'default' + # non-caching behavior. This is the softest of the failure conditions. + # + #$actions{$action}->(); +} + +sub cacheUpdate { + my ($action,$areForked) = @_; + my $lockingStatus; + my $fileData = ""; + + if($backgroundCache){ + open(cacheFileBG, '>:utf8', "$fullhashpath.bg"); + my $lockStatBG = flock(cacheFileBG,LOCK_EX|LOCK_NB); + + $lockStatus = $lockStatBG; + }else{ + open(cacheFile, '>:utf8', "$fullhashpath"); + my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB); + + $lockStatus = $lockStat; + } + #print "lock status: $lockStat\n"; + + + if (! 
$lockStatus ){ + if ( $areForked ){ + exit(0); + }else{ + return; + } + } + + if( + $action eq "snapshot" + || + $action eq "blob_plain" + ){ + open cacheFileBin, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file"); + $output_handler_bin = *cacheFileBin; + } + + $output_handler = *cacheFile; + + if($backgroundCache){ + open(cacheFile, '>:utf8', "$fullhashpath"); + $lockStat = flock(cacheFile,LOCK_EX); + + if (! $lockStat ){ + if ( $areForked ){ + exit(0); + }else{ + return; + } + } + } + + $actions{$action}->(); + + if( + $action eq "snapshot" + || + $action eq "blob_plain" + ){ + close(cacheFileBin); + } + + flock(cacheFile,LOCK_UN); + close(cacheFile); + + if($backgroundCache){ + flock(cacheFileBG,LOCK_UN); + close(cacheFileBG); + } + + if ( $areForked ){ + exit(0); + } else { + return; + } +} + + +sub cacheWaitForUpdate { + my ($action) = @_; + my $x = 0; + my $max = 10; + my $lockStat = 0; + + if( $backgroundCache ){ + if( -e "$fullhashpath" ){ + open(cacheFile, '<:utf8', "$fullhashpath"); + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); + stat(cacheFile); + close(cacheFile); + + if( $lockStat && ( (stat(_))[9] > (time - $maxCacheLife) ) ){ + cacheDisplay($action); + return; + } + } + } + + if( + $action eq "atom" + || + $action eq "rss" + || + $action eq "opml" + ){ + do { + sleep 2 if $x > 0; + open(cacheFile, '<:utf8', "$fullhashpath"); + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); + close(cacheFile); + $x++; + $combinedLockStat = $lockStat; + } while ((! 
$combinedLockStat) && ($x < $max)); + + if( $x != $max ){ + cacheDisplay($action); + } + return; + } + + $| = 1; + + print $::cgi->header(-type=>'text/html', -charset => 'utf-8', + -status=> 200, -expires => 'never'); + + print <<EOF; +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd"> +<!-- git web w/caching interface version $version, (C) 2006-2010, John 'Warthog9' Hawley <warthog9\@kernel.org> --> +<!-- git core binaries version $git_version --> +<head> +<meta http-equiv="content-type" content="$content_type; charset=utf-8"/> +<meta name="generator" content="gitweb/$version git/$git_version"/> +<meta name="robots" content="index, nofollow"/> +<meta http-equiv="refresh" content="0"/> +<title>$title</title> +</head> +<body> +EOF + + print "Generating.."; + do { + print "."; + sleep 2 if $x > 0; + open(cacheFile, '<:utf8', "$fullhashpath"); + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); + close(cacheFile); + $x++; + $combinedLockStat = $lockStat; + } while ((! $combinedLockStat) && ($x < $max)); + print <<EOF; +</body> +</html> +EOF + return; +} + +sub cacheDisplay { + my ($action) = @_; + open(cacheFile, '<:utf8', "$fullhashpath"); + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); + if (! 
$lockStat ){ + close(cacheFile); + cacheWaitForUpdate($action); + } + + while( <cacheFile> ){ + print $_; + } + if( + $action eq "snapshot" + || + $action eq "blob_plain" + ){ + open(cacheFileBin, '<', "$fullhashbinpath"); + binmode STDOUT, ':raw'; + print <cacheFileBin>; + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi + close(cacheFileBin); + } + close(cacheFile); +} diff --git a/gitweb/gitweb.css b/gitweb/gitweb.css index 50067f2..f809f16 100644 --- a/gitweb/gitweb.css +++ b/gitweb/gitweb.css @@ -67,6 +67,12 @@ div.page_path { border-width: 0px 0px 1px; } +div.cachetime { + float: left; + margin-right: 10px; + color: #555555; +} + div.page_footer { height: 17px; padding: 4px 8px; diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 8bb323c..ec95bb9 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -230,6 +230,50 @@ our $git_versions_must_match = 1; # Leave it undefined (or set to 'undef') to turn off load checking. our $maxload = 300; +# This enables/disables the caching layer in gitweb. This currently only supports the +# 'dumb' file based caching layer, primarily used on git.kernel.org. this is reasonably +# effective but it has the downside of requiring a huge amount of disk space if there +# are a number of repositories involved. It is not uncommon for git.kernel.org to have +# on the order of 80G - 120G accumulate over the course of a few months. It is recommended +# that the cache directory be periodically completely deleted, and this is safe to perform. +# Suggested mechanism +# mv $cacheidr $cachedir.flush;mkdir $cachedir;rm -rf $cachedir.flush +# Value is binary. 0 = disabled (default), 1 = enabled. +# +# Values of caching: +# 1 = 'dumb' file based caching used on git.kernel.org +our $cache_enable = 0; + +# Used to set the minimum cache timeout for the dynamic caching algorithm. Basically +# if we calculate the cache to be under this number of seconds we set the cache timeout +# to this minimum. +# Value is in seconds. 
1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour +our $minCacheTime = 20; + +# Used to set the maximum cache timeout for the dynamic caching algorithm. Basically +# if we calculate the cache to exceed this number of seconds we set the cache timeout +# to this maximum. +# Value is in seconds. 1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour +our $maxCacheTime = 1200; + +# If you need to change the location of the caching directory, override this +# otherwise this will probably do fine for you +our $cachedir = 'cache'; + +# If this is set (to 1) cache will do it's best to always display something instead +# of making someone wait for the cache to update. This will launch the cacheUpdate +# into the background and it will lock a <file>.bg file and will only lock the +# actual cache file when it needs to write into it. In theory this will make +# gitweb seem more responsive at the price of possibly stale data. +our $backgroundCache = 1; + +# Used to set the maximum cache file life. If a cache files last modify time exceeds +# this value, it will assume that the data is just too old, and HAS to be regenerated +# instead of trying to display the existing cache data. +# Value is in seconds. 1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour +# 18000 = 5 hours +our $maxCacheLife = 18000; + # You define site-wide feature defaults here; override them with # $GITWEB_CONFIG as necessary. our %feature = ( @@ -593,6 +637,11 @@ if (defined $maxload && get_loadavg() > $maxload) { die_error(503, "The load average on the server is too high"); } +# +# Includes +# +do 'cache.pm'; + # version of the core git binary our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? 
$1 : "unknown"; $number_of_git_cmds++; @@ -994,7 +1043,7 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ && !$project) { die_error(400, "Project needed"); } -$actions{$action}->(); +cache_fetch($action); exit; ## ====================================================================== @@ -3200,7 +3249,9 @@ sub git_header_html { # support xhtml+xml but choking when it gets what it asked for. if (defined $cgi->http('HTTP_ACCEPT') && $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ && - $cgi->Accept('application/xhtml+xml') != 0) { + $cgi->Accept('application/xhtml+xml') != 0 + && + $cache_enable == 0) { $content_type = 'application/xhtml+xml'; } else { $content_type = 'text/html'; @@ -3344,6 +3395,7 @@ sub git_footer_html { my $feed_class = 'rss_logo'; print {$output_handler} "<div class=\"page_footer\">\n"; + print {$output_handler} "<div class=\"cachetime\">Cache Last Updated: ". gmtime( time ) ." GMT</div>\n"; if (defined $project) { my $descr = git_get_project_description($project); if (defined $descr) { @@ -3424,7 +3476,7 @@ sub die_error { my $extra = shift; # The output handlers for die_error need to be reset to STDOUT - # so that half the message isn't being output to random and + # so that half the message isn't being output to random and # half to STDOUT as expected. This is mainly for the benefit # of using git_header_html() and git_footer_html() since those # internaly use the indirect print handler. -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
  2010-01-14  1:23 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
@ 2010-01-16  2:48   ` Jakub Narebski
  2010-01-23  0:27   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
       [not found]   ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>
  2 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-16  2:48 UTC (permalink / raw)
  To: John 'Warthog9' Hawley, John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This is a very large patch

This is true, and that is why I am working on splitting this patch into a series of smaller patches, each adding a single feature present in this megapatch (this code drop)... and cleaning up (and improving) it while at it. This hopefully would make it easier to review.

> that implements the file based
> caching layer that is used on such large sites as kernel.org and
> soon git.fedoraproject.org. This provides a simple, and straight
> forward caching mechanism that scales dramatically better than
> Gitweb by itself.

Do you have any benchmarks comparing gitweb performance with and without caching enabled?

>
> The caching layer basically buffers the output that Gitweb would
> normally return, and saves that output to a cache file on the local
> disk. When the file is requested it attempts to gain a shared lock
> on the cache file and cat it out to the client. Should an exclusive
> lock be on a file (it's being updated) the code has a choice to either
> update in the background and go ahead and show the stale page while
> update is being performed, or stall the client(s) until the page
> is generated.

The above paragraph is not very clear to me.
Correct me if I am wrong, but as I understand it the cache architecture is as follows:

* This patch implements output caching, which means that the whole gitweb response, including HTTP headers, is stored in the cache. (This means that in the absence of an extra mechanism, content-type negotiation should be disabled when caching is turned on.)

* The caching engine used implements a simple file based caching layer, where cached data is stored verbatim in the cache file (no serialization / hibernating / marshalling of data: better performance, and the possibility of X-Sendfile support). Cache expiration is a global value, i.e. it is not stored along with the cache entry in the file. Cache entries expire based on the mtime of the file.

* When a cache entry exists for a given request, and it is not expired, gitweb output is served directly from the cache file.

* When a cache entry exists for a given request, but it is expired, one process acquires an exclusive (writer) lock on the file; the rest of the clients get served stale data.

* When no cache entry exists for a given request, one process acquires an exclusive (writer) lock on the cache file; the rest of the clients wait for the cache to be filled.

>
> There are two forms of stalling involved here, background building
> and non-background building, both of which are discussed in the
> configuration page.

I'd like to have at least the design decisions put into the commit message, and perhaps also have the caching mechanism described in a separate section in gitweb/README.

>
> There are still a few known "issues" with respect to this:
> 	- Code needs to be added to be "browser" aware so
> 	  that clients like wget that are trying to get a
> 	  binary blob don't obtain a "Generating..." page

This issue should be clearly addressed: when do we serve the "Generating..." page, and when do we not. The issue is not only wget trying to download a binary blob or patchset, or snapshot, but also a binary blob which is an image referenced from a blob which is HTML, and there is the issue of web feeds (accessed by feed readers).
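For illustration only, the three cache-entry cases listed earlier (entry present and fresh, entry present but expired, entry missing) boil down to a single mtime check. This is a minimal sketch, not code from the patch: the name cache_entry_state and its arguments are hypothetical, and $expires_in stands for the global expiration time in seconds.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): classify a cache file by mtime.
# Returns 'missing', 'stale' or 'fresh'.
sub cache_entry_state {
	my ($path, $expires_in, $now) = @_;
	$now = time unless defined $now;

	# stat() returns an empty list if the file does not exist
	my @st = stat($path);
	return 'missing' unless @st;

	# element 9 of the stat list is the last-modify time (mtime)
	return ($st[9] < $now - $expires_in) ? 'stale' : 'fresh';
}
```

A caller would serve the file as-is for 'fresh', serve stale data while one process regenerates for 'stale', and make clients wait for the writer for 'missing'.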
> 	- There is an intermittent flushing issue that has yet
> 	  to be tracked down

Could you tell us more about where this shows up (what are the symptoms)?

BTW if it was split into small separate commits, you could be able to find the bug by bisecting history. Also, trouble finding this bug might mean that the code is not very clean.

>
> Caching is disabled by default with the $cache_enable variable,
> setting this to 1 will enable file based caching. It is expected
> that this will be extended to include additional types of caching
> (like memcached) in the future and should not be exclusively
> considered a binary value.

Not a good idea, IMHO. In my rewrite of this patch there is a _boolean_ $caching_enabled variable which controls if (output) caching is enabled or not, and a $cache variable holding an instance of the cache engine, which might be used to select different caching than simple file-based caching.

Signoff?

> ---
>  gitweb/cache.pm    |  283 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gitweb/gitweb.css  |    6 +
>  gitweb/gitweb.perl |   58 ++++++++++-
>  3 files changed, 344 insertions(+), 3 deletions(-)
>  create mode 100644 gitweb/cache.pm

Very large patch... but no updates to gitweb/README, no updates to t/gitweb-lib.sh (I guess that gitweb tests are no longer working).

> diff --git a/gitweb/cache.pm b/gitweb/cache.pm
> new file mode 100644
> index 0000000..d08bcec
> --- /dev/null
> +++ b/gitweb/cache.pm
> @@ -0,0 +1,283 @@
> +# gitweb - simple web interface to track changes in git repositories
> +#
> +# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
> +#
> +# This program is licensed under the GPLv2
> +
> +#
> +# Gitweb caching engine
> +#
> +
> +use File::Path qw(make_path remove_tree);

Using make_path (you do not use remove_tree, so there is no need for importing it) instead of older mkdir interface requires File::Path version 2.0 (which meant that I had to upgrade File::Path).
This at least should be mentioned in the comment, perhaps also in gitweb/INSTALL.

> +use Digest::MD5 qw(md5 md5_hex md5_base64);

You use only md5_hex; no need to import other functions.

> +use Fcntl ':flock';
> +
> +sub cache_fetch {
> +	my ($action) = @_;
> +	my $cacheTime = 0;
> +
> +	# Deal with cache being disabled
> +	if( $cache_enable == 0 ){

Style:

  +	if ($cache_enable == 0) {

or better

  +	if ($cache_enabled) {

> +		$output_handler = *STDOUT;
> +		$output_handler_bin = *STDOUT;

There should be no need for that, as $output_handle is set to *STDOUT (or \*STDOUT) anyway.

> +		$actions{$action}->();
> +		return;

Anyway I think that the whole block should be _outside_ cache_fetch, which should be invoked only if caching is enabled. For example in gitweb.perl:

  if ($caching_enabled) {
  	do $cache_pm;
  	die $@ if $@;

  	# ...
  	cache_fetch($cache, $action);
  } else {
  	$actions{$action}->();
  }

> +	}elsif( $cache_enable == 1 ){

Style.

> +		#obviously we are using file based caching

See my comment about using $cache_enable as an enum selecting the cache type (blergh). BTW what's with 'obviously'?

> +
> +		if(! -d $cachedir){

Style.

> +			print "*** Warning ***: Caching enabled but cache directory does not exsist. ($cachedir)\n";

Why this warning? Is it really necessary?

> +			mkdir ("cache", 0665) || die "Cannot create cache dir - you will need to manually create";
> +			print "Cache directory created successfully\n";
> +		}
> +
> +		our $full_url = "$my_url?". $ENV{'QUERY_STRING'};

Wouldn't work if your client uses a path_info URL, e.g.

  http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel

That's why I use href(-replay=>1, -full_url=>1, -path_info=>0) for the cache key for a request (you could use freeze(\%input_params) instead, where freeze is from the Storable module).

> +		our $urlhash = md5_hex($full_url);
> +		our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/";

Is depth 2 enough for cache?
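A minimal sketch of how a cache key could be mapped to a file below the cache directory with a two-character fan-out, as the quoted code does with md5_hex. The helper name cache_path_for is hypothetical, and the key is assumed to be whatever canonical byte string identifies the request (for example the href(-replay=>1, ...) form mentioned above), not necessarily the query string.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper (not in the patch): map a request key to
# ("<cachedir>/<2 hex chars>", "<cachedir>/<2 hex chars>/<30 hex chars>").
sub cache_path_for {
	my ($cachedir, $key) = @_;
	my $hash = md5_hex($key);                 # 32 lowercase hex characters
	my $dir  = "$cachedir/" . substr($hash, 0, 2);
	my $file = "$dir/" . substr($hash, 2);    # single slash, no "//"
	return ($dir, $file);
}
```

Two hex characters give at most 256 subdirectories, so with a very large number of entries each directory still grows; a deeper split would be a small change in this one place.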
> +
> +		my $numdirs = make_path( $fullhashdir, { mode => 0777, error => \my $mkdirerr, } );
> +		if( @$mkdirerr ){
> +			my $mkdirerrmsg = "";
> +			for my $diag (@$mkdirerr) {
> +				my ($file, $message) = %$diag;
> +				if($file eq '' ){
> +					$mkdirerrmsg .= "general error: $message\n";
> +				}else{
> +					$mkdirerrmsg .= "problem unlinking $file: $message\n";
> +				}
> +			}
> +			die_error(500, "Could not create cache directory | $mkdirerrmsg");
> +		}
> +		$fullhashpath = "$fullhashdir/". substr( $urlhash, 2 );
> +		$fullhashbinpath = "$fullhashpath.bin";
> +	} # done dealing with cache enabled / disabled

Note also that if dealing with caching enabled / disabled was outside cache_fetch you would have less nested code.

> +
> +	if(! -e "$fullhashpath" ){
> +		if(! defined(my $childPid = fork()) ){

Style.

> +			cacheUpdate($action,0);
> +			cacheDisplay($action);

Why the camelCase Java/JavaScript-like convention, quite different from the C-like naming convention used elsewhere in gitweb?

> +		} elsif ( $childPid == 0 ){
> +			#run the updater
> +			cacheUpdate($action,1);

cacheUpdate($action,0) vs cacheUpdate($action,1) is a very cryptic distinction. It would be better to use a "named parameter" and/or separate, differently named, [wrapper] functions.

> +		}else{
> +			cacheWaitForUpdate($action);
> +		}

This whole block should probably be in a separate function.

> +	}else{
> +		#if cache is out dated, update
> +		#else displayCache();
> +		open(cacheFile, '<', "$fullhashpath");
> +		stat(cacheFile);
> +		close(cacheFile);

You don't need to open a file to stat it.

> +		$cacheTime = get_loadavg() * 60;
> +		if( $cacheTime > $maxCacheTime ){
> +			$cacheTime = $maxCacheTime;
> +		}
> +		if( $cacheTime < $minCacheTime ){
> +			$cacheTime = $minCacheTime;
> +		}

This should probably be a separate function (effective cache expiration time). Also, the adaptiveness of caching is not described in the commit message.

> +		if( (stat(_))[9] < (time - $cacheTime) ){
> +			if( ! 
defined(my $childPid = fork()) ){ > + cacheUpdate($action,0); > + cacheDisplay($action); > + } elsif ( $childPid == 0 ){ > + #run the updater > + #print "Running updater\n"; Remains of debugging. > + cacheUpdate($action,1); > + }else{ > + #print "Waiting for update\n"; > + cacheWaitForUpdate($action); > + } Repeated code (I think). > + } else { > + cacheDisplay($action); > + } > + > + > + } > + > + # > + # If all of the caching failes - lets go ahead and press on without it and fall back to 'default' > + # non-caching behavior. This is the softest of the failure conditions. > + # > + #$actions{$action}->(); Why is this commented out? > +} > + > +sub cacheUpdate { > + my ($action,$areForked) = @_; > + my $lockingStatus; > + my $fileData = ""; > + > + if($backgroundCache){ > + open(cacheFileBG, '>:utf8', "$fullhashpath.bg"); > + my $lockStatBG = flock(cacheFileBG,LOCK_EX|LOCK_NB); > + > + $lockStatus = $lockStatBG; > + }else{ > + open(cacheFile, '>:utf8', "$fullhashpath"); > + my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB); > + > + $lockStatus = $lockStat; > + } Almost identical code. Use of global handles instead of indirect filehandles. > + #print "lock status: $lockStat\n"; > + > + > + if (! $lockStatus ){ > + if ( $areForked ){ > + exit(0); > + }else{ > + return; > + } > + } This conditional needs explanation (comment), I think. > + > + if( > + $action eq "snapshot" > + || > + $action eq "blob_plain" This condition should be put in a separate function/ > + ){ > + open cacheFileBin, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file"); > + $output_handler_bin = *cacheFileBin; > + } > + > + $output_handler = *cacheFile; > + > + if($backgroundCache){ > + open(cacheFile, '>:utf8', "$fullhashpath"); Why "$fullhashpath" and not simply $fullhashpath? > + $lockStat = flock(cacheFile,LOCK_EX); > + > + if (! $lockStat ){ > + if ( $areForked ){ > + exit(0); > + }else{ > + return; > + } > + } Repeated code. 
> + } > + > + $actions{$action}->(); > + > + if( > + $action eq "snapshot" > + || > + $action eq "blob_plain" > + ){ > + close(cacheFileBin); > + } > + > + flock(cacheFile,LOCK_UN); > + close(cacheFile); > + > + if($backgroundCache){ > + flock(cacheFileBG,LOCK_UN); > + close(cacheFileBG); > + } > + > + if ( $areForked ){ > + exit(0); > + } else { > + return; > + } > +} > + > + > +sub cacheWaitForUpdate { > + my ($action) = @_; > + my $x = 0; > + my $max = 10; What is $x, what is $max? > + my $lockStat = 0; > + > + if( $backgroundCache ){ > + if( -e "$fullhashpath" ){ > + open(cacheFile, '<:utf8', "$fullhashpath"); Why opening with :uft8, and not with :raw? I don't think we need to do the eventual conversion once again... > + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); > + stat(cacheFile); > + close(cacheFile); > + > + if( $lockStat && ( (stat(_))[9] > (time - $maxCacheLife) ) ){ > + cacheDisplay($action); > + return; > + } Why do we deal with cache expiration in two places? If it is not a bug, it should be explained in a comment. > + } > + } > + > + if( > + $action eq "atom" > + || > + $action eq "rss" > + || > + $action eq "opml" > + ){ > + do { > + sleep 2 if $x > 0; > + open(cacheFile, '<:utf8', "$fullhashpath"); > + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); > + close(cacheFile); > + $x++; > + $combinedLockStat = $lockStat; > + } while ((! $combinedLockStat) && ($x < $max)); Why busy wait instead of _blocking_ lock, i.e. waiting on lock for it to be free? It doesn't look like we _do_ anything in the loop. Ah, I see that we wait at most 2*$max seconds (where interval of 2 seconds is hardcoded). Is it really necessary? 
> + > + if( $x != $max ){ > + cacheDisplay($action); > + } > + return; > + } > + > + $| = 1; > + > + print $::cgi->header(-type=>'text/html', -charset => 'utf-8', > + -status=> 200, -expires => 'never'); > + > + print <<EOF; > +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd"> > +<!-- git web w/caching interface version $version, (C) 2006-2010, John 'Warthog9' Hawley <warthog9\@kernel.org> --> > +<!-- git core binaries version $git_version --> > +<head> > +<meta http-equiv="content-type" content="$content_type; charset=utf-8"/> > +<meta name="generator" content="gitweb/$version git/$git_version"/> > +<meta name="robots" content="index, nofollow"/> > +<meta http-equiv="refresh" content="0"/> > +<title>$title</title> > +</head> > +<body> > +EOF > + > + print "Generating.."; > + do { > + print "."; > + sleep 2 if $x > 0; > + open(cacheFile, '<:utf8', "$fullhashpath"); > + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); > + close(cacheFile); > + $x++; > + $combinedLockStat = $lockStat; > + } while ((! $combinedLockStat) && ($x < $max)); This trick of having http-equiv 'refresh' meta with the delay of 0 seconds, but not closing the output and therefore not triggering redirect should be described in comments, and perhaps also in the commit message. > + print <<EOF; > +</body> > +</html> > +EOF > + return; > +} > + > +sub cacheDisplay { > + my ($action) = @_; > + open(cacheFile, '<:utf8', "$fullhashpath"); > + $lockStat = flock(cacheFile,LOCK_SH|LOCK_NB); > + if (! $lockStat ){ > + close(cacheFile); > + cacheWaitForUpdate($action); > + } > + > + while( <cacheFile> ){ > + print $_; > + } Why not slurp it (local $/ = undef), but write line after line? > + if( > + $action eq "snapshot" > + || > + $action eq "blob_plain" > + ){ > + open(cacheFileBin, '<', "$fullhashbinpath"); > + binmode STDOUT, ':raw'; > + print <cacheFileBin>; Why not slurp it (local $/ = undef), but write line after line, implicitly? 
> + binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi > + close(cacheFileBin); > + } > + close(cacheFile); > +} > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 8bb323c..ec95bb9 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -230,6 +230,50 @@ our $git_versions_must_match = 1; > # Leave it undefined (or set to 'undef') to turn off load checking. > our $maxload = 300; > > +# This enables/disables the caching layer in gitweb. This currently only supports the > +# 'dumb' file based caching layer, primarily used on git.kernel.org. this is reasonably > +# effective but it has the downside of requiring a huge amount of disk space if there > +# are a number of repositories involved. It is not uncommon for git.kernel.org to have > +# on the order of 80G - 120G accumulate over the course of a few months. It is recommended > +# that the cache directory be periodically completely deleted, and this is safe to perform. > +# Suggested mechanism > +# mv $cacheidr $cachedir.flush;mkdir $cachedir;rm -rf $cachedir.flush > +# Value is binary. 0 = disabled (default), 1 = enabled. > +# > +# Values of caching: > +# 1 = 'dumb' file based caching used on git.kernel.org > +our $cache_enable = 0; > + > +# Used to set the minimum cache timeout for the dynamic caching algorithm. Basically > +# if we calculate the cache to be under this number of seconds we set the cache timeout > +# to this minimum. > +# Value is in seconds. 1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour > +our $minCacheTime = 20; > + > +# Used to set the maximum cache timeout for the dynamic caching algorithm. Basically > +# if we calculate the cache to exceed this number of seconds we set the cache timeout > +# to this maximum. > +# Value is in seconds. 
1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour > +our $maxCacheTime = 1200; > + > +# If you need to change the location of the caching directory, override this > +# otherwise this will probably do fine for you > +our $cachedir = 'cache'; Why not '/tmp/gitweb-cache', or '/var/cache/gitweb'? Perhaps use TMPDIR / File::Spec->tmpdir() if it is undefined? Note that this path is relative to the place where we run gitweb from, which is important for gitweb tests. > + > +# If this is set (to 1) cache will do it's best to always display something instead > +# of making someone wait for the cache to update. This will launch the cacheUpdate > +# into the background and it will lock a <file>.bg file and will only lock the > +# actual cache file when it needs to write into it. In theory this will make > +# gitweb seem more responsive at the price of possibly stale data. > +our $backgroundCache = 1; Does it mean that if there exist cache entry for given request, but it is expired, also the client that created write lock gets stale data instead of 'Generating...' info, and updates/regenerates cache using background process? This comment is not entirely clear for me. > + > +# Used to set the maximum cache file life. If a cache files last modify time exceeds > +# this value, it will assume that the data is just too old, and HAS to be regenerated > +# instead of trying to display the existing cache data. > +# Value is in seconds. 1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour > +# 18000 = 5 hours > +our $maxCacheLife = 18000; This should also be mentioned in commit message (modifying what I wrote). > + > # You define site-wide feature defaults here; override them with > # $GITWEB_CONFIG as necessary. 
> our %feature = ( > @@ -593,6 +637,11 @@ if (defined $maxload && get_loadavg() > $maxload) { > die_error(503, "The load average on the server is too high"); > } > > +# > +# Includes > +# > +do 'cache.pm'; Should be +do "$cache_pm"; if you don't use require, where $cache_pm can be overriden in gitweb config, otherwise gitweb caching tests wouldn't work: they invoke gitweb from test directory. > + > # version of the core git binary > our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown"; > $number_of_git_cmds++; > @@ -994,7 +1043,7 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ && > !$project) { > die_error(400, "Project needed"); > } > -$actions{$action}->(); > +cache_fetch($action); > exit; > As I wrote, I think cache_fetch should be invoked only when caching is enabled. > ## ====================================================================== > @@ -3200,7 +3249,9 @@ sub git_header_html { > # support xhtml+xml but choking when it gets what it asked for. > if (defined $cgi->http('HTTP_ACCEPT') && > $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ && > - $cgi->Accept('application/xhtml+xml') != 0) { > + $cgi->Accept('application/xhtml+xml') != 0 > + && > + $cache_enable == 0) { > $content_type = 'application/xhtml+xml'; > } else { > $content_type = 'text/html'; O.K. > @@ -3344,6 +3395,7 @@ sub git_footer_html { > my $feed_class = 'rss_logo'; > > print {$output_handler} "<div class=\"page_footer\">\n"; > + print {$output_handler} "<div class=\"cachetime\">Cache Last Updated: ". gmtime( time ) ." GMT</div>\n"; Shouldn't this be conditional on $cache_enabled? > if (defined $project) { > my $descr = git_get_project_description($project); > if (defined $descr) { BTW. you need, I think, protect timing info and do not show it if caching is enabled. It doesn't make much sense to show how much time it took to generate page... when said page could have been retrieved from cache. 
But it might make sense; I am not sure.

> @@ -3424,7 +3476,7 @@ sub die_error {
> my $extra = shift;
>
> # The output handlers for die_error need to be reset to STDOUT
> - # so that half the message isn't being output to random and
> + # so that half the message isn't being output to random and
> # half to STDOUT as expected. This is mainly for the benefit
> # of using git_header_html() and git_footer_html() since those
> # internaly use the indirect print handler.

This looks like a spurious (whitespace-only) change.

> --
> 1.6.5.2
>

--
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply [flat|nested] 60+ messages in thread
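Two of the suggestions in the review above (putting the effective cache expiration time in a separate function, and putting the repeated "snapshot or blob_plain" test in a separate function) might look roughly like the sketch below. The names effective_cache_time() and is_binary_action() are hypothetical and appear in neither patch series; the load average is passed in as a parameter here instead of being read with get_loadavg().

```perl
use strict;
use warnings;

# Hypothetical helper: effective cache expiration time, in seconds.
# Mirrors the quoted logic: load average * 60, clamped to [min, max].
sub effective_cache_time {
    my ($loadavg, $min_time, $max_time) = @_;

    my $time = $loadavg * 60;
    $time = $max_time if $time > $max_time;
    $time = $min_time if $time < $min_time;
    return $time;
}

# Hypothetical predicate: actions whose output also goes to the
# separate binary dump file in the quoted patch.
my %binary_action = map { $_ => 1 } qw(snapshot blob_plain);

sub is_binary_action {
    my ($action) = @_;
    return exists $binary_action{$action} ? 1 : 0;
}
```

With the defaults quoted above ($minCacheTime = 20, $maxCacheTime = 1200), a load average of 5 would give an expiration time of 300 seconds, while very low and very high loads would be clamped to 20 and 1200 seconds respectively.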
* [RFC PATCH 00/10] gitweb: Simple file based output caching 2010-01-14 1:23 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley 2010-01-16 2:48 ` Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski ` (11 more replies) [not found] ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com> 2 siblings, 12 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

This series of 10 patches is intended as a proof of concept for splitting the large 'gitweb: File based caching layer (from git.kernel.org)' mega-patch by John 'Warthog9' Hawley aka J.H., by starting small and adding features piece by piece.

This series is meant as a replacement for the last two patches:

* [PATCH 8/9] gitweb: Convert output to using indirect file handle
  Message-ID: <1263432185-21334-9-git-send-email-warthog9@eaglescrag.net>
* [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
  Message-ID: <1263432185-21334-10-git-send-email-warthog9@eaglescrag.net>

in the long patch series by J.H.:

* [PATCH 0/9] Gitweb caching v5
  http://thread.gmane.org/gmane.comp.version-control.git/136913

Note that this patch series is part of the 'gitweb/cache-kernel' branch of the http://repo.or.cz/w/git/jnareb-git.git repository (gitweb link), built on top of modified patches from the 'Gitweb caching v2' series (from the 'gitweb-ml-v2' branch of the http://git.kernel.org/?p=git/warthog9/gitweb.git repository). Therefore the patches might not apply as straight replacements on top of the early parts of the 'gitweb-ml-v5' branch.
This is a work in progress (showing how I see introducing output caching to gitweb). It lacks proper documentation (POD for gitweb/cache.pm, new configuration variables in gitweb/README, perhaps a "Gitweb caching" section in gitweb/README, and gitweb/cache.pm mentioned in gitweb/INSTALL), and commits/patches marked '(WIP)' lack proper commit messages.

Just food for thought...

Table of contents:
~~~~~~~~~~~~~~~~~~
 [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching)
 [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1)
 [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching
 [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration
 [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP)
 [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP)
 [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP)
 [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP)
 [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP)
 [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)

Diffstat:
~~~~~~~~~
 gitweb/cache.pm                        |  566 ++++++++++
 gitweb/gitweb.perl                     | 1923 +++++++++++++++++---------------
 t/gitweb-lib.sh                        |    2 +
 t/t9500-gitweb-standalone-no-errors.sh |   13 +
 t/t9503-gitweb-caching.sh              |   32 +
 t/t9503/test_cache_interface.pl        |  195 ++++
 t/test-lib.sh                          |    3 +
 7 files changed, 1836 insertions(+), 898 deletions(-)
 create mode 100644 gitweb/cache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/test_cache_interface.pl

^ permalink raw reply [flat|nested] 60+ messages in thread
* [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski ` (10 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

If the named boolean option -path_info is passed to the href() subroutine, use its value instead of querying the 'pathinfo' feature to decide whether to generate a path_info URL (if generating a path_info link is possible at all).

href(-replay=>1, -path_info=>0) is meant to be used to generate a key for caching gitweb output; an alternative solution would be to use freeze() from Storable (core module) on the %input_params hash (or a reference to it), e.g.:

  $key = freeze \%input_params;

or some other serialization technique.

While at it, document the extra options/flags to href().

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Actually, after discussion with J.H.

  http://thread.gmane.org/gmane.comp.version-control.git/136913/focus=137061

I have changed my mind, and now I think that a good human-readable key for output caching is

  href(-replay=>1, -full=>1, -path_info=>0);

The reason behind -path_info=>0 is that we want to have the same cache entry (the same cache file) regardless of whether we use a path_info URL or a non-path_info URL. The reason behind -full=>1 (from what I understand of J.H.'s argument) is that you can have two gitweb deployments sharing the same cache root, but in which the same project name can point to different projects (different $projectsroot).
gitweb/gitweb.perl | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 2f6a8e1..cd5073c 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -1004,6 +1004,10 @@ exit; ## ====================================================================== ## action links +# possible values of extra options +# -full => 0|1 - use absolute/full URL ($my_uri/$my_url as base) +# -replay => 1 - start from a current view (replay with modifications) +# -path_info => 0|1 - don't use/use path_info URL (if possible) sub href { my %params = @_; # default is to use -absolute url() i.e. $my_uri @@ -1020,7 +1024,8 @@ sub href { } my $use_pathinfo = gitweb_check_feature('pathinfo'); - if ($use_pathinfo and defined $params{'project'}) { + if (defined $params{'project'} && + (exists $params{-path_info} ? $params{-path_info} : $use_pathinfo)) { # try to put as many parameters as possible in PATH_INFO: # - project name # - action -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
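The effect of the new option in the patch above can be sketched in isolation: an explicit -path_info value, when present, wins over the 'pathinfo' feature setting, and a path_info URL is only considered when a project is given. The sketch below is not gitweb code; want_path_info() and cache_key() are hypothetical names, and cache_key() is only an assumed stand-in for "normalize the request, then hash it", not what href() actually returns.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical stand-in for the condition changed in href():
# an explicit -path_info option overrides the feature default.
sub want_path_info {
    my ($feature_default, %params) = @_;

    return 0 unless defined $params{'project'};
    my $use = exists $params{-path_info}
        ? $params{-path_info}
        : $feature_default;
    return $use ? 1 : 0;
}

# Hypothetical cache key: always built from the query-string form of
# the request (sorted parameters), so that path_info and non-path_info
# requests for the same view share one cache entry.
sub cache_key {
    my ($base_url, %params) = @_;

    my $query = join ';', map { "$_=$params{$_}" } sort keys %params;
    return md5_hex("$base_url?$query");
}
```

Including the full base URL in the hashed string corresponds to the -full=>1 argument in the notes above: two deployments with different base URLs would get different keys even for the same project name.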
* [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski ` (9 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski This is first step towards implementing file based output (response) caching layer that is used on such large sites as kernel.org. This patch introduces GitwebCaching::SimpleFileCache package, which follows Cache::Cache / CHI interface, although do not implement it fully. The intent of following established convention is to be able in the future to replace our simple file based cache e.g. by one using memcached. Like in original patch by John 'Warthog9' Hawley (J.H.) (the one this commit intends to be incremental step to), the data is stored in the case as-is, without adding metadata (like expiration date), and without serialization (which means only scalar data). 
To be implemented (from original patch by J.H.): * cache expiration (based on file stats, current time and global expiration time); currently elements in cache do not expire * actually using this cache in gitweb, except error pages * adaptive cache expiration, based on average system load * optional locking interface, where only one process can update cache (using flock) * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Possible extensions (beyond what was in original patch): * (optionally) show information about cache utilization * AJAX (JavaScript-based) progress indicator * JavaScript code to update relative dates in cached output * make cache size-aware (try to not exceed specified maximum size) * utilize X-Sendfile header (or equivalent) to show cached data (optional, as it makes sense only if web server supports sendfile feature and have it enabled) * variable expiration feature from CHI, allowing items to expire a bit earlier than the stated expiration time to prevent cache miss stampedes (although locking, if available, should take care of this). The code of GitwebCaching::SimpleFileCache package in gitweb/cache.pm was heavily based on file-based cache in Cache::Cache package, i.e. on Cache::FileCache, Cache::FileBackend and Cache::BaseCache (including implementing atomic write, something that original patch lacks). This patch does not yet enable output caching in gitweb (it doesn't have all required features yet); on the other hand it includes tests, currently testing only cache Perl API. 
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- Large parts of this code are based _heavily_ on Cache::FileCache implementation (including Cache::FileBackend and Cache::BaseCache) from Cache::Cache distribution (which is dual licensed using (Perl) Artistic License and GNU General Public License, like Perl itself). In the final version of code it should probably be cleaned up. Also although it implements 'compute' interface from CHI (Unified cache interface), actual CHI code was not used even as reference. (Nevertheless gitweb/cache.pm is meant to use minimal dependencies, like gitweb itself, so it won't use Moose / Mouse for OO). gitweb/cache.pm | 317 +++++++++++++++++++++++++++++++++++++++ t/t9503-gitweb-caching.sh | 32 ++++ t/t9503/test_cache_interface.pl | 77 ++++++++++ t/test-lib.sh | 3 + 4 files changed, 429 insertions(+), 0 deletions(-) create mode 100644 gitweb/cache.pm create mode 100755 t/t9503-gitweb-caching.sh create mode 100755 t/t9503/test_cache_interface.pl diff --git a/gitweb/cache.pm b/gitweb/cache.pm new file mode 100644 index 0000000..ea544b0 --- /dev/null +++ b/gitweb/cache.pm @@ -0,0 +1,317 @@ +# gitweb - simple web interface to track changes in git repositories +# +# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net> +# +# This program is licensed under the GPLv2 + +# +# Gitweb caching engine +# + +{ +# Minimalistic cache that stores data in the filesystem, without serialization +# and currently without any kind of cache expiration (all keys last forever till +# they got explicitely removed) +# +# It follows Cache::Cache and CHI interface (but does not implement it fully) + +package GitwebCache::SimpleFileCache; + +use strict; +use warnings; + +use File::Path qw(make_path); # requires version >= 2.0 +use File::Spec; +use File::Temp; +use Digest::MD5 qw(md5_hex); + +# by default, the cache nests all entries on the filesystem two +# directories deep + +our 
$DEFAULT_CACHE_DEPTH = 2; + +# by default, the root of the cache is located in 'cache'. + +our $DEFAULT_CACHE_ROOT = "cache"; + +# ...................................................................... +# constructor + +# The options are set by passing in a reference to a hash containing +# any of the following keys: +# * 'namespace' +# The namespace associated with this cache. This allows easy separation of +# multiple, distinct caches without worrying about key collision. Defaults +# to '' (which does not allow for simple implementation of clear() method). +# * 'cache_root' +# The location in the filesystem that will hold the root of the cache. +# Defaults to 'cache', relative to gitweb.cgi directory. +# * 'cache_depth' +# The number of subdirectories deep to cache object item. This should be +# large enough that no cache directory has more than a few hundred objects. +# Defaults to 2 unless explicitly set. +sub new { + my ($proto, $p_options_hash_ref) = @_; + + my $class = ref($proto) || $proto; + my $self = {}; + $self = bless($self, $class); + + my ($root, $depth, $ns); + if (defined $p_options_hash_ref) { + $root = $p_options_hash_ref->{'cache_root'}; + $depth = $p_options_hash_ref->{'cache_depth'}; + $ns = $p_options_hash_ref->{'namespace'}; + } + $root = $DEFAULT_CACHE_ROOT unless defined($root); + $depth = $DEFAULT_CACHE_DEPTH unless defined($depth); + $ns = '' unless defined($ns); + + $self->set_root($root); + $self->set_depth($depth); + $self->set_namespace($ns); + + return $self; +} + +# ...................................................................... 
+# accessors + +sub get_depth { + my ($self) = @_; + + return $self->{'_Depth'}; +} + +sub set_depth { + my ($self, $depth) = @_; + + $self->{'_Depth'} = $depth; +} + +sub get_root { + my ($self) = @_; + + return $self->{'_Root'}; +} + + +sub set_root { + my ($self, $root) = @_; + + $self->{'_Root'} = $root; +} + +sub get_namespace { + my ($self) = @_; + + return $self->{'_Namespace'}; +} + + +sub set_namespace { + my ($self, $namespace) = @_; + + $self->{'_Namespace'} = $namespace; +} + +# ---------------------------------------------------------------------- +# (private) utility functions and methods + +# Take an human readable key, and create a unique (hashed) key from it +sub _Build_Hashed_Key { + my ($p_key) = @_; + + return md5_hex($p_key); +} + +# Take an human readable key, and return file path +sub _path_to_key { + my ($self, $p_namespace, $p_key) = @_; + + return $self->_path_to_hashed_key($p_namespace, + _Build_Hashed_Key($p_key)); +} + +# Take hashed key, and return file path +sub _path_to_hashed_key { + my ($self, $p_namespace, $p_hashed_key) = @_; + + return File::Spec->catfile($self->get_root(), $p_namespace, + _Split_Word($p_hashed_key, $self->get_depth())); +} + +# Split word into N components, where each component but last is two-letter word +# e.g. 
_Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 2) == qw(06 b90e786e304a18fdfbd7c7bcc41a6b); +# _Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 3) == qw(06 b9 0e786e304a18fdfbd7c7bcc41a6b); +sub _Split_Word { + my ($p_word, $p_depth) = @_; + + $p_depth--; # now it is number of leading 2-letter components + return unpack("(a2)$p_depth a*", $p_word); +} + +sub _Read_File { + my ($p_path) = @_; + + -e $p_path + or return undef; + + open(my $fh, '<', $p_path) + or return undef; + + local $/ = undef; + my $data = <$fh>; + + close($fh); + + return $data; +} + +# write a file atomically, assuming that path leading to file exists +sub _Write_File { + my ($p_path, $p_data) = @_; + + my ($volume, $directory, $filename) = File::Spec->splitpath($p_path); + if (defined $directory and defined $volume) { + $directory = File::Spec->catpath($volume, $directory, ''); + } + + my $temp = File::Temp->new(DIR => $directory, + TEMPLATE => "${filename}_XXXXX", + SUFFIX => '.tmp'); + binmode($temp); + print {$temp} $p_data; + close($temp); + + rename($temp, $p_path); +} + +# ensures that directory leading to path exists, or dies +sub _Make_Path { + my ($p_path, $p_dir) = @_; + + my ($volume, $directory, $filename) = File::Spec->splitpath($p_path); + if (defined $directory and defined $volume) { + $directory = File::Spec->catpath($volume, $directory, ""); + } + + return + unless (defined $directory and not -d $directory); + + my $numdirs = make_path($directory, + { mode => 0777, error => \my $mkdirerr }); + if (@$mkdirerr) { + my $mkdirerrmsg = ""; + for my $diag (@$mkdirerr) { + my ($file, $message) = %$diag; + if ($file eq '' ){ + $mkdirerrmsg .= "general error: $message\n"; + } else { + $mkdirerrmsg .= "problem unlinking $file: $message\n"; + } + } + #die_error(500, "Could not create cache directory | $mkdirerrmsg"); + } +} + +sub _Remove_File { + my ($p_path) = @_; + + if (-f $p_path) { + unlink($p_path); + } +} + +# _read_data and _write_data methods do deserialization/serialization +# 
in original implementation in Cache::Cache distribution + +sub _read_data { + my ($self, $p_path) = @_; + + return _Read_File($p_path); +} + +sub _write_data { + my ($self, $p_path, $p_data) = @_; + + _Make_Path($p_path); + _Write_File($p_path, $p_data); +} + +# ---------------------------------------------------------------------- +# worker methods (explicit namespace) + +sub restore { + my ($self, $p_namespace, $p_key) = @_; + + return $self->_read_data($self->_path_to_key($p_namespace, $p_key)); +} + +sub store { + my ($self, $p_namespace, $p_key, $p_data) = @_; + + $self->_write_data($self->_path_to_key($p_namespace, $p_key), + $p_data); +} + +sub delete_key { + my ($self, $p_namespace, $p_key) = @_; + + _Remove_File($self->_path_to_key($p_namespace, $p_key)); +} + +sub get_size { + my ($self, $p_namespace, $p_key) = @_; + + my $path = $self->_path_to_key($p_namespace, $p_key); + if (-e $path) { + return -s $path; + } + return 0; +} + +# ...................................................................... 
+# interface methods + +# Removing and expiring + +sub remove { + my ($self, $p_key) = @_; + + $self->delete_key($self->get_namespace(), $p_key); +} + +# Getting and setting + +sub set { + my ($self, $p_key, $p_data) = @_; + + $self->store($self->get_namespace(), $p_key, $p_data); +} + +sub get { + my ($self, $p_key) = @_; + + my $data = $self->restore($self->get_namespace(), $p_key) + or return undef; + + return $data; +} + +sub compute { + my ($self, $p_key, $p_coderef) = @_; + + my $data = $self->get($p_key); + if (!defined $data) { + $data = $p_coderef->($self, $p_key); + $self->set($p_key, $data); + } + + return $data; +} + +1; +} # end of package GitwebCache::SimpleFileCache; + +1; diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh new file mode 100755 index 0000000..768080c --- /dev/null +++ b/t/t9503-gitweb-caching.sh @@ -0,0 +1,32 @@ +#!/bin/sh +# +# Copyright (c) 2010 Jakub Narebski +# + +test_description='caching interface to be used in gitweb' +#test_description='caching interface used in gitweb, gitweb caching +# +#This test checks cache (interface) used in gitweb caching, caching +#infrastructure and gitweb response (output) caching (the last by +#running gitweb as CGI script from commandline).' + +# for now we are running only cache interface tests +. ./test-lib.sh + +# this test is present in gitweb-lib.sh +if ! 
test_have_prereq PERL; then + say 'perl not available, skipping test' + test_done +fi + +"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || { + say 'perl module Test::More unavailable, skipping test' + test_done +} + +# ---------------------------------------------------------------------- + +test_external 'GitwebCache::* Perl API (in gitweb/cache.pm)' \ + "$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl + +test_done diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl new file mode 100755 index 0000000..0b6628b --- /dev/null +++ b/t/t9503/test_cache_interface.pl @@ -0,0 +1,77 @@ +#!/usr/bin/perl +use lib (split(/:/, $ENV{GITPERLLIB})); + +use warnings; +use strict; + +use Test::More; + +# test source version; there is no installation target for gitweb +my $cache_pm = "$ENV{TEST_DIRECTORY}/../gitweb/cache.pm"; + +unless (-f "$cache_pm") { + plan skip_all => "gitweb/cache.pm not found"; +} + +# it is currently not a proper Perl module, so we use 'do FILE' +#ok(eval { do "$cache_pm"; 1 or die $!; }, "loading gitweb/cache.pm"); +my $return = do "$cache_pm"; +ok(!$@, "parse gitweb/cache.pm"); +ok(defined $return, "do gitweb/cache.pm"); +ok($return, "run gitweb/cache.pm"); +# instead of: BEGIN { use_ok('GitwebCache::SimpleFileCache') } + +# Test creating a cache +# +my $cache = new_ok('GitwebCache::SimpleFileCache', + [ { 'cache_root' => 'cache', 'cache_depth' => 2 } ]); + +# Test that default values are defined +# +ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT, + '$DEFAULT_CACHE_ROOT defined'); +ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH, + '$DEFAULT_CACHE_DEPTH defined'); + +# Test accessors and default values for cache +# +SKIP: { + skip 'default values not defined', 3 + unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT && + $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH); + + is($cache->get_namespace(), '', "default namespace is ''"); + is($cache->get_root(), 
$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT, + "default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'"); + cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH, + "default cache depth is $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH"); +} + +# Test the getting, setting, and removal of a cached value +# (Cache::Cache interface) +# +my $key = 'Test Key'; +my $value = 'Test Value'; +can_ok($cache, qw(get set remove)); +#ok(!defined($cache->get($key)), 'get before set') +# or diag("get returned '", $cache->get($key), "' for $key"); +$cache->set($key, $value); +is($cache->get($key), $value, 'get after set, returns cached value'); +$cache->remove($key); +ok(!defined($cache->get($key)), 'get after remove, is undefined'); + +# Test the getting and setting of a cached value +# (CHI interface) +# +my $call_count = 0; +sub get_value { + $call_count++; + return $value; +} +can_ok($cache, qw(compute)); +is($cache->compute($key, \&get_value), $value, 'compute 1st time (set)'); +is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)'); +is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)'); +cmp_ok($call_count, '==', 1, 'get_value() is called once'); + +done_testing(); diff --git a/t/test-lib.sh b/t/test-lib.sh index 142f36f..9282d9e 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -429,6 +429,9 @@ test_external () { # Announce the script to reduce confusion about the # test output that follows. say_color "" " run $test_count: $descr ($*)" + # Export TEST_DIRECTORY and TRASH_DIRECTORY + # to be able to use them in script + export TEST_DIRECTORY TRASH_DIRECTORY # Run command; redirect its stderr to &4 as in # test_run_, but keep its stdout on our stdout even in # non-verbose mode. -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP) Jakub Narebski ` (8 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

Add stat-based cache expiration to the file-based GitwebCache::SimpleFileCache. Unlike other caching interfaces such as Cache::Cache and CHI, the time after which a cache element expires is a _global_ value associated with the cache instance, and is not a local property of the cache entry. (Currently a cache entry does not store any metadata associated with the entry... which means that there is no need for serialization / marshalling / freezing and thawing.) The default expire time is -1, which means never expire.

To check whether a cache entry is expired, GitwebCache::SimpleFileCache compares the difference between the mtime (last modification time) of the cache file and the current time against the (global) time to expire. This is done using the CHI-compatible is_valid() method.

Add a test checking that an expire time of 0 (expire now) works correctly. While at it, show a diagnostic if there were parse errors in gitweb/cache.pm (TO BE MOVED TO PREVIOUS COMMIT).

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
The main difference from the way J.H. did it in his patch adding output caching to gitweb is that here I simply use stat, and not open/stat/close.

Now that the caching engine supports cache expiration, we can add caching support to gitweb.
gitweb/cache.pm | 47 ++++++++++++++++++++++++++++++++++++++- t/t9503/test_cache_interface.pl | 11 ++++++++- 2 files changed, 56 insertions(+), 2 deletions(-) diff --git a/gitweb/cache.pm b/gitweb/cache.pm index ea544b0..12a7a78 100644 --- a/gitweb/cache.pm +++ b/gitweb/cache.pm @@ -50,6 +50,10 @@ our $DEFAULT_CACHE_ROOT = "cache"; # The number of subdirectories deep to cache object item. This should be # large enough that no cache directory has more than a few hundred objects. # Defaults to 2 unless explicitly set. +# * 'default_expires_in' (Cache::Cache compatibile), +# 'expires_in' (CHI compatibile) [seconds] +# The expiration time for objects place in the cache. +# Defaults to $EXPIRES_NEVER if not explicitly set. sub new { my ($proto, $p_options_hash_ref) = @_; @@ -57,19 +61,24 @@ sub new { my $self = {}; $self = bless($self, $class); - my ($root, $depth, $ns); + my ($root, $depth, $ns, $expires_in); if (defined $p_options_hash_ref) { $root = $p_options_hash_ref->{'cache_root'}; $depth = $p_options_hash_ref->{'cache_depth'}; $ns = $p_options_hash_ref->{'namespace'}; + $expires_in = + $p_options_hash_ref->{'default_expires_in'} || + $p_options_hash_ref->{'expires_in'}; } $root = $DEFAULT_CACHE_ROOT unless defined($root); $depth = $DEFAULT_CACHE_DEPTH unless defined($depth); $ns = '' unless defined($ns); + $expires_in = -1 unless defined($expires_in); # <0 means never $self->set_root($root); $self->set_depth($depth); $self->set_namespace($ns); + $self->set_expires_in($expires_in); return $self; } @@ -115,6 +124,20 @@ sub set_namespace { $self->{'_Namespace'} = $namespace; } +sub get_expires_in { + my ($self) = @_; + + return $self->{'_Expires_In'}; +} + + +sub set_expires_in { + my ($self, $expires_in) = @_; + + $self->{'_Expires_In'} = $expires_in; +} + + # ---------------------------------------------------------------------- # (private) utility functions and methods @@ -282,6 +305,27 @@ sub remove { $self->delete_key($self->get_namespace(), $p_key); } +# 
exists in cache and is not expired +sub is_valid { + my ($self, $p_key) = @_; + + # should there be namespace variant of this function? + my $path = $self->_path_to_key($self->get_namespace(), $p_key); + + # does file exists in cache? + return 0 unless -f $path; + + # expire time can be set to never + my $expires_in = $self->get_expires_in(); + return 1 unless (defined $expires_in && $expires_in >= 0); + + # is file expired? + my $mtime = (stat(_))[9]; + my $now = time(); + + return (($now - $mtime) < $expires_in); +} + # Getting and setting sub set { @@ -293,6 +337,7 @@ sub set { sub get { my ($self, $p_key) = @_; + return undef unless $self->is_valid($p_key); my $data = $self->restore($self->get_namespace(), $p_key) or return undef; diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl index 0b6628b..3644ca8 100755 --- a/t/t9503/test_cache_interface.pl +++ b/t/t9503/test_cache_interface.pl @@ -16,7 +16,8 @@ unless (-f "$cache_pm") { # it is currently not a proper Perl module, so we use 'do FILE' #ok(eval { do "$cache_pm"; 1 or die $!; }, "loading gitweb/cache.pm"); my $return = do "$cache_pm"; -ok(!$@, "parse gitweb/cache.pm"); +ok(!$@, "parse gitweb/cache.pm") + or diag("parse error:\n", $@); ok(defined $return, "do gitweb/cache.pm"); ok($return, "run gitweb/cache.pm"); # instead of: BEGIN { use_ok('GitwebCache::SimpleFileCache') } @@ -74,4 +75,12 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)'); is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)'); cmp_ok($call_count, '==', 1, 'get_value() is called once'); +# Test cache expiration for 'expire now' +# +$cache->set_expires_in(0); +is($cache->get_expires_in(), 0, '"expires in" is set to now (0)'); +$cache->set($key, $value); +ok(!defined($cache->get($key)), 'cache is expired'); + + done_testing(); -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
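A minimal standalone sketch of the stat-based check that is_valid() in the patch above performs, assuming a plain file path instead of a cache key. The helper name is_fresh() is hypothetical and is not part of gitweb/cache.pm; a negative expire time means "never expire", as in the patch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not in gitweb/cache.pm): returns 1 if the file at
# $path exists and is younger than $expires_in seconds, 0 otherwise.
sub is_fresh {
    my ($path, $expires_in) = @_;

    # no cache file at all
    return 0 unless -f $path;

    # undefined or negative expire time means "never expire"
    return 1 unless (defined($expires_in) && $expires_in >= 0);

    # '_' reuses the stat buffer filled by the -f test above,
    # so the file is stat()ed only once
    my $mtime = (stat(_))[9];

    return ((time() - $mtime) < $expires_in) ? 1 : 0;
}

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "cached page\n";
close $fh;

printf "never expire: %d\n", is_fresh($path, -1);
printf "expire now:   %d\n", is_fresh($path, 0);
printf "one hour:     %d\n", is_fresh($path, 3600);
```

With an expire time of 0 the age of the file can never be smaller than the limit, which is why the 'expire now' test in t9503 expects get() to return undef right after set().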
* [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (2 preceding siblings ...) 2010-01-23 0:27 ` [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) Jakub Narebski ` (7 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is work in progress.

As you can see, the cache_fetch() subroutine is much, much simpler than the one in the original patch by J.H.:
  [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
  Message-ID: <1263432185-21334-10-git-send-email-warthog9@eaglescrag.net>
  http://permalink.gmane.org/gmane.comp.version-control.git/136917

As you can see, I have introduced the $cache_pm variable, to be able to test caching in t/t9500-gitweb-standalone-no-errors.sh, but also to be able to install cache.pm somewhere other than alongside gitweb.cgi. There would be no such problems if we used 'require GitwebCache' or some such in place of 'do "cache.pm"', as in the original patch by J.H. But at least for now I have decided to follow J.H. on this issue.

Instead of using the binary (sic!) valued $cache_enable as in J.H.'s patch, I use a set of two variables: $cache (to be able to select which caching engine to use, and which features should be enabled), and $caching_enabled to actually enable/disable the cache.
When caching is enabled gitweb do not output timing info (time to generate page), as it would contain incorrect information if the page was retrieved from cache; in this place we could put cachetime info from the original patch by J.H., which means writing Cache Last Updated: ". gmtime( time ) I have forgot about adding this feature... ATTENTION !!!: I have run both tests (t9500 to check for errors in gitweb.perl, and t9503 to test the API), but I haven't actually tested that *gitweb itself* behaves correctly. gitweb/cache.pm | 36 ++++++++++++++++++++++++++++ gitweb/gitweb.perl | 40 +++++++++++++++++++++++++++++-- t/gitweb-lib.sh | 2 + t/t9500-gitweb-standalone-no-errors.sh | 13 ++++++++++ 4 files changed, 88 insertions(+), 3 deletions(-) diff --git a/gitweb/cache.pm b/gitweb/cache.pm index 12a7a78..3a33158 100644 --- a/gitweb/cache.pm +++ b/gitweb/cache.pm @@ -359,4 +359,41 @@ sub compute { 1; } # end of package GitwebCache::SimpleFileCache; +# human readable key identifying gitweb output +sub gitweb_output_key { + return href(-replay => 1, -full => 1, -path_info => 0); +} + +sub cache_fetch { + my ($cache, $action) = @_; + + my $key = gitweb_output_key(); + my $data = $cache->get($key); + + if (defined $data) { + # print cached data + binmode STDOUT, ':raw'; + print STDOUT $data; + + } else { + # calculate data and regenerate data + open my $data_fh, '>', \$data + or die "Can't open memory file: $!"; + # matches "binmode STDOUT, ':uft8'" at beginning + binmode $data_fh, ':utf8'; + + $out = $data_fh || \*STDOUT; + $actions{$action}->(); + + if (defined $data) { + $cache->set($key, $data); + binmode STDOUT, ':raw'; + local $/ = undef; + print STDOUT $data; + } + + close $data_fh; + } +} + 1; diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index cd5073c..0394dc8 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -234,6 +234,22 @@ our $gitlinkurl_base = ("++GITWEB_BASE_URL++" =~ m!^(git://.*)$!) ? 
$1 : ''; # Leave it undefined (or set to 'undef') to turn off load checking. our $maxload = 300; +# This enables/disables the caching layer in gitweb. Currently supported +# is only output (response) caching, similar to the one used on git.kernel.org. +our $caching_enabled = 0; +# Set to _initialized_ instance of cache interface implementing (at least) +# get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces). +# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb' +# (but fast) file based caching layer, currently without any support for +# cache size limiting. It is therefore recommended that the cache directory +# be periodically completely deleted; this operation is safe to perform. +# Suggested mechanism: +# mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush +our $cache; +# Locations of 'cache.pm' file; if it is relative path, it is relative to +# the directory gitweb is run from +our $cache_pm = 'cache.pm'; + # You define site-wide feature defaults here; override them with # $GITWEB_CONFIG as necessary. our %feature = ( @@ -998,7 +1014,21 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ && !$project) { die_error(400, "Project needed"); } -$actions{$action}->(); + +if ($caching_enabled) { + do $cache_pm; + die $@ if $@; + + $cache ||= GitwebCache::SimpleFileCache->new({ + 'cache_root' => '/tmp/cache', + 'cache_depth' => 2, + 'expires_in' => 20, # in seconds + }); + cache_fetch($cache, $action); +} else { + $actions{$action}->(); +} + exit; ## ====================================================================== @@ -3207,7 +3237,9 @@ sub git_header_html { # 'application/xhtml+xml', otherwise send it as plain old 'text/html'. # we have to do this because MSIE sometimes globs '*/*', pretending to # support xhtml+xml but choking when it gets what it asked for. - if (defined $cgi->http('HTTP_ACCEPT') && + # Disable content-type negotiation when caching (use mimetype good for all). 
+ if (!$caching_enabled && + defined $cgi->http('HTTP_ACCEPT') && $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ && $cgi->Accept('application/xhtml+xml') != 0) { $content_type = 'application/xhtml+xml'; @@ -3380,7 +3412,9 @@ sub git_footer_html { } print {$out} "</div>\n"; # class="page_footer" - if (defined $t0 && gitweb_check_feature('timed')) { + # timing info doesn't make much sense with output (response) caching + if (!$caching_enabled && + defined $t0 && gitweb_check_feature('timed')) { print {$out} "<div id=\"generating_info\">\n"; print {$out} 'This page took '. '<span id="generating_time" class="time_span">'. diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh index d9ffc90..d041083 100755 --- a/t/gitweb-lib.sh +++ b/t/gitweb-lib.sh @@ -27,6 +27,8 @@ our \$export_ok = ''; our \$strict_export = ''; our \$git_versions_must_match = 0; +our \$cache_pm = '$TEST_DIRECTORY/../gitweb/cache.pm'; + EOF cat >.git/description <<EOF diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh index 2fc7fdb..0f93962 100755 --- a/t/t9500-gitweb-standalone-no-errors.sh +++ b/t/t9500-gitweb-standalone-no-errors.sh @@ -639,4 +639,17 @@ test_expect_success \ gitweb_run "p=.git;a=summary"' test_debug 'cat gitweb.log' +# ---------------------------------------------------------------------- +# caching + +cat >>gitweb_config.perl <<\EOF + +$caching_enabled = 1; +EOF +test_expect_success \ + 'caching enabled' \ + 'gitweb_run "p=.git;a=summary"' +test_debug 'cat gitweb.log' + + test_done -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
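A minimal sketch of the capture technique used by cache_fetch() in the patch above: the action prints to an indirect filehandle, which is temporarily pointed at an in-memory file. The names action_summary() and capture_output(), and the plain hash standing in for the cache, are hypothetical; the ':utf8' layer handling from the patch is left out for brevity.

```perl
use strict;
use warnings;

# stand-in for gitweb's indirect output handle
our $out = \*STDOUT;

# Hypothetical action; in gitweb this would be $actions{$action}.
sub action_summary {
    print {$out} "<html>summary</html>\n";
}

# Run $code with $out redirected to an in-memory file and
# return everything it printed.
sub capture_output {
    my ($code) = @_;

    my $data = '';
    open my $data_fh, '>', \$data
        or die "Can't open memory file: $!";

    # 'local' restores the previous value of $out on return
    local $out = $data_fh;
    $code->();
    close $data_fh;

    return $data;
}

# plain hash standing in for a cache with get/set
my %cache;
my $key = 'p=.git;a=summary';

$cache{$key} = capture_output(\&action_summary)
    unless defined $cache{$key};
print STDOUT $cache{$key};
```

Using 'local' for the redirection is a design choice of this sketch; the patch instead assigns to $out directly, which is fine there because gitweb exits right after serving the page.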
* [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (3 preceding siblings ...) 2010-01-23 0:27 ` [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP) Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) Jakub Narebski ` (6 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is work in progress.

As it is a quick'n'dirty proof of concept patch, it lacks variables in gitweb.perl that can be used to control cache behaviour. In the original patch by J.H. those were the following variables: $minCacheTime, $maxCacheTime.

In the final version we should probably leave $check_load undefined, or set it to 'sub { return 0; }', instead of requiring that there is a get_loadavg() subroutine in the main namespace (see the t9503 test change).

It would also be a good idea to keep the ->set_expires_in() subroutine, which would set the min and max range so that adaptive cache lifetime is effectively disabled.
gitweb/cache.pm | 82 ++++++++++++++++++++++++++++++++++---- t/t9503/test_cache_interface.pl | 8 +++- 2 files changed, 80 insertions(+), 10 deletions(-) diff --git a/gitweb/cache.pm b/gitweb/cache.pm index 3a33158..9692e8d 100644 --- a/gitweb/cache.pm +++ b/gitweb/cache.pm @@ -61,24 +61,36 @@ sub new { my $self = {}; $self = bless($self, $class); - my ($root, $depth, $ns, $expires_in); + my ($root, $depth, $ns); + my ($expires_min, $expires_max, $increase_factor, $check_load); if (defined $p_options_hash_ref) { $root = $p_options_hash_ref->{'cache_root'}; $depth = $p_options_hash_ref->{'cache_depth'}; $ns = $p_options_hash_ref->{'namespace'}; - $expires_in = + $expires_min = + $p_options_hash_ref->{'expires_min'} || $p_options_hash_ref->{'default_expires_in'} || $p_options_hash_ref->{'expires_in'}; + $expires_max = + $p_options_hash_ref->{'expires_max'}; + $increase_factor = $p_options_hash_ref->{'expires_factor'}; + $check_load = $p_options_hash_ref->{'check_load'}; } $root = $DEFAULT_CACHE_ROOT unless defined($root); $depth = $DEFAULT_CACHE_DEPTH unless defined($depth); $ns = '' unless defined($ns); - $expires_in = -1 unless defined($expires_in); # <0 means never + $expires_min = 20 unless defined($expires_min); + $expires_max = 1200 unless defined($expires_max); + $increase_factor = 60 unless defined($increase_factor); + $check_load = \&main::get_loadavg unless defined($check_load); $self->set_root($root); $self->set_depth($depth); $self->set_namespace($ns); - $self->set_expires_in($expires_in); + $self->set_expires_min($expires_min); + $self->set_expires_max($expires_max); + $self->set_increase_factor($increase_factor); + $self->set_check_load($check_load); return $self; } @@ -124,19 +136,71 @@ sub set_namespace { $self->{'_Namespace'} = $namespace; } -sub get_expires_in { + +sub get_expires_min { my ($self) = @_; - return $self->{'_Expires_In'}; + return $self->{'_Expires_Min'}; +} + +sub set_expires_min { + my ($self, $expires_min) = @_; + + 
$self->{'_Expires_Min'} = $expires_min; } +sub get_expires_max { + my ($self) = @_; + + return $self->{'_Expires_Max'}; +} + +sub set_expires_max { + my ($self, $expires_max) = @_; + + $self->{'_Expires_Max'} = $expires_max; +} + +sub get_increase_factor { + my ($self) = @_; + + return $self->{'_Increase_Factor'}; +} -sub set_expires_in { - my ($self, $expires_in) = @_; +sub set_increase_factor { + my ($self, $increase_factor) = @_; - $self->{'_Expires_In'} = $expires_in; + $self->{'_Increase_Factor'} = $increase_factor; } +sub get_check_load { + my ($self) = @_; + + return $self->{'_Check_Load'}; +} + +sub set_check_load { + my ($self, $sub) = @_; + + $self->{'_Check_Load'} = $sub; +} + +# ...................................................................... + +sub get_expires_in { + my ($self) = @_; + my $expires_in = + #$self->get_expires_min() + + $self->get_increase_factor() * $self->get_check_load()->(); + + if ($expires_in < $self->get_expires_min()) { + return $self->get_expires_min(); + } elsif ($expires_in > $self->get_expires_max()) { + return $self->get_expires_max(); + } + + return $expires_in; +} # ---------------------------------------------------------------------- # (private) utility functions and methods diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl index 3644ca8..0870b87 100755 --- a/t/t9503/test_cache_interface.pl +++ b/t/t9503/test_cache_interface.pl @@ -6,6 +6,11 @@ use strict; use Test::More; +# mockup +sub get_loadavg { + return 0.0; +} + # test source version; there is no installation target for gitweb my $cache_pm = "$ENV{TEST_DIRECTORY}/../gitweb/cache.pm"; @@ -77,7 +82,8 @@ cmp_ok($call_count, '==', 1, 'get_value() is called once'); # Test cache expiration for 'expire now' # -$cache->set_expires_in(0); +$cache->set_expires_min(0); +$cache->set_expires_max(0); is($cache->get_expires_in(), 0, '"expires in" is set to now (0)'); $cache->set($key, $value); ok(!defined($cache->get($key)), 'cache is expired'); 
-- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
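The clamping done by get_expires_in() in the patch above can be sketched as a standalone function. The name adaptive_expires_in() is hypothetical; the numbers passed in the loop (minimum 20, maximum 1200, factor 60) are the defaults from the patch, and the load values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the clamping logic: the expire time
# grows linearly with system load, but stays within [$min, $max].
sub adaptive_expires_in {
    my ($load, $min, $max, $factor) = @_;

    my $expires_in = $factor * $load;

    return $min if $expires_in < $min;
    return $max if $expires_in > $max;
    return $expires_in;
}

for my $load (0.0, 0.5, 5, 50) {
    printf "load %4.1f -> expires in %d s\n",
        $load, adaptive_expires_in($load, 20, 1200, 60);
}
```

Setting the minimum and maximum to the same value turns the adaptive behaviour off, which is what the updated 'expire now' test does with set_expires_min(0) and set_expires_max(0).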
* [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (4 preceding siblings ...) 2010-01-23 0:27 ` [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP) Jakub Narebski ` (5 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is work in progress.

The reason for supporting compute() as the interface to the cache is that with this interface it is possible (as can be seen in the following patch) to use locking to avoid a cache miss stampede (only one process regenerates the cache).

Support for a $cache which does not provide the '->compute($key, $code)' method is left in, just in case we want to use such an (external) caching engine. We should probably add a generic_compute() subroutine which would use get/set and provide a compute-like interface.
gitweb/cache.pm | 53 +++++++++++++++++++++++++++++++++++++++++++---------- 1 files changed, 43 insertions(+), 10 deletions(-) diff --git a/gitweb/cache.pm b/gitweb/cache.pm index 9692e8d..8dd4f39 100644 --- a/gitweb/cache.pm +++ b/gitweb/cache.pm @@ -432,6 +432,48 @@ sub cache_fetch { my ($cache, $action) = @_; my $key = gitweb_output_key(); + if ($cache->can('compute')) { + cache_fetch_compute($cache, $action, $key); + } else { + cache_fetch_get_set($cache, $action, $key); + } +} + +# calculate data to regenerate cache +sub cache_calculate { + my ($action) = @_; + + my $data; + open my $data_fh, '>', \$data + or die "Can't open memory file: $!"; + # matches "binmode STDOUT, ':uft8'" at beginning + binmode $data_fh, ':utf8'; + + $out = $data_fh || \*STDOUT; + $actions{$action}->(); + + close $data_fh; + + return $data; +} + +# for $cache which can ->compute($key, $code) +sub cache_fetch_compute { + my ($cache, $action, $key) = @_; + + my $data = $cache->compute($key, sub { cache_calculate($action) }); + + if (defined $data) { + # print cached data + binmode STDOUT, ':raw'; + local $/ = undef; + print STDOUT $data; + } +} + +# for $cache which can ->get($key) and ->set($key, $data) +sub cache_fetch_get_set { + my ($cache, $action, $key) = @_; + my $data = $cache->get($key); if (defined $data) { @@ -440,22 +482,13 @@ sub cache_fetch { print STDOUT $data; } else { - # calculate data and regenerate data - open my $data_fh, '>', \$data - or die "Can't open memory file: $!"; - # matches "binmode STDOUT, ':uft8'" at beginning - binmode $data_fh, ':utf8'; - - $out = $data_fh || \*STDOUT; - $actions{$action}->(); + $data = cache_calculate($action); if (defined $data) { $cache->set($key, $data); binmode STDOUT, ':raw'; local $/ = undef; print STDOUT $data; } - - close $data_fh; } } -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
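One possible shape for the generic_compute() subroutine suggested in the notes above, as a sketch only: it is not part of the patch series. It delegates to ->compute() when the cache has one and otherwise falls back to get/set. The HashCache package is a hypothetical in-memory cache used only to exercise the fallback path.

```perl
use strict;
use warnings;

# Hypothetical in-memory cache offering only get/set (no compute).
package HashCache;

sub new {
    my ($class) = @_;
    return bless({ data => {} }, $class);
}

sub get {
    my ($self, $key) = @_;
    return $self->{data}{$key};
}

sub set {
    my ($self, $key, $value) = @_;
    $self->{data}{$key} = $value;
}

package main;

# Sketch of a compute-like wrapper for caches with or without ->compute().
sub generic_compute {
    my ($cache, $key, $code) = @_;

    # prefer the native method, which may do locking
    return $cache->compute($key, $code) if $cache->can('compute');

    my $data = $cache->get($key);
    if (!defined $data) {
        $data = $code->($cache, $key);
        $cache->set($key, $data) if defined $data;
    }
    return $data;
}

my $calls = 0;
my $cache = HashCache->new();
my $page;
$page = generic_compute($cache, 'summary', sub { $calls++; return 'page body' })
    for 1 .. 3;

print "page: $page, generator calls: $calls\n";
```

Note that the get/set fallback offers no protection against several processes regenerating the same entry at once; only a cache with its own ->compute() can add that.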
* [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (5 preceding siblings ...) 2010-01-23 0:27 ` [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP) Jakub Narebski ` (4 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is work in progress.

As you can see, we can do almost the same for a cache which supports only the get/set interface... apart from replacing ->restore() with ->get() in ->compute(). This is an argument for the generic_compute subroutine mentioned in the previous patch.

Note that we actually can and do test that the provided mechanism avoids the cache miss stampede (aka 'stampeding herd') problem... although the test should probably be run only with --long (this would need an update to t/test-lib.sh to pass GIT_TEST_LONG to external tests in test_external and the like), as we need to sleep at least one second to ensure that we would have the 'stampeding herd' problem.
gitweb/cache.pm | 27 ++++++++++++++-- t/t9503/test_cache_interface.pl | 66 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 90 insertions(+), 3 deletions(-) diff --git a/gitweb/cache.pm b/gitweb/cache.pm index 8dd4f39..f514ee9 100644 --- a/gitweb/cache.pm +++ b/gitweb/cache.pm @@ -24,6 +24,7 @@ use File::Path qw(make_path); # requires version >= 2.0 use File::Spec; use File::Temp; use Digest::MD5 qw(md5_hex); +use Fcntl qw(:flock); # by default, the cache nests all entries on the filesystem two # directories deep @@ -217,7 +218,7 @@ sub _path_to_key { my ($self, $p_namespace, $p_key) = @_; return $self->_path_to_hashed_key($p_namespace, - _Build_Hashed_Key($p_key)); + _Build_Hashed_Key($p_key)); } # Take hashed key, and return file path @@ -228,6 +229,13 @@ sub _path_to_hashed_key { _Split_Word($p_hashed_key, $self->get_depth())); } +sub _lockfile_to_key { + my ($self, $p_namespace, $p_key) = @_; + + return $self->_path_to_hashed_key($p_namespace, + _Build_Hashed_Key($p_key)) . '.lock'; +} + # Split word into N components, where each component but last is two-letter word # e.g. 
_Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 2) == qw(06 b90e786e304a18fdfbd7c7bcc41a6b); # _Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 3) == qw(06 b9 0e786e304a18fdfbd7c7bcc41a6b); @@ -412,17 +420,30 @@ sub compute { my ($self, $p_key, $p_coderef) = @_; my $data = $self->get($p_key); - if (!defined $data) { + return $data if defined $data; + + my $lockfile = $self->_lockfile_to_key($self->get_namespace(), $p_key); + _Make_Path($lockfile); + open my $lock_fh, '+>', $lockfile; + # or die "Can't open lockfile '$lockfile': $!"; + if (my $lock_state = flock($lock_fh, LOCK_EX | LOCK_NB)) { + # acquired writers lock $data = $p_coderef->($self, $p_key); $self->set($p_key, $data); + } else { + # get readers lock + flock($lock_fh, LOCK_SH); + $data = $self->restore($self->get_namespace(), $p_key); } - + close $lock_fh; return $data; } 1; } # end of package GitwebCache::SimpleFileCache; +# ====================================================================== + # human readable key identifying gitweb output sub gitweb_output_key { return href(-replay => 1, -full => 1, -path_info => 0); diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl index 0870b87..43b806d 100755 --- a/t/t9503/test_cache_interface.pl +++ b/t/t9503/test_cache_interface.pl @@ -65,6 +65,8 @@ $cache->set($key, $value); is($cache->get($key), $value, 'get after set, returns cached value'); $cache->remove($key); ok(!defined($cache->get($key)), 'get after remove, is undefined'); +eval { $cache->remove('Not-Existent Key'); }; +ok(!$@, 'remove on non-existent key doesn\'t die'); # Test the getting and setting of a cached value # (CHI interface) @@ -80,6 +82,70 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)'); is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)'); cmp_ok($call_count, '==', 1, 'get_value() is called once'); +# Test 'stampeding herd' / cache miss stampede problem +# (probably should be run only if GIT_TEST_LONG) +sub 
get_value_slow { + $call_count++; + sleep 1; + return $value; +} +my ($pid, $kid_fh); + +$call_count = 0; +$cache->remove($key); +$pid = open $kid_fh, '-|'; +SKIP: { + skip "cannot fork: $!", 1 + unless defined $pid; + + my $data = $cache->get($key); + if (!defined $data) { + $data = get_value_slow(); + $cache->set($key, $data); + } + + if ($pid) { + my $child_count = <$kid_fh>; + chomp $child_count; + + waitpid $pid, 0; + close $kid_fh; + + $call_count += $child_count; + } else { + print "$call_count\n"; + exit 0; + } + + cmp_ok($call_count, '==', 2, 'parallel get/set: get_value_slow() called twice'); +} + +$call_count = 0; +$cache->remove($key); +$pid = open $kid_fh, '-|'; +SKIP: { + skip "cannot fork: $!", 1 + unless defined $pid; + + my $data = $cache->compute($key, \&get_value_slow); + + if ($pid) { + my $child_count = <$kid_fh>; + chomp $child_count; + + waitpid $pid, 0; + close $kid_fh; + + $call_count += $child_count; + } else { + print "$call_count\n"; + exit 0; + } + + cmp_ok($call_count, '==', 1, 'parallel compute: get_value_slow() called once'); +} + + # Test cache expiration for 'expire now' # $cache->set_expires_min(0); -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
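The locking scheme in compute() above can be sketched outside of GitwebCache::SimpleFileCache as follows. This is a simplified illustration, not the patch's code: the helpers read_entry(), write_entry() and compute_with_lock() are hypothetical, one plain file per key stands in for the cache entry, and the lockfile is opened in append mode (the patch uses '+>'). The demo runs in a single process, so it only exercises the writer branch and the cache-hit path.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: slurp a cache file, or return undef if it is missing.
sub read_entry {
    my ($file) = @_;
    open my $fh, '<', $file or return undef;
    local $/ = undef;
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Hypothetical helper: (over)write a cache file.
sub write_entry {
    my ($file, $data) = @_;
    open my $fh, '>', $file or die "Can't write '$file': $!";
    print {$fh} $data;
    close $fh or die "Can't close '$file': $!";
}

sub compute_with_lock {
    my ($file, $code) = @_;

    my $data = read_entry($file);
    return $data if defined $data;

    open my $lock_fh, '>>', "$file.lock"
        or die "Can't open lockfile '$file.lock': $!";
    if (flock($lock_fh, LOCK_EX | LOCK_NB)) {
        # acquired the writer's lock: regenerate and store
        $data = $code->();
        write_entry($file, $data);
    } else {
        # somebody else is regenerating: block on a reader's lock
        # until the writer is done, then read what it stored
        flock($lock_fh, LOCK_SH);
        $data = read_entry($file);
    }
    close $lock_fh;    # closing the handle releases the lock

    return $data;
}

my $calls = 0;
my $file  = "$dir/summary";
my $first  = compute_with_lock($file, sub { $calls++; return "page body\n" });
my $second = compute_with_lock($file, sub { $calls++; return "page body\n" });

print "generator calls: $calls\n";
```

The non-blocking exclusive lock decides who regenerates; everybody else waits on the shared lock, so the expensive callback runs once per cache miss rather than once per waiting process.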
* [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (6 preceding siblings ...) 2010-01-23 0:27 ` [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP) Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-23 0:27 ` [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating " Jakub Narebski ` (3 subsequent siblings) 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is work in progress.

Note that, contrary to the original patch by J.H., we fork unconditionally; otherwise the process which recalculates the cache entry would be at a disadvantage compared to the processes which lost the race to acquire the writer's lock. Perhaps this should be made configurable, though.

Also note that currently there is no limit on how stale the data served while regenerating can be, i.e. there is no equivalent of $maxCacheLife. The simplest solution (I guess) would be to just delete the cache entry upfront if it is too stale.
gitweb/cache.pm | 32 +++++++++++++++++++++++++++----- t/t9503/test_cache_interface.pl | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+), 5 deletions(-) diff --git a/gitweb/cache.pm b/gitweb/cache.pm index f514ee9..1adf467 100644 --- a/gitweb/cache.pm +++ b/gitweb/cache.pm @@ -426,14 +426,36 @@ sub compute { _Make_Path($lockfile); open my $lock_fh, '+>', $lockfile; # or die "Can't open lockfile '$lockfile': $!"; + + # try to retrieve stale data + $data = $self->restore($self->get_namespace(), $p_key); + if (my $lock_state = flock($lock_fh, LOCK_EX | LOCK_NB)) { # acquired writers lock - $data = $p_coderef->($self, $p_key); - $self->set($p_key, $data); + my $pid = fork() if $data; + if (!defined $pid || $pid) { + # parent, or didn't fork + $data = $p_coderef->($self, $p_key); + $self->set($p_key, $data); + + if ($pid) { + # wait for child (which would print) and exit + waitpid $pid, 0; + exit 0; + } else { + # there is no child, or was no $data to serve in background + ; + } + } else { + # child to serve $data + ; + } } else { - # get readers lock - flock($lock_fh, LOCK_SH); - $data = $self->restore($self->get_namespace(), $p_key); + if (!defined $data) { + # get readers lock if there is no stale data to serve + flock($lock_fh, LOCK_SH); + $data = $self->restore($self->get_namespace(), $p_key); + } } close $lock_fh; return $data; diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl index 43b806d..788e8f2 100755 --- a/t/t9503/test_cache_interface.pl +++ b/t/t9503/test_cache_interface.pl @@ -154,5 +154,42 @@ is($cache->get_expires_in(), 0, '"expires in" is set to now (0)'); $cache->set($key, $value); ok(!defined($cache->get($key)), 'cache is expired'); +# Test that cache returns stale data in existing but expired cache situation +# (probably should be run only if GIT_TEST_LONG) +$cache->set_expires_min(0); +$cache->set_expires_max(0); +my $stale_value = 'Stale Value'; +my $child_data = ''; +$cache->set($key, 
$stale_value); +$call_count = 0; +$pid = open $kid_fh, '-|'; +SKIP: { + skip "cannot fork: $!", 4 + unless defined $pid; + + my $data = $cache->compute($key, \&get_value_slow); + + if ($pid) { + $child_data = <$kid_fh>; + chomp $child_data; + + waitpid $pid, 0; + close $kid_fh; + } else { + print "$data\n"; + exit 0; + } + + is($data, $stale_value, 'stale data in parent when expired'); + is($child_data, $stale_value, 'stale data in child when expired'); + + # never expire + $cache->set_expires_min(-1); + $cache->set_expires_max(-1); + is($cache->get($key), $value, 'value got set correctly'); +} +$cache->set_expires_min(0); +$cache->set_expires_max(0); + done_testing(); -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (7 preceding siblings ...) 2010-01-23 0:27 ` [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP) Jakub Narebski @ 2010-01-23 0:27 ` Jakub Narebski 2010-01-24 22:24 ` Petr Baudis 2010-01-23 19:55 ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H. ` (2 subsequent siblings) 11 siblings, 1 reply; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:27 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This patch, as you can see, lacks a proper commit message: it is a work in progress. Contrary to the patch by J.H., the subroutine that is responsible for the "Generating..." progress info / activity indicator is in gitweb.perl, and not in cache.pm. Also it does not use an undeclared and undefined $title variable ;-) Compared with the original patch by J.H., gitweb prints the generating info in a more restricted set of situations; the set of actions where gitweb does not generate an activity indicator is larger. We could probably provide an activity indicator also for (possibly) non-HTML output, like 'blob_plain' or 'patches', provided that 'User-Agent' denotes that we are using a web browser. In the future we would probably want an Ajax-y JavaScript-based activity indicator, without the need for busy-wait. NOTE that I have not tested that it works, only that it should not contain syntax errors... 
gitweb/cache.pm | 32 ++++++++++++++++++++- gitweb/gitweb.perl | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 104 insertions(+), 3 deletions(-) diff --git a/gitweb/cache.pm b/gitweb/cache.pm index 1adf467..2c7be3a 100644 --- a/gitweb/cache.pm +++ b/gitweb/cache.pm @@ -63,7 +63,8 @@ sub new { $self = bless($self, $class); my ($root, $depth, $ns); - my ($expires_min, $expires_max, $increase_factor, $check_load); + my ($expires_min, $expires_max, $increase_factor, + $check_load, $generating_info); if (defined $p_options_hash_ref) { $root = $p_options_hash_ref->{'cache_root'}; $depth = $p_options_hash_ref->{'cache_depth'}; @@ -76,6 +77,7 @@ sub new { $p_options_hash_ref->{'expires_max'}; $increase_factor = $p_options_hash_ref->{'expires_factor'}; $check_load = $p_options_hash_ref->{'check_load'}; + $generating_info = $p_options_hash_ref->{'generating_info'}; } $root = $DEFAULT_CACHE_ROOT unless defined($root); $depth = $DEFAULT_CACHE_DEPTH unless defined($depth); @@ -92,6 +94,7 @@ sub new { $self->set_expires_max($expires_max); $self->set_increase_factor($increase_factor); $self->set_check_load($check_load); + $self->set_generating_info($generating_info); return $self; } @@ -186,6 +189,18 @@ sub set_check_load { $self->{'_Check_Load'} = $sub; } +sub get_generating_info { + my ($self) = @_; + + return $self->{'_Generating_Info'}; +} + +sub set_generating_info { + my ($self, $sub) = @_; + + $self->{'_Generating_Info'} = $sub; +} + # ...................................................................... 
sub get_expires_in { @@ -203,6 +218,12 @@ sub get_expires_in { return $expires_in; } +sub generating_info { + my $self = shift; + my $sub = $self->get_generating_info(); + $sub->($self, @_) if defined $sub; +} + # ---------------------------------------------------------------------- # (private) utility functions and methods @@ -434,6 +455,9 @@ sub compute { # acquired writers lock my $pid = fork() if $data; if (!defined $pid || $pid) { + # provide "generating page..." info if there is no stale data to serve + $self->generating_info($p_key, $lock_fh) + unless ($data); # parent, or didn't fork $data = $p_coderef->($self, $p_key); $self->set($p_key, $data); @@ -451,8 +475,12 @@ sub compute { ; } } else { + # some other process is (re)generating cache if (!defined $data) { - # get readers lock if there is no stale data to serve + # there is no stale data to serve + # provide "generating page..." info + $self->generating_info($p_key, $lock_fh); + # get readers lock flock($lock_fh, LOCK_SH); $data = $self->restore($self->get_namespace(), $p_key); } diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index 0394dc8..cb46b8c 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -13,7 +13,7 @@ use CGI qw(:standard :escapeHTML -nosticky); use CGI::Util qw(unescape); use CGI::Carp qw(fatalsToBrowser); use Encode; -use Fcntl ':mode'; +use Fcntl qw(:mode :flock); use File::Find qw(); use File::Basename qw(basename); binmode STDOUT, ':utf8'; @@ -1023,6 +1023,7 @@ if ($caching_enabled) { 'cache_root' => '/tmp/cache', 'cache_depth' => 2, 'expires_in' => 20, # in seconds + 'generating_info' => \&git_generating_data_html, }); cache_fetch($cache, $action); } else { @@ -3215,6 +3216,78 @@ sub blob_contenttype { ## ====================================================================== ## functions printing HTML: header, footer, error page +sub git_generating_data_html { + my ($cache, $key, $lock_fh) = @_; + + if ($action eq 'atom' || $action eq 'rss' || $action eq 'opml' || # feeds + 
$action eq 'blob_plain' || # unknown mimetype + $action eq 'commitdiff_plain' || # text/plain + $action eq 'patch' || $action eq 'patches' || # text/plain + $action eq 'snapshot') { # binary + return; + } + + my $title = "[Generating...] $site_name"; + if (defined $project) { + $title .= " - " . to_utf8($project); + if (defined $action) { + $title .= "/$action"; + if (defined $file_name) { + $title .= " - " . esc_path($file_name); + if ($action eq "tree" && $file_name !~ m|/$|) { + $title .= "/"; + } + } + } + } + + my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : ''; + print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8', + -status=> '200 OK', -expires => 'now'); + print STDOUT <<"EOF"; +<?xml version="1.0" encoding="utf-8"?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US"> +<!-- git web interface version $version --> +<!-- git core binaries version $git_version --> +<head> +<meta http-equiv="content-type" content="text/html; charset=utf-8" /> +<meta http-equiv="refresh" content="0" /> +<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" /> +<meta name="robots" content="noindex, nofollow" /> +<title>$title</title> +</head> +<body> +EOF + print STDOUT 'Generating..'; + + my $ready; + my ($wait, $wait_step, $wait_max); # in seconds + $wait = 0; + $wait_step = 2; + $wait_max = 20; + $| = 1; # autoflush + do { + print STDOUT '.'; + + sleep $wait_step if ($wait > 0); + $wait += $wait_step; + + $ready = flock($lock_fh, LOCK_SH|LOCK_NB) + if $lock_fh; + + } while (!$ready && ($wait < $wait_max)); + + print STDOUT <<"EOF"; + +</body> +</html> +EOF + #exit 0; + return; +} + sub git_header_html { my $status = shift || "200 OK"; my $expires = shift; -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-23 0:27 ` [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating " Jakub Narebski @ 2010-01-24 22:24 ` Petr Baudis 2010-01-25 0:03 ` Jakub Narebski 2010-01-25 11:46 ` Jakub Narebski 0 siblings, 2 replies; 60+ messages in thread From: Petr Baudis @ 2010-01-24 22:24 UTC (permalink / raw) To: Jakub Narebski Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley I have stupid question, common to both the original patch and this RFC. > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > regenerating cache (WIP) Just why is a "Generating..." page appropriate? I have to admit I hate it; can you please at least make it configurable? Why is it needed at all? It generates unnecessary redirects, clears my previous page in the browser by a useless placeholder, confuses non-interactive HTTP clients and I just can't see the real purpose. What's the gain? Thanks, -- Petr "Pasky" Baudis A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-24 22:24 ` Petr Baudis @ 2010-01-25 0:03 ` Jakub Narebski 2010-01-25 1:17 ` Jakub Narebski 2010-01-25 11:46 ` Jakub Narebski 1 sibling, 1 reply; 60+ messages in thread From: Jakub Narebski @ 2010-01-25 0:03 UTC (permalink / raw) To: Petr Baudis Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley On Sun, 24 Jan 2010, Petr Baudis wrote: > I have stupid question, common to both the original patch and this > RFC. > > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > > regenerating cache (WIP) > > Just why is a "Generating..." page appropriate? > > I have to admit I hate it; can you please at least make it > configurable? Why is it needed at all? It generates unnecessary > redirects, clears my previous page in the browser by a useless > placeholder, confuses non-interactive HTTP clients and I just can't > see the real purpose. > > What's the gain? Don't ask me, I am just following the original patch by John 'Warthog9' Hawley (J.H.) here, splitting it into smaller and hopefully easier to review parts, with each separate feature in a separate patch (so that features like this one are more visible). I _think_ that in the original patch by J.H. this feature was configurable (at least to some extent). As this series is more a proof of concept than ready-to-apply patches, some of the configurability is currently missing. I think that adding the "Generating..." page follows the behavior of other well-known web applications, for example GMail. Currently, if caching is enabled and there is no appropriate entry in the cache (not even a [slightly] stale entry), the page is shown only when it is generated in full: there is no streaming (although I planned to add it). So, for example, for the 'blame' view or for pickaxe search you could wait quite a bit for a page to be shown. The "Generating..." page is a kind of activity indicator here. 
Note that the JavaScript-using Ajax-y version from the following patch '[RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP)' http://permalink.gmane.org/gmane.comp.version-control.git/137896 is not visible in browser history, as it simply replaces "Generating..." page with the result. So there is no 'unnecessary redirect', and there is no 'useless placeholder' in browser history. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-25 0:03 ` Jakub Narebski @ 2010-01-25 1:17 ` Jakub Narebski 0 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-25 1:17 UTC (permalink / raw) To: Petr Baudis Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley Jakub Narebski wrote: > On Sun, 24 Jan 2010, Petr Baudis wrote: > > > I have stupid question, common to both the original patch and this > > RFC. > > > > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > > > regenerating cache (WIP) > > > > Just why is a "Generating..." page appropriate? > > > > I have to admit I hate it; can you please at least make it > > configurable? Why is it needed at all? It generates unnecessary > > redirects, clears my previous page in the browser by a useless > > placeholder, confuses non-interactive HTTP clients and I just can't > > see the real purpose. > Note that the JavaScript-using Ajax-y version from the following patch > '[RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when > regenerating cache (WIP)' > http://permalink.gmane.org/gmane.comp.version-control.git/137896 > is not visible in browser history, as it simply replaces "Generating..." > page with the result. So there is no 'unnecessary redirect', and there > is no 'useless placeholder' in browser history. Actually I have checked and the version used on git.kernel.org does show "Generating..." page, but it does not appear in the history. I guess that refresh with no URL does not count as redirect. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-24 22:24 ` Petr Baudis 2010-01-25 0:03 ` Jakub Narebski @ 2010-01-25 11:46 ` Jakub Narebski 2010-01-25 13:02 ` Petr Baudis 1 sibling, 1 reply; 60+ messages in thread From: Jakub Narebski @ 2010-01-25 11:46 UTC (permalink / raw) To: Petr Baudis Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley On Sun, 24 Jan 2010, Petr Baudis wrote: > I have stupid question, common to both the original patch and this > RFC. > > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > > regenerating cache (WIP) > > Just why is a "Generating..." page appropriate? > > I have to admit I hate it; can you please at least make it > configurable? Why is it needed at all? It [...] confuses > non-interactive HTTP clients [...] First, if I understand the code correctly, HTTP clients which do not honor the meta refresh (http-equiv refresh) would get a page which looks like the following: <html> Generating... </html> <html> Gitweb page </html> Second, gitweb can always check the User-Agent header, and serve the "Generating..." page only to web browsers: unless (defined $cgi->user_agent() && $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) { return; } or something like that. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-25 11:46 ` Jakub Narebski @ 2010-01-25 13:02 ` Petr Baudis 2010-01-25 13:48 ` Jakub Narebski 0 siblings, 1 reply; 60+ messages in thread From: Petr Baudis @ 2010-01-25 13:02 UTC (permalink / raw) To: Jakub Narebski Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote: > On Sun, 24 Jan 2010, Petr Baudis wrote: > > I have stupid question, common to both the original patch and this > > RFC. > > > > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > > > regenerating cache (WIP) > > > > Just why is a "Generating..." page appropriate? > > > > I have to admit I hate it; can you please at least make it > > configurable? Why is it needed at all? It [...] confuses > > non-interactive HTTP clients [...] > > First, if I understand the code correctly HTTP clients which do not > honor metaredirect (http-equiv refresh) would get page which looks > the following > > <html> > Generating... > </html> > <html> > Gitweb page > </html> To be clear, I conjectured it confuses non-interactive HTTP clients from one of the TODOs in your patch series - I'm not sure about it myself. :) > Second, gitweb can always check User-Agent header, and serve > "Generating..." page only to web browsers: > > unless (defined $cgi->user_agent() && > $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) { > return; > } > > or something like that. I'm not too happy with this. What about Safari? Opera? ELinks? There's a lot of web browsers. Most of the issues can be worked around, but I'm not sure why to go through all the trouble. I just personally don't see the value in having the placeholder in there at all, to me it is distracting UI even if all the technicalities are put aside. But if it will be possible to turn this off eventually, it's all your call whether to bother implementing it. 
:-) -- Petr "Pasky" Baudis A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-25 13:02 ` Petr Baudis @ 2010-01-25 13:48 ` Jakub Narebski 2010-01-25 13:56 ` Petr Baudis 2010-01-25 20:41 ` J.H. 0 siblings, 2 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-25 13:48 UTC (permalink / raw) To: Petr Baudis Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley On Mon, Jan 25, 2010, Petr Baudis wrote: > On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote: >> On Sun, 24 Jan 2010, Petr Baudis wrote: >>> I have stupid question, common to both the original patch and this >>> RFC. >>> >>>> [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when >>>> regenerating cache (WIP) >>> >>> Just why is a "Generating..." page appropriate? >>> >>> I have to admit I hate it; can you please at least make it >>> configurable? Why is it needed at all? It [...] confuses >>> non-interactive HTTP clients [...] >> Second, gitweb can always check User-Agent header, and serve >> "Generating..." page only to web browsers: >> >> unless (defined $cgi->user_agent() && >> $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) { >> return; >> } >> >> or something like that. > > I'm not too happy with this. What about Safari? Opera? ELinks? There's a > lot of web browsers. The "Mozilla" part would catch all "Mozilla compatible" web browsers, including Firefox (and other Gecko-based web browsers), Internet Explorer, and WebKit-based browsers including Safari, Chrome and Konqueror. The "Opera" part would catch Opera. http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/ As to other web browsers like Elinks, Lynx, w3m, Dillo, etc.: the issue is whether they honor '<meta http-equiv="refresh" content="0" />'. I think it is better to stay on the safe side; it is not a disaster if a web browser is not shown the "Generating..." page where it could be (but see explanation below). 
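To make the effect of that regexp concrete, here is a minimal standalone sketch. The helper name wants_generating_page and the sample User-Agent strings are made up for illustration and are not part of the patch; in gitweb the string would come from $cgi->user_agent().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the patch): true if the User-Agent looks
# like a browser we assume honors the meta refresh.
sub wants_generating_page {
	my ($user_agent) = @_;
	return (defined $user_agent &&
	        $user_agent =~ /\b(Mozilla|Opera)\b/i) ? 1 : 0;
}

# Sample strings are illustrative only, not an exhaustive list.
my @samples = (
	'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.7) Gecko/20100106 Firefox/3.5.7',
	'Opera/9.80 (X11; Linux i686; U; en) Presto/2.2.15 Version/10.10',
	'Lynx/2.8.7rel.1 libwww-FM/2.14',
	'Wget/1.12 (linux-gnu)',
	undef,    # request without a User-Agent header
);

for my $ua (@samples) {
	printf "%-3s %s\n",
		(wants_generating_page($ua) ? 'yes' : 'no'),
		(defined $ua ? $ua : '(no User-Agent header)');
}
```

Text-mode browsers and command line clients fall on the "no" side here, which is the "stay on the safe side" behavior described above: they simply wait for the real page.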
> Most of the issues can be worked around, but I'm not sure why to go > through all the trouble. I just personally don't see the value in having > the placeholder in there at all, to me it is distracting UI even if all > the technicalities are put aside. The issue that "Generating..." page tries to solve is, I think, the following. Some actions, like 'blame' view or pickaxe search, or grep search, can take quite a long time to generate, with times counted in tens of seconds. It is not that visible for non-caching, because gitweb streams output so we have at least _some_ output upfront quite fast. Now those patches (mine and J.H. both) make gitweb use locking (it is IIRC configurable in J.H. patch) to make only one process generate the page if it is missing from cache, or is stale. Now if it is missing, we have to wait until it is generated in full before being able to show it to client. While it is possible to "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like CGI::Cache) writing it simultaneously to browser and to cache for the process that is generating data, it is as far as I understand it impossible for processes which are waiting for data. Therefore the need for "Generating..." page, so the user does not think that web server hung or something, and is not generating output. We can try to reduce occurrences of cache miss stampedes by using 'expires_variance' feature[1] from CHI - Unified caching interface. We can also turn off locking and tee output to have some output upfront as an activity indicator instead of this "Generating..." page. [1]: http://search.cpan.org/~jswartz/CHI-0.33/lib/CHI.pm#set > But if it will be possible to turn this off eventually, it's all your > call whether to bother implementing it. :-) In my implementation it is (or rather would be) as simple as just not passing 'generating_info' => \&git_generating_data_html in the GitwebCache::SimpleFileCache constructor. 
-- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-25 13:48 ` Jakub Narebski @ 2010-01-25 13:56 ` Petr Baudis 2010-01-25 20:32 ` J.H. 2010-01-25 20:58 ` Jakub Narebski 2010-01-25 20:41 ` J.H. 1 sibling, 2 replies; 60+ messages in thread From: Petr Baudis @ 2010-01-25 13:56 UTC (permalink / raw) To: Jakub Narebski Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote: > Now those patches (mine and J.H. both) make gitweb use locking > (it is IIRC configurable in J.H. patch) to make only one process > generate the page if it is missing from cache, or is stale. Now > if it is missing, we have to wait until it is generated in full > before being able to show it to client. While it is possible to > "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like > CGI::Cache) writing it simultaneously to browser and to cache for > the process that is generating data, it is as far as I understand > it impossible for processes which are waiting for data. Therefore > the need for "Generating..." page, so the user does not think that > web server hung or something, and is not generating output. Ah, ok, so the message is there to cover up for a technical problem. ;-) I didn't quite realize. Then, it would be great to tweak the mechanisms so that the user does not really have to wait. So, I wonder about two things: (i) How often does it happen that two requests for the same page are received? Has anyone measured it? Or is at least able to make a minimally educated guess? IOW, isn't this premature optimization? (ii) Can't the locked gitwebs do the equivalent of tail -f? P.S.: Again the disclaimer - if this is "too hard", it's better to accept patches like they are, then improve this later. But perhaps a better solution would be not to clutter the code by optimizing this case at all if it's not clear it really matters in the real world. 
Petr "Pasky" Baudis ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-25 13:56 ` Petr Baudis @ 2010-01-25 20:32 ` J.H. 2010-01-26 1:49 ` Jakub Narebski 2010-01-28 17:39 ` Petr Baudis 2010-01-25 20:58 ` Jakub Narebski 1 sibling, 2 replies; 60+ messages in thread From: J.H. @ 2010-01-25 20:32 UTC (permalink / raw) To: Petr Baudis; +Cc: Jakub Narebski, git, John 'Warthog9' Hawley On 01/25/2010 05:56 AM, Petr Baudis wrote: > On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote: >> Now those patches (mine and J.H. both) make gitweb use locking >> (it is IIRC configurable in J.H. patch) to make only one process >> generate the page if it is missing from cache, or is stale. Now >> if it is missing, we have to wait until it is generated in full >> before being able to show it to client. While it is possible to >> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like >> CGI::Cache) writing it simultaneously to browser and to cache for >> the process that is generating data, it is as far as I understand >> it impossible for processes which are waiting for data. Therefore >> the need for "Generating..." page, so the user does not think that >> web server hung or something, and is not generating output. > > Ah, ok, so the message is there to cover up for a technical problem. ;-) > I didn't quite realize. Then, it would be great to tweak the mechanisms > so that the user does not really have to wait. No, that is an incorrect assumption about how the 'Generating...' page works, and you're missing a bit of the point. (1) The message itself 'Generating...' is a cue to the user that something is happening and that the browser is not actually hanging. Web users are at the point where if things are not instantaneous and show immediately they will either browse away completely or hit the refresh button incessantly until content does appear. 
While the page is usually only seen for about a second, and I'll admit it can be annoying, it's nothing more than a 'sit tight a second'. For things like the front page it can take upwards of 7 seconds to generate for a single user, a lot to ask for a no response scenario. (2) It prevents the stampeding herd problem, which was very vehemently discussed 4 years ago by HPA and myself and roughly boils down to this: When a single user comes into the site, in particular the front page, it kicks off a process that will start to generate it, causing a huge number of git requests into individual repositories and a lot of disk i/o. A second user will then come in and the same requests will start to be done from the beginning again, and so on until you basically kill the machine because the disk i/o goes up enough that it can't ever service the requests fast enough. This does 2 things in the end: 1) means there's only 1 copy of the page ever being generated, thus meaning there isn't extraneous and dangerous disk i/o going on on the system 2) prevents a user from reporting to the website that it's broken by giving them a visual cue that things aren't broken. > So, I wonder about two things: > > (i) How often does it happen that two requests for the same page are > received? Has anyone measured it? Or is at least able to make > a minimally educated guess? IOW, isn't this premature optimization? For most pages, not many but it happens more often than you think. The data I have is much too old to be useful now but the front page could, at times, have up to 30 people waiting for it without caching. This is a very important patch believe it or not. For a site the size of kernel.org it cannot exist without this. But here's a quick stat, in 36 hours git.kernel.org has had 156099 accesses world wide or about 1.2 accesses a second. android.git.kernel.org, in the same time period has had 115818 accesses. 
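To put rough numbers on those stats, here is a back-of-the-envelope sketch only. It assumes requests are spread evenly over the 36 hours, which real traffic is not, and the helper name requests_per_second is made up for illustration. These are site-wide totals, so they say nothing about how many of the requests hit the same uncached page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Average request rate over an observation window (illustrative helper).
sub requests_per_second {
	my ($accesses, $hours) = @_;
	return $accesses / ($hours * 60 * 60);
}

my $hours           = 36;
my $generation_time = 7;    # seconds; the front page figure from above
my %accesses = (
	'git.kernel.org'         => 156099,
	'android.git.kernel.org' => 115818,
);

for my $site (sort keys %accesses) {
	my $rate = requests_per_second($accesses{$site}, $hours);
	printf "%-23s %.2f req/s, ~%.1f site-wide requests arriving in %d s\n",
		$site, $rate, $rate * $generation_time, $generation_time;
}
```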
If the first request takes 7 seconds to generate, by the time it's done there are now 3 additional requests running. If it again takes 7 seconds to generate there are now another 3 requests running, etc. Very quickly you've got so much i/o running the box is more or less useless. > (ii) Can't the locked gitwebs do the equivalent of tail -f? Not really going to help much, most of the gitweb operations won't output much of anything beyond the header until it's collected all of the data it needs anyway and then there will be a flurry of output. It also means that this 'Generating...' page will only work for caching schemes that tail can read out of, and I'm not sure it would work all that well with things like memcached or a non-custom caching layer where we don't necessarily have direct access to the file being written to. At least the way I had it (and I'll admit I haven't read through Jakub's re-working of my patches so I don't know if it's still there) is that with background caching you only get the 'Generating...' page if it's new or the content is grossly out of date. If it's a popular page and it's not grossly out of date it shows you the 'stale' data while it generates the new content in the background anyway, only locking you out when the new file is being written. Or at least that's how I had it. - John 'Warthog9' Hawley ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP) 2010-01-25 20:32 ` J.H. @ 2010-01-26 1:49 ` Jakub Narebski 2010-01-28 17:39 ` Petr Baudis 1 sibling, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-26 1:49 UTC (permalink / raw) To: J.H.; +Cc: Petr Baudis, git, John 'Warthog9' Hawley On Mon, 25 Jan 2010, J.H. wrote: > On 01/25/2010 05:56 AM, Petr Baudis wrote: >> On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote: >>> Now those patches (mine and J.H. both) make gitweb use locking >>> (it is IIRC configurable in J.H. patch) to make only one process >>> generate the page if it is missing from cache, or is stale. Now >>> if it is missing, we have to wait until it is generated in full >>> before being able to show it to client. While it is possible to >>> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like >>> CGI::Cache) writing it simultaneously to browser and to cache for >>> the process that is generating data, it is as far as I understand >>> it impossible for processes which are waiting for data. Therefore >>> the need for "Generating..." page, so the user does not think that >>> web server hung or something, and is not generating output. >> >> Ah, ok, so the message is there to cover up for a technical problem. ;-) >> I didn't quite realize. Then, it would be great to tweak the mechanisms >> so that the user does not really have to wait. > > No, that is an incorrect assumption on how the 'Generating...' page > works, and your missing a bit of the point. > > (1) The message itself 'Generating...' is a cue to the user that > something is happening and that the browser is not actually hanging. > Web users are at the point where if things are not instantaneous and > show immediately they will either browse away completely or hit the > refresh button incessantly until content does appear. 
While the page is > usually only seen for about a second, and I'll admit it can be annoying, > it's nothing more than a 'sit tight a second'. For things like the > front page it can take upwards of 7 seconds to generate for a single > user, a lot to ask for a no response scenario. Pasky is asking later whether, instead of a separate "Generating..." page (be it the server-side one using the trick with http-equiv refresh taking place when the page/connection is closed, or the JavaScript Ajax-y one), gitweb couldn't use streaming and have early parts of the generated page (in particular the page header and navigation bar) serve as the activity indicator. I think it is possible, if a bit more complicated, and such a feature can be introduced in a separate commit. As to 'sit tight for a second': the code can be modified to print/show the "Generating..." page only if we are waiting more than a second (or more than half a second with usleep from Time::HiRes) for the data. Pasky, would such a change address some of your concerns about the "Generating..." page as activity indicator? > > (2) It prevents the stampeding herd problem, which was very vehemently > discussed 4 years ago by HPA and myself and roughly boils down to this: > > When a single user comes into the site, in particular the front page, it > kicks off a process that will start to generate at it, causing a huge > amount of git requests into individual repositories and a lot of disk > i/o. A second user will then come in and the same requests will start > to be done from the beginning again, and so on until you basically kill > the machine because the disk i/o goes up enough that it can't ever > service the requests fast enough. As you can see from the way my proof of concept (of splitting) caching series is split, I think that those two features are separate. Using locking to have one writer, and to protect against the cache miss stampede problem, is one issue, and having the "Generating..." page is another issue. 
They are not orthogonal as they are now, as the "Generating..." page depends on locking to check if the page got generated, but it could simply check whether the cache file is present already. > > This does 2 things in the end: > > 1) means there's only 1 copy of the page ever being generated, thus > meaning there isn't extraneous and dangerous disk i/o going on on the system > > 2) prevents a user from reporting to the website that it's broken by > giving them a visual cue that things aren't broken. 1) is about locking ('expires_variance' can help there, too), 2) is about "Generating..." but could be done with streaming: the "tee"/"tail" solution. >> So, I wonder about two things: >> >> (i) How often does it happen that two requests for the same page are >> received? Has anyone measured it? Or is at least able to make >> a minimally educated guess? IOW, isn't this premature optimization? > > For most pages, not many but it happens more often than you think. The > data I have is much too old to be useful now but the front page could, > at times, have up to 30 people waiting for it without caching. This is > a very important patch believe it or not. For a site the size of > kernel.org it cannot exist without this. > > But here's a quick stat, in 36 hours git.kernel.org has had > 156099 accesses world wide or about 1.2 accesses a second. > > android.git.kernel.org, in the same time period has had 115818 accesses. > > If the first request takes 7 seconds to generate, by the time it's done > there are now 3 additional requests running. If it again takes 7 > seconds to generate there are now another 3 requests running, etc. Very > quickly you've got so much i/o running the box more or less is useless. On the other hand, does the cache for the front page (projects list) ever expire so that there is no stale data to serve? So the issue with growing I/O is, I guess, only with priming the cache. BTW. 
I wonder how well the 'expires_variance' solution used in CHI would protect against cache miss stampedes; that probably depends on the ratio of variance to time it takes gitweb to generate the page. >> (ii) Can't the locked gitwebs do the equivalent of tail -f? > > Not really going to help much, most of the gitweb operations won't > output much of anything beyond the header until it's collected all of > the data it needs anyway and then there will be a flurry of output. Well, it might be enough to have headers as activity indicator... but on the other hand users can think that server is broken if they see no further data, as opposed to the "Generating..." page. > It also means that this 'Generating...' page will only work for caching > schemes that tail can read out of, which I'm not sure it would work all > that well with things like memcached or a non-custom caching layer where > we don't necessarily have direct access to the file being written to. I don't think that caching interfaces such as CHI or Cache::Cache, or individual caching engines such as Cache::Memcached or Cache::FastMmap offer streaming access. OTOH we can read from the side as the data is generated ("tee" to a temporary file, or something), but it is even more complication. > > At least the way I had it (and I'll admit I haven't read through Jakub's > re-working of my patches so I don't know if it's still there) is that > with background caching you only get the 'Generating...' page if it's > new or the content is grossly out of data. If it's a popular page and > it's not grossly out of date it shows you the 'stale' data while it > generates the new content in the background anyway, only locking you out > when the new file is being written. Or at least that's how I had it. It is there, i.e. gitweb would serve stale data if it can be found, but it lacks 'too stale to serve as stale data' timeout/expire time. I skipped this feature to get split series out of door faster. 
-- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
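
The 'expires_variance' idea discussed in the message above can be
sketched as follows.  This is only a minimal illustration, assuming a
linear ramp of the expiry probability inside the early-expiration
window; is_expired() and its arguments are hypothetical names and this
is not CHI's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper (not CHI's implementation): decide whether an entry
# created at $created should be treated as expired at time $now.
# $variance is between 0.0 (feature off) and 1.0; $rand is an optional
# number in [0, 1) so that the decision can be made deterministic.
sub is_expired {
	my ($created, $expires_in, $variance, $now, $rand) = @_;
	$rand = rand() unless defined $rand;

	my $expires_at = $created + $expires_in;
	my $early_at   = $expires_at - $expires_in * $variance;

	return 1 if $now >= $expires_at;   # past the hard limit
	return 0 if $now <  $early_at;     # early-expiry window not open yet

	# Inside the window: the chance of expiring grows linearly from 0 to 1,
	# so concurrent requests do not all see the miss at the same moment.
	my $chance = ($now - $early_at) / ($expires_at - $early_at);
	return $rand < $chance ? 1 : 0;
}
```

With a 600 second lifetime and a variance of 0.2, entries can start to
expire after 480 seconds.  Whether that is enough to avoid a stampede
depends, as noted above, on how the window compares with the time it
takes gitweb to generate the page.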
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 20:32 ` J.H.
  2010-01-26  1:49   ` Jakub Narebski
@ 2010-01-28 17:39   ` Petr Baudis
  2010-01-31 11:58     ` Jakub Narebski
  1 sibling, 1 reply; 60+ messages in thread
From: Petr Baudis @ 2010-01-28 17:39 UTC
To: J.H.; +Cc: Jakub Narebski, git, John 'Warthog9' Hawley

On Mon, Jan 25, 2010 at 12:32:37PM -0800, J.H. wrote:
> This does 2 things in the end:
>
> 1) means there's only 1 copy of the page ever being generated, thus
> meaning there isn't extraneous and dangerous disk i/o going on on the system

But this has nothing to do with what you _do_ when there are multiple
requests, whether you do the same as if caching was disabled (hang until
content is generated) or do something novel (creating redirects
through the "Generating..." page).

> 2) prevents a user from reporting to the website that it's broken by
> giving them a visual cue that things aren't broken.

But this has nothing to do with caching per se, right? I think it
actually makes _no difference_ to this problem whether caching is
enabled or not, or am I missing something?

My point is, I guess, that showing the Generating page doesn't seem to
have actually anything to do with the caching itself?

> > So, I wonder about two things:
> >
> > (i) How often does it happen that two requests for the same page are
> > received? Has anyone measured it? Or is at least able to make
> > a minimally educated guess? IOW, isn't this premature optimization?
>
> For most pages, not many but it happens more often than you think.  The
> data I have is much too old to be useful now but the front page could,
> at times, have up to 30 people waiting for it without caching.  This is
> a very important patch believe it or not.  For a site the size of
> kernel.org it cannot exist without this.

..snip numbers..

Ok, I think you convinced me here (even if I think the front page is
kind of a special case).

> > (ii) Can't the locked gitwebs do the equivalent of tail -f?
>
> Not really going to help much, most of the gitweb operations won't
> output much of anything beyond the header until it's collected all of
> the data it needs anyway and then there will be a flurry of output.  It
> also means that this 'Generating...' page will only work for caching
> schemes that tail can read out of, which I'm not sure it would work all
> that well with things like memcached or a non-custom caching layer where
> we don't necessarily have direct access to the file being written to.
>
> At least the way I had it (and I'll admit I haven't read through Jakub's
> re-working of my patches so I don't know if it's still there) is that
> with background caching you only get the 'Generating...' page if it's
> new or the content is grossly out of date.  If it's a popular page and
> it's not grossly out of date it shows you the 'stale' data while it
> generates the new content in the background anyway, only locking you out
> when the new file is being written.  Or at least that's how I had it.

Well, my user experience with gitweb on kernel.org is that I get the
Generating... page all the time when I dive deep enough into the object
tree. I just find it really distracting and sometimes troublesome when
I want to wget some final object.

I think it's fine to take in the caching support with the Generating...
page in the bundle, but I do want to declare my intention to get rid of
it later, at least for caching backends that could do without it - for
pages where content appears incrementally, tail -f, for pages where
content appears all at once, show at least the header and some "I'm
busy" notification without redirects.

-- 
				Petr "Pasky" Baudis
If you can't see the value in jet powered ants you should turn in
your nerd card. -- Dunbal (464142)
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-28 17:39   ` Petr Baudis
@ 2010-01-31 11:58     ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-31 11:58 UTC
To: Petr Baudis; +Cc: J.H., git, John 'Warthog9' Hawley

On Thu, Jan 28, 2010, Petr Baudis wrote:
> On Mon, Jan 25, 2010 at 12:32:37PM -0800, J.H. wrote:

> > This does 2 things in the end:
> >
> > 1) means there's only 1 copy of the page ever being generated, thus
> > meaning there isn't extraneous and dangerous disk i/o going on on the system
>
> But this has nothing to do with what you _do_ when there are multiple
> requests, whether you do the same as if caching was disabled (hang until
> content is generated) or doing something novel (creating redirects
> through "Generating..." page).
>
> > 2) prevents a user from reporting to the website that it's broken by
> > giving them a visual cue that things aren't broken.
>
> But this has nothing to do with caching per se, right? I think it
> actually makes _no difference_ if caching is enabled or not to this
> problem, or am I missing something?
>
>
> My point is, I guess, that showing the Generating page doesn't seem to
> have actually anything to do with the caching itself?

The point is that without caching it is easy to stream the response,
and to consider the early parts of the page (like the page header,
generated before any heavy work) to serve as an activity indicator.

With caching it is difficult to have a streaming response, both from
the technical point of view (the writer must generate to the client and
to the cache simultaneously, readers must know when the writer finished
its work to close the connection), and from the robustness point of
view (what happens if the writer is interrupted / killed before it
finishes generating output).

With "generate then display" (which is not exclusive to caching, and is
another possible way of generating content even without caching) we
rather need some kind of activity indicator like the "Generating..."
page.

I think that the "Generating..." page can be improved in two ways:

* Show the "Generating..." page only if we are waiting for the response
  for more than one second.  This might need mucking with alarms, as I
  think that sleep 1 before $self->generating_info(...) would not be
  a good solution.

* Stream the response (using the PerlIO::tee layer from PerlIO::Util,
  or the Capture::Tiny module, or a tied filehandle like in CGI::Cache)
  for the writer (i.e. the process generating data), and wait for it to
  be finished (perhaps with "Generating...") in readers.  This way you
  wouldn't get the "Generating..." page for rare views/URLs, and for
  common views/URLs there is a high probability that you would not need
  the "Generating..." page as there would be a slightly stale response
  to serve.

Of course one can implement _both_ of those solutions, i.e. wait one
second in readers, and stream in the writer.

I am not sure, but there might be another issue why an activity
indicator is more important for the case with caching enabled.  If you
interrupt the writer, one of the readers waiting for finished data would
have to take the role of writer, which, besides the need for a technical
solution to this problem, would mean a longer wait.

[..]
> > > (ii) Can't the locked gitwebs do the equivalent of tail -f?
> >
> > Not really going to help much, most of the gitweb operations won't
> > output much of anything beyond the header until it's collected all of
> > the data it needs anyway and then there will be a flurry of output.  It
> > also means that this 'Generating...' page will only work for caching
> > schemes that tail can read out of, which I'm not sure it would work all
> > that well with things like memcached or a non-custom caching layer where
> > we don't necessarily have direct access to the file being written to.
> >
> > At least the way I had it (and I'll admit I haven't read through Jakub's
> > re-working of my patches so I don't know if it's still there) is that
> > with background caching you only get the 'Generating...' page if it's
> > new or the content is grossly out of date.  If it's a popular page and
> > it's not grossly out of date it shows you the 'stale' data while it
> > generates the new content in the background anyway, only locking you out
> > when the new file is being written.  Or at least that's how I had it.
>
> Well, my user experience with gitweb on kernel.org is that I get
> "Generating..." page all the time when I dive deep enough to the object
> tree. I just find it really distracting and sometimes troublesome when
> I want to wget some final object.

First, the user_agent checking would help there (it's a pity that not
all web spiders (bots) and non-interactive downloaders say explicitly
what they are in the User-Agent string).

Second, I guess that waiting 1 second (or more) before showing the
"Generating..." page would help in most cases.

>
> I think it's fine to take in the caching support with the Generating...
> page in the bundle, but I do want to declare my intention to get rid of
> it later, at least for caching backends that could do without it - for
> pages where content appears incrementally, tail -f, for pages where
> content appears all at once, show at least the header and some "I'm
> busy" notification without redirects.

In the final version this should be fully configurable.  Note that the
series of patches I have sent was just a proof of concept for splitting
the caching patch into smaller parts / individual features.

-- 
Jakub Narebski
Poland
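
The "show the page only after waiting a second" idea from the message
above could look roughly like the sketch below.  It assumes that the
writer holds an exclusive flock() on a lock file and that readers poll
for a shared lock; wait_for_writer() and its parameters are hypothetical
names, not part of gitweb or of either patch series.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(time sleep);

# Hypothetical helper: poll for a shared lock for up to $grace seconds,
# checking every $step seconds.  Returns 1 if the writer released its
# exclusive lock in time (we now hold the shared lock), 0 otherwise.
sub wait_for_writer {
	my ($lock_fh, $grace, $step) = @_;
	my $deadline = time() + $grace;

	do {
		return 1 if flock($lock_fh, LOCK_SH | LOCK_NB);
		sleep($step);
	} while (time() < $deadline);

	return 0;
}
```

A reader would call something like wait_for_writer($lock_fh, 1, 0.1)
first and fall back to the "Generating..." page only when it returns 0,
so fast pages never show the indicator.  Polling avoids the alarm
handling mentioned above at the cost of a little extra latency.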
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 13:56 ` Petr Baudis
  2010-01-25 20:32   ` J.H.
@ 2010-01-25 20:58   ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25 20:58 UTC
To: Petr Baudis
Cc: git, John 'Warthog9' Hawley, John 'Warthog9' Hawley

On Mon, Jan 25, 2010 at 14:56 +0100, Petr Baudis wrote:
> On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote:

> > Now those patches (mine and J.H. both) make gitweb use locking
> > (it is IIRC configurable in J.H. patch) to make only one process
> > generate the page if it is missing from cache, or is stale.  Now
> > if it is missing, we have to wait until it is generated in full
> > before being able to show it to client.  While it is possible to
> > "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
> > CGI::Cache) writing it simultaneously to browser and to cache for
> > the process that is generating data, it is as far as I understand
> > it impossible for processes which are waiting for data.  Therefore
> > the need for "Generating..." page, so the user does not think that
> > web server hung or something, and is not generating output.
>
> Ah, ok, so the message is there to cover up for a technical problem. ;-)
> I didn't quite realize. Then, it would be great to tweak the mechanisms
> so that the user does not really have to wait.

Well, the mechanism would certainly be configurable in the final version
(the current split version is more of a proof of concept of splitting).

>
> So, I wonder about two things:
>
> (i) How often does it happen that two requests for the same page are
> received? Has anyone measured it? Or is at least able to make
> a minimally educated guess? IOW, isn't this premature optimization?

To be more exact, the question is how often a second request for the
same page appears when an earlier request hasn't finished processing.
It is a matter of both the frequency of given requests, and the time it
takes to generate a response (which grows with growing load on the
server).

As to measurements: Pasky, do you have access logs, or their analysis
a la AWStats, Webalizer and the like, for repo.or.cz?  Warthog9, do you
have access logs or analysis for git.kernel.org?  Can you get something
similar from fedorahosted?

> (ii) Can't the locked gitwebs do the equivalent of tail -f?

Well, it could, in principle, but it would need some changes.

First, instead of using a temporary file to create the cache entry
atomically (write to a temporary file, then rename) the process
generating data would have to write to a file other processes can read
from.  It could be e.g. the lockfile.

Second, an extended cache API would be needed so that generated data is
streamed to the cache file, ->set($key, $data) ==> ->set($key, $fh) or
->set_io($key, $fh).  This would mean some complications, but what
might be more important is that this trick would not work, as far as I
can see, with caching backends / caching engines other than the one
from gitweb/cache.pm (like memcached or mmap based ones).

Then the code could look like the following (in pseudocode):

  try to acquire writers lock
  if (acquired writers lock) {
  	generate and "tee" response
  	create cache entry
  } else {
  	# <<<<<<
  	while (not acquired writers lock &&
  	       sysread something) {
  		print <data>;
  	}
  	# >>>>>>
  	retrieve and print (rest) of data
  }

where the parts between <<<<<< and >>>>>> are new.

But there is another complication: gitweb needs to be able to deal with
the situation where the process generating data got interrupted before
creating the full output, or where the process generating data ran
die_error, which does not generate any cache entry (e.g. if the URL we
are trying to access returns 404 not found - the check for existence of
the object can take a while if the system is busy, I think).

Now in the current implementation either the cache entry is written in
full, or it is not written at all.  It would be, I think, fairly easy
to check with the current code whether the cache entry got generated
when we acquired the readers lock (when the process gets terminated,
the lock gets released, which is an advantage over atomically creating
a file with O_EXCL for locking), and if it didn't, repeat the whole
process.

With the "tee"/"tail" solution, if the process generating data got
interrupted before the end, we can detect such a situation, but
currently I have no idea what should be done in that case.  We can
detect the die_error situation as easily as for the current solution
(which needs the "Generating..." page as an activity indicator), and
with some care, i.e. by not writing to the cache file directly, we can
ensure that cache entries contain full, correctly generated data.

>
> P.S.: Again the disclaimer - if this is "too hard", it's better to
> accept patches like they are, then improve this later. But perhaps
> a better solution would be not to clutter the code by optimizing this
> case at all if it's not clear it really matters in the real world.

See above.


P.S. I have noticed that with the current implementation (well, I am
not sure if it is true also for the J.H. implementation) there is a
problem if there is more than one process trying to request a URL which
results in die_error being called.  The design decision, present in the
original patch, was to not cache "die_error" / non-"200 OK" pages; it
seems sane, but I don't know if it was a correct decision.

The solution for an interrupted generating process, described above,
works also for die_error pages, although it makes die_error pages
slower in such a (hopefully rare) situation of simultaneous erroneous
requests.

P.P.S. Both Pasky's approach to caching the projects_list page, and Lea
Wiemann's work on the "gitweb caching" project for Google Summer of
Code 2008 approached caching in a different way: by caching (parsed)
data, not by caching output.

Note however that for some actions like 'snapshot' we would probably
want to have response/output caching anyway.  Also for output caching
we can use the X-Sendfile (or similar) extension.

-- 
Jakub Narebski
Poland
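
The "one writer, readers wait and retry" scheme described in the message
above can be sketched as follows.  This is a minimal illustration under
stated assumptions, not the code from gitweb/cache.pm or from either
series: compute() and read_file() are hypothetical names, the lock lives
in a separate "<entry>.lock" file, and expiration is left out entirely.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: slurp a file, returning undef if it does not exist.
sub read_file {
	my ($path) = @_;
	open my $fh, '<', $path or return;
	local $/;                          # slurp mode
	my $data = <$fh>;
	close $fh;
	return $data;
}

# Hypothetical compute(): return the cached entry, generating it at most
# once even when several processes miss the cache at the same time.
sub compute {
	my ($cache_file, $generate) = @_;

	while (1) {
		my $data = read_file($cache_file);
		return $data if defined $data;             # cache hit

		open my $lock_fh, '+>>', "$cache_file.lock"
			or die "cannot open lock file: $!";

		if (flock($lock_fh, LOCK_EX | LOCK_NB)) {
			# We are the single writer.  Re-check first, because another
			# writer may have finished just before we got the lock.
			$data = read_file($cache_file);
			unless (defined $data) {
				$data = $generate->();
				my $tmp = "$cache_file.tmp.$$";
				open my $out, '>', $tmp or die "cannot write $tmp: $!";
				print {$out} $data;
				close $out or die "cannot close $tmp: $!";
				# atomic: readers see a complete entry or none at all
				rename $tmp, $cache_file or die "cannot rename $tmp: $!";
			}
			close $lock_fh;                        # releases the lock
			return $data;
		}

		# Somebody else is generating: block until the lock is released,
		# then loop.  If the writer died without creating the entry, the
		# next iteration lets one of the waiting readers become the writer.
		flock($lock_fh, LOCK_SH) or die "cannot lock: $!";
		close $lock_fh;
	}
}
```

Because the lock is released when the writer's process ends for any
reason, an interrupted writer (or one that ran die_error) simply leaves
no cache entry, and the retry loop covers that case.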
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 13:48 ` Jakub Narebski
  2010-01-25 13:56   ` Petr Baudis
@ 2010-01-25 20:41   ` J.H.
  2010-01-26  2:30     ` Jakub Narebski
  1 sibling, 1 reply; 60+ messages in thread
From: J.H. @ 2010-01-25 20:41 UTC
To: Jakub Narebski; +Cc: Petr Baudis, git, John 'Warthog9' Hawley

On 01/25/2010 05:48 AM, Jakub Narebski wrote:
> On Mon, Jan 25, 2010, Petr Baudis wrote:
>> On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote:
>>> On Sun, 24 Jan 2010, Petr Baudis wrote:
>
>>>> I have stupid question, common to both the original patch and this
>>>> RFC.
>>>>
>>>>> [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>>>>> regenerating cache (WIP)
>>>>
>>>> Just why is a "Generating..." page appropriate?
>>>>
>>>> I have to admit I hate it; can you please at least make it
>>>> configurable? Why is it needed at all? It [...] confuses
>>>> non-interactive HTTP clients [...]
>
>>> Second, gitweb can always check User-Agent header, and serve
>>> "Generating..." page only to web browsers:
>>>
>>>   unless (defined $cgi->user_agent() &&
>>>           $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) {
>>>   	return;
>>>   }
>>>
>>> or something like that.
>>
>> I'm not too happy with this. What about Safari? Opera? ELinks? There's a
>> lot of web browsers.
>
> The "Mozilla" part would catch all "Mozilla compatible" web browsers,
> including Firefox (and other Gecko-based web browsers), Internet Explorer,
> WebKit based browsers including Safari and Chrome and Konqueror.
> The "Opera" part would catch Opera.
>   http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
>
> As to other web browsers like Elinks, Lynx, w3m, Dillo, etc.: the issue
> is whether they honor '<meta http-equiv="refresh" content="0" />'.
> I think it is better to stay on the safe side; it is not disaster if web
> browser is not shown "Generating..." page where it could (but see
> explanation below).

Most of them do; that particular tag has been around for a long time,
and since it doesn't require Javascript to do the page refresh it's
pretty much universal.

The problem is going to be with things like wget when someone wants to
snag a binary file.  This works fine if the file is already cached, but
the user doesn't get what they are expecting if they get a blob that
isn't the final file, but the html contents of the page.  I don't know
of any hint that things like wget would send to the server that you
could switch based on, but it would be more or less the non-background
caching state.

>> Most of the issues can be worked around, but I'm not sure why to go
>> through all the trouble. I just personally don't see the value in having
>> the placeholder in there at all, to me it is distracting UI even if all
>> the technicalities are put aside.
>
> The issue that "Generating..." page tries to solve is, I think, the
> following.
>
> Some actions, like 'blame' view or pickaxe search, or grep search,
> can take quite a long time to generate, with times counted in
> tens of seconds.  It is not that visible for non-caching, because
> gitweb streams output so we have at least _some_ output upfront quite
> fast.
>
> Now those patches (mine and J.H. both) make gitweb use locking
> (it is IIRC configurable in J.H. patch) to make only one process
> generate the page if it is missing from cache, or is stale.  Now
> if it is missing, we have to wait until it is generated in full
> before being able to show it to client.  While it is possible to
> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
> CGI::Cache) writing it simultaneously to browser and to cache for
> the process that is generating data, it is as far as I understand
> it impossible for processes which are waiting for data.  Therefore
> the need for "Generating..." page, so the user does not think that
> web server hung or something, and is not generating output.
>
> We can try to reduce occurrences of cache miss stampedes by using
> 'expires_variance' feature[1] from CHI - Unified caching interface.
> We can also turn off locking and tee output to have some output upfront
> as an activity indicator instead of this "Generating..." page.
>
> [1]: http://search.cpan.org/~jswartz/CHI-0.33/lib/CHI.pm#set
>
>> But if it will be possible to turn this off eventually, it's all your
>> call whether to bother implementing it. :-)
>
> In my implementation it is (or rather would be) as simple as just
> not passing 'generating_info' => \&git_generating_data_html in the
> GitwebCache::SimpleFileCache constructor.
>

At least in mine it was: don't allow background caching.  It would force
everyone to wait on the one process that was actually generating
content.  But it means a few blank pages with a spinning working icon
until the cache releases its exclusive lock.

- John 'Warthog9' Hawley
* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 20:41   ` J.H.
@ 2010-01-26  2:30     ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-26 2:30 UTC
To: J.H.; +Cc: Petr Baudis, git, John 'Warthog9' Hawley

On Mon, 25 Jan 2010, J.H. wrote:
> On 01/25/2010 05:48 AM, Jakub Narebski wrote:
>> On Mon, Jan 25, 2010, Petr Baudis wrote:
>>> On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote:
>>>> On Sun, 24 Jan 2010, Petr Baudis wrote:
>>
>>>>> I have stupid question, common to both the original patch and this
>>>>> RFC.
>>>>>
>>>>>> [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>>>>>> regenerating cache (WIP)
>>>>>
>>>>> Just why is a "Generating..." page appropriate?
>>>>>
>>>>> I have to admit I hate it; can you please at least make it
>>>>> configurable? Why is it needed at all? It [...] confuses
>>>>> non-interactive HTTP clients [...]
>>
>>>> Second, gitweb can always check User-Agent header, and serve
>>>> "Generating..." page only to web browsers:
>>>>
>>>>   unless (defined $cgi->user_agent() &&
>>>>           $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) {
>>>>   	return;
>>>>   }
>>>>
>>>> or something like that.
>>>
>>> I'm not too happy with this. What about Safari? Opera? ELinks? There's a
>>> lot of web browsers.
>>
>> The "Mozilla" part would catch all "Mozilla compatible" web browsers,
>> including Firefox (and other Gecko-based web browsers), Internet Explorer,
>> WebKit based browsers including Safari and Chrome and Konqueror.
>> The "Opera" part would catch Opera.
>>   http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
>>
>> As to other web browsers like Elinks, Lynx, w3m, Dillo, etc.: the issue
>> is whether they honor '<meta http-equiv="refresh" content="0" />'.
>> I think it is better to stay on the safe side; it is not disaster if web
>> browser is not shown "Generating..." page where it could (but see
>> explanation below).
>
> Most of them do, that particular tag has been around for a long time and
> since it doesn't require Javascript to do the page refresh it's pretty
> much universal.

For this method to work there is an additional assumption about the web
browser, namely that it would follow the refresh only after it gets the
page in full (only when the connection closes).  But perhaps that is
true for all web browsers following http-equiv refresh.

>
> The problem is going to be with things like wget when someone wants to
> snag a binary file.  This works fine if the file is already cached, but
> the user doesn't get what they are expecting if they get a blob that
> isn't the final file, but the html contents of the page.  I don't know
> of any hint that things like wget would send to the server that you
> could switch based on, but it would be more or less the non-background
> caching state.

Well, the above solution would work as long as you don't run wget with
--user-agent.

[...]
>>> But if it will be possible to turn this off eventually, it's all your
>>> call whether to bother implementing it. :-)
>>
>> In my implementation it is (or rather would be) as simple as just
>> not passing 'generating_info' => \&git_generating_data_html in the
>> GitwebCache::SimpleFileCache constructor.
>
> At least in mine it was don't allow background caching.  It would force
> everyone to wait on the one process that was actually generating
> content.  But it means a few blank pages with a spinning working icon
> until the cache releases its exclusive lock.

Theoretically at least, having the "Generating..." page is orthogonal
to using locking, to background serving and to serving stale data.  In
the current implementation those features are dependent: serving stale
data depends on locking, and "Generating..." also depends on locking.

I think that serving stale data should depend on background caching, as
otherwise we introduce inequality in treating the process which
regenerates the cache (and must wait without background caching) and
the rest of the processes (which are served stale data).

Certainly in the final version of the patch all this should be made
configurable.


P.S. With background caching there is a question whether the parent
should generate and the child display stale data or "Generating...", or
whether the parent should display stale/activity and the child
(re)generate the cache entry.  Also there is the issue of whether the
parent should waitpid for the child, whether fiddling with the SIGCHLD
handler would be enough, or whether it is not needed at all.

-- 
Jakub Narebski
Poland
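
The User-Agent check quoted in this subthread can be isolated into a
small predicate for experimenting with different client strings.  The
sketch below keeps the regular expression from the thread;
wants_generating_page() is a hypothetical name, and taking the header
value as a plain argument (instead of calling $cgi->user_agent()) is an
assumption made only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical predicate: true only for clients that identify themselves
# as "Mozilla compatible" browsers or Opera.  Anything else, including a
# missing header, gets the real content instead of the placeholder page.
sub wants_generating_page {
	my ($user_agent) = @_;
	return 0 unless defined $user_agent;
	return $user_agent =~ /\b(?:Mozilla|Opera)\b/i ? 1 : 0;
}
```

A default wget User-Agent such as "Wget/1.12 (linux-gnu)" does not
match, so wget would receive the final object.  As noted above, the
heuristic stops working once the client sends a browser-like string via
--user-agent.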
* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching
  2010-01-23  0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                     ` (8 preceding siblings ...)
  2010-01-23  0:27   ` [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating " Jakub Narebski
@ 2010-01-23 19:55   ` J.H.
  2010-01-24 13:54   ` [RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP) Jakub Narebski
  2010-02-06  0:51   ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
  11 siblings, 0 replies; 60+ messages in thread
From: J.H. @ 2010-01-23 19:55 UTC
To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley

Just a heads up I wasn't able to get to these yesterday (Friday), but
I'll try and work in looking these over come Monday.

- John 'Warthog9' Hawley

On 01/22/2010 04:27 PM, Jakub Narebski wrote:
> This 10 patches long patch series is intended as proof of concept
> for splitting large 'gitweb: File based caching layer (from git.kernel.org)'
> mega-patch by John 'Warthog9' Hawley aka J.H., by starting small and
> adding features piece by piece.
>
> This patch is meant as replacement for last two patches:
> * [PATCH 8/9] gitweb: Convert output to using indirect file handle
>   Message-ID: <1263432185-21334-9-git-send-email-warthog9@eaglescrag.net>
> * [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
>   Message-ID: <1263432185-21334-10-git-send-email-warthog9@eaglescrag.net>
>
> in the long patch series by J.H.
> * [PATCH 0/9] Gitweb caching v5
>   http://thread.gmane.org/gmane.comp.version-control.git/136913
>
> Note that this patch series is part of 'gitweb/cache-kernel' branch of
> http://repo.or.cz/w/git/jnareb-git.git repository (gitweb link), built
> on top of modified patches from 'Gitweb caching v2' series (from
> 'gitweb-ml-v2' branch of http://git.kernel.org/?p=git/warthog9/gitweb.git
> repository).  Therefore they might not apply as straight replacements
> on top of early parts of 'gitweb-ml-v5' branch.
>
> This is work in progress (showing how I see introducing output caching
> to gitweb), it lacks proper documentation (POD for gitweb/cache.pm,
> new configuration variables in gitweb/README, perhaps "Gitweb caching"
> section in gitweb/README and gitweb/cache.pm mentioned in gitweb/INSTALL),
> and commits/patches marked '(WIP)' lacks proper commit message.
>
> Just food for thought...
>
> Table of contents:
> ~~~~~~~~~~~~~~~~~~
>  [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing
>                    for caching)
>  [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1)
>  [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching
>  [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration
>  [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set)
>                    output caching (WIP)
>  [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP)
>  [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP)
>  [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd'
>                    problem (WIP)
>  [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for
>                    filling cache (WIP)
>  [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>                    regenerating cache (WIP)
>
>
> Diffstat:
> ~~~~~~~~~
>
>  gitweb/cache.pm                        |  566 ++++++++++
>  gitweb/gitweb.perl                     | 1923 +++++++++++++++++---------------
>  t/gitweb-lib.sh                        |    2 +
>  t/t9500-gitweb-standalone-no-errors.sh |   13 +
>  t/t9503-gitweb-caching.sh              |   32 +
>  t/t9503/test_cache_interface.pl        |  195 ++++
>  t/test-lib.sh                          |    3 +
>  7 files changed, 1836 insertions(+), 898 deletions(-)
>  create mode 100644 gitweb/cache.pm
>  create mode 100755 t/t9503-gitweb-caching.sh
>  create mode 100755 t/t9503/test_cache_interface.pl
* [RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP) 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (9 preceding siblings ...) 2010-01-23 19:55 ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H. @ 2010-01-24 13:54 ` Jakub Narebski 2010-02-06 0:51 ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H. 11 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-24 13:54 UTC (permalink / raw) To: git Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, Jakub Narebski Signed-off-by: Jakub Narebski <jnareb@gmail.com> --- This patch, as you can see, lack proper commit message: it is work in progress (and very much proof of concept patch). This is Ajax-y JavaScript-based activity indicator. It is very bare-bones, without any error checking or fallback in the case we misdetected and JavaScript is turned off. Also currently it would work only for subsequent links from gitweb, with 'js=1' added. We need to mark Ajax request to _not_ create "Generating..." page. I have used there custon 'X-Ajax' HTTP header; we could use 'Pragma: wait', or ejen additional query parameter for that. gitweb/gitweb.perl | 69 +++++++++++++++++++++++++++++++++++++-------------- 1 files changed, 50 insertions(+), 19 deletions(-) diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl index cb46b8c..d422a4b 100755 --- a/gitweb/gitweb.perl +++ b/gitweb/gitweb.perl @@ -3219,6 +3219,9 @@ sub blob_contenttype { sub git_generating_data_html { my ($cache, $key, $lock_fh) = @_; + if ($cgi->http('X-Ajax')) { + return; + } if ($action eq 'atom' || $action eq 'rss' || $action eq 'opml' || # feeds $action eq 'blob_plain' || # unknown mimetype $action eq 'commitdiff_plain' || # text/plain @@ -3241,6 +3244,7 @@ sub git_generating_data_html { } } + my $use_javascript = $cgi->param('js'); my $mod_perl_version = $ENV{'MOD_PERL'} ? 
" $ENV{'MOD_PERL'}" : ''; print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8', -status=> '200 OK', -expires => 'now'); @@ -3253,31 +3257,58 @@ sub git_generating_data_html { <!-- git core binaries version $git_version --> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> -<meta http-equiv="refresh" content="0" /> +EOF + if (!$use_javascript) { + print STDOUT qq!<meta http-equiv="refresh" content="0" />\n!; + } + print STDOUT <<"EOF"; <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" /> <meta name="robots" content="noindex, nofollow" /> <title>$title</title> </head> <body> EOF - print STDOUT 'Generating..'; - - my $ready; - my ($wait, $wait_step, $wait_max); # in seconds - $wait = 0; - $wait_step = 2; - $wait_max = 20; - $| = 1; # autoflush - do { - print STDOUT '.'; - - sleep $wait_step if ($wait > 0); - $wait += $wait_step; - - $ready = flock($lock_fh, LOCK_SH|LOCK_NB) - if $lock_fh; - - } while (!$ready && ($wait < $wait_max)); + if ($use_javascript) { + my $url_self = href(-replay=>1); + print STDOUT <<"EOF"; +Generating...<br /> +<script type="text/javascript" src="$javascript"></script> +<script type="text/javascript"> +window.onload = function () { + xhr = createRequestObject(); + xhr.onreadystatechange = function () { + if (xhr.readyState === 4 && xhr.status === 200) { + // intentionally overwrite document + document.write(xhr.responseText); + } + } + xhr.open('GET', $url_self); + // mark request to not show "Generating..." 
page + xhr.setRequestHeader('X-Ajax', 'Yes'); + xhr.send(null); +} +</script> +EOF + } else { + print STDOUT 'Generating..'; + + my $ready; + my ($wait, $wait_step, $wait_max); # in seconds + $wait = 0; + $wait_step = 2; + $wait_max = 20; + $| = 1; # autoflush + do { + print STDOUT '.'; + + sleep $wait_step if ($wait > 0); + $wait += $wait_step; + + $ready = flock($lock_fh, LOCK_SH|LOCK_NB) + if $lock_fh; + + } while (!$ready && ($wait < $wait_max)); + } print STDOUT <<"EOF"; -- 1.6.6.1 ^ permalink raw reply related [flat|nested] 60+ messages in thread
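The non-JavaScript branch of the patch above waits for the cache writer by repeatedly trying a non-blocking shared lock. A minimal standalone sketch of that idea follows; the name wait_for_cache and its argument order are assumptions made for illustration, and the loop is a simplified variant, not the code from the patch.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical helper: poll for a shared (readers) lock without blocking.
# Returns 1 once the lock is obtained (the writer has released its
# exclusive lock), or 0 if roughly $wait_max seconds pass first.
sub wait_for_cache {
    my ($lock_fh, $wait_step, $wait_max) = @_;
    my $waited = 0;
    while (1) {
        return 1 if flock($lock_fh, LOCK_SH | LOCK_NB);
        return 0 if $waited >= $wait_max;
        sleep $wait_step;
        $waited += $wait_step;
    }
}

# Usage: nobody holds the lock on this temporary file, so the first
# attempt succeeds and no sleeping happens.
my ($lock_fh, $lock_file) = tempfile(UNLINK => 1);
my $ready = wait_for_cache($lock_fh, 2, 20);
print $ready ? "cache ready\n" : "timed out\n";
```

LOCK_NB is what keeps the process free to emit progress output between attempts; a plain LOCK_SH would block until the writer finished.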
* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching 2010-01-23 0:27 ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski ` (10 preceding siblings ...) 2010-01-24 13:54 ` [RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP) Jakub Narebski @ 2010-02-06 0:51 ` J.H. 2010-02-06 23:56 ` Jakub Narebski 11 siblings, 1 reply; 60+ messages in thread From: J.H. @ 2010-02-06 0:51 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley > Table of contents: > ~~~~~~~~~~~~~~~~~~ > [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing > for caching) This looks fine, I did some quick testing to verify that this would work - and it does. The only caveat that needs to be aware is that if the layer is going to output binary data it needs to flip the whole stream to :raw before outputting (this is going to be more specific to the caching layer). One advantage to having the file handles separate is that it's easier to distinguish if the data is going to need to be binary data that will need to be flipped properly. Also means you could cache the binary data differently than textual data. I.E. binary data gets saved to disk, but page data gets saved to memcached. Just food for thought, I'm not sure which way makes more sense personally, though I would have a tendency to err on the side of flexibility and have both. > [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) note: delaying additional comment till I've finished reading through the basics of the following patches. > [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching Ok this is quite the departure from what I had, I'm unsure that it's the right way to go, but it obviously has merits (I.E. much simpler addition of any Cache::Cache or CHI compatible caching layer) This patch itself looks fine, and as it states it borrows heavily from my basic implementation - just wraps it differently. 
I might have some thoughts on extending this a bit to be a bit more flushed out from a basic standpoint. Need to dig through it some more, but I'm generally ok with it. > [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration Looks fine to me, though the note about getting the errors should get moved to previous patch, as it says. Note: I'm going to stop here as the following are WIP and I want to play around with this particular direction on my own a little more before further comment. There's some ideas running around I want to try and get down in code first. Me moving on and trying these other ideas is not a reflection on the following patches, just some alternative thinking before I discuss some other ideas on the following patches. Also I've been sitting on this e-mail in this state for almost a week while I've been playing with this and having to fight other fires and I know that Jakub has been looking for commentary on this. > [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) > output caching (WIP) > [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) > [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) > [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' > problem (WIP) > [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for > filling cache (WIP) > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > regenerating cache (WIP) ^ permalink raw reply [flat|nested] 60+ messages in thread
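The explicit-filehandle approach of [RFC PATCH 01/10] can be sketched as follows, assuming a global $out as in the patch description. The helper names render_page and capture_output are hypothetical and exist only for this illustration. The sketch redirects $out to an in-memory file, so whatever layer is set on the capturing handle has already been applied to the captured data.

```perl
use strict;
use warnings;

our $out = \*STDOUT;    # global output handle, as in the patch description

# Hypothetical page generator: prints through $out, never STDOUT directly.
sub render_page {
    print {$out} "<p>caf\x{e9}</p>\n";
}

# Hypothetical helper: run $code with $out pointing at an in-memory file
# and return what was written, already encoded by the :utf8 layer.
sub capture_output {
    my ($code) = @_;
    my $data = '';
    open my $data_fh, '>', \$data
        or die "Cannot open in-memory file: $!";
    binmode $data_fh, ':utf8';
    local $out = $data_fh;    # restored automatically on return
    $code->();
    close $data_fh;
    return $data;
}

my $bytes = capture_output(\&render_page);

# The captured data is already converted, so it is replayed in :raw mode.
binmode STDOUT, ':raw';
print STDOUT $bytes;
```

Because the conversion happens at capture time, a cache built this way could store $bytes as-is and serve it later without knowing whether the page was text or binary.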
* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching 2010-02-06 0:51 ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H. @ 2010-02-06 23:56 ` Jakub Narebski 2010-02-07 12:35 ` Jakub Narebski 0 siblings, 1 reply; 60+ messages in thread From: Jakub Narebski @ 2010-02-06 23:56 UTC (permalink / raw) To: J.H.; +Cc: git, John 'Warthog9' Hawley, Petr Baudis On Sat, 6 Feb 2010, J.H. wrote: > > Table of contents: > > ~~~~~~~~~~~~~~~~~~ > > [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing > > for caching) I am working on v2 of this series, where this patch is not necessary. It uses *STDOUT->push_layer(scalar => \$data) and *STDOUT->pop_layer() from PerlIO::Util if it is available, and manipulation of *STDOUT otherwise (which means *STDOUT = $data_fh and not $out = $data_fh). But I must say that doing capture of STDOUT (only; STDERR is not captured) without requiring extra Perl modules (like recommended Capture::Tiny or e.g. IO::Capture), and especially testing that it works correctly with capturing output of cache_fetch is a serious PITA. This patch has the advantage that all operations are simpler. In particular it is easy to have a section which should not be captured, or where capture should be turned off (slightly different). It has the disadvantage that all future contributions must use "print $out <something>" / "print {$out} <something>", and that contributions from before this change would have to be carefully updated. (Well, we could probably add the test that would check that everything that needs to go to $out does, and everything that shouldn't go to $out but to STDOUT doesn't.) If I were to have such patch in new version of "gitweb output caching" series, I would make the following changes: * (optionally) use simpler 'print $out <sth>' instead of visually distinct 'print {$out} <sth>', where from first glance one can see that $out is filehandle and not something to be printed * use short filehandle name: $out, or $oh, or $o/$O.
* split above patch in 2 to 4 patches: - pure mechanical (scripted) change: + print <sth> -> print $out <sth> + printf(<sth>) -> printf($out <sth>) + binmode STDOUT -> binmode $out The last with possible exception of very first binmode call. - realign (purely whitespace change) - wrap too long lines (newlines and whitespace), optional - change $out to $bout/$bin ($binary_output_fh) where needed; but see comment below (optional) > > This looks fine, I did some quick testing to verify that this would work > - and it does. I have only run the tests, and didn't actually check that it works correctly. This commit shouldn't change gitweb behaviour at all. > > The only caveat that needs to be aware is that if the layer is going to > output binary data it needs to flip the whole stream to :raw before > outputting (this is going to be more specific to the caching layer). > > One advantage to having the file handles separate is that it's easier to > distinguish if the data is going to need to be binary data that will > need to be flipped properly. I don't think that it would be needed. First, all mode changing operations, i.e. calls to binmode, are changed to act on $out rather than on STDOUT. It means that if we are using 'in memory file' to capture output to scalar variable, then captured data would be properly converted in variable. So it would be enough to save this variable in :raw mode to file. If we are saving directly to cache file, then of course saved data would go through layer and would be converted properly. In any case in cache file we would have _already_ _converted_ data. This means that regardless whether $out used ':utf8' (pseudo)layer, or ':raw' (pseudo)layer, if we read from cache file in ':raw' (binary mode) and print data from cache to original (true) STDOUT also in ':raw' mode, we would print correctly formatted data. > > Also means you could cache the binary data differently than textual data. > > I.E.
binary data gets saved to disk, but page data gets saved to memcached. That's true, but on the other hand it would be easy to add some extra command marking data as binary below binmode. Or we can examine IO layers (using PerlIO::get_layers($out); the PerlIO module is in Perl core) to see if there is 'utf8' layer or 'raw' (pseudo)layer. > > Just food for thought, I'm not sure which way makes more sense > personally, though I would have a tendency to err on the side of > flexibility and have both. It might be good idea... but nevertheless I'd like to have short name for binary filehandle, if we decide to keep it. What should it be? $bout, $bin, $B, $bin_out, $out_bin, $bin_fh? > > > [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) > > note: delaying additional comment till I've finished reading through the > basics of the following patches. This is to use later _full_ _normalized_ URI as cache key for given page. IIRC in your original patch you ignored path_info; but on the other hand git.kernel.org has path_info feature turned off... > > > [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching > > Ok this is quite the departure from what I had, I'm unsure that it's the > right way to go, but it obviously has merits (I.E. much simpler addition > of any Cache::Cache or CHI compatible caching layer) > > This patch itself looks fine, and as it states it borrows heavily from > my basic implementation - just wraps it differently. I might have some > thoughts on extending this a bit to be a bit more flushed out from a > basic standpoint. > > Need to dig through it some more, but I'm generally ok with it. Note that the new implementation in (not yet sent) new version of "gitweb output caching" series is based more on newer and more modern CHI unified interface rather than older Cache::Cache interface. It is I think much cleaner and easier to read.
The major difference from your implementation is that in my version the gitweb caching engine uses "save to temporary file + rename file to final name" method to have atomic write to cache (atomic cache filling). It should be more robust, but OTOH it introduces a bit of performance penalty. With locking and single writer we could use predictable temporary file name rather than using tempfile/mkstemp or equivalent from File::Temp, or UUID based filename like CHI does it. Also, tests. Current code (even the v2 version) lacks proper error detection, error signalling and logging. > > > [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration > > Looks fine to me, though the note about getting the errors should get > moved to previous patch, as it says. I wanted to get this series out faster, that is why it is not polished. > > Note: I'm going to stop here as the following are WIP and I want to play > around with this particular direction on my own a little more before > further comment. There's some ideas running around I want to try and > get down in code first. Me moving on and trying these other ideas is > not a reflection on the following patches, just some alternative > thinking before I discuss some other ideas on the following patches. Take a look at gitweb/cache-kernel-v2 branch (the new caching series). Note however that it would be subject to rebasing / changes. > > Also I've been sitting on this e-mail in this state for almost a week > while I've been playing with this and having to fight other fires and I > know that Jakub has been looking for commentary on this. Thank you very much for your commentary, in spite of your heavy load. 
> > > [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) > > output caching (WIP) > > [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) > > [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) > > [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' > > problem (WIP) > > [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for > > filling cache (WIP) > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when > > regenerating cache (WIP) There is a new version of this series in gitweb/cache-kernel-v2 in my git/jnareb-git.git fork (clone) of git.git repository at repo.or.cz. Now all commits have proper description (for first series one had to read comment section in emails for commit description), and all features are tested (at least on API level, and to some extent) -- full tests do require having PerlIO::Util installed (I have done it following local::lib and installing it from 'cpan' client), though. Also all features are fully configurable, to even greater extent than in original series by J.H. (this is what v1 was lacking). And there is (see diffstat) section about caching in gitweb/README.
The following changes since commit d5f8a3d6f4d946c33459e00edf02819f89711777: Junio C Hamano (1): Merge branch 'master' into next are available in the git repository at: git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel-v2 You can view it via gitweb at: http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel-v2 http://repo.or.cz/w/git/jnareb-git.git/log/refs/heads/gitweb/cache-kernel-v2 SHORTLOG (10): gitweb: href(..., -path_info => 0|1) gitweb/cache.pm - Very simple file based caching gitweb/cache.pm - Stat-based cache expiration gitweb: Use Cache::Cache compatibile (get, set) output caching gitweb/cache.pm - Adaptive cache expiration time gitweb: Use CHI compatibile (compute method) caching gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem gitweb/cache.pm - Serve stale data when waiting for filling cache gitweb/cache.pm - Regenerate (refresh) cache in background gitweb: Show appropriate "Generating..." page when regenerating cache gitweb/README | 70 +++++ gitweb/cache.pm | 527 ++++++++++++++++++++++++++++++++ gitweb/gitweb.perl | 301 +++++++++++++++++- t/gitweb-lib.sh | 2 + t/t9500-gitweb-standalone-no-errors.sh | 19 ++ t/t9503-gitweb-caching.sh | 32 ++ t/t9503/test_cache_interface.pl | 380 +++++++++++++++++++++++ t/test-lib.sh | 3 + 8 files changed, 1319 insertions(+), 15 deletions(-) create mode 100644 gitweb/cache.pm create mode 100755 t/t9503-gitweb-caching.sh create mode 100755 t/t9503/test_cache_interface.pl -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
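The "save to temporary file + rename file to final name" method mentioned in the message above can be sketched as follows. This is only an illustration under stated assumptions: the name atomic_write and the 'cache_XXXXXX' template are hypothetical, and the sketch uses File::Temp rather than the predictable temporary name the message also considers.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Basename qw(dirname);

# Hypothetical helper: write $data to $file so that readers see either
# the old contents or the complete new contents, never a partial file.
sub atomic_write {
    my ($file, $data) = @_;
    # The temporary file is created in the target directory, because
    # rename() is atomic only within a single filesystem.
    my ($fh, $tmpname) = tempfile('cache_XXXXXX',
                                  DIR => dirname($file), UNLINK => 0);
    binmode $fh, ':raw';
    print {$fh} $data      or die "Cannot write $tmpname: $!";
    close $fh              or die "Cannot close $tmpname: $!";
    rename $tmpname, $file or die "Cannot rename $tmpname to $file: $!";
    return;
}

# Usage: fill one cache entry in a scratch directory.
my $dir = tempdir(CLEANUP => 1);
atomic_write("$dir/summary.html", "<html>...</html>\n");
```

The cost noted in the message is visible here: every cache fill creates and renames an extra file in addition to writing the data.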
* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching 2010-02-06 23:56 ` Jakub Narebski @ 2010-02-07 12:35 ` Jakub Narebski 0 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-02-07 12:35 UTC (permalink / raw) To: J.H.; +Cc: git, John 'Warthog9' Hawley On Sun, 7 Feb 2010, Jakub Narebski wrote: > There is new version of this series in gitweb/cache-kernel-v2 in my > git/jnareb-git.git fork (clone) of git.git repository at repo.or.cz. > Now all commits have proper description (for first series one had to > read comment section in emails for commit description), [...] Below there are commit messages for gitweb/cache-kernel-v2 branch after rebase and fixups: commit 560e2ab10d0f8457fbeca7a26814ff3e32396f7b Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 11:27:22 2010 +0100 gitweb: href(..., -path_info => 0|1) If named boolean option -path_info is passed to href() subroutine, use its value to decide whether to generate path_info URL form. If this option is not passed, href() queries 'pathinfo' feature to check whether to generate path_info URL (if generating path_info link is possible at all). href(-replay=>1, -path_info=>0) is meant to be used to generate a key for caching gitweb output; alternate solution would be to use freeze() from Storable (core module) on %input_params hash (or its reference), e.g.: $key = freeze \%input_params; or other serialization technique. While at it document extra options/flags to href(). Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/gitweb.perl | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) commit dd6e8dc27d5b799bd2a1aed03738195dfe3bc5e7 Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:13:06 2010 +0100 gitweb/cache.pm - Very simple file based caching This is first step towards implementing file based output (response) caching layer that is used on such large sites as kernel.org. 
This patch introduces GitwebCaching::SimpleFileCache package, which follows Cache::Cache / CHI interface, although it does not implement it fully. The intent of following established convention is to be able in the future to replace our simple file based cache e.g. by one using memcached. Like in original patch by John 'Warthog9' Hawley (J.H.) (the one this commit intends to be incremental step to), the data is stored in the cache as-is, without adding metadata (like expiration date), and without serialization (which means only scalar data). To be implemented (from original patch by J.H.): * cache expiration (based on file stats, current time and global expiration time); currently elements in cache do not expire * actually using this cache in gitweb, except error pages * adaptive cache expiration, based on average system load * optional locking interface, where only one process can update cache (using flock) * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Possible extensions (beyond what was in original patch): * (optionally) show information about cache utilization * AJAX (JavaScript-based) progress indicator * JavaScript code to update relative dates in cached output * make cache size-aware (try to not exceed specified maximum size) * utilize X-Sendfile header (or equivalent) to show cached data (optional, as it makes sense only if web server supports sendfile feature and has it enabled) * variable expiration feature from CHI, allowing items to expire a bit earlier than the stated expiration time to prevent cache miss stampedes (although locking, if available, should take care of this). The code of GitwebCaching::SimpleFileCache package in gitweb/cache.pm was heavily based on file-based cache in Cache::Cache package, i.e.
on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver (including implementing atomic write, something that original patch lacks). This patch does not yet enable output caching in gitweb (it doesn't have all required features yet); on the other hand it includes tests, currently testing only cache Perl API. Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 269 +++++++++++++++++++++++++++++++++++++++ t/t9503-gitweb-caching.sh | 32 +++++ t/t9503/test_cache_interface.pl | 84 ++++++++++++ t/test-lib.sh | 3 + 4 files changed, 388 insertions(+), 0 deletions(-) create mode 100644 gitweb/cache.pm create mode 100755 t/t9503-gitweb-caching.sh create mode 100755 t/t9503/test_cache_interface.pl commit 3914e7da792fec50fcc64c0e644d54cf4451703a Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:13:17 2010 +0100 gitweb/cache.pm - Stat-based cache expiration Add stat-based cache expiration to file-based GitwebCache::SimpleFileCache. Contrary to the way other caching interfaces such as Cache::Cache and CHI do it, the time cache element expires in is _global_ value associated with cache instance, and is not local property of cache entry. (Currently cache entry does not store any metadata associated with entry... which means that there is no need for serialization / marshalling / freezing and thawing.) Default expire time is -1, which means never expire. To check if cache entry is expired, GitwebCache::SimpleFileCache compares difference between mtime (last modify time) of a cache file and current time with (global) time to expire. It is done using CHI-compatible is_valid() method. Add some tests checking that expiring works correctly (on the level of API). 
To be implemented (from original patch by J.H.): * actually using this cache in gitweb, except error pages * adaptive cache expiration, based on average system load * optional locking interface, where only one process can update cache (using flock) * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 34 ++++++++++++++++++++++++++++++++-- t/t9503/test_cache_interface.pl | 10 ++++++++++ 2 files changed, 42 insertions(+), 2 deletions(-) commit a55625cb0f2d6c08a28e774fd2ddb4e5347a24b3 Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:13:27 2010 +0100 gitweb: Use Cache::Cache compatible (get, set) output caching This commit actually adds output caching to gitweb, as we have now minimal features required for it in GitwebCache::SimpleFileCache (a 'dumb' but fast file-based cache engine). To enable cache you need at least set $caching_enabled to true in gitweb config, and copy cache.pm from gitweb/ alongside gitweb.cgi - this is described in more detail in the new "Gitweb caching" section in gitweb/README Currently cache support related subroutines in cache.pm (which are outside GitwebCache::SimpleFileCache package) are not well separated from gitweb script itself; cache.pm lacks encapsulation. cache.pm assumes that there are href() subroutine and %actions variable, and that there exist $actions{$action} (where $action is parameter passed to cache_fetch), and it is a code reference (see also comments in t/t9503/test_cache_interface.pl). This is remaining artifact from the original patch by J.H. (which also had cache_fetch() subroutine). 
Gitweb itself uses directly only cache_fetch, to get page from cache or to generate page and save it to cache, and cache_stop, to be used in die_error subroutine, as currently error pages are not cached. The cache_fetch subroutine captures output (from STDOUT only, as STDERR is usually logged) using either ->push_layer()/->pop_layer() from PerlIO::Util submodule (if it is available), or by setting and restoring *STDOUT. Note that only the former could be tested reliably to be reliable in t9503 test! Enabling caching causes the following additional changes to gitweb output: * Disables content-type negotiation (choosing between 'text/html' mimetype and 'application/xhtml+xml') when caching, as there is no content-type negotiation done when retrieving page from cache. Use 'text/html' mimetype that can be used by all browsers. * Disable timing info (how much time it took to generate original page, and how many git commands it took), and in its place show when page was originally generated (in GMT / UTC timezone). Add basic tests of caching support to t9500-gitweb-standalone-no-errors test: set $caching_enabled to true and check for errors for first time run (generating cache) and second time run (retrieving from cache) for a single view - summary view for a project. If PerlIO::Util is available (see comments), test that cache_fetch behaves correctly, namely that it saves and restores action output in cache, and that it prints generated output or cached output. 
To be implemented (from original patch by J.H.): * adaptive cache expiration, based on average system load * optional locking interface, where only one process can update cache (using flock) * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/README | 70 ++++++++++++++++++++++ gitweb/cache.pm | 78 ++++++++++++++++++++++++ gitweb/gitweb.perl | 102 ++++++++++++++++++++++++++++---- t/gitweb-lib.sh | 2 + t/t9500-gitweb-standalone-no-errors.sh | 19 ++++++ t/t9503/test_cache_interface.pl | 93 +++++++++++++++++++++++++++++ 6 files changed, 352 insertions(+), 12 deletions(-) commit 3e471ebd31e881ce1439f23075378c2ec6b95e4d Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:13:31 2010 +0100 gitweb/cache.pm - Adaptive cache expiration time Add to GitwebCache::SimpleFileCache support for adaptive lifetime (cache expiration) control. Cache lifetime can be increased or decreased by any factor, e.g. load average, through the definition of the 'check_load' callback. Note that using ->set_expires_in, or unsetting 'check_load' via ->set_check_load(undef) turns off adaptive caching. Make gitweb automatically adjust cache lifetime by load, using get_loadavg() function. Define and describe default parameters for dynamic (adaptive) cache expiration time control. There are some very basic tests of dynamic expiration time in t9503, namely checking if dynamic expire time is within given upper and lower bounds. 
To be implemented (from original patch by J.H.): * optional locking interface, where only one process can update cache (using flock) * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 55 +++++++++++++++++++++++++++++++++++--- gitweb/gitweb.perl | 27 +++++++++++++++++- t/t9503/test_cache_interface.pl | 22 +++++++++++++++ 3 files changed, 97 insertions(+), 7 deletions(-) commit 984390f99c33d82cd4ddbfa6e00c721d9e74cddb Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:13:52 2010 +0100 gitweb: Use CHI compatible (compute method) caching If $cache provides CHI compatible ->compute($key, $code) method, use it instead of Cache::Cache compatible ->get($key) and ->set($key, $data). While at it, refactor regenerating cache into cache_calculate subroutine. GitwebCache::SimpleFileCache provides 'compute' method, which currently simply uses 'get' and 'set' methods in the prescribed manner. Nevertheless 'compute' method can be more flexible in choosing when to refresh cache, and which process is to refresh/(re)generate cache entry. This method would use (advisory) locking to prevent 'cache miss stampede' (aka 'stampeding herd') problem in the next commit.
Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 39 ++++++++++++++++++++++++++++++++++++--- 1 files changed, 36 insertions(+), 3 deletions(-) commit 7d0109e4379f5187364edf7c25cdbc5247609f64 Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:18:14 2010 +0100 gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem In the ->compute($key, $code) method from GitwebCache::SimpleFileCache, use locking (via flock) to ensure that only one process would generate data to update/fill-in cache; the rest would wait for the cache to be (re)generated and would read data from cache. Currently this feature can not be disabled (via %cache_options). A test in t9503 shows that in the case where there are two clients trying to simultaneously access non-existent or stale cache entry, (and generating data takes (artificially) a bit of time), if they are using ->compute method the data is (re)generated once, as opposed to if those clients are just using ->get/->set methods. To be implemented (from original patch by J.H.): * background building, and showing stale cache * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 29 ++++++++++++++++- t/t9503/test_cache_interface.pl | 65 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 92 insertions(+), 2 deletions(-) commit e7985f69eb9000860b155939d5fd7040e30f682f Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:19:21 2010 +0100 gitweb/cache.pm - Serve stale data when waiting for filling cache When process fails to acquire exclusive (writers) lock, then instead of waiting for the other process to (re)generate and fill cache, serve stale (expired) data from cache.
This is of course possible only if there is some stale data in cache for given key. This feature of GitwebCache::SimpleFileCache is used only for an ->update($key, $code) method. It is controlled by 'max_lifetime' cache parameter; you can set it to -1 to always serve stale data if it exists, and you can set it to 0 (or any value smaller than 'expires_min') to turn this feature off. This feature, as it is implemented currently, makes ->update() method a bit asymmetric with respect to process that acquired writers lock and those processes that didn't, which can be seen in the new test in t9503. The process that is to regenerate (refresh) data in cache must wait for the data to be generated in full before showing anything to client, while the other processes show stale (expired) data immediately. In order to remove or reduce this asymmetry gitweb would need to employ one of the two alternate solutions. Either data should be (re)generated in background, so that process that acquired writers lock would generate data in background while serving stale data, or alternatively the process that generates data should pass output to original STDOUT while capturing it ("tee" output). When developing this feature, ->is_valid() method acquired additional extra optional parameter, where one can pass expire time instead of using cache-wide global expire time.
To be implemented (from original patch by J.H.): * background building, * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 23 ++++++++++---- gitweb/gitweb.perl | 8 +++++ t/t9503/test_cache_interface.pl | 63 +++++++++++++++++++++++++++++++++++++- 3 files changed, 86 insertions(+), 8 deletions(-) commit 19911970b8a811a6382e39a10b071bff1dd4bd70 Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:20:46 2010 +0100 gitweb/cache.pm - Regenerate (refresh) cache in background This commit removes asymmetry in serving stale data (if it exists) when regenerating cache in GitwebCache::SimpleFileCache. The process that acquired exclusive (writers) lock, and is therefore selected to be the one that (re)generates data to fill the cache, can now generate data in background, while serving stale data. This feature can be enabled or disabled on demand via 'background_cache' cache parameter. It is turned on by default. To be implemented (from original patch by J.H.): * server-side progress indicator when waiting for filling cache, which in turn requires separating situations (like snapshots and other non-HTML responses) where we should not show 'please wait' message Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 36 +++++++++++++++++++++++++++++------- gitweb/gitweb.perl | 9 +++++++++ t/t9503/test_cache_interface.pl | 14 ++++++++------ 3 files changed, 46 insertions(+), 13 deletions(-) commit ce97bb5bc1660f6d5c9b9be68c556ac94097978c Author: Jakub Narebski <jnareb@gmail.com> Date: Sun Feb 7 13:21:10 2010 +0100 gitweb: Show appropriate "Generating..."
page when regenerating cache When there exist stale/expired (but not too stale) version of (re)generated page in cache, gitweb returns stale version (and updates cache in background, assuming 'background_cache' is set to true value). When there is no stale version suitable to serve the client, currently we have to wait for the data to be generated in full before showing it. Add to GitwebCache::SimpleFileCache, via 'generating_info' callback, the ability to show user some activity indicator / progress bar, to show that we are working on generating data. Gitweb itself uses "Generating..." page as activity indicator, which redirects (via <meta http-equiv="Refresh" ...>) to refreshed version of the page after the cache is filled (via trick of not closing page and therefore not closing connection till data is available in cache, checked by getting shared/readers lock on lockfile for cache entry). The git_generating_data_html() subroutine, which is used by gitweb to implement this feature, is highly configurable: you can choose initial delay, frequency of writing some data so that connection won't get closed, and maximum time to wait for data in "Generating..." page (see %generating_options hash). Currently git_generating_data_html() contains hardcoded "whitelist" of actions for which such HTML "Generating..." page makes sense. This implements final feature from the original gitweb output caching patch by J.H. Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org> Signed-off-by: Jakub Narebski <jnareb@gmail.com> gitweb/cache.pm | 23 +++++- gitweb/gitweb.perl | 154 ++++++++++++++++++++++++++++++++++++++- t/t9503/test_cache_interface.pl | 45 +++++++++++ 3 files changed, 216 insertions(+), 6 deletions(-) -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
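The 'max_lifetime' rules described in the first of these commit messages can be illustrated with a small sketch. This is not the code from gitweb/cache.pm; the helper name may_serve_stale() and its two arguments are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from gitweb/cache.pm.
#   $age          - seconds since the cache entry was last written
#   $max_lifetime - value of the 'max_lifetime' cache parameter
sub may_serve_stale {
    my ($age, $max_lifetime) = @_;

    # A negative value (-1) means stale data is always acceptable.
    return 1 if $max_lifetime < 0;

    # Otherwise stale data is acceptable only while it is younger than
    # 'max_lifetime'.  With 0 this is never true, so the feature is off.
    return $age < $max_lifetime ? 1 : 0;
}

print may_serve_stale(600, -1),   "\n";
print may_serve_stale(600, 3600), "\n";
print may_serve_stale(600, 0),    "\n";
```

Under this reading, a 'max_lifetime' smaller than the expiry time also disables the feature without a special case, because an entry that has already expired is necessarily older than such a limit.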
[parent not found: <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>]
* Re: [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching) [not found] ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com> @ 2010-01-23 0:48 ` Jakub Narebski 2010-02-07 21:32 ` Jakub Narebski 1 sibling, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 0:48 UTC (permalink / raw) To: git; +Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley As you can (or rather can't ;-)) see this patch didn't make it to the list, because at 119,993 characters of format-patch output it probably exceeds the 100,000 character message size limit on VGER. The problem is that it doesn't make sense to send partial patch... well, perhaps reindent and breaking of exceedingly long lines should be split into separate patch... For now you can view the patch via gitweb http://repo.or.cz/w/git/jnareb-git.git/commitdiff/0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557 and apply it from http://repo.or.cz/w/git/jnareb-git.git/patch/0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557 The comment for this message (for this patch) can be seen below. -- >8 -- On Sat, 23 Jan 2010, Jakub Narebski wrote: > This means replacing > > print <something>; > by > print {$out} <something>; > > and > > binmode STDOUT, <layer>; > by > binmode $out, <layer>; > > where $out is global variable set to \*STDOUT at the beginning of > gitweb, but after reading gitweb config. This way it would be simple > to e.g. tie output filehandle or use PerlIO layers to simultaneously > write to standard output and to some specified file (like "tee" > utility does), or redirect output to a scalar, or a file. > > die_error (re)sets $out to \*STDOUT; we would (probably) want to treat > errors in a special way, and do not cache them. > > > The only other differences are reindent of continued lines (if > needed), and sometimes word-wrapping lines which this change made too > long. 
> > Signed-off-by: Jakub Narebski <jnareb@gmail.com> > --- > This patch is meant as (straight) replacement for the following patch > by J.H. (John 'Warthog9' Hawley): > * [PATCH 8/9] gitweb: Convert output to using indirect file handle > Message-ID: <1263432185-21334-9-git-send-email-warthog9@eaglescrag.net> > http://permalink.gmane.org/gmane.comp.version-control.git/136915 > > Actually this patch precedes (was written before) the patch by J.H. > > This patch was written _before_ comment from Junio that it would be > better to simply use > print $out <something>; > and do not try to be too clever. > > > Differences from patch by J.H.: > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > * Only one output handle, instead of having one output handle for text, > (':utf8'), and another output handler for binary files (:raw). > > I do assume that I can write to handler with appropriate layer: > :utf8/:raw, and then I can simply read from cache file in :raw > binmode, as the data is already converted correctly. > > * Shorter name for output handle: $out instead of $output_handler > > * Set output handler to \*STDOUT (default value in declaration of this > variable, and also in die_error() subroutine), instead of *STDOUT. > This way $out is indirect filehandle, instead of using direct > filehandle which is _global_ to current package (see perlopentut(1)). > > * Reindent continuation lines, i.e. > > print <line1> . > <line2>; > > got replaced (reindented) with > > print {$out} <line1> . > <line2>; > > In some places lines were broken into two, when after indent the > line got too long. > > * Slightly different replacement for printf -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching) [not found] ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com> 2010-01-23 0:48 ` [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching) Jakub Narebski @ 2010-02-07 21:32 ` Jakub Narebski 1 sibling, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-02-07 21:32 UTC (permalink / raw) To: git On Sat, 23 Jan 2010, Jakub Narebski wrote: > This means replacing > > print <something>; > by > print {$out} <something>; > > and > > binmode STDOUT, <layer>; > by > binmode $out, <layer>; [...] > This patch was written _before_ comment from Junio that it would be > better to simply use > print $out <something>; > and do not try to be too clever. Actually 'print FILEHANDLE LIST' form is an example of indirect object syntax in Perl (which for filehandles was present before there were objects in Perl). In indirect object notation the "first argument" must be a name, a scalar variable, or a block (like {$out}). If there were something more complicated in place of $out, we would have to use the block form ('print {$out[$i]} <sth>', for example). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
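The rule about the filehandle slot of 'print FILEHANDLE LIST' can be tried out with a short sketch. The in-memory files and the @handles array below are assumptions chosen only to keep the example self-contained; they do not appear in gitweb.

```perl
use strict;
use warnings;

# In-memory files (core Perl) stand in for real output streams.
my ($buf_a, $buf_b) = ('', '');
open my $out,   '>', \$buf_a or die "cannot open in-memory file: $!";
open my $other, '>', \$buf_b or die "cannot open in-memory file: $!";

# A simple scalar variable works bare or wrapped in a block.
# Note that there is no comma after the filehandle in either form.
print $out   "plain\n";
print {$out} "block\n";

# Anything more complex than a simple scalar needs the block form.
my @handles = ($out, $other);
print { $handles[1] } "from array\n";

close $_ for @handles;
```

After the handles are closed, $buf_a holds both lines written through $out and $buf_b holds the line written through the array element, so the two spellings of the simple case are interchangeable.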
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-14 1:23 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley 2010-01-14 1:23 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley @ 2010-01-16 0:43 ` Jakub Narebski 2010-01-16 0:58 ` Junio C Hamano 2010-01-24 22:14 ` Petr Baudis 1 sibling, 2 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-16 0:43 UTC (permalink / raw) To: John 'Warthog9' Hawley; +Cc: git "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: > This converts the output handling of gitweb to using an indirect > file handle. This is in preparation to add the caching layer. This > is a slight modification to the way I was originally doing it by > passing the output around. This should be a nop and this shouldn't > change the behavior of gitweb. This does leave error reporting > functions (die_error specifically) continuing to output directly > as I want to garauntee those will report their errors regardless of > what may be going on with respect to the rest of the output. Signoff? Compare with my version of this patch: http://repo.or.cz/w/git/jnareb-git.git/commitdiff/0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557 in the gitweb/cache-kernel branch: http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel My commit message looks like the following: .... gitweb: Print to explicit filehandle (preparing for caching) This means replacing print <something>; by print {$out} <something>; and binmode STDOUT, <layer>; by binmode $out, <layer>; where $out is global variable set to \*STDOUT at the beginning of gitweb, but after reading gitweb config. This way it would be simple to e.g. tie output filehandle or use PerlIO layers to simultaneously write to standard output and to some specified file (like "tee" utility does), or redirect output to a scalar, or a file. 
die_error (re)sets $out to \*STDOUT; we would (probably) want to treat errors in a special way, and do not cache them. The only other differences are reindent of continued lines (if needed), and sometimes word-wrapping lines which this change made too long. Signed-off-by: Jakub Narebski <jnareb@gmail.com> .... > --- > gitweb/gitweb.perl | 880 ++++++++++++++++++++++++++-------------------------- > 1 files changed, 448 insertions(+), 432 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index c4a177d..8bb323c 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -450,6 +450,13 @@ our %feature = ( > 'default' => [0]}, > ); > > +# Basic file handler for all of gitweb, there are two of them. The first > +# is the basic text/html file handler which is used for everything other > +# then the binary files, that uses a separate file handler though > +# these are both set to STDOUT for the time being. > +our $output_handler = *STDOUT; > +our $output_handler_bin = *STDOUT; > + First it is not file handleR, but filehandle. Second, there is no need for separate filehandle for binary files, if you do it correctly (i.e. call binmode on filehandle, and not on STDOUT). When caching is enabled, and 'print {$output_handle} <sth>' prints to in-memory file (or even directly to cache file) it would do conversion, so when reading from cache file we can dump it raw, in binary mode. Third, wouldn't it be better to use shorter variable name, e.g. $out or $oh, instead of $output_handle? We would be able to align print(f) statements without making lines much longer. Fourth, there is slight difference between our $out = *STDOUT; and our $out = \*STDOUT; In the former we have global variable, in latter we have indirect filehandle. CGI::Cache uses the latter form, IIRC. > @@ -3313,7 +3320,7 @@ EOF > if ($use_pathinfo) { > $action .= "/".esc_url($project); > } > - print $cgi->startform(-method => "get", -action => $action) . 
> + print {$output_handler} $cgi->startform(-method => "get", -action => $action) . > "<div class=\"search\">\n" . > (!$use_pathinfo && > $cgi->input({-name=>"p", -value=>$project, -type=>"hidden"}) . "\n") . Here for example after change gitweb source stops being nicely aligned. OTOH it makes for bigger patch. In my version I did realign. You can always check for true differences with "diff -w". -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 60+ messages in thread
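The second and fourth points of this review (one handle for text and binary output, and $out holding a reference to the glob) might look roughly like the sketch below. The emit() helper and the in-memory cache file are assumptions for illustration only; gitweb's real code prints in many places and caches to disk.

```perl
use strict;
use warnings;

# Reference to the glob: an indirect filehandle held in an ordinary variable.
our $out = \*STDOUT;

# Hypothetical stand-in for gitweb's many print statements.
sub emit { print {$out} @_ }

my $cache = '';
open my $cache_fh, '>', \$cache or die "cannot open in-memory file: $!";
{
    local $out = $cache_fh;   # redirect output for the duration of this block
    binmode $out, ':utf8';    # set the layer on the handle, not on STDOUT
    emit("caf\x{e9}\n");
}
close $cache_fh;

# $cache holds bytes that were encoded on the way in, so a later reader
# could dump them in :raw mode without converting again.
printf "%d bytes cached\n", length $cache;
```

Because the layer is applied to whatever $out currently points at, the same print statements serve both the direct and the cached path, which is the argument made above against a separate binary handle.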
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-16 0:43 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski @ 2010-01-16 0:58 ` Junio C Hamano 2010-01-16 1:14 ` Jakub Narebski 2010-01-24 22:14 ` Petr Baudis 1 sibling, 1 reply; 60+ messages in thread From: Junio C Hamano @ 2010-01-16 0:58 UTC (permalink / raw) To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git Jakub Narebski <jnareb@gmail.com> writes: > This means replacing > > print <something>; > by > print {$out} <something>; Just out of curiosity, how is this different from print $out <something>; ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-16 0:58 ` Junio C Hamano @ 2010-01-16 1:14 ` Jakub Narebski 2010-01-16 1:41 ` Junio C Hamano 0 siblings, 1 reply; 60+ messages in thread From: Jakub Narebski @ 2010-01-16 1:14 UTC (permalink / raw) To: Junio C Hamano; +Cc: John 'Warthog9' Hawley, git On Sat, 16 Jan 2010, Junio C Hamano wrote: > Jakub Narebski <jnareb@gmail.com> writes: > > > This means replacing > > > > print <something>; > > by > > print {$out} <something>; > > Just out of curiosity, how is this different from > > print $out <something>; Actually there is no difference. It doesn't matter one way or other in situations in gitweb. I have thought however (but I might be mistaken) that "print {$fh} <sth>" is idiomatic Perl. 'perldoc -f print' says: Note that if you're storing FILEHANDLES in an array or other expression, you will have to use a block returning its value instead: print { $files[$i] } "stuff\n"; print { $OK ? STDOUT : STDERR } "stuff\n"; Also, there is no "," between FILEHANDLE and LIST in "print FILEHANDLE LIST" -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-16 1:14 ` Jakub Narebski @ 2010-01-16 1:41 ` Junio C Hamano 0 siblings, 0 replies; 60+ messages in thread From: Junio C Hamano @ 2010-01-16 1:41 UTC (permalink / raw) To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git Jakub Narebski <jnareb@gmail.com> writes: > I have thought however (but I might be mistaken) that "print {$fh} <sth>" > is idiomatic Perl. > > 'perldoc -f print' says: > Note that if you're storing FILEHANDLES in an array or other expression, > you will have to use a block returning its value instead: Note that "in an array or other expression". I've always thought the intention of this phrase was "you _could_ help the parser by doing this, if you have expression more complex than a simple scalar variable reference". IOW, I know that {} _can_ be used there, but I haven't seen people write {$a_single_variable}, especially without a space around the "expression" (technically, a single variable is an expression), when print $fh <stuff> suffices, and I was curious why you chose to use the syntax when it wasn't necessary. Besides, {$fh} looks so eh... (hesitates to mention a dirty word ^W^W^Wthe name of a different language, but bleeps it out)... ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-16 0:43 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski 2010-01-16 0:58 ` Junio C Hamano @ 2010-01-24 22:14 ` Petr Baudis 2010-01-25 1:47 ` Jakub Narebski 1 sibling, 1 reply; 60+ messages in thread From: Petr Baudis @ 2010-01-24 22:14 UTC (permalink / raw) To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git On Fri, Jan 15, 2010 at 04:43:32PM -0800, Jakub Narebski wrote: > Third, wouldn't it be better to use shorter variable name, e.g. $out > or $oh, instead of $output_handle? We would be able to align print(f) > statements without making lines much longer. I totally agree, I was going to comment on this too. Please use something very short, I think even the three characters might be pushing it a bit. Having long idiom will make it real pain to both read and write gitweb code. It would still be nice if we could have some way to avoid this hack entirely. Couldn't we layer PerlIO::via over STDOUT and continue to use argument-less print as before? I think we (well, in practice, "you" nowadays :) should be less perfectionist when accepting gitweb contributions, so if the answer is "it's too hard for me to do right now", that's fine I guess; I know I can't contribute the code currently. :( Petr "Pasky" Baudis ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-24 22:14 ` Petr Baudis @ 2010-01-25 1:47 ` Jakub Narebski 2010-01-25 20:48 ` J.H. 0 siblings, 1 reply; 60+ messages in thread From: Jakub Narebski @ 2010-01-25 1:47 UTC (permalink / raw) To: Petr Baudis, John 'Warthog9' Hawley Cc: John 'Warthog9' Hawley, git On Sun, Jan 24, 2010 at 23:14 +0100, Petr Baudis wrote: > On Fri, Jan 15, 2010 at 04:43:32PM -0800, Jakub Narebski wrote: > > Third, wouldn't it be better to use shorter variable name, e.g. $out > > or $oh, instead of $output_handle? We would be able to align print(f) > > statements without making lines much longer. > > I totally agree, I was going to comment on this too. Please use > something very short, I think even the three characters might be pushing > it a bit. Having long idiom will make it real pain to both read and > write gitweb code. > > It would still be nice if we could have some way to avoid this hack > entirely. Couldn't we layer PerlIO::via over STDOUT and continue to use > argument-less print as before? First, the original version in v2 series, the one that can be found in 42641b1 (GITWEB - File based caching layer, 2009-12-10) commit in git://git.kernel.org/pub/scm/git/warthog9/gitweb.git gitweb-ml-v2 http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v2 the change was even larger. It was print <something>; replaced by $output .= <something>; and of course make functions that printed sth return formatted output instead. Also with original version you paid the price even if caching was turned off (not used): the whole output had to fit in memory, and there was no streaming, which meant that the whole output had to be generated before it could be displayed (decreasing latency). Second, PerlIO::via is not a good solution. 
One solution would be to require Capture::Tiny if caching is enabled, and just do $out = capture { $actions{$action}->(); } or even simply (if it is possible with the way 'capture' is prototyped) $out = capture $actions{$action}; Another solution would be to use PerlIO::Util to redirect output to scalar: *STDOUT->push_layer(scalar => \$buffer); # ... *STDOUT->pop_layer(); Yet another solution would be to manipulate *STDOUT directly $stdout = *STDOUT; *STDOUT = $buffer_fh; ... *STDOUT = $stdout; for which, I think, to be done correctly we would have to duplicate (large?) parts of Capture::Tiny. So there is a bit of rule of preservation of difficulty at work. Either we have large patch adding explicit filehandle to all print statements 'print <sth> -> print $out <sth>' but simple code, or have smaller patch but complicated *STDOUT manipulation, or have small patch but rely on non-core CPAN modules present. > I think we (well, in practice, "you" nowadays :) should be less > perfectionist when accepting gitweb contributions, so if the answer is > "it's too hard for me to do right now", that's fine I guess; I know I > can't contribute the code currently. :( True. At least two series ('categories' which I have copy locally in http://repo.or.cz/w/git/jnareb-git.git as gitweb/category, and 'committags' by Marcel M. Cary[1]) that are not yet in gitweb because of that... [1]: http://thread.gmane.org/gmane.comp.version-control.git/100418/focus=133141 -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
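For comparison with the options listed in this message, here is a minimal core-only sketch of the "manipulate *STDOUT directly" variant. The capture_stdout() helper is hypothetical and is not a replacement for Capture::Tiny: it only catches output printed by Perl code in the same process, not output written by child processes to file descriptor 1.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with STDOUT temporarily pointing at an
# in-memory file, and return whatever was printed.
sub capture_stdout {
    my ($code) = @_;
    my $buffer = '';
    {
        local *STDOUT;    # the real STDOUT is restored when this block ends
        open STDOUT, '>', \$buffer or die "cannot redirect STDOUT: $!";
        $code->();
        close STDOUT;
    }
    return $buffer;
}

# Argument-less print statements stay unchanged inside the callback.
my $page = capture_stdout(sub { print "<html>...</html>\n" });
print "captured ", length($page), " bytes\n";
```

The sketch keeps the print statements untouched, but error handling, nested captures and output from external commands are exactly the parts that would need the extra care mentioned above.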
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-25 1:47 ` Jakub Narebski @ 2010-01-25 20:48 ` J.H. 2010-01-25 21:48 ` Jakub Narebski 0 siblings, 1 reply; 60+ messages in thread From: J.H. @ 2010-01-25 20:48 UTC (permalink / raw) To: Jakub Narebski; +Cc: Petr Baudis, John 'Warthog9' Hawley, git > So there is a bit of rule of preservation of difficulty at work. Either > we have large patch adding explicit filehandle to all print statements > 'print <sth> -> print $out <sth>' but simple code, or have smaller patch > but complicated *STDOUT manipulation, or have small patch but rely on > non-core CPAN modules present. I think depending on non-core CPAN modules is a really bad idea, and will cause some concern and consternation with mainline distributions, besides making it more complicated for users to get this up and running quickly and easily. While I agree there are other ways of handling this I think just adding the filehandle to the print statements provides the easiest cross section of usability and functionality for everything involved. - John 'Warthog9' Hawley ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle 2010-01-25 20:48 ` J.H. @ 2010-01-25 21:48 ` Jakub Narebski 0 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-25 21:48 UTC (permalink / raw) To: J.H.; +Cc: Petr Baudis, John 'Warthog9' Hawley, git On Mon, 25 Jan 2010, J.H. wrote: > > So there is a bit of rule of preservation of difficulty at work. Either > > we have large patch adding explicit filehandle to all print statements > > 'print <sth> -> print $out <sth>' but simple code, or have smaller patch > > but complicated *STDOUT manipulation, or have small patch but rely on > > non-core CPAN modules present. > > I think depending in non-core CPAN modules is a really bad idea, and > will cause some concern and consternation with mainline distributions, > besides making it more complicated for users to get this up and running > quickly and easily. Note: the modules in question are Capture::Tiny or PerlIO::Util. Capture::Tiny is available as perl-Capture-Tiny RPM module for Fedora Core up from FC11 and in DAG repository, at least. First, installing non-core CPAN modules locally for user is very simple thanks to local::lib. After setup it is simply 'cpan -i PerlIO::Util', and PerlIO::Util gets installed locally, by default inside ~/perl5. Second, we can go the route that Git.pm did, namely include sources for extra modules (private-Error.pm in case of Git.pm) and install it if needed. With proper build system (Module::Install, Module::Build) it is even simpler with inc or inc::latest. And there is always PAR[1] if you want to have everything in one file. [1]: http://par.perl.org > While I agree there are other ways of handling this > I think just adding the filehandle to the print statements provides the > easiest cross section of usability and functionality for everything > involved. Well, manipulating *STDOUT shouldn't be, I think, much harder, although I guess it would need more care (so that e.g. "Generating..." 
page gets written to real STDOUT). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header 2010-01-14 1:23 ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header John 'Warthog9' Hawley 2010-01-14 1:23 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley @ 2010-01-15 23:49 ` Jakub Narebski 1 sibling, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-15 23:49 UTC (permalink / raw) To: John 'Warthog9' Hawley; +Cc: git "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: I have modified commit message a bit > If $site_header is not defined you get extraneous errors in the web > logs: > > [Wed Jan 13 16:55:42 2010] [error] [client ::1] [Wed Jan 13 16:55:42 2010] > gitweb.cgi: Use of uninitialized value $site_header in -f at .../gitweb.cgi line 3287., > referer: http://git/gitweb.cgi > > for example (line wrapped for better readability). This commit > ensures that the variable is defined before trying to use it. Ordinarily build procedure ensures that $site_header is defined (but empty, therefore false-ish), so this issue might happen only because of errors in gitweb config file. Nevertheless it is better to code defensively. Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org> > --- > gitweb/gitweb.perl | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 07fdeb5..c4a177d 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3277,7 +3277,7 @@ EOF > print "</head>\n" . > "<body>\n"; > > - if (-f $site_header) { > + if ($site_header && -f $site_header) { It should be, I think + if (defined $site_header && -f $site_header) { although I guess that nobody would name $site_header file "0". > insert_file($site_header); > } > > -- More important is the fact that it is not the only variable holding file name, that is not checked that it is defined before use. 
You should do the same change also for $site_footer and $home_text, and die_error(500, "Gitweb misconfigured") or just die_error(500) if $projects_list is not defined. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 60+ messages in thread
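The defensive check suggested here could be factored into one helper and applied to every file-name variable. The following is only a sketch: usable_file() is a hypothetical name, and the temporary file merely stands in for a configured header file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true only for a defined path naming an existing plain
# file.  Testing definedness first avoids the "Use of uninitialized value"
# warning, and unlike a plain truth test it does not reject a file named "0".
sub usable_file {
    my ($path) = @_;
    return (defined $path && -f $path) ? 1 : 0;
}

# Stand-in for a configured $site_header file.
my ($fh, $existing) = tempfile(UNLINK => 1);

print usable_file(undef),     "\n";   # undefined setting, no warning
print usable_file(''),        "\n";   # empty build-time default
print usable_file($existing), "\n";   # a real file
```

With such a helper, the same test could guard $site_header, $site_footer and $home_text before insert_file() is called.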
* Re: [PATCH 5/9] gitweb: add a get function to compliment print_local_time 2010-01-14 1:23 ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time John 'Warthog9' Hawley 2010-01-14 1:23 ` [PATCH 6/9] gitweb: add a get function to compliment print_sort_th John 'Warthog9' Hawley @ 2010-01-23 11:13 ` Jakub Narebski 1 sibling, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-23 11:13 UTC (permalink / raw) To: John 'Warthog9' Hawley; +Cc: git "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: Should authorship be associated with eaglescrag.net or kernel.org? > This adds a get function for print_local_time so that the basic > function can be used outside of their straight printing operation. Signoff? > --- > gitweb/gitweb.perl | 11 +++++++++-- > 1 files changed, 9 insertions(+), 2 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index b41bc33..d38aad6 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3537,14 +3537,21 @@ sub git_print_header_div { > } > > sub print_local_time { > + print get_local_time(@_); > +} > + > +sub get_local_time { This new function should be rather named 'format_local_time', to follow gitweb convention of naming subroutines. (And of course commit message needs to be changed accordingly.) This comment applies also to the following patch. 
> + my $localtime = ""; > my %date = @_; > if ($date{'hour_local'} < 6) { > - printf(" (<span class=\"atnight\">%02d:%02d</span> %s)", > + $localtime .= sprintf(" (<span class=\"atnight\">%02d:%02d</span> %s)", > $date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'}); > } else { > - printf(" (%02d:%02d %s)", > + $localtime .= sprintf(" (%02d:%02d %s)", > $date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'}); > } > + > + return $localtime Style (terminate statement with ';'): + return $localtime; > } > > # Outputs the author name and date in long form > -- > 1.6.5.2 > -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 60+ messages in thread
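The format_*/print_* split requested in this review can be sketched as follows. This is a compact variant written for illustration, not the patch's code: it builds the HH:MM string once and wraps it in the "atnight" span only when needed, using the same %date keys as the patch.

```perl
use strict;
use warnings;

# Returns the formatted local time as a string (sketch only).
sub format_local_time {
    my %date = @_;
    my $time = sprintf("%02d:%02d", $date{'hour_local'}, $date{'minute_local'});

    # Mark times before 6:00 local time, as the original code does.
    $time = qq(<span class="atnight">$time</span>) if $date{'hour_local'} < 6;

    return " ($time $date{'tz_local'})";
}

# The printing variant becomes a thin wrapper around the formatter.
sub print_local_time {
    print format_local_time(@_);
}

print_local_time(hour_local => 14, minute_local => 7, tz_local => '+0100');
print "\n";
```

Keeping the formatter free of side effects is what lets the caching code, or any other caller, reuse the string without going through STDOUT.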
* Re: [PATCH 3/9] gitweb: Add option to force version match 2010-01-14 1:22 ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley 2010-01-14 1:23 ` [PATCH 4/9] gitweb: Makefile improvements John 'Warthog9' Hawley @ 2010-01-15 23:36 ` Jakub Narebski 2010-01-24 21:59 ` Petr Baudis 2 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-15 23:36 UTC (permalink / raw) To: John 'Warthog9' Hawley; +Cc: git "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: > This adds $git_versions_must_match variable, which is set to true, > checks that we are running on the same version of git that we > shipped with, and if not throw '500 Internal Server Error' error. > What is checked is the version of gitweb (embedded in building > gitweb.cgi), against version of runtime git binary used. > > Gitweb can usually run with a mismatched git install. This is more > here to give an obvious warning as to whats going on vs. silently > failing. > > By default this feature is turned on. If this feature is turned on, then I would prefer for gitweb tests to have it explicitly turned off, so I don't need to rebuild git to test gitweb. Also it would be nice to have some tests for this new feature. See patch below (proposed to be squashed with this one). P.S. It is there that I noticed the issue with undefined $action in git_footer_html(), mentioned in reply to patch 1/9. > Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org> > Signed-off-by: Jakub Narebski <jnareb@gmail.com> > --- Here could be information about differences from previous versions of this patch. 
> gitweb/README | 3 +++ > gitweb/gitweb.perl | 23 +++++++++++++++++++++++ > 2 files changed, 26 insertions(+), 0 deletions(-) > > diff --git a/gitweb/README b/gitweb/README > index 6c2c8e1..03151d2 100644 > --- a/gitweb/README > +++ b/gitweb/README > @@ -233,6 +233,9 @@ not include variables usually directly set during build): > If server load exceed this value then return "503 Service Unavaliable" error. > Server load is taken to be 0 if gitweb cannot determine its value. Set it to > undefined value to turn it off. The default is 300. > + * $git_versions_must_match > + If set, gitweb fails with 500 Internal Server Error if the version of gitweb > + doesn't match version of git binary. The default is true. > > > Projects list file format > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 8298de5..b41bc33 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -221,6 +221,9 @@ our %avatar_size = ( > 'double' => 32 > ); > > +# If it is true, exit if gitweb version and git binary version don't match > +our $git_versions_must_match = 1; > + > # Used to set the maximum load that we will still respond to gitweb queries. > # If server load exceed this value then return "503 server busy" error. > # If gitweb cannot determined server load, it is taken to be 0. > @@ -587,6 +590,26 @@ if (defined $maxload && get_loadavg() > $maxload) { > our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown"; > $number_of_git_cmds++; > > +# Throw an error if git versions does not match, if $git_versions_must_match is true. > +if ($git_versions_must_match && > + $git_version ne $version) { > + my $admin_contact = > + defined $ENV{'SERVER_ADMIN'} ? 
", $ENV{'SERVER_ADMIN'}," : ''; > + my $err_msg = <<EOT; > +<h1 align="center">*** Warning ***</h1> > +<p> > +This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>, > +however git version <b>@{[esc_html($git_version)]}</b> was found on server, > +and administrator requested strict version checking. > +</p> > +<p> > +Please contact the server administrator${admin_contact} to either configure > +gitweb to allow mismatched versions, or update git or gitweb installation. > +</p> > +EOT > + die_error(500, 'Internal server error', $err_msg); > +} > + > $projects_list ||= $projectroot; > > # ====================================================================== diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh index 76d8b7b..d9ffc90 --- a/t/gitweb-lib.sh +++ b/t/gitweb-lib.sh @@ -25,6 +25,7 @@ our \$favicon = 'file:///$TEST_DIRECTORY/../gitweb/git-favicon.png'; our \$projects_list = ''; our \$export_ok = ''; our \$strict_export = ''; +our \$git_versions_must_match = 0; EOF diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh index 0688a57..721900e 100755 --- a/t/t9501-gitweb-standalone-http-status.sh +++ b/t/t9501-gitweb-standalone-http-status.sh @@ -113,5 +113,31 @@ test_expect_success 'snapshots: bad object id' ' ' test_debug 'cat gitweb.output' +# ====================================================================== +# check $git_versions_must_match feature +# should be last section, just in case +cp -f gitweb_config.perl gitweb_config.perl.bak +echo 'our $git_versions_must_match = 1;' >>gitweb_config.perl + +cat <<\EOF >>gitweb_config.perl +our $version = "current"; +EOF +test_expect_success 'force version match: no match' ' + gitweb_run "p=.git" && + grep "500 - Internal Server Error" gitweb.output +' +test_debug 'cat gitweb.headers' + +cat <<\EOF >>gitweb_config.perl +# must be kept in sync with code in gitweb/gitweb.perl +our $version = qx("$GIT" --version) =~ m/git version (.*)$/ ? 
$1 : "unknown"; +EOF +test_expect_success 'force version match: match' ' + gitweb_run "p=.git" && + grep "Status: 200 OK" gitweb.output +' +test_debug 'cat gitweb.headers' + +mv -f gitweb_config.perl.bak gitweb_config.perl test_done > -- > 1.6.5.2 > -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply related [flat|nested] 60+ messages in thread
* Re: [PATCH 3/9] gitweb: Add option to force version match 2010-01-14 1:22 ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley 2010-01-14 1:23 ` [PATCH 4/9] gitweb: Makefile improvements John 'Warthog9' Hawley 2010-01-15 23:36 ` [PATCH 3/9] gitweb: Add option to force version match Jakub Narebski @ 2010-01-24 21:59 ` Petr Baudis 2010-01-24 23:17 ` Jakub Narebski 2 siblings, 1 reply; 60+ messages in thread From: Petr Baudis @ 2010-01-24 21:59 UTC (permalink / raw) To: John 'Warthog9' Hawley; +Cc: git Hi! On Wed, Jan 13, 2010 at 05:22:59PM -0800, John 'Warthog9' Hawley wrote: > +# Throw an error if git versions does not match, if $git_versions_must_match is true. > +if ($git_versions_must_match && > + $git_version ne $version) { > + my $admin_contact = > + defined $ENV{'SERVER_ADMIN'} ? ", $ENV{'SERVER_ADMIN'}," : ''; > + my $err_msg = <<EOT; > +<h1 align="center">*** Warning ***</h1> > +<p> > +This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>, > +however git version <b>@{[esc_html($git_version)]}</b> was found on server, > +and administrator requested strict version checking. > +</p> > +<p> > +Please contact the server administrator${admin_contact} to either configure > +gitweb to allow mismatched versions, or update git or gitweb installation. > +</p> > +EOT > + die_error(500, 'Internal server error', $err_msg); > +} > + I get very unhappy when *I* am the server administrator and read messages like this. ;-) Could you please mention the \$git_versions_must_match in the text? Petr "Pasky" Baudis ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 3/9] gitweb: Add option to force version match 2010-01-24 21:59 ` Petr Baudis @ 2010-01-24 23:17 ` Jakub Narebski 0 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-24 23:17 UTC (permalink / raw) To: Petr Baudis Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley, git Petr Baudis <pasky@suse.cz> writes: > On Wed, Jan 13, 2010 at 05:22:59PM -0800, John 'Warthog9' Hawley wrote: > > +# Throw an error if git versions does not match, if $git_versions_must_match is true. > > +if ($git_versions_must_match && > > + $git_version ne $version) { > > + my $admin_contact = > > + defined $ENV{'SERVER_ADMIN'} ? ", $ENV{'SERVER_ADMIN'}," : ''; > > + my $err_msg = <<EOT; > > +<h1 align="center">*** Warning ***</h1> > > +<p> > > +This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>, > > +however git version <b>@{[esc_html($git_version)]}</b> was found on server, > > +and administrator requested strict version checking. +and administrator requested strict version checking by setting +\$git_versions_must_match to false value (e.g. to 0) in gitweb configuration file, +'@{[esc_path(-e $GITWEB_CONFIG ? $GITWEB_CONFIG : $GITWEB_CONFIG_SYSTEM)]}'. > > +</p> > > +<p> > > +Please contact the server administrator${admin_contact} to either configure > > +gitweb to allow mismatched versions, or update git or gitweb installation. > > +</p> > > +EOT > > + die_error(500, 'Internal server error', $err_msg); > > +} > > + > > I get very unhappy when *I* am the server administrator and read > messages like this. ;-) Could you please mention the > \$git_versions_must_match in the text? Something like the text (extension) proposed above? Actually, J.H., if you are changing the default, you should also change the message to mention that checking versions is the default, and administrator can turn it off[1]. It is incorrect if $git_versions_must_match defaults to true. 
[1] And you have to take into account that gitweb config file can not exist, and neither $GITWEB_CONFIG nor $GITWEB_CONFIG_SYSTEM needs to be non-empty. J.H., are you really, really sure that it is a good idea to have version check turned on by default, in light of this issue? -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 60+ messages in thread
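A minimal sketch of the footnote above, for illustration only and not part of any patch in this thread: the error text can name the controlling variable even when no configuration file exists. The helpers config_file_hint() and version_check_hint() are hypothetical names, and unlike gitweb the sketch does not escape the path (gitweb would use esc_path).

```perl
use strict;
use warnings;

# Hypothetical helper: pick the config file worth naming in the error
# message, or return nothing when no candidate is both set and present.
sub config_file_hint {
	my @candidates = @_;
	for my $file (@candidates) {
		next unless defined $file && length $file;
		return $file if -e $file;
	}
	return;
}

# Hypothetical helper: build the sentence that tells the administrator
# which variable controls the check (no HTML/path escaping in this sketch).
sub version_check_hint {
	my $config = config_file_hint(@_);
	my $where  = defined $config
		? "in the gitweb configuration file, '$config'"
		: "in the gitweb configuration (no configuration file was found)";
	return "Strict version checking is controlled by " .
	       "\$git_versions_must_match; set it to a false value " .
	       "(e.g. 0) $where to allow mismatched versions.";
}

# Neither candidate is usable here, so the fallback wording is chosen.
print version_check_hint(undef, ''), "\n";
```

In gitweb itself the candidates would presumably be $GITWEB_CONFIG and $GITWEB_CONFIG_SYSTEM, in that order.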
* Re: [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information 2010-01-14 1:22 ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information John 'Warthog9' Hawley 2010-01-14 1:22 ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley @ 2010-01-15 22:40 ` Jakub Narebski 1 sibling, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-15 22:40 UTC (permalink / raw) To: John 'Warthog9' Hawley; +Cc: git "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: Don't you want the kernel.org address for authorship as well? Also, the commit summary can be written in a shorter way; see the proposal below. From: "John 'Warthog9' Hawley" <warthog9@kernel.org> Subject: gitweb: Allow for longer error explanation in die_error() > This is a small change that just adds a 3rd, optional, parameter to die_error > that allows for extended error information to be output along with what the > error was. Signed-off-by: "John 'Warthog9' Hawley" <warthog9@kernel.org> > --- > gitweb/gitweb.perl | 8 +++++++- > 1 files changed, 7 insertions(+), 1 deletions(-) > > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 0a07d3a..8298de5 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -3391,6 +3391,7 @@ sub git_footer_html { > sub die_error { > my $status = shift || 500; > my $error = shift || "Internal server error"; > + my $extra = shift; > > my %http_responses = ( > 400 => '400 Bad Request', > @@ -3405,8 +3406,13 @@ sub die_error { > <br /><br /> > $status - $error > <br /> > -</div> > EOF > + if (defined $extra) { > + print "<hr />\n" . > + "$extra\n"; Following gitweb whitespace convention (tab for indent, space for align), it should be: + print "<hr />\n" . + "$extra\n"; (6 x space in place of last tab). > + } > + print "</div>\n"; > + > git_footer_html(); > exit; > } > -- > 1.6.5.2 > -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 60+ messages in thread
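A self-contained sketch of the optional third argument discussed in this message, for illustration only. It is not the gitweb code: format_error_body() is a hypothetical stand-in that returns the HTML as a string instead of printing it and calling exit, so the effect of "extra" can be checked in isolation. The caller is assumed to pass "extra" as ready-made HTML, as the version-match patch does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the body-printing part of die_error():
# returns the HTML fragment instead of printing it and exiting.
sub format_error_body {
	my $status = shift || 500;
	my $error  = shift || "Internal server error";
	my $extra  = shift;    # optional longer explanation, already HTML

	my $html = <<EOF;
<div class="page_body">
<br /><br />
$status - $error
<br />
EOF
	# The extended explanation is separated from the short error by a rule.
	if (defined $extra) {
		$html .= "<hr />\n" .
		         "$extra\n";
	}
	$html .= "</div>\n";
	return $html;
}

print format_error_body(503, "The load average on the server is too high");
print format_error_body(500, "Internal server error", "<p>Versions differ.</p>");
```

Callers that pass only two arguments get the same output as before, which is why the parameter can be added without touching existing die_error() call sites.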
* Re: [PATCH 1/9] gitweb: Load checking 2010-01-14 1:22 ` [PATCH 1/9] gitweb: Load checking John 'Warthog9' Hawley 2010-01-14 1:22 ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information John 'Warthog9' Hawley @ 2010-01-15 22:30 ` Jakub Narebski 1 sibling, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-15 22:30 UTC (permalink / raw) To: John 'Warthog9' Hawley; +Cc: git This one looks good, but while examining other patch in this series I have noticed rare situation where we would get Perl error with this patch. I have added fix-up for this issue, although I guess that better solution might be not to add any <script> element for git_footer_html called from die_error. I'm sorry I haven't noticed this earlier. "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: > From: John 'Warthog9' Hawley <warthog9@kernel.org> > > This changes slightly the behavior of gitweb, so that it verifies > that the box isn't inundated with before attempting to serve gitweb. > If the box is overloaded, it basically returns a 503 Server Unavailable > until the load falls below the defined threshold. This helps dramatically > if you have a box that's I/O bound, reaches a certain load and you > don't want gitweb, the I/O hog that it is, increasing the pain the > server is already undergoing. > > This behavior is controlled by $maxload configuration variable. > Default is a load of 300, which for most cases should never be hit. > Unset it (set it to undefined value, i.e. undef) to turn off checking. > > Currently it requires that '/proc/loadavg' file exists, otherwise the > load check is bypassed (load is taken to be 0). So platforms that do > not implement '/proc/loadavg' currently cannot use this feature. > (provisions are included for additional checks to be added by others) > While at it check that $action is defined before comparing it in git_footer_html() subroutine. 
Until this patch there was no direct or indirect (via die_error) invocation of git_footer_html() with $action undefined; each call was after dispatch, which sets $action to default value if it is undefined. Without this check there would be a Perl error ("Use of uninitialized value in string eq") if load is too high _and_ gitweb was invoked without action parameter explicitly set (e.g. for projects list). > Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org> > Signed-off-by: Jakub Narebski <jnareb@gmail.com> > --- > gitweb/README | 7 ++++++- > gitweb/gitweb.perl | 45 +++++++++++++++++++++++++++++++++++++++++---- > 2 files changed, 47 insertions(+), 5 deletions(-) > > diff --git a/gitweb/README b/gitweb/README > index e34ee79..6c2c8e1 100644 > --- a/gitweb/README > +++ b/gitweb/README > @@ -174,7 +174,7 @@ not include variables usually directly set during build): > Base URL for relative URLs in pages generated by gitweb, > (e.g. $logo, $favicon, @stylesheets if they are relative URLs), > needed and used only for URLs with nonempty PATH_INFO via > - <base href="$base_url>. Usually gitweb sets its value correctly, > + <base href="$base_url">. Usually gitweb sets its value correctly, > and there is no need to set this variable, e.g. to $my_uri or "/". > * $home_link > Target of the home link on top of all pages (the first part of view > @@ -228,6 +228,11 @@ not include variables usually directly set during build): > repositories from launching cross-site scripting (XSS) attacks. Set this > to true if you don't trust the content of your repositories. The default > is false. > + * $maxload > + Used to set the maximum load that we will still respond to gitweb queries. > + If server load exceeds this value then return "503 Service Unavailable" error. > + Server load is taken to be 0 if gitweb cannot determine its value. Set it to > + undefined value to turn it off. The default is 300. 
> > > Projects list file format > diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl > index 7e477af..0a07d3a 100755 > --- a/gitweb/gitweb.perl > +++ b/gitweb/gitweb.perl > @@ -221,6 +221,12 @@ our %avatar_size = ( > 'double' => 32 > ); > > +# Used to set the maximum load that we will still respond to gitweb queries. > +# If server load exceed this value then return "503 server busy" error. > +# If gitweb cannot determined server load, it is taken to be 0. > +# Leave it undefined (or set to 'undef') to turn off load checking. > +our $maxload = 300; > + > # You define site-wide feature defaults here; override them with > # $GITWEB_CONFIG as necessary. > our %feature = ( > @@ -551,6 +557,32 @@ if (-e $GITWEB_CONFIG) { > do $GITWEB_CONFIG_SYSTEM if -e $GITWEB_CONFIG_SYSTEM; > } > > +# Get loadavg of system, to compare against $maxload. > +# Currently it requires '/proc/loadavg' present to get loadavg; > +# if it is not present it returns 0, which means no load checking. > +sub get_loadavg { > + if( -e '/proc/loadavg' ){ > + open my $fd, '<', '/proc/loadavg' > + or return 0; > + my @load = split(/\s+/, scalar <$fd>); > + close $fd; > + > + # The first three columns measure CPU and IO utilization of the last one, > + # five, and 10 minute periods. The fourth column shows the number of > + # currently running processes and the total number of processes in the m/n > + # format. The last column displays the last process ID used. > + return $load[0] || 0; > + } > + # additional checks for load average should go here for things that don't export > + # /proc/loadavg > + > + return 0; > +} > + > +if (defined $maxload && get_loadavg() > $maxload) { > + die_error(503, "The load average on the server is too high"); > +} > + > # version of the core git binary > our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? 
$1 : "unknown"; > $number_of_git_cmds++; > @@ -3354,14 +3386,19 @@ sub git_footer_html { > # 500: The server isn't configured properly, or > # an internal error occurred (e.g. failed assertions caused by bugs), or > # an unknown error occurred (e.g. the git binary died unexpectedly). > +# 503: The server is currently unavailable (because it is overloaded, > +# or down for maintenance). Generally, this is a temporary state. > sub die_error { > my $status = shift || 500; > my $error = shift || "Internal server error"; > > - my %http_responses = (400 => '400 Bad Request', > - 403 => '403 Forbidden', > - 404 => '404 Not Found', > - 500 => '500 Internal Server Error'); > + my %http_responses = ( > + 400 => '400 Bad Request', > + 403 => '403 Forbidden', > + 404 => '404 Not Found', > + 500 => '500 Internal Server Error', > + 503 => '503 Service Unavailable', > + ); > git_header_html($http_responses{$status}); > print <<EOF; > <div class="page_body"> @@ -3354,7 +3354,8 @@ sub git_footer_html { } print qq!<script type="text/javascript" src="$javascript"></script>\n!; - if ($action eq 'blame_incremental') { + if (defined $action && + $action eq 'blame_incremental') { print qq!<script type="text/javascript">\n!. qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!. qq! "!. href() .qq!");\n!. > -- > 1.6.5.2 > -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 60+ messages in thread
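For illustration, a standalone sketch of the load check reviewed above, split so that the parsing can be exercised without /proc. parse_loadavg() and is_overloaded() are hypothetical helper names, and get_loadavg() takes an optional file argument that the patch does not have. On Linux the first three fields of /proc/loadavg are the 1-, 5- and 15-minute load averages; only the first one is used, as in the patch.

```perl
use strict;
use warnings;

# Parse one line in /proc/loadavg format and return the 1-minute load
# average, or 0 if the line is missing or malformed.
sub parse_loadavg {
	my ($line) = @_;
	return 0 unless defined $line;
	my @load = split ' ', $line;
	return 0 unless @load && $load[0] =~ /^\d+(?:\.\d+)?$/;
	return $load[0] + 0;
}

# Read the load from a file (default '/proc/loadavg'); returns 0 when the
# file cannot be opened, which effectively disables the check.
sub get_loadavg {
	my $file = shift;
	$file = '/proc/loadavg' unless defined $file;
	open my $fd, '<', $file
		or return 0;
	my $line = <$fd>;
	close $fd;
	return parse_loadavg($line);
}

# True when load checking is enabled ($maxload defined) and exceeded.
sub is_overloaded {
	my ($load, $maxload) = @_;
	return (defined $maxload && $load > $maxload) ? 1 : 0;
}

my $maxload = 300;
my $load    = get_loadavg();
printf "load %.2f, overloaded: %d\n", $load, is_overloaded($load, $maxload);
```

In gitweb the overloaded case would call die_error(503, ...) rather than print a line; a platform without /proc/loadavg simply reports a load of 0 here.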
* Re: [PATCH 0/9] Gitweb caching v5 2010-01-14 1:22 [PATCH 0/9] Gitweb caching v5 John 'Warthog9' Hawley 2010-01-14 1:22 ` [PATCH 1/9] gitweb: Load checking John 'Warthog9' Hawley @ 2010-01-15 1:40 ` Jakub Narebski 2010-01-15 4:29 ` J.H. 1 sibling, 1 reply; 60+ messages in thread From: Jakub Narebski @ 2010-01-15 1:40 UTC (permalink / raw) To: John 'Warthog9' Hawley, John 'Warthog9' Hawley; +Cc: git "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: > Afternoon everyone, > > This is the latest incarnation of gitweb w/ caching. This is > finally at the point where it should probably start either being > considered for inclusion or mainline, or I need to accept that this > will never get in and more perminantely fork (as is the case with > Fedora where this is going in as gitweb-caching as a parrallel rpm > package). > > That said this brings the base up to mainline (again), Could you tell us which commit is the base of this series (like in git-request-pull output), i.e. which commit this series is rebased against? > it updates a > number of elements in the caching engine, and this is a much cleaner > break-out of the tree vs. what I am currently developing against. Is caching engine part changed since v2? > v5: > - Missed a couple of things that were in my local tree, and > added them back in. That doesn't tell us much. > - Split up the die_error and the version matching patch > - Set version matching to be on by default - otherwise this > really is code that will never get checked, or at best > enabled by default by distributions > - Added a minor code cleanup with respect to $site_header > that was already in my tree > - Applied against a more recent git tree vs. 
1.6.6-rc2 > - Removed breakout patch for now (did that in v4 actually) > and will deal with that separately > > http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5 ---- Short comments about patches in this series; I will be sending detailed comments for each patch individually. > John 'Warthog9' Hawley (9): > gitweb: Load checking Looks good. > gitweb: change die_error to take "extra" argument for extended die > information Commit message could be better (summary should really be shorter), and I think there is some indent typo, but otherwise looks good. > gitweb: Add option to force version match For me it needs to be disabled in gitweb test suite (t/gitweb-lib.sh), if it is enabled by default. I really like that I can test current gitweb without need to recompile git. Also it should have tests that it works as intended (both for matching and non-matching versions) in t/t9501-gitweb-standalone-http-status.sh > gitweb: Makefile improvements Does it differ from my proposal (i.e. gitweb/Makefile doing the work), based on your idea ("make gitweb" for Makefile and gitweb/Makefile)? > gitweb: add a get function to compliment print_local_time > gitweb: add a get function to compliment print_sort_th Those two look O.K. from what I have seen. > gitweb: cleanup error message produced by undefined $site_header Shouldn't there be such protection for other such variables, like $site_footer and $home_text (and a bit different protection against undefined $projects_list)? By the way, how did you arrive at an undefined $site_header: the default build configuration leaves it empty, but defined. > gitweb: Convert output to using indirect file handle I have an alternate solution, using a shorter filehandle name (just $out) in git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel I would have to think a bit about separate handle for binary files; I am not sure if it is really required. 
> gitweb: File based caching layer (from git.kernel.org) I am working (time permitting) in spliting this large code drop into smaller commits, namely: - href(..., -path_info => 0) (for cache key) - simple file based caching + tests - global expire time + tests - output caching in gitweb (WIP) - adaptive expiration time (planned) - tee output / cache write (planned) - expire time variation from CHI (planned) - locking for single writer (planned) - server-side generating info (planned) - AJAX-y generating info (wishlist) while ensuring that it pass all existing gitweb tests, and adding new tests for new features. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/9] Gitweb caching v5 2010-01-15 1:40 ` [PATCH 0/9] Gitweb caching v5 Jakub Narebski @ 2010-01-15 4:29 ` J.H. 2010-01-15 10:28 ` Jakub Narebski 0 siblings, 1 reply; 60+ messages in thread From: J.H. @ 2010-01-15 4:29 UTC (permalink / raw) To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git On 01/14/2010 05:40 PM, Jakub Narebski wrote: > "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: > >> Afternoon everyone, >> >> This is the latest incarnation of gitweb w/ caching. This is >> finally at the point where it should probably start either being >> considered for inclusion or mainline, or I need to accept that this >> will never get in and more perminantely fork (as is the case with >> Fedora where this is going in as gitweb-caching as a parrallel rpm >> package). >> >> That said this brings the base up to mainline (again), > > Could you tell us which commit is the base of this series (like in > git-request-pull output), i.e. which commit this series is rebased > against? This series was based on git://git.kernel.org/pub/scm/git/git.git 054d2fa05cf0bc55fe1556c9e87d58d67a144f44 http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5 > >> it updates a >> number of elements in the caching engine, and this is a much cleaner >> break-out of the tree vs. what I am currently developing against. > > Is caching engine part changed since v2? Slightly, not dramatically. Only changes were to modify the caching engine to deal with the change in output mechanism (I.E. my $output -> print {$fh}) <snip> >> John 'Warthog9' Hawley (9): >> gitweb: Load checking > Looks good. > >> gitweb: change die_error to take "extra" argument for extended die >> information > Commit message could be better (summary should really be shorter), and > I think there is some indent typo, but otherwise looks good. 
> >> gitweb: Add option to force version match > For me it needs to be disabled in gitweb test suite (t/gitweb-lib.sh), > if it is enabled by default. I really like that I can test current > gitweb without need to recompile git. > > Also it should have tests that it works as intended (both for matching > and non-matching versions) in t/t9501-gitweb-standalone-http-status.sh I'll get t9501 cleaned up and make sure that the tests default to turning it off, and I'll add a test to confirm that this works. >> gitweb: Makefile improvements > Does it differ from my proposal (i.e. gitweb/Makefile doing the work), > based on your idea ("make gitweb" for Makefile and gitweb/Makefile)? I think this is taken straight from the version you had, I don't think I've modified it. > >> gitweb: add a get function to compliment print_local_time >> gitweb: add a get function to compliment print_sort_th > Those two looks O.K. from what I seen. > >> gitweb: cleanup error message produced by undefined $site_header > Shouldn't there be such protection for other such variables, like > $site_footer and $home_text (and a bit diferent protection against > undefined $projects_list)? By the way, how did you arrived at > undefined $site_header: deafult build configuration leaves it empty, > but defined. I would have to go back and figure it out, but it's something I hit years ago and added that check to keep it from spewing all over my logs. Could easily add it to the others mentioned. >> gitweb: Convert output to using indirect file handle > I have alternate solution, using shorter filehandle name (just $out) > in > > git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel > http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel > > I would have to think a bit about separate handle for binary files; > I am not sure if it is really required. For caching you have to have it. 
When you're outputting the data back from the cache you need to switch the output mode for the browser to receive the data properly. Otherwise the resulting output from the caching engine is going to be garbage. The caching engine explicitly stores the binary files separate from the rest of the response. >> gitweb: File based caching layer (from git.kernel.org) > I am working (time permitting) in splitting this large code drop into > smaller commits, namely: First up, is there a reason not to take the caching layer as it stands while you work on these? I'm fine with adding test cases for what's there now if you want, but I guess I'm confused about explicitly wanting to break these into smaller commits. > - href(..., -path_info => 0) (for cache key) You actually *really* want to have the full url vs. just the path_info. While I accept that this means that you will end up with multiple copies of data being stored it helps dramatically if you have multiple sites pointing into the same caching space. If you happen to have two distinct trees http://git.public.com/?p=test.git;a=summary http://git.private.com/?p=test.git;a=summary That respectively point to: /group/public/git/test.git /group/private/git/test.git you'll end up squashing the cache files needlessly and erroneously as what's in the cache file will depend on which site was last hit that generated the file. > - simple file based caching + tests > - global expire time + tests > - output caching in gitweb (WIP) > - adaptive expiration time (planned) > - tee output / cache write (planned) You sadly can't 'tee' the output as this would re-introduce the stampeding herd problem which is one of the reasons the caching layer came about in the first place. I suppose you could give one person the output but make everyone else wait for the cache to finish writing out, or have the waiting client processes tail the file while it's generated, but those both seem a little excessive vs. just waiting. 
> - expire time variation from CHI (planned) > - locking for single writer (planned) > - server-side generating info (planned) > - AJAX-y generating info (wishlist) If it's helpful I can genuinely devote several more days to this to get these cleaned up. Which would save you a fair amount of time in breaking this up. - John 'Warthog9' Hawley ^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH 0/9] Gitweb caching v5 2010-01-15 4:29 ` J.H. @ 2010-01-15 10:28 ` Jakub Narebski 0 siblings, 0 replies; 60+ messages in thread From: Jakub Narebski @ 2010-01-15 10:28 UTC (permalink / raw) To: J.H.; +Cc: John 'Warthog9' Hawley, git On Fri, 15 Jan 2010, J.H. wrote: > On 01/14/2010 05:40 PM, Jakub Narebski wrote: >> "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes: >>> it updates a >>> number of elements in the caching engine, and this is a much cleaner >>> break-out of the tree vs. what I am currently developing against. >> >> Is caching engine part changed since v2? > > Slightly, not dramatically. Only changes were to modify the caching > engine to deal with the change in output mechanism (I.E. my $output -> > print {$fh}) O.K. It is still quite large code dump... >>> John 'Warthog9' Hawley (9): [...] >>> gitweb: Add option to force version match >> For me it needs to be disabled in gitweb test suite (t/gitweb-lib.sh), >> if it is enabled by default. I really like that I can test current >> gitweb without need to recompile git. >> >> Also it should have tests that it works as intended (both for matching >> and non-matching versions) in t/t9501-gitweb-standalone-http-status.sh > > I'll get t9501 cleaned up and make sure that the tests default to > turning it off, and I'll add a test to confirm that this works. I'll send you proposed fixup to squash, so if you want you can simply wait for a bit. [...] >>> gitweb: cleanup error message produced by undefined $site_header >> Shouldn't there be such protection for other such variables, like >> $site_footer and $home_text (and a bit diferent protection against >> undefined $projects_list)? By the way, how did you arrived at >> undefined $site_header: deafult build configuration leaves it empty, >> but defined. > > I would have to go back and figure it out, but it's something I hit > years ago and added that check to keep it from spewing all over my logs. 
> Could easily add it to the others mentioned. Please remember that while for $home_text, $site_header and $site_footer it would be enough to replace if (-f $file_variable) { ... } with if (defined $file_variable && -f $file_variable) { ... } for undefined $projects_list you have to fail (use die_error). Defensive programming, here protecting against errors in gitweb config, is a good idea IMHO. >>> gitweb: Convert output to using indirect file handle >> I have alternate solution, using shorter filehandle name (just $out) >> in >> >> git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel >> http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel >> >> I would have to think a bit about separate handle for binary files; >> I am not sure if it is really required. > > For caching you have to have it. When your outputing the data back from > the cache you need to switch the output mode for the browser to receive > the data properly. Otherwise the resulting output from the caching > engine is going to be garbage. The caching engine explicitly stores the > binary files separate from the rest of the response. I think you don't. Just read from cache file and print to browser in binary (:raw) mode; IIUC the ':utf8' is needed only when you print from Perl. >>> gitweb: File based caching layer (from git.kernel.org) >> I am working (time permitting) in spliting this large code drop into >> smaller commits, namely: > > first up, is there a reason not to take the caching layer as it stands > while you work on these? I'm fine with adding test cases for what's > there now if you want, but I guess I'm confused about explicitly wanting > to break these into smaller commits. It's there a reason to take this big code dump while we (me and you) are working on it? As I see there is no other work on gitweb now, and for the time being you can run your fork for a bit. Or is it about planned deployment on git.fedora.org or something? 
Anyway, when there is an official release of git including gitweb output caching, it should be easy to switch away from the caching fork. The caching code also has a few issues: coding style, Perl style (e.g. not using $cache_enabled / $caching_enabled as boolean), duplicated code, unnecessary open/close dance around stat (you can stat file which is not closed), not using atomic write (I'm not sure how important this issue is, especially with locking), not being able to replace cache by some other caching module, etc. >> - href(..., -path_info => 0) (for cache key) > > You actually *really* want to have the full url vs. just the path_info. Errr... it is not about full url (although I erroneously used relative URL for human-readable cache key) vs only path_info, but about path_info (for example http://git.public.com/test.git) and query (for example http://git.public.com/?p=test.git) URLs which lead to exactly the same view. > While I accept that this means that you will end up with multiple > copies of data being stored it helps dramatically if you have multiple > sites pointing into the same caching space. If you happen to have two > distinct trees > > http://git.public.com/?p=test.git;a=summary > http://git.private.com/?p=test.git;a=summary > > That respectively point to: > > /group/public/git/test.git > /group/private/git/test.git > > you'll end up squashing the cache files needlessly and erroneously as > what's in the cache file will depend on which site was last hit that > generated the file. So the key (the human-readable key, not its MD5 digest) would be href(-replay=>1, -full_url=>1, -path_info=>0); >> - simple file based caching + tests >> - global expire time + tests >> - output caching in gitweb (WIP) >> - adaptive expiration time (planned) >> - tee output / cache write (planned) > > You sadly can't 'tee' the output as this would re-introduce the > stampeding herd problem which is one of the reasons the caching layer > came about in the first place. 
Suppose you could give one person the > output but make everyone else wait for the cache to finish writing out, > or have the waiting client processes tail the file while it's generated > but those both seem a little excessive vs. just waiting. The "tee" is about serving [one person] the output, while cache is being written. It is a bit orthogonal issue to the 'stampeding herd' protection. >> - expire time variation from CHI (planned) This feature is meant to prevent cache miss stampedes[1] [1] http://search.cpan.org/~jswartz/CHI-0.33/lib/CHI.pm#expires_variance >> - locking for single writer (planned) ...or rather locking to avoid 'stampeding herd' problem. BTW I don't recall if it isn't the solution you used, but if there exists stale cache entry, the other clients can use stale version instead of forcing to wait till the writer finishes updating cache. >> - server-side generating info (planned) >> - AJAX-y generating info (wishlist) > > If it's helpful I can genuinely devote several more days to this to get > these cleaned up. Which would save you a fair amount of time in > breaking this up. I'm in the middle of working on gitweb caching, with the intent of arriving at (super)set of features that are in your patch series. See http://repo.or.cz/w/?p=git/jnareb-git.git;a=shortlog;h=refs/heads/gitweb/cache-kernel (this branch is rebased / amended / rewritten). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 60+ messages in thread
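Two points from this exchange can be sketched in a few lines, for illustration only: keying the cache on the full URL, so that two sites sharing one cache directory do not overwrite each other, and varying the expiration time so that entries do not all expire at once. cache_key() and is_expired() are hypothetical helpers, not code from either series, and the variance model is only a loose reading of the CHI expires_variance description (0 means expire exactly on time, 1 means anywhere up to the nominal lifetime).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: derive a cache file name from the human-readable
# key. The key here is the full URL, host included.
sub cache_key {
	my ($full_url) = @_;
	return md5_hex($full_url);
}

# Hypothetical helper: decide whether an entry of age $age (seconds) is
# expired. With $variance 0 the lifetime is exactly $expires_in; otherwise
# the effective lifetime is picked uniformly from
# [$expires_in * (1 - $variance), $expires_in].
sub is_expired {
	my ($age, $expires_in, $variance, $rand) = @_;
	$variance = 0      unless defined $variance;
	$rand     = rand() unless defined $rand;    # injectable for testing
	my $lifetime = $expires_in * (1 - $variance * $rand);
	return $age >= $lifetime ? 1 : 0;
}

# The two sites from the example above map to different cache files.
print cache_key('http://git.public.com/?p=test.git;a=summary'), "\n";
print cache_key('http://git.private.com/?p=test.git;a=summary'), "\n";
```

Normalizing path_info and query URLs to one form before hashing, as proposed with href(-replay=>1, -full_url=>1, -path_info=>0), would keep a single entry per view while still separating the two hosts.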