[PATCH 0/9] Gitweb caching v5

* [PATCH 0/9] Gitweb caching v5
@ 2010-01-14  1:22 John 'Warthog9' Hawley
  2010-01-14  1:22 ` [PATCH 1/9] gitweb: Load checking John 'Warthog9' Hawley
  2010-01-15  1:40 ` [PATCH 0/9] Gitweb caching v5 Jakub Narebski
  0 siblings, 2 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:22 UTC
  To: git

Afternoon everyone,
 
This is the latest incarnation of gitweb w/ caching.  This is
finally at the point where it should probably either start being
considered for inclusion in mainline, or I need to accept that this
will never get in and fork more permanently (as is the case with
Fedora, where this is going in as gitweb-caching, a parallel rpm
package).

That said, this brings the base up to mainline (again), updates a
number of elements in the caching engine, and is a much cleaner
break-out of the tree vs. what I am currently developing against.

v5:
	- Missed a couple of things that were in my local tree, and
	  added them back in.
	- Split up the die_error and the version matching patch
	- Set version matching to be on by default - otherwise this
	  really is code that will never get checked, or at best
	  enabled by default by distributions
	- Added a minor code cleanup with respect to $site_header
	  that was already in my tree
	- Applied against a more recent git tree vs. 1.6.6-rc2
	- Removed breakout patch for now (did that in v4 actually)
	  and will deal with that separately 

	http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5

v4:
	- major re-working of the caching layer to use file handle
	  redirection instead of buffering output
	- other minor improvements

	http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v4
v3:
	- various minor re-works based on mailing list feedback,
	  this series was not sent to the mailing list.
v2:
	- Better breakout
	- You can actually disable the cache now

- John 'Warthog9' Hawley 



John 'Warthog9' Hawley (9):
  gitweb: Load checking
  gitweb: change die_error to take "extra" argument for extended die
    information
  gitweb: Add option to force version match
  gitweb: Makefile improvements
  gitweb: add a get function to compliment print_local_time
  gitweb: add a get function to compliment print_sort_th
  gitweb: cleanup error message produced by undefined $site_header
  gitweb: Convert output to using indirect file handle
  gitweb: File based caching layer (from git.kernel.org)

 Makefile           |   65 +---
 gitweb/Makefile    |  129 +++++++
 gitweb/README      |   10 +-
 gitweb/cache.pm    |  283 ++++++++++++++
 gitweb/gitweb.css  |    6 +
 gitweb/gitweb.perl | 1034 ++++++++++++++++++++++++++++++----------------------
 6 files changed, 1030 insertions(+), 497 deletions(-)
 create mode 100644 gitweb/Makefile
 create mode 100644 gitweb/cache.pm


* [PATCH 1/9] gitweb: Load checking
  2010-01-14  1:22 [PATCH 0/9] Gitweb caching v5 John 'Warthog9' Hawley
@ 2010-01-14  1:22 ` John 'Warthog9' Hawley
  2010-01-14  1:22   ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information John 'Warthog9' Hawley
  2010-01-15 22:30   ` [PATCH 1/9] gitweb: Load checking Jakub Narebski
  2010-01-15  1:40 ` [PATCH 0/9] Gitweb caching v5 Jakub Narebski
  1 sibling, 2 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:22 UTC
  To: git

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This slightly changes the behavior of gitweb, so that it verifies
that the box isn't inundated with load before attempting to serve gitweb.
If the box is overloaded, it returns a 503 Service Unavailable
until the load falls below the defined threshold.  This helps dramatically
if you have a box that's I/O bound and has reached a certain load, and you
don't want gitweb, the I/O hog that it is, increasing the pain the
server is already undergoing.

This behavior is controlled by the $maxload configuration variable.
The default is a load of 300, which in most cases should never be hit.
Unset it (set it to an undefined value, i.e. undef) to turn off checking.

Currently it requires that the '/proc/loadavg' file exists; otherwise the
load check is bypassed (load is taken to be 0).  So platforms that do
not implement '/proc/loadavg' currently cannot use this feature.
(Provisions are included for additional checks to be added by others.)
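
For illustration only (not part of the patch): a minimal sketch of the
check described above, split into two helpers so it can be exercised
without reading '/proc/loadavg'.  The names parse_loadavg and
load_too_high are hypothetical and do not appear in gitweb.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the one-minute load average from a line in
# /proc/loadavg format, e.g. "0.42 0.36 0.30 1/123 4567".
# Returns 0 when the line is unusable, which means "no load checking".
sub parse_loadavg {
    my ($line) = @_;
    return 0 unless defined $line;
    my ($one_minute) = split ' ', $line;
    return 0 unless defined $one_minute && $one_minute =~ /^\d+(?:\.\d+)?$/;
    return $one_minute + 0;
}

# Hypothetical helper: true when a request should be refused with a 503.
# An undefined $maxload disables the check, as in the patch.
sub load_too_high {
    my ($load, $maxload) = @_;
    return (defined $maxload && $load > $maxload) ? 1 : 0;
}

print load_too_high(parse_loadavg("301.50 250.00 200.00 5/900 4242"), 300)
    ? "503\n" : "ok\n";
```

Keeping the parsing separate from the file access is only a choice made
for this sketch; the patch itself does both inside get_loadavg.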

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/README      |    7 ++++++-
 gitweb/gitweb.perl |   45 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/gitweb/README b/gitweb/README
index e34ee79..6c2c8e1 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -174,7 +174,7 @@ not include variables usually directly set during build):
    Base URL for relative URLs in pages generated by gitweb,
    (e.g. $logo, $favicon, @stylesheets if they are relative URLs),
    needed and used only for URLs with nonempty PATH_INFO via
-   <base href="$base_url>.  Usually gitweb sets its value correctly,
+   <base href="$base_url">.  Usually gitweb sets its value correctly,
    and there is no need to set this variable, e.g. to $my_uri or "/".
  * $home_link
    Target of the home link on top of all pages (the first part of view
@@ -228,6 +228,11 @@ not include variables usually directly set during build):
    repositories from launching cross-site scripting (XSS) attacks.  Set this
    to true if you don't trust the content of your repositories. The default
    is false.
+ * $maxload
+   Used to set the maximum load that we will still respond to gitweb queries.
+   If server load exceeds this value then return "503 Service Unavailable" error.
+   Server load is taken to be 0 if gitweb cannot determine its value.  Set it to
+   undefined value to turn it off.  The default is 300.
 
 
 Projects list file format
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7e477af..0a07d3a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -221,6 +221,12 @@ our %avatar_size = (
 	'double'  => 32
 );
 
+# Used to set the maximum load that we will still respond to gitweb queries.
+# If server load exceeds this value then return "503 server busy" error.
+# If gitweb cannot determine server load, it is taken to be 0.
+# Leave it undefined (or set to 'undef') to turn off load checking.
+our $maxload = 300;
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -551,6 +557,32 @@ if (-e $GITWEB_CONFIG) {
 	do $GITWEB_CONFIG_SYSTEM if -e $GITWEB_CONFIG_SYSTEM;
 }
 
+# Get loadavg of system, to compare against $maxload.
+# Currently it requires '/proc/loadavg' present to get loadavg;
+# if it is not present it returns 0, which means no load checking.
+sub get_loadavg {
+	if( -e '/proc/loadavg' ){
+		open my $fd, '<', '/proc/loadavg'
+			or return 0;
+		my @load = split(/\s+/, scalar <$fd>);
+		close $fd;
+
+		# The first three columns measure CPU and IO utilization of the last one,
+		# five, and fifteen minute periods.  The fourth column shows the number of
+		# currently running processes and the total number of processes in the m/n
+		# format.  The last column displays the last process ID used.
+		return $load[0] || 0;
+	}
+	# additional checks for load average should go here for things that don't export
+	# /proc/loadavg
+
+	return 0;
+}
+
+if (defined $maxload && get_loadavg() > $maxload) {
+	die_error(503, "The load average on the server is too high");
+}
+
 # version of the core git binary
 our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
 $number_of_git_cmds++;
@@ -3354,14 +3386,19 @@ sub git_footer_html {
 # 500: The server isn't configured properly, or
 #      an internal error occurred (e.g. failed assertions caused by bugs), or
 #      an unknown error occurred (e.g. the git binary died unexpectedly).
+# 503: The server is currently unavailable (because it is overloaded,
+#      or down for maintenance).  Generally, this is a temporary state.
 sub die_error {
 	my $status = shift || 500;
 	my $error = shift || "Internal server error";
 
-	my %http_responses = (400 => '400 Bad Request',
-			      403 => '403 Forbidden',
-			      404 => '404 Not Found',
-			      500 => '500 Internal Server Error');
+	my %http_responses = (
+		400 => '400 Bad Request',
+		403 => '403 Forbidden',
+		404 => '404 Not Found',
+		500 => '500 Internal Server Error',
+		503 => '503 Service Unavailable',
+	);
 	git_header_html($http_responses{$status});
 	print <<EOF;
 <div class="page_body">
-- 
1.6.5.2


* [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information
  2010-01-14  1:22 ` [PATCH 1/9] gitweb: Load checking John 'Warthog9' Hawley
@ 2010-01-14  1:22   ` John 'Warthog9' Hawley
  2010-01-14  1:22     ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley
  2010-01-15 22:40     ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information Jakub Narebski
  2010-01-15 22:30   ` [PATCH 1/9] gitweb: Load checking Jakub Narebski
  1 sibling, 2 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:22 UTC
  To: git

This is a small change that adds a third, optional parameter to die_error,
which allows extended error information to be output along with the
error itself.
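
For illustration only (not part of the patch): a rough sketch of how the
optional third argument changes the error body.  The function error_body
is a hypothetical stand-in that returns the HTML as a string instead of
printing it and exiting as die_error does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the body printed by die_error.
sub error_body {
    my $status = shift || 500;
    my $error  = shift || "Internal server error";
    my $extra  = shift;    # optional; undef means "no extended information"

    my $html = "<div class=\"page_body\">\n" .
               "<br /><br />\n" .
               "$status - $error\n" .
               "<br />\n";
    if (defined $extra) {
        $html .= "<hr />\n" .
                 "$extra\n";
    }
    $html .= "</div>\n";
    return $html;
}

print error_body(500, "Internal server error", "<p>Version mismatch</p>");
```

Note that, as in the patch, the extra text is emitted as-is, so a caller
is expected to pass HTML that is already escaped.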
---
 gitweb/gitweb.perl |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0a07d3a..8298de5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3391,6 +3391,7 @@ sub git_footer_html {
 sub die_error {
 	my $status = shift || 500;
 	my $error = shift || "Internal server error";
+	my $extra = shift;
 
 	my %http_responses = (
 		400 => '400 Bad Request',
@@ -3405,8 +3406,13 @@ sub die_error {
 <br /><br />
 $status - $error
 <br />
-</div>
 EOF
+	if (defined $extra) {
+		print "<hr />\n" .
+			"$extra\n";
+	}
+	print "</div>\n";
+
 	git_footer_html();
 	exit;
 }
-- 
1.6.5.2


* [PATCH 3/9] gitweb: Add option to force version match
  2010-01-14  1:22   ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information John 'Warthog9' Hawley
@ 2010-01-14  1:22     ` John 'Warthog9' Hawley
  2010-01-14  1:23       ` [PATCH 4/9] gitweb: Makefile improvements John 'Warthog9' Hawley
                         ` (2 more replies)
  2010-01-15 22:40     ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information Jakub Narebski
  1 sibling, 3 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:22 UTC
  To: git

This adds the $git_versions_must_match variable which, when set to true,
checks that we are running on the same version of git that we
shipped with, and if not throws a '500 Internal Server Error' error.
What is checked is the version of gitweb (embedded when building
gitweb.cgi) against the version of the runtime git binary used.

Gitweb can usually run with a mismatched git install.  This is here
more to give an obvious warning as to what's going on vs. silently
failing.

By default this feature is turned on.
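
For illustration only (not part of the patch): a small sketch of the
comparison described above.  The helper version_mismatch and the sample
version strings are assumptions; the regexp is the one gitweb already
uses to parse "git --version" output.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether gitweb should refuse to run.
# $built_for is the version embedded at build time, $found is what the
# git binary reported at run time.
sub version_mismatch {
    my ($must_match, $built_for, $found) = @_;
    return ($must_match && $found ne $built_for) ? 1 : 0;
}

# Sample output instead of actually running the git binary.
my $output = "git version 1.6.6\n";
my $found  = $output =~ m/git version (.*)$/ ? $1 : "unknown";

print version_mismatch(1, "1.6.5.2", $found) ? "mismatch\n" : "ok\n";
```

The comparison is a plain string inequality, so any difference in the
version string counts as a mismatch, including a distribution suffix.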

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/README      |    3 +++
 gitweb/gitweb.perl |   23 +++++++++++++++++++++++
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/gitweb/README b/gitweb/README
index 6c2c8e1..03151d2 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -233,6 +233,9 @@ not include variables usually directly set during build):
    If server load exceeds this value then return "503 Service Unavailable" error.
    Server load is taken to be 0 if gitweb cannot determine its value.  Set it to
    undefined value to turn it off.  The default is 300.
+ * $git_versions_must_match
+   If set, gitweb fails with 500 Internal Server Error if the version of gitweb
+   doesn't match version of git binary.  The default is true.
 
 
 Projects list file format
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 8298de5..b41bc33 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -221,6 +221,9 @@ our %avatar_size = (
 	'double'  => 32
 );
 
+# If it is true, exit if gitweb version and git binary version don't match
+our $git_versions_must_match = 1;
+
 # Used to set the maximum load that we will still respond to gitweb queries.
 # If server load exceeds this value then return "503 server busy" error.
 # If gitweb cannot determine server load, it is taken to be 0.
@@ -587,6 +590,26 @@ if (defined $maxload && get_loadavg() > $maxload) {
 our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
 $number_of_git_cmds++;
 
+# Throw an error if git versions do not match and $git_versions_must_match is true.
+if ($git_versions_must_match &&
+    $git_version ne $version) {
+	my $admin_contact =
+		defined $ENV{'SERVER_ADMIN'} ? ", $ENV{'SERVER_ADMIN'}," : '';
+	my $err_msg = <<EOT;
+<h1 align="center">*** Warning ***</h1>
+<p>
+This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>,
+however git version <b>@{[esc_html($git_version)]}</b> was found on server,
+and administrator requested strict version checking.
+</p>
+<p>
+Please contact the server administrator${admin_contact} to either configure
+gitweb to allow mismatched versions, or update git or gitweb installation.
+</p>
+EOT
+	die_error(500, 'Internal server error', $err_msg);
+}
+
 $projects_list ||= $projectroot;
 
 # ======================================================================
-- 
1.6.5.2


* [PATCH 4/9] gitweb: Makefile improvements
  2010-01-14  1:22     ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley
@ 2010-01-14  1:23       ` John 'Warthog9' Hawley
  2010-01-14  1:23         ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time John 'Warthog9' Hawley
  2010-01-15 23:36       ` [PATCH 3/9] gitweb: Add option to force version match Jakub Narebski
  2010-01-24 21:59       ` Petr Baudis
  2 siblings, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:23 UTC
  To: git

From: John 'Warthog9' Hawley <warthog9@kernel.org>

This commit adjusts the main Makefile so you can simply run

     make gitweb

which in turn calls gitweb/Makefile.  This means that in order to
generate gitweb, you can simply run 'make' from the gitweb subdirectory:

     cd gitweb
     make

The targets gitweb/gitweb.cgi and (dependent on JSMIN being defined)
gitweb/gitweb.min.js in the main Makefile are preserved for backward
compatibility.

Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 Makefile        |   65 +++++----------------------
 gitweb/Makefile |  129 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+), 53 deletions(-)
 create mode 100644 gitweb/Makefile

diff --git a/Makefile b/Makefile
index be18389..9f069a8 100644
--- a/Makefile
+++ b/Makefile
@@ -282,29 +282,6 @@ pathsep = :
 # JavaScript minifier invocation that can function as filter
 JSMIN =
 
-# default configuration for gitweb
-GITWEB_CONFIG = gitweb_config.perl
-GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
-GITWEB_HOME_LINK_STR = projects
-GITWEB_SITENAME =
-GITWEB_PROJECTROOT = /pub/git
-GITWEB_PROJECT_MAXDEPTH = 2007
-GITWEB_EXPORT_OK =
-GITWEB_STRICT_EXPORT =
-GITWEB_BASE_URL =
-GITWEB_LIST =
-GITWEB_HOMETEXT = indextext.html
-GITWEB_CSS = gitweb.css
-GITWEB_LOGO = git-logo.png
-GITWEB_FAVICON = git-favicon.png
-ifdef JSMIN
-GITWEB_JS = gitweb.min.js
-else
-GITWEB_JS = gitweb.js
-endif
-GITWEB_SITE_HEADER =
-GITWEB_SITE_FOOTER =
-
 export prefix bindir sharedir sysconfdir
 
 CC = gcc
@@ -1535,6 +1512,11 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl
 	chmod +x $@+ && \
 	mv $@+ $@
 
+
+.PHONY: gitweb
+gitweb:
+	$(QUIET_SUBDIR0)gitweb $(QUIET_SUBDIR1) all
+
 ifdef JSMIN
 OTHER_PROGRAMS += gitweb/gitweb.cgi   gitweb/gitweb.min.js
 gitweb/gitweb.cgi: gitweb/gitweb.perl gitweb/gitweb.min.js
@@ -1542,30 +1524,13 @@ else
 OTHER_PROGRAMS += gitweb/gitweb.cgi
 gitweb/gitweb.cgi: gitweb/gitweb.perl
 endif
-	$(QUIET_GEN)$(RM) $@ $@+ && \
-	sed -e '1s|#!.*perl|#!$(PERL_PATH_SQ)|' \
-	    -e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
-	    -e 's|++GIT_BINDIR++|$(bindir)|g' \
-	    -e 's|++GITWEB_CONFIG++|$(GITWEB_CONFIG)|g' \
-	    -e 's|++GITWEB_CONFIG_SYSTEM++|$(GITWEB_CONFIG_SYSTEM)|g' \
-	    -e 's|++GITWEB_HOME_LINK_STR++|$(GITWEB_HOME_LINK_STR)|g' \
-	    -e 's|++GITWEB_SITENAME++|$(GITWEB_SITENAME)|g' \
-	    -e 's|++GITWEB_PROJECTROOT++|$(GITWEB_PROJECTROOT)|g' \
-	    -e 's|"++GITWEB_PROJECT_MAXDEPTH++"|$(GITWEB_PROJECT_MAXDEPTH)|g' \
-	    -e 's|++GITWEB_EXPORT_OK++|$(GITWEB_EXPORT_OK)|g' \
-	    -e 's|++GITWEB_STRICT_EXPORT++|$(GITWEB_STRICT_EXPORT)|g' \
-	    -e 's|++GITWEB_BASE_URL++|$(GITWEB_BASE_URL)|g' \
-	    -e 's|++GITWEB_LIST++|$(GITWEB_LIST)|g' \
-	    -e 's|++GITWEB_HOMETEXT++|$(GITWEB_HOMETEXT)|g' \
-	    -e 's|++GITWEB_CSS++|$(GITWEB_CSS)|g' \
-	    -e 's|++GITWEB_LOGO++|$(GITWEB_LOGO)|g' \
-	    -e 's|++GITWEB_FAVICON++|$(GITWEB_FAVICON)|g' \
-	    -e 's|++GITWEB_JS++|$(GITWEB_JS)|g' \
-	    -e 's|++GITWEB_SITE_HEADER++|$(GITWEB_SITE_HEADER)|g' \
-	    -e 's|++GITWEB_SITE_FOOTER++|$(GITWEB_SITE_FOOTER)|g' \
-	    $< >$@+ && \
-	chmod +x $@+ && \
-	mv $@+ $@
+	$(QUIET_SUBDIR0)gitweb $(QUIET_SUBDIR1) $(patsubst gitweb/%,%,$@)
+
+ifdef JSMIN
+gitweb/gitweb.min.js: gitweb/gitweb.js
+	$(QUIET_SUBDIR0)gitweb $(QUIET_SUBDIR1) $(patsubst gitweb/%,%,$@)
+endif # JSMIN
+
 
 git-instaweb: git-instaweb.sh gitweb/gitweb.cgi gitweb/gitweb.css gitweb/gitweb.js
 	$(QUIET_GEN)$(RM) $@ $@+ && \
@@ -1592,12 +1557,6 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)) git-instaweb: % : unimplemented.sh
 	mv $@+ $@
 endif # NO_PERL
 
-
-ifdef JSMIN
-gitweb/gitweb.min.js: gitweb/gitweb.js
-	$(QUIET_GEN)$(JSMIN) <$< >$@
-endif # JSMIN
-
 ifndef NO_PYTHON
 $(patsubst %.py,%,$(SCRIPT_PYTHON)): GIT-CFLAGS
 $(patsubst %.py,%,$(SCRIPT_PYTHON)): % : %.py
diff --git a/gitweb/Makefile b/gitweb/Makefile
new file mode 100644
index 0000000..c9eb1ee
--- /dev/null
+++ b/gitweb/Makefile
@@ -0,0 +1,129 @@
+# The default target of this Makefile is...
+all::
+
+# Define V=1 to have a more verbose compile.
+#
+# Define JSMIN to point to JavaScript minifier that functions as
+# a filter to have gitweb.js minified.
+#
+
+prefix ?= $(HOME)
+bindir ?= $(prefix)/bin
+RM ?= rm -f
+
+# JavaScript minifier invocation that can function as filter
+JSMIN ?=
+
+# default configuration for gitweb
+GITWEB_CONFIG = gitweb_config.perl
+GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
+GITWEB_HOME_LINK_STR = projects
+GITWEB_SITENAME =
+GITWEB_PROJECTROOT = /pub/git
+GITWEB_PROJECT_MAXDEPTH = 2007
+GITWEB_EXPORT_OK =
+GITWEB_STRICT_EXPORT =
+GITWEB_BASE_URL =
+GITWEB_LIST =
+GITWEB_HOMETEXT = indextext.html
+GITWEB_CSS = gitweb.css
+GITWEB_LOGO = git-logo.png
+GITWEB_FAVICON = git-favicon.png
+ifdef JSMIN
+GITWEB_JS = gitweb.min.js
+else
+GITWEB_JS = gitweb.js
+endif
+GITWEB_SITE_HEADER =
+GITWEB_SITE_FOOTER =
+
+# include user config
+-include ../config.mak.autogen
+-include ../config.mak
+
+# determine version
+../GIT-VERSION-FILE: .FORCE-GIT-VERSION-FILE
+	$(QUIET_SUBDIR0)../ $(QUIET_SUBDIR1) GIT-VERSION-FILE
+
+-include ../GIT-VERSION-FILE
+
+### Build rules
+
+SHELL_PATH ?= $(SHELL)
+PERL_PATH  ?= /usr/bin/perl
+
+# Shell quote;
+bindir_SQ = $(subst ','\'',$(bindir))         #'
+SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH)) #'
+PERL_PATH_SQ  = $(subst ','\'',$(PERL_PATH))  #'
+
+# Quiet generation (unless V=1)
+QUIET_SUBDIR0  = +$(MAKE) -C # space to separate -C and subdir
+QUIET_SUBDIR1  =
+
+ifneq ($(findstring $(MAKEFLAGS),w),w)
+PRINT_DIR = --no-print-directory
+else # "make -w"
+NO_SUBDIR = :
+endif
+
+ifneq ($(findstring $(MAKEFLAGS),s),s)
+ifndef V
+	QUIET          = @
+	QUIET_GEN      = $(QUIET)echo '   ' GEN $@;
+	QUIET_SUBDIR0  = +@subdir=
+	QUIET_SUBDIR1  = ;$(NO_SUBDIR) echo '   ' SUBDIR $$subdir; \
+	                 $(MAKE) $(PRINT_DIR) -C $$subdir
+	export V
+	export QUIET
+	export QUIET_GEN
+	export QUIET_SUBDIR0
+	export QUIET_SUBDIR1
+endif
+endif
+
+all:: gitweb.cgi
+
+ifdef JSMIN
+FILES=gitweb.cgi gitweb.min.js
+gitweb.cgi: gitweb.perl gitweb.min.js
+else # !JSMIN
+FILES=gitweb.cgi
+gitweb.cgi: gitweb.perl
+endif # JSMIN
+
+gitweb.cgi:
+	$(QUIET_GEN)$(RM) $@ $@+ && \
+	sed -e '1s|#!.*perl|#!$(PERL_PATH_SQ)|' \
+	    -e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
+	    -e 's|++GIT_BINDIR++|$(bindir)|g' \
+	    -e 's|++GITWEB_CONFIG++|$(GITWEB_CONFIG)|g' \
+	    -e 's|++GITWEB_CONFIG_SYSTEM++|$(GITWEB_CONFIG_SYSTEM)|g' \
+	    -e 's|++GITWEB_HOME_LINK_STR++|$(GITWEB_HOME_LINK_STR)|g' \
+	    -e 's|++GITWEB_SITENAME++|$(GITWEB_SITENAME)|g' \
+	    -e 's|++GITWEB_PROJECTROOT++|$(GITWEB_PROJECTROOT)|g' \
+	    -e 's|"++GITWEB_PROJECT_MAXDEPTH++"|$(GITWEB_PROJECT_MAXDEPTH)|g' \
+	    -e 's|++GITWEB_EXPORT_OK++|$(GITWEB_EXPORT_OK)|g' \
+	    -e 's|++GITWEB_STRICT_EXPORT++|$(GITWEB_STRICT_EXPORT)|g' \
+	    -e 's|++GITWEB_BASE_URL++|$(GITWEB_BASE_URL)|g' \
+	    -e 's|++GITWEB_LIST++|$(GITWEB_LIST)|g' \
+	    -e 's|++GITWEB_HOMETEXT++|$(GITWEB_HOMETEXT)|g' \
+	    -e 's|++GITWEB_CSS++|$(GITWEB_CSS)|g' \
+	    -e 's|++GITWEB_LOGO++|$(GITWEB_LOGO)|g' \
+	    -e 's|++GITWEB_FAVICON++|$(GITWEB_FAVICON)|g' \
+	    -e 's|++GITWEB_JS++|$(GITWEB_JS)|g' \
+	    -e 's|++GITWEB_SITE_HEADER++|$(GITWEB_SITE_HEADER)|g' \
+	    -e 's|++GITWEB_SITE_FOOTER++|$(GITWEB_SITE_FOOTER)|g' \
+	    $< >$@+ && \
+	chmod +x $@+ && \
+	mv $@+ $@
+
+ifdef JSMIN
+gitweb.min.js: gitweb.js
+	$(QUIET_GEN)$(JSMIN) <$< >$@
+endif # JSMIN
+
+clean:
+	$(RM) $(FILES)
+
+.PHONY: all clean .FORCE-GIT-VERSION-FILE
-- 
1.6.5.2


* [PATCH 5/9] gitweb: add a get function to compliment print_local_time
  2010-01-14  1:23       ` [PATCH 4/9] gitweb: Makefile improvements John 'Warthog9' Hawley
@ 2010-01-14  1:23         ` John 'Warthog9' Hawley
  2010-01-14  1:23           ` [PATCH 6/9] gitweb: add a get function to compliment print_sort_th John 'Warthog9' Hawley
  2010-01-23 11:13           ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time Jakub Narebski
  0 siblings, 2 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:23 UTC
  To: git

This adds a get function for print_local_time so that the basic
functionality can be used outside of its straight printing operation.
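
For illustration only (not part of the patch): a compact sketch of the
get/print split.  The formatting follows the patch; the sample %date
values and the surrounding <td> are made up for the example.

```perl
use strict;
use warnings;

# Builds the string; same output format as get_local_time in the patch.
sub get_local_time {
    my %date = @_;
    my $fmt = $date{'hour_local'} < 6
        ? " (<span class=\"atnight\">%02d:%02d</span> %s)"
        : " (%02d:%02d %s)";
    return sprintf($fmt,
        $date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'});
}

# Thin wrapper that keeps the old printing interface.
sub print_local_time {
    print get_local_time(@_);
}

my %date = (hour_local => 3, minute_local => 7, tz_local => '+0100');
my $cell = "<td>" . get_local_time(%date) . "</td>";   # reuse without printing
print "$cell\n";
```

With the wrapper in place existing callers of print_local_time are
unaffected, while new code can embed the string in larger output.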
---
 gitweb/gitweb.perl |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index b41bc33..d38aad6 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3537,14 +3537,21 @@ sub git_print_header_div {
 }
 
 sub print_local_time {
+	print get_local_time(@_);
+}
+
+sub get_local_time {
+	my $localtime = "";
 	my %date = @_;
 	if ($date{'hour_local'} < 6) {
-		printf(" (<span class=\"atnight\">%02d:%02d</span> %s)",
+		$localtime .= sprintf(" (<span class=\"atnight\">%02d:%02d</span> %s)",
 			$date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'});
 	} else {
-		printf(" (%02d:%02d %s)",
+		$localtime .= sprintf(" (%02d:%02d %s)",
 			$date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'});
 	}
+
+	return $localtime
 }
 
 # Outputs the author name and date in long form
-- 
1.6.5.2


* [PATCH 6/9] gitweb: add a get function to compliment print_sort_th
  2010-01-14  1:23         ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time John 'Warthog9' Hawley
@ 2010-01-14  1:23           ` John 'Warthog9' Hawley
  2010-01-14  1:23             ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header John 'Warthog9' Hawley
  2010-01-23 11:13           ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time Jakub Narebski
  1 sibling, 1 reply; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:23 UTC
  To: git

This adds a get function for print_sort_th so that the basic
functionality can be used outside of its straight printing operation.
---
 gitweb/gitweb.perl |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index d38aad6..07fdeb5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -4375,17 +4375,24 @@ sub fill_project_list_info {
 # print 'sort by' <th> element, generating 'sort by $name' replay link
 # if that order is not selected
 sub print_sort_th {
+	print get_sort_th(@_);
+}
+
+sub get_sort_th {
 	my ($name, $order, $header) = @_;
+	my $sortth = "";
 	$header ||= ucfirst($name);
 
 	if ($order eq $name) {
-		print "<th>$header</th>\n";
+		$sortth .= "<th>$header</th>\n";
 	} else {
-		print "<th>" .
+		$sortth .= "<th>" .
 		      $cgi->a({-href => href(-replay=>1, order=>$name),
 		               -class => "header"}, $header) .
 		      "</th>\n";
 	}
+
+	return $sortth;
 }
 
 sub git_project_list_body {
-- 
1.6.5.2


* [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header
  2010-01-14  1:23           ` [PATCH 6/9] gitweb: add a get function to compliment print_sort_th John 'Warthog9' Hawley
@ 2010-01-14  1:23             ` John 'Warthog9' Hawley
  2010-01-14  1:23               ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley
  2010-01-15 23:49               ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header Jakub Narebski
  0 siblings, 2 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:23 UTC
  To: git

If $site_header is not defined you get extraneous warnings in the web
server's error log, for example:

[Wed Jan 13 16:55:42 2010] [error] [client ::1] [Wed Jan 13 16:55:42 2010] gitweb.cgi: Use of uninitialized value $site_header in -f at /var/www/gitweb/gitweb.cgi line 3287., referer: http://git/gitweb.cgi

This ensures that the variable is set before trying to use it.
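
For illustration only (not part of the patch): a short sketch of the
guard.  The helper header_file_usable is hypothetical; it applies the
same "value first, then -f" test as the one-line change below and
collects any warnings so the effect can be observed.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when $path is set and names a regular file.
# Testing the value first keeps -f from ever seeing undef.
sub header_file_usable {
    my ($path) = @_;
    return ($path && -f $path) ? 1 : 0;
}

# Collect warnings instead of letting them reach the log.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $site_header;                       # deliberately left undefined
my $usable = header_file_usable($site_header);
print "usable=$usable warnings=" . scalar(@warnings) . "\n";
```

One side effect of a truthiness test is that a file literally named "0"
would be treated as unset; a defined() test would avoid that.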
---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 07fdeb5..c4a177d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3277,7 +3277,7 @@ EOF
 	print "</head>\n" .
 	      "<body>\n";
 
-	if (-f $site_header) {
+	if ($site_header && -f $site_header) {
 		insert_file($site_header);
 	}
 
-- 
1.6.5.2


* [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-14  1:23             ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header John 'Warthog9' Hawley
@ 2010-01-14  1:23               ` John 'Warthog9' Hawley
  2010-01-14  1:23                 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
  2010-01-16  0:43                 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski
  2010-01-15 23:49               ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header Jakub Narebski
  1 sibling, 2 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:23 UTC
  To: git

This converts the output handling of gitweb to use an indirect
file handle, in preparation for adding the caching layer.  This
is a slight modification of the way I was originally doing it, which was
passing the output around.  This should be a no-op and shouldn't
change the behavior of gitweb.  It does leave the error reporting
functions (die_error specifically) outputting directly,
as I want to guarantee those will report their errors regardless of
what may be going on with respect to the rest of the output.
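
For illustration only (not part of the patch): a minimal sketch of why
printing through an indirect file handle helps.  The variable name
$output_handler comes from the patch; the capture helper and the idea of
pointing the handle at an in-memory file are assumptions made for this
example and are not necessarily how the caching layer does it.

```perl
use strict;
use warnings;

# As in the patch, output goes through a package variable that defaults
# to STDOUT.
our $output_handler = *STDOUT;

sub print_footer {
    print {$output_handler} "<div class=\"page_footer\">\n";
    print {$output_handler} "</div>\n";
}

# Assumption: redirect the handle to an in-memory file for the duration
# of one call and return whatever was printed.
sub capture {
    my ($code) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    {
        local $output_handler = $fh;
        $code->();
    }
    close $fh;
    return $buffer;
}

my $html = capture(\&print_footer);
print_footer();                        # goes to STDOUT again
print length($html), " bytes captured\n";
```

Because every print names the handle explicitly, redirecting output only
requires changing one variable rather than touching each call site.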
---
 gitweb/gitweb.perl |  880 ++++++++++++++++++++++++++--------------------------
 1 files changed, 448 insertions(+), 432 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c4a177d..8bb323c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -450,6 +450,13 @@ our %feature = (
 		'default' => [0]},
 );
 
+# Basic file handler for all of gitweb; there are two of them.  The first
+# is the basic text/html file handler, which is used for everything other
+# than the binary files, which use a separate file handler, though
+# these are both set to STDOUT for the time being.
+our $output_handler = *STDOUT;
+our $output_handler_bin = *STDOUT;
+
 sub gitweb_get_feature {
 	my ($name) = @_;
 	return unless exists $feature{$name};
@@ -3081,7 +3088,7 @@ sub insert_file {
 	my $filename = shift;
 
 	open my $fd, '<', $filename;
-	print map { to_utf8($_) } <$fd>;
+	print {$output_handler} map { to_utf8($_) } <$fd>;
 	close $fd;
 }
 
@@ -3198,10 +3205,10 @@ sub git_header_html {
 	} else {
 		$content_type = 'text/html';
 	}
-	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
+	print {$output_handler} $cgi->header(-type=>$content_type, -charset => 'utf-8',
 	                   -status=> $status, -expires => $expires);
 	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
-	print <<EOF;
+	print {$output_handler} <<EOF;
 <?xml version="1.0" encoding="utf-8"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
@@ -3216,16 +3223,16 @@ EOF
 	# the stylesheet, favicon etc urls won't work correctly with path_info
 	# unless we set the appropriate base URL
 	if ($ENV{'PATH_INFO'}) {
-		print "<base href=\"".esc_url($base_url)."\" />\n";
+		print {$output_handler} "<base href=\"".esc_url($base_url)."\" />\n";
 	}
 	# print out each stylesheet that exist, providing backwards capability
 	# for those people who defined $stylesheet in a config file
 	if (defined $stylesheet) {
-		print '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n";
+		print {$output_handler} '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n";
 	} else {
 		foreach my $stylesheet (@stylesheets) {
 			next unless $stylesheet;
-			print '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n";
+			print {$output_handler} '<link rel="stylesheet" type="text/css" href="'.$stylesheet.'"/>'."\n";
 		}
 	}
 	if (defined $project) {
@@ -3244,7 +3251,7 @@ EOF
 
 			$href_params{'action'} = $type;
 			$link_attr{'-href'} = href(%href_params);
-			print "<link ".
+			print {$output_handler} "<link ".
 			      "rel=\"$link_attr{'-rel'}\" ".
 			      "title=\"$link_attr{'-title'}\" ".
 			      "href=\"$link_attr{'-href'}\" ".
@@ -3254,7 +3261,7 @@ EOF
 			$href_params{'extra_options'} = '--no-merges';
 			$link_attr{'-href'} = href(%href_params);
 			$link_attr{'-title'} .= ' (no merges)';
-			print "<link ".
+			print {$output_handler} "<link ".
 			      "rel=\"$link_attr{'-rel'}\" ".
 			      "title=\"$link_attr{'-title'}\" ".
 			      "href=\"$link_attr{'-href'}\" ".
@@ -3263,37 +3270,37 @@ EOF
 		}
 
 	} else {
-		printf('<link rel="alternate" title="%s projects list" '.
+		printf( {$output_handler} '<link rel="alternate" title="%s projects list" '.
 		       'href="%s" type="text/plain; charset=utf-8" />'."\n",
 		       $site_name, href(project=>undef, action=>"project_index"));
-		printf('<link rel="alternate" title="%s projects feeds" '.
+		printf( {$output_handler} '<link rel="alternate" title="%s projects feeds" '.
 		       'href="%s" type="text/x-opml" />'."\n",
 		       $site_name, href(project=>undef, action=>"opml"));
 	}
 	if (defined $favicon) {
-		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
+		print {$output_handler} qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
 	}
 
-	print "</head>\n" .
+	print {$output_handler} "</head>\n" .
 	      "<body>\n";
 
 	if ($site_header && -f $site_header) {
 		insert_file($site_header);
 	}
 
-	print "<div class=\"page_header\">\n" .
+	print {$output_handler} "<div class=\"page_header\">\n" .
 	      $cgi->a({-href => esc_url($logo_url),
 	               -title => $logo_label},
 	              qq(<img src="$logo" width="72" height="27" alt="git" class="logo"/>));
-	print $cgi->a({-href => esc_url($home_link)}, $home_link_str) . " / ";
+	print {$output_handler} $cgi->a({-href => esc_url($home_link)}, $home_link_str) . " / ";
 	if (defined $project) {
-		print $cgi->a({-href => href(action=>"summary")}, esc_html($project));
+		print {$output_handler} $cgi->a({-href => href(action=>"summary")}, esc_html($project));
 		if (defined $action) {
-			print " / $action";
+			print {$output_handler} " / $action";
 		}
-		print "\n";
+		print {$output_handler} "\n";
 	}
-	print "</div>\n";
+	print {$output_handler} "</div>\n";
 
 	my $have_search = gitweb_check_feature('search');
 	if (defined $project && $have_search) {
@@ -3313,7 +3320,7 @@ EOF
 		if ($use_pathinfo) {
 			$action .= "/".esc_url($project);
 		}
-		print $cgi->startform(-method => "get", -action => $action) .
+		print {$output_handler} $cgi->startform(-method => "get", -action => $action) .
 		      "<div class=\"search\">\n" .
 		      (!$use_pathinfo &&
 		      $cgi->input({-name=>"p", -value=>$project, -type=>"hidden"}) . "\n") .
@@ -3336,11 +3343,11 @@ EOF
 sub git_footer_html {
 	my $feed_class = 'rss_logo';
 
-	print "<div class=\"page_footer\">\n";
+	print {$output_handler} "<div class=\"page_footer\">\n";
 	if (defined $project) {
 		my $descr = git_get_project_description($project);
 		if (defined $descr) {
-			print "<div class=\"page_footer_text\">" . esc_html($descr) . "</div>\n";
+			print {$output_handler} "<div class=\"page_footer_text\">" . esc_html($descr) . "</div>\n";
 		}
 
 		my %href_params = get_feed_info();
@@ -3351,22 +3358,22 @@ sub git_footer_html {
 
 		foreach my $format qw(RSS Atom) {
 			$href_params{'action'} = lc($format);
-			print $cgi->a({-href => href(%href_params),
+			print {$output_handler} $cgi->a({-href => href(%href_params),
 			              -title => "$href_params{'-title'} $format feed",
 			              -class => $feed_class}, $format)."\n";
 		}
 
 	} else {
-		print $cgi->a({-href => href(project=>undef, action=>"opml"),
+		print {$output_handler} $cgi->a({-href => href(project=>undef, action=>"opml"),
 		              -class => $feed_class}, "OPML") . " ";
-		print $cgi->a({-href => href(project=>undef, action=>"project_index"),
+		print {$output_handler} $cgi->a({-href => href(project=>undef, action=>"project_index"),
 		              -class => $feed_class}, "TXT") . "\n";
 	}
-	print "</div>\n"; # class="page_footer"
+	print {$output_handler} "</div>\n"; # class="page_footer"
 
 	if (defined $t0 && gitweb_check_feature('timed')) {
-		print "<div id=\"generating_info\">\n";
-		print 'This page took '.
+		print {$output_handler} "<div id=\"generating_info\">\n";
+		print {$output_handler} 'This page took '.
 		      '<span id="generating_time" class="time_span">'.
 		      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
 		      ' seconds </span>'.
@@ -3375,26 +3382,26 @@ sub git_footer_html {
 		      $number_of_git_cmds.
 		      '</span> git commands '.
 		      " to generate.\n";
-		print "</div>\n"; # class="page_footer"
+		print {$output_handler} "</div>\n"; # id="generating_info"
 	}
 
 	if (-f $site_footer) {
 		insert_file($site_footer);
 	}
 
-	print qq!<script type="text/javascript" src="$javascript"></script>\n!;
+	print {$output_handler} qq!<script type="text/javascript" src="$javascript"></script>\n!;
 	if ($action eq 'blame_incremental') {
-		print qq!<script type="text/javascript">\n!.
+		print {$output_handler} qq!<script type="text/javascript">\n!.
 		      qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!.
 		      qq!           "!. href() .qq!");\n!.
 		      qq!</script>\n!;
 	} elsif (gitweb_check_feature('javascript-actions')) {
-		print qq!<script type="text/javascript">\n!.
+		print {$output_handler} qq!<script type="text/javascript">\n!.
 		      qq!window.onload = fixLinks;\n!.
 		      qq!</script>\n!;
 	}
 
-	print "</body>\n" .
+	print {$output_handler} "</body>\n" .
 	      "</html>";
 }
 
@@ -3416,6 +3423,14 @@ sub die_error {
 	my $error = shift || "Internal server error";
 	my $extra = shift;
 
+	# The output handlers for die_error need to be reset to STDOUT
+	# so that the error page is not split, with half going to whatever
+	# handle is currently in use and half to STDOUT.  This is mainly
+	# for the benefit of git_header_html() and git_footer_html(), since
+	# those internally use the indirect print handler.
+	$output_handler = *STDOUT;
+	$output_handler_bin = *STDOUT;
+
 	my %http_responses = (
 		400 => '400 Bad Request',
 		403 => '403 Forbidden',
@@ -3484,12 +3499,12 @@ sub git_print_page_nav {
 		$arg{$label}{'_href'} = $link;
 	}
 
-	print "<div class=\"page_nav\">\n" .
+	print {$output_handler} "<div class=\"page_nav\">\n" .
 		(join " | ",
 		 map { $_ eq $current ?
 		       $_ : $cgi->a({-href => ($arg{$_}{_href} ? $arg{$_}{_href} : href(%{$arg{$_}}))}, "$_")
 		 } @navs);
-	print "<br/>\n$extra<br/>\n" .
+	print {$output_handler} "<br/>\n$extra<br/>\n" .
 	      "</div>\n";
 }
 
@@ -3530,14 +3545,14 @@ sub git_print_header_div {
 	$args{'hash'} = $hash if $hash;
 	$args{'hash_base'} = $hash_base if $hash_base;
 
-	print "<div class=\"header\">\n" .
+	print {$output_handler} "<div class=\"header\">\n" .
 	      $cgi->a({-href => href(%args), -class => "title"},
 	      $title ? $title : $action) .
 	      "\n</div>\n";
 }
 
 sub print_local_time {
-	print get_local_time(@_);
+	print {$output_handler} get_local_time(@_);
 }
 
 sub get_local_time {
@@ -3562,11 +3577,11 @@ sub git_print_authorship {
 	my $author = $co->{'author_name'};
 
 	my %ad = parse_date($co->{'author_epoch'}, $co->{'author_tz'});
-	print "<$tag class=\"author_date\">" .
+	print {$output_handler} "<$tag class=\"author_date\">" .
 	      format_search_author($author, "author", esc_html($author)) .
 	      " [$ad{'rfc2822'}";
-	print_local_time(%ad) if ($opts{-localtime});
-	print "]" . git_get_avatar($co->{'author_email'}, -pad_before => 1)
+	print {$output_handler} get_local_time(%ad) if ($opts{-localtime});
+	print {$output_handler} "]" . git_get_avatar($co->{'author_email'}, -pad_before => 1)
 		  . "</$tag>\n";
 }
 
@@ -3582,7 +3597,7 @@ sub git_print_authorship_rows {
 	@people = ('author', 'committer') unless @people;
 	foreach my $who (@people) {
 		my %wd = parse_date($co->{"${who}_epoch"}, $co->{"${who}_tz"});
-		print "<tr><td>$who</td><td>" .
+		print {$output_handler} "<tr><td>$who</td><td>" .
 		      format_search_author($co->{"${who}_name"}, $who,
 			       esc_html($co->{"${who}_name"})) . " " .
 		      format_search_author($co->{"${who}_email"}, $who,
@@ -3592,8 +3607,8 @@ sub git_print_authorship_rows {
 		      "</td></tr>\n" .
 		      "<tr>" .
 		      "<td></td><td> $wd{'rfc2822'}";
-		print_local_time(%wd);
-		print "</td>" .
+		print {$output_handler} get_local_time(%wd);
+		print {$output_handler} "</td>" .
 		      "</tr>\n";
 	}
 }
@@ -3603,11 +3618,10 @@ sub git_print_page_path {
 	my $type = shift;
 	my $hb = shift;
 
-
-	print "<div class=\"page_path\">";
-	print $cgi->a({-href => href(action=>"tree", hash_base=>$hb),
+	print {$output_handler} "<div class=\"page_path\">";
+	print {$output_handler} $cgi->a({-href => href(action=>"tree", hash_base=>$hb),
 	              -title => 'tree root'}, to_utf8("[$project]"));
-	print " / ";
+	print {$output_handler} " / ";
 	if (defined $name) {
 		my @dirname = split '/', $name;
 		my $basename = pop @dirname;
@@ -3615,25 +3629,25 @@ sub git_print_page_path {
 
 		foreach my $dir (@dirname) {
 			$fullname .= ($fullname ? '/' : '') . $dir;
-			print $cgi->a({-href => href(action=>"tree", file_name=>$fullname,
+			print {$output_handler} $cgi->a({-href => href(action=>"tree", file_name=>$fullname,
 			                             hash_base=>$hb),
 			              -title => $fullname}, esc_path($dir));
-			print " / ";
+			print {$output_handler} " / ";
 		}
 		if (defined $type && $type eq 'blob') {
-			print $cgi->a({-href => href(action=>"blob_plain", file_name=>$file_name,
+			print {$output_handler} $cgi->a({-href => href(action=>"blob_plain", file_name=>$file_name,
 			                             hash_base=>$hb),
 			              -title => $name}, esc_path($basename));
 		} elsif (defined $type && $type eq 'tree') {
-			print $cgi->a({-href => href(action=>"tree", file_name=>$file_name,
+			print {$output_handler} $cgi->a({-href => href(action=>"tree", file_name=>$file_name,
 			                             hash_base=>$hb),
 			              -title => $name}, esc_path($basename));
-			print " / ";
+			print {$output_handler} " / ";
 		} else {
-			print esc_path($basename);
+			print {$output_handler} esc_path($basename);
 		}
 	}
-	print "<br/></div>\n";
+	print {$output_handler} "<br/></div>\n";
 }
 
 sub git_print_log {
@@ -3657,7 +3671,7 @@ sub git_print_log {
 			$signoff = 1;
 			$empty = 0;
 			if (! $opts{'-remove_signoff'}) {
-				print "<span class=\"signoff\">" . esc_html($line) . "</span><br/>\n";
+				print {$output_handler} "<span class=\"signoff\">" . esc_html($line) . "</span><br/>\n";
 				next;
 			} else {
 				# remove signoff lines
@@ -3676,12 +3690,12 @@ sub git_print_log {
 			$empty = 0;
 		}
 
-		print format_log_line_html($line) . "<br/>\n";
+		print {$output_handler} format_log_line_html($line) . "<br/>\n";
 	}
 
 	if ($opts{'-final_empty_line'}) {
 		# end with single empty line
-		print "<br/>\n" unless $empty;
+		print {$output_handler} "<br/>\n" unless $empty;
 	}
 }
 
@@ -3754,12 +3768,12 @@ sub git_print_tree_entry {
 	# the mode of the entry, list is the name of the entry, an href,
 	# and link is the action links of the entry.
 
-	print "<td class=\"mode\">" . mode_str($t->{'mode'}) . "</td>\n";
+	print {$output_handler} "<td class=\"mode\">" . mode_str($t->{'mode'}) . "</td>\n";
 	if (exists $t->{'size'}) {
-		print "<td class=\"size\">$t->{'size'}</td>\n";
+		print {$output_handler} "<td class=\"size\">$t->{'size'}</td>\n";
 	}
 	if ($t->{'type'} eq "blob") {
-		print "<td class=\"list\">" .
+		print {$output_handler} "<td class=\"list\">" .
 			$cgi->a({-href => href(action=>"blob", hash=>$t->{'hash'},
 			                       file_name=>"$basedir$t->{'name'}", %base_key),
 			        -class => "list"}, esc_path($t->{'name'}));
@@ -3768,71 +3782,71 @@ sub git_print_tree_entry {
 			if ($link_target) {
 				my $norm_target = normalize_link_target($link_target, $basedir);
 				if (defined $norm_target) {
-					print " -> " .
+					print {$output_handler} " -> " .
 					      $cgi->a({-href => href(action=>"object", hash_base=>$hash_base,
 					                             file_name=>$norm_target),
 					               -title => $norm_target}, esc_path($link_target));
 				} else {
-					print " -> " . esc_path($link_target);
+					print {$output_handler} " -> " . esc_path($link_target);
 				}
 			}
 		}
-		print "</td>\n";
-		print "<td class=\"link\">";
-		print $cgi->a({-href => href(action=>"blob", hash=>$t->{'hash'},
+		print {$output_handler} "</td>\n";
+		print {$output_handler} "<td class=\"link\">";
+		print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$t->{'hash'},
 		                             file_name=>"$basedir$t->{'name'}", %base_key)},
 		              "blob");
 		if ($have_blame) {
-			print " | " .
+			print {$output_handler} " | " .
 			      $cgi->a({-href => href(action=>"blame", hash=>$t->{'hash'},
 			                             file_name=>"$basedir$t->{'name'}", %base_key)},
 			              "blame");
 		}
 		if (defined $hash_base) {
-			print " | " .
+			print {$output_handler} " | " .
 			      $cgi->a({-href => href(action=>"history", hash_base=>$hash_base,
 			                             hash=>$t->{'hash'}, file_name=>"$basedir$t->{'name'}")},
 			              "history");
 		}
-		print " | " .
+		print {$output_handler} " | " .
 			$cgi->a({-href => href(action=>"blob_plain", hash_base=>$hash_base,
 			                       file_name=>"$basedir$t->{'name'}")},
 			        "raw");
-		print "</td>\n";
+		print {$output_handler} "</td>\n";
 
 	} elsif ($t->{'type'} eq "tree") {
-		print "<td class=\"list\">";
-		print $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'},
+		print {$output_handler} "<td class=\"list\">";
+		print {$output_handler} $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'},
 		                             file_name=>"$basedir$t->{'name'}",
 		                             %base_key)},
 		              esc_path($t->{'name'}));
-		print "</td>\n";
-		print "<td class=\"link\">";
-		print $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'},
+		print {$output_handler} "</td>\n";
+		print {$output_handler} "<td class=\"link\">";
+		print {$output_handler} $cgi->a({-href => href(action=>"tree", hash=>$t->{'hash'},
 		                             file_name=>"$basedir$t->{'name'}",
 		                             %base_key)},
 		              "tree");
 		if (defined $hash_base) {
-			print " | " .
+			print {$output_handler} " | " .
 			      $cgi->a({-href => href(action=>"history", hash_base=>$hash_base,
 			                             file_name=>"$basedir$t->{'name'}")},
 			              "history");
 		}
-		print "</td>\n";
+		print {$output_handler} "</td>\n";
 	} else {
 		# unknown object: we can only present history for it
 		# (this includes 'commit' object, i.e. submodule support)
-		print "<td class=\"list\">" .
+		print {$output_handler} "<td class=\"list\">" .
 		      esc_path($t->{'name'}) .
 		      "</td>\n";
-		print "<td class=\"link\">";
+		print {$output_handler} "<td class=\"link\">";
 		if (defined $hash_base) {
-			print $cgi->a({-href => href(action=>"history",
+			print {$output_handler} $cgi->a({-href => href(action=>"history",
 			                             hash_base=>$hash_base,
 			                             file_name=>"$basedir$t->{'name'}")},
 			              "history");
 		}
-		print "</td>\n";
+		print {$output_handler} "</td>\n";
 	}
 }
 
@@ -3879,13 +3893,13 @@ sub git_difftree_body {
 	my ($difftree, $hash, @parents) = @_;
 	my ($parent) = $parents[0];
 	my $have_blame = gitweb_check_feature('blame');
-	print "<div class=\"list_head\">\n";
+	print {$output_handler} "<div class=\"list_head\">\n";
 	if ($#{$difftree} > 10) {
-		print(($#{$difftree} + 1) . " files changed:\n");
+		print {$output_handler} (($#{$difftree} + 1) . " files changed:\n");
 	}
-	print "</div>\n";
+	print {$output_handler} "</div>\n";
 
-	print "<table class=\"" .
+	print {$output_handler} "<table class=\"" .
 	      (@parents > 1 ? "combined " : "") .
 	      "diff_tree\">\n";
 
@@ -3893,11 +3907,11 @@ sub git_difftree_body {
 	my $has_header = @$difftree && @parents > 1 && $action eq 'commitdiff';
 	if ($has_header) {
 		# table header
-		print "<thead><tr>\n" .
+		print {$output_handler} "<thead><tr>\n" .
 		       "<th></th><th></th>\n"; # filename, patchN link
 		for (my $i = 0; $i < @parents; $i++) {
 			my $par = $parents[$i];
-			print "<th>" .
+			print {$output_handler} "<th>" .
 			      $cgi->a({-href => href(action=>"commitdiff",
 			                             hash=>$hash, hash_parent=>$par),
 			               -title => 'commitdiff to parent number ' .
@@ -3905,7 +3919,7 @@ sub git_difftree_body {
 			              $i+1) .
 			      "&nbsp;</th>\n";
 		}
-		print "</tr></thead>\n<tbody>\n";
+		print {$output_handler} "</tr></thead>\n<tbody>\n";
 	}
 
 	my $alternate = 1;
@@ -3914,9 +3928,9 @@ sub git_difftree_body {
 		my $diff = parsed_difftree_line($line);
 
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
 
@@ -3927,14 +3941,14 @@ sub git_difftree_body {
 
 			if (!is_deleted($diff)) {
 				# file exists in the result (child) commit
-				print "<td>" .
+				print {$output_handler} "<td>" .
 				      $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
 				                             file_name=>$diff->{'to_file'},
 				                             hash_base=>$hash),
 				              -class => "list"}, esc_path($diff->{'to_file'})) .
 				      "</td>\n";
 			} else {
-				print "<td>" .
+				print {$output_handler} "<td>" .
 				      esc_path($diff->{'to_file'}) .
 				      "</td>\n";
 			}
@@ -3942,7 +3956,7 @@ sub git_difftree_body {
 			if ($action eq 'commitdiff') {
 				# link to patch
 				$patchno++;
-				print "<td class=\"link\">" .
+				print {$output_handler} "<td class=\"link\">" .
 				      $cgi->a({-href => "#patch$patchno"}, "patch") .
 				      " | " .
 				      "</td>\n";
@@ -3960,9 +3974,9 @@ sub git_difftree_body {
 				$not_deleted ||= ($status ne 'D');
 
 				if ($status eq 'A') {
-					print "<td  class=\"link\" align=\"right\"> | </td>\n";
+					print {$output_handler} "<td  class=\"link\" align=\"right\"> | </td>\n";
 				} elsif ($status eq 'D') {
-					print "<td class=\"link\">" .
+					print {$output_handler} "<td class=\"link\">" .
 					      $cgi->a({-href => href(action=>"blob",
 					                             hash_base=>$hash,
 					                             hash=>$from_hash,
@@ -3971,11 +3985,11 @@ sub git_difftree_body {
 					      " | </td>\n";
 				} else {
 					if ($diff->{'to_id'} eq $from_hash) {
-						print "<td class=\"link nochange\">";
+						print {$output_handler} "<td class=\"link nochange\">";
 					} else {
-						print "<td class=\"link\">";
+						print {$output_handler} "<td class=\"link\">";
 					}
-					print $cgi->a({-href => href(action=>"blobdiff",
+					print {$output_handler} $cgi->a({-href => href(action=>"blobdiff",
 					                             hash=>$diff->{'to_id'},
 					                             hash_parent=>$from_hash,
 					                             hash_base=>$hash,
@@ -3987,24 +4001,24 @@ sub git_difftree_body {
 				}
 			}
 
-			print "<td class=\"link\">";
+			print {$output_handler} "<td class=\"link\">";
 			if ($not_deleted) {
-				print $cgi->a({-href => href(action=>"blob",
+				print {$output_handler} $cgi->a({-href => href(action=>"blob",
 				                             hash=>$diff->{'to_id'},
 				                             file_name=>$diff->{'to_file'},
 				                             hash_base=>$hash)},
 				              "blob");
-				print " | " if ($has_history);
+				print {$output_handler} " | " if ($has_history);
 			}
 			if ($has_history) {
-				print $cgi->a({-href => href(action=>"history",
+				print {$output_handler} $cgi->a({-href => href(action=>"history",
 				                             file_name=>$diff->{'to_file'},
 				                             hash_base=>$hash)},
 				              "history");
 			}
-			print "</td>\n";
+			print {$output_handler} "</td>\n";
 
-			print "</tr>\n";
+			print {$output_handler} "</tr>\n";
 			next; # instead of 'else' clause, to avoid extra indent
 		}
 		# else ordinary diff
@@ -4030,51 +4044,51 @@ sub git_difftree_body {
 			my $mode_chng = "<span class=\"file_status new\">[new $to_file_type";
 			$mode_chng   .= " with mode: $to_mode_str" if $to_mode_str;
 			$mode_chng   .= "]</span>";
-			print "<td>";
-			print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
+			print {$output_handler} "<td>";
+			print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
 			                             hash_base=>$hash, file_name=>$diff->{'file'}),
 			              -class => "list"}, esc_path($diff->{'file'}));
-			print "</td>\n";
-			print "<td>$mode_chng</td>\n";
-			print "<td class=\"link\">";
+			print {$output_handler} "</td>\n";
+			print {$output_handler} "<td>$mode_chng</td>\n";
+			print {$output_handler} "<td class=\"link\">";
 			if ($action eq 'commitdiff') {
 				# link to patch
 				$patchno++;
-				print $cgi->a({-href => "#patch$patchno"}, "patch");
-				print " | ";
+				print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch");
+				print {$output_handler} " | ";
 			}
-			print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
+			print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
 			                             hash_base=>$hash, file_name=>$diff->{'file'})},
 			              "blob");
-			print "</td>\n";
+			print {$output_handler} "</td>\n";
 
 		} elsif ($diff->{'status'} eq "D") { # deleted
 			my $mode_chng = "<span class=\"file_status deleted\">[deleted $from_file_type]</span>";
-			print "<td>";
-			print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'},
+			print {$output_handler} "<td>";
+			print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'},
 			                             hash_base=>$parent, file_name=>$diff->{'file'}),
 			               -class => "list"}, esc_path($diff->{'file'}));
-			print "</td>\n";
-			print "<td>$mode_chng</td>\n";
-			print "<td class=\"link\">";
+			print {$output_handler} "</td>\n";
+			print {$output_handler} "<td>$mode_chng</td>\n";
+			print {$output_handler} "<td class=\"link\">";
 			if ($action eq 'commitdiff') {
 				# link to patch
 				$patchno++;
-				print $cgi->a({-href => "#patch$patchno"}, "patch");
-				print " | ";
+				print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch");
+				print {$output_handler} " | ";
 			}
-			print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'},
+			print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'from_id'},
 			                             hash_base=>$parent, file_name=>$diff->{'file'})},
 			              "blob") . " | ";
 			if ($have_blame) {
-				print $cgi->a({-href => href(action=>"blame", hash_base=>$parent,
+				print {$output_handler} $cgi->a({-href => href(action=>"blame", hash_base=>$parent,
 				                             file_name=>$diff->{'file'})},
 				              "blame") . " | ";
 			}
-			print $cgi->a({-href => href(action=>"history", hash_base=>$parent,
+			print {$output_handler} $cgi->a({-href => href(action=>"history", hash_base=>$parent,
 			                             file_name=>$diff->{'file'})},
 			              "history");
-			print "</td>\n";
+			print {$output_handler} "</td>\n";
 
 		} elsif ($diff->{'status'} eq "M" || $diff->{'status'} eq "T") { # modified, or type changed
 			my $mode_chnge = "";
@@ -4092,39 +4106,39 @@ sub git_difftree_body {
 				}
 				$mode_chnge .= "]</span>\n";
 			}
-			print "<td>";
-			print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
+			print {$output_handler} "<td>";
+			print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
 			                             hash_base=>$hash, file_name=>$diff->{'file'}),
 			              -class => "list"}, esc_path($diff->{'file'}));
-			print "</td>\n";
-			print "<td>$mode_chnge</td>\n";
-			print "<td class=\"link\">";
+			print {$output_handler} "</td>\n";
+			print {$output_handler} "<td>$mode_chnge</td>\n";
+			print {$output_handler} "<td class=\"link\">";
 			if ($action eq 'commitdiff') {
 				# link to patch
 				$patchno++;
-				print $cgi->a({-href => "#patch$patchno"}, "patch") .
+				print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch") .
 				      " | ";
 			} elsif ($diff->{'to_id'} ne $diff->{'from_id'}) {
 				# "commit" view and modified file (not onlu mode changed)
-				print $cgi->a({-href => href(action=>"blobdiff",
+				print {$output_handler} $cgi->a({-href => href(action=>"blobdiff",
 				                             hash=>$diff->{'to_id'}, hash_parent=>$diff->{'from_id'},
 				                             hash_base=>$hash, hash_parent_base=>$parent,
 				                             file_name=>$diff->{'file'})},
 				              "diff") .
 				      " | ";
 			}
-			print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
+			print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
 			                             hash_base=>$hash, file_name=>$diff->{'file'})},
 			               "blob") . " | ";
 			if ($have_blame) {
-				print $cgi->a({-href => href(action=>"blame", hash_base=>$hash,
+				print {$output_handler} $cgi->a({-href => href(action=>"blame", hash_base=>$hash,
 				                             file_name=>$diff->{'file'})},
 				              "blame") . " | ";
 			}
-			print $cgi->a({-href => href(action=>"history", hash_base=>$hash,
+			print {$output_handler} $cgi->a({-href => href(action=>"history", hash_base=>$hash,
 			                             file_name=>$diff->{'file'})},
 			              "history");
-			print "</td>\n";
+			print {$output_handler} "</td>\n";
 
 		} elsif ($diff->{'status'} eq "R" || $diff->{'status'} eq "C") { # renamed or copied
 			my %status_name = ('R' => 'moved', 'C' => 'copied');
@@ -4134,7 +4148,7 @@ sub git_difftree_body {
 				# mode also for directories, so we cannot use $to_mode_str
 				$mode_chng = sprintf(", mode: %04o", $to_mode_oct & 0777);
 			}
-			print "<td>" .
+			print {$output_handler} "<td>" .
 			      $cgi->a({-href => href(action=>"blob", hash_base=>$hash,
 			                             hash=>$diff->{'to_id'}, file_name=>$diff->{'to_file'}),
 			              -class => "list"}, esc_path($diff->{'to_file'})) . "</td>\n" .
@@ -4147,35 +4161,35 @@ sub git_difftree_body {
 			if ($action eq 'commitdiff') {
 				# link to patch
 				$patchno++;
-				print $cgi->a({-href => "#patch$patchno"}, "patch") .
+				print {$output_handler} $cgi->a({-href => "#patch$patchno"}, "patch") .
 				      " | ";
 			} elsif ($diff->{'to_id'} ne $diff->{'from_id'}) {
 				# "commit" view and modified file (not only pure rename or copy)
-				print $cgi->a({-href => href(action=>"blobdiff",
+				print {$output_handler} $cgi->a({-href => href(action=>"blobdiff",
 				                             hash=>$diff->{'to_id'}, hash_parent=>$diff->{'from_id'},
 				                             hash_base=>$hash, hash_parent_base=>$parent,
 				                             file_name=>$diff->{'to_file'}, file_parent=>$diff->{'from_file'})},
 				              "diff") .
 				      " | ";
 			}
-			print $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
+			print {$output_handler} $cgi->a({-href => href(action=>"blob", hash=>$diff->{'to_id'},
 			                             hash_base=>$parent, file_name=>$diff->{'to_file'})},
 			              "blob") . " | ";
 			if ($have_blame) {
-				print $cgi->a({-href => href(action=>"blame", hash_base=>$hash,
+				print {$output_handler} $cgi->a({-href => href(action=>"blame", hash_base=>$hash,
 				                             file_name=>$diff->{'to_file'})},
 				              "blame") . " | ";
 			}
-			print $cgi->a({-href => href(action=>"history", hash_base=>$hash,
+			print {$output_handler} $cgi->a({-href => href(action=>"history", hash_base=>$hash,
 			                            file_name=>$diff->{'to_file'})},
 			              "history");
-			print "</td>\n";
+			print {$output_handler} "</td>\n";
 
 		} # we should not encounter Unmerged (U) or Unknown (X) status
-		print "</tr>\n";
+		print {$output_handler} "</tr>\n";
 	}
-	print "</tbody>" if $has_header;
-	print "</table>\n";
+	print {$output_handler} "</tbody>" if $has_header;
+	print {$output_handler} "</table>\n";
 }
 
 sub git_patchset_body {
@@ -4190,7 +4204,7 @@ sub git_patchset_body {
 	my $to_name;
 	my (%from, %to);
 
-	print "<div class=\"patchset\">\n";
+	print {$output_handler} "<div class=\"patchset\">\n";
 
 	# skip to first patch
 	while ($patch_line = <$fd>) {
@@ -4218,7 +4232,7 @@ sub git_patchset_body {
 		# and parse raw git-diff line if needed
 		if (is_patch_split($diffinfo, { 'to_file' => $to_name })) {
 			# this is continuation of a split patch
-			print "<div class=\"patch cont\">\n";
+			print {$output_handler} "<div class=\"patch cont\">\n";
 		} else {
 			# advance raw git-diff output if needed
 			$patch_idx++ if defined $diffinfo;
@@ -4230,7 +4244,7 @@ sub git_patchset_body {
 			# find which patch (using pathname of result) we are at now;
 			if ($is_combined) {
 				while ($to_name ne $diffinfo->{'to_file'}) {
-					print "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" .
+					print {$output_handler} "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" .
 					      format_diff_cc_simplified($diffinfo, @hash_parents) .
 					      "</div>\n";  # class="patch"
 
@@ -4247,7 +4261,7 @@ sub git_patchset_body {
 
 			# this is first patch for raw difftree line with $patch_idx index
 			# we index @$difftree array from 0, but number patches from 1
-			print "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n";
+			print {$output_handler} "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n";
 		}
 
 		# git diff header
@@ -4255,25 +4269,25 @@ sub git_patchset_body {
 		#assert($patch_line !~ m!$/$!) if DEBUG; # is chomp-ed
 		$patch_number++;
 		# print "git diff" header
-		print format_git_diff_header_line($patch_line, $diffinfo,
+		print {$output_handler} format_git_diff_header_line($patch_line, $diffinfo,
 		                                  \%from, \%to);
 
 		# print extended diff header
-		print "<div class=\"diff extended_header\">\n";
+		print {$output_handler} "<div class=\"diff extended_header\">\n";
 	EXTENDED_HEADER:
 		while ($patch_line = <$fd>) {
 			chomp $patch_line;
 
 			last EXTENDED_HEADER if ($patch_line =~ m/^--- |^diff /);
 
-			print format_extended_diff_header_line($patch_line, $diffinfo,
+			print {$output_handler} format_extended_diff_header_line($patch_line, $diffinfo,
 			                                       \%from, \%to);
 		}
-		print "</div>\n"; # class="diff extended_header"
+		print {$output_handler} "</div>\n"; # class="diff extended_header"
 
 		# from-file/to-file diff header
 		if (! $patch_line) {
-			print "</div>\n"; # class="patch"
+			print {$output_handler} "</div>\n"; # class="patch"
 			last PATCH;
 		}
 		next PATCH if ($patch_line =~ m/^diff /);
@@ -4284,7 +4298,7 @@ sub git_patchset_body {
 		chomp $patch_line;
 		#assert($patch_line =~ m/^\+\+\+/) if DEBUG;
 
-		print format_diff_from_to_header($last_patch_line, $patch_line,
+		print {$output_handler} format_diff_from_to_header($last_patch_line, $patch_line,
 		                                 $diffinfo, \%from, \%to,
 		                                 @hash_parents);
 
@@ -4295,11 +4309,11 @@ sub git_patchset_body {
 
 			next PATCH if ($patch_line =~ m/^diff /);
 
-			print format_diff_line($patch_line, \%from, \%to);
+			print {$output_handler} format_diff_line($patch_line, \%from, \%to);
 		}
 
 	} continue {
-		print "</div>\n"; # class="patch"
+		print {$output_handler} "</div>\n"; # class="patch"
 	}
 
 	# for compact combined (--cc) format, with chunk and patch simpliciaction
@@ -4311,7 +4325,7 @@ sub git_patchset_body {
 		$diffinfo = parsed_difftree_line($difftree->[$patch_idx]);
 
 		# generate anchor for "patch" links in difftree / whatchanged part
-		print "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" .
+		print {$output_handler} "<div class=\"patch\" id=\"patch". ($patch_idx+1) ."\">\n" .
 		      format_diff_cc_simplified($diffinfo, @hash_parents) .
 		      "</div>\n";  # class="patch"
 
@@ -4320,13 +4334,13 @@ sub git_patchset_body {
 
 	if ($patch_number == 0) {
 		if (@hash_parents > 1) {
-			print "<div class=\"diff nodifferences\">Trivial merge</div>\n";
+			print {$output_handler} "<div class=\"diff nodifferences\">Trivial merge</div>\n";
 		} else {
-			print "<div class=\"diff nodifferences\">No differences found</div>\n";
+			print {$output_handler} "<div class=\"diff nodifferences\">No differences found</div>\n";
 		}
 	}
 
-	print "</div>\n"; # class="patchset"
+	print {$output_handler} "</div>\n"; # class="patchset"
 }
 
 # . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
@@ -4375,7 +4389,7 @@ sub fill_project_list_info {
 # print 'sort by' <th> element, generating 'sort by $name' replay link
 # if that order is not selected
 sub print_sort_th {
-	print get_sort_th(@_);
+	print {$output_handler} get_sort_th(@_);
 }
 
 sub get_sort_th {
@@ -4428,20 +4442,20 @@ sub git_project_list_body {
 			}
 		}
 		my $cloud = git_populate_project_tagcloud(\%ctags);
-		print git_show_project_tagcloud($cloud, 64);
+		print {$output_handler} git_show_project_tagcloud($cloud, 64);
 	}
 
-	print "<table class=\"project_list\">\n";
+	print {$output_handler} "<table class=\"project_list\">\n";
 	unless ($no_header) {
-		print "<tr>\n";
+		print {$output_handler} "<tr>\n";
 		if ($check_forks) {
-			print "<th></th>\n";
+			print {$output_handler} "<th></th>\n";
 		}
-		print_sort_th('project', $order, 'Project');
-		print_sort_th('descr', $order, 'Description');
-		print_sort_th('owner', $order, 'Owner');
-		print_sort_th('age', $order, 'Last Change');
-		print "<th></th>\n" . # for links
+		print {$output_handler} get_sort_th('project', $order, 'Project');
+		print {$output_handler} get_sort_th('descr', $order, 'Description');
+		print {$output_handler} get_sort_th('owner', $order, 'Owner');
+		print {$output_handler} get_sort_th('age', $order, 'Last Change');
+		print {$output_handler} "<th></th>\n" . # for links
 		      "</tr>\n";
 	}
 	my $alternate = 1;
@@ -4461,26 +4475,26 @@ sub git_project_list_body {
 		}
 
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
 		if ($check_forks) {
-			print "<td>";
+			print {$output_handler} "<td>";
 			if ($pr->{'forks'}) {
-				print "<!-- $pr->{'forks'} -->\n";
-				print $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "+");
+				print {$output_handler} "<!-- $pr->{'forks'} -->\n";
+				print {$output_handler} $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "+");
 			}
-			print "</td>\n";
+			print {$output_handler} "</td>\n";
 		}
-		print "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
-		                        -class => "list"}, esc_html($pr->{'path'})) . "</td>\n" .
+		print {$output_handler} "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
+		                        -class => "list"}, esc_html($pr->{'path'})) ."</td>\n".
 		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
 		                        -class => "list", -title => $pr->{'descr_long'}},
 		                        esc_html($pr->{'descr'})) . "</td>\n" .
 		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
-		print "<td class=\"". age_class($pr->{'age'}) . "\">" .
+		print {$output_handler} "<td class=\"". age_class($pr->{'age'}) . "\">" .
 		      (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
 		      "<td class=\"link\">" .
 		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary")   . " | " .
@@ -4492,14 +4506,14 @@ sub git_project_list_body {
 		      "</tr>\n";
 	}
 	if (defined $extra) {
-		print "<tr>\n";
+		print {$output_handler} "<tr>\n";
 		if ($check_forks) {
-			print "<td></td>\n";
+			print {$output_handler} "<td></td>\n";
 		}
-		print "<td colspan=\"5\">$extra</td>\n" .
+		print {$output_handler} "<td colspan=\"5\">$extra</td>\n" .
 		      "</tr>\n";
 	}
-	print "</table>\n";
+	print {$output_handler} "</table>\n";
 }
 
 sub git_log_body {
@@ -4519,7 +4533,7 @@ sub git_log_body {
 		               "<span class=\"age\">$co{'age_string'}</span>" .
 		               esc_html($co{'title'}) . $ref,
 		               $commit);
-		print "<div class=\"title_text\">\n" .
+		print {$output_handler} "<div class=\"title_text\">\n" .
 		      "<div class=\"log_link\">\n" .
 		      $cgi->a({-href => href(action=>"commit", hash=>$commit)}, "commit") .
 		      " | " .
@@ -4529,16 +4543,16 @@ sub git_log_body {
 		      "<br/>\n" .
 		      "</div>\n";
 		      git_print_authorship(\%co, -tag => 'span');
-		      print "<br/>\n</div>\n";
+		      print {$output_handler} "<br/>\n</div>\n";
 
-		print "<div class=\"log_body\">\n";
+		print {$output_handler} "<div class=\"log_body\">\n";
 		git_print_log($co{'comment'}, -final_empty_line=> 1);
-		print "</div>\n";
+		print {$output_handler} "</div>\n";
 	}
 	if ($extra) {
-		print "<div class=\"page_nav\">\n";
-		print "$extra\n";
-		print "</div>\n";
+		print {$output_handler} "<div class=\"page_nav\">\n";
+		print {$output_handler} "$extra\n";
+		print {$output_handler} "</div>\n";
 	}
 }
 
@@ -4549,41 +4563,41 @@ sub git_shortlog_body {
 	$from = 0 unless defined $from;
 	$to = $#{$commitlist} if (!defined $to || $#{$commitlist} < $to);
 
-	print "<table class=\"shortlog\">\n";
+	print {$output_handler} "<table class=\"shortlog\">\n";
 	my $alternate = 1;
 	for (my $i = $from; $i <= $to; $i++) {
 		my %co = %{$commitlist->[$i]};
 		my $commit = $co{'id'};
 		my $ref = format_ref_marker($refs, $commit);
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
 		# git_summary() used print "<td><i>$co{'age_string'}</i></td>\n" .
-		print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
+		print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
 		      format_author_html('td', \%co, 10) . "<td>";
-		print format_subject_html($co{'title'}, $co{'title_short'},
+		print {$output_handler} format_subject_html($co{'title'}, $co{'title_short'},
 		                          href(action=>"commit", hash=>$commit), $ref);
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "<td class=\"link\">" .
 		      $cgi->a({-href => href(action=>"commit", hash=>$commit)}, "commit") . " | " .
 		      $cgi->a({-href => href(action=>"commitdiff", hash=>$commit)}, "commitdiff") . " | " .
 		      $cgi->a({-href => href(action=>"tree", hash=>$commit, hash_base=>$commit)}, "tree");
 		my $snapshot_links = format_snapshot_links($commit);
 		if (defined $snapshot_links) {
-			print " | " . $snapshot_links;
+			print {$output_handler} " | " . $snapshot_links;
 		}
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "</tr>\n";
 	}
 	if (defined $extra) {
-		print "<tr>\n" .
+		print {$output_handler} "<tr>\n" .
 		      "<td colspan=\"4\">$extra</td>\n" .
 		      "</tr>\n";
 	}
-	print "</table>\n";
+	print {$output_handler} "</table>\n";
 }
 
 sub git_history_body {
@@ -4594,7 +4608,7 @@ sub git_history_body {
 	$from = 0 unless defined $from;
 	$to = $#{$commitlist} unless (defined $to && $to <= $#{$commitlist});
 
-	print "<table class=\"history\">\n";
+	print {$output_handler} "<table class=\"history\">\n";
 	my $alternate = 1;
 	for (my $i = $from; $i <= $to; $i++) {
 		my %co = %{$commitlist->[$i]};
@@ -4606,18 +4620,18 @@ sub git_history_body {
 		my $ref = format_ref_marker($refs, $commit);
 
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
-		print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
+		print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
 	# shortlog:   format_author_html('td', \%co, 10)
 		      format_author_html('td', \%co, 15, 3) . "<td>";
 		# originally git_history used chop_str($co{'title'}, 50)
-		print format_subject_html($co{'title'}, $co{'title_short'},
+		print {$output_handler} format_subject_html($co{'title'}, $co{'title_short'},
 		                          href(action=>"commit", hash=>$commit), $ref);
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "<td class=\"link\">" .
 		      $cgi->a({-href => href(action=>$ftype, hash_base=>$commit, file_name=>$file_name)}, $ftype) . " | " .
 		      $cgi->a({-href => href(action=>"commitdiff", hash=>$commit)}, "commitdiff");
@@ -4627,7 +4641,7 @@ sub git_history_body {
 			my $blob_parent  = git_get_hash_by_path($commit, $file_name);
 			if (defined $blob_current && defined $blob_parent &&
 					$blob_current ne $blob_parent) {
-				print " | " .
+				print {$output_handler} " | " .
 					$cgi->a({-href => href(action=>"blobdiff",
 					                       hash=>$blob_current, hash_parent=>$blob_parent,
 					                       hash_base=>$hash_base, hash_parent_base=>$commit,
@@ -4635,15 +4649,15 @@ sub git_history_body {
 					        "diff to current");
 			}
 		}
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "</tr>\n";
 	}
 	if (defined $extra) {
-		print "<tr>\n" .
+		print {$output_handler} "<tr>\n" .
 		      "<td colspan=\"4\">$extra</td>\n" .
 		      "</tr>\n";
 	}
-	print "</table>\n";
+	print {$output_handler} "</table>\n";
 }
 
 sub git_tags_body {
@@ -4652,7 +4666,7 @@ sub git_tags_body {
 	$from = 0 unless defined $from;
 	$to = $#{$taglist} if (!defined $to || $#{$taglist} < $to);
 
-	print "<table class=\"tags\">\n";
+	print {$output_handler} "<table class=\"tags\">\n";
 	my $alternate = 1;
 	for (my $i = $from; $i <= $to; $i++) {
 		my $entry = $taglist->[$i];
@@ -4663,50 +4677,50 @@ sub git_tags_body {
 			$comment_short = chop_str($comment, 30, 5);
 		}
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
 		if (defined $tag{'age'}) {
-			print "<td><i>$tag{'age'}</i></td>\n";
+			print {$output_handler} "<td><i>$tag{'age'}</i></td>\n";
 		} else {
-			print "<td></td>\n";
+			print {$output_handler} "<td></td>\n";
 		}
-		print "<td>" .
+		print {$output_handler} "<td>" .
 		      $cgi->a({-href => href(action=>$tag{'reftype'}, hash=>$tag{'refid'}),
 		               -class => "list name"}, esc_html($tag{'name'})) .
 		      "</td>\n" .
 		      "<td>";
 		if (defined $comment) {
-			print format_subject_html($comment, $comment_short,
+			print {$output_handler} format_subject_html($comment, $comment_short,
 			                          href(action=>"tag", hash=>$tag{'id'}));
 		}
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "<td class=\"selflink\">";
 		if ($tag{'type'} eq "tag") {
-			print $cgi->a({-href => href(action=>"tag", hash=>$tag{'id'})}, "tag");
+			print {$output_handler} $cgi->a({-href => href(action=>"tag", hash=>$tag{'id'})}, "tag");
 		} else {
-			print "&nbsp;";
+			print {$output_handler} "&nbsp;";
 		}
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "<td class=\"link\">" . " | " .
 		      $cgi->a({-href => href(action=>$tag{'reftype'}, hash=>$tag{'refid'})}, $tag{'reftype'});
 		if ($tag{'reftype'} eq "commit") {
-			print " | " . $cgi->a({-href => href(action=>"shortlog", hash=>$tag{'fullname'})}, "shortlog") .
+			print {$output_handler} " | " . $cgi->a({-href => href(action=>"shortlog", hash=>$tag{'fullname'})}, "shortlog") .
 			      " | " . $cgi->a({-href => href(action=>"log", hash=>$tag{'fullname'})}, "log");
 		} elsif ($tag{'reftype'} eq "blob") {
-			print " | " . $cgi->a({-href => href(action=>"blob_plain", hash=>$tag{'refid'})}, "raw");
+			print {$output_handler} " | " . $cgi->a({-href => href(action=>"blob_plain", hash=>$tag{'refid'})}, "raw");
 		}
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "</tr>";
 	}
 	if (defined $extra) {
-		print "<tr>\n" .
+		print {$output_handler} "<tr>\n" .
 		      "<td colspan=\"5\">$extra</td>\n" .
 		      "</tr>\n";
 	}
-	print "</table>\n";
+	print {$output_handler} "</table>\n";
 }
 
 sub git_heads_body {
@@ -4715,19 +4729,19 @@ sub git_heads_body {
 	$from = 0 unless defined $from;
 	$to = $#{$headlist} if (!defined $to || $#{$headlist} < $to);
 
-	print "<table class=\"heads\">\n";
+	print {$output_handler} "<table class=\"heads\">\n";
 	my $alternate = 1;
 	for (my $i = $from; $i <= $to; $i++) {
 		my $entry = $headlist->[$i];
 		my %ref = %$entry;
 		my $curr = $ref{'id'} eq $head;
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
-		print "<td><i>$ref{'age'}</i></td>\n" .
+		print {$output_handler} "<td><i>$ref{'age'}</i></td>\n" .
 		      ($curr ? "<td class=\"current_head\">" : "<td>") .
 		      $cgi->a({-href => href(action=>"shortlog", hash=>$ref{'fullname'}),
 		               -class => "list name"},esc_html($ref{'name'})) .
@@ -4740,11 +4754,11 @@ sub git_heads_body {
 		      "</tr>";
 	}
 	if (defined $extra) {
-		print "<tr>\n" .
+		print {$output_handler} "<tr>\n" .
 		      "<td colspan=\"3\">$extra</td>\n" .
 		      "</tr>\n";
 	}
-	print "</table>\n";
+	print {$output_handler} "</table>\n";
 }
 
 sub git_search_grep_body {
@@ -4752,7 +4766,7 @@ sub git_search_grep_body {
 	$from = 0 unless defined $from;
 	$to = $#{$commitlist} if (!defined $to || $#{$commitlist} < $to);
 
-	print "<table class=\"commit_search\">\n";
+	print {$output_handler} "<table class=\"commit_search\">\n";
 	my $alternate = 1;
 	for (my $i = $from; $i <= $to; $i++) {
 		my %co = %{$commitlist->[$i]};
@@ -4761,12 +4775,12 @@ sub git_search_grep_body {
 		}
 		my $commit = $co{'id'};
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
-		print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
+		print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
 		      format_author_html('td', \%co, 15, 5) .
 		      "<td>" .
 		      $cgi->a({-href => href(action=>"commit", hash=>$co{'id'}),
@@ -4786,25 +4800,25 @@ sub git_search_grep_body {
 				$match = esc_html($match);
 				$trail = esc_html($trail);
 
-				print "$lead<span class=\"match\">$match</span>$trail<br />";
+				print {$output_handler} "$lead<span class=\"match\">$match</span>$trail<br />";
 			}
 		}
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "<td class=\"link\">" .
 		      $cgi->a({-href => href(action=>"commit", hash=>$co{'id'})}, "commit") .
 		      " | " .
 		      $cgi->a({-href => href(action=>"commitdiff", hash=>$co{'id'})}, "commitdiff") .
 		      " | " .
 		      $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$co{'id'})}, "tree");
-		print "</td>\n" .
+		print {$output_handler} "</td>\n" .
 		      "</tr>\n";
 	}
 	if (defined $extra) {
-		print "<tr>\n" .
+		print {$output_handler} "<tr>\n" .
 		      "<td colspan=\"3\">$extra</td>\n" .
 		      "</tr>\n";
 	}
-	print "</table>\n";
+	print {$output_handler} "</table>\n";
 }
 
 ## ======================================================================
@@ -4824,11 +4838,11 @@ sub git_project_list {
 
 	git_header_html();
 	if (-f $home_text) {
-		print "<div class=\"index_include\">\n";
+		print {$output_handler} "<div class=\"index_include\">\n";
 		insert_file($home_text);
-		print "</div>\n";
+		print {$output_handler} "</div>\n";
 	}
-	print $cgi->startform(-method => "get") .
+	print {$output_handler} $cgi->startform(-method => "get") .
 	      "<p class=\"projsearch\">Search:\n" .
 	      $cgi->textfield(-name => "s", -value => $searchtext) . "\n" .
 	      "</p>" .
@@ -4858,7 +4872,7 @@ sub git_forks {
 sub git_project_index {
 	my @projects = git_get_projects_list($project);
 
-	print $cgi->header(
+	print {$output_handler} $cgi->header(
 		-type => 'text/plain',
 		-charset => 'utf-8',
 		-content_disposition => 'inline; filename="index.aux"');
@@ -4875,7 +4889,7 @@ sub git_project_index {
 		$path  =~ s/ /\+/g;
 		$owner =~ s/ /\+/g;
 
-		print "$path $owner\n";
+		print {$output_handler} "$path $owner\n";
 	}
 }
 
@@ -4902,12 +4916,12 @@ sub git_summary {
 	git_header_html();
 	git_print_page_nav('summary','', $head);
 
-	print "<div class=\"title\">&nbsp;</div>\n";
-	print "<table class=\"projects_list\">\n" .
+	print {$output_handler} "<div class=\"title\">&nbsp;</div>\n";
+	print {$output_handler} "<table class=\"projects_list\">\n" .
 	      "<tr id=\"metadata_desc\"><td>description</td><td>" . esc_html($descr) . "</td></tr>\n" .
 	      "<tr id=\"metadata_owner\"><td>owner</td><td>" . esc_html($owner) . "</td></tr>\n";
 	if (defined $cd{'rfc2822'}) {
-		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
+		print {$output_handler} "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
 	}
 
 	# use per project git URL list in $projectroot/$project/cloneurl
@@ -4917,7 +4931,7 @@ sub git_summary {
 	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
 	foreach my $git_url (@url_list) {
 		next unless $git_url;
-		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
+		print {$output_handler} "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
 		$url_tag = "";
 	}
 
@@ -4926,23 +4940,23 @@ sub git_summary {
 	if ($show_ctags) {
 		my $ctags = git_get_project_ctags($project);
 		my $cloud = git_populate_project_tagcloud($ctags);
-		print "<tr id=\"metadata_ctags\"><td>Content tags:<br />";
-		print "</td>\n<td>" unless %$ctags;
-		print "<form action=\"$show_ctags\" method=\"post\"><input type=\"hidden\" name=\"p\" value=\"$project\" />Add: <input type=\"text\" name=\"t\" size=\"8\" /></form>";
-		print "</td>\n<td>" if %$ctags;
-		print git_show_project_tagcloud($cloud, 48);
-		print "</td></tr>";
+		print {$output_handler} "<tr id=\"metadata_ctags\"><td>Content tags:<br />";
+		print {$output_handler} "</td>\n<td>" unless %$ctags;
+		print {$output_handler} "<form action=\"$show_ctags\" method=\"post\"><input type=\"hidden\" name=\"p\" value=\"$project\" />Add: <input type=\"text\" name=\"t\" size=\"8\" /></form>";
+		print {$output_handler} "</td>\n<td>" if %$ctags;
+		print {$output_handler} git_show_project_tagcloud($cloud, 48);
+		print {$output_handler} "</td></tr>";
 	}
 
-	print "</table>\n";
+	print {$output_handler} "</table>\n";
 
 	# If XSS prevention is on, we don't include README.html.
 	# TODO: Allow a readme in some safe format.
 	if (!$prevent_xss && -s "$projectroot/$project/README.html") {
-		print "<div class=\"title\">readme</div>\n" .
+		print {$output_handler} "<div class=\"title\">readme</div>\n" .
 		      "<div class=\"readme\">\n";
 		insert_file("$projectroot/$project/README.html");
-		print "\n</div>\n"; # class="readme"
+		print {$output_handler} "\n</div>\n"; # class="readme"
 	}
 
 	# we need to request one more than 16 (0..15) to check if
@@ -4991,7 +5005,7 @@ sub git_tag {
 	}
 
 	git_print_header_div('commit', esc_html($tag{'name'}), $hash);
-	print "<div class=\"title_text\">\n" .
+	print {$output_handler} "<div class=\"title_text\">\n" .
 	      "<table class=\"object_header\">\n" .
 	      "<tr>\n" .
 	      "<td>object</td>\n" .
@@ -5003,15 +5017,15 @@ sub git_tag {
 	if (defined($tag{'author'})) {
 		git_print_authorship_rows(\%tag, 'author');
 	}
-	print "</table>\n\n" .
+	print {$output_handler} "</table>\n\n" .
 	      "</div>\n";
-	print "<div class=\"page_body\">";
+	print {$output_handler} "<div class=\"page_body\">";
 	my $comment = $tag{'comment'};
 	foreach my $line (@$comment) {
 		chomp $line;
-		print esc_html($line, -nbsp=>1) . "<br/>\n";
+		print {$output_handler} esc_html($line, -nbsp=>1) . "<br/>\n";
 	}
-	print "</div>\n";
+	print {$output_handler} "</div>\n";
 	git_footer_html();
 }
 
@@ -5062,21 +5076,23 @@ sub git_blame_common {
 
 	# incremental blame data returns early
 	if ($format eq 'data') {
-		print $cgi->header(
+		print {$output_handler} $cgi->header(
 			-type=>"text/plain", -charset => "utf-8",
 			-status=> "200 OK");
 		local $| = 1; # output autoflush
-		print while <$fd>;
+		while (<$fd>) {
+			print {$output_handler} $_;
+		}
 		close $fd
-			or print "ERROR $!\n";
+			or die_error(500, "ERROR $!\n");
 
-		print 'END';
+		print {$output_handler} 'END';
 		if (defined $t0 && gitweb_check_feature('timed')) {
-			print ' '.
+			print {$output_handler} ' '.
 			      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
 			      ' '.$number_of_git_cmds;
 		}
-		print "\n";
+		print {$output_handler} "\n";
 
 		return;
 	}
@@ -5109,20 +5125,20 @@ sub git_blame_common {
 
 	# page body
 	if ($format eq 'incremental') {
-		print "<noscript>\n<div class=\"error\"><center><b>\n".
+		print {$output_handler} "<noscript>\n<div class=\"error\"><center><b>\n".
 		      "This page requires JavaScript to run.\n Use ".
 		      $cgi->a({-href => href(action=>'blame',javascript=>0,-replay=>1)},
 		              'this page').
 		      " instead.\n".
 		      "</b></center></div>\n</noscript>\n";
 
-		print qq!<div id="progress_bar" style="width: 100%; background-color: yellow"></div>\n!;
+		print {$output_handler} qq!<div id="progress_bar" style="width: 100%; background-color: yellow"></div>\n!;
 	}
 
-	print qq!<div class="page_body">\n!;
-	print qq!<div id="progress_info">... / ...</div>\n!
+	print {$output_handler} qq!<div class="page_body">\n!;
+	print {$output_handler} qq!<div id="progress_info">... / ...</div>\n!
 		if ($format eq 'incremental');
-	print qq!<table id="blame_table" class="blame" width="100%">\n!.
+	print {$output_handler} qq!<table id="blame_table" class="blame" width="100%">\n!.
 	      #qq!<col width="5.5em" /><col width="2.5em" /><col width="*" />\n!.
 	      qq!<thead>\n!.
 	      qq!<tr><th>Commit</th><th>Line</th><th>Data</th></tr>\n!.
@@ -5143,12 +5159,12 @@ sub git_blame_common {
 			chomp $line;
 			$linenr++;
 
-			print qq!<tr id="l$linenr" class="$color_class">!.
+			print {$output_handler} qq!<tr id="l$linenr" class="$color_class">!.
 			      qq!<td class="sha1"><a href=""> </a></td>!.
 			      qq!<td class="linenr">!.
 			      qq!<a class="linenr" href="">$linenr</a></td>!;
-			print qq!<td class="pre">! . esc_html($line) . "</td>\n";
-			print qq!</tr>\n!;
+			print {$output_handler} qq!<td class="pre">! . esc_html($line) . "</td>\n";
+			print {$output_handler} qq!</tr>\n!;
 		}
 
 	} else { # porcelain, i.e. ordinary blame
@@ -5189,25 +5205,25 @@ sub git_blame_common {
 			$tr_class .= ' boundary' if (exists $meta->{'boundary'});
 			$tr_class .= ' no-previous' if ($meta->{'nprevious'} == 0);
 			$tr_class .= ' multiple-previous' if ($meta->{'nprevious'} > 1);
-			print "<tr id=\"l$lineno\" class=\"$tr_class\">\n";
+			print {$output_handler} "<tr id=\"l$lineno\" class=\"$tr_class\">\n";
 			if ($group_size) {
-				print "<td class=\"sha1\"";
-				print " title=\"". esc_html($author) . ", $date\"";
-				print " rowspan=\"$group_size\"" if ($group_size > 1);
-				print ">";
-				print $cgi->a({-href => href(action=>"commit",
+				print {$output_handler} "<td class=\"sha1\"";
+				print {$output_handler} " title=\"". esc_html($author) . ", $date\"";
+				print {$output_handler} " rowspan=\"$group_size\"" if ($group_size > 1);
+				print {$output_handler} ">";
+				print {$output_handler} $cgi->a({-href => href(action=>"commit",
 				                             hash=>$full_rev,
 				                             file_name=>$file_name)},
 				              esc_html($short_rev));
 				if ($group_size >= 2) {
 					my @author_initials = ($author =~ /\b([[:upper:]])\B/g);
 					if (@author_initials) {
-						print "<br />" .
+						print {$output_handler} "<br />" .
 						      esc_html(join('', @author_initials));
 						#           or join('.', ...)
 					}
 				}
-				print "</td>\n";
+				print {$output_handler} "</td>\n";
 			}
 			# 'previous' <sha1 of parent commit> <filename at commit>
 			if (exists $meta->{'previous'} &&
@@ -5224,23 +5240,23 @@ sub git_blame_common {
 			my $blamed = href(action => 'blame',
 			                  file_name => $linenr_filename,
 			                  hash_base => $linenr_commit);
-			print "<td class=\"linenr\">";
-			print $cgi->a({ -href => "$blamed#l$orig_lineno",
+			print {$output_handler} "<td class=\"linenr\">";
+			print {$output_handler} $cgi->a({ -href => "$blamed#l$orig_lineno",
 			                -class => "linenr" },
 			              esc_html($lineno));
-			print "</td>";
-			print "<td class=\"pre\">" . esc_html($data) . "</td>\n";
-			print "</tr>\n";
+			print {$output_handler} "</td>";
+			print {$output_handler} "<td class=\"pre\">" . esc_html($data) . "</td>\n";
+			print {$output_handler} "</tr>\n";
 		} # end while
 
 	}
 
 	# footer
-	print "</tbody>\n".
+	print {$output_handler} "</tbody>\n".
 	      "</table>\n"; # class="blame"
-	print "</div>\n";   # class="blame_body"
+	print {$output_handler} "</div>\n";   # class="blame_body"
 	close $fd
-		or print "Reading blob failed\n";
+		or print {$output_handler} "Reading blob failed\n";
 
 	git_footer_html();
 }
@@ -5323,16 +5339,17 @@ sub git_blob_plain {
 	my $sandbox = $prevent_xss &&
 		$type !~ m!^(?:text/plain|image/(?:gif|png|jpeg))$!;
 
-	print $cgi->header(
+	print {$output_handler} $cgi->header(
 		-type => $type,
 		-expires => $expires,
 		-content_disposition =>
 			($sandbox ? 'attachment' : 'inline')
 			. '; filename="' . $save_as . '"');
 	local $/ = undef;
-	binmode STDOUT, ':raw';
-	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+
+	binmode $output_handler_bin, ':raw';
+	print {$output_handler_bin} <$fd>;
+	binmode $output_handler_bin, ':utf8'; # as set at the beginning of gitweb.cgi
 	close $fd;
 }
 
@@ -5391,18 +5408,18 @@ sub git_blob {
 		git_print_page_nav('','', $hash_base,$co{'tree'},$hash_base, $formats_nav);
 		git_print_header_div('commit', esc_html($co{'title'}), $hash_base);
 	} else {
-		print "<div class=\"page_nav\">\n" .
+		print {$output_handler} "<div class=\"page_nav\">\n" .
 		      "<br/><br/></div>\n" .
 		      "<div class=\"title\">$hash</div>\n";
 	}
 	git_print_page_path($file_name, "blob", $hash_base);
-	print "<div class=\"page_body\">\n";
+	print {$output_handler} "<div class=\"page_body\">\n";
 	if ($mimetype =~ m!^image/!) {
-		print qq!<img type="$mimetype"!;
+		print {$output_handler} qq!<img type="$mimetype"!;
 		if ($file_name) {
-			print qq! alt="$file_name" title="$file_name"!;
+			print {$output_handler} qq! alt="$file_name" title="$file_name"!;
 		}
-		print qq! src="! .
+		print {$output_handler} qq! src="! .
 		      href(action=>"blob_plain", hash=>$hash,
 		           hash_base=>$hash_base, file_name=>$file_name) .
 		      qq!" />\n!;
@@ -5412,14 +5429,14 @@ sub git_blob {
 			chomp $line;
 			$nr++;
 			$line = untabify($line);
-			printf "<div class=\"pre\"><a id=\"l%i\" href=\"" . href(-replay => 1)
+			printf {$output_handler} "<div class=\"pre\"><a id=\"l%i\" href=\"" . href(-replay => 1)
 				. "#l%i\" class=\"linenr\">%4i</a> %s</div>\n",
 			       $nr, $nr, $nr, esc_html($line, -nbsp=>1);
 		}
 	}
 	close $fd
-		or print "Reading blob failed.\n";
-	print "</div>";
+		or print {$output_handler} "Reading blob failed.\n";
+	print {$output_handler} "</div>";
 	git_footer_html();
 }
 
@@ -5474,9 +5491,9 @@ sub git_tree {
 		git_print_header_div('commit', esc_html($co{'title'}) . $ref, $hash_base);
 	} else {
 		undef $hash_base;
-		print "<div class=\"page_nav\">\n";
-		print "<br/><br/></div>\n";
-		print "<div class=\"title\">$hash</div>\n";
+		print {$output_handler} "<div class=\"page_nav\">\n";
+		print {$output_handler} "<br/><br/></div>\n";
+		print {$output_handler} "<div class=\"title\">$hash</div>\n";
 	}
 	if (defined $file_name) {
 		$basedir = $file_name;
@@ -5485,16 +5502,16 @@ sub git_tree {
 		}
 		git_print_page_path($file_name, 'tree', $hash_base);
 	}
-	print "<div class=\"page_body\">\n";
-	print "<table class=\"tree\">\n";
+	print {$output_handler} "<div class=\"page_body\">\n";
+	print {$output_handler} "<table class=\"tree\">\n";
 	my $alternate = 1;
 	# '..' (top directory) link if possible
 	if (defined $hash_base &&
 	    defined $file_name && $file_name =~ m![^/]+$!) {
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
 
@@ -5502,33 +5519,33 @@ sub git_tree {
 		$up =~ s!/?[^/]+$!!;
 		undef $up unless $up;
 		# based on git_print_tree_entry
-		print '<td class="mode">' . mode_str('040000') . "</td>\n";
-		print '<td class="size">&nbsp;</td>'."\n" if $show_sizes;
-		print '<td class="list">';
-		print $cgi->a({-href => href(action=>"tree",
+		print {$output_handler} '<td class="mode">' . mode_str('040000') . "</td>\n";
+		print {$output_handler} '<td class="size">&nbsp;</td>'."\n" if $show_sizes;
+		print {$output_handler} '<td class="list">';
+		print {$output_handler} $cgi->a({-href => href(action=>"tree",
 		                             hash_base=>$hash_base,
 		                             file_name=>$up)},
 		              "..");
-		print "</td>\n";
-		print "<td class=\"link\"></td>\n";
+		print {$output_handler} "</td>\n";
+		print {$output_handler} "<td class=\"link\"></td>\n";
 
-		print "</tr>\n";
+		print {$output_handler} "</tr>\n";
 	}
 	foreach my $line (@entries) {
 		my %t = parse_ls_tree_line($line, -z => 1, -l => $show_sizes);
 
 		if ($alternate) {
-			print "<tr class=\"dark\">\n";
+			print {$output_handler} "<tr class=\"dark\">\n";
 		} else {
-			print "<tr class=\"light\">\n";
+			print {$output_handler} "<tr class=\"light\">\n";
 		}
 		$alternate ^= 1;
 
 		git_print_tree_entry(\%t, $basedir, $hash_base, $have_blame);
 
-		print "</tr>\n";
+		print {$output_handler} "</tr>\n";
 	}
-	print "</table>\n" .
+	print {$output_handler} "</table>\n" .
 	      "</div>";
 	git_footer_html();
 }
@@ -5605,16 +5622,16 @@ sub git_snapshot {
 	}
 
 	$filename =~ s/(["\\])/\\$1/g;
-	print $cgi->header(
+	print {$output_handler} $cgi->header(
 		-type => $known_snapshot_formats{$format}{'type'},
 		-content_disposition => 'inline; filename="' . $filename . '"',
 		-status => '200 OK');
 
 	open my $fd, "-|", $cmd
 		or die_error(500, "Execute git-archive failed");
-	binmode STDOUT, ':raw';
-	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	binmode $output_handler_bin, ':raw';
+	print {$output_handler_bin} <$fd>;
+	binmode $output_handler_bin, ':utf8'; # as set at the beginning of gitweb.cgi
 	close $fd;
 }
 
@@ -5765,11 +5782,11 @@ sub git_commit {
 	} else {
 		git_print_header_div('tree', esc_html($co{'title'}) . $ref, $co{'tree'}, $hash);
 	}
-	print "<div class=\"title_text\">\n" .
+	print {$output_handler} "<div class=\"title_text\">\n" .
 	      "<table class=\"object_header\">\n";
 	git_print_authorship_rows(\%co);
-	print "<tr><td>commit</td><td class=\"sha1\">$co{'id'}</td></tr>\n";
-	print "<tr>" .
+	print {$output_handler} "<tr><td>commit</td><td class=\"sha1\">$co{'id'}</td></tr>\n";
+	print {$output_handler} "<tr>" .
 	      "<td>tree</td>" .
 	      "<td class=\"sha1\">" .
 	      $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$hash),
@@ -5780,13 +5797,13 @@ sub git_commit {
 	              "tree");
 	my $snapshot_links = format_snapshot_links($hash);
 	if (defined $snapshot_links) {
-		print " | " . $snapshot_links;
+		print {$output_handler} " | " . $snapshot_links;
 	}
-	print "</td>" .
+	print {$output_handler} "</td>" .
 	      "</tr>\n";
 
 	foreach my $par (@$parents) {
-		print "<tr>" .
+		print {$output_handler} "<tr>" .
 		      "<td>parent</td>" .
 		      "<td class=\"sha1\">" .
 		      $cgi->a({-href => href(action=>"commit", hash=>$par),
@@ -5799,12 +5816,12 @@ sub git_commit {
 		      "</td>" .
 		      "</tr>\n";
 	}
-	print "</table>".
+	print {$output_handler} "</table>".
 	      "</div>\n";
 
-	print "<div class=\"page_body\">\n";
+	print {$output_handler} "<div class=\"page_body\">\n";
 	git_print_log($co{'comment'});
-	print "</div>\n";
+	print {$output_handler} "</div>\n";
 
 	git_difftree_body(\@difftree, $hash, @$parents);
 
@@ -5852,7 +5869,7 @@ sub git_object {
 		die_error(400, "Not enough information to find object");
 	}
 
-	print $cgi->redirect(-uri => href(action=>$type, -full=>1,
+	print {$output_handler} $cgi->redirect(-uri => href(action=>$type, -full=>1,
 	                                  hash=>$hash, hash_base=>$hash_base,
 	                                  file_name=>$file_name),
 	                     -status => '302 Found');
@@ -5943,23 +5960,23 @@ sub git_blobdiff {
 			git_print_page_nav('','', $hash_base,$co{'tree'},$hash_base, $formats_nav);
 			git_print_header_div('commit', esc_html($co{'title'}), $hash_base);
 		} else {
-			print "<div class=\"page_nav\"><br/>$formats_nav<br/></div>\n";
-			print "<div class=\"title\">$hash vs $hash_parent</div>\n";
+			print {$output_handler} "<div class=\"page_nav\"><br/>$formats_nav<br/></div>\n";
+			print {$output_handler} "<div class=\"title\">$hash vs $hash_parent</div>\n";
 		}
 		if (defined $file_name) {
 			git_print_page_path($file_name, "blob", $hash_base);
 		} else {
-			print "<div class=\"page_path\"></div>\n";
+			print {$output_handler} "<div class=\"page_path\"></div>\n";
 		}
 
 	} elsif ($format eq 'plain') {
-		print $cgi->header(
+		print {$output_handler} $cgi->header(
 			-type => 'text/plain',
 			-charset => 'utf-8',
 			-expires => $expires,
 			-content_disposition => 'inline; filename="' . "$file_name" . '.patch"');
 
-		print "X-Git-Url: " . $cgi->self_url() . "\n\n";
+		print {$output_handler} "X-Git-Url: " . $cgi->self_url() . "\n\n";
 
 	} else {
 		die_error(400, "Unknown blobdiff format");
@@ -5967,12 +5984,12 @@ sub git_blobdiff {
 
 	# patch
 	if ($format eq 'html') {
-		print "<div class=\"page_body\">\n";
+		print {$output_handler} "<div class=\"page_body\">\n";
 
 		git_patchset_body($fd, [ \%diffinfo ], $hash_base, $hash_parent_base);
 		close $fd;
 
-		print "</div>\n"; # class="page_body"
+		print {$output_handler} "</div>\n"; # class="page_body"
 		git_footer_html();
 
 	} else {
@@ -5980,12 +5997,12 @@ sub git_blobdiff {
 			$line =~ s!a/($hash|$hash_parent)!'a/'.esc_path($diffinfo{'from_file'})!eg;
 			$line =~ s!b/($hash|$hash_parent)!'b/'.esc_path($diffinfo{'to_file'})!eg;
 
-			print $line;
+			print {$output_handler} $line;
 
 			last if $line =~ m!^\+\+\+!;
 		}
 		local $/ = undef;
-		print <$fd>;
+		print {$output_handler} <$fd>;
 		close $fd;
 	}
 }
@@ -6148,16 +6165,16 @@ sub git_commitdiff {
 		git_header_html(undef, $expires);
 		git_print_page_nav('commitdiff','', $hash,$co{'tree'},$hash, $formats_nav);
 		git_print_header_div('commit', esc_html($co{'title'}) . $ref, $hash);
-		print "<div class=\"title_text\">\n" .
+		print {$output_handler} "<div class=\"title_text\">\n" .
 		      "<table class=\"object_header\">\n";
 		git_print_authorship_rows(\%co);
-		print "</table>".
+		print {$output_handler} "</table>".
 		      "</div>\n";
-		print "<div class=\"page_body\">\n";
+		print {$output_handler} "<div class=\"page_body\">\n";
 		if (@{$co{'comment'}} > 1) {
-			print "<div class=\"log\">\n";
+			print {$output_handler} "<div class=\"log\">\n";
 			git_print_log($co{'comment'}, -final_empty_line=> 1, -remove_title => 1);
-			print "</div>\n"; # class="log"
+			print {$output_handler} "</div>\n"; # class="log"
 		}
 
 	} elsif ($format eq 'plain') {
@@ -6165,27 +6182,27 @@ sub git_commitdiff {
 		my $tagname = git_get_rev_name_tags($hash);
 		my $filename = basename($project) . "-$hash.patch";
 
-		print $cgi->header(
+		print {$output_handler} $cgi->header(
 			-type => 'text/plain',
 			-charset => 'utf-8',
 			-expires => $expires,
 			-content_disposition => 'inline; filename="' . "$filename" . '"');
 		my %ad = parse_date($co{'author_epoch'}, $co{'author_tz'});
-		print "From: " . to_utf8($co{'author'}) . "\n";
-		print "Date: $ad{'rfc2822'} ($ad{'tz_local'})\n";
-		print "Subject: " . to_utf8($co{'title'}) . "\n";
+		print {$output_handler} "From: " . to_utf8($co{'author'}) . "\n";
+		print {$output_handler} "Date: $ad{'rfc2822'} ($ad{'tz_local'})\n";
+		print {$output_handler} "Subject: " . to_utf8($co{'title'}) . "\n";
 
-		print "X-Git-Tag: $tagname\n" if $tagname;
-		print "X-Git-Url: " . $cgi->self_url() . "\n\n";
+		print {$output_handler} "X-Git-Tag: $tagname\n" if $tagname;
+		print {$output_handler} "X-Git-Url: " . $cgi->self_url() . "\n\n";
 
 		foreach my $line (@{$co{'comment'}}) {
-			print to_utf8($line) . "\n";
+			print {$output_handler} to_utf8($line) . "\n";
 		}
-		print "---\n\n";
+		print {$output_handler} "---\n\n";
 	} elsif ($format eq 'patch') {
 		my $filename = basename($project) . "-$hash.patch";
 
-		print $cgi->header(
+		print {$output_handler} $cgi->header(
 			-type => 'text/plain',
 			-charset => 'utf-8',
 			-expires => $expires,
@@ -6198,24 +6215,24 @@ sub git_commitdiff {
 			$hash_parent eq '-c' || $hash_parent eq '--cc';
 		git_difftree_body(\@difftree, $hash,
 		                  $use_parents ? @{$co{'parents'}} : $hash_parent);
-		print "<br/>\n";
+		print {$output_handler} "<br/>\n";
 
 		git_patchset_body($fd, \@difftree, $hash,
 		                  $use_parents ? @{$co{'parents'}} : $hash_parent);
 		close $fd;
-		print "</div>\n"; # class="page_body"
+		print {$output_handler} "</div>\n"; # class="page_body"
 		git_footer_html();
 
 	} elsif ($format eq 'plain') {
 		local $/ = undef;
-		print <$fd>;
+		print {$output_handler} <$fd>;
 		close $fd
-			or print "Reading git-diff-tree failed\n";
+			or print {$output_handler} "Reading git-diff-tree failed\n";
 	} elsif ($format eq 'patch') {
 		local $/ = undef;
-		print <$fd>;
+		print {$output_handler} <$fd>;
 		close $fd
-			or print "Reading git-format-patch failed\n";
+			or print {$output_handler} "Reading git-format-patch failed\n";
 	}
 }
 
@@ -6318,7 +6335,7 @@ sub git_search {
 		git_print_page_nav('','', $hash,$co{'tree'},$hash);
 		git_print_header_div('commit', esc_html($co{'title'}), $hash);
 
-		print "<table class=\"pickaxe search\">\n";
+		print {$output_handler} "<table class=\"pickaxe search\">\n";
 		my $alternate = 1;
 		local $/ = "\n";
 		open my $fd, '-|', git_cmd(), '--no-pager', 'log', @diff_opts,
@@ -6334,24 +6351,24 @@ sub git_search {
 			if (defined $set{'commit'}) {
 				# finish previous commit
 				if (%co) {
-					print "</td>\n" .
+					print {$output_handler} "</td>\n" .
 					      "<td class=\"link\">" .
 					      $cgi->a({-href => href(action=>"commit", hash=>$co{'id'})}, "commit") .
 					      " | " .
 					      $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$co{'id'})}, "tree");
-					print "</td>\n" .
+					print {$output_handler} "</td>\n" .
 					      "</tr>\n";
 				}
 
 				if ($alternate) {
-					print "<tr class=\"dark\">\n";
+					print {$output_handler} "<tr class=\"dark\">\n";
 				} else {
-					print "<tr class=\"light\">\n";
+					print {$output_handler} "<tr class=\"light\">\n";
 				}
 				$alternate ^= 1;
 				%co = parse_commit($set{'commit'});
 				my $author = chop_and_escape_str($co{'author_name'}, 15, 5);
-				print "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
+				print {$output_handler} "<td title=\"$co{'age_string_age'}\"><i>$co{'age_string_date'}</i></td>\n" .
 				      "<td><i>$author</i></td>\n" .
 				      "<td>" .
 				      $cgi->a({-href => href(action=>"commit", hash=>$co{'id'}),
@@ -6360,7 +6377,7 @@ sub git_search {
 			} elsif (defined $set{'to_id'}) {
 				next if ($set{'to_id'} =~ m/^0{40}$/);
 
-				print $cgi->a({-href => href(action=>"blob", hash_base=>$co{'id'},
+				print {$output_handler} $cgi->a({-href => href(action=>"blob", hash_base=>$co{'id'},
 				                             hash=>$set{'to_id'}, file_name=>$set{'to_file'}),
 				              -class => "list"},
 				              "<span class=\"match\">" . esc_path($set{'file'}) . "</span>") .
@@ -6371,23 +6388,23 @@ sub git_search {
 
 		# finish last commit (warning: repetition!)
 		if (%co) {
-			print "</td>\n" .
+			print {$output_handler} "</td>\n" .
 			      "<td class=\"link\">" .
 			      $cgi->a({-href => href(action=>"commit", hash=>$co{'id'})}, "commit") .
 			      " | " .
 			      $cgi->a({-href => href(action=>"tree", hash=>$co{'tree'}, hash_base=>$co{'id'})}, "tree");
-			print "</td>\n" .
+			print {$output_handler} "</td>\n" .
 			      "</tr>\n";
 		}
 
-		print "</table>\n";
+		print {$output_handler} "</table>\n";
 	}
 
 	if ($searchtype eq 'grep') {
 		git_print_page_nav('','', $hash,$co{'tree'},$hash);
 		git_print_header_div('commit', esc_html($co{'title'}), $hash);
 
-		print "<table class=\"grep_search\">\n";
+		print {$output_handler} "<table class=\"grep_search\">\n";
 		my $alternate = 1;
 		my $matches = 0;
 		local $/ = "\n";
@@ -6406,21 +6423,21 @@ sub git_search {
 				(undef, $file, $lno, $ltext) = split(/:/, $line, 4);
 			}
 			if ($file ne $lastfile) {
-				$lastfile and print "</td></tr>\n";
+				$lastfile and print {$output_handler} "</td></tr>\n";
 				if ($alternate++) {
-					print "<tr class=\"dark\">\n";
+					print {$output_handler} "<tr class=\"dark\">\n";
 				} else {
-					print "<tr class=\"light\">\n";
+					print {$output_handler} "<tr class=\"light\">\n";
 				}
-				print "<td class=\"list\">".
+				print {$output_handler} "<td class=\"list\">".
 					$cgi->a({-href => href(action=>"blob", hash=>$co{'hash'},
 							       file_name=>"$file"),
 						-class => "list"}, esc_path($file));
-				print "</td><td>\n";
+				print {$output_handler} "</td><td>\n";
 				$lastfile = $file;
 			}
 			if ($binary) {
-				print "<div class=\"binary\">Binary file</div>\n";
+				print {$output_handler} "<div class=\"binary\">Binary file</div>\n";
 			} else {
 				$ltext = untabify($ltext);
 				if ($ltext =~ m/^(.*)($search_regexp)(.*)$/i) {
@@ -6432,7 +6449,7 @@ sub git_search {
 				} else {
 					$ltext = esc_html($ltext, -nbsp=>1);
 				}
-				print "<div class=\"pre\">" .
+				print {$output_handler} "<div class=\"pre\">" .
 					$cgi->a({-href => href(action=>"blob", hash=>$co{'hash'},
 							       file_name=>"$file").'#l'.$lno,
 						-class => "linenr"}, sprintf('%4i', $lno))
@@ -6440,16 +6457,16 @@ sub git_search {
 			}
 		}
 		if ($lastfile) {
-			print "</td></tr>\n";
+			print {$output_handler} "</td></tr>\n";
 			if ($matches > 1000) {
-				print "<div class=\"diff nodifferences\">Too many matches, listing trimmed</div>\n";
+				print {$output_handler} "<div class=\"diff nodifferences\">Too many matches, listing trimmed</div>\n";
 			}
 		} else {
-			print "<div class=\"diff nodifferences\">No matches found</div>\n";
+			print {$output_handler} "<div class=\"diff nodifferences\">No matches found</div>\n";
 		}
 		close $fd;
 
-		print "</table>\n";
+		print {$output_handler} "</table>\n";
 	}
 	git_footer_html();
 }
@@ -6457,7 +6474,7 @@ sub git_search {
 sub git_search_help {
 	git_header_html();
 	git_print_page_nav('','', $hash,$hash,$hash);
-	print <<EOT;
+	print {$output_handler} <<EOT;
 <p><strong>Pattern</strong> is by default a normal string that is matched precisely (but without
 regard to case, except in the case of pickaxe). However, when you check the <em>re</em> checkbox,
 the pattern entered is recognized as the POSIX extended
@@ -6469,7 +6486,7 @@ insensitive).</p>
 EOT
 	my $have_grep = gitweb_check_feature('grep');
 	if ($have_grep) {
-		print <<EOT;
+		print {$output_handler} <<EOT;
 <dt><b>grep</b></dt>
 <dd>All files in the currently selected tree (HEAD unless you are explicitly browsing
     a different one) are searched for the given pattern. On large trees, this search can take
@@ -6478,7 +6495,7 @@ due to git-grep peculiarity, currently if regexp mode is turned off, the matches
 case-sensitive.</dd>
 EOT
 	}
-	print <<EOT;
+	print {$output_handler} <<EOT;
 <dt><b>author</b></dt>
 <dd>Name and e-mail of the change author and date of birth of the patch will be scanned for the given pattern.</dd>
 <dt><b>committer</b></dt>
@@ -6486,7 +6503,7 @@ EOT
 EOT
 	my $have_pickaxe = gitweb_check_feature('pickaxe');
 	if ($have_pickaxe) {
-		print <<EOT;
+		print {$output_handler} <<EOT;
 <dt><b>pickaxe</b></dt>
 <dd>All commits that caused the string to appear or disappear from any file (changes that
 added, removed or "modified" the string) will be listed. This search can take a while and
@@ -6494,7 +6511,7 @@ takes a lot of strain on the server, so please use it wisely. Note that since yo
 interested even in changes just changing the case as well, this search is case sensitive.</dd>
 EOT
 	}
-	print "</dl>\n";
+	print {$output_handler} "</dl>\n";
 	git_footer_html();
 }
 
@@ -6541,7 +6558,7 @@ sub git_feed {
 				$since = Time::ParseDate::parsedate($if_modified, GMT => 1);
 			}
 			if (defined $since && $latest_epoch <= $since) {
-				print $cgi->header(
+				print {$output_handler} $cgi->header(
 					-type => $content_type,
 					-charset => 'utf-8',
 					-last_modified => $latest_date{'rfc2822'},
@@ -6549,12 +6566,12 @@ sub git_feed {
 				return;
 			}
 		}
-		print $cgi->header(
+		print {$output_handler} $cgi->header(
 			-type => $content_type,
 			-charset => 'utf-8',
 			-last_modified => $latest_date{'rfc2822'});
 	} else {
-		print $cgi->header(
+		print {$output_handler} $cgi->header(
 			-type => $content_type,
 			-charset => 'utf-8');
 	}
@@ -6598,13 +6615,13 @@ sub git_feed {
 	} else {
 		$alt_url = href(-full=>1, action=>"summary");
 	}
-	print qq!<?xml version="1.0" encoding="utf-8"?>\n!;
+	print {$output_handler} qq!<?xml version="1.0" encoding="utf-8"?>\n!;
 	if ($format eq 'rss') {
-		print <<XML;
+		print {$output_handler} <<XML;
 <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
 <channel>
 XML
-		print "<title>$title</title>\n" .
+		print {$output_handler} "<title>$title</title>\n" .
 		      "<link>$alt_url</link>\n" .
 		      "<description>$descr</description>\n" .
 		      "<language>en</language>\n" .
@@ -6614,22 +6631,22 @@ XML
 			# prefer the logo to the favicon, since RSS
 			# doesn't allow both
 			my $img = esc_url($logo || $favicon);
-			print "<image>\n" .
+			print {$output_handler} "<image>\n" .
 			      "<url>$img</url>\n" .
 			      "<title>$title</title>\n" .
 			      "<link>$alt_url</link>\n" .
 			      "</image>\n";
 		}
 		if (%latest_date) {
-			print "<pubDate>$latest_date{'rfc2822'}</pubDate>\n";
-			print "<lastBuildDate>$latest_date{'rfc2822'}</lastBuildDate>\n";
+			print {$output_handler} "<pubDate>$latest_date{'rfc2822'}</pubDate>\n";
+			print {$output_handler} "<lastBuildDate>$latest_date{'rfc2822'}</lastBuildDate>\n";
 		}
-		print "<generator>gitweb v.$version/$git_version</generator>\n";
+		print {$output_handler} "<generator>gitweb v.$version/$git_version</generator>\n";
 	} elsif ($format eq 'atom') {
-		print <<XML;
+		print {$output_handler} <<XML;
 <feed xmlns="http://www.w3.org/2005/Atom">
 XML
-		print "<title>$title</title>\n" .
+		print {$output_handler} "<title>$title</title>\n" .
 		      "<subtitle>$descr</subtitle>\n" .
 		      '<link rel="alternate" type="text/html" href="' .
 		      $alt_url . '" />' . "\n" .
@@ -6639,19 +6656,19 @@ XML
 		      # use project owner for feed author
 		      "<author><name>$owner</name></author>\n";
 		if (defined $favicon) {
-			print "<icon>" . esc_url($favicon) . "</icon>\n";
+			print {$output_handler} "<icon>" . esc_url($favicon) . "</icon>\n";
 		}
 		if (defined $logo_url) {
 			# not twice as wide as tall: 72 x 27 pixels
-			print "<logo>" . esc_url($logo) . "</logo>\n";
+			print {$output_handler} "<logo>" . esc_url($logo) . "</logo>\n";
 		}
 		if (! %latest_date) {
 			# dummy date to keep the feed valid until commits trickle in:
-			print "<updated>1970-01-01T00:00:00Z</updated>\n";
+			print {$output_handler} "<updated>1970-01-01T00:00:00Z</updated>\n";
 		} else {
-			print "<updated>$latest_date{'iso-8601'}</updated>\n";
+			print {$output_handler} "<updated>$latest_date{'iso-8601'}</updated>\n";
 		}
-		print "<generator version='$version/$git_version'>gitweb</generator>\n";
+		print {$output_handler} "<generator version='$version/$git_version'>gitweb</generator>\n";
 	}
 
 	# contents
@@ -6676,7 +6693,7 @@ XML
 		# print element (entry, item)
 		my $co_url = href(-full=>1, action=>"commitdiff", hash=>$commit);
 		if ($format eq 'rss') {
-			print "<item>\n" .
+			print {$output_handler} "<item>\n" .
 			      "<title>" . esc_html($co{'title'}) . "</title>\n" .
 			      "<author>" . esc_html($co{'author'}) . "</author>\n" .
 			      "<pubDate>$cd{'rfc2822'}</pubDate>\n" .
@@ -6686,22 +6703,22 @@ XML
 			      "<content:encoded>" .
 			      "<![CDATA[\n";
 		} elsif ($format eq 'atom') {
-			print "<entry>\n" .
+			print {$output_handler} "<entry>\n" .
 			      "<title type=\"html\">" . esc_html($co{'title'}) . "</title>\n" .
 			      "<updated>$cd{'iso-8601'}</updated>\n" .
 			      "<author>\n" .
 			      "  <name>" . esc_html($co{'author_name'}) . "</name>\n";
 			if ($co{'author_email'}) {
-				print "  <email>" . esc_html($co{'author_email'}) . "</email>\n";
+				print {$output_handler} "  <email>" . esc_html($co{'author_email'}) . "</email>\n";
 			}
-			print "</author>\n" .
+			print {$output_handler} "</author>\n" .
 			      # use committer for contributor
 			      "<contributor>\n" .
 			      "  <name>" . esc_html($co{'committer_name'}) . "</name>\n";
 			if ($co{'committer_email'}) {
-				print "  <email>" . esc_html($co{'committer_email'}) . "</email>\n";
+				print {$output_handler} "  <email>" . esc_html($co{'committer_email'}) . "</email>\n";
 			}
-			print "</contributor>\n" .
+			print {$output_handler} "</contributor>\n" .
 			      "<published>$cd{'iso-8601'}</published>\n" .
 			      "<link rel=\"alternate\" type=\"text/html\" href=\"$co_url\" />\n" .
 			      "<id>$co_url</id>\n" .
@@ -6709,19 +6726,19 @@ XML
 			      "<div xmlns=\"http://www.w3.org/1999/xhtml\">\n";
 		}
 		my $comment = $co{'comment'};
-		print "<pre>\n";
+		print {$output_handler} "<pre>\n";
 		foreach my $line (@$comment) {
 			$line = esc_html($line);
-			print "$line\n";
+			print {$output_handler} "$line\n";
 		}
-		print "</pre><ul>\n";
+		print {$output_handler} "</pre><ul>\n";
 		foreach my $difftree_line (@difftree) {
 			my %difftree = parse_difftree_raw_line($difftree_line);
 			next if !$difftree{'from_id'};
 
 			my $file = $difftree{'file'} || $difftree{'to_file'};
 
-			print "<li>" .
+			print {$output_handler} "<li>" .
 			      "[" .
 			      $cgi->a({-href => href(-full=>1, action=>"blobdiff",
 			                             hash=>$difftree{'to_id'}, hash_parent=>$difftree{'from_id'},
@@ -6729,26 +6746,26 @@ XML
 			                             file_name=>$file, file_parent=>$difftree{'from_file'}),
 			              -title => "diff"}, 'D');
 			if ($have_blame) {
-				print $cgi->a({-href => href(-full=>1, action=>"blame",
+				print {$output_handler} $cgi->a({-href => href(-full=>1, action=>"blame",
 				                             file_name=>$file, hash_base=>$commit),
 				              -title => "blame"}, 'B');
 			}
 			# if this is not a feed of a file history
 			if (!defined $file_name || $file_name ne $file) {
-				print $cgi->a({-href => href(-full=>1, action=>"history",
+				print {$output_handler} $cgi->a({-href => href(-full=>1, action=>"history",
 				                             file_name=>$file, hash=>$commit),
 				              -title => "history"}, 'H');
 			}
 			$file = esc_path($file);
-			print "] ".
+			print {$output_handler} "] ".
 			      "$file</li>\n";
 		}
 		if ($format eq 'rss') {
-			print "</ul>]]>\n" .
+			print {$output_handler} "</ul>]]>\n" .
 			      "</content:encoded>\n" .
 			      "</item>\n";
 		} elsif ($format eq 'atom') {
-			print "</ul>\n</div>\n" .
+			print {$output_handler} "</ul>\n</div>\n" .
 			      "</content>\n" .
 			      "</entry>\n";
 		}
@@ -6756,9 +6773,9 @@ XML
 
 	# end of feed
 	if ($format eq 'rss') {
-		print "</channel>\n</rss>\n";
+		print {$output_handler} "</channel>\n</rss>\n";
 	} elsif ($format eq 'atom') {
-		print "</feed>\n";
+		print {$output_handler} "</feed>\n";
 	}
 }
 
@@ -6772,13 +6789,12 @@ sub git_atom {
 
 sub git_opml {
 	my @list = git_get_projects_list();
-
-	print $cgi->header(
+	print {$output_handler} $cgi->header(
 		-type => 'text/xml',
 		-charset => 'utf-8',
 		-content_disposition => 'inline; filename="opml.xml"');
 
-	print <<XML;
+	print {$output_handler} <<XML;
 <?xml version="1.0" encoding="utf-8"?>
 <opml version="1.0">
 <head>
@@ -6803,9 +6819,9 @@ XML
 		my $path = esc_html(chop_str($proj{'path'}, 25, 5));
 		my $rss  = href('project' => $proj{'path'}, 'action' => 'rss', -full => 1);
 		my $html = href('project' => $proj{'path'}, 'action' => 'summary', -full => 1);
-		print "<outline type=\"rss\" text=\"$path\" title=\"$path\" xmlUrl=\"$rss\" htmlUrl=\"$html\"/>\n";
+		print {$output_handler} "<outline type=\"rss\" text=\"$path\" title=\"$path\" xmlUrl=\"$rss\" htmlUrl=\"$html\"/>\n";
 	}
-	print <<XML;
+	print {$output_handler} <<XML;
 </outline>
 </body>
 </opml>
-- 
1.6.5.2

* [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
  2010-01-14  1:23               ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley
@ 2010-01-14  1:23                 ` John 'Warthog9' Hawley
  2010-01-16  2:48                   ` Jakub Narebski
                                     ` (2 more replies)
  2010-01-16  0:43                 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski
  1 sibling, 3 replies; 60+ messages in thread
From: John 'Warthog9' Hawley @ 2010-01-14  1:23 UTC (permalink / raw)
  To: git

This is a very large patch that implements the file-based
caching layer used on large sites such as kernel.org and,
soon, git.fedoraproject.org.  It provides a simple,
straightforward caching mechanism that scales dramatically better
than Gitweb by itself.

The caching layer captures the output that Gitweb would
normally return and saves it to a cache file on the local
disk.  When a page is requested, the code attempts to take a shared
lock on the cache file and cat it out to the client.  Should an
exclusive lock be held on the file (it is being updated), the code
either updates in the background and shows the stale page while the
update runs, or stalls the client(s) until the page has been
generated.
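
As a rough illustration of the locking scheme described above, here is
a minimal sketch, not the code from this patch.  The names read_cache
and write_cache are hypothetical.  Unlike the patch, the writer opens
in append mode so that the old content is not truncated before the
exclusive lock is actually held.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: return the cached content, or undef if the file
# is missing or an updater currently holds an exclusive lock on it.
sub read_cache {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    unless (flock($fh, LOCK_SH | LOCK_NB)) {
        close($fh);
        return;
    }
    local $/;               # slurp the whole file
    my $data = <$fh>;
    flock($fh, LOCK_UN);
    close($fh);
    return $data;
}

# Hypothetical helper: replace the cached content, or return 0 if
# another process is already updating the file.
sub write_cache {
    my ($path, $content) = @_;
    open(my $fh, '>>', $path) or return 0;
    unless (flock($fh, LOCK_EX | LOCK_NB)) {
        close($fh);
        return 0;
    }
    truncate($fh, 0);       # safe now: we hold the exclusive lock
    print {$fh} $content;
    close($fh);             # flushes and releases the lock
    return 1;
}
```

A caller that gets undef back from read_cache would then either serve
a stale copy or wait and retry, which corresponds to the background
and non-background modes respectively.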

There are two forms of stalling involved here, background building
and non-background building, both of which are discussed in the
configuration page.

There are still a few known "issues" with respect to this:
- Code needs to be added to be "browser" aware so
  that clients like wget that are trying to get a
  binary blob don't obtain a "Generating..." page
- There is an intermittent flushing issue that has yet
  to be tracked down

Caching is disabled by default via the $cache_enable variable;
setting it to 1 enables file-based caching.  It is expected
that this will be extended to include additional types of caching
(like memcached) in the future, so the variable should not be
considered strictly binary.
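
For readers skimming the diff: the patch derives each cache file's
location from an MD5 of the request URL, and scales the expiry with
the load average, clamped between $minCacheTime and $maxCacheTime.
A minimal standalone sketch of those two calculations follows.  The
function names cache_path and cache_ttl are hypothetical and do not
appear in the patch.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a request URL to a cache file path.  The
# first two hex digits of the digest name a subdirectory, the rest
# name the file, which keeps any one directory from growing too large.
sub cache_path {
    my ($cachedir, $url) = @_;
    my $hash = md5_hex($url);
    return join('/', $cachedir, substr($hash, 0, 2), substr($hash, 2));
}

# Hypothetical helper: cache lifetime in seconds, proportional to the
# load average and clamped to the range [$min, $max].
sub cache_ttl {
    my ($loadavg, $min, $max) = @_;
    my $ttl = $loadavg * 60;
    $ttl = $max if $ttl > $max;
    $ttl = $min if $ttl < $min;
    return $ttl;
}
```

With the defaults in this patch (minimum 20, maximum 1200), an idle
machine regenerates pages after 20 seconds, while a load average of 20
or more pins the lifetime at the 20 minute maximum.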
---
 gitweb/cache.pm    |  283 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 gitweb/gitweb.css  |    6 +
 gitweb/gitweb.perl |   58 ++++++++++-
 3 files changed, 344 insertions(+), 3 deletions(-)
 create mode 100644 gitweb/cache.pm

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
new file mode 100644
index 0000000..d08bcec
--- /dev/null
+++ b/gitweb/cache.pm
@@ -0,0 +1,283 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine
+#
+
+use File::Path qw(make_path remove_tree);
+use Digest::MD5 qw(md5 md5_hex md5_base64);
+use Fcntl ':flock';
+
+sub cache_fetch {
+	my ($action) = @_;
+	my $cacheTime = 0;
+
+	# Deal with cache being disabled
+	if( $cache_enable == 0 ){
+		$output_handler = *STDOUT;
+		$output_handler_bin = *STDOUT;
+		$actions{$action}->();
+		return;
+	}elsif( $cache_enable == 1 ){
+		#obviously we are using file based caching
+
+		if(! -d $cachedir){
+			print "*** Warning ***: Caching enabled but cache directory does not exist.  ($cachedir)\n";
+			mkdir ($cachedir, 0755) || die "Cannot create cache dir - you will need to manually create";
+			print "Cache directory created successfully\n";
+		}
+
+		our $full_url = "$my_url?". $ENV{'QUERY_STRING'};
+		our $urlhash = md5_hex($full_url);
+		our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/";
+
+		my $numdirs = make_path( $fullhashdir, { mode => 0777, error => \my $mkdirerr, } );
+		if( @$mkdirerr ){
+			my $mkdirerrmsg = "";
+			for my $diag (@$mkdirerr) {
+				my ($file, $message) = %$diag;
+				if($file eq '' ){
+					$mkdirerrmsg .= "general error: $message\n";
+				}else{
+					$mkdirerrmsg .= "problem creating $file: $message\n";
+				}
+			}
+			die_error(500, "Could not create cache directory | $mkdirerrmsg");
+		}
+		$fullhashpath = "$fullhashdir/". substr( $urlhash, 2 );
+		$fullhashbinpath = "$fullhashpath.bin";
+	} # done dealing with cache enabled / disabled
+
+	if(! -e "$fullhashpath" ){
+		if(! defined(my $childPid = fork()) ){
+			cacheUpdate($action,0);
+			cacheDisplay($action);
+		} elsif ( $childPid == 0 ){
+			#run the updater
+			cacheUpdate($action,1);
+		}else{
+			cacheWaitForUpdate($action);
+		}
+	}else{
+		#if cache is out dated, update
+		#else displayCache();
+		open(cacheFile, '<', "$fullhashpath");
+		stat(cacheFile);
+		close(cacheFile);
+		$cacheTime = get_loadavg() * 60;
+		if( $cacheTime > $maxCacheTime ){
+			$cacheTime = $maxCacheTime;
+		}
+		if( $cacheTime < $minCacheTime ){
+			$cacheTime = $minCacheTime;
+		}
+		if( (stat(_))[9] < (time - $cacheTime) ){
+			if( ! defined(my $childPid = fork()) ){
+				cacheUpdate($action,0);
+				cacheDisplay($action);
+			} elsif ( $childPid == 0 ){
+				#run the updater
+				#print "Running updater\n";
+				cacheUpdate($action,1);
+			}else{
+				#print "Waiting for update\n";
+				cacheWaitForUpdate($action);
+			}
+		} else {
+			cacheDisplay($action);
+		}
+
+
+	}
+
+	#
+	# If all of the caching fails, let's go ahead and press on without it and fall back to 'default'
+	# non-caching behavior.  This is the softest of the failure conditions.
+	#
+	#$actions{$action}->();
+}
+
+sub cacheUpdate {
+	my ($action,$areForked) = @_;
+	my $lockStatus;
+	my $fileData = "";
+
+	if($backgroundCache){
+		open(cacheFileBG, '>:utf8', "$fullhashpath.bg");
+		my $lockStatBG = flock(cacheFileBG,LOCK_EX|LOCK_NB);
+
+		$lockStatus = $lockStatBG;
+	}else{
+		open(cacheFile, '>:utf8', "$fullhashpath");
+		my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB);
+
+		$lockStatus = $lockStat;
+	}
+	#print "lock status: $lockStat\n";
+
+
+	if (! $lockStatus ){
+		if ( $areForked ){
+			exit(0);
+		}else{
+			return;
+		}
+	}
+
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		open cacheFileBin, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
+		$output_handler_bin = *cacheFileBin;
+	}
+
+	$output_handler = *cacheFile;
+
+	if($backgroundCache){
+		open(cacheFile, '>:utf8', "$fullhashpath");
+		$lockStat = flock(cacheFile,LOCK_EX);
+
+		if (! $lockStat ){
+			if ( $areForked ){
+				exit(0);
+			}else{
+				return;
+			}
+		}
+	}
+
+	$actions{$action}->();
+
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		close(cacheFileBin);
+	}
+
+	flock(cacheFile,LOCK_UN);
+	close(cacheFile);
+
+	if($backgroundCache){
+		flock(cacheFileBG,LOCK_UN);
+		close(cacheFileBG);
+	}
+
+	if ( $areForked ){
+		exit(0);
+	} else {
+		return;
+	}
+}
+
+
+sub cacheWaitForUpdate {
+	my ($action) = @_;
+	my $x = 0;
+	my $max = 10;
+	my $lockStat = 0;
+
+	if( $backgroundCache ){
+		if( -e "$fullhashpath" ){
+			open(cacheFile, '<:utf8', "$fullhashpath");
+			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+			stat(cacheFile);
+			close(cacheFile);
+
+			if( $lockStat && ( (stat(_))[9] > (time - $maxCacheLife) ) ){
+				cacheDisplay($action);
+				return;
+			}
+		}
+	}
+
+	if(
+		$action eq "atom"
+		||
+		$action eq "rss"
+		||
+		$action eq "opml"
+	){
+		do {
+			sleep 2 if $x > 0;
+			open(cacheFile, '<:utf8', "$fullhashpath");
+			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+			close(cacheFile);
+			$x++;
+			$combinedLockStat = $lockStat;
+		} while ((! $combinedLockStat) && ($x < $max));
+
+		if( $x != $max ){
+			cacheDisplay($action);
+		}
+		return;
+	}
+
+	$| = 1;
+
+	print $::cgi->header(-type=>'text/html', -charset => 'utf-8',
+	                   -status=> 200, -expires => 'never');
+
+	print <<EOF;
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
+<!-- git web w/caching interface version $version, (C) 2006-2010, John 'Warthog9' Hawley <warthog9\@kernel.org> -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
+<meta name="generator" content="gitweb/$version git/$git_version"/>
+<meta name="robots" content="index, nofollow"/>
+<meta http-equiv="refresh" content="0"/>
+<title>$title</title>
+</head>
+<body>
+EOF
+
+	print "Generating..";
+	do {
+		print ".";
+		sleep 2 if $x > 0;
+		open(cacheFile, '<:utf8', "$fullhashpath");
+		$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+		close(cacheFile);
+		$x++;
+		$combinedLockStat = $lockStat;
+	} while ((! $combinedLockStat) && ($x < $max));
+	print <<EOF;
+</body>
+</html>
+EOF
+	return;
+}
+
+sub cacheDisplay {
+	my ($action) = @_;
+	open(cacheFile, '<:utf8', "$fullhashpath");
+	$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
+	if (! $lockStat ){
+		close(cacheFile);
+		cacheWaitForUpdate($action);
+	}
+
+	while( <cacheFile> ){
+		print $_;
+	}
+	if(
+		$action eq "snapshot"
+		||
+		$action eq "blob_plain"
+	){
+		open(cacheFileBin, '<', "$fullhashbinpath");
+		binmode STDOUT, ':raw';
+		print <cacheFileBin>;
+		binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+		close(cacheFileBin);
+	}
+	close(cacheFile);
+}
diff --git a/gitweb/gitweb.css b/gitweb/gitweb.css
index 50067f2..f809f16 100644
--- a/gitweb/gitweb.css
+++ b/gitweb/gitweb.css
@@ -67,6 +67,12 @@ div.page_path {
 	border-width: 0px 0px 1px;
 }
 
+div.cachetime {
+	float: left;
+	margin-right: 10px;
+	color: #555555;
+}
+
 div.page_footer {
 	height: 17px;
 	padding: 4px 8px;
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 8bb323c..ec95bb9 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -230,6 +230,50 @@ our $git_versions_must_match = 1;
 # Leave it undefined (or set to 'undef') to turn off load checking.
 our $maxload = 300;
 
+# This enables/disables the caching layer in gitweb.  This currently only supports the
+# 'dumb' file based caching layer, primarily used on git.kernel.org.  This is reasonably
+# effective but it has the downside of requiring a huge amount of disk space if there
+# are a number of repositories involved.  It is not uncommon for git.kernel.org to have
+# on the order of 80G - 120G accumulate over the course of a few months.  It is recommended
+# that the cache directory be periodically completely deleted, and this is safe to perform.
+# Suggested mechanism
+# mv $cacheidr $cachedir.flush;mkdir $cachedir;rm -rf $cachedir.flush
+# Value is binary. 0 = disabled (default), 1 = enabled.
+#
+# Values of caching:
+# 	1 = 'dumb' file based caching used on git.kernel.org
+our $cache_enable = 0;
+
+# Used to set the minimum cache timeout for the dynamic caching algorithm.  Basically
+# if we calculate the cache to be under this number of seconds we set the cache timeout
+# to this minimum.
+# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
+our $minCacheTime = 20;
+
+# Used to set the maximum cache timeout for the dynamic caching algorithm.  Basically
+# if we calculate the cache to exceed this number of seconds we set the cache timeout
+# to this maximum.
+# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
+our $maxCacheTime = 1200;
+
+# If you need to change the location of the caching directory, override this
+# otherwise this will probably do fine for you
+our $cachedir = 'cache';
+
+# If this is set (to 1) cache will do it's best to always display something instead
+# of making someone wait for the cache to update.  This will launch the cacheUpdate
+# into the background and it will lock a <file>.bg file and will only lock the
+# actual cache file when it needs to write into it.  In theory this will make
+# gitweb seem more responsive at the price of possibly stale data.
+our $backgroundCache = 1;
+
+# Used to set the maximum cache file life.  If a cache files last modify time exceeds
+# this value, it will assume that the data is just too old, and HAS to be regenerated
+# instead of trying to display the existing cache data.
+# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
+# 18000 = 5 hours
+our $maxCacheLife = 18000;
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -593,6 +637,11 @@ if (defined $maxload && get_loadavg() > $maxload) {
 	die_error(503, "The load average on the server is too high");
 }
 
+#
+# Includes
+#
+do 'cache.pm';
+
 # version of the core git binary
 our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
 $number_of_git_cmds++;
@@ -994,7 +1043,7 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
     !$project) {
 	die_error(400, "Project needed");
 }
-$actions{$action}->();
+cache_fetch($action);
 exit;
 
 ## ======================================================================
@@ -3200,7 +3249,9 @@ sub git_header_html {
 	# support xhtml+xml but choking when it gets what it asked for.
 	if (defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
-	    $cgi->Accept('application/xhtml+xml') != 0) {
+	    $cgi->Accept('application/xhtml+xml') != 0
+	    &&
+	    $cache_enable == 0) {
 		$content_type = 'application/xhtml+xml';
 	} else {
 		$content_type = 'text/html';
@@ -3344,6 +3395,7 @@ sub git_footer_html {
 	my $feed_class = 'rss_logo';
 
 	print {$output_handler} "<div class=\"page_footer\">\n";
+	print {$output_handler} "<div class=\"cachetime\">Cache Last Updated: ". gmtime( time ) ." GMT</div>\n";
 	if (defined $project) {
 		my $descr = git_get_project_description($project);
 		if (defined $descr) {
@@ -3424,7 +3476,7 @@ sub die_error {
 	my $extra = shift;
 
 	# The output handlers for die_error need to be reset to STDOUT
-	# so that half the message isn't being output to random and 
+	# so that half the message isn't being output to random and
 	# half to STDOUT as expected.  This is mainly for the benefit
 	# of using git_header_html() and git_footer_html() since those
 	# internaly use the indirect print handler.
-- 
1.6.5.2


* Re: [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
  2010-01-14  1:23                 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
@ 2010-01-16  2:48                   ` Jakub Narebski
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
       [not found]                   ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>
  2 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-16  2:48 UTC (permalink / raw)
  To: John 'Warthog9' Hawley, John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This is a very large patch

This is true, and that is why I am working on splitting this patch
into a series of smaller patches, each adding a single feature present in
this megapatch (this code drop)... and cleaning it up (and improving it)
along the way.  This should hopefully make it easier to review.

>                             that implements the file based
> caching layer that is used on such large sites as kernel.org and
> soon git.fedoraproject.org.  This provides a simple, and straight
> forward caching mechanism that scales dramatically better than
> Gitweb by itself.

Do you have any benchmarks comparing gitweb performance with and
without caching enabled?

> 
> The caching layer basically buffers the output that Gitweb would
> normally return, and saves that output to a cache file on the local
> disk.  When the file is requested it attempts to gain a shared lock
> on the cache file and cat it out to the client.  Should an exclusive
> lock be on a file (it's being updated) the code has a choice to either
> update in the background and go ahead and show the stale page while
> update is being performed, or stall the client(s) until the page
> is generated.

The above paragraph is not very clear to me.

Correct me if I am wrong, but as I understand it the cache
architecture is as follows:

* This patch implements output caching, which means that the whole
  gitweb response, including HTTP headers, is stored in the cache.  (This
  means that, in the absence of an extra mechanism, content-type
  negotiation should be disabled when caching is turned on.)

* The caching engine implements a simple file-based caching layer,
  where cached data is stored verbatim in the cache file (no serialization
  / hibernating / marshalling of data, which gives better performance and
  the possibility of X-Sendfile support).  Cache expiration is a global
  value, i.e. it is not stored alongside the cache entry in the file.
  Cache entries expire based on the mtime of the file.

* When a cache entry exists for a given request and has not expired,
  gitweb output is served directly from the cache file.

* When a cache entry exists for a given request but has expired,
  one process acquires an exclusive (writer) lock on the file; the rest
  of the clients are served stale data.

* When no cache entry exists for a given request, one process
  acquires an exclusive (writer) lock on the cache file; the rest of the
  clients wait for the cache to be filled.
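
To make the three cases above concrete, here is a minimal sketch of the
decision they imply.  It is not code from either patch series:
cache_entry_state() and its arguments are hypothetical names, and the
optional $now argument exists only to make the function easy to test.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a cache file as 'missing', 'stale' or
# 'fresh', using only the file's mtime and a global expiration time
# (in seconds), as described in the list above.
sub cache_entry_state {
	my ($path, $expires_in, $now) = @_;
	$now = time unless defined $now;

	# stat() by name; no need to open the file first
	my $mtime = (stat($path))[9];
	return 'missing' unless defined $mtime;

	return ($now - $mtime > $expires_in) ? 'stale' : 'fresh';
}
```

A caller would then serve the file directly for 'fresh', try to take the
writer lock for 'stale' or 'missing', and decide between serving stale
data and waiting depending on which of the two it got.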

> 
> There are two forms of stalling involved here, background building
> and non-background building, both of which are discussed in the
> configuration page.

I'd like to have at least the design decisions put into the commit
message, and perhaps also have the caching mechanism described in a
separate section in gitweb/README.

> 
> There are still a few known "issues" with respect to this:
> - Code needs to be added to be "browser" aware so
>   that clients like wget that are trying to get a
>   binary blob don't obtain a "Generating..." page

This issue should be clearly addressed: when do we serve the
"Generating..." page, and when do we not?  The issue is not only wget
trying to download a binary blob, patchset, or snapshot, but also a
binary blob that is an image referenced from an HTML blob, and
there is also the issue of web feeds (accessed by feed readers).

> - There is an intermittent flushing issue that has yet
>   to be tracked down

Could you tell us more about where this shows up (what are the
symptoms)?

BTW if it were split into small separate commits, you would be able to
find the bug by bisecting history.  Also, trouble finding this bug
might mean that the code is not very clean.

> 
> Caching is disabled by default with the $cache_enable variable,
> setting this to 1 will enable file based caching.  It is expected
> that this will be extended to include additional types of caching
> (like memcached) in the future and should not be exclusively
> considered a binary value.

Not a good idea, IMHO.  In my rewrite of this patch there is a _boolean_
$caching_enabled variable which controls whether (output) caching is
enabled or not, and a $cache variable holding an instance of the cache
engine, which might be used to select a different caching engine than
the simple file-based one.

Signoff?

> ---
>  gitweb/cache.pm    |  283 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  gitweb/gitweb.css  |    6 +
>  gitweb/gitweb.perl |   58 ++++++++++-
>  3 files changed, 344 insertions(+), 3 deletions(-)
>  create mode 100644 gitweb/cache.pm

Very large patch... but no updates to gitweb/README, no updates to
t/gitweb-lib.sh (I guess that gitweb tests are no longer working).

> diff --git a/gitweb/cache.pm b/gitweb/cache.pm
> new file mode 100644
> index 0000000..d08bcec
> --- /dev/null
> +++ b/gitweb/cache.pm
> @@ -0,0 +1,283 @@
> +# gitweb - simple web interface to track changes in git repositories
> +#
> +# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
> +#
> +# This program is licensed under the GPLv2
> +
> +#
> +# Gitweb caching engine
> +#
> +
> +use File::Path qw(make_path remove_tree);

Using make_path (you do not use remove_tree, so there is no need for
importing it) instead of older mkdir interface requires File::Path
version 2.0 (which meant that I had to upgrade File::Path).  This at
least should be mentioned in the comment, perhaps also in
gitweb/INSTALL.

> +use Digest::MD5 qw(md5 md5_hex md5_base64);

You use only md5_hex; no need to import other functions.

> +use Fcntl ':flock';
> +
> +sub cache_fetch {
> +	my ($action) = @_;
> +	my $cacheTime = 0;
> +
> +	# Deal with cache being disabled
> +	if( $cache_enable == 0 ){

Style:

  +	if ($cache_enable == 0) {

or better

  +	if (!$cache_enabled) {

> +		$output_handler = *STDOUT;
> +		$output_handler_bin = *STDOUT;

There should be no need for that, as $output_handler is set to *STDOUT
(or \*STDOUT) anyway.

> +		$actions{$action}->();
> +		return;

Anyway I think that the whole block should be _outside_ cache_fetch,
which should be invoked only if caching is enabled.  For example in
gitweb.perl:

  if ($caching_enabled) {
  	do $cache_pm;
  	die $@ if $@;

  	# ...

  	cache_fetch($cache, $action);
  } else {
  	$actions{$action}->();
  }

> +	}elsif( $cache_enable == 1 ){

Style.

> +		#obviously we are using file based caching

See my comment about using $cache_enable as enum selecting cache type
(blergh).  BTW what's with 'obviously'?

> +
> +		if(! -d $cachedir){

Style.

> +			print "*** Warning ***: Caching enabled but cache directory does not exsist.  ($cachedir)\n";

Why this warning?  Is it really necessary?

> +			mkdir ("cache", 0665) || die "Cannot create cache dir - you will need to manually create";
> +			print "Cache directory created successfully\n";
> +		}
> +
> +		our $full_url = "$my_url?". $ENV{'QUERY_STRING'};

Wouldn't work if your client uses a path_info URL, e.g.

  http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel

That's why I use href(-replay=>1, -full=>1, -path_info=>0) as the
cache key for a request (you could use freeze(\%input_params) instead,
where freeze is from the Storable module).

> +		our $urlhash = md5_hex($full_url);
> +		our $fullhashdir = "$cachedir/". substr( $urlhash, 0, 2) ."/";

Is depth 2 enough for cache?
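
For reference, the key-to-path mapping being discussed can be sketched
as below.  This is only an illustration: cache_path_for_key() is a
hypothetical name, and $key stands for whatever normalized request key
ends up being used (it has to be a byte string, since md5_hex croaks on
wide characters).  With two hex digits as the directory name there are
at most 256 subdirectories under the cache root.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a normalized request key to a cache file
# path, using the first two hex digits of the MD5 digest as the
# subdirectory and the remaining 30 digits as the file name.
sub cache_path_for_key {
	my ($cachedir, $key) = @_;
	my $hash = md5_hex($key);
	return join('/', $cachedir, substr($hash, 0, 2), substr($hash, 2));
}
```

Joining with a single '/' also avoids the doubled slash that
"$fullhashdir/" . substr(...) produces in the quoted code.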

> +
> +		my $numdirs = make_path( $fullhashdir, { mode => 0777, error => \my $mkdirerr, } );
> +		if( @$mkdirerr ){
> +			my $mkdirerrmsg = "";
> +			for my $diag (@$mkdirerr) {
> +				my ($file, $message) = %$diag;
> +				if($file eq '' ){
> +					$mkdirerrmsg .= "general error: $message\n";
> +				}else{
> +					$mkdirerrmsg .= "problem unlinking $file: $message\n";
> +				}
> +			}
> +			die_error(500, "Could not create cache directory | $mkdirerrmsg");
> +		}
> +		$fullhashpath = "$fullhashdir/". substr( $urlhash, 2 );
> +		$fullhashbinpath = "$fullhashpath.bin";
> +	} # done dealing with cache enabled / disabled

Note also if dealing with caching enabled / disabled was outside
cache_fetch you would have less nested code.

> +
> +	if(! -e "$fullhashpath" ){
> +		if(! defined(my $childPid = fork()) ){

Style.

> +			cacheUpdate($action,0);
> +			cacheDisplay($action);

Why camelCase Java/JavaScript-like convention, quite different from
the C-like naming convention used elsewhere in gitweb?

> +		} elsif ( $childPid == 0 ){
> +			#run the updater
> +			cacheUpdate($action,1);

cacheUpdate($action,0) vs cacheUpdate($action,1) is a very cryptic
distinction.  It would be better to use a "named parameter" and/or
separate, differently named [wrapper] functions.

> +		}else{
> +			cacheWaitForUpdate($action);
> +		}

This whole block should probably be in a separate function.

> +	}else{
> +		#if cache is out dated, update
> +		#else displayCache();
> +		open(cacheFile, '<', "$fullhashpath");
> +		stat(cacheFile);
> +		close(cacheFile);

You don't need to open file to stat it.

> +		$cacheTime = get_loadavg() * 60;
> +		if( $cacheTime > $maxCacheTime ){
> +			$cacheTime = $maxCacheTime;
> +		}
> +		if( $cacheTime < $minCacheTime ){
> +			$cacheTime = $minCacheTime;
> +		}

This should probably be a separate function (effective cache expiration
time).  Also, the adaptiveness of caching is not described in the commit
message.
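
Something along these lines would do; it is a sketch only, with
effective_cache_time() being a hypothetical name and the load average
passed in as an argument instead of being read via get_loadavg():

```perl
use strict;
use warnings;

# Hypothetical helper: adaptive cache expiration time in seconds.
# The higher the load, the longer entries are considered valid,
# clamped to the [$min, $max] range.
sub effective_cache_time {
	my ($loadavg, $min, $max) = @_;
	my $cache_time = $loadavg * 60;
	$cache_time = $max if $cache_time > $max;
	$cache_time = $min if $cache_time < $min;
	return $cache_time;
}
```

With the defaults from the patch (20 and 1200), a load average of 5
gives 300 seconds, while an idle machine gets the 20 second minimum.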

> +		if( (stat(_))[9] < (time - $cacheTime) ){
> +			if( ! defined(my $childPid = fork()) ){
> +				cacheUpdate($action,0);
> +				cacheDisplay($action);
> +			} elsif ( $childPid == 0 ){
> +				#run the updater
> +				#print "Running updater\n";

Remains of debugging.

> +				cacheUpdate($action,1);
> +			}else{
> +				#print "Waiting for update\n";
> +				cacheWaitForUpdate($action);
> +			}

Repeated code (I think).

> +		} else {
> +			cacheDisplay($action);
> +		}
> +
> +
> +	}
> +
> +	#
> +	# If all of the caching failes - lets go ahead and press on without it and fall back to 'default'
> +	# non-caching behavior.  This is the softest of the failure conditions.
> +	#
> +	#$actions{$action}->();

Why is this commented out?

> +}
> +
> +sub cacheUpdate {
> +	my ($action,$areForked) = @_;
> +	my $lockingStatus;
> +	my $fileData = "";
> +
> +	if($backgroundCache){
> +		open(cacheFileBG, '>:utf8', "$fullhashpath.bg");
> +		my $lockStatBG = flock(cacheFileBG,LOCK_EX|LOCK_NB);
> +
> +		$lockStatus = $lockStatBG;
> +	}else{
> +		open(cacheFile, '>:utf8', "$fullhashpath");
> +		my $lockStat = flock(cacheFile,LOCK_EX|LOCK_NB);
> +
> +		$lockStatus = $lockStat;
> +	}

Almost identical code.  Use of global handles instead of indirect
filehandles.
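
A sketch of what I mean, with one helper serving both branches and a
lexical filehandle.  open_locked_for_update() is a hypothetical name.
Note one deliberate difference from the quoted code: it opens with '>>'
instead of '>', so an existing file is not truncated before the lock is
actually held; that is my assumption about the desired behaviour, not
something the patch does.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: open $path for writing and try to take an
# exclusive lock without blocking.  Returns the locked filehandle,
# or undef if the file cannot be opened or is locked by someone else.
sub open_locked_for_update {
	my ($path) = @_;

	# '>>' creates the file if needed but does not truncate it
	open(my $fh, '>>', $path) or return;
	unless (flock($fh, LOCK_EX | LOCK_NB)) {
		close $fh;
		return;
	}
	return $fh;
}
```

The caller would pick "$fullhashpath.bg" or $fullhashpath depending on
$backgroundCache, and truncate($fh, 0) once it holds the lock and is
ready to write.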

> +	#print "lock status: $lockStat\n";
> +
> +
> +	if (! $lockStatus ){
> +		if ( $areForked ){
> +			exit(0);
> +		}else{
> +			return;
> +		}
> +	}

This conditional needs explanation (comment), I think.

> +
> +	if(
> +		$action eq "snapshot"
> +		||
> +		$action eq "blob_plain"

This condition should be put in a separate function.
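
For example (a sketch; is_binary_action() and %binary_action are
hypothetical names, and the list of actions is simply the one used in
the quoted code):

```perl
use strict;
use warnings;

# Actions whose payload goes to the separate binary cache file
my %binary_action = map { $_ => 1 } qw(snapshot blob_plain);

# Hypothetical helper: true if $action produces binary output
sub is_binary_action {
	my ($action) = @_;
	return exists $binary_action{$action} ? 1 : 0;
}
```

This would replace the three copies of the same two-line condition in
cacheUpdate and cacheDisplay.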

> +	){
> +		open cacheFileBin, '>', $fullhashbinpath or die_error(500, "Could not open bin dump file");
> +		$output_handler_bin = *cacheFileBin;
> +	}
> +
> +	$output_handler = *cacheFile;
> +
> +	if($backgroundCache){
> +		open(cacheFile, '>:utf8', "$fullhashpath");

Why "$fullhashpath" and not simply $fullhashpath?

> +		$lockStat = flock(cacheFile,LOCK_EX);
> +
> +		if (! $lockStat ){
> +			if ( $areForked ){
> +				exit(0);
> +			}else{
> +				return;
> +			}
> +		}

Repeated code.

> +	}
> +
> +	$actions{$action}->();
> +
> +	if(
> +		$action eq "snapshot"
> +		||
> +		$action eq "blob_plain"
> +	){
> +		close(cacheFileBin);
> +	}
> +
> +	flock(cacheFile,LOCK_UN);
> +	close(cacheFile);
> +
> +	if($backgroundCache){
> +		flock(cacheFileBG,LOCK_UN);
> +		close(cacheFileBG);
> +	}
> +
> +	if ( $areForked ){
> +		exit(0);
> +	} else {
> +		return;
> +	}
> +}
> +
> +
> +sub cacheWaitForUpdate {
> +	my ($action) = @_;
> +	my $x = 0;
> +	my $max = 10;

What is $x, what is $max?

> +	my $lockStat = 0;
> +
> +	if( $backgroundCache ){
> +		if( -e "$fullhashpath" ){
> +			open(cacheFile, '<:utf8', "$fullhashpath");

Why open with :utf8, and not with :raw?  I don't think we need to
do the eventual conversion once again...

> +			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
> +			stat(cacheFile);
> +			close(cacheFile);
> +
> +			if( $lockStat && ( (stat(_))[9] > (time - $maxCacheLife) ) ){
> +				cacheDisplay($action);
> +				return;
> +			}

Why do we deal with cache expiration in two places?  If it is not
a bug, it should be explained in a comment.

> +		}
> +	}
> +
> +	if(
> +		$action eq "atom"
> +		||
> +		$action eq "rss"
> +		||
> +		$action eq "opml"
> +	){
> +		do {
> +			sleep 2 if $x > 0;
> +			open(cacheFile, '<:utf8', "$fullhashpath");
> +			$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
> +			close(cacheFile);
> +			$x++;
> +			$combinedLockStat = $lockStat;
> +		} while ((! $combinedLockStat) && ($x < $max));

Why busy wait instead of _blocking_ lock, i.e. waiting on lock for it
to be free?  It doesn't look like we _do_ anything in the loop.

Ah, I see that we wait at most 2*$max seconds (where interval of 2
seconds is hardcoded).  Is it really necessary?
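
If the upper bound on waiting is wanted, a blocking lock with a timeout
can be had with the alarm idiom from perlipc.  The following is only a
sketch; wait_for_shared_lock() is a hypothetical name and the timeout
is passed in by the caller:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: wait up to $timeout seconds for a shared lock
# on $path.  Returns the locked read handle, or undef if the file
# cannot be opened or the lock was not obtained in time.
sub wait_for_shared_lock {
	my ($path, $timeout) = @_;
	open(my $fh, '<', $path) or return;

	my $locked = eval {
		local $SIG{ALRM} = sub { die "timeout\n" };
		alarm $timeout;
		my $ok = flock($fh, LOCK_SH);   # blocks while a writer holds LOCK_EX
		alarm 0;
		$ok;
	};
	alarm 0;   # make sure no alarm is left pending if the eval died
	return $locked ? $fh : undef;
}
```

The process then sleeps in flock() until the writer is done, instead of
polling every 2 seconds.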

> +
> +		if( $x != $max ){
> +			cacheDisplay($action);
> +		}
> +		return;
> +	}
> +
> +	$| = 1;
> +
> +	print $::cgi->header(-type=>'text/html', -charset => 'utf-8',
> +	                   -status=> 200, -expires => 'never');
> +
> +	print <<EOF;
> +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www/w3.porg/TR/html4/strict.dtd">
> +<!-- git web w/caching interface version $version, (C) 2006-2010, John 'Warthog9' Hawley <warthog9\@kernel.org> -->
> +<!-- git core binaries version $git_version -->
> +<head>
> +<meta http-equiv="content-type" content="$content_type; charset=utf-8"/>
> +<meta name="generator" content="gitweb/$version git/$git_version"/>
> +<meta name="robots" content="index, nofollow"/>
> +<meta http-equiv="refresh" content="0"/>
> +<title>$title</title>
> +</head>
> +<body>
> +EOF
> +
> +	print "Generating..";
> +	do {
> +		print ".";
> +		sleep 2 if $x > 0;
> +		open(cacheFile, '<:utf8', "$fullhashpath");
> +		$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
> +		close(cacheFile);
> +		$x++;
> +		$combinedLockStat = $lockStat;
> +	} while ((! $combinedLockStat) && ($x < $max));

This trick of having http-equiv 'refresh' meta with the delay of 0
seconds, but not closing the output and therefore not triggering
redirect should be described in comments, and perhaps also in the
commit message.

> +	print <<EOF;
> +</body>
> +</html>
> +EOF
> +	return;
> +}
> +
> +sub cacheDisplay {
> +	my ($action) = @_;
> +	open(cacheFile, '<:utf8', "$fullhashpath");
> +	$lockStat = flock(cacheFile,LOCK_SH|LOCK_NB);
> +	if (! $lockStat ){
> +		close(cacheFile);
> +		cacheWaitForUpdate($action);
> +	}
> +
> +	while( <cacheFile> ){
> +		print $_;
> +	}

Why not slurp it (local $/ = undef), but write line after line?
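
That is, something like the following sketch (slurp_file() is a
hypothetical name; it reads in :raw mode on the assumption that the
cached data is already encoded as it should be sent):

```perl
use strict;
use warnings;

# Hypothetical helper: read the whole file in one go.
# Returns the contents, or undef if the file cannot be opened.
sub slurp_file {
	my ($path) = @_;
	open(my $fh, '<:raw', $path) or return;
	local $/ = undef;   # no record separator: <$fh> returns everything
	my $data = <$fh>;
	close $fh;
	return $data;
}
```

For large snapshots one might prefer reading in fixed-size chunks to
keep memory use bounded; that is a separate decision.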

> +	if(
> +		$action eq "snapshot"
> +		||
> +		$action eq "blob_plain"
> +	){
> +		open(cacheFileBin, '<', "$fullhashbinpath");
> +		binmode STDOUT, ':raw';
> +		print <cacheFileBin>;

Why not slurp it (local $/ = undef), but write line after line,
implicitly?

> +		binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
> +		close(cacheFileBin);
> +	}
> +	close(cacheFile);
> +}

> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 8bb323c..ec95bb9 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -230,6 +230,50 @@ our $git_versions_must_match = 1;
>  # Leave it undefined (or set to 'undef') to turn off load checking.
>  our $maxload = 300;
>  
> +# This enables/disables the caching layer in gitweb.  This currently only supports the
> +# 'dumb' file based caching layer, primarily used on git.kernel.org.  this is reasonably
> +# effective but it has the downside of requiring a huge amount of disk space if there
> +# are a number of repositories involved.  It is not uncommon for git.kernel.org to have
> +# on the order of 80G - 120G accumulate over the course of a few months.  It is recommended
> +# that the cache directory be periodically completely deleted, and this is safe to perform.
> +# Suggested mechanism
> +# mv $cacheidr $cachedir.flush;mkdir $cachedir;rm -rf $cachedir.flush
> +# Value is binary. 0 = disabled (default), 1 = enabled.
> +#
> +# Values of caching:
> +# 	1 = 'dumb' file based caching used on git.kernel.org
> +our $cache_enable = 0;
> +
> +# Used to set the minimum cache timeout for the dynamic caching algorithm.  Basically
> +# if we calculate the cache to be under this number of seconds we set the cache timeout
> +# to this minimum.
> +# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
> +our $minCacheTime = 20;
> +
> +# Used to set the maximum cache timeout for the dynamic caching algorithm.  Basically
> +# if we calculate the cache to exceed this number of seconds we set the cache timeout
> +# to this maximum.
> +# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
> +our $maxCacheTime = 1200;
> +
> +# If you need to change the location of the caching directory, override this
> +# otherwise this will probably do fine for you
> +our $cachedir = 'cache';

Why not '/tmp/gitweb-cache', or '/var/cache/gitweb'?  Perhaps use
TMPDIR / File::Spec->tmpdir() if it is undefined?

Note that this path is relative to the place where we run gitweb from,
which is important for gitweb tests.
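
A sketch of the fallback I have in mind (default_cachedir() is a
hypothetical name, and 'gitweb-cache' is just the directory name from
my suggestion above):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: use the configured cache directory if there is
# one, otherwise fall back to a directory under the system temporary
# directory.
sub default_cachedir {
	my ($configured) = @_;
	return $configured if defined $configured && length $configured;
	return File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache');
}
```

The tests could then simply set $cachedir explicitly to a path inside
the trash directory.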

> +
> +# If this is set (to 1) cache will do it's best to always display something instead
> +# of making someone wait for the cache to update.  This will launch the cacheUpdate
> +# into the background and it will lock a <file>.bg file and will only lock the
> +# actual cache file when it needs to write into it.  In theory this will make
> +# gitweb seem more responsive at the price of possibly stale data.
> +our $backgroundCache = 1;

Does it mean that if a cache entry exists for a given request but has
expired, the client that took the write lock also gets stale data
instead of the 'Generating...' page, and the cache is updated/regenerated
by a background process?

This comment is not entirely clear to me.

> +
> +# Used to set the maximum cache file life.  If a cache files last modify time exceeds
> +# this value, it will assume that the data is just too old, and HAS to be regenerated
> +# instead of trying to display the existing cache data.
> +# Value is in seconds.  1 = 1 seconds, 60 = 1 minute, 600 = 10 minutes, 3600 = 1 hour
> +# 18000 = 5 hours
> +our $maxCacheLife = 18000;

This should also be mentioned in the commit message (modifying what I
wrote).

> +
>  # You define site-wide feature defaults here; override them with
>  # $GITWEB_CONFIG as necessary.
>  our %feature = (
> @@ -593,6 +637,11 @@ if (defined $maxload && get_loadavg() > $maxload) {
>  	die_error(503, "The load average on the server is too high");
>  }
>  
> +#
> +# Includes
> +#
> +do 'cache.pm';

Should be

  +do "$cache_pm";

if you don't use require, where $cache_pm can be overridden in the gitweb
config; otherwise the gitweb caching tests won't work, as they invoke
gitweb from the test directory.
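
Either way the result of 'do' should be checked, because 'do FILE' fails
silently otherwise.  A sketch, following the idiom from perlfunc
(load_cache_module() is a hypothetical name, and it assumes the file
ends with a defined value such as "1;"):

```perl
use strict;
use warnings;

# Hypothetical helper: load the caching engine from $cache_pm and die
# with a useful message if it cannot be read or compiled.
sub load_cache_module {
	my ($cache_pm) = @_;
	my $rv = do $cache_pm;
	unless (defined $rv) {
		die "Couldn't parse $cache_pm: $@" if $@;
		die "Couldn't read $cache_pm: $!";
	}
	return $rv;
}
```

Note that with a relative file name 'do' searches @INC, so passing an
absolute path (or one starting with './') is the safer choice.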

> +
>  # version of the core git binary
>  our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
>  $number_of_git_cmds++;
> @@ -994,7 +1043,7 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
>      !$project) {
>  	die_error(400, "Project needed");
>  }
> -$actions{$action}->();
> +cache_fetch($action);
>  exit;
>  

As I wrote, I think cache_fetch should be invoked only when caching is
enabled.

>  ## ======================================================================
> @@ -3200,7 +3249,9 @@ sub git_header_html {
>  	# support xhtml+xml but choking when it gets what it asked for.
>  	if (defined $cgi->http('HTTP_ACCEPT') &&
>  	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
> -	    $cgi->Accept('application/xhtml+xml') != 0) {
> +	    $cgi->Accept('application/xhtml+xml') != 0
> +	    &&
> +	    $cache_enable == 0) {
>  		$content_type = 'application/xhtml+xml';
>  	} else {
>  		$content_type = 'text/html';

O.K.

> @@ -3344,6 +3395,7 @@ sub git_footer_html {
>  	my $feed_class = 'rss_logo';
>  
>  	print {$output_handler} "<div class=\"page_footer\">\n";
> +	print {$output_handler} "<div class=\"cachetime\">Cache Last Updated: ". gmtime( time ) ." GMT</div>\n";

Shouldn't this be conditional on $cache_enabled?

>  	if (defined $project) {
>  		my $descr = git_get_project_description($project);
>  		if (defined $descr) {

BTW, I think you need to protect the timing info and not show it if
caching is enabled.  It doesn't make much sense to show how much time
it took to generate the page... when said page could have been
retrieved from the cache.

But it might make sense; I am not sure.

> @@ -3424,7 +3476,7 @@ sub die_error {
>  	my $extra = shift;
>  
>  	# The output handlers for die_error need to be reset to STDOUT
> -	# so that half the message isn't being output to random and 
> +	# so that half the message isn't being output to random and
>  	# half to STDOUT as expected.  This is mainly for the benefit
>  	# of using git_header_html() and git_footer_html() since those
>  	# internaly use the indirect print handler.

This looks like a spurious change.

> -- 
> 1.6.5.2
> 

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* [RFC PATCH 00/10] gitweb: Simple file based output caching
  2010-01-14  1:23                 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
  2010-01-16  2:48                   ` Jakub Narebski
@ 2010-01-23  0:27                   ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
                                       ` (11 more replies)
       [not found]                   ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>
  2 siblings, 12 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

This 10-patch series is intended as a proof of concept for splitting
the large 'gitweb: File based caching layer (from git.kernel.org)'
mega-patch by John 'Warthog9' Hawley aka J.H., by starting small and
adding features piece by piece.

This series is meant as a replacement for the last two patches:
* [PATCH 8/9] gitweb: Convert output to using indirect file handle
  Message-ID: <1263432185-21334-9-git-send-email-warthog9@eaglescrag.net>
* [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
  Message-ID: <1263432185-21334-10-git-send-email-warthog9@eaglescrag.net>

in the long patch series by J.H.
* [PATCH 0/9] Gitweb caching v5
  http://thread.gmane.org/gmane.comp.version-control.git/136913

Note that this patch series is part of 'gitweb/cache-kernel' branch of
http://repo.or.cz/w/git/jnareb-git.git repository (gitweb link), built
on top of modified patches from 'Gitweb caching v2' series (from
'gitweb-ml-v2' branch of http://git.kernel.org/?p=git/warthog9/gitweb.git
repository).  Therefore they might not apply as straight replacements
on top of early parts of 'gitweb-ml-v5' branch.

This is a work in progress (showing how I see introducing output caching
to gitweb).  It lacks proper documentation (POD for gitweb/cache.pm,
new configuration variables in gitweb/README, perhaps a "Gitweb caching"
section in gitweb/README, and gitweb/cache.pm mentioned in gitweb/INSTALL),
and commits/patches marked '(WIP)' lack proper commit messages.

Just food for thought...

Table of contents:
~~~~~~~~~~~~~~~~~~
 [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing
                   for caching)
 [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1)
 [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching
 [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration
 [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set)
                   output caching (WIP)
 [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP)
 [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP)
 [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd'
                   problem (WIP)
 [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for
                   filling cache (WIP)
 [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
                   regenerating cache (WIP)


Diffstat:
~~~~~~~~~

 gitweb/cache.pm                        |  566 ++++++++++
 gitweb/gitweb.perl                     | 1923 +++++++++++++++++---------------
 t/gitweb-lib.sh                        |    2 +
 t/t9500-gitweb-standalone-no-errors.sh |   13 +
 t/t9503-gitweb-caching.sh              |   32 +
 t/t9503/test_cache_interface.pl        |  195 ++++
 t/test-lib.sh                          |    3 +
 7 files changed, 1836 insertions(+), 898 deletions(-)
 create mode 100644 gitweb/cache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/test_cache_interface.pl


* [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski
                                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

If named boolean option -path_info is passed to href() subroutine, use
its value, instead of querying 'pathinfo' feature to check whether to
generate path_info URL (if generating path_info link is possible at all).

href(-replay=>1, -path_info=>0) is meant to be used to generate a key
for caching gitweb output; alternate solution would be to use freeze()
from Storable (core module) on %input_params hash (or its reference),
e.g.:
  $key = freeze \%input_params;
or other serialization technique.

While at it document extra options/flags to href().

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Actually after discussion with J.H.
  http://thread.gmane.org/gmane.comp.version-control.git/136913/focus=137061
I have changed my mind and now I think that good human-readable key
for output caching is href(-replay=>1, -full=>1, -path_info=>0);

The reason behind -path_info=>0 is that we want to have the same cache
entry (the same cache file) regardless of whether we use path_info
URL, or non-path_info URL.

The reason behind -full=>1 (from what I understand of
J.H.'s argument) is that you can have two gitweb deployments sharing the
same cache root, but in which the same project name can point to
different projects (different $projectsroot).

 gitweb/gitweb.perl |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2f6a8e1..cd5073c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1004,6 +1004,10 @@ exit;
 ## ======================================================================
 ## action links

+# possible values of extra options
+# -full => 0|1      - use absolute/full URL ($my_uri/$my_url as base)
+# -replay => 1      - start from a current view (replay with modifications)
+# -path_info => 0|1 - don't use/use path_info URL (if possible)
 sub href {
 	my %params = @_;
 	# default is to use -absolute url() i.e. $my_uri
@@ -1020,7 +1024,8 @@ sub href {
 	}

 	my $use_pathinfo = gitweb_check_feature('pathinfo');
-	if ($use_pathinfo and defined $params{'project'}) {
+	if (defined $params{'project'} &&
+	    (exists $params{-path_info} ? $params{-path_info} : $use_pathinfo)) {
 		# try to put as many parameters as possible in PATH_INFO:
 		#   - project name
 		#   - action
-- 
1.6.6.1


* [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
                                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

This is the first step towards implementing the file based output
(response) caching layer that is used on such large sites as kernel.org.

This patch introduces the GitwebCache::SimpleFileCache package, which
follows the Cache::Cache / CHI interface, although it does not implement
it fully.  The intent of following an established convention is to be
able to replace our simple file based cache in the future, e.g. by one
using memcached.

Like in the original patch by John 'Warthog9' Hawley (J.H.) (the one this
commit intends to be an incremental step towards), the data is stored in
the cache as-is, without adding metadata (like expiration date), and
without serialization (which means only scalar data).

To be implemented (from original patch by J.H.):
* cache expiration (based on file stats, current time and global
  expiration time); currently elements in cache do not expire
* actually using this cache in gitweb, except error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Possible extensions (beyond what was in original patch):
* (optionally) show information about cache utilization
* AJAX (JavaScript-based) progress indicator
* JavaScript code to update relative dates in cached output
* make cache size-aware (try to not exceed specified maximum size)
* utilize X-Sendfile header (or equivalent) to show cached data
  (optional, as it makes sense only if the web server supports the
  sendfile feature and has it enabled)
* variable expiration feature from CHI, allowing items to expire a bit
  earlier than the stated expiration time to prevent cache miss
  stampedes (although locking, if available, should take care of
  this).

The code of the GitwebCache::SimpleFileCache package in gitweb/cache.pm
is heavily based on the file-based cache in the Cache::Cache package,
i.e. on Cache::FileCache, Cache::FileBackend and Cache::BaseCache
(including implementing atomic write, something that the original patch
lacks).
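
The atomic write mentioned above boils down to "write to a temporary
file in the same directory, then rename() it over the target".  A
minimal standalone sketch of that idea (not the code from this patch;
write_file_atomically() is a made-up name, and it assumes that the
directory part of the path already exists):

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Write $data to $path so that readers see either the old or the new
# content, never a partially written file.
sub write_file_atomically {
	my ($path, $data) = @_;

	# The temporary file has to live in the same directory as the target,
	# because rename() does not work across filesystems.
	my ($fh, $tmpname) = tempfile('tmp_XXXXX',
	                              DIR => dirname($path), UNLINK => 0);
	binmode($fh);
	print {$fh} $data
		or die "Cannot write to '$tmpname': $!";
	close($fh)
		or die "Cannot close '$tmpname': $!";
	rename($tmpname, $path)
		or die "Cannot rename '$tmpname' to '$path': $!";
}
```

Checking the return values of print, close and rename is a choice made
for this sketch; the patch itself does not check them.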

This patch does not yet enable output caching in gitweb (it doesn't
have all required features yet); on the other hand it includes tests,
currently testing only cache Perl API.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Large parts of this code are based _heavily_ on Cache::FileCache
implementation (including Cache::FileBackend and Cache::BaseCache)
from Cache::Cache distribution (which is dual licensed using
(Perl) Artistic License and GNU General Public License, like Perl
itself).  In the final version of code it should probably be
cleaned up.
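
For readers who do not know Cache::FileCache: the on-disk layout taken
over from it hashes the key and uses the leading pairs of hex digits as
directory names.  A standalone sketch of that mapping, using the same
md5_hex() and unpack() approach as the patch but a made-up helper name
(key_to_path() is not part of cache.pm):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Map a human-readable key to a file path below $root, $depth levels deep.
sub key_to_path {
	my ($root, $namespace, $key, $depth) = @_;

	my $hashed = md5_hex($key);
	# ($depth - 1) two-character directory names, then the rest of the hash
	my $ndirs = $depth - 1;
	my @parts = unpack("(a2)$ndirs a*", $hashed);

	return File::Spec->catfile($root, $namespace, @parts);
}

print key_to_path('cache', '', 'Test Key', 2), "\n";
```

With a depth of 2 this gives up to 256 first-level directories, which is
what keeps the number of files per directory manageable.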

Also although it implements 'compute' interface from CHI (Unified
cache interface), actual CHI code was not used even as reference.
(Nevertheless gitweb/cache.pm is meant to use minimal dependencies,
like gitweb itself, so it won't use Moose / Mouse for OO).
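
To illustrate what the 'compute' interface does (get the value from
cache, or generate and store it on a miss), here is a sketch with a toy
in-memory cache.  TinyMemoryCache is a made-up class used only for this
example; it is neither CHI nor the code in cache.pm, though compute()
follows the same get/generate/set sequence.

```perl
use strict;
use warnings;

package TinyMemoryCache;   # hypothetical, for illustration only

sub new { my ($class) = @_; return bless { data => {} }, $class; }
sub get { my ($self, $key) = @_; return $self->{data}{$key}; }
sub set { my ($self, $key, $value) = @_; $self->{data}{$key} = $value; }

# Return the cached value for $key; on a cache miss call $code to
# generate the value, store it, and return it.
sub compute {
	my ($self, $key, $code) = @_;

	my $value = $self->get($key);
	if (!defined $value) {
		$value = $code->($self, $key);
		$self->set($key, $value);
	}
	return $value;
}

package main;

my $cache = TinyMemoryCache->new();
my $calls = 0;
my $generate = sub { $calls++; return "<html>...</html>" };

my $page = $cache->compute('summary', $generate);   # miss: runs $generate
$page    = $cache->compute('summary', $generate);   # hit: served from cache
print "generator ran $calls time(s)\n";
```

Note that a generator returning undef is never cached with this scheme,
so it would be run again on every call.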

 gitweb/cache.pm                 |  317 +++++++++++++++++++++++++++++++++++++++
 t/t9503-gitweb-caching.sh       |   32 ++++
 t/t9503/test_cache_interface.pl |   77 ++++++++++
 t/test-lib.sh                   |    3 +
 4 files changed, 429 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/cache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/test_cache_interface.pl

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
new file mode 100644
index 0000000..ea544b0
--- /dev/null
+++ b/gitweb/cache.pm
@@ -0,0 +1,317 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine
+#
+
+{
+# Minimalistic cache that stores data in the filesystem, without serialization
+# and currently without any kind of cache expiration (all keys last forever until
+# they are explicitly removed)
+#
+# It follows Cache::Cache and CHI interface (but does not implement it fully)
+
+package GitwebCache::SimpleFileCache;
+
+use strict;
+use warnings;
+
+use File::Path qw(make_path);  # requires version >= 2.0
+use File::Spec;
+use File::Temp;
+use Digest::MD5 qw(md5_hex);
+
+# by default, the cache nests all entries on the filesystem two
+# directories deep
+
+our $DEFAULT_CACHE_DEPTH = 2;
+
+# by default, the root of the cache is located in 'cache'.
+
+our $DEFAULT_CACHE_ROOT = "cache";
+
+# ......................................................................
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to '' (which does not allow for simple implementation of clear() method).
+#  * 'cache_root'
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to 'cache', relative to gitweb.cgi directory.
+#  * 'cache_depth'
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to 2 unless explicitly set.
+sub new {
+	my ($proto, $p_options_hash_ref) = @_;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	my ($root, $depth, $ns);
+	if (defined $p_options_hash_ref) {
+		$root  = $p_options_hash_ref->{'cache_root'};
+		$depth = $p_options_hash_ref->{'cache_depth'};
+		$ns    = $p_options_hash_ref->{'namespace'};
+	}
+	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
+	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
+	$ns    = '' unless defined($ns);
+
+	$self->set_root($root);
+	$self->set_depth($depth);
+	$self->set_namespace($ns);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+sub get_depth {
+	my ($self) = @_;
+
+	return $self->{'_Depth'};
+}
+
+sub set_depth {
+	my ($self, $depth) = @_;
+
+	$self->{'_Depth'} = $depth;
+}
+
+sub get_root {
+	my ($self) = @_;
+
+	return $self->{'_Root'};
+}
+
+
+sub set_root {
+	my ($self, $root) = @_;
+
+	$self->{'_Root'} = $root;
+}
+
+sub get_namespace {
+	my ($self) = @_;
+
+	return $self->{'_Namespace'};
+}
+
+
+sub set_namespace {
+	my ($self, $namespace) = @_;
+
+	$self->{'_Namespace'} = $namespace;
+}
+
+# ----------------------------------------------------------------------
+# (private) utility functions and methods
+
+# Take a human-readable key, and create a unique (hashed) key from it
+sub _Build_Hashed_Key {
+	my ($p_key) = @_;
+
+	return md5_hex($p_key);
+}
+
+# Take a human-readable key, and return file path
+sub _path_to_key {
+	my ($self, $p_namespace, $p_key) = @_;
+
+	return $self->_path_to_hashed_key($p_namespace,
+	                                    _Build_Hashed_Key($p_key));
+}
+
+# Take hashed key, and return file path
+sub _path_to_hashed_key {
+	my ($self, $p_namespace, $p_hashed_key) = @_;
+
+	return File::Spec->catfile($self->get_root(), $p_namespace,
+	                           _Split_Word($p_hashed_key, $self->get_depth()));
+}
+
+# Split word into N components, where each component but last is two-letter word
+# e.g. _Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 2) == qw(06 b90e786e304a18fdfbd7c7bcc41a6b);
+#      _Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 3) == qw(06 b9 0e786e304a18fdfbd7c7bcc41a6b);
+sub _Split_Word {
+	my ($p_word, $p_depth) = @_;
+
+	$p_depth--; # now it is number of leading 2-letter components
+	return unpack("(a2)$p_depth a*", $p_word);
+}
+
+sub _Read_File {
+	my ($p_path) = @_;
+
+	-e $p_path
+		or return undef;
+
+	open(my $fh, '<', $p_path)
+		or return undef;
+
+	local $/ = undef;
+	my $data = <$fh>;
+
+	close($fh);
+
+	return $data;
+}
+
+# write a file atomically, assuming that path leading to file exists
+sub _Write_File {
+	my ($p_path, $p_data) = @_;
+
+	my ($volume, $directory, $filename) = File::Spec->splitpath($p_path);
+	if (defined $directory and defined $volume) {
+		$directory = File::Spec->catpath($volume, $directory, '');
+	}
+
+	my $temp = File::Temp->new(DIR => $directory,
+	                          TEMPLATE => "${filename}_XXXXX",
+	                          SUFFIX => '.tmp');
+	binmode($temp);
+	print {$temp} $p_data;
+	close($temp);
+
+	rename($temp, $p_path);
+}
+
+# ensures that directory leading to path exists, or dies
+sub _Make_Path {
+	my ($p_path, $p_dir) = @_;
+
+	my ($volume, $directory, $filename) = File::Spec->splitpath($p_path);
+	if (defined $directory and defined $volume) {
+		$directory = File::Spec->catpath($volume, $directory, "");
+	}
+
+	return
+		unless (defined $directory and not -d $directory);
+
+	my $numdirs = make_path($directory,
+	                        { mode => 0777, error => \my $mkdirerr });
+	if (@$mkdirerr) {
+		my $mkdirerrmsg = "";
+		for my $diag (@$mkdirerr) {
+			my ($file, $message) = %$diag;
+			if ($file eq '' ){
+				$mkdirerrmsg .= "general error: $message\n";
+			} else {
+				$mkdirerrmsg .= "problem creating $file: $message\n";
+			}
+		}
+		#die_error(500, "Could not create cache directory | $mkdirerrmsg");
+	}
+}
+
+sub _Remove_File {
+	my ($p_path) = @_;
+
+	if (-f $p_path) {
+		unlink($p_path);
+	}
+}
+
+# _read_data and _write_data methods do deserialization/serialization
+# in original implementation in Cache::Cache distribution
+
+sub _read_data {
+	my ($self, $p_path) = @_;
+
+	return _Read_File($p_path);
+}
+
+sub _write_data {
+	my ($self, $p_path, $p_data) = @_;
+
+	_Make_Path($p_path);
+	_Write_File($p_path, $p_data);
+}
+
+# ----------------------------------------------------------------------
+# worker methods (explicit namespace)
+
+sub restore {
+	my ($self, $p_namespace, $p_key) = @_;
+
+	return $self->_read_data($self->_path_to_key($p_namespace, $p_key));
+}
+
+sub store {
+	my ($self, $p_namespace, $p_key, $p_data) = @_;
+
+	$self->_write_data($self->_path_to_key($p_namespace, $p_key),
+	                   $p_data);
+}
+
+sub delete_key {
+	my ($self, $p_namespace, $p_key) = @_;
+
+	_Remove_File($self->_path_to_key($p_namespace, $p_key));
+}
+
+sub get_size {
+	my ($self, $p_namespace, $p_key) = @_;
+
+	my $path = $self->_path_to_key($p_namespace, $p_key);
+	if (-e $path) {
+		return -s $path;
+	}
+	return 0;
+}
+
+# ......................................................................
+# interface methods
+
+# Removing and expiring
+
+sub remove {
+	my ($self, $p_key) = @_;
+
+	$self->delete_key($self->get_namespace(), $p_key);
+}
+
+# Getting and setting
+
+sub set {
+	my ($self, $p_key, $p_data) = @_;
+
+	$self->store($self->get_namespace(), $p_key, $p_data);
+}
+
+sub get {
+	my ($self, $p_key) = @_;
+
+	my $data = $self->restore($self->get_namespace(), $p_key)
+		or return undef;
+
+	return $data;
+}
+
+sub compute {
+	my ($self, $p_key, $p_coderef) = @_;
+
+	my $data = $self->get($p_key);
+	if (!defined $data) {
+		$data = $p_coderef->($self, $p_key);
+		$self->set($p_key, $data);
+	}
+
+	return $data;
+}
+
+1;
+} # end of package GitwebCache::SimpleFileCache;
+
+1;
diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh
new file mode 100755
index 0000000..768080c
--- /dev/null
+++ b/t/t9503-gitweb-caching.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='caching interface to be used in gitweb'
+#test_description='caching interface used in gitweb, gitweb caching
+#
+#This test checks cache (interface) used in gitweb caching, caching
+#infrastructure and gitweb response (output) caching (the last by
+#running gitweb as CGI script from commandline).'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this test is present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+	say 'perl not available, skipping test'
+	test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+	say 'perl module Test::More unavailable, skipping test'
+	test_done
+}
+
+# ----------------------------------------------------------------------
+
+test_external 'GitwebCache::* Perl API (in gitweb/cache.pm)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
+
+test_done
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
new file mode 100755
index 0000000..0b6628b
--- /dev/null
+++ b/t/t9503/test_cache_interface.pl
@@ -0,0 +1,77 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version; there is no installation target for gitweb
+my $cache_pm = "$ENV{TEST_DIRECTORY}/../gitweb/cache.pm";
+
+unless (-f "$cache_pm") {
+	plan skip_all => "gitweb/cache.pm not found";
+}
+
+# it is currently not a proper Perl module, so we use 'do FILE'
+#ok(eval { do "$cache_pm"; 1 or die $!; }, "loading gitweb/cache.pm");
+my $return = do "$cache_pm";
+ok(!$@,              "parse gitweb/cache.pm");
+ok(defined $return,  "do    gitweb/cache.pm");
+ok($return,          "run   gitweb/cache.pm");
+# instead of: BEGIN { use_ok('GitwebCache::SimpleFileCache') }
+
+# Test creating a cache
+#
+my $cache = new_ok('GitwebCache::SimpleFileCache',
+	[ { 'cache_root' => 'cache', 'cache_depth' => 2 } ]);
+
+# Test that default values are defined
+#
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+	'$DEFAULT_CACHE_ROOT defined');
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+	'$DEFAULT_CACHE_DEPTH defined');
+
+# Test accessors and default values for cache
+#
+SKIP: {
+	skip 'default values not defined', 3
+		unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
+		        $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
+
+	is($cache->get_namespace(), '', "default namespace is ''");
+	is($cache->get_root(), $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+		"default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
+	cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+		"default cache depth is $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH");
+}
+
+# Test the getting, setting, and removal of a cached value
+# (Cache::Cache interface)
+#
+my $key = 'Test Key';
+my $value = 'Test Value';
+can_ok($cache, qw(get set remove));
+#ok(!defined($cache->get($key)),        'get before set')
+#	or diag("get returned '", $cache->get($key), "' for $key");
+$cache->set($key, $value);
+is($cache->get($key), $value,          'get after set, returns cached value');
+$cache->remove($key);
+ok(!defined($cache->get($key)),        'get after remove, is undefined');
+
+# Test the getting and setting of a cached value
+# (CHI interface)
+#
+my $call_count = 0;
+sub get_value {
+	$call_count++;
+	return $value;
+}
+can_ok($cache, qw(compute));
+is($cache->compute($key, \&get_value), $value, 'compute 1st time (set)');
+is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
+is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
+cmp_ok($call_count, '==', 1, 'get_value() is called once');
+
+done_testing();
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 142f36f..9282d9e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -429,6 +429,9 @@ test_external () {
 		# Announce the script to reduce confusion about the
 		# test output that follows.
 		say_color "" " run $test_count: $descr ($*)"
+		# Export TEST_DIRECTORY and TRASH_DIRECTORY
+		# to be able to use them in script
+		export TEST_DIRECTORY TRASH_DIRECTORY
 		# Run command; redirect its stderr to &4 as in
 		# test_run_, but keep its stdout on our stdout even in
 		# non-verbose mode.
-- 
1.6.6.1


* [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP) Jakub Narebski
                                       ` (8 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Add stat-based cache expiration to file-based GitwebCache::SimpleFileCache.
Contrary to the way other caching interfaces such as Cache::Cache and CHI
do it, the time after which a cache element expires is a _global_ value
associated with the cache instance, and not a local property of a cache
entry.  (Currently a cache entry does not store any metadata associated
with it... which means that there is no need for serialization /
marshalling / freezing and thawing.)
The default expire time is -1, which means never expire.

To check if a cache entry is expired, GitwebCache::SimpleFileCache compares
the difference between mtime (last modify time) of a cache file and current
time with the (global) time to expire.  It is done using the CHI-compatible
is_valid() method.
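
The check described above can be sketched as a standalone function
working on a plain file.  is_fresh() is a made-up name for this
example, and the 60, 300 and 30 second values are arbitrary; as in the
patch, a negative expire time means "never expire".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# True if $path exists and is younger than $expires_in seconds.
# A negative (or undefined) $expires_in means "never expire".
sub is_fresh {
	my ($path, $expires_in) = @_;

	return 0 unless -f $path;
	return 1 unless (defined $expires_in && $expires_in >= 0);

	my $mtime = (stat(_))[9];   # reuses the stat data from the -f test
	return (time() - $mtime) < $expires_in ? 1 : 0;
}

my ($fh, $file) = tempfile(UNLINK => 1);
close($fh);
# pretend the file was written a minute ago
utime(time() - 60, time() - 60, $file)
	or die "Cannot set mtime of '$file': $!";

print is_fresh($file, 300) ? "fresh\n" : "expired\n";   # 5 minute lifetime
print is_fresh($file,  30) ? "fresh\n" : "expired\n";   # 30 second lifetime
```

Because the age is compared with '<', an expire time of 0 makes every
entry expired immediately.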

Add test checking that expire time of 0 (expire now) works correctly.


While at it show diagnostic if there were parse errors in gitweb/cache.pm
(TO BE MOVED TO PREVIOUS COMMIT).

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
The main difference from the way J.H. did it in his patch adding output
caching to gitweb is that here I simply use stat, and not open/stat/close.

Now that the caching engine supports cache expiration, we can add caching
support to gitweb.

 gitweb/cache.pm                 |   47 ++++++++++++++++++++++++++++++++++++++-
 t/t9503/test_cache_interface.pl |   11 ++++++++-
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index ea544b0..12a7a78 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -50,6 +50,10 @@ our $DEFAULT_CACHE_ROOT = "cache";
 #    The number of subdirectories deep to cache object item.  This should be
 #    large enough that no cache directory has more than a few hundred objects.
 #    Defaults to 2 unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatible),
+#    'expires_in' (CHI compatible) [seconds]
+#    The expiration time for objects placed in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
@@ -57,19 +61,24 @@ sub new {
 	my $self  = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns);
+	my ($root, $depth, $ns, $expires_in);
 	if (defined $p_options_hash_ref) {
 		$root  = $p_options_hash_ref->{'cache_root'};
 		$depth = $p_options_hash_ref->{'cache_depth'};
 		$ns    = $p_options_hash_ref->{'namespace'};
+		$expires_in =
+			$p_options_hash_ref->{'default_expires_in'} ||
+			$p_options_hash_ref->{'expires_in'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = '' unless defined($ns);
+	$expires_in = -1 unless defined($expires_in); # <0 means never
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
+	$self->set_expires_in($expires_in);
 
 	return $self;
 }
@@ -115,6 +124,20 @@ sub set_namespace {
 	$self->{'_Namespace'} = $namespace;
 }
 
+sub get_expires_in {
+	my ($self) = @_;
+
+	return $self->{'_Expires_In'};
+}
+
+
+sub set_expires_in {
+	my ($self, $expires_in) = @_;
+
+	$self->{'_Expires_In'} = $expires_in;
+}
+
+
 # ----------------------------------------------------------------------
 # (private) utility functions and methods
 
@@ -282,6 +305,27 @@ sub remove {
 	$self->delete_key($self->get_namespace(), $p_key);
 }
 
+# exists in cache and is not expired
+sub is_valid {
+	my ($self, $p_key) = @_;
+
+	# should there be namespace variant of this function?
+	my $path = $self->_path_to_key($self->get_namespace(), $p_key);
+
+	# does file exists in cache?
+	return 0 unless -f $path;
+
+	# expire time can be set to never
+	my $expires_in = $self->get_expires_in();
+	return 1 unless (defined $expires_in && $expires_in >= 0);
+
+	# is file expired?
+	my $mtime = (stat(_))[9];
+	my $now = time();
+
+	return (($now - $mtime) < $expires_in);
+}
+
 # Getting and setting
 
 sub set {
@@ -293,6 +337,7 @@ sub set {
 sub get {
 	my ($self, $p_key) = @_;
 
+	return undef unless $self->is_valid($p_key);
 	my $data = $self->restore($self->get_namespace(), $p_key)
 		or return undef;
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 0b6628b..3644ca8 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -16,7 +16,8 @@ unless (-f "$cache_pm") {
 # it is currently not a proper Perl module, so we use 'do FILE'
 #ok(eval { do "$cache_pm"; 1 or die $!; }, "loading gitweb/cache.pm");
 my $return = do "$cache_pm";
-ok(!$@,              "parse gitweb/cache.pm");
+ok(!$@,              "parse gitweb/cache.pm")
+	or diag("parse error:\n", $@);
 ok(defined $return,  "do    gitweb/cache.pm");
 ok($return,          "run   gitweb/cache.pm");
 # instead of: BEGIN { use_ok('GitwebCache::SimpleFileCache') }
@@ -74,4 +75,12 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
 is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
 cmp_ok($call_count, '==', 1, 'get_value() is called once');
 
+# Test cache expiration for 'expire now'
+#
+$cache->set_expires_in(0);
+is($cache->get_expires_in(), 0,        '"expires in" is set to now (0)');
+$cache->set($key, $value);
+ok(!defined($cache->get($key)),        'cache is expired');
+
+
 done_testing();
-- 
1.6.6.1


* [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (2 preceding siblings ...)
  2010-01-23  0:27                     ` [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) Jakub Narebski
                                       ` (7 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is a work
in progress.

As you can see, the cache_fetch() subroutine is much, much simpler than
the one in the original patch by J.H.:
  [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
  Message-ID: <1263432185-21334-10-git-send-email-warthog9@eaglescrag.net>
  http://permalink.gmane.org/gmane.comp.version-control.git/136917
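
The idea behind this simpler cache_fetch() is to let the page generating
code print into an in-memory file, and to store the resulting string.  A
standalone sketch of that technique, with a plain hash standing in for
the cache object (capture_output(), fetch_page() and %store are made-up
names, not part of this patch):

```perl
use strict;
use warnings;

# Run $generate with a filehandle that writes into a string,
# and return that string.
sub capture_output {
	my ($generate) = @_;

	my $data = '';
	open(my $fh, '>', \$data)
		or die "Can't open in-memory file: $!";
	$generate->($fh);
	close($fh);

	return $data;
}

my %store;   # stand-in for a real cache object with get() and set()

# Print the page for $key, generating and caching it on first use.
sub fetch_page {
	my ($key, $generate) = @_;

	$store{$key} = capture_output($generate)
		unless defined $store{$key};
	print STDOUT $store{$key};
	return $store{$key};
}

fetch_page('summary', sub {
	my ($out) = @_;
	print {$out} "Content-Type: text/html\n\n<p>hello</p>\n";
});
```

The generating code has to print to the filehandle it is given rather
than to STDOUT, which is why gitweb prints via {$out} everywhere.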

As you can see, I have introduced the $cache_pm variable, to be able to
test caching in t/t9500-gitweb-standalone-no-errors.sh, but also to be
able to install cache.pm somewhere other than alongside gitweb.cgi.
There would be no such problems if we used 'require GitwebCache' or
something similar in place of 'do "cache.pm"' as in the original patch
by J.H.  But at least for now I have decided to follow J.H. on this.

Instead of using binary (sic!) valued $cache_enable as in J.H. patch,
I use set of two variables: $cache (to be able to select what caching
engine to use, and what features should be enabled), and
$caching_enabled to actually enable/disable cache.

When caching is enabled gitweb does not output timing info (time to
generate page), as it would contain incorrect information if the page
was retrieved from cache; in this place we could put cachetime info
from the original patch by J.H., which means writing

  Cache Last Updated: ". gmtime( time )

I forgot to add this feature...


ATTENTION !!!: I have run both tests (t9500 to check for errors in
gitweb.perl, and t9503 to test the API), but I haven't actually tested
that *gitweb itself* behaves correctly.

 gitweb/cache.pm                        |   36 ++++++++++++++++++++++++++++
 gitweb/gitweb.perl                     |   40 +++++++++++++++++++++++++++++--
 t/gitweb-lib.sh                        |    2 +
 t/t9500-gitweb-standalone-no-errors.sh |   13 ++++++++++
 4 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 12a7a78..3a33158 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -359,4 +359,41 @@ sub compute {
 1;
 } # end of package GitwebCache::SimpleFileCache;
 
+# human readable key identifying gitweb output
+sub gitweb_output_key {
+	return href(-replay => 1, -full => 1, -path_info => 0);
+}
+
+sub cache_fetch {
+	my ($cache, $action) = @_;
+
+	my $key = gitweb_output_key();
+	my $data = $cache->get($key);
+
+	if (defined $data) {
+		# print cached data
+		binmode STDOUT, ':raw';
+		print STDOUT $data;
+
+	} else {
+		# calculate data and regenerate data
+		open my $data_fh, '>', \$data
+			or die "Can't open memory file: $!";
+		# matches "binmode STDOUT, ':utf8'" at beginning
+		binmode $data_fh, ':utf8';
+
+		$out = $data_fh || \*STDOUT;
+		$actions{$action}->();
+
+		if (defined $data) {
+			$cache->set($key, $data);
+			binmode STDOUT, ':raw';
+			local $/ = undef;
+			print STDOUT $data;
+		}
+
+		close $data_fh;
+	}
+}
+
 1;
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cd5073c..0394dc8 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -234,6 +234,22 @@ our $gitlinkurl_base = ("++GITWEB_BASE_URL++" =~ m!^(git://.*)$!) ? $1 : '';
 # Leave it undefined (or set to 'undef') to turn off load checking.
 our $maxload = 300;
 
+# This enables/disables the caching layer in gitweb.  Currently supported
+# is only output (response) caching, similar to the one used on git.kernel.org.
+our $caching_enabled = 0;
+# Set to _initialized_ instance of cache interface implementing (at least)
+# get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces).
+# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# (but fast) file based caching layer, currently without any support for
+# cache size limiting.  It is therefore recommended that the cache directory
+# be periodically completely deleted; this operation is safe to perform.
+# Suggested mechanism:
+# mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+our $cache;
+# Locations of 'cache.pm' file; if it is relative path, it is relative to
+# the directory gitweb is run from
+our $cache_pm = 'cache.pm';
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -998,7 +1014,21 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
     !$project) {
 	die_error(400, "Project needed");
 }
-$actions{$action}->();
+
+if ($caching_enabled) {
+	do $cache_pm;
+	die $@ if $@;
+
+	$cache ||= GitwebCache::SimpleFileCache->new({
+		'cache_root'  => '/tmp/cache',
+		'cache_depth' => 2,
+		'expires_in'  => 20, # in seconds
+	});
+	cache_fetch($cache, $action);
+} else {
+	$actions{$action}->();
+}
+
 exit;
 
 ## ======================================================================
@@ -3207,7 +3237,9 @@ sub git_header_html {
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
 	# we have to do this because MSIE sometimes globs '*/*', pretending to
 	# support xhtml+xml but choking when it gets what it asked for.
-	if (defined $cgi->http('HTTP_ACCEPT') &&
+	# Disable content-type negotiation when caching (use mimetype good for all).
+	if (!$caching_enabled &&
+	    defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
 	    $cgi->Accept('application/xhtml+xml') != 0) {
 		$content_type = 'application/xhtml+xml';
@@ -3380,7 +3412,9 @@ sub git_footer_html {
 	}
 	print {$out} "</div>\n"; # class="page_footer"
 
-	if (defined $t0 && gitweb_check_feature('timed')) {
+	# timing info doesn't make much sense with output (response) caching
+	if (!$caching_enabled &&
+	    defined $t0 && gitweb_check_feature('timed')) {
 		print {$out} "<div id=\"generating_info\">\n";
 		print {$out} 'This page took '.
 		             '<span id="generating_time" class="time_span">'.
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index d9ffc90..d041083 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -27,6 +27,8 @@ our \$export_ok = '';
 our \$strict_export = '';
 our \$git_versions_must_match = 0;
 
+our \$cache_pm = '$TEST_DIRECTORY/../gitweb/cache.pm';
+
 EOF
 
 	cat >.git/description <<EOF
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 2fc7fdb..0f93962 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -639,4 +639,17 @@ test_expect_success \
 	 gitweb_run "p=.git;a=summary"'
 test_debug 'cat gitweb.log'
 
+# ----------------------------------------------------------------------
+# caching
+
+cat >>gitweb_config.perl <<\EOF
+
+$caching_enabled = 1;
+EOF
+test_expect_success \
+	'caching enabled' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+
 test_done
-- 
1.6.6.1


* [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (3 preceding siblings ...)
  2010-01-23  0:27                     ` [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP) Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) Jakub Narebski
                                       ` (6 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is a work
in progress.

As it is a quick'n'dirty proof-of-concept patch, it lacks variables in
gitweb.perl that can be used to control cache behaviour.  In the
original patch by J.H. those were the following variables:
$minCacheTime, $maxCacheTime.

In the final version we should probably leave $check_load undefined,
or set it to 'sub { return 0; }', instead of requiring that there is a
get_loadavg() subroutine in the main namespace (see the t9503 test
change).  It would also be a good idea to keep the ->set_expires_in()
subroutine, which would set both ends of the min/max range so that
adaptive cache lifetime is effectively disabled.
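
A minimal sketch of how those two suggestions could look together,
not the final implementation: AdaptiveExpirySketch is a hypothetical
stand-in for the cache class that keeps only the expiration logic, its
default load checker reports no load, and set_expires_in() pins both
ends of the range.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cache class; only the
# expiration-related parts are sketched here.
package AdaptiveExpirySketch;

sub new {
	my ($class, %opts) = @_;
	my $self = bless {}, $class;
	$self->{'_Expires_Min'} =
		defined $opts{'expires_min'} ? $opts{'expires_min'} : 20;
	$self->{'_Expires_Max'} =
		defined $opts{'expires_max'} ? $opts{'expires_max'} : 1200;
	$self->{'_Increase_Factor'} =
		defined $opts{'expires_factor'} ? $opts{'expires_factor'} : 60;
	# default load checker reports no load, so there is no need for
	# a get_loadavg() subroutine in the main namespace
	$self->{'_Check_Load'} = $opts{'check_load'} || sub { return 0; };
	return $self;
}

# fixed lifetime: min == max effectively disables the adaptive part
sub set_expires_in {
	my ($self, $expires_in) = @_;
	$self->{'_Expires_Min'} = $self->{'_Expires_Max'} = $expires_in;
}

# lifetime grows with load, clamped to the [min, max] range
sub get_expires_in {
	my ($self) = @_;
	my $expires_in =
		$self->{'_Increase_Factor'} * $self->{'_Check_Load'}->();
	return $self->{'_Expires_Min'} if $expires_in < $self->{'_Expires_Min'};
	return $self->{'_Expires_Max'} if $expires_in > $self->{'_Expires_Max'};
	return $expires_in;
}

package main;

my $cache = AdaptiveExpirySketch->new('check_load' => sub { return 2.5; });
print $cache->get_expires_in(), "\n";   # 60 * 2.5 = 150, within [20, 1200]
$cache->set_expires_in(0);
print $cache->get_expires_in(), "\n";   # pinned to 0, i.e. 'expire now'
```

With this the t9503 test could keep calling set_expires_in(0), and
would not need the get_loadavg() mockup.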

 gitweb/cache.pm                 |   82 ++++++++++++++++++++++++++++++++++----
 t/t9503/test_cache_interface.pl |    8 +++-
 2 files changed, 80 insertions(+), 10 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 3a33158..9692e8d 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -61,24 +61,36 @@ sub new {
 	my $self  = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns, $expires_in);
+	my ($root, $depth, $ns);
+	my ($expires_min, $expires_max, $increase_factor, $check_load);
 	if (defined $p_options_hash_ref) {
 		$root  = $p_options_hash_ref->{'cache_root'};
 		$depth = $p_options_hash_ref->{'cache_depth'};
 		$ns    = $p_options_hash_ref->{'namespace'};
-		$expires_in =
+		$expires_min =
+			$p_options_hash_ref->{'expires_min'} ||
 			$p_options_hash_ref->{'default_expires_in'} ||
 			$p_options_hash_ref->{'expires_in'};
+		$expires_max =
+			$p_options_hash_ref->{'expires_max'};
+		$increase_factor = $p_options_hash_ref->{'expires_factor'};
+		$check_load      = $p_options_hash_ref->{'check_load'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = '' unless defined($ns);
-	$expires_in = -1 unless defined($expires_in); # <0 means never
+	$expires_min =   20 unless defined($expires_min);
+	$expires_max = 1200 unless defined($expires_max);
+	$increase_factor = 60 unless defined($increase_factor);
+	$check_load = \&main::get_loadavg unless defined($check_load);
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
-	$self->set_expires_in($expires_in);
+	$self->set_expires_min($expires_min);
+	$self->set_expires_max($expires_max);
+	$self->set_increase_factor($increase_factor);
+	$self->set_check_load($check_load);
 
 	return $self;
 }
@@ -124,19 +136,71 @@ sub set_namespace {
 	$self->{'_Namespace'} = $namespace;
 }
 
-sub get_expires_in {
+
+sub get_expires_min {
 	my ($self) = @_;
 
-	return $self->{'_Expires_In'};
+	return $self->{'_Expires_Min'};
+}
+
+sub set_expires_min {
+	my ($self, $expires_min) = @_;
+
+	$self->{'_Expires_Min'} = $expires_min;
 }
 
+sub get_expires_max {
+	my ($self) = @_;
+
+	return $self->{'_Expires_Max'};
+}
+
+sub set_expires_max {
+	my ($self, $expires_max) = @_;
+
+	$self->{'_Expires_Max'} = $expires_max;
+}
+
+sub get_increase_factor {
+	my ($self) = @_;
+
+	return $self->{'_Increase_Factor'};
+}
 
-sub set_expires_in {
-	my ($self, $expires_in) = @_;
+sub set_increase_factor {
+	my ($self, $increase_factor) = @_;
 
-	$self->{'_Expires_In'} = $expires_in;
+	$self->{'_Increase_Factor'} = $increase_factor;
 }
 
+sub get_check_load {
+	my ($self) = @_;
+
+	return $self->{'_Check_Load'};
+}
+
+sub set_check_load {
+	my ($self, $sub) = @_;
+
+	$self->{'_Check_Load'} = $sub;
+}
+
+# ......................................................................
+
+sub get_expires_in {
+	my ($self) = @_;
+	my $expires_in =
+		#$self->get_expires_min() +
+		$self->get_increase_factor() * $self->get_check_load()->();
+
+	if ($expires_in < $self->get_expires_min()) {
+		return $self->get_expires_min();
+	} elsif ($expires_in > $self->get_expires_max()) {
+		return $self->get_expires_max();
+	}
+
+	return $expires_in;
+}
 
 # ----------------------------------------------------------------------
 # (private) utility functions and methods
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 3644ca8..0870b87 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -6,6 +6,11 @@ use strict;
 
 use Test::More;
 
+# mockup
+sub get_loadavg {
+	return 0.0;
+}
+
 # test source version; there is no installation target for gitweb
 my $cache_pm = "$ENV{TEST_DIRECTORY}/../gitweb/cache.pm";
 
@@ -77,7 +82,8 @@ cmp_ok($call_count, '==', 1, 'get_value() is called once');
 
 # Test cache expiration for 'expire now'
 #
-$cache->set_expires_in(0);
+$cache->set_expires_min(0);
+$cache->set_expires_max(0);
 is($cache->get_expires_in(), 0,        '"expires in" is set to now (0)');
 $cache->set($key, $value);
 ok(!defined($cache->get($key)),        'cache is expired');
-- 
1.6.6.1


* [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (4 preceding siblings ...)
  2010-01-23  0:27                     ` [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP) Jakub Narebski
                                       ` (5 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is a work
in progress.

The reason for supporting compute() as an interface to the cache is
that with this interface it is possible (as can be seen in the
following patch) to use locking to avoid a cache miss stampede (only
one process regenerates the cache).

Support for a $cache which does not provide the
'->compute($key, $code)' method is left in just in case we want to use
such an (external) caching engine.  We should probably add a
generic_compute() subroutine which would use get/set and provide a
compute-like interface.
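
A rough sketch of what such a generic_compute() could look like,
assuming only that the cache object offers ->get($key) and
->set($key, $data).  MemoryCacheSketch is a made-up in-memory cache used
purely for the demonstration, and the sketch does no locking of any
kind.

```perl
use strict;
use warnings;

# Hypothetical helper: compute-like interface on top of any cache
# that offers only ->get($key) and ->set($key, $data).
sub generic_compute {
	my ($cache, $key, $code) = @_;

	my $data = $cache->get($key);
	if (!defined $data) {
		# cache miss: generate data and store it
		$data = $code->($cache, $key);
		$cache->set($key, $data) if defined $data;
	}
	return $data;
}

# Minimal in-memory cache, for illustration only
package MemoryCacheSketch;

sub new {
	my ($class) = @_;
	my $self = { 'data' => {} };
	return bless $self, $class;
}

sub get {
	my ($self, $key) = @_;
	return $self->{'data'}{$key};
}

sub set {
	my ($self, $key, $value) = @_;
	$self->{'data'}{$key} = $value;
}

package main;

my $calls = 0;
my $cache = MemoryCacheSketch->new();
my $first  = generic_compute($cache, 'key', sub { $calls++; return 'value'; });
my $second = generic_compute($cache, 'key', sub { $calls++; return 'value'; });
print "$first $second $calls\n";   # the code reference runs only once
```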

 gitweb/cache.pm |   53 +++++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 9692e8d..8dd4f39 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -432,6 +432,48 @@ sub cache_fetch {
 	my ($cache, $action) = @_;
 
 	my $key = gitweb_output_key();
+	if ($cache->can('compute')) {
+		cache_fetch_compute($cache, $action, $key);
+	} else {
+		cache_fetch_get_set($cache, $action, $key);
+	}
+}
+
+# calculate data to regenerate cache
+sub cache_calculate {
+	my ($action) = @_;
+
+	my $data;
+	open my $data_fh, '>', \$data
+		or die "Can't open memory file: $!";
+	# matches "binmode STDOUT, ':utf8'" at beginning
+	binmode $data_fh, ':utf8';
+
+	$out = $data_fh || \*STDOUT;
+	$actions{$action}->();
+
+	close $data_fh;
+
+	return $data;
+}
+
+# for $cache which can ->compute($key, $code)
+sub cache_fetch_compute {
+	my ($cache, $action, $key) = @_;
+
+	my $data = $cache->compute($key, sub { cache_calculate($action) });
+
+	if (defined $data) {
+		# print cached data
+		binmode STDOUT, ':raw';
+		local $/ = undef;
+		print STDOUT $data;
+	}
+}
+
+# for $cache which can ->get($key) and ->set($key, $data)
+sub cache_fetch_get_set {
+	my ($cache, $action, $key) = @_;
+
 	my $data = $cache->get($key);
 
 	if (defined $data) {
@@ -440,22 +482,13 @@ sub cache_fetch {
 		print STDOUT $data;
 
 	} else {
-		# calculate data and regenerate data
-		open my $data_fh, '>', \$data
-			or die "Can't open memory file: $!";
-		# matches "binmode STDOUT, ':uft8'" at beginning
-		binmode $data_fh, ':utf8';
-
-		$out = $data_fh || \*STDOUT;
-		$actions{$action}->();
+		$data = cache_calculate($action);
 
 		if (defined $data) {
 			$cache->set($key, $data);
 			binmode STDOUT, ':raw';
 			local $/ = undef;
 			print STDOUT $data;
 		}
-
-		close $data_fh;
 	}
 }
 
-- 
1.6.6.1


* [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (5 preceding siblings ...)
  2010-01-23  0:27                     ` [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP) Jakub Narebski
                                       ` (4 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is a work
in progress.

As you can see, we can do almost the same for a cache which supports
only the get/set interface, except that ->restore() in ->compute()
would have to be replaced with ->get().  This is an argument for the
generic_compute() subroutine mentioned in the previous patch.

Note that we actually can and do test that the provided mechanism
avoids the cache miss stampede (aka 'stampeding herd') problem...
although the test should probably be run only with --long (this would
need an update to t/test-lib.sh to pass GIT_TEST_LONG to external tests
in test_external and the like), as we need to sleep at least one second
to ensure that we would have the 'stampeding herd' problem.
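
The locking pattern itself can be tried in isolation.  The following is
only a sketch: it uses two file handles within one process instead of
two processes, a temporary file in place of the '.lock' file, and it
assumes a platform where flock() locks belong to the open file handle
(as with native flock(2) on Linux); where flock() is emulated with
fcntl() the second lock attempt may behave differently.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# temporary file standing in for the cache entry's ".lock" file
my ($tmp_fh, $lockfile) = tempfile(UNLINK => 1);
close $tmp_fh;

open my $writer_fh, '+>', $lockfile
	or die "Can't open lockfile '$lockfile': $!";
open my $reader_fh, '+>', $lockfile
	or die "Can't open lockfile '$lockfile': $!";

# first opener wins the race and would regenerate the entry
my $writer_locked = flock($writer_fh, LOCK_EX | LOCK_NB) ? 1 : 0;
# second opener fails immediately instead of blocking ...
my $reader_locked = flock($reader_fh, LOCK_EX | LOCK_NB) ? 1 : 0;

# ... and would now wait with flock($reader_fh, LOCK_SH) until the
# writer is done; here the writer simply releases its lock first
close $writer_fh;
my $reader_shared = flock($reader_fh, LOCK_SH) ? 1 : 0;
close $reader_fh;

print "writer=$writer_locked reader=$reader_locked shared=$reader_shared\n";
```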

 gitweb/cache.pm                 |   27 ++++++++++++++--
 t/t9503/test_cache_interface.pl |   66 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 8dd4f39..f514ee9 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -24,6 +24,7 @@ use File::Path qw(make_path);  # requires version >= 2.0
 use File::Spec;
 use File::Temp;
 use Digest::MD5 qw(md5_hex);
+use Fcntl qw(:flock);
 
 # by default, the cache nests all entries on the filesystem two
 # directories deep
@@ -217,7 +218,7 @@ sub _path_to_key {
 	my ($self, $p_namespace, $p_key) = @_;
 
 	return $self->_path_to_hashed_key($p_namespace,
-	                                    _Build_Hashed_Key($p_key));
+	                                  _Build_Hashed_Key($p_key));
 }
 
 # Take hashed key, and return file path
@@ -228,6 +229,13 @@ sub _path_to_hashed_key {
 	                           _Split_Word($p_hashed_key, $self->get_depth()));
 }
 
+sub _lockfile_to_key {
+	my ($self, $p_namespace, $p_key) = @_;
+
+	return $self->_path_to_hashed_key($p_namespace,
+	                                  _Build_Hashed_Key($p_key)) . '.lock';
+}
+
 # Split word into N components, where each component but last is two-letter word
 # e.g. _Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 2) == qw(06 b90e786e304a18fdfbd7c7bcc41a6b);
 #      _Split_Word("06b90e786e304a18fdfbd7c7bcc41a6b", 3) == qw(06 b9 0e786e304a18fdfbd7c7bcc41a6b);
@@ -412,17 +420,30 @@ sub compute {
 	my ($self, $p_key, $p_coderef) = @_;
 
 	my $data = $self->get($p_key);
-	if (!defined $data) {
+	return $data if defined $data;
+
+	my $lockfile = $self->_lockfile_to_key($self->get_namespace(), $p_key);
+	_Make_Path($lockfile);
+	open my $lock_fh, '+>', $lockfile;
+	#	or die "Can't open lockfile '$lockfile': $!";
+	if (my $lock_state = flock($lock_fh, LOCK_EX | LOCK_NB)) {
+		# acquired writers lock
 		$data = $p_coderef->($self, $p_key);
 		$self->set($p_key, $data);
+	} else {
+		# get readers lock
+		flock($lock_fh, LOCK_SH);
+		$data = $self->restore($self->get_namespace(), $p_key);
 	}
-
+	close $lock_fh;
 	return $data;
 }
 
 1;
 } # end of package GitwebCache::SimpleFileCache;
 
+# ======================================================================
+
 # human readable key identifying gitweb output
 sub gitweb_output_key {
 	return href(-replay => 1, -full => 1, -path_info => 0);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 0870b87..43b806d 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -65,6 +65,8 @@ $cache->set($key, $value);
 is($cache->get($key), $value,          'get after set, returns cached value');
 $cache->remove($key);
 ok(!defined($cache->get($key)),        'get after remove, is undefined');
+eval { $cache->remove('Not-Existent Key'); };
+ok(!$@,                                'remove on non-existent key doesn\'t die');
 
 # Test the getting and setting of a cached value
 # (CHI interface)
@@ -80,6 +82,70 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
 is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
 cmp_ok($call_count, '==', 1, 'get_value() is called once');
 
+# Test 'stampeding herd' / cache miss stampede problem
+# (probably should be run only if GIT_TEST_LONG)
+sub get_value_slow {
+	$call_count++;
+	sleep 1;
+	return $value;
+}
+my ($pid, $kid_fh);
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = $cache->get($key);
+	if (!defined $data) {
+		$data = get_value_slow();
+		$cache->set($key, $data);
+	}
+
+	if ($pid) {
+		my $child_count = <$kid_fh>;
+		chomp $child_count;
+
+		waitpid $pid, 0;
+		close $kid_fh;
+
+		$call_count += $child_count;
+	} else {
+		print "$call_count\n";
+		exit 0;
+	}
+
+	cmp_ok($call_count, '==', 2, 'parallel get/set: get_value_slow() called twice');
+}
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = $cache->compute($key, \&get_value_slow);
+
+	if ($pid) {
+		my $child_count = <$kid_fh>;
+		chomp $child_count;
+
+		waitpid $pid, 0;
+		close $kid_fh;
+
+		$call_count += $child_count;
+	} else {
+		print "$call_count\n";
+		exit 0;
+	}
+
+	cmp_ok($call_count, '==', 1, 'parallel compute: get_value_slow() called once');
+}
+
+
 # Test cache expiration for 'expire now'
 #
 $cache->set_expires_min(0);
-- 
1.6.6.1


* [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (6 preceding siblings ...)
  2010-01-23  0:27                     ` [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP) Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-23  0:27                     ` [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating " Jakub Narebski
                                       ` (3 subsequent siblings)
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is a work
in progress.

Note that, contrary to the original patch by J.H., we fork
unconditionally; otherwise the process which recalculates the cache
entry would be at a disadvantage compared to the processes which lost
the race to acquire the writer's lock.  Perhaps this should be made
configurable, though.

Also note that currently there is no limit on how stale the data
served while the entry is being regenerated can be, i.e. there is no
equivalent of $maxCacheLife.  The simplest solution (I guess) would be
to just delete the cache entry upfront if it is too stale.
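
A sketch of that "delete upfront" idea, under the assumption that a
cache entry is a plain file whose mtime records when it was generated.
remove_if_too_stale() and its $max_lifetime parameter are hypothetical
names, not part of this series.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: remove the cache file if it is older than
# $max_lifetime seconds; negative $max_lifetime means "never too stale".
# Returns true if the file was removed.
sub remove_if_too_stale {
	my ($file, $max_lifetime) = @_;

	return 0 if $max_lifetime < 0;
	my $mtime = (stat($file))[9];
	return 0 unless defined $mtime;      # no such file
	return 0 if time() - $mtime <= $max_lifetime;

	return unlink($file) ? 1 : 0;
}

my ($fh, $file) = tempfile();
close $fh;
# pretend the entry was written two hours ago
my $two_hours_ago = time() - 2 * 60 * 60;
utime($two_hours_ago, $two_hours_ago, $file)
	or die "Can't change mtime of '$file': $!";

# a one day limit keeps the entry, a one hour limit removes it
print remove_if_too_stale($file, 24 * 60 * 60) ? "removed\n" : "kept\n";
print remove_if_too_stale($file,      60 * 60) ? "removed\n" : "kept\n";
```

Calling something like this before the attempt to retrieve stale data
would make ->compute() fall back to the "no stale data" code path.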

 gitweb/cache.pm                 |   32 +++++++++++++++++++++++++++-----
 t/t9503/test_cache_interface.pl |   37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index f514ee9..1adf467 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -426,14 +426,36 @@ sub compute {
 	_Make_Path($lockfile);
 	open my $lock_fh, '+>', $lockfile;
 	#	or die "Can't open lockfile '$lockfile': $!";
+
+	# try to retrieve stale data
+	$data = $self->restore($self->get_namespace(), $p_key);
+
 	if (my $lock_state = flock($lock_fh, LOCK_EX | LOCK_NB)) {
 		# acquired writers lock
-		$data = $p_coderef->($self, $p_key);
-		$self->set($p_key, $data);
+		my $pid = fork() if $data;
+		if (!defined $pid || $pid) {
+			# parent, or didn't fork
+			$data = $p_coderef->($self, $p_key);
+			$self->set($p_key, $data);
+
+			if ($pid) {
+				# wait for child (which would print) and exit
+				waitpid $pid, 0;
+				exit 0;
+			} else {
+				# there is no child, or was no $data to serve in background
+				;
+			}
+		} else {
+			# child to serve $data
+			;
+		}
 	} else {
-		# get readers lock
-		flock($lock_fh, LOCK_SH);
-		$data = $self->restore($self->get_namespace(), $p_key);
+		if (!defined $data) {
+			# get readers lock if there is no stale data to serve
+			flock($lock_fh, LOCK_SH);
+			$data = $self->restore($self->get_namespace(), $p_key);
+		}
 	}
 	close $lock_fh;
 	return $data;
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 43b806d..788e8f2 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -154,5 +154,42 @@ is($cache->get_expires_in(), 0,        '"expires in" is set to now (0)');
 $cache->set($key, $value);
 ok(!defined($cache->get($key)),        'cache is expired');
 
+# Test that cache returns stale data in existing but expired cache situation
+# (probably should be run only if GIT_TEST_LONG)
+$cache->set_expires_min(0);
+$cache->set_expires_max(0);
+my $stale_value = 'Stale Value';
+my $child_data = '';
+$cache->set($key, $stale_value);
+$call_count = 0;
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 4
+		unless defined $pid;
+
+	my $data = $cache->compute($key, \&get_value_slow);
+
+	if ($pid) {
+		$child_data = <$kid_fh>;
+		chomp $child_data;
+
+		waitpid $pid, 0;
+		close $kid_fh;
+	} else {
+		print "$data\n";
+		exit 0;
+	}
+
+	is($data,       $stale_value, 'stale data in parent when expired');
+	is($child_data, $stale_value, 'stale data in child  when expired');
+
+	# never expire
+	$cache->set_expires_min(-1);
+	$cache->set_expires_max(-1);
+	is($cache->get($key), $value, 'value got set correctly');
+}
+$cache->set_expires_min(0);
+$cache->set_expires_max(0);
+
 
 done_testing();
-- 
1.6.6.1


* [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (7 preceding siblings ...)
  2010-01-23  0:27                     ` [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP) Jakub Narebski
@ 2010-01-23  0:27                     ` Jakub Narebski
  2010-01-24 22:24                       ` Petr Baudis
  2010-01-23 19:55                     ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
                                       ` (2 subsequent siblings)
  11 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:27 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is a work
in progress.

Contrary to the patch by J.H., the subroutine responsible for the
"Generating..." progress info / activity indicator is in gitweb.perl,
and not in cache.pm.  Also, it does not use a $title variable that is
neither declared nor defined ;-)

Compared with the original patch by J.H., gitweb prints the generating
info in a more restricted set of situations; the set of actions for
which gitweb does not generate an activity indicator is larger.  We
could probably also provide an activity indicator for (possibly)
non-HTML output, like 'blob_plain' or 'patches', provided that
'User-Agent' indicates that a web browser is being used.

In the future we would probably want an Ajax-y JavaScript-based
activity indicator, without the need for busy-wait.

NOTE that I have not tested that it works, only that it should not
contain syntax errors...

 gitweb/cache.pm    |   32 ++++++++++++++++++++-
 gitweb/gitweb.perl |   75 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 1adf467..2c7be3a 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -63,7 +63,8 @@ sub new {
 	$self = bless($self, $class);
 
 	my ($root, $depth, $ns);
-	my ($expires_min, $expires_max, $increase_factor, $check_load);
+	my ($expires_min, $expires_max, $increase_factor,
+	    $check_load, $generating_info);
 	if (defined $p_options_hash_ref) {
 		$root  = $p_options_hash_ref->{'cache_root'};
 		$depth = $p_options_hash_ref->{'cache_depth'};
@@ -76,6 +77,7 @@ sub new {
 			$p_options_hash_ref->{'expires_max'};
 		$increase_factor = $p_options_hash_ref->{'expires_factor'};
 		$check_load      = $p_options_hash_ref->{'check_load'};
+		$generating_info = $p_options_hash_ref->{'generating_info'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
@@ -92,6 +94,7 @@ sub new {
 	$self->set_expires_max($expires_max);
 	$self->set_increase_factor($increase_factor);
 	$self->set_check_load($check_load);
+	$self->set_generating_info($generating_info);
 
 	return $self;
 }
@@ -186,6 +189,18 @@ sub set_check_load {
 	$self->{'_Check_Load'} = $sub;
 }
 
+sub get_generating_info {
+	my ($self) = @_;
+
+	return $self->{'_Generating_Info'};
+}
+
+sub set_generating_info {
+	my ($self, $sub) = @_;
+
+	$self->{'_Generating_Info'} = $sub;
+}
+
 # ......................................................................
 
 sub get_expires_in {
@@ -203,6 +218,12 @@ sub get_expires_in {
 	return $expires_in;
 }
 
+sub generating_info {
+	my $self = shift;
+	my $sub = $self->get_generating_info();
+	$sub->($self, @_) if defined $sub;
+}
+
 # ----------------------------------------------------------------------
 # (private) utility functions and methods
 
@@ -434,6 +455,9 @@ sub compute {
 		# acquired writers lock
 		my $pid = fork() if $data;
 		if (!defined $pid || $pid) {
+			# provide "generating page..." info if there is no stale data to serve
+			$self->generating_info($p_key, $lock_fh)
+				unless ($data);
 			# parent, or didn't fork
 			$data = $p_coderef->($self, $p_key);
 			$self->set($p_key, $data);
@@ -451,8 +475,12 @@ sub compute {
 			;
 		}
 	} else {
+		# some other process is (re)generating cache
 		if (!defined $data) {
-			# get readers lock if there is no stale data to serve
+			# there is no stale data to serve
+			# provide "generating page..." info
+			$self->generating_info($p_key, $lock_fh);
+			# get readers lock
 			flock($lock_fh, LOCK_SH);
 			$data = $self->restore($self->get_namespace(), $p_key);
 		}
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 0394dc8..cb46b8c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -13,7 +13,7 @@ use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser);
 use Encode;
-use Fcntl ':mode';
+use Fcntl qw(:mode :flock);
 use File::Find qw();
 use File::Basename qw(basename);
 binmode STDOUT, ':utf8';
@@ -1023,6 +1023,7 @@ if ($caching_enabled) {
 		'cache_root'  => '/tmp/cache',
 		'cache_depth' => 2,
 		'expires_in'  => 20, # in seconds
+		'generating_info' => \&git_generating_data_html,
 	});
 	cache_fetch($cache, $action);
 } else {
@@ -3215,6 +3216,78 @@ sub blob_contenttype {
 ## ======================================================================
 ## functions printing HTML: header, footer, error page
 
+sub git_generating_data_html {
+	my ($cache, $key, $lock_fh) = @_;
+
+	if ($action eq 'atom' || $action eq 'rss' || $action eq 'opml' || # feeds
+	    $action eq 'blob_plain' || # unknown mimetype
+	    $action eq 'commitdiff_plain' || # text/plain
+	    $action eq 'patch' || $action eq 'patches' || # text/plain
+	    $action eq 'snapshot') { # binary
+		return;
+	}
+
+	my $title = "[Generating...] $site_name";
+	if (defined $project) {
+		$title .= " - " . to_utf8($project);
+		if (defined $action) {
+			$title .= "/$action";
+			if (defined $file_name) {
+				$title .= " - " . esc_path($file_name);
+				if ($action eq "tree" && $file_name !~ m|/$|) {
+					$title .= "/";
+				}
+			}
+		}
+	}
+
+	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
+	print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
+	                          -status=> '200 OK', -expires => 'now');
+	print STDOUT <<"EOF";
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+<!-- git web interface version $version -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<meta http-equiv="refresh" content="0" />
+<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
+<meta name="robots" content="noindex, nofollow" />
+<title>$title</title>
+</head>
+<body>
+EOF
+	print STDOUT 'Generating..';
+
+	my $ready;
+	my ($wait, $wait_step, $wait_max); # in seconds
+	$wait = 0;
+	$wait_step = 2;
+	$wait_max = 20;
+	$| = 1; # autoflush
+	do {
+		print STDOUT '.';
+
+		sleep $wait_step if ($wait > 0);
+		$wait += $wait_step;
+
+		$ready = flock($lock_fh, LOCK_SH|LOCK_NB)
+			if $lock_fh;
+
+	} while (!$ready && ($wait < $wait_max));
+
+	print STDOUT <<"EOF";
+
+</body>
+</html>
+EOF
+	#exit 0;
+	return;
+}
+
 sub git_header_html {
 	my $status = shift || "200 OK";
 	my $expires = shift;
-- 
1.6.6.1


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-23  0:27                     ` [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating " Jakub Narebski
@ 2010-01-24 22:24                       ` Petr Baudis
  2010-01-25  0:03                         ` Jakub Narebski
  2010-01-25 11:46                         ` Jakub Narebski
  0 siblings, 2 replies; 60+ messages in thread
From: Petr Baudis @ 2010-01-24 22:24 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

  I have stupid question, common to both the original patch and this
RFC.

> [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
> regenerating cache (WIP)

  Just why is a "Generating..." page appropriate?

  I have to admit I hate it; can you please at least make it
configurable? Why is it needed at all? It generates unnecessary
redirects, clears my previous page in the browser by a useless
placeholder, confuses non-interactive HTTP clients and I just can't
see the real purpose.

  What's the gain?

  Thanks,

-- 
				Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-24 22:24                       ` Petr Baudis
@ 2010-01-25  0:03                         ` Jakub Narebski
  2010-01-25  1:17                           ` Jakub Narebski
  2010-01-25 11:46                         ` Jakub Narebski
  1 sibling, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25  0:03 UTC (permalink / raw)
  To: Petr Baudis
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

On Sun, 24 Jan 2010, Petr Baudis wrote:

>   I have stupid question, common to both the original patch and this
> RFC.
> 
> > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
> > regenerating cache (WIP)
> 
>   Just why is a "Generating..." page appropriate?
> 
>   I have to admit I hate it; can you please at least make it
> configurable? Why is it needed at all? It generates unnecessary
> redirects, clears my previous page in the browser by a useless
> placeholder, confuses non-interactive HTTP clients and I just can't
> see the real purpose.
> 
>   What's the gain?

Don't ask me, I am just following the original patch by John 'Warthog9'
Hawley (J.H.) here, splitting it into smaller and hopefully easier to
review parts, with each feature in a separate patch (so that features
like this one are more visible).  I _think_ that in the original patch
by J.H. this feature was configurable (at least to some extent).  As
this series is more a proof of concept than ready-to-apply patches,
some of the configurability is currently missing.

I think that adding the "Generating..." page follows the behavior of
other well-known web applications, such as GMail.  Currently, if
caching is enabled and there is no appropriate entry in the cache (not
even a [slightly] stale entry), the page is shown only once it has been
generated in full: there is no streaming (although I planned to add
it).  So, for example, for the 'blame' view or for a pickaxe search you
could wait quite a bit for a page to be shown.  The "Generating..."
page is a kind of activity indicator here.

Note that the JavaScript-using Ajax-y version from the following patch 
'[RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when 
regenerating cache (WIP)'
  http://permalink.gmane.org/gmane.comp.version-control.git/137896
is not visible in browser history, as it simply replaces "Generating..."
page with the result.  So there is no 'unnecessary redirect', and there
is no 'useless placeholder' in browser history.

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25  0:03                         ` Jakub Narebski
@ 2010-01-25  1:17                           ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25  1:17 UTC (permalink / raw)
  To: Petr Baudis
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

Jakub Narebski wrote:
> On Sun, 24 Jan 2010, Petr Baudis wrote:
> 
> >   I have stupid question, common to both the original patch and this
> > RFC.
> > 
> > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
> > > regenerating cache (WIP)
> > 
> >   Just why is a "Generating..." page appropriate?
> > 
> >   I have to admit I hate it; can you please at least make it
> > configurable? Why is it needed at all? It generates unnecessary
> > redirects, clears my previous page in the browser by a useless
> > placeholder, confuses non-interactive HTTP clients and I just can't
> > see the real purpose.

> Note that the JavaScript-using Ajax-y version from the following patch 
> '[RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when 
> regenerating cache (WIP)'
>   http://permalink.gmane.org/gmane.comp.version-control.git/137896
> is not visible in browser history, as it simply replaces "Generating..."
> page with the result.  So there is no 'unnecessary redirect', and there
> is no 'useless placeholder' in browser history.

Actually I have checked and the version used on git.kernel.org does show
"Generating..." page, but it does not appear in the history.  I guess
that refresh with no URL does not count as redirect.

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-24 22:24                       ` Petr Baudis
  2010-01-25  0:03                         ` Jakub Narebski
@ 2010-01-25 11:46                         ` Jakub Narebski
  2010-01-25 13:02                           ` Petr Baudis
  1 sibling, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25 11:46 UTC (permalink / raw)
  To: Petr Baudis
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

On Sun, 24 Jan 2010, Petr Baudis wrote:
>   I have stupid question, common to both the original patch and this
> RFC.
> 
> > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
> > regenerating cache (WIP)
> 
>   Just why is a "Generating..." page appropriate?
> 
>   I have to admit I hate it; can you please at least make it
> configurable? Why is it needed at all? It [...] confuses
> non-interactive HTTP clients [...]

First, if I understand the code correctly, HTTP clients which do not 
honor the meta refresh (http-equiv refresh) would get a page which looks
like the following

  <html>
  Generating...
  </html>
  <html>
  Gitweb page
  </html>


Second, gitweb can always check User-Agent header, and serve 
"Generating..." page only to web browsers:

  unless (defined $cgi->user_agent() &&
          $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) {
  	return;
  }

or something like that.

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 11:46                         ` Jakub Narebski
@ 2010-01-25 13:02                           ` Petr Baudis
  2010-01-25 13:48                             ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: Petr Baudis @ 2010-01-25 13:02 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote:
> On Sun, 24 Jan 2010, Petr Baudis wrote:
> >   I have stupid question, common to both the original patch and this
> > RFC.
> > 
> > > [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
> > > regenerating cache (WIP)
> > 
> >   Just why is a "Generating..." page appropriate?
> > 
> >   I have to admit I hate it; can you please at least make it
> > configurable? Why is it needed at all? It [...] confuses
> > non-interactive HTTP clients [...]
> 
> First, if I understand the code correctly HTTP clients which do not 
> honor metaredirect (http-equiv refresh) would get page which looks
> the following
> 
>   <html>
>   Generating...
>   </html>
>   <html>
>   Gitweb page
>   </html>

To be clear, I conjectured it confuses non-interactive HTTP clients from
one of the TODOs in your patch series - I'm not sure about it myself. :)

> Second, gitweb can always check User-Agent header, and serve 
> "Generating..." page only to web browsers:
> 
>   unless (defined $cgi->user_agent() &&
>           $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) {
>   	return;
>   }
> 
> or something like that.

I'm not too happy with this. What about Safari? Opera? ELinks? There's a
lot of web browsers.

Most of the issues can be worked around, but I'm not sure why to go
through all the trouble. I just personally don't see the value in having
the placeholder in there at all, to me it is distracting UI even if all
the technicalities are put aside.

But if it will be possible to turn this off eventually, it's all your
call whether to bother implementing it. :-)

-- 
				Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 13:02                           ` Petr Baudis
@ 2010-01-25 13:48                             ` Jakub Narebski
  2010-01-25 13:56                               ` Petr Baudis
  2010-01-25 20:41                               ` J.H.
  0 siblings, 2 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25 13:48 UTC (permalink / raw)
  To: Petr Baudis
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

On Mon, Jan 25, 2010, Petr Baudis wrote:
> On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote:
>> On Sun, 24 Jan 2010, Petr Baudis wrote:

>>>   I have stupid question, common to both the original patch and this
>>> RFC.
>>> 
>>>> [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>>>> regenerating cache (WIP)
>>> 
>>>   Just why is a "Generating..." page appropriate?
>>> 
>>>   I have to admit I hate it; can you please at least make it
>>> configurable? Why is it needed at all? It [...] confuses
>>> non-interactive HTTP clients [...]

>> Second, gitweb can always check User-Agent header, and serve 
>> "Generating..." page only to web browsers:
>> 
>>   unless (defined $cgi->user_agent() &&
>>           $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) {
>>   	return;
>>   }
>> 
>> or something like that.
> 
> I'm not too happy with this. What about Safari? Opera? ELinks? There's a
> lot of web browsers.

The "Mozilla" part would catch all "Mozilla compatible" web browsers,
including Firefox (and other Gecko-based web browsers), Internet Explorer,
WebKit based browsers including Safari and Chrome and Konqueror.
The "Opera" part would catch Opera.
http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
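
To make the above concrete, here is a minimal sketch that runs the same
pattern against a few User-Agent strings.  The sample strings and the
helper name looks_like_browser() are made up for illustration; they are
not taken from gitweb or from any server log.

```perl
use strict;
use warnings;

# Same pattern as in the snippet quoted above.  Returns 1 if the
# User-Agent looks like an interactive browser, 0 otherwise.
sub looks_like_browser {
	my ($ua) = @_;
	return (defined $ua && $ua =~ /\b(Mozilla|Opera)\b/i) ? 1 : 0;
}

# Illustrative sample strings (assumptions, not real log data).
my @samples = (
	'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6',
	'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)',
	'Opera/9.80 (X11; Linux x86_64; U; en) Presto/2.2.15 Version/10.10',
	'Wget/1.12 (linux-gnu)',
	'Lynx/2.8.7rel.1 libwww-FM/2.14',
);

printf "%-3s %s\n", (looks_like_browser($_) ? 'yes' : 'no'), $_
	for @samples;
```

This is only a heuristic: many crawlers also send "Mozilla compatible"
strings, so they would be treated as browsers too.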

As to other web browsers like Elinks, Lynx, w3m, Dillo, etc.: the issue
is whether they honor '<meta http-equiv="refresh" content="0" />'.  
I think it is better to stay on the safe side; it is not a disaster if a
web browser is not shown the "Generating..." page where it could be (but
see explanation below).

> Most of the issues can be worked around, but I'm not sure why to go
> through all the trouble. I just personally don't see the value in having
> the placeholder in there at all, to me it is distracting UI even if all
> the technicalities are put aside.

The issue that "Generating..." page tries to solve is, I think, the 
following.  

Some actions, like 'blame' view or pickaxe search, or grep search,
can take quite a long time to generate, with times counted in
tens of seconds.  It is not that visible for non-caching, because
gitweb streams output so we have at least _some_ output upfront quite
fast.

Now those patches (mine and J.H. both) make gitweb use locking
(it is IIRC configurable in J.H. patch) to make only one process
generate the page if it is missing from cache, or is stale.  Now
if it is missing, we have to wait until it is generated in full
before being able to show it to client.  While it is possible to
"tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
CGI::Cache) writing it simultaneously to browser and to cache for 
the process that is generating data, it is as far as I understand
it impossible for processes which are waiting for data.  Therefore
the need for "Generating..." page, so the user does not think that
web server hung or something, and is not generating output.
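
A minimal sketch of the "only one process generates" part, assuming
flock(2) semantics (where Perl's flock is emulated via fcntl, two handles
in the same process would not conflict).  The function names and the lock
file name are hypothetical and not taken from either patch series.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Try to become the single writer for a cache entry.  Returns the locked
# filehandle on success, or undef if somebody else already holds the lock.
sub try_become_writer {
	my ($lockfile) = @_;
	open my $fh, '>>', $lockfile or die "Cannot open $lockfile: $!";
	return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

# Block until the writer is done; it releases the lock by closing its handle.
sub wait_for_writer {
	my ($lockfile) = @_;
	open my $fh, '<', $lockfile or die "Cannot open $lockfile: $!";
	flock($fh, LOCK_SH) or die "Cannot lock $lockfile: $!";
	close $fh;
}

my $dir      = tempdir(CLEANUP => 1);
my $lockfile = "$dir/entry.lock";    # hypothetical name

my $writer = try_become_writer($lockfile);
my $second = try_become_writer($lockfile);
print "first caller is writer:  ", ($writer ? "yes" : "no"), "\n";
print "second caller is writer: ", ($second ? "yes" : "no"), "\n";

close $writer;                 # generation finished
wait_for_writer($lockfile);    # a waiting reader may now proceed
```

The waiting side is where the "Generating..." page comes in: a reader
blocked in wait_for_writer() has nothing to send to the browser yet.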

We can try to reduce occurrences of cache miss stampedes by using
'expires_variance' feature[1] from CHI - Unified caching interface.
We can also turn off locking and tee output to have some output upfront
as an activity indicator instead of this "Generating..." page.

[1]: http://search.cpan.org/~jswartz/CHI-0.33/lib/CHI.pm#set
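
The idea behind 'expires_variance', as I understand it from the CHI
documentation, can be sketched as below.  This is not CHI's actual code;
is_expired() and its parameters are names made up for illustration, and
the linear probability ramp is an assumption of this sketch.

```perl
use strict;
use warnings;

# Decide whether an entry counts as expired.  With $variance = 0 it expires
# exactly at $expires_at; with $variance = 1 it may expire at any time after
# $created_at, with the chance rising linearly towards $expires_at.
# $roll is a number in [0,1), normally rand(); it is passed in so that the
# function stays deterministic and testable.
sub is_expired {
	my ($now, $created_at, $expires_at, $variance, $roll) = @_;
	return 1 if $now >= $expires_at;
	my $early = $expires_at - ($expires_at - $created_at) * $variance;
	return 0 if $now < $early;
	my $chance = ($now - $early) / ($expires_at - $early);
	return $roll < $chance ? 1 : 0;
}

# Entry created at t=0, expiring at t=100, variance 0.25: the early
# expiration window starts at t=75.
for my $now (50, 80, 90, 100) {
	my $expired = is_expired($now, 0, 100, 0.25, rand());
	print "t=$now: ", ($expired ? "expired" : "still valid"), "\n";
}
```

Because different processes roll differently, they would tend to notice
expiry at different times, so fewer of them start regenerating at once.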

> But if it will be possible to turn this off eventually, it's all your
> call whether to bother implementing it. :-)

In my implementation it is (or rather would be) as simple as just
not passing 'generating_info' => \&git_generating_data_html in the
GitwebCache::SimpleFileCache constructor.

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 13:48                             ` Jakub Narebski
@ 2010-01-25 13:56                               ` Petr Baudis
  2010-01-25 20:32                                 ` J.H.
  2010-01-25 20:58                                 ` Jakub Narebski
  2010-01-25 20:41                               ` J.H.
  1 sibling, 2 replies; 60+ messages in thread
From: Petr Baudis @ 2010-01-25 13:56 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote:
> Now those patches (mine and J.H. both) make gitweb use locking
> (it is IIRC configurable in J.H. patch) to make only one process
> generate the page if it is missing from cache, or is stale.  Now
> if it is missing, we have to wait until it is generated in full
> before being able to show it to client.  While it is possible to
> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
> CGI::Cache) writing it simultaneously to browser and to cache for 
> the process that is generating data, it is as far as I understand
> it impossible for processes which are waiting for data.  Therefore
> the need for "Generating..." page, so the user does not think that
> web server hung or something, and is not generating output.

Ah, ok, so the message is there to cover up for a technical problem. ;-)
I didn't quite realize. Then, it would be great to tweak the mechanisms
so that the user does not really have to wait.

So, I wonder about two things:

(i) How often does it happen that two requests for the same page are
received? Has anyone measured it? Or is at least able to make
a minimally educated guess? IOW, isn't this premature optimization?

(ii) Can't the locked gitwebs do the equivalent of tail -f?

P.S.: Again the disclaimer - if this is "too hard", it's better to
accept patches like they are, then improve this later. But perhaps
a better solution would be not to clutter the code by optimizing this
case at all if it's not clear it really matters in the real world.

				Petr "Pasky" Baudis


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 13:56                               ` Petr Baudis
@ 2010-01-25 20:32                                 ` J.H.
  2010-01-26  1:49                                   ` Jakub Narebski
  2010-01-28 17:39                                   ` Petr Baudis
  2010-01-25 20:58                                 ` Jakub Narebski
  1 sibling, 2 replies; 60+ messages in thread
From: J.H. @ 2010-01-25 20:32 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jakub Narebski, git, John 'Warthog9' Hawley

On 01/25/2010 05:56 AM, Petr Baudis wrote:
> On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote:
>> Now those patches (mine and J.H. both) make gitweb use locking
>> (it is IIRC configurable in J.H. patch) to make only one process
>> generate the page if it is missing from cache, or is stale.  Now
>> if it is missing, we have to wait until it is generated in full
>> before being able to show it to client.  While it is possible to
>> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
>> CGI::Cache) writing it simultaneously to browser and to cache for 
>> the process that is generating data, it is as far as I understand
>> it impossible for processes which are waiting for data.  Therefore
>> the need for "Generating..." page, so the user does not think that
>> web server hung or something, and is not generating output.
> 
> Ah, ok, so the message is there to cover up for a technical problem. ;-)
> I didn't quite realize. Then, it would be great to tweak the mechanisms
> so that the user does not really have to wait.

No, that is an incorrect assumption about how the 'Generating...' page
works, and you're missing a bit of the point.

(1) The message itself 'Generating...' is a cue to the user that
something is happening and that the browser is not actually hanging.
Web users are at the point where if things are not instantaneous and
show immediately they will either browse away completely or hit the
refresh button incessantly until content does appear.  While the page is
usually only seen for about a second, and I'll admit it can be annoying,
it's nothing more than a 'sit tight a second'.  For things like the
front page it can take upwards of 7 seconds to generate for a single
user, a lot to ask for a no response scenario.

(2) It prevents the stampeding herd problem, which was very vehemently
discussed 4 years ago by HPA and myself and roughly boils down to this:

When a single user comes into the site, in particular the front page, it
kicks off a process that will start to generate it, causing a huge
amount of git requests into individual repositories and a lot of disk
i/o.  A second user will then come in and the same requests will start
to be done from the beginning again, and so on until you basically kill
the machine because the disk i/o goes up enough that it can't ever
service the requests fast enough.

This does 2 things in the end:

1) means there's only 1 copy of the page ever being generated, thus
meaning there isn't extraneous and dangerous disk i/o going on on the system

2) prevents a user from reporting to the website that it's broken by
giving them a visual cue that things aren't broken.

> So, I wonder about two things:
> 
> (i) How often does it happen that two requests for the same page are
> received? Has anyone measured it? Or is at least able to make
> a minimally educated guess? IOW, isn't this premature optimization?

For most pages, not many but it happens more often than you think.  The
data I have is much too old to be useful now but the front page could,
at times, have up to 30 people waiting for it without caching.  This is
a very important patch believe it or not.  For a site the size of
kernel.org it cannot exist without this.

But here's a quick stat, in 36 hours git.kernel.org has had
156099 accesses world wide or about 1.2 accesses a second.

android.git.kernel.org, in the same time period has had 115818 accesses.

If the first request takes 7 seconds to generate, by the time it's done
there are now 3 additional requests running.  If it again takes 7
seconds to generate there are now another 3 requests running, etc.  Very
quickly you've got so much i/o running the box more or less is useless.
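
For reference, the average rates behind those numbers can be worked out
as below.  This only restates the arithmetic and assumes accesses are
spread evenly over the 36 hours, which real traffic is not; req_rate() is
a helper name made up for this example.  These are rates across all
pages, so they say nothing about what share hits any one page.

```perl
use strict;
use warnings;

# Average requests per second over a measurement window.
sub req_rate {
	my ($hits, $seconds) = @_;
	return $hits / $seconds;
}

my $window   = 36 * 60 * 60;    # 36 hours in seconds
my $gen_time = 7;               # seconds to generate the front page

for my $site (['git.kernel.org', 156099],
              ['android.git.kernel.org', 115818]) {
	my ($name, $hits) = @$site;
	my $rate = req_rate($hits, $window);
	printf "%-24s %.2f req/s, %.1f requests arriving per %d s\n",
		$name, $rate, $rate * $gen_time, $gen_time;
}
```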

> (ii) Can't the locked gitwebs do the equivalent of tail -f?

Not really going to help much, most of the gitweb operations won't
output much of anything beyond the header until it's collected all of
the data it needs anyway and then there will be a flurry of output.  It
also means that this 'Generating...' page will only work for caching
schemes that tail can read out of, which I'm not sure it would work all
that well with things like memcached or a non-custom caching layer where
we don't necessarily have direct access to the file being written to.

At least the way I had it (and I'll admit I haven't read through Jakub's
re-working of my patches so I don't know if it's still there) is that
with background caching you only get the 'Generating...' page if it's
new or the content is grossly out of date.  If it's a popular page and
it's not grossly out of date it shows you the 'stale' data while it
generates the new content in the background anyway, only locking you out
when the new file is being written.  Or at least that's how I had it.

- John 'Warthog9' Hawley


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 20:32                                 ` J.H.
@ 2010-01-26  1:49                                   ` Jakub Narebski
  2010-01-28 17:39                                   ` Petr Baudis
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-26  1:49 UTC (permalink / raw)
  To: J.H.; +Cc: Petr Baudis, git, John 'Warthog9' Hawley

On Mon, 25 Jan 2010, J.H. wrote:
> On 01/25/2010 05:56 AM, Petr Baudis wrote:
>> On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote:

>>> Now those patches (mine and J.H. both) make gitweb use locking
>>> (it is IIRC configurable in J.H. patch) to make only one process
>>> generate the page if it is missing from cache, or is stale.  Now
>>> if it is missing, we have to wait until it is generated in full
>>> before being able to show it to client.  While it is possible to
>>> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
>>> CGI::Cache) writing it simultaneously to browser and to cache for 
>>> the process that is generating data, it is as far as I understand
>>> it impossible for processes which are waiting for data.  Therefore
>>> the need for "Generating..." page, so the user does not think that
>>> web server hung or something, and is not generating output.
>> 
>> Ah, ok, so the message is there to cover up for a technical problem. ;-)
>> I didn't quite realize. Then, it would be great to tweak the mechanisms
>> so that the user does not really have to wait.
> 
> No, that is an incorrect assumption on how the 'Generating...' page
> works, and you're missing a bit of the point.
> 
> (1) The message itself 'Generating...' is a cue to the user that
> something is happening and that the browser is not actually hanging.
> Web users are at the point where if things are not instantaneous and
> show immediately they will either browse away completely or hit the
> refresh button incessantly until content does appear.  While the page is
> usually only seen for about a second, and I'll admit it can be annoying,
> it's nothing more than a 'sit tight a second'.  For things like the
> front page it can take upwards of 7 seconds to generate for a single
> user, a lot to ask for a no response scenario.

Pasky is asking later whether instead of separate "Generating..." page
(be it server side using trick with http-equiv refresh taking place when
page/connection is closed, or JavaScript Ajax-y one) couldn't gitweb
use streaming and have early parts of generated page (in particular the
page header and navigation bar) serve as activity indicator.  I think
it is possible, if a bit more complicated, and such feature can be 
introduced in a separate commit.

As to 'sit tight for a second': the code can be modified to print/show 
"Generating..." page only if we are waiting more than a second (or more
than half a second with usleep from Time::HiRes) for the data.  Pasky,
would such change address some of your concerns about "Generating..."
page as activity indicator?

> 
> (2) It prevents the stampeding herd problem, which was very vehemently
> discussed 4 years ago by HPA and myself and roughly boils down to this:
> 
> When a single user comes into the site, in particular the front page, it
> kicks off a process that will start to generate it, causing a huge
> amount of git requests into individual repositories and a lot of disk
> i/o.  A second user will then come in and the same requests will start
> to be done from the beginning again, and so on until you basically kill
> the machine because the disk i/o goes up enough that it can't ever
> service the requests fast enough.

As you can see from the way my proof of concept (of splitting) caching
series is split, I think that those two features are separate.  Using
locking to have one writer, and protect against cache miss stampede 
problem is one issue, and having "Generating..." page is another issue.
They are not orthogonal as they are now, as "Generating..." page depends
on locking to check if page got generated, but it could simply check
whether the cache file is present already.

> 
> This does 2 things in the end:
> 
> 1) means there's only 1 copy of the page ever being generated, thus
> meaning there isn't extraneous and dangerous disk i/o going on on the system
> 
> 2) prevents a user from reporting to the website that it's broken by
> giving them a visual cue that things aren't broken.

1) is about locking ('expires_variance' can help there, too), 2) is about
"Generating..." but could be done with streaming: "tee"/"tail" solution.

>> So, I wonder about two things:
>> 
>> (i) How often does it happen that two requests for the same page are
>> received? Has anyone measured it? Or is at least able to make
>> a minimally educated guess? IOW, isn't this premature optimization?
> 
> For most pages, not many but it happens more often than you think.  The
> data I have is much too old to be useful now but the front page could,
> at times, have up to 30 people waiting for it without caching.  This is
> a very important patch believe it or not.  For a site the size of
> kernel.org it cannot exist without this.
> 
> But here's a quick stat, in 36 hours git.kernel.org has had
> 156099 accesses world wide or about 1.2 accesses a second.
> 
> android.git.kernel.org, in the same time period has had 115818 accesses.
> 
> If the first request takes 7 seconds to generate, by the time it's done
> there are now 3 additional requests running.  If it again takes 7
> seconds to generate there are now another 3 requests running, etc.  Very
> quickly you've got so much i/o running the box more or less is useless.

On the other hand, does the cache for the front page (projects list) ever
expire so that there is no stale data to serve?  So the issue with growing
I/O is, I guess, only with priming the cache.  

BTW. I wonder how well the 'expires_variance' solution used in CHI would
protect against cache miss stampedes; that probably depends on the ratio
of variance to time it takes gitweb to generate the page.

>> (ii) Can't the locked gitwebs do the equivalent of tail -f?
> 
> Not really going to help much, most of the gitweb operations won't
> output much of anything beyond the header until it's collected all of
> the data it needs anyway and then there will be a flurry of output.  

Well, it might be enough to have headers as activity indicator... but
on the other hand users can think that server is broken if they see no
further data, as opposed to the "Generating..." page.

> It also means that this 'Generating...' page will only work for caching
> schemes that tail can read out of, which I'm not sure it would work all
> that well with things like memcached or a non-custom caching layer where
> we don't necessarily have direct access to the file being written to.

I don't think that caching interfaces such as CHI or Cache::Cache, or
individual caching engines such as Cache::Memcached or Cache::FastMmap
offer streaming access.  OTOH we can read from the side as the data
is generated ("tee" to a temporary file, or something), but it is even
more complication.

> 
> At least the way I had it (and I'll admit I haven't read through Jakub's
> re-working of my patches so I don't know if it's still there) is that
> with background caching you only get the 'Generating...' page if it's
> new or the content is grossly out of date.  If it's a popular page and
> it's not grossly out of date it shows you the 'stale' data while it
> generates the new content in the background anyway, only locking you out
> when the new file is being written.  Or at least that's how I had it.

It is there, i.e. gitweb would serve stale data if it can be found,
but it lacks a 'too stale to serve as stale data' timeout/expire time.
I skipped this feature to get the split series out the door faster.
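
The missing piece could look roughly like the sketch below.  This is not
code from either series; cache_state() and its parameter names are made
up for illustration, with all times in seconds.

```perl
use strict;
use warnings;

# Classify a cache entry by its age.
#   'missing' - no entry at all; caller has to wait for generation
#   'fresh'   - serve straight from the cache
#   'stale'   - serve the old copy and regenerate in the background
#   'expired' - too stale to show; caller waits (and may get "Generating...")
sub cache_state {
	my ($age, $expires_in, $max_stale) = @_;
	return 'missing' unless defined $age;
	return 'fresh'   if $age < $expires_in;
	return 'stale'   if $age < $expires_in + $max_stale;
	return 'expired';
}

# Example: entries expire after 20 s and may be served stale for 10 min more.
for my $age (undef, 5, 100, 1000) {
	printf "age %-7s => %s\n",
		(defined $age ? $age : 'none'), cache_state($age, 20, 600);
}
```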

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 20:32                                 ` J.H.
  2010-01-26  1:49                                   ` Jakub Narebski
@ 2010-01-28 17:39                                   ` Petr Baudis
  2010-01-31 11:58                                     ` Jakub Narebski
  1 sibling, 1 reply; 60+ messages in thread
From: Petr Baudis @ 2010-01-28 17:39 UTC (permalink / raw)
  To: J.H.; +Cc: Jakub Narebski, git, John 'Warthog9' Hawley

On Mon, Jan 25, 2010 at 12:32:37PM -0800, J.H. wrote:
> This does 2 things in the end:
> 
> 1) means there's only 1 copy of the page ever being generated, thus
> meaning there isn't extraneous and dangerous disk i/o going on on the system

But this has nothing to do with what you _do_ when there are multiple
requests, whether you do the same as if caching was disabled (hang until
content is generated) or doing something novel (creating redirects
through "Generating..." page).

> 2) prevents a user from reporting to the website that it's broken by
> giving them a visual cue that things aren't broken.

But this has nothing to do with caching per se, right? I think it
actually makes _no difference_ if caching is enabled or not to this
problem, or am I missing something?

My point is, I guess, that showing the Generating page doesn't seem to
have actually anything to do with the caching itself?

> > So, I wonder about two things:
> > 
> > (i) How often does it happen that two requests for the same page are
> > received? Has anyone measured it? Or is at least able to make
> > a minimally educated guess? IOW, isn't this premature optimization?
> 
> For most pages, not many but it happens more often than you think.  The
> data I have is much too old to be useful now but the front page could,
> at times, have up to 30 people waiting for it without caching.  This is
> a very important patch believe it or not.  For a site the size of
> kernel.org it cannot exist without this.
..snip numbers..

Ok, I think you convinced me here (even if I think the front page is
kind of special case).

> > (ii) Can't the locked gitwebs do the equivalent of tail -f?
> 
> Not really going to help much, most of the gitweb operations won't
> output much of anything beyond the header until it's collected all of
> the data it needs anyway and then there will be a flurry of output.  It
> also means that this 'Generating...' page will only work for caching
> schemes that tail can read out of, which I'm not sure it would work all
> that well with things like memcached or a non-custom caching layer where
> we don't necessarily have direct access to the file being written to.
> 
> At least the way I had it (and I'll admit I haven't read through Jakub's
> re-working of my patches so I don't know if it's still there) is that
> with background caching you only get the 'Generating...' page if it's
> new or the content is grossly out of date.  If it's a popular page and
> it's not grossly out of date it shows you the 'stale' data while it
> generates the new content in the background anyway, only locking you out
> when the new file is being written.  Or at least that's how I had it.

Well, my user experience with gitweb on kernel.org is that I get
Generating... page all the time when I dive deep enough to the object
tree. I just find it really distracting and sometimes troublesome when
I want to wget some final object.

I think it's fine to take in the caching support with the Generating...
page in the bundle, but I do want to declare my intention to get rid of
it later, at least for caching backends that could do without it - for
pages where content appears incrementally, tail -f, for pages where
content appears all at once, show at least the header and some "I'm
busy" notification without redirects.

-- 
				Petr "Pasky" Baudis
If you can't see the value in jet powered ants you should turn in
your nerd card. -- Dunbal (464142)


* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-28 17:39                                   ` Petr Baudis
@ 2010-01-31 11:58                                     ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-31 11:58 UTC (permalink / raw)
  To: Petr Baudis; +Cc: J.H., git, John 'Warthog9' Hawley

On Thu, Jan 28, 2010, Petr Baudis wrote:
> On Mon, Jan 25, 2010 at 12:32:37PM -0800, J.H. wrote:
> > This does 2 things in the end:
> > 
> > 1) means there's only 1 copy of the page ever being generated, thus
> > meaning there isn't extraneous and dangerous disk i/o going on on the system
> 
> But this has nothing to do with what you _do_ when there are multiple
> requests, whether you do the same as if caching was disabled (hang until
> content is generated) or doing something novel (creating redirects
> through "Generating..." page).
> 
> > 2) prevents a user from reporting to the website that it's broken by
> > giving them a visual cue that things aren't broken.
> 
> But this has nothing to do with caching per se, right? I think it
> actually makes _no difference_ if caching is enabled or not to this
> problem, or am I missing something?
> 
> 
> My point is, I guess, that showing the Generating page doesn't seem to
> have actually anything to do with the caching itself?

The point is that without caching it is easy to stream the response, and
to consider early parts of the page (like the page header, generated before
any heavy work) to serve as an activity indicator.

With caching it is difficult to have streaming response, both from
technical point of view (writer must generate to client and to cache
simultaneously, readers must know when writer finished work to close
connection), and from robustness point of view (what happens if writer
is interrupted / killed before finishing generating output).  With
"generate then display" (which is not exclusive to caching, and is
another possible way of generating content even without caching) we
rather need some kind of activity indicator like "Generating..." page.
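
To illustrate the writer side of the problem, here is a minimal sketch
which sends each chunk both to the client and to a temporary file, and
renames that file into place only after generation has finished, so an
interrupted writer never leaves a truncated cache entry behind.  It uses
plain prints instead of PerlIO::tee, and generate_and_cache() with its
arguments is hypothetical, not part of any of the patches.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# $generator is a code reference returning the next chunk of output,
# or undef when the page is complete.
sub generate_and_cache {
	my ($client_fh, $cache_path, $dir, $generator) = @_;
	# The temporary file lives in the cache directory so that rename()
	# stays on one filesystem.
	my ($tmp_fh, $tmp_path) = tempfile('entry.XXXXXX', DIR => $dir);
	while (defined(my $chunk = $generator->())) {
		print {$client_fh} $chunk or die "Cannot write to client: $!";
		print {$tmp_fh}    $chunk or die "Cannot write to cache: $!";
	}
	close $tmp_fh or die "Cannot close $tmp_path: $!";
	rename $tmp_path, $cache_path
		or die "Cannot rename $tmp_path to $cache_path: $!";
}

my $dir    = tempdir(CLEANUP => 1);
my @chunks = ("<html>\n", "Gitweb page\n", "</html>\n");
my $client = '';    # stands in for the connection to the browser
open my $client_fh, '>', \$client or die "Cannot open in-memory handle: $!";
generate_and_cache($client_fh, "$dir/entry", $dir, sub { shift @chunks });
close $client_fh;
print $client;
```

If the writer is killed halfway, the temporary file is left behind and
would have to be cleaned up separately; the readers still need some way
to notice that the writer is gone.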

I think that "Generating..." page can be improved in two ways:
* Show "Generating..." page only if we are waiting for response for
  more than one second.  This might need mucking with alarms, as I think
  that sleep 1 before $self->generating_info(...) would not be a good
  solution.
* Stream response (using PerlIO::tee layer from PerlIO::Util, or 
  Capture::Tiny module, or tied filehandle like in CGI::Cache) for
  writer (i.e. process generating data), and wait for it to be finished
  (perhaps with "Generating...") in readers.  This way you wouldn't get
  "Generating..." page for rare views/URLs, and for common views/URLs
  there is high probability that you would not need "Generating..."
  page as there would be slightly stale response to serve.
Of course one can implement _both_ of those solutions, i.e. wait one
second in readers, and stream in writer.
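
The first of those points could be sketched as below.  It polls in short
steps with Time::HiRes instead of using alarms, which is a simplification
chosen for this example; wait_for_entry() and its parameters are made-up
names.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);
use File::Temp qw(tempfile);

# Poll for the cache file for up to $timeout seconds.  Returns 1 as soon
# as it exists, 0 if the timeout passes first; the latter is the point
# where a reader would fall back to showing "Generating...".
sub wait_for_entry {
	my ($path, $timeout, $interval) = @_;
	my $deadline = time() + $timeout;
	while (1) {
		return 1 if -e $path;
		return 0 if time() >= $deadline;
		sleep($interval);
	}
}

my ($fh, $present) = tempfile(UNLINK => 1);
for my $path ($present, "$present.missing") {
	print wait_for_entry($path, 0.3, 0.1)
		? "ready, serve from cache\n"
		: "still waiting, show Generating...\n";
}
```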

I am not sure, but there might be another issue why activity indicator
is more important for the case with caching enabled.  If you interrupt
writer, one of readers waiting for finished data would have to take
role of writer, which besides need for technical solution to this problem
would mean longer wait.

[..] 
> > > (ii) Can't the locked gitwebs do the equivalent of tail -f?
> > 
> > Not really going to help much, most of the gitweb operations won't
> > output much of anything beyond the header until it's collected all of
> > the data it needs anyway and then there will be a flurry of output.  It
> > also means that this 'Generating...' page will only work for caching
> > schemes that tail can read out of, which I'm not sure it would work all
> > that well with things like memcached or a non-custom caching layer where
> > we don't necessarily have direct access to the file being written to.
> > 
> > At least the way I had it (and I'll admit I haven't read through Jakub's
> > re-working of my patches so I don't know if it's still there) is that
> > with background caching you only get the 'Generating...' page if it's
> > new or the content is grossly out of date.  If it's a popular page and
> > it's not grossly out of date it shows you the 'stale' data while it
> > generates the new content in the background anyway, only locking you out
> > when the new file is being written.  Or at least that's how I had it.
> 
> Well, my user experience with gitweb on kernel.org is that I get
> "Generating..." page all the time when I dive deep enough to the object
> tree. I just find it really distracting and sometimes troublesome when
> I want to wget some final object.

First, the user_agent checking would help there (it's a pity that not all
web spiders (bots) and non-interactive downloaders say explicitly what
they are in the User-Agent string).

Second, I guess that waiting 1 second (or more) before showing 
"Generating..." page would help in most cases.

> 
> I think it's fine to take in the caching support with the Generating...
> page in the bundle, but I do want to declare my intention to get rid of
> it later, at least for caching backends that could do without it - for
> pages where content appears incrementally, tail -f, for pages where
> content appears all at once, show at least the header and some "I'm
> busy" notification without redirects.

In the final version this should be fully configurable.  Note that
the series of patches I have sent was just a proof of concept for
splitting the caching patch into smaller parts / individual features.

-- 
Jakub Narebski
Poland

* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 13:56                               ` Petr Baudis
  2010-01-25 20:32                                 ` J.H.
@ 2010-01-25 20:58                                 ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25 20:58 UTC (permalink / raw)
  To: Petr Baudis
  Cc: git, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley

On Mon, Jan 25, 2010 at 14:56 +0100, Petr Baudis wrote:
> On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote:

> > Now those patches (mine and J.H. both) make gitweb use locking
> > (it is IIRC configurable in J.H. patch) to make only one process
> > generate the page if it is missing from cache, or is stale.  Now
> > if it is missing, we have to wait until it is generated in full
> > before being able to show it to client.  While it is possible to
> > "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
> > CGI::Cache) writing it simultaneously to browser and to cache for 
> > the process that is generating data, it is as far as I understand
> > it impossible for processes which are waiting for data.  Therefore
> > the need for "Generating..." page, so the user does not think that
> > web server hung or something, and is not generating output.
> 
> Ah, ok, so the message is there to cover up for a technical problem. ;-)
> I didn't quite realize. Then, it would be great to tweak the mechanisms
> so that the user does not really have to wait.

Well, the mechanism would certainly be configurable in final version
(current split version is more of proof of concept of splitting).

> 
> So, I wonder about two things:
> 
> (i) How often does it happen that two requests for the same page are
> received? Has anyone measured it? Or is at least able to make
> a minimally educated guess? IOW, isn't this premature optimization?

To be more exact, the question is how often a second request for the same
page appears while an earlier request has not finished processing.  It is
a matter of both the frequency of given requests, and the time it takes to
generate the response (which grows with growing load on the server).

As to measurements: Pasky, do you have access logs, or their analysis
a la AWStats, Webalizer and the like, for repo.or.cz?  Warthog9, do
you have access logs or analysis for git.kernel.org?  Can you get similar
from fedorahosted?

> (ii) Can't the locked gitwebs do the equivalent of tail -f?

Well, it could, in principle, but it would need some changes.  

First, instead of using temporary file to create cache entry atomically
(write to temporary file, then rename) the process generating data would
have to write to file other processes can read from.  It could be e.g.
lockfile.

Second, an extended cache API would be needed so that generated data
is streamed to the cache file, ->set($key, $data) ==> ->set($key, $fh) or
->set_io($key, $fh).  This would mean some complications, but what might
be more important is that this trick would not work, as far as I can see,
with caching backends / caching engines other than the one from
gitweb/cache.pm (like memcached or mmap based ones).

Then the code could look like the following (in pseudocode):

  try to acquire writers lock
  if (acquired writers lock) {
  	generate and "tee" response
  	create cache entry
  } else {
  	# <<<<<<
  	while (not acquired writers lock &&
  	       sysread something) {
  		print <data>;
  	}
  	# >>>>>>
  	retrieve and print (rest) of data
  }

where parts between <<<<<< and >>>>>> are new.
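
The reader branch of the pseudocode above could look roughly like this in
Perl.  This is only a sketch under assumptions: tail_until_unlocked is a
hypothetical helper, $in_fh is assumed to be the file the writer is
streaming to, and $lock_fh the lock file on which the writer holds an
exclusive flock.  It does not deal with an interrupted writer.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(sleep);

# Hypothetical reader: copy whatever the writer has produced so far from
# $in_fh to $out_fh, polling until the shared lock on $lock_fh can be
# taken (i.e. the writer has released its exclusive lock).
sub tail_until_unlocked {
	my ($in_fh, $lock_fh, $out_fh) = @_;
	my $done = 0;
	until ($done) {
		# Check the lock first: if we get it, the writer is finished and
		# the drain below is guaranteed to see the complete output.
		$done = flock($lock_fh, LOCK_SH | LOCK_NB) ? 1 : 0;
		# sysread returns 0 at (current) end of file, which ends the loop.
		while (sysread($in_fh, my $buf, 8192)) {
			print {$out_fh} $buf;
		}
		sleep(0.1) unless $done;
	}
	return;
}
```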

But there is another complication: gitweb needs to be able to deal with
the situation where process generating data got interrupted before 
creating full output, or process generating data ran die_error which
does not generate any cache entry (e.g. if the URL we are trying to
access returns 404 not found - the check for existence of object can
take a while if the system is busy, I think).  

Now in the current implementation a cache entry is either written in full
or not written at all.  With the current code it would be, I think, fairly
easy to check whether the cache entry got generated once we have acquired
the readers lock (when the generating process gets terminated its lock is
released, which is an advantage over locking by atomically creating a file
with O_EXCL), and if it did not, to repeat the whole process.  With the
"tee"/"tail" solution we can detect that the process generating data got
interrupted before the end, but currently I have no idea what should be
done in such a situation.  We can detect the die_error situation as easily
as in the current solution (which needs the "Generating..." page as an
activity indicator), and with some care, i.e. by not writing to the cache
file directly, we can ensure that cache entries contain full, correctly
generated data.
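
As an illustration of the "written in full or not at all" property, here
is a minimal sketch of the write-to-temporary-file-then-rename approach
together with a reader that reports a missing entry.  The names
write_cache_entry and read_cache_entry are hypothetical and are not taken
from gitweb/cache.pm.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Write $data to $path atomically: the temporary file lives in the same
# directory, so rename() stays on one filesystem and readers see either
# the old entry, no entry, or the complete new entry.
sub write_cache_entry {
	my ($path, $data) = @_;
	my ($fh, $tmp) = tempfile('cache.XXXXXX',
	                          DIR => dirname($path), UNLINK => 0);
	binmode $fh, ':raw';
	print {$fh} $data or die "write to $tmp failed: $!";
	close $fh or die "close of $tmp failed: $!";
	rename $tmp, $path or die "rename $tmp -> $path failed: $!";
	return;
}

# Return the contents of the entry, or undef if there is no entry (for
# example because the writer was killed or ran die_error), in which case
# the caller can repeat the whole process.
sub read_cache_entry {
	my ($path) = @_;
	open(my $fh, '<:raw', $path) or return;
	local $/;	# slurp mode
	my $data = <$fh>;
	close $fh;
	return $data;
}
```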

> 
> P.S.: Again the disclaimer - if this is "too hard", it's better to
> accept patches like they are, then improve this later. But perhaps
> a better solution would be not to clutter the code by optimizing this
> case at all if it's not clear it really matters in the real world.

See above.

P.S. I have noticed that with the current implementation (well, I am not
sure if it is true also for J.H.'s implementation) there is a problem if
more than one process requests a URL which results in die_error being
called.  The design decision, present in the original patch, was to not cache
"die_error" / non-"200 OK" pages; it seems sane, but I don't know if it
was a correct decision.  The solution for an interrupted generating process,
described above, works also for die_error pages, although it makes
die_error pages slower in such a (hopefully rare) situation of simultaneous
erroneous requests.

P.P.S. Both Pasky's approach to caching the projects_list page, and Lea
Wiemann's work on the "gitweb caching" project for Google Summer of Code 2008
approached caching in a different way: by caching (parsed) data, not by
caching output.  Note however that for some actions like 'snapshot' we would
probably want to have response/output caching anyway.  Also for output
caching we can use the X-Sendfile (or similar) extension.

-- 
Jakub Narebski
Poland

* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 13:48                             ` Jakub Narebski
  2010-01-25 13:56                               ` Petr Baudis
@ 2010-01-25 20:41                               ` J.H.
  2010-01-26  2:30                                 ` Jakub Narebski
  1 sibling, 1 reply; 60+ messages in thread
From: J.H. @ 2010-01-25 20:41 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Petr Baudis, git, John 'Warthog9' Hawley

On 01/25/2010 05:48 AM, Jakub Narebski wrote:
> On Mon, Jan 25, 2010, Petr Baudis wrote:
>> On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote:
>>> On Sun, 24 Jan 2010, Petr Baudis wrote:
> 
>>>>   I have stupid question, common to both the original patch and this
>>>> RFC.
>>>>
>>>>> [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>>>>> regenerating cache (WIP)
>>>>
>>>>   Just why is a "Generating..." page appropriate?
>>>>
>>>>   I have to admit I hate it; can you please at least make it
>>>> configurable? Why is it needed at all? It [...] confuses
>>>> non-interactive HTTP clients [...]
> 
>>> Second, gitweb can always check User-Agent header, and serve 
>>> "Generating..." page only to web browsers:
>>>
>>>   unless (defined $cgi->user_agent() &&
>>>           $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) {
>>>   	return;
>>>   }
>>>
>>> or something like that.
>>
>> I'm not too happy with this. What about Safari? Opera? ELinks? There's a
>> lot of web browsers.
> 
> The "Mozilla" part would catch all "Mozilla compatible" web browsers,
> including Firefox (and other Gecko-based web browsers), Internet Explorer,
> WebKit based browsers including Safari and Chrome and Konqueror.
> The "Opera" part would catch Opera.
> http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
> 
> As to other web browsers like Elinks, Lynx, w3m, Dillo, etc.: the issue
> is whether they honor '<meta http-equiv="refresh" content="0" />'.  
> I think it is better to stay on the safe side; it is not disaster if web
> browser is not shown "Generating..." page where it could (but see 
> explanation below).

Most of them do, that particular tag has been around for a long time and
since it doesn't require Javascript to do the page refresh it's pretty
much universal.

The problem is going to be with things like wget when someone wants to
snag a binary file.  This works fine if the file is already cached, but
the user doesn't get what they are expecting if they get a blob that
isn't the final file, but the html contents of the page.  I don't know
of any hint that things like wget would send to the server that you
could switch based on, but it would be more or less the non-background
caching state.

>> Most of the issues can be worked around, but I'm not sure why to go
>> through all the trouble. I just personally don't see the value in having
>> the placeholder in there at all, to me it is distracting UI even if all
>> the technicalities are put aside.
> 
> The issue that "Generating..." page tries to solve is, I think, the 
> following.  
> 
> Some actions, like 'blame' view or pickaxe search, or grep search,
> can take quite a long time to generate, with times counted in
> tens of seconds.  It is not that visible for non-caching, because
> gitweb streams output so we have at least _some_ output upfront quite
> fast.
> 
> Now those patches (mine and J.H. both) make gitweb use locking
> (it is IIRC configurable in J.H. patch) to make only one process
> generate the page if it is missing from cache, or is stale.  Now
> if it is missing, we have to wait until it is generated in full
> before being able to show it to client.  While it is possible to
> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
> CGI::Cache) writing it simultaneously to browser and to cache for 
> the process that is generating data, it is as far as I understand
> it impossible for processes which are waiting for data.  Therefore
> the need for "Generating..." page, so the user does not think that
> web server hung or something, and is not generating output.
>  
> We can try to reduce occurrences of cache miss stampedes by using
> 'expires_variance' feature[1] from CHI - Unified caching interface.
> We can also turn off locking and tee output to have some output upfront
> as an activity indicator instead of this "Generating..." page.
> 
> [1]: http://search.cpan.org/~jswartz/CHI-0.33/lib/CHI.pm#set
> 
>> But if it will be possible to turn this off eventually, it's all your
>> call whether to bother implementing it. :-)
> 
> In my implementation it is (or rather would be) as simple as just
> not passing 'generating_info' => \&git_generating_data_html in the
> GitwebCache::SimpleFileCache constructor.
> 

At least in mine it was: don't allow background caching.  It would force
everyone to wait on the one process that was actually generating
content.  But it means a few blank pages with a spinning working icon
until the cache releases its exclusive lock.

- John 'Warthog9' Hawley

* Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
  2010-01-25 20:41                               ` J.H.
@ 2010-01-26  2:30                                 ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-26  2:30 UTC (permalink / raw)
  To: J.H.; +Cc: Petr Baudis, git, John 'Warthog9' Hawley

On Mon, 25 Jan 2010, J.H. wrote:
> On 01/25/2010 05:48 AM, Jakub Narebski wrote:
>> On Mon, Jan 25, 2010, Petr Baudis wrote:
>>> On Mon, Jan 25, 2010 at 12:46:39PM +0100, Jakub Narebski wrote:
>>>> On Sun, 24 Jan 2010, Petr Baudis wrote:
>> 
>>>>>   I have stupid question, common to both the original patch and this
>>>>> RFC.
>>>>>
>>>>>> [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>>>>>> regenerating cache (WIP)
>>>>>
>>>>>   Just why is a "Generating..." page appropriate?
>>>>>
>>>>>   I have to admit I hate it; can you please at least make it
>>>>> configurable? Why is it needed at all? It [...] confuses
>>>>> non-interactive HTTP clients [...]
>> 
>>>> Second, gitweb can always check User-Agent header, and serve 
>>>> "Generating..." page only to web browsers:
>>>>
>>>>   unless (defined $cgi->user_agent() &&
>>>>           $cgi->user_agent() =~ /\b(Mozilla|Opera)\b/i) {
>>>>   	return;
>>>>   }
>>>>
>>>> or something like that.
>>>
>>> I'm not too happy with this. What about Safari? Opera? ELinks? There's a
>>> lot of web browsers.
>> 
>> The "Mozilla" part would catch all "Mozilla compatible" web browsers,
>> including Firefox (and other Gecko-based web browsers), Internet Explorer,
>> WebKit based browsers including Safari and Chrome and Konqueror.
>> The "Opera" part would catch Opera.
>> http://www.nczonline.net/blog/2010/01/12/history-of-the-user-agent-string/
>> 
>> As to other web browsers like Elinks, Lynx, w3m, Dillo, etc.: the issue
>> is whether they honor '<meta http-equiv="refresh" content="0" />'.  
>> I think it is better to stay on the safe side; it is not disaster if web
>> browser is not shown "Generating..." page where it could (but see 
>> explanation below).
> 
> Most of them do, that particular tag has been around for a long time and
> since it doesn't require Javascript to do the page refresh it's pretty
> much universal.

For this method to work there is an additional assumption about the web
browser, namely that it follows the refresh only after it gets the page in
full (only when the connection closes).  But perhaps that is true for all
web browsers following http-equiv refresh.

> 
> The problem is going to be with things like wget when someone wants to
> snag a binary file.  This works fine if the file is already cached, but
> the user doesn't get what they are expecting if they get a blob that
> isn't the final file, but the html contents of the page.  I don't know
> of any hint that things like wget would send to the server that you
> could switch based on, but it would be more or less the non-background
> caching state.

Well, the above solution would work as long as you don't run wget with
--user-agent.

[...]
>>> But if it will be possible to turn this off eventually, it's all your
>>> call whether to bother implementing it. :-)
>> 
>> In my implementation it is (or rather would be) as simple as just
>> not passing 'generating_info' => \&git_generating_data_html in the
>> GitwebCache::SimpleFileCache constructor.
> 
> At least in mine it was: don't allow background caching.  It would force
> everyone to wait on the one process that was actually generating
> content.  But it means a few blank pages with a spinning working icon
> until the cache releases its exclusive lock.

Theoretically at least, having the "Generating..." page is orthogonal to
using locking, to background serving and to serving stale data.  In the
current implementation those features are coupled: both serving stale data
and the "Generating..." page depend on locking.

I think that serving stale data should depend on background caching, as
otherwise we introduce inequality in treating process which regenerates
cache (and must wait without background caching) and rest of processes
(which are served stale data).

Certainly in final version of patch all this should be made configurable.

 
P.S. With background caching there is a question whether the parent should
generate and the child display stale data or "Generating...", or whether the
parent should display stale data / activity and the child (re)generate the
cache entry.  There is also the issue of whether the parent should waitpid
for the child, whether fiddling with the SIGCHLD handler would be enough, or
whether neither is needed.
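
To make the second variant concrete, here is a minimal sketch in which
the child (re)generates the cache entry and the parent stays free to
display stale data or the activity indicator.  The helper
regenerate_in_background is hypothetical.  It returns the child's PID so
that the caller can decide between waitpid and setting $SIG{CHLD} to
'IGNORE' (which on most Unix platforms avoids zombies without waiting).

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $regenerate (a code reference) in a child
# process and return the child's PID to the parent immediately.
sub regenerate_in_background {
	my ($regenerate) = @_;
	my $pid = fork();
	die "fork failed: $!" unless defined $pid;
	if ($pid == 0) {
		# Child: (re)generate the cache entry.  Leave via _exit so that
		# the parent's END blocks and buffered output are not replayed.
		my $ok = eval { $regenerate->(); 1 };
		POSIX::_exit($ok ? 0 : 1);
	}
	# Parent: free to print stale data or "Generating..." right away.
	return $pid;
}
```

With this shape the parent can call waitpid($pid, 0) after it has sent its
response, or never wait at all if SIGCHLD is ignored.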

-- 
Jakub Narebski
Poland

* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (8 preceding siblings ...)
  2010-01-23  0:27                     ` [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating " Jakub Narebski
@ 2010-01-23 19:55                     ` J.H.
  2010-01-24 13:54                     ` [RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP) Jakub Narebski
  2010-02-06  0:51                     ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
  11 siblings, 0 replies; 60+ messages in thread
From: J.H. @ 2010-01-23 19:55 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley

Just a heads up I wasn't able to get to these yesterday (Friday), but
I'll try and work in looking these over come Monday.

- John 'Warthog9' Hawley

On 01/22/2010 04:27 PM, Jakub Narebski wrote:
> This 10 patches long patch series is intended as proof of concept
> for splitting large 'gitweb: File based caching layer (from git.kernel.org)'
> mega-patch by John 'Warthog9' Hawley aka J.H., by starting small and
> adding features piece by piece.
> 
> This patch is meant as replacement for last two patches:
> * [PATCH 8/9] gitweb: Convert output to using indirect file handle
>   Message-ID: <1263432185-21334-9-git-send-email-warthog9@eaglescrag.net>
> * [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org)
>   Message-ID: <1263432185-21334-10-git-send-email-warthog9@eaglescrag.net>
> 
> in the long patch series by J.H.
> * [PATCH 0/9] Gitweb caching v5
>   http://thread.gmane.org/gmane.comp.version-control.git/136913
> 
> Note that this patch series is part of 'gitweb/cache-kernel' branch of
> http://repo.or.cz/w/git/jnareb-git.git repository (gitweb link), built
> on top of modified patches from 'Gitweb caching v2' series (from
> 'gitweb-ml-v2' branch of http://git.kernel.org/?p=git/warthog9/gitweb.git
> repository).  Therefore they might not apply as straight replacements
> on top of early parts of 'gitweb-ml-v5' branch.
> 
> This is work in progress (showing how I see introducing output caching
> to gitweb), it lacks proper documentation (POD for gitweb/cache.pm,
> new configuration variables in gitweb/README, perhaps "Gitweb caching"
> section in gitweb/README and gitweb/cache.pm mentioned in gitweb/INSTALL),
> and commits/patches marked '(WIP)' lacks proper commit message.
> 
> Just food for thought...
> 
> Table of contents:
> ~~~~~~~~~~~~~~~~~~
>  [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing
>                    for caching)
>  [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1)
>  [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching
>  [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration
>  [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set)
>                    output caching (WIP)
>  [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP)
>  [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP)
>  [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd'
>                    problem (WIP)
>  [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for
>                    filling cache (WIP)
>  [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>                    regenerating cache (WIP)
> 
> 
> Diffstat:
> ~~~~~~~~~
> 
>  gitweb/cache.pm                        |  566 ++++++++++
>  gitweb/gitweb.perl                     | 1923 +++++++++++++++++---------------
>  t/gitweb-lib.sh                        |    2 +
>  t/t9500-gitweb-standalone-no-errors.sh |   13 +
>  t/t9503-gitweb-caching.sh              |   32 +
>  t/t9503/test_cache_interface.pl        |  195 ++++
>  t/test-lib.sh                          |    3 +
>  7 files changed, 1836 insertions(+), 898 deletions(-)
>  create mode 100644 gitweb/cache.pm
>  create mode 100755 t/t9503-gitweb-caching.sh
>  create mode 100755 t/t9503/test_cache_interface.pl

* [RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP)
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (9 preceding siblings ...)
  2010-01-23 19:55                     ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
@ 2010-01-24 13:54                     ` Jakub Narebski
  2010-02-06  0:51                     ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
  11 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-24 13:54 UTC (permalink / raw)
  To: git
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch, as you can see, lacks a proper commit message: it is work in
progress (and very much a proof of concept patch).

This is Ajax-y JavaScript-based activity indicator.  It is very
bare-bones, without any error checking or fallback in the case we
misdetected and JavaScript is turned off.  Also currently it would
work only for subsequent links from gitweb, with 'js=1' added.

We need to mark the Ajax request to _not_ create the "Generating..." page.
I have used a custom 'X-Ajax' HTTP header for that; we could use
'Pragma: wait', or even an additional query parameter instead.

 gitweb/gitweb.perl |   69 +++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index cb46b8c..d422a4b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3219,6 +3219,9 @@ sub blob_contenttype {
 sub git_generating_data_html {
 	my ($cache, $key, $lock_fh) = @_;
 
+	if ($cgi->http('X-Ajax')) {
+		return;
+	}
 	if ($action eq 'atom' || $action eq 'rss' || $action eq 'opml' || # feeds
 	    $action eq 'blob_plain' || # unknown mimetype
 	    $action eq 'commitdiff_plain' || # text/plain
@@ -3241,6 +3244,7 @@ sub git_generating_data_html {
 		}
 	}
 
+	my $use_javascript = $cgi->param('js');
 	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
 	print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
 	                          -status=> '200 OK', -expires => 'now');
@@ -3253,31 +3257,58 @@ sub git_generating_data_html {
 <!-- git core binaries version $git_version -->
 <head>
 <meta http-equiv="content-type" content="text/html; charset=utf-8" />
-<meta http-equiv="refresh" content="0" />
+EOF
+	if (!$use_javascript) {
+		print STDOUT qq!<meta http-equiv="refresh" content="0" />\n!;
+	}
+	print STDOUT <<"EOF";
 <meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
 <meta name="robots" content="noindex, nofollow" />
 <title>$title</title>
 </head>
 <body>
 EOF
-	print STDOUT 'Generating..';
-
-	my $ready;
-	my ($wait, $wait_step, $wait_max); # in seconds
-	$wait = 0;
-	$wait_step = 2;
-	$wait_max = 20;
-	$| = 1; # autoflush
-	do {
-		print STDOUT '.';
-
-		sleep $wait_step if ($wait > 0);
-		$wait += $wait_step;
-
-		$ready = flock($lock_fh, LOCK_SH|LOCK_NB)
-			if $lock_fh;
-
-	} while (!$ready && ($wait < $wait_max));
+	if ($use_javascript) {
+		my $url_self = href(-replay=>1);
+		print STDOUT <<"EOF";
+Generating...<br />
+<script type="text/javascript" src="$javascript"></script>
+<script type="text/javascript">
+window.onload = function () {
+	xhr = createRequestObject();
+	xhr.onreadystatechange = function () {
+		if (xhr.readyState === 4 && xhr.status === 200) {
+			// intentionally overwrite document
+			document.write(xhr.responseText);
+		}
+	}
+	xhr.open('GET', '$url_self');
+	// mark request to not show "Generating..." page
+	xhr.setRequestHeader('X-Ajax', 'Yes');
+	xhr.send(null);
+}
+</script>
+EOF
+	} else {
+		print STDOUT 'Generating..';
+
+		my $ready;
+		my ($wait, $wait_step, $wait_max); # in seconds
+		$wait = 0;
+		$wait_step = 2;
+		$wait_max = 20;
+		$| = 1; # autoflush
+		do {
+			print STDOUT '.';
+			
+			sleep $wait_step if ($wait > 0);
+			$wait += $wait_step;
+			
+			$ready = flock($lock_fh, LOCK_SH|LOCK_NB)
+				if $lock_fh;
+			
+		} while (!$ready && ($wait < $wait_max));
+	}
 
 	print STDOUT <<"EOF";
 
-- 
1.6.6.1

* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching
  2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
                                       ` (10 preceding siblings ...)
  2010-01-24 13:54                     ` [RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP) Jakub Narebski
@ 2010-02-06  0:51                     ` J.H.
  2010-02-06 23:56                       ` Jakub Narebski
  11 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-02-06  0:51 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley

> Table of contents:
> ~~~~~~~~~~~~~~~~~~
>  [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing
>                    for caching)

This looks fine, I did some quick testing to verify that this would work
- and it does.

The only caveat to be aware of is that if the layer is going to
output binary data it needs to flip the whole stream to :raw before
outputting (this is going to be more specific to the caching layer).

One advantage to having the file handles separate is that it's easier to
distinguish if the data is going to need to be binary data that will
need to be flipped properly.

Also means you could cache the binary data differently than textual data.

I.E. binary data gets saved to disk, but page data gets saved to memcached.

Just food for thought, I'm not sure which way makes more sense
personally, though I would have a tendency to err on the side of
flexibility and have both.

>  [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1)

note: delaying additional comment till I've finished reading through the
basics of the following patches.

>  [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching

Ok this is quite the departure from what I had, I'm unsure that it's the
right way to go, but it obviously has merits (I.E. much simpler addition
of any Cache::Cache or CHI compatible caching layer)

This patch itself looks fine, and as it states it borrows heavily from
my basic implementation - just wraps it differently.  I might have some
thoughts on extending this a bit to be a bit more fleshed out from a
basic standpoint.

Need to dig through it some more, but I'm generally ok with it.

>  [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration

Looks fine to me, though the note about getting the errors should get
moved to previous patch, as it says.

Note: I'm going to stop here as the following are WIP and I want to play
around with this particular direction on my own a little more before
further comment.  There's some ideas running around I want to try and
get down in code first.  Me moving on and trying these other ideas is
not a reflection on the following patches, just some alternative
thinking before I discuss some other ideas on the following patches.

Also I've been sitting on this e-mail in this state for almost a week
while I've been playing with this and having to fight other fires and I
know that Jakub has been looking for commentary on this.

>  [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set)
>                    output caching (WIP)
>  [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP)
>  [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP)
>  [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd'
>                    problem (WIP)
>  [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for
>                    filling cache (WIP)
>  [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
>                    regenerating cache (WIP)

* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching
  2010-02-06  0:51                     ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
@ 2010-02-06 23:56                       ` Jakub Narebski
  2010-02-07 12:35                         ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-02-06 23:56 UTC (permalink / raw)
  To: J.H.; +Cc: git, John 'Warthog9' Hawley, Petr Baudis

On Sat, 6 Feb 2010, J.H. wrote:
> > Table of contents:
> > ~~~~~~~~~~~~~~~~~~
> >  [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing
> >                    for caching)

I am working on v2 of this series, where this patch is not necessary.

It uses *STDOUT->push_layer(scalar => \$data) and *STDOUT->pop_layer()
from PerlIO::Util if it is available, and manipulation of *STDOUT otherwise
(which means *STDOUT = $data_fh and not $out = $data_fh).  But I must say
that capturing STDOUT (only; STDERR is not captured) without requiring
extra Perl modules (like the recommended Capture::Tiny or e.g. IO::Capture),
and especially testing that it works correctly when capturing the output
of cache_fetch, is a serious PITA.
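
For readers unfamiliar with the technique, a minimal sketch of capturing
STDOUT using only core Perl is shown below.  It is not the code from the
v2 series; capture_stdout is a hypothetical helper, and the sketch relies
on localizing the *STDOUT glob and reopening it as an in-memory file.  It
captures only output printed from Perl code, not output of child
processes, and it leaves STDERR alone.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code and return everything it printed to
# STDOUT (explicitly or via the default output handle) as a string.
sub capture_stdout {
	my ($code) = @_;
	my $data = '';
	{
		# Give STDOUT a fresh, empty glob for the duration of this block;
		# the original handle is restored when the block is left, even if
		# $code dies.
		local *STDOUT;
		open(STDOUT, '>', \$data) or die "cannot redirect STDOUT: $!";
		$code->();
		close(STDOUT) or die "cannot close in-memory STDOUT: $!";
	}
	return $data;
}
```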

This patch has the advantage that all operations are simpler.  In
particular it is easy to have a section which should not be captured,
or where capture should be turned off (slightly different).

It has the disadvantage that all future contributions must use
"print $out <something>" / "print {$out} <something>", and that
contributions from before this change would have to be carefully
updated.  (Well, we could probably add a test that would check
that everything that needs to go to $out does, and everything that
shouldn't go to $out but to STDOUT doesn't.)

If I were to have such patch in new version of "gitweb output
caching" series, I would make the following changes:
* (optionally) use simpler 'print $out <sth>' instead of visually 
  distinct 'print {$out} <sth>', where from first glance one can
  see that $out is filehandle and not something to be printed
* use short filehandle name: $out, or $oh, or $o/$O.
* split above patch in 2 to 4 patches:
  - pure mechanical (scripted) change:
     + print <sth>     ->  print $out <sth>
     + printf(<sth>)   ->  printf($out <sth>)
     + binmode STDOUT  ->  binmode $out
    The last with possible exception of very first binmode call.
  - realign (purely whitespace change)
  - wrap too long lines (newlines and whitespace), optional
  - change $out to $bout/$bin ($binary_output_fh) where needed;
    but see comment below (optional)
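
The mechanical change listed above could look roughly like this. It is only a sketch: print_header() and its output are made up for the example, and only the $out convention comes from the patch.

```perl
use strict;
use warnings;

# $out is the single output handle; by default it is the real STDOUT.
our $out = \*STDOUT;

# Hypothetical page fragment, here only to show the converted calls.
sub print_header {
    binmode $out, ':utf8';                 # was: binmode STDOUT, ':utf8';
    print $out "<h1>summary</h1>\n";       # was: print "<h1>summary</h1>\n";
    printf $out "%d commits\n", 42;        # was: printf("%d commits\n", 42);
    print {$out} "<p>same handle</p>\n";   # block form, visually distinct
}

# Redirecting output is then one assignment away.
my $captured = '';
open(my $data_fh, '>', \$captured) or die "Cannot open in-memory file: $!";
{
    local $out = $data_fh;
    print_header();
}
close($data_fh);
print $out $captured;
```

The cost mentioned above is visible here too: any print that forgets $out silently bypasses the redirection.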

> 
> This looks fine, I did some quick testing to verify that this would work
> - and it does.

I have only run the tests, and didn't actually check that it works correctly.
This commit shouldn't change gitweb behaviour at all.

> 
> The only caveat that needs to be aware is that if the layer is going to
> output binary data it needs to flip the whole stream to :raw before
> outputting (this is going to be more specific to the caching layer).
> 
> One advantage to having the file handles separate is that it's easier to
> distinguish if the data is going to need to be binary data that will
> need to be flipped properly.

I don't think that it would be needed.  

First, all mode changing operations, i.e. calls to binmode, are changed
to act on $out rather than on STDOUT.  It means that if we are
using 'in memory file' to capture output to scalar variable, then captured
data would be properly converted in variable.  So it would be enough to
save this variable in :raw mode to file.  If we are saving directly to
cache file, then of course saved data would go through layer and would
be converted properly.  In any case in cache file we would have _already_
_converted_ data.

This means that regardless whether $out used ':utf8' (pseudo)layer,
or ':raw' (pseudo)layer, if we read from cache file in ':raw' (binary mode)
and print data from cache to original (true) STDOUT also in ':raw' mode,
we would print correctly formatted data.
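
A small sketch of this reasoning, assuming an in-memory file for capture and a temporary file standing in for the cache file (neither is the actual gitweb code):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Capture text through a ':utf8' layer into an in-memory file.
my $captured = '';
open(my $out, '>', \$captured) or die "Cannot open in-memory file: $!";
binmode $out, ':utf8';
print {$out} "Nar\x{0119}bski\n";   # U+0119 is encoded as two bytes
close($out);

# $captured now holds already-encoded bytes; store them untouched.
my ($cache_fh, $cache_file) = tempfile(UNLINK => 1);
binmode $cache_fh, ':raw';
print {$cache_fh} $captured;
close($cache_fh) or die "Cannot close cache file: $!";

# Serving from cache: read raw, print raw, no further conversion.
open(my $in, '<:raw', $cache_file) or die "Cannot open $cache_file: $!";
my $from_cache = do { local $/; <$in> };
close($in);

binmode STDOUT, ':raw';
print STDOUT $from_cache;
```

The same round trip works unchanged if $out had been in ':raw' mode, because the cache only ever sees bytes.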

> 
> Also means you could cache the binary data differently than textual data.
> 
> I.E. binary data gets saved to disk, but page data gets saved to memcached.

That's true, but on the other hand it would be easy to add some extra
command marking data as binary below binmode.  Or we can examine IO
layers (using PerlIO::get_layers($out); the PerlIO module is in Perl
core) if there is 'utf8' layer or 'raw' (pseudo)layer.
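
Examining the layers could be sketched as follows. handle_is_text() is a hypothetical helper; note that ':raw' does not show up as a named layer, so the sketch treats the absence of 'utf8' as "binary".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# True if the handle currently encodes text as UTF-8 on output.
# handle_is_text() is a hypothetical helper name.
sub handle_is_text {
    my ($fh) = @_;
    # 'output => 1' asks about the output side of the handle.
    return scalar grep { $_ eq 'utf8' } PerlIO::get_layers($fh, output => 1);
}

my ($out) = tempfile(UNLINK => 1);

binmode $out, ':utf8';
print STDOUT "text mode\n"   if handle_is_text($out);

binmode $out, ':raw';
print STDOUT "binary mode\n" if !handle_is_text($out);
```

A caching layer could use such a check at the point where output is saved, for example to send binary data to disk and text to another backend.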

> 
> Just food for thought, I'm not sure which way makes more sense
> personally, though I would have a tendency to err on the side of
> flexibility and have both.

It might be a good idea... but nevertheless I'd like to have a short name
for the binary filehandle, if we decide to keep it.  What should it be?
$bout, $bin, $B, $bin_out, $out_bin, $bin_fh?

> 
> >  [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1)
> 
> note: delaying additional comment till I've finished reading through the
> basics of the following patches.

This is to later use the _full_ _normalized_ URI as cache key for a given page.
IIRC in your original patch you ignored path_info; but on the other hand
git.kernel.org has path_info feature turned off...

> 
> >  [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching
> 
> Ok this is quite the departure from what I had, I'm unsure that it's the
> right way to go, but it obviously has merits (I.E. much simpler addition
> of any Cache::Cache or CHI compatible caching layer)
> 
> This patch itself looks fine, and as it states it borrows heavily from
> my basic implementation - just wraps it differently.  I might have some
> thoughts on extending this a bit to be a bit more flushed out from a
> basic standpoint.
> 
> Need to dig through it some more, but I'm generally ok with it.

Note that the new implementation in the (not sent yet) new version of 
"gitweb output caching" series is based more on newer and more modern
CHI unified interface rather than older Cache::Cache interface.  It
is I think much cleaner and easier to read.

The major difference from your implementation is that in my version
the gitweb caching engine uses "save to temporary file + rename file
to final name" method to have atomic write to cache (atomic cache
filling).  It should be more robust, but OTOH it introduces a bit of
performance penalty.  With locking and single writer we could use
predictable temporary file name rather than using tempfile/mkstemp
or equivalent from File::Temp, or UUID based filename like CHI does
it.
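
The "temporary file + rename" method can be sketched like this, using File::Temp for the temporary name. write_atomically() and the file names are assumptions made for the example, not the code in cache.pm.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Basename qw(dirname);

# Write $data to $file so that readers see either the old or the new
# content, never a partially written file.
sub write_atomically {
    my ($file, $data) = @_;

    # The temporary file must live on the same filesystem as $file,
    # otherwise rename() is not atomic (or fails outright).
    my ($fh, $tempname) = tempfile("cacheXXXXXX", DIR => dirname($file));
    binmode $fh, ':raw';
    print {$fh} $data   or die "Cannot write to $tempname: $!";
    close($fh)          or die "Cannot close $tempname: $!";
    rename($tempname, $file)
        or die "Cannot rename $tempname to $file: $!";
}

my $dir = tempdir(CLEANUP => 1);
write_atomically("$dir/summary.html", "<p>cached</p>\n");
```

With a single writer under a lock, the tempfile() call could be replaced by a predictable name such as "$file.tmp", which is the trade-off described above.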

Also, tests.

Current code (even the v2 version) lacks proper error detection, error
signalling and logging.

> 
> >  [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration
> 
> Looks fine to me, though the note about getting the errors should get
> moved to previous patch, as it says.

I wanted to get this series out faster, that is why it is not polished.

> 
> Note: I'm going to stop here as the following are WIP and I want to play
> around with this particular direction on my own a little more before
> further comment.  There's some ideas running around I want to try and
> get down in code first.  Me moving on and trying these other ideas is
> not a reflection on the following patches, just some alternative
> thinking before I discuss some other ideas on the following patches.

Take a look at gitweb/cache-kernel-v2 branch (the new caching series).
Note however that it would be subject to rebasing / changes.

> 
> Also I've been sitting on this e-mail in this state for almost a week
> while I've been playing with this and having to fight other fires and I
> know that Jakub has been looking for commentary on this.

Thank you very much for your commentary, in spite of your heavy load.

> 
> >  [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set)
> >                    output caching (WIP)
> >  [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP)
> >  [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP)
> >  [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd'
> >                    problem (WIP)
> >  [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for
> >                    filling cache (WIP)
> >  [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when
> >                    regenerating cache (WIP)

There is new version of this series in gitweb/cache-kernel-v2 in my
git/jnareb-git.git fork (clone) of git.git repository at repo.or.cz.
Now all commits have proper description (for first series one had to
read comment section in emails for commit description), and all features
are tested (at least on API level, and to some extent) -- full tests
do require having PerlIO::Util installed (I have done it following
local::lib and installing it from 'cpan' client), though.

Also all features are fully configurable, to even greater extent than
in original series by J.H. (this is what v1 was lacking).  And there
is (see diffstat) section about caching in gitweb/README.

The following changes since commit d5f8a3d6f4d946c33459e00edf02819f89711777:
  Junio C Hamano (1):
        Merge branch 'master' into next

are available in the git repository at:

  git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel-v2

You can view it via gitweb at:

  http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel-v2
  http://repo.or.cz/w/git/jnareb-git.git/log/refs/heads/gitweb/cache-kernel-v2

SHORTLOG (10):
      gitweb: href(..., -path_info => 0|1)
      gitweb/cache.pm - Very simple file based caching
      gitweb/cache.pm - Stat-based cache expiration
      gitweb: Use Cache::Cache compatibile (get, set) output caching
      gitweb/cache.pm - Adaptive cache expiration time
      gitweb: Use CHI compatibile (compute method) caching
      gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
      gitweb/cache.pm - Serve stale data when waiting for filling cache
      gitweb/cache.pm - Regenerate (refresh) cache in background
      gitweb: Show appropriate "Generating..." page when regenerating cache

 gitweb/README                          |   70 +++++
 gitweb/cache.pm                        |  527 ++++++++++++++++++++++++++++++++
 gitweb/gitweb.perl                     |  301 +++++++++++++++++-
 t/gitweb-lib.sh                        |    2 +
 t/t9500-gitweb-standalone-no-errors.sh |   19 ++
 t/t9503-gitweb-caching.sh              |   32 ++
 t/t9503/test_cache_interface.pl        |  380 +++++++++++++++++++++++
 t/test-lib.sh                          |    3 +
 8 files changed, 1319 insertions(+), 15 deletions(-)
 create mode 100644 gitweb/cache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/test_cache_interface.pl

-- 
Jakub Narebski
Poland


* Re: [RFC PATCH 00/10] gitweb: Simple file based output caching
  2010-02-06 23:56                       ` Jakub Narebski
@ 2010-02-07 12:35                         ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-02-07 12:35 UTC (permalink / raw)
  To: J.H.; +Cc: git, John 'Warthog9' Hawley

On Sun, 7 Feb 2010, Jakub Narebski wrote:

> There is new version of this series in gitweb/cache-kernel-v2 in my
> git/jnareb-git.git fork (clone) of git.git repository at repo.or.cz.
> Now all commits have proper description (for first series one had to
> read comment section in emails for commit description), [...]

Below there are commit messages for gitweb/cache-kernel-v2 branch after
rebase and fixups:

commit 560e2ab10d0f8457fbeca7a26814ff3e32396f7b
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 11:27:22 2010 +0100

    gitweb: href(..., -path_info => 0|1)
    
    If named boolean option -path_info is passed to href() subroutine, use
    its value to decide whether to generate path_info URL form.  If this
    option is not passed, href() queries 'pathinfo' feature to check
    whether to generate path_info URL (if generating path_info link is
    possible at all).
    
    href(-replay=>1, -path_info=>0) is meant to be used to generate a key
    for caching gitweb output; alternate solution would be to use freeze()
    from Storable (core module) on %input_params hash (or its reference),
    e.g.:
      $key = freeze \%input_params;
    or other serialization technique.
    
    While at it document extra options/flags to href().
    
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/gitweb.perl |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)
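
The freeze()-based alternative mentioned in the commit message above might be sketched as follows. params_to_key() and the use of an MD5 digest are assumptions made for this example; the commit itself uses href() to build the key.

```perl
use strict;
use warnings;
use Storable qw(freeze);
use Digest::MD5 qw(md5_hex);

# Turn a hash of request parameters into a fixed-length cache key.
# params_to_key() is a hypothetical helper name.
sub params_to_key {
    my ($params) = @_;
    # Without canonical mode, hash key order (and so the frozen
    # string) may differ between two equal hashes.
    local $Storable::canonical = 1;
    return md5_hex(freeze($params));
}

my %input_params = (project => 'git.git', action => 'summary');
print params_to_key(\%input_params), "\n";
```

One caveat of this approach is that Storable distinguishes the number 1 from the string "1", so parameters should be normalized to one form before freezing.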

commit dd6e8dc27d5b799bd2a1aed03738195dfe3bc5e7
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:13:06 2010 +0100

    gitweb/cache.pm - Very simple file based caching
    
    This is first step towards implementing file based output (response)
    caching layer that is used on such large sites as kernel.org.
    
    This patch introduces GitwebCache::SimpleFileCache package, which
    follows Cache::Cache / CHI interface, although it does not implement it
    fully.  The intent of following established convention is to be able
    in the future to replace our simple file based cache e.g. by one using
    memcached.
    
    Like in original patch by John 'Warthog9' Hawley (J.H.) (the one this
    commit intends to be incremental step to), the data is stored in the
    cache as-is, without adding metadata (like expiration date), and
    without serialization (which means only scalar data).
    
    To be implemented (from original patch by J.H.):
    * cache expiration (based on file stats, current time and global
      expiration time); currently elements in cache do not expire
    * actually using this cache in gitweb, except error pages
    * adaptive cache expiration, based on average system load
    * optional locking interface, where only one process can update cache
      (using flock)
    * server-side progress indicator when waiting for filling cache,
      which in turn requires separating situations (like snapshots and
      other non-HTML responses) where we should not show 'please wait'
      message
    
    Possible extensions (beyond what was in original patch):
    * (optionally) show information about cache utilization
    * AJAX (JavaScript-based) progress indicator
    * JavaScript code to update relative dates in cached output
    * make cache size-aware (try to not exceed specified maximum size)
    * utilize X-Sendfile header (or equivalent) to show cached data
      (optional, as it makes sense only if web server supports sendfile
      feature and has it enabled)
    * variable expiration feature from CHI, allowing items to expire a bit
      earlier than the stated expiration time to prevent cache miss
      stampedes (although locking, if available, should take care of
      this).
    
    The code of GitwebCache::SimpleFileCache package in gitweb/cache.pm
    was heavily based on file-based cache in Cache::Cache package, i.e.
    on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on
    file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver
    (including implementing atomic write, something that original patch
    lacks).
    
    This patch does not yet enable output caching in gitweb (it doesn't
    have all required features yet); on the other hand it includes tests,
    currently testing only cache Perl API.
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm                 |  269 +++++++++++++++++++++++++++++++++++++++
 t/t9503-gitweb-caching.sh       |   32 +++++
 t/t9503/test_cache_interface.pl |   84 ++++++++++++
 t/test-lib.sh                   |    3 +
 4 files changed, 388 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/cache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/test_cache_interface.pl

commit 3914e7da792fec50fcc64c0e644d54cf4451703a
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:13:17 2010 +0100

    gitweb/cache.pm - Stat-based cache expiration
    
    Add stat-based cache expiration to file-based GitwebCache::SimpleFileCache.
    Contrary to the way other caching interfaces such as Cache::Cache and CHI
    do it, the time cache element expires in is _global_ value associated with
    cache instance, and is not local property of cache entry.  (Currently cache
    entry does not store any metadata associated with entry... which means that
    there is no need for serialization / marshalling / freezing and thawing.)
    Default expire time is -1, which means never expire.
    
    To check if cache entry is expired, GitwebCache::SimpleFileCache compares
    difference between mtime (last modify time) of a cache file and current time
    with (global) time to expire.  It is done using CHI-compatible is_valid()
    method.
    
    Add some tests checking that expiring works correctly (on the level of API).
    
    To be implemented (from original patch by J.H.):
    * actually using this cache in gitweb, except error pages
    * adaptive cache expiration, based on average system load
    * optional locking interface, where only one process can update cache
      (using flock)
    * server-side progress indicator when waiting for filling cache,
      which in turn requires separating situations (like snapshots and
      other non-HTML responses) where we should not show 'please wait'
      message
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm                 |   34 ++++++++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl |   10 ++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)
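
A rough sketch of the stat-based check described above. This is a standalone function taking a file name, whereas the real is_valid() is a method on the cache object taking a key.

```perl
use strict;
use warnings;

# $expires_in is a cache-wide setting in seconds; a negative value
# means "never expire".
sub is_valid {
    my ($file, $expires_in) = @_;

    my @stat = stat($file)
        or return 0;              # no cache file, so nothing valid
    return 1 if $expires_in < 0;  # never expire

    my $age = time() - $stat[9];  # element 9 of stat() is mtime
    return $age < $expires_in ? 1 : 0;
}
```

Since the only metadata is the file's mtime, nothing needs to be serialized alongside the cached data.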

commit a55625cb0f2d6c08a28e774fd2ddb4e5347a24b3
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:13:27 2010 +0100

    gitweb: Use Cache::Cache compatible (get, set) output caching
    
    This commit actually adds output caching to gitweb, as we have now
    minimal features required for it in GitwebCache::SimpleFileCache
    (a 'dumb' but fast file-based cache engine).  To enable cache you need
    at least set $caching_enabled to true in gitweb config, and copy cache.pm
    from gitweb/ alongside gitweb.cgi - this is described in more detail
    in the new "Gitweb caching" section in gitweb/README
    
    Currently cache support related subroutines in cache.pm (which are
    outside GitwebCache::SimpleFileCache package) are not well separated
    from gitweb script itself; cache.pm lacks encapsulation.  cache.pm
    assumes that there are href() subroutine and %actions variable, and
    that there exist $actions{$action} (where $action is parameter passed
    to cache_fetch), and it is a code reference (see also comments in
    t/t9503/test_cache_interface.pl).  This is remaining artifact from the
    original patch by J.H. (which also had cache_fetch() subroutine).
    
    Gitweb itself uses directly only cache_fetch, to get page from cache
    or to generate page and save it to cache, and cache_stop, to be used
    in die_error subroutine, as currently error pages are not cached.
    
    The cache_fetch subroutine captures output (from STDOUT only, as
    STDERR is usually logged) using either ->push_layer()/->pop_layer()
    from PerlIO::Util submodule (if it is available), or by setting and
    restoring *STDOUT.  Note that only the former could be verified to
    work reliably in t9503 test!
    
    Enabling caching causes the following additional changes to gitweb
    output:
    * Disables content-type negotiation (choosing between 'text/html'
      mimetype and 'application/xhtml+xml') when caching, as there is no
      content-type negotiation done when retrieving page from cache.
      Use 'text/html' mimetype that can be used by all browsers.
    * Disable timing info (how much time it took to generate original
      page, and how many git commands it took), and in its place show when
      page was originally generated (in GMT / UTC timezone).
    
    Add basic tests of caching support to t9500-gitweb-standalone-no-errors
    test: set $caching_enabled to true and check for errors for first time
    run (generating cache) and second time run (retrieving from cache) for a
    single view - summary view for a project.
    
    If PerlIO::Util is available (see comments), test that cache_fetch
    behaves correctly, namely that it saves and restores action output in
    cache, and that it prints generated output or cached output.
    
    
    To be implemented (from original patch by J.H.):
    * adaptive cache expiration, based on average system load
    * optional locking interface, where only one process can update cache
      (using flock)
    * server-side progress indicator when waiting for filling cache,
      which in turn requires separating situations (like snapshots and
      other non-HTML responses) where we should not show 'please wait'
      message
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/README                          |   70 ++++++++++++++++++++++
 gitweb/cache.pm                        |   78 ++++++++++++++++++++++++
 gitweb/gitweb.perl                     |  102 ++++++++++++++++++++++++++++----
 t/gitweb-lib.sh                        |    2 +
 t/t9500-gitweb-standalone-no-errors.sh |   19 ++++++
 t/t9503/test_cache_interface.pl        |   93 +++++++++++++++++++++++++++++
 6 files changed, 352 insertions(+), 12 deletions(-)

commit 3e471ebd31e881ce1439f23075378c2ec6b95e4d
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:13:31 2010 +0100

    gitweb/cache.pm - Adaptive cache expiration time
    
    Add to GitwebCache::SimpleFileCache support for adaptive lifetime
    (cache expiration) control.  Cache lifetime can be increased or
    decreased by any factor, e.g. load average, through the definition
    of the 'check_load' callback.
    
    Note that using ->set_expires_in, or unsetting 'check_load' via
    ->set_check_load(undef) turns off adaptive caching.
    
    Make gitweb automatically adjust cache lifetime by load, using
    get_loadavg() function.  Define and describe default parameters for
    dynamic (adaptive) cache expiration time control.
    
    There are some very basic tests of dynamic expiration time in t9503,
    namely checking if dynamic expire time is within given upper and lower
    bounds.
    
    To be implemented (from original patch by J.H.):
    * optional locking interface, where only one process can update cache
      (using flock)
    * server-side progress indicator when waiting for filling cache,
      which in turn requires separating situations (like snapshots and
      other non-HTML responses) where we should not show 'please wait'
      message
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm                 |   55 +++++++++++++++++++++++++++++++++++---
 gitweb/gitweb.perl              |   27 +++++++++++++++++-
 t/t9503/test_cache_interface.pl |   22 +++++++++++++++
 3 files changed, 97 insertions(+), 7 deletions(-)
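
One way such load-based adjustment could work is sketched below. The linear formula, the option names other than 'check_load' and 'expires_min', and all default values are assumptions made for the example, not what gitweb uses.

```perl
use strict;
use warnings;

# Hypothetical parameters, not the names or defaults used in gitweb.
my %opts = (
    expires_min    => 20,     # seconds, used when the machine is idle
    expires_max    => 1200,   # seconds, upper bound under heavy load
    expires_factor => 60,     # extra seconds of lifetime per unit of load
    check_load     => \&get_loadavg,
);

# 1-minute load average on Linux; 0 where /proc/loadavg is unavailable.
sub get_loadavg {
    open(my $fh, '<', '/proc/loadavg') or return 0;
    my $line = <$fh>;
    close($fh);
    return (defined $line && $line =~ /^([0-9.]+)/) ? $1 : 0;
}

# Higher load means cached pages are kept longer.
sub adaptive_expires_in {
    my ($o) = @_;
    my $expires = $o->{expires_min}
                + $o->{expires_factor} * $o->{check_load}->();
    $expires = $o->{expires_min} if $expires < $o->{expires_min};
    $expires = $o->{expires_max} if $expires > $o->{expires_max};
    return $expires;
}

printf "expire in %d seconds\n", adaptive_expires_in(\%opts);
```

Keeping the load check behind a callback means the same cache can be driven by any other metric.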

commit 984390f99c33d82cd4ddbfa6e00c721d9e74cddb
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:13:52 2010 +0100

    gitweb: Use CHI compatible (compute method) caching
    
    If $cache provides CHI compatible ->compute($key, $code) method, use it
    instead of Cache::Cache compatible ->get($key) and ->set($key, $data).
    While at it, refactor regenerating cache into cache_calculate subroutine.
    
    GitwebCache::SimpleFileCache provides 'compute' method, which currently
    simply uses 'get' and 'set' methods in prescribed manner.  Nevertheless
    'compute' method can be more flexible in choosing when to refresh cache,
    and which process is to refresh/(re)generate cache entry.  This method
    would use (advisory) locking to prevent 'cache miss stampede' (aka
    'stampeding herd') problem in the next commit.
    
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm |   39 ++++++++++++++++++++++++++++++++++++---
 1 files changed, 36 insertions(+), 3 deletions(-)
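
The relation between the two interfaces can be sketched with a toy in-memory cache. TinyCache is a hypothetical stand-in written for this example, not GitwebCache::SimpleFileCache.

```perl
use strict;
use warnings;

package TinyCache;   # hypothetical in-memory stand-in for a real cache

sub new { my ($class) = @_; return bless { data => {} }, $class; }

# Cache::Cache style interface
sub get { my ($self, $key) = @_; return $self->{data}{$key}; }
sub set { my ($self, $key, $value) = @_; $self->{data}{$key} = $value; }

# CHI style interface: return cached value, generating it on a miss.
sub compute {
    my ($self, $key, $code) = @_;
    my $data = $self->get($key);
    if (!defined $data) {
        $data = $code->();
        $self->set($key, $data);
    }
    return $data;
}

package main;

my $cache = TinyCache->new();
my $calls = 0;
my $page  = $cache->compute('summary', sub { $calls++; "<p>summary</p>\n" });
$page     = $cache->compute('summary', sub { $calls++; "<p>summary</p>\n" });
print "generated $calls time(s)\n";
```

Because the caller hands over the generating code, compute() is free to decide who runs it and when.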

commit 7d0109e4379f5187364edf7c25cdbc5247609f64
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:18:14 2010 +0100

    gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
    
    In the ->compute($key, $code) method from GitwebCache::SimpleFileCache,
    use locking (via flock) to ensure that only one process would generate
    data to update/fill-in cache; the rest would wait for the cache to
    be (re)generated and would read data from cache.
    
    Currently this feature can not be disabled (via %cache_options).
    
    
    A test in t9503 shows that in the case where there are two clients
    trying to simultaneously access non-existent or stale cache entry,
    (and generating data takes (artificially) a bit of time), if they are
    using ->compute method the data is (re)generated once, as opposed to
    if those clients are just using ->get/->set methods.
    
    To be implemented (from original patch by J.H.):
    * background building, and showing stale cache
    * server-side progress indicator when waiting for filling cache,
      which in turn requires separating situations (like snapshots and
      other non-HTML responses) where we should not show 'please wait'
      message
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm                 |   29 ++++++++++++++++-
 t/t9503/test_cache_interface.pl |   65 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 2 deletions(-)
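
A simplified sketch of the locking idea, assuming one cache file per entry and a separate "$file.lock" lockfile. All names here are hypothetical, and the non-atomic cache_set() is a shortcut for brevity.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Minimal file-backed get/set, one file per entry (hypothetical helpers).
sub cache_get {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or return undef;
    local $/;
    my $data = <$fh>;
    close($fh);
    return $data;
}

sub cache_set {
    my ($file, $data) = @_;
    open(my $fh, '>:raw', $file) or die "Cannot write $file: $!";
    print {$fh} $data;
    close($fh) or die "Cannot close $file: $!";
}

sub compute {
    my ($file, $code) = @_;

    my $data = cache_get($file);
    return $data if defined $data;

    open(my $lock_fh, '+>>', "$file.lock")
        or die "Cannot open lockfile: $!";
    if (flock($lock_fh, LOCK_EX | LOCK_NB)) {
        # We hold the writer's lock.  Re-check first: the entry may
        # have been filled between our first look and taking the lock.
        $data = cache_get($file);
        if (!defined $data) {
            $data = $code->();
            cache_set($file, $data);
        }
    } else {
        # Another process is generating; block until it is done ...
        flock($lock_fh, LOCK_SH) or die "Cannot get shared lock: $!";
        $data = cache_get($file);
        # ... and generate ourselves if it failed to fill the cache.
        $data = $code->() unless defined $data;
    }
    close($lock_fh);   # closing the handle releases the lock
    return $data;
}

my $dir   = tempdir(CLEANUP => 1);
my $calls = 0;
compute("$dir/entry", sub { $calls++; "data\n" }) for 1 .. 2;
print "generated $calls time(s)\n";
```

In a real cache the write would go through a temporary file and rename, so that a reader checking the cache before taking any lock never sees a half-written entry.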

commit e7985f69eb9000860b155939d5fd7040e30f682f
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:19:21 2010 +0100

    gitweb/cache.pm - Serve stale data when waiting for filling cache
    
    When process fails to acquire exclusive (writers) lock, then instead
    of waiting for the other process to (re)generate and fill cache, serve
    stale (expired) data from cache.  This is of course possible only if
    there is some stale data in cache for given key.
    
    This feature of GitwebCache::SimpleFileCache is used only for an
    ->update($key, $code) method.  It is controlled by 'max_lifetime'
    cache parameter; you can set it to -1 to always serve stale data
    if it exists, and you can set it to 0 (or any value smaller than
    'expires_min') to turn this feature off.
    
    This feature, as it is implemented currently, makes ->update() method a
    bit asymmetric with respect to process that acquired writers lock and
    those processes that didn't, which can be seen in the new test in t9503.
    The process that is to regenerate (refresh) data in cache must wait for
    the data to be generated in full before showing anything to client, while
    the other processes show stale (expired) data immediately.  In order to
    remove or reduce this asymmetry gitweb would need to employ one of the two
    alternate solutions.  Either data should be (re)generated in background,
    so that process that acquired writers lock would generate data in
    background while serving stale data, or alternatively the process that
    generates data should pass output to original STDOUT while capturing it
    ("tee" output).
    
    When developing this feature, ->is_valid() method acquired additional
    extra optional parameter, where one can pass expire time instead of using
    cache-wide global expire time.
    
    To be implemented (from original patch by J.H.):
    * background building,
    * server-side progress indicator when waiting for filling cache,
      which in turn requires separating situations (like snapshots and
      other non-HTML responses) where we should not show 'please wait'
      message
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm                 |   23 ++++++++++----
 gitweb/gitweb.perl              |    8 +++++
 t/t9503/test_cache_interface.pl |   63 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 86 insertions(+), 8 deletions(-)
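
The distinction between fresh, stale and unusable entries could be sketched as below. entry_state() is a hypothetical helper working on a file name. Only the meaning of 'max_lifetime' (-1 for "always serve stale", 0 for "off") follows the commit message above.

```perl
use strict;
use warnings;

# Classify a cache file as 'fresh', 'stale' (expired but still
# servable) or 'miss'.  Both limits are in seconds; a negative
# value means "no limit".
sub entry_state {
    my ($file, $expires_in, $max_lifetime) = @_;

    my @stat = stat($file) or return 'miss';
    my $age  = time() - $stat[9];   # seconds since last modification

    return 'fresh' if $expires_in   < 0 || $age < $expires_in;
    return 'stale' if $max_lifetime < 0 || $age < $max_lifetime;
    return 'miss';
}
```

A process that failed to get the writer's lock would serve the entry when the state is 'stale', and wait for the writer otherwise.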

commit 19911970b8a811a6382e39a10b071bff1dd4bd70
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:20:46 2010 +0100

    gitweb/cache.pm - Regenerate (refresh) cache in background
    
    This commit removes asymmetry in serving stale data (if it exists)
    when regenerating cache in GitwebCache::SimpleFileCache.  The process
    that acquired exclusive (writers) lock, and is therefore selected to
    be the one that (re)generates data to fill the cache, can now generate
    data in background, while serving stale data.
    
    This feature can be enabled or disabled on demand via 'background_cache'
    cache parameter.  It is turned on by default.
    
    To be implemented (from original patch by J.H.):
    * server-side progress indicator when waiting for filling cache,
      which in turn requires separating situations (like snapshots and
      other non-HTML responses) where we should not show 'please wait'
      message
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm                 |   36 +++++++++++++++++++++++++++++-------
 gitweb/gitweb.perl              |    9 +++++++++
 t/t9503/test_cache_interface.pl |   14 ++++++++------
 3 files changed, 46 insertions(+), 13 deletions(-)
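
Background regeneration could be sketched with a plain fork(), as below. refresh_in_background() and the "$file.$$.tmp" naming are assumptions made for this example; a CGI script would in addition need the child to close its copy of STDOUT so that the web server does not wait for it.

```perl
use strict;
use warnings;
use POSIX ();
use File::Temp qw(tempdir);

# Regenerate $file in a child process and return the child's PID at
# once, so the caller can serve stale data in the meantime.
sub refresh_in_background {
    my ($file, $code) = @_;

    my $pid = fork();
    die "Cannot fork: $!" unless defined $pid;
    return $pid if $pid;          # parent: carry on immediately

    # Child: generate, store via temporary file + rename, then leave
    # without running the parent's END blocks or destructors.
    my $ok = eval {
        my $tmp = "$file.$$.tmp";
        open(my $fh, '>:raw', $tmp) or die "Cannot write $tmp: $!";
        print {$fh} $code->();
        close($fh)                 or die "Cannot close $tmp: $!";
        rename($tmp, $file)        or die "Cannot rename $tmp: $!";
        1;
    };
    POSIX::_exit($ok ? 0 : 1);
}

my $dir = tempdir(CLEANUP => 1);
my $pid = refresh_in_background("$dir/entry", sub { "fresh data\n" });
# ... the parent would print stale data to the client here ...
waitpid($pid, 0);
```

The waitpid() call is only there to keep the example tidy; a long-running parent would have to reap the child in some other way.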

commit ce97bb5bc1660f6d5c9b9be68c556ac94097978c
Author: Jakub Narebski <jnareb@gmail.com>
Date:   Sun Feb 7 13:21:10 2010 +0100

    gitweb: Show appropriate "Generating..." page when regenerating cache
    
    When there exists a stale/expired (but not too stale) version of
    (re)generated page in cache, gitweb returns stale version (and updates
    cache in background, assuming 'background_cache' is set to true value).
    When there is no stale version suitable to serve the client, currently
    we have to wait for the data to be generated in full before showing it.
    Add to GitwebCache::SimpleFileCache, via 'generating_info' callback,
    the ability to show user some activity indicator / progress bar, to
    show that we are working on generating data.
    
    Gitweb itself uses "Generating..." page as activity indicator, which
    redirects (via <meta http-equiv="Refresh" ...>) to refreshed version
    of the page after the cache is filled (via trick of not closing page
    and therefore not closing connection till data is available in cache,
    checked by getting shared/readers lock on lockfile for cache entry).
    The git_generating_data_html() subroutine, which is used by gitweb
    to implement this feature, is highly configurable: you can choose
    initial delay, frequency of writing some data so that connection
    won't get closed, and maximum time to wait for data in "Generating..."
    page (see %generating_options hash).
    
    Currently git_generating_data_html() contains hardcoded "whitelist" of
    actions for which such HTML "Generating..." page makes sense.
    
    
    This implements final feature from the original gitweb output caching
    patch by J.H.
    
    Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
    Signed-off-by: Jakub Narebski <jnareb@gmail.com>

 gitweb/cache.pm                 |   23 +++++-
 gitweb/gitweb.perl              |  154 ++++++++++++++++++++++++++++++++++++++-
 t/t9503/test_cache_interface.pl |   45 +++++++++++
 3 files changed, 216 insertions(+), 6 deletions(-)
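
The start of such a "Generating..." page might be produced roughly like this. generating_page() and its markup are made up for the example and are not taken from git_generating_data_html().

```perl
use strict;
use warnings;

# Build the start of a "Generating..." page that asks the browser to
# reload $url after $delay seconds.  $url is assumed to be
# HTML-escaped already.  The page is deliberately left unclosed.
sub generating_page {
    my ($url, $delay) = @_;
    return <<"EOF";
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Refresh" content="$delay; url=$url">
<title>Generating...</title>
</head>
<body>
<p>Generating...</p>
EOF
}

print generating_page('/?p=git.git;a=summary', 0);
```

The caller would keep the connection open, printing a little more output at intervals, and close the page once a shared lock on the entry's lockfile shows that the cache has been filled.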

-- 
Jakub Narebski
Poland


[parent not found: <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>]

* Re: [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching)
       [not found]                   ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>
@ 2010-01-23  0:48                     ` Jakub Narebski
  2010-02-07 21:32                     ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23  0:48 UTC (permalink / raw)
  To: git; +Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley

As you can (or rather can't ;-)) see this patch didn't make it to the list,
because with 119,993 characters in format-patch output it probably exceeds
the 100,000 character message size limit on VGER.

The problem is that it doesn't make sense to send partial patch... well,
perhaps reindent and breaking of exceedingly long lines should be split
into separate patch...

For now you can view the patch via gitweb
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557
and apply it from
  http://repo.or.cz/w/git/jnareb-git.git/patch/0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557

The comment for this message (for this patch) can be seen below.

-- >8 --
On Sat, 23 Jan 2010, Jakub Narebski wrote:

> This means replacing
> 
>   print <something>;
> by
>   print {$out} <something>;
> 
> and
> 
>   binmode STDOUT, <layer>;
> by
>   binmode $out, <layer>;
> 
> where $out is global variable set to \*STDOUT at the beginning of
> gitweb, but after reading gitweb config.  This way it would be simple
> to e.g. tie output filehandle or use PerlIO layers to simultaneously
> write to standard output and to some specified file (like "tee"
> utility does), or redirect output to a scalar, or a file.
> 
> die_error (re)sets $out to \*STDOUT; we would (probably) want to treat
> errors in a special way, and do not cache them.
> 
> 
> The only other differences are reindent of continued lines (if
> needed), and sometimes word-wrapping lines which this change made too
> long.
> 
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
> ---
> This patch is meant as (straight) replacement for the following patch
> by J.H. (John 'Warthog9' Hawley):
> * [PATCH 8/9] gitweb: Convert output to using indirect file handle
>   Message-ID: <1263432185-21334-9-git-send-email-warthog9@eaglescrag.net>
>   http://permalink.gmane.org/gmane.comp.version-control.git/136915
> 
> Actually this patch precedes (was written before) the patch by J.H.
> 
> This patch was written _before_ comment from Junio that it would be
> better to simply use
>   print $out <something>;
> and do not try to be too clever.
> 
> 
> Differences from patch by J.H.:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> * Only one output handle, instead of having one output handle for text,
>   (':utf8'), and another output handler for binary files (:raw).  
> 
>   I do assume that I can write to handler with appropriate layer:
>   :utf8/:raw, and then I can simply read from cache file in :raw
>   binmode, as the data is already converted correctly.
> 
> * Shorter name for output handle: $out instead of $output_handler
> 
> * Set output handler to \*STDOUT (default value in declaration of this
>   variable, and also in die_error() subroutine), instead of *STDOUT.
>   This way $out is indirect filehandle, instead of using direct
>   filehandle which is _global_ to current package (see perlopentut(1)).
> 
> * Reindent continuation lines, i.e.
> 
>     print <line1> .
>           <line2>;
> 
>   got replaced (reindented) with
> 
>     print {$out} <line1> .
>                  <line2>;
> 
>   In some places lines were broken into two, when after indent the
>   line got too long.
> 
> * Slightly different replacement for printf

-- 
Jakub Narebski
Poland

* Re: [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching)
       [not found]                   ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>
  2010-01-23  0:48                     ` [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching) Jakub Narebski
@ 2010-02-07 21:32                     ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-02-07 21:32 UTC (permalink / raw)
  To: git

On Sat, 23 Jan 2010, Jakub Narebski wrote:

> This means replacing
> 
>   print <something>;
> by
>   print {$out} <something>;
> 
> and
> 
>   binmode STDOUT, <layer>;
> by
>   binmode $out, <layer>;

[...]

> This patch was written _before_ comment from Junio that it would be
> better to simply use
>   print $out <something>;
> and do not try to be too clever.

Actually 'print FILEHANDLE LIST' form is an example of indirect object
syntax in Perl (which for filehandles was present before there were
objects in Perl).  In indirect object notation the "first argument"
must be a name, a scalar variable, or a block (like {$out}).  If there were
something more complicated in place of $out, we would have to
use block form ('print {$out[$i]} <sth>', for example).
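
A minimal sketch of the three forms discussed here.  The in-memory
filehandles and the names @out, $buf0 and $buf1 are assumptions made
for illustration only; they are not gitweb variables.

```perl
use strict;
use warnings;

# Write to in-memory files so the result can be inspected afterwards.
# @out, $buf0 and $buf1 are illustrative names, not gitweb variables.
my ($buf0, $buf1) = ('', '');
open my $fh0, '>', \$buf0 or die "cannot open in-memory file: $!";
open my $fh1, '>', \$buf1 or die "cannot open in-memory file: $!";
my @out = ($fh0, $fh1);

my $out = $out[0];
print $out   "plain scalar\n";   # simple scalar variable: no block needed
print {$out} "block form\n";     # same effect, filehandle made explicit

my $i = 1;
print { $out[$i] } "array element\n";   # array element: block is required

close $_ for @out;
```

Both spellings with a plain scalar write to the same handle; only the
array element case actually needs the block.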

-- 
Jakub Narebski
Poland

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-14  1:23               ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley
  2010-01-14  1:23                 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
@ 2010-01-16  0:43                 ` Jakub Narebski
  2010-01-16  0:58                   ` Junio C Hamano
  2010-01-24 22:14                   ` Petr Baudis
  1 sibling, 2 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-16  0:43 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This converts the output handling of gitweb to using an indirect
> file handle.  This is in preparation to add the caching layer.  This
> is a slight modification to the way I was originally doing it by
> passing the output around.  This should be a nop and this shouldn't
> change the behavior of gitweb.  This does leave error reporting
> functions (die_error specifically) continuing to output directly
> as I want to garauntee those will report their errors regardless of
> what may be going on with respect to the rest of the output.

Signoff?

Compare with my version of this patch:
  http://repo.or.cz/w/git/jnareb-git.git/commitdiff/0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557
in the gitweb/cache-kernel branch:
  http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel

My commit message looks like the following:

....
gitweb: Print to explicit filehandle (preparing for caching)

This means replacing

  print <something>;
by
  print {$out} <something>;

and

  binmode STDOUT, <layer>;
by
  binmode $out, <layer>;

where $out is global variable set to \*STDOUT at the beginning of
gitweb, but after reading gitweb config.  This way it would be simple
to e.g. tie output filehandle or use PerlIO layers to simultaneously
write to standard output and to some specified file (like "tee"
utility does), or redirect output to a scalar, or a file.

die_error (re)sets $out to \*STDOUT; we would (probably) want to treat
errors in a special way, and do not cache them.

The only other differences are reindent of continued lines (if needed),
and sometimes word-wrapping lines which this change made too long.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
....

> ---
>  gitweb/gitweb.perl |  880 ++++++++++++++++++++++++++--------------------------
>  1 files changed, 448 insertions(+), 432 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index c4a177d..8bb323c 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -450,6 +450,13 @@ our %feature = (
>  		'default' => [0]},
>  );
>  
> +# Basic file handler for all of gitweb, there are two of them.  The first
> +# is the basic text/html file handler which is used for everything other
> +# then the binary files, that uses a separate file handler though
> +# these are both set to STDOUT for the time being.
> +our $output_handler = *STDOUT;
> +our $output_handler_bin = *STDOUT;
> +

First it is not file handleR, but filehandle.

Second, there is no need for separate filehandle for binary files, if
you do it correctly (i.e. call binmode on filehandle, and not on
STDOUT).  When caching is enabled, and 'print {$output_handle} <sth>'
prints to in-memory file (or even directly to cache file) it would do
conversion, so when reading from cache file we can dump it raw, in
binary mode.

Third, wouldn't it be better to use shorter variable name, e.g. $out
or $oh, instead of $output_handle?  We would be able to align print(f)
statements without making lines much longer.

Fourth, there is slight difference between
  our $out = *STDOUT;
and
  our $out = \*STDOUT;
In the former we have global variable, in latter we have indirect
filehandle.  CGI::Cache uses the latter form, IIRC.
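
The second and fourth points can be sketched as follows.  This is only
an illustration under stated assumptions: $cache stands in for a cache
file (here an in-memory file), and the variable names are made up for
the example rather than taken from either patch.

```perl
use strict;
use warnings;

# Fourth point: a copied typeglob vs. a reference to the typeglob.
my $glob    = *STDOUT;     # scalar holding a copy of the glob
my $globref = \*STDOUT;    # scalar holding a reference to the glob
my $glob_is_ref    = ref($glob)    ? 1 : 0;   # 0: not a reference
my $globref_is_ref = ref($globref) ? 1 : 0;   # 1: ref() reports 'GLOB'

# Second point: one handle is enough if binmode is applied to the
# handle itself.  $cache is an assumption standing in for a cache file.
my $cache = '';
open my $out, '>', \$cache or die "cannot open in-memory file: $!";
binmode $out, ':utf8';
print {$out} "\x{017C}";   # a single character outside Latin-1
close $out;

# $cache now holds the UTF-8 encoded bytes of that character, so it
# could later be dumped to the client in :raw mode without re-encoding.
```

Either kind of scalar works as the filehandle argument of print; the
difference is only in what the scalar holds.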

> @@ -3313,7 +3320,7 @@ EOF
>  		if ($use_pathinfo) {
>  			$action .= "/".esc_url($project);
>  		}
> -		print $cgi->startform(-method => "get", -action => $action) .
> +		print {$output_handler} $cgi->startform(-method => "get", -action => $action) .
>  		      "<div class=\"search\">\n" .
>  		      (!$use_pathinfo &&
>  		      $cgi->input({-name=>"p", -value=>$project, -type=>"hidden"}) . "\n") .

Here, for example, gitweb source stops being nicely aligned after the change.
OTOH realigning makes for a bigger patch.  In my version I did realign.

You can always check for true differences with "diff -w".

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-16  0:43                 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski
@ 2010-01-16  0:58                   ` Junio C Hamano
  2010-01-16  1:14                     ` Jakub Narebski
  2010-01-24 22:14                   ` Petr Baudis
  1 sibling, 1 reply; 60+ messages in thread
From: Junio C Hamano @ 2010-01-16  0:58 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git

Jakub Narebski <jnareb@gmail.com> writes:

> This means replacing
>
>   print <something>;
> by
>   print {$out} <something>;

Just out of curiosity, how is this different from

    print $out <something>;

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-16  0:58                   ` Junio C Hamano
@ 2010-01-16  1:14                     ` Jakub Narebski
  2010-01-16  1:41                       ` Junio C Hamano
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-01-16  1:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John 'Warthog9' Hawley, git

On Sat, 16 Jan 2010, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
> 
> > This means replacing
> >
> >   print <something>;
> > by
> >   print {$out} <something>;
> 
> Just out of curiosity, how is this different from
> 
>     print $out <something>;

Actually there is no difference.  It doesn't matter one way or other in
situations in gitweb.

I have thought however (but I might be mistaken) that "print {$fh} <sth>"
is idiomatic Perl.

'perldoc -f print' says:
    Note that if you're storing FILEHANDLES in an array or other expression,
    you will have to use a block returning its value instead:

           print { $files[$i] } "stuff\n";
           print { $OK ? STDOUT : STDERR } "stuff\n";

Also, there is no "," between FILEHANDLE and LIST in "print FILEHANDLE LIST"
-- 
Jakub Narebski
Poland

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-16  1:14                     ` Jakub Narebski
@ 2010-01-16  1:41                       ` Junio C Hamano
  0 siblings, 0 replies; 60+ messages in thread
From: Junio C Hamano @ 2010-01-16  1:41 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git

Jakub Narebski <jnareb@gmail.com> writes:

> I have thought however (but I might be mistaken) that "print {$fh} <sth>"
> is idiomatic Perl.
>
> 'perldoc -f print' says:
>     Note that if you're storing FILEHANDLES in an array or other expression,
>     you will have to use a block returning its value instead:

Note that "in an array or other expression".  I've always thought the
intention of this phrase was "you _could_ help the parser by doing this,
if you have expression more complex than a simple scalar variable
reference".

IOW, I know that {} _can_ be used there, but I haven't seen people write
{$a_single_variable}, especially without a space around the "expression"
(technically, a single variable is an expression), when

	print $fh <stuff>

suffices, and I was curious why you chose to use the syntax when it wasn't
necessary.  Besides, {$fh} looks so eh... (hesitates to mention a dirty
word ^W^W^Wthe name of a different language, but bleeps it out)...

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-16  0:43                 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski
  2010-01-16  0:58                   ` Junio C Hamano
@ 2010-01-24 22:14                   ` Petr Baudis
  2010-01-25  1:47                     ` Jakub Narebski
  1 sibling, 1 reply; 60+ messages in thread
From: Petr Baudis @ 2010-01-24 22:14 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git

On Fri, Jan 15, 2010 at 04:43:32PM -0800, Jakub Narebski wrote:
> Third, wouldn't it be better to use shorter variable name, e.g. $out
> or $oh, instead of $output_handle?  We would be able to align print(f)
> statements without making lines much longer.

I totally agree, I was going to comment on this too. Please use
something very short, I think even the three characters might be pushing
it a bit. Having long idiom will make it real pain to both read and
write gitweb code.

It would still be nice if we could have some way to avoid this hack
entirely. Couldn't we layer PerlIO::via over STDOUT and continue to use
argument-less print as before?

I think we (well, in practice, "you" nowadays :) should be less
perfectionist when accepting gitweb contributions, so if the answer is
"it's too hard for me to do right now", that's fine I guess; I know I
can't contribute the code currently. :(

				Petr "Pasky" Baudis

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-24 22:14                   ` Petr Baudis
@ 2010-01-25  1:47                     ` Jakub Narebski
  2010-01-25 20:48                       ` J.H.
  0 siblings, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25  1:47 UTC (permalink / raw)
  To: Petr Baudis, John 'Warthog9' Hawley
  Cc: John 'Warthog9' Hawley, git

On Sun, Jan 24, 2010 at 23:14 +0100, Petr Baudis wrote:
> On Fri, Jan 15, 2010 at 04:43:32PM -0800, Jakub Narebski wrote:

> > Third, wouldn't it be better to use shorter variable name, e.g. $out
> > or $oh, instead of $output_handle?  We would be able to align print(f)
> > statements without making lines much longer.
> 
> I totally agree, I was going to comment on this too. Please use
> something very short, I think even the three characters might be pushing
> it a bit. Having long idiom will make it real pain to both read and
> write gitweb code.
> 
> It would still be nice if we could have some way to avoid this hack
> entirely. Couldn't we layer PerlIO::via over STDOUT and continue to use
> argument-less print as before?

First, in the original version in the v2 series, the one that can be found in
commit 42641b1 (GITWEB - File based caching layer, 2009-12-10) in
git://git.kernel.org/pub/scm/git/warthog9/gitweb.git gitweb-ml-v2
http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v2
the change was even larger. It was
  print <something>;
replaced by
  $output .= <something>;
and of course making functions that printed sth return formatted output
instead.

Also with original version you paid the price even if caching was turned
off (not used): the whole output had to fit in memory, and there was no
streaming, which meant that the whole output had to be generated before
it could be displayed (increasing latency).

Second, PerlIO::via is not a good solution.  One solution would be to
require Capture::Tiny if caching is enabled, and just do

  $out = capture {
  	$actions{$action}->();
  };

or even simply (if it is possible with the way 'capture' is prototyped)

  $out = capture $actions{$action};

Another solution would be to use PerlIO::Util to redirect output to
scalar:

  *STDOUT->push_layer(scalar => \$buffer);
  # ...
  *STDOUT->pop_layer();

Yet another solution would be to manipulate *STDOUT directly

  $stdout = *STDOUT;
  *STDOUT = $buffer_fh;
  ...
  *STDOUT = $stdout;

for which, I think, to be done correctly we would have to duplicate
(large?) parts of Capture::Tiny.
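
A minimal core-only sketch of the direct *STDOUT manipulation, assuming
it is enough to capture what Perl-level print statements emit.
capture_stdout() is a hypothetical helper and not part of gitweb; it
does not capture output written by child processes, which is part of
what Capture::Tiny handles in addition.

```perl
use strict;
use warnings;

# capture_stdout() is a hypothetical helper, not part of gitweb.
# It runs $code with STDOUT temporarily pointing at an in-memory file
# and returns whatever was printed.
sub capture_stdout {
    my ($code) = @_;
    my $buffer = '';
    {
        # Localize the STDOUT glob; the real handle comes back
        # automatically when this block is left.
        local *STDOUT;
        open STDOUT, '>', \$buffer
            or die "cannot redirect STDOUT to a scalar: $!";
        $code->();
        close STDOUT;
    }
    return $buffer;
}

# Argument-less print inside the callback ends up in $page.
my $page = capture_stdout(sub { print "<html>cached body</html>\n" });
```

With this approach the print statements stay unchanged, at the price of
having to take care that anything meant for the real STDOUT is written
outside the capturing block.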

So there is a bit of rule of preservation of difficulty at work.  Either
we have large patch adding explicit filehandle to all print statements
'print <sth> -> print $out <sth>' but simple code, or have smaller patch
but complicated *STDOUT manipulation, or have small patch but rely on 
non-core CPAN modules present.

> I think we (well, in practice, "you" nowadays :) should be less
> perfectionist when accepting gitweb contributions, so if the answer is
> "it's too hard for me to do right now", that's fine I guess; I know I
> can't contribute the code currently. :(

True.  There are at least two series ('categories', of which I have a local
copy in http://repo.or.cz/w/git/jnareb-git.git as gitweb/category, and
'committags' by Marcel M. Cary[1]) that are not yet in gitweb because
of that...

[1]: http://thread.gmane.org/gmane.comp.version-control.git/100418/focus=133141

-- 
Jakub Narebski
Poland

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-25  1:47                     ` Jakub Narebski
@ 2010-01-25 20:48                       ` J.H.
  2010-01-25 21:48                         ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-01-25 20:48 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Petr Baudis, John 'Warthog9' Hawley, git

> So there is a bit of rule of preservation of difficulty at work.  Either
> we have large patch adding explicit filehandle to all print statements
> 'print <sth> -> print $out <sth>' but simple code, or have smaller patch
> but complicated *STDOUT manipulation, or have small patch but rely on 
> non-core CPAN modules present.

I think depending on non-core CPAN modules is a really bad idea, and
will cause some concern and consternation with mainline distributions,
besides making it more complicated for users to get this up and running
quickly and easily.  While I agree there are other ways of handling this
I think just adding the filehandle to the print statements provides the
easiest cross section of usability and functionality for everything
involved.

- John 'Warthog9' Hawley

* Re: [PATCH 8/9] gitweb: Convert output to using indirect file handle
  2010-01-25 20:48                       ` J.H.
@ 2010-01-25 21:48                         ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-25 21:48 UTC (permalink / raw)
  To: J.H.; +Cc: Petr Baudis, John 'Warthog9' Hawley, git

On Mon, 25 Jan 2010, J.H. wrote:

> > So there is a bit of rule of preservation of difficulty at work.  Either
> > we have large patch adding explicit filehandle to all print statements
> > 'print <sth> -> print $out <sth>' but simple code, or have smaller patch
> > but complicated *STDOUT manipulation, or have small patch but rely on 
> > non-core CPAN modules present.
> 
> I think depending in non-core CPAN modules is a really bad idea, and
> will cause some concern and consternation with mainline distributions,
> besides making it more complicated for users to get this up and running
> quickly and easily.  

Note: the modules in question are Capture::Tiny or PerlIO::Util.
Capture::Tiny is available as perl-Capture-Tiny RPM module for Fedora Core
up from FC11 and in DAG repository, at least. 

First, installing non-core CPAN modules locally for user is very simple
thanks to local::lib.  After setup it is simply 'cpan -i PerlIO::Util',
and PerlIO::Util gets installed locally, by default inside ~/perl5.

Second, we can go the route that Git.pm did, namely include sources for
extra modules (private-Error.pm in case of Git.pm) and install it if 
needed.  With proper build system (Module::Install, Module::Build) it is
even simpler with inc or inc::latest.  And there is always PAR[1] if you
want to have everything in one file.

[1]: http://par.perl.org

>                     While I agree there are other ways of handling this 
> I think just adding the filehandle to the print statements provides the
> easiest cross section of usability and functionality for everything
> involved.

Well, manipulating *STDOUT shouldn't be, I think, much harder, although
I guess it would need more care (so that e.g. "Generating..." page gets
written to real STDOUT).

-- 
Jakub Narebski
Poland

* Re: [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header
  2010-01-14  1:23             ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header John 'Warthog9' Hawley
  2010-01-14  1:23               ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley
@ 2010-01-15 23:49               ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-15 23:49 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

I have modified commit message a bit

> If $site_header is not defined you get extraneous errors in the web
> logs:
> 
> [Wed Jan 13 16:55:42 2010] [error] [client ::1] [Wed Jan 13 16:55:42 2010]
>  gitweb.cgi: Use of uninitialized value $site_header in -f at .../gitweb.cgi line 3287.,
>  referer: http://git/gitweb.cgi
> 
> for example (line wrapped for better readibility).  This commit
> ensures that the variable is defined before trying to use it.

  Ordinarily build procedure ensures that $site_header is defined (but
  empty, therefore false-ish), so this issue might happen only because
  of errors in gitweb config file.  Nevertheless it is better to code
  defensively.

  Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
> ---
>  gitweb/gitweb.perl |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 07fdeb5..c4a177d 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3277,7 +3277,7 @@ EOF
>  	print "</head>\n" .
>  	      "<body>\n";
>  
> -	if (-f $site_header) {
> +	if ($site_header && -f $site_header) {

It should be, I think

+	if (defined $site_header && -f $site_header) {

although I guess that nobody would name $site_header file "0".

>  		insert_file($site_header);
>  	}
>  
> -- 

More important is the fact that this is not the only variable holding a
file name that is used without checking that it is defined first.  You
should make the same change for $site_footer and $home_text, and
die_error(500, "Gitweb misconfigured") or just die_error(500) if
$projects_list is not defined.
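
A small sketch of the difference between the truth test and the
'defined' test for such variables.  has_file() is a hypothetical helper
introduced only for this example; the patch itself does the check
inline.

```perl
use strict;
use warnings;

# has_file() is a hypothetical helper, not a gitweb subroutine.
# It never passes an undefined value to -f, so it cannot trigger the
# "Use of uninitialized value" warning quoted in the commit message.
sub has_file {
    my ($path) = @_;
    return (defined $path && -f $path) ? 1 : 0;
}

my $site_header;                        # undefined, e.g. after a config error
my $skipped = has_file($site_header);   # 0, and no warning is emitted

# The corner case mentioned above: a file name of "0".
my $zero    = "0";
my $truthy  = $zero ? 1 : 0;            # 0: fails a plain truth test
my $defined = defined $zero ? 1 : 0;    # 1: passes the 'defined' test
```

The same helper could be applied to $site_footer and $home_text, while
an undefined $projects_list would still call for an explicit error.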

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH 5/9] gitweb: add a get function to compliment print_local_time
  2010-01-14  1:23         ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time John 'Warthog9' Hawley
  2010-01-14  1:23           ` [PATCH 6/9] gitweb: add a get function to compliment print_sort_th John 'Warthog9' Hawley
@ 2010-01-23 11:13           ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-23 11:13 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

Should authorship be associated with eaglescrag.net or kernel.org?

> This adds a get function for print_local_time so that the basic
> function can be used outside of their straight printing operation.

Signoff?
> ---
>  gitweb/gitweb.perl |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index b41bc33..d38aad6 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3537,14 +3537,21 @@ sub git_print_header_div {
>  }
>  
>  sub print_local_time {
> +	print get_local_time(@_);
> +}
> +
> +sub get_local_time {

This new function should be rather named 'format_local_time', to
follow gitweb convention of naming subroutines.  (And of course commit
message needs to be changed accordingly.)

This comment applies also to the following patch.

> +	my $localtime = "";
>  	my %date = @_;
>  	if ($date{'hour_local'} < 6) {
> -		printf(" (<span class=\"atnight\">%02d:%02d</span> %s)",
> +		$localtime .= sprintf(" (<span class=\"atnight\">%02d:%02d</span> %s)",
>  			$date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'});
>  	} else {
> -		printf(" (%02d:%02d %s)",
> +		$localtime .= sprintf(" (%02d:%02d %s)",
>  			$date{'hour_local'}, $date{'minute_local'}, $date{'tz_local'});
>  	}
> +
> +	return $localtime

Style (terminate statement with ';'): 

  +	return $localtime;

>  }
>  
>  # Outputs the author name and date in long form
> -- 
> 1.6.5.2
> 
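
Putting the two comments together, the split could look roughly like
the sketch below.  It is a standalone illustration that assumes only
the %date keys used in the quoted patch (hour_local, minute_local,
tz_local); it is not the version that was eventually applied.

```perl
use strict;
use warnings;

# Return the formatted local time instead of printing it.
# Hours before 6 are wrapped in the "atnight" span, as in the patch.
sub format_local_time {
    my %date = @_;
    my $time = sprintf("%02d:%02d",
                       $date{'hour_local'}, $date{'minute_local'});
    $time = qq(<span class="atnight">$time</span>)
        if $date{'hour_local'} < 6;
    return " ($time $date{'tz_local'})";
}

# The printing variant becomes a thin wrapper around the formatter.
sub print_local_time {
    print format_local_time(@_);
}

my $night = format_local_time(
    hour_local => 3, minute_local => 5, tz_local => '+0100');
my $day = format_local_time(
    hour_local => 14, minute_local => 30, tz_local => '+0100');
```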

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH 3/9] gitweb: Add option to force version match
  2010-01-14  1:22     ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley
  2010-01-14  1:23       ` [PATCH 4/9] gitweb: Makefile improvements John 'Warthog9' Hawley
@ 2010-01-15 23:36       ` Jakub Narebski
  2010-01-24 21:59       ` Petr Baudis
  2 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-15 23:36 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> This adds $git_versions_must_match variable, which is set to true,
> checks that we are running on the same version of git that we
> shipped with, and if not throw '500 Internal Server Error' error.
> What is checked is the version of gitweb (embedded in building
> gitweb.cgi), against version of runtime git binary used.
> 
> Gitweb can usually run with a mismatched git install.  This is more
> here to give an obvious warning as to whats going on vs. silently
> failing.
> 
> By default this feature is turned on.

If this feature is turned on, then I would prefer for gitweb tests to
have it explicitly turned off, so I don't need to rebuild git to test
gitweb.  

Also it would be nice to have some tests for this new feature.

See patch below (proposed to be squashed with this one).


P.S. It is there that I noticed the issue with undefined $action
in git_footer_html(), mentioned in reply to patch 1/9.
 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
> ---

Here could be information about differences from previous versions of
this patch.

>  gitweb/README      |    3 +++
>  gitweb/gitweb.perl |   23 +++++++++++++++++++++++
>  2 files changed, 26 insertions(+), 0 deletions(-)
> 
> diff --git a/gitweb/README b/gitweb/README
> index 6c2c8e1..03151d2 100644
> --- a/gitweb/README
> +++ b/gitweb/README
> @@ -233,6 +233,9 @@ not include variables usually directly set during build):
>     If server load exceed this value then return "503 Service Unavaliable" error.
>     Server load is taken to be 0 if gitweb cannot determine its value.  Set it to
>     undefined value to turn it off.  The default is 300.
> + * $git_versions_must_match
> +   If set, gitweb fails with 500 Internal Server Error if the version of gitweb
> +   doesn't match version of git binary.  The default is true.
>  
>  
>  Projects list file format
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 8298de5..b41bc33 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -221,6 +221,9 @@ our %avatar_size = (
>  	'double'  => 32
>  );
>  
> +# If it is true, exit if gitweb version and git binary version don't match
> +our $git_versions_must_match = 1;
> +
>  # Used to set the maximum load that we will still respond to gitweb queries.
>  # If server load exceed this value then return "503 server busy" error.
>  # If gitweb cannot determined server load, it is taken to be 0.
> @@ -587,6 +590,26 @@ if (defined $maxload && get_loadavg() > $maxload) {
>  our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
>  $number_of_git_cmds++;
>  
> +# Throw an error if git versions does not match, if $git_versions_must_match is true.
> +if ($git_versions_must_match &&
> +    $git_version ne $version) {
> +	my $admin_contact =
> +		defined $ENV{'SERVER_ADMIN'} ? ", $ENV{'SERVER_ADMIN'}," : '';
> +	my $err_msg = <<EOT;
> +<h1 align="center">*** Warning ***</h1>
> +<p>
> +This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>,
> +however git version <b>@{[esc_html($git_version)]}</b> was found on server,
> +and administrator requested strict version checking.
> +</p>
> +<p>
> +Please contact the server administrator${admin_contact} to either configure
> +gitweb to allow mismatched versions, or update git or gitweb installation.
> +</p>
> +EOT
> +	die_error(500, 'Internal server error', $err_msg);
> +}
> +
>  $projects_list ||= $projectroot;
>  
>  # ======================================================================
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 76d8b7b..d9ffc90
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -25,6 +25,7 @@ our \$favicon = 'file:///$TEST_DIRECTORY/../gitweb/git-favicon.png';
 our \$projects_list = '';
 our \$export_ok = '';
 our \$strict_export = '';
+our \$git_versions_must_match = 0;
 
 EOF
 
diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh
index 0688a57..721900e 100755
--- a/t/t9501-gitweb-standalone-http-status.sh
+++ b/t/t9501-gitweb-standalone-http-status.sh
@@ -113,5 +113,31 @@ test_expect_success 'snapshots: bad object id' '
 '
 test_debug 'cat gitweb.output'
 
+# ======================================================================
+# check $git_versions_must_match feature
+# should be last section, just in case
+cp -f gitweb_config.perl gitweb_config.perl.bak
+echo 'our $git_versions_must_match = 1;' >>gitweb_config.perl
+
+cat <<\EOF >>gitweb_config.perl
+our $version = "current";
+EOF
+test_expect_success 'force version match: no match' '
+	gitweb_run "p=.git" &&
+	grep "500 - Internal Server Error" gitweb.output
+'
+test_debug 'cat gitweb.headers'
+
+cat <<\EOF >>gitweb_config.perl
+# must be kept in sync with code in gitweb/gitweb.perl
+our $version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
+EOF
+test_expect_success 'force version match: match' '
+	gitweb_run "p=.git" &&
+	grep "Status: 200 OK" gitweb.output
+'
+test_debug 'cat gitweb.headers'
+
+mv -f gitweb_config.perl.bak gitweb_config.perl
 
 test_done

> -- 
> 1.6.5.2
> 

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH 3/9] gitweb: Add option to force version match
  2010-01-14  1:22     ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley
  2010-01-14  1:23       ` [PATCH 4/9] gitweb: Makefile improvements John 'Warthog9' Hawley
  2010-01-15 23:36       ` [PATCH 3/9] gitweb: Add option to force version match Jakub Narebski
@ 2010-01-24 21:59       ` Petr Baudis
  2010-01-24 23:17         ` Jakub Narebski
  2 siblings, 1 reply; 60+ messages in thread
From: Petr Baudis @ 2010-01-24 21:59 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

  Hi!

On Wed, Jan 13, 2010 at 05:22:59PM -0800, John 'Warthog9' Hawley wrote:
> +# Throw an error if git versions does not match, if $git_versions_must_match is true.
> +if ($git_versions_must_match &&
> +    $git_version ne $version) {
> +	my $admin_contact =
> +		defined $ENV{'SERVER_ADMIN'} ? ", $ENV{'SERVER_ADMIN'}," : '';
> +	my $err_msg = <<EOT;
> +<h1 align="center">*** Warning ***</h1>
> +<p>
> +This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>,
> +however git version <b>@{[esc_html($git_version)]}</b> was found on server,
> +and administrator requested strict version checking.
> +</p>
> +<p>
> +Please contact the server administrator${admin_contact} to either configure
> +gitweb to allow mismatched versions, or update git or gitweb installation.
> +</p>
> +EOT
> +	die_error(500, 'Internal server error', $err_msg);
> +}
> +

  I get very unhappy when *I* am the server administrator and read
messages like this. ;-) Could you please mention the
\$git_versions_must_match in the text?

				Petr "Pasky" Baudis

* Re: [PATCH 3/9] gitweb: Add option to force version match
  2010-01-24 21:59       ` Petr Baudis
@ 2010-01-24 23:17         ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-24 23:17 UTC (permalink / raw)
  To: Petr Baudis
  Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	git

Petr Baudis <pasky@suse.cz> writes:

> On Wed, Jan 13, 2010 at 05:22:59PM -0800, John 'Warthog9' Hawley wrote:

> > +# Throw an error if git versions does not match, if $git_versions_must_match is true.
> > +if ($git_versions_must_match &&
> > +    $git_version ne $version) {
> > +	my $admin_contact =
> > +		defined $ENV{'SERVER_ADMIN'} ? ", $ENV{'SERVER_ADMIN'}," : '';
> > +	my $err_msg = <<EOT;
> > +<h1 align="center">*** Warning ***</h1>
> > +<p>
> > +This version of gitweb was compiled for <b>@{[esc_html($version)]}</b>,
> > +however git version <b>@{[esc_html($git_version)]}</b> was found on server,
> > +and administrator requested strict version checking.

+and administrator requested strict version checking by setting
+\$git_versions_must_match to true value (e.g. to 1) in gitweb configuration file,
+'@{[esc_path(-e $GITWEB_CONFIG ? $GITWEB_CONFIG : $GITWEB_CONFIG_SYSTEM)]}'.

> > +</p>
> > +<p>
> > +Please contact the server administrator${admin_contact} to either configure
> > +gitweb to allow mismatched versions, or update git or gitweb installation.
> > +</p>
> > +EOT
> > +	die_error(500, 'Internal server error', $err_msg);
> > +}
> > +
> 
>   I get very unhappy when *I* am the server administrator and read
> messages like this. ;-) Could you please mention the
> \$git_versions_must_match in the text?

Something like the text (extension) proposed above?


Actually, J.H., if you are changing the default, you should also change the
message to mention that checking versions is the default, and that the
administrator can turn it off[1].  The current wording ("administrator
requested strict version checking") is incorrect if
$git_versions_must_match defaults to true.

[1] And you have to take into account that the gitweb config file might not
exist, and that both $GITWEB_CONFIG and $GITWEB_CONFIG_SYSTEM may be empty.
J.H., are you really, really sure that it is a good idea to have the version
check turned on by default, in light of this issue?
-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information
  2010-01-14  1:22   ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information John 'Warthog9' Hawley
  2010-01-14  1:22     ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley
@ 2010-01-15 22:40     ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-15 22:40 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

Don't you want the kernel.org address for authorship as well?  Also, the
commit summary can be written in a shorter way; see the proposal below.

From: "John 'Warthog9' Hawley" <warthog9@kernel.org>
Subject: gitweb: Allow for longer error explanation in die_error()

> This is a small change that just adds a 3rd, optional, parameter to die_error
> that allows for extended error information to be output along with what the
> error was.

Signed-off-by: "John 'Warthog9' Hawley" <warthog9@kernel.org>
> ---
>  gitweb/gitweb.perl |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 0a07d3a..8298de5 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3391,6 +3391,7 @@ sub git_footer_html {
>  sub die_error {
>  	my $status = shift || 500;
>  	my $error = shift || "Internal server error";
> +	my $extra = shift;
>  
>  	my %http_responses = (
>  		400 => '400 Bad Request',
> @@ -3405,8 +3406,13 @@ sub die_error {
>  <br /><br />
>  $status - $error
>  <br />
> -</div>
>  EOF
> +	if (defined $extra) {
> +		print "<hr />\n" .
> +			"$extra\n";

Following gitweb whitespace convention (tab for indent, space for
align), it should be:

 +		print "<hr />\n" .
 +		      "$extra\n";

(6 x space in place of last tab).

> +	}
> +	print "</div>\n";
> +
>  	git_footer_html();
>  	exit;
>  }
> -- 
> 1.6.5.2
> 

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: [PATCH 1/9] gitweb: Load checking
  2010-01-14  1:22 ` [PATCH 1/9] gitweb: Load checking John 'Warthog9' Hawley
  2010-01-14  1:22   ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information John 'Warthog9' Hawley
@ 2010-01-15 22:30   ` Jakub Narebski
  1 sibling, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-15 22:30 UTC (permalink / raw)
  To: John 'Warthog9' Hawley; +Cc: git

This one looks good, but while examining another patch in this series
I noticed a rare situation where we would get a Perl error with
this patch.

I have added a fix-up for this issue, although I guess that a better
solution might be not to add any <script> element for git_footer_html
called from die_error.

I'm sorry I didn't notice this earlier.


"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> From: John 'Warthog9' Hawley <warthog9@kernel.org>
> 
> This changes slightly the behavior of gitweb, so that it verifies
> that the box isn't inundated with before attempting to serve gitweb.
> If the box is overloaded, it basically returns a 503 Server Unavailable
> until the load falls below the defined threshold.  This helps dramatically
> if you have a box that's I/O bound, reaches a certain load and you
> don't want gitweb, the I/O hog that it is, increasing the pain the
> server is already undergoing.
> 
> This behavior is controlled by $maxload configuration variable.
> Default is a load of 300, which for most cases should never be hit.
> Unset it (set it to undefined value, i.e. undef) to turn off checking.
> 
> Currently it requires that '/proc/loadavg' file exists, otherwise the
> load check is bypassed (load is taken to be 0).  So platforms that do
> not implement '/proc/loadavg' currently cannot use this feature.
> (provisions are included for additional checks to be added by others)
>
While at it, check that $action is defined before comparing it in the
git_footer_html() subroutine.  Until this patch there was no direct
or indirect (via die_error) invocation of git_footer_html() with
$action undefined; each call came after dispatch, which sets $action to
its default value if it is undefined.

Without the check, gitweb would raise a Perl error ("Use of uninitialized
value in string eq") if the load is too high _and_ gitweb was invoked
without the action parameter explicitly set (e.g. for the projects list).
 
> Signed-off-by: John 'Warthog9' Hawley <warthog9@kernel.org>
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
> ---
>  gitweb/README      |    7 ++++++-
>  gitweb/gitweb.perl |   45 +++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 47 insertions(+), 5 deletions(-)
> 
> diff --git a/gitweb/README b/gitweb/README
> index e34ee79..6c2c8e1 100644
> --- a/gitweb/README
> +++ b/gitweb/README
> @@ -174,7 +174,7 @@ not include variables usually directly set during build):
>     Base URL for relative URLs in pages generated by gitweb,
>     (e.g. $logo, $favicon, @stylesheets if they are relative URLs),
>     needed and used only for URLs with nonempty PATH_INFO via
> -   <base href="$base_url>.  Usually gitweb sets its value correctly,
> +   <base href="$base_url">.  Usually gitweb sets its value correctly,
>     and there is no need to set this variable, e.g. to $my_uri or "/".
>   * $home_link
>     Target of the home link on top of all pages (the first part of view
> @@ -228,6 +228,11 @@ not include variables usually directly set during build):
>     repositories from launching cross-site scripting (XSS) attacks.  Set this
>     to true if you don't trust the content of your repositories. The default
>     is false.
> + * $maxload
> +   Used to set the maximum load that we will still respond to gitweb queries.
> +   If server load exceeds this value then return "503 Service Unavailable" error.
> +   Server load is taken to be 0 if gitweb cannot determine its value.  Set it to
> +   undefined value to turn it off.  The default is 300.
>  
>  
>  Projects list file format
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 7e477af..0a07d3a 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -221,6 +221,12 @@ our %avatar_size = (
>  	'double'  => 32
>  );
>  
> +# Used to set the maximum load that we will still respond to gitweb queries.
> +# If server load exceeds this value then return "503 server busy" error.
> +# If gitweb cannot determine server load, it is taken to be 0.
> +# Leave it undefined (or set to 'undef') to turn off load checking.
> +our $maxload = 300;
> +
>  # You define site-wide feature defaults here; override them with
>  # $GITWEB_CONFIG as necessary.
>  our %feature = (
> @@ -551,6 +557,32 @@ if (-e $GITWEB_CONFIG) {
>  	do $GITWEB_CONFIG_SYSTEM if -e $GITWEB_CONFIG_SYSTEM;
>  }
>  
> +# Get loadavg of system, to compare against $maxload.
> +# Currently it requires '/proc/loadavg' present to get loadavg;
> +# if it is not present it returns 0, which means no load checking.
> +sub get_loadavg {
> +	if( -e '/proc/loadavg' ){
> +		open my $fd, '<', '/proc/loadavg'
> +			or return 0;
> +		my @load = split(/\s+/, scalar <$fd>);
> +		close $fd;
> +
> +		# The first three columns measure CPU and IO utilization of the last one,
> +		# five, and 15 minute periods.  The fourth column shows the number of
> +		# currently running processes and the total number of processes in the m/n
> +		# format.  The last column displays the last process ID used.
> +		return $load[0] || 0;
> +	}
> +	# additional checks for load average should go here for things that don't export
> +	# /proc/loadavg
> +
> +	return 0;
> +}
> +
> +if (defined $maxload && get_loadavg() > $maxload) {
> +	die_error(503, "The load average on the server is too high");
> +}
> +
>  # version of the core git binary
>  our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
>  $number_of_git_cmds++;
> @@ -3354,14 +3386,19 @@ sub git_footer_html {
>  # 500: The server isn't configured properly, or
>  #      an internal error occurred (e.g. failed assertions caused by bugs), or
>  #      an unknown error occurred (e.g. the git binary died unexpectedly).
> +# 503: The server is currently unavailable (because it is overloaded,
> +#      or down for maintenance).  Generally, this is a temporary state.
>  sub die_error {
>  	my $status = shift || 500;
>  	my $error = shift || "Internal server error";
>  
> -	my %http_responses = (400 => '400 Bad Request',
> -			      403 => '403 Forbidden',
> -			      404 => '404 Not Found',
> -			      500 => '500 Internal Server Error');
> +	my %http_responses = (
> +		400 => '400 Bad Request',
> +		403 => '403 Forbidden',
> +		404 => '404 Not Found',
> +		500 => '500 Internal Server Error',
> +		503 => '503 Service Unavailable',
> +	);
>  	git_header_html($http_responses{$status});
>  	print <<EOF;
>  <div class="page_body">
@@ -3354,7 +3354,8 @@ sub git_footer_html {
 	}
 
 	print qq!<script type="text/javascript" src="$javascript"></script>\n!;
-	if ($action eq 'blame_incremental') {
+	if (defined $action &&
+	    $action eq 'blame_incremental') {
 		print qq!<script type="text/javascript">\n!.
 		      qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!.
 		      qq!           "!. href() .qq!");\n!.

> -- 
> 1.6.5.2
> 

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: [PATCH 0/9] Gitweb caching v5
  2010-01-14  1:22 [PATCH 0/9] Gitweb caching v5 John 'Warthog9' Hawley
  2010-01-14  1:22 ` [PATCH 1/9] gitweb: Load checking John 'Warthog9' Hawley
@ 2010-01-15  1:40 ` Jakub Narebski
  2010-01-15  4:29   ` J.H.
  1 sibling, 1 reply; 60+ messages in thread
From: Jakub Narebski @ 2010-01-15  1:40 UTC (permalink / raw)
  To: John 'Warthog9' Hawley, John 'Warthog9' Hawley; +Cc: git

"John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

> Afternoon everyone,
>  
> This is the latest incarnation of gitweb w/ caching.  This is
> finally at the point where it should probably start either being
> considered for inclusion or mainline, or I need to accept that this
> will never get in and more permanently fork (as is the case with
> Fedora where this is going in as gitweb-caching as a parallel rpm
> package).
> 
> That said this brings the base up to mainline (again),

Could you tell us which commit is the base of this series (like in
git-request-pull output), i.e. which commit this series is rebased
against?

> it updates a
> number of elements in the caching engine, and this is a much cleaner
> break-out of the tree vs. what I am currently developing against.

Has the caching engine part changed since v2?

> v5:
> 	- Missed a couple of things that were in my local tree, and
> 	  added them back in.

That doesn't tell us much.

> 	- Split up the die_error and the version matching patch
> 	- Set version matching to be on by default - otherwise this
> 	  really is code that will never get checked, or at best
> 	  enabled by default by distributions
> 	- Added a minor code cleanup with respect to $site_header
> 	  that was already in my tree
> 	- Applied against a more recent git tree vs. 1.6.6-rc2
> 	- Removed breakout patch for now (did that in v4 actually)
> 	  and will deal with that separately 
> 
> 	http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5
 
----
Short comments about patches in this series; I will be sending
detailed comments for each patch individually.

> John 'Warthog9' Hawley (9):
>   gitweb: Load checking
Looks good.

>   gitweb: change die_error to take "extra" argument for extended die
>     information
Commit message could be better (summary should really be shorter), and
I think there is some indent typo, but otherwise looks good.

>   gitweb: Add option to force version match
For me it needs to be disabled in gitweb test suite (t/gitweb-lib.sh),
if it is enabled by default.  I really like that I can test current
gitweb without need to recompile git.

Also it should have tests that it works as intended (both for matching
and non-matching versions) in t/t9501-gitweb-standalone-http-status.sh

>   gitweb: Makefile improvements
Does it differ from my proposal (i.e. gitweb/Makefile doing the work),
based on your idea ("make gitweb" for Makefile and gitweb/Makefile)?

>   gitweb: add a get function to compliment print_local_time
>   gitweb: add a get function to compliment print_sort_th
Those two look O.K. from what I have seen.

>   gitweb: cleanup error message produced by undefined $site_header
Shouldn't there be such protection for other such variables, like
$site_footer and $home_text (and a bit different protection against
undefined $projects_list)?  By the way, how did you arrive at
undefined $site_header: default build configuration leaves it empty,
but defined.

>   gitweb: Convert output to using indirect file handle
I have alternate solution, using shorter filehandle name (just $out)
in

  git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel
  http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel

I would have to think a bit about separate handle for binary files;
I am not sure if it is really required.

>   gitweb: File based caching layer (from git.kernel.org)
I am working (time permitting) on splitting this large code drop into
smaller commits, namely:

 - href(..., -path_info => 0)          (for cache key)
 - simple file based caching + tests
 - global expire time + tests
 - output caching in gitweb            (WIP)
 - adaptive expiration time            (planned)
 - tee output / cache write            (planned)
 - expire time variation from CHI      (planned)
 - locking for single writer           (planned)
 - server-side generating info         (planned)
 - AJAX-y generating info              (wishlist)

while ensuring that it passes all existing gitweb tests, and adding new
tests for new features.

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: [PATCH 0/9] Gitweb caching v5
  2010-01-15  1:40 ` [PATCH 0/9] Gitweb caching v5 Jakub Narebski
@ 2010-01-15  4:29   ` J.H.
  2010-01-15 10:28     ` Jakub Narebski
  0 siblings, 1 reply; 60+ messages in thread
From: J.H. @ 2010-01-15  4:29 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: John 'Warthog9' Hawley, git

On 01/14/2010 05:40 PM, Jakub Narebski wrote:
> "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:
> 
>> Afternoon everyone,
>>  
>> This is the latest incarnation of gitweb w/ caching.  This is
>> finally at the point where it should probably start either being
>> considered for inclusion or mainline, or I need to accept that this
>> will never get in and more permanently fork (as is the case with
>> Fedora where this is going in as gitweb-caching as a parallel rpm
>> package).
>>
>> That said this brings the base up to mainline (again),
> 
> Could you tell us which commit is the base of this series (like in
> git-request-pull output), i.e. which commit this series is rebased
> against?

This series was based on
git://git.kernel.org/pub/scm/git/git.git
054d2fa05cf0bc55fe1556c9e87d58d67a144f44

http://git.kernel.org/?p=git/warthog9/gitweb.git;a=shortlog;h=refs/heads/gitweb-ml-v5

> 
>> it updates a
>> number of elements in the caching engine, and this is a much cleaner
>> break-out of the tree vs. what I am currently developing against.
> 
> Is caching engine part changed since v2?

Slightly, not dramatically.  Only changes were to modify the caching
engine to deal with the change in output mechanism (I.E. my $output ->
print {$fh})

<snip>
>> John 'Warthog9' Hawley (9):
>>   gitweb: Load checking
> Looks good.
> 
>>   gitweb: change die_error to take "extra" argument for extended die
>>     information
> Commit message could be better (summary should really be shorter), and
> I think there is some indent typo, but otherwise looks good.
> 
>>   gitweb: Add option to force version match
> For me it needs to be disabled in gitweb test suite (t/gitweb-lib.sh),
> if it is enabled by default.  I really like that I can test current
> gitweb without need to recompile git.
> 
> Also it should have tests that it works as intended (both for matching
> and non-matching versions) in t/t9501-gitweb-standalone-http-status.sh

I'll get t9501 cleaned up and make sure that the tests default to
turning it off, and I'll add a test to confirm that this works.

>>   gitweb: Makefile improvements
> Does it differ from my proposal (i.e. gitweb/Makefile doing the work),
> based on your idea ("make gitweb" for Makefile and gitweb/Makefile)?

I think this is taken straight from the version you had, I don't think
I've modified it.

> 
>>   gitweb: add a get function to compliment print_local_time
>>   gitweb: add a get function to compliment print_sort_th
> Those two look O.K. from what I have seen.
> 
>>   gitweb: cleanup error message produced by undefined $site_header
> Shouldn't there be such protection for other such variables, like
> $site_footer and $home_text (and a bit different protection against
> undefined $projects_list)?  By the way, how did you arrive at
> undefined $site_header: default build configuration leaves it empty,
> but defined.

I would have to go back and figure it out, but it's something I hit
years ago and added that check to keep it from spewing all over my logs.
 Could easily add it to the others mentioned.

>>   gitweb: Convert output to using indirect file handle
> I have alternate solution, using shorter filehandle name (just $out)
> in
> 
>   git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel
>   http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel
> 
> I would have to think a bit about separate handle for binary files;
> I am not sure if it is really required.

For caching you have to have it.  When you're outputting the data back from
the cache you need to switch the output mode for the browser to receive
the data properly.  Otherwise the resulting output from the caching
engine is going to be garbage.  The caching engine explicitly stores the
binary files separately from the rest of the response.

>>   gitweb: File based caching layer (from git.kernel.org)
> I am working (time permitting) on splitting this large code drop into
> smaller commits, namely:

First up, is there a reason not to take the caching layer as it stands
while you work on these?  I'm fine with adding test cases for what's
there now if you want, but I guess I'm confused about explicitly wanting
to break these into smaller commits.

>  - href(..., -path_info => 0)          (for cache key)

You actually *really* want to have the full url vs. just the path_info.
 While I accept that this means that you will end up with multiple
copies of data being stored it helps dramatically if you have multiple
sites pointing into the same caching space.  If you happen to have two
distinct trees

http://git.public.com/?p=test.git;a=summary
http://git.private.com/?p=test.git;a=summary

That respectively point to:

/group/public/git/test.git
/group/private/git/test.git

you'll end up squashing the cache files needlessly and erroneously as
what's in the cache file will depend on what last site was hit that
generated the file.

>  - simple file based caching + tests
>  - global expire time + tests
>  - output caching in gitweb            (WIP)
>  - adaptive expiration time            (planned)
>  - tee output / cache write            (planned)

You sadly can't 'tee' the output as this would re-introduce the
stampeding herd problem, which is one of the reasons the caching layer
came about in the first place.  I suppose you could give one person the
output but make everyone else wait for the cache to finish writing out,
or have the waiting client processes tail the file while it's generated,
but those both seem a little excessive vs. just waiting.

>  - expire time variation from CHI      (planned)
>  - locking for single writer           (planned)
>  - server-side generating info         (planned)
>  - AJAX-y generating info              (wishlist)

If it's helpful I can genuinely devote several more days to this to get
these cleaned up.  Which would save you a fair amount of time in
breaking this up.

- John 'Warthog9' Hawley


* Re: [PATCH 0/9] Gitweb caching v5
  2010-01-15  4:29   ` J.H.
@ 2010-01-15 10:28     ` Jakub Narebski
  0 siblings, 0 replies; 60+ messages in thread
From: Jakub Narebski @ 2010-01-15 10:28 UTC (permalink / raw)
  To: J.H.; +Cc: John 'Warthog9' Hawley, git

On Fri, 15 Jan 2010, J.H. wrote:
> On 01/14/2010 05:40 PM, Jakub Narebski wrote:
>> "John 'Warthog9' Hawley" <warthog9@eaglescrag.net> writes:

>>> it updates a
>>> number of elements in the caching engine, and this is a much cleaner
>>> break-out of the tree vs. what I am currently developing against.
>> 
>> Is caching engine part changed since v2?
> 
> Slightly, not dramatically.  Only changes were to modify the caching
> engine to deal with the change in output mechanism (I.E. my $output ->
> print {$fh})

O.K.

It is still quite large code dump...

>>> John 'Warthog9' Hawley (9):
[...]
>>>   gitweb: Add option to force version match
>> For me it needs to be disabled in gitweb test suite (t/gitweb-lib.sh),
>> if it is enabled by default.  I really like that I can test current
>> gitweb without need to recompile git.
>> 
>> Also it should have tests that it works as intended (both for matching
>> and non-matching versions) in t/t9501-gitweb-standalone-http-status.sh
> 
> I'll get t9501 cleaned up and make sure that the tests default to
> turning it off, and I'll add a test to confirm that this works.

I'll send you proposed fixup to squash, so if you want you can simply
wait for a bit.

[...] 
>>>   gitweb: cleanup error message produced by undefined $site_header
>> Shouldn't there be such protection for other such variables, like
>> $site_footer and $home_text (and a bit different protection against
>> undefined $projects_list)?  By the way, how did you arrive at
>> undefined $site_header: default build configuration leaves it empty,
>> but defined.
> 
> I would have to go back and figure it out, but it's something I hit
> years ago and added that check to keep it from spewing all over my logs.
> Could easily add it to the others mentioned.

Please remember that while for $home_text, $site_header and $site_footer
it would be enough to replace

  if (-f $file_variable) {
  	...
  }

with

  if (defined $file_variable && -f $file_variable) {
  	...
  }

for undefined $projects_list you have to fail (use die_error).


Defensive programming, here protecting against errors in gitweb config,
is a good idea IMHO.
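
A minimal sketch of the two kinds of protection described above.  The
helper names read_optional_file() and require_defined() are hypothetical
and are not part of gitweb; a plain die stands in for die_error() so that
the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of an optional include file
# (think $site_header), or '' when the variable is undefined or does not
# name a regular file.  The 'defined' test avoids the "uninitialized
# value" warning that a bare -f on undef would put in the logs.
sub read_optional_file {
	my ($file) = @_;
	return '' unless defined $file && -f $file;
	open my $fh, '<', $file
		or return '';
	local $/;	# slurp mode
	my $contents = <$fh>;
	close $fh;
	return defined $contents ? $contents : '';
}

# Hypothetical helper for a mandatory setting (think $projects_list):
# here an undefined value is a configuration error, so fail loudly.
# In gitweb itself this is where die_error() would be called.
sub require_defined {
	my ($name, $value) = @_;
	die "configuration error: \$$name is not defined\n"
		unless defined $value;
	return $value;
}
```

The optional variables degrade silently to an empty string, while the
mandatory one stops processing before any page is generated.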

>>>   gitweb: Convert output to using indirect file handle
>> I have alternate solution, using shorter filehandle name (just $out)
>> in
>> 
>>   git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel
>>   http://repo.or.cz/w/git/jnareb-git.git/shortlog/refs/heads/gitweb/cache-kernel
>> 
>> I would have to think a bit about separate handle for binary files;
>> I am not sure if it is really required.
> 
> For caching you have to have it.  When you're outputting the data back from
> the cache you need to switch the output mode for the browser to receive
> the data properly.  Otherwise the resulting output from the caching
> engine is going to be garbage.  The caching engine explicitly stores the
> binary files separate from the rest of the response.

I think you don't.  Just read from cache file and print to browser in
binary (:raw) mode; IIUC the ':utf8' is needed only when you print from
Perl.
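
A minimal sketch of that idea, assuming the cache file already holds the
encoded bytes of the response (headers included).  The helper name
print_cached_file() is hypothetical; it is not taken from either patch
series.

```perl
use strict;
use warnings;

# Hypothetical helper: copy an already generated cache file to an output
# filehandle byte for byte.  Both handles are in :raw mode, so no encoding
# layer touches the data, whether it is HTML or a binary blob.
sub print_cached_file {
	my ($cache_file, $out) = @_;
	open my $in, '<:raw', $cache_file
		or return 0;
	binmode $out, ':raw';
	my $buf;
	# read() returns 0 at end of file and undef on error; both end the loop
	while (read($in, $buf, 64 * 1024)) {
		print {$out} $buf;
	}
	close $in;
	return 1;
}
```

With this approach the text/binary distinction matters only when the
entry is generated, not when it is served from the cache.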

>>>   gitweb: File based caching layer (from git.kernel.org)
>> I am working (time permitting) on splitting this large code drop into
>> smaller commits, namely:
> 
> first up, is there a reason not to take the caching layer as it stands
> while you work on these?  I'm fine with adding test cases for what's
> there now if you want, but I guess I'm confused about explicitly wanting
> to break these into smaller commits.

Is there a reason to take this big code dump while we (me and you) are
working on it?  As far as I can see there is no other work on gitweb now,
and for the time being you can run your fork for a bit.  Or is it about
planned deployment on git.fedora.org or something?  Anyway, once there is an
official release of git including gitweb output caching, it should be easy
to switch away from the caching fork.
 
The caching code also has a few issues: coding style, Perl style (e.g.
not using $cache_enabled / $caching_enabled as boolean), duplicated code,
unnecessary open/close dance around stat (you can stat a file that is still
open), not using atomic write (I'm not sure how important this issue
is, especially with locking), not being able to replace the cache with some
other caching module, etc.
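
To illustrate two of these points, here is a minimal sketch of an atomic
write (temporary file plus rename) and of stat on an open filehandle.  The
helper names atomic_write() and mtime_of_open_file() are hypothetical and
do not appear in the patch series.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: write $data to a temporary file in the same
# directory, then rename it over $file.  Readers see either the old
# entry or the complete new one, never a half-written file.
sub atomic_write {
	my ($file, $data) = @_;
	my ($fh, $tmp) = tempfile('cache_XXXXXX', DIR => dirname($file));
	binmode $fh, ':raw';
	my $ok = print {$fh} $data;
	$ok = close($fh) && $ok;
	$ok = rename($tmp, $file) if $ok;
	unlink $tmp unless $ok;
	return $ok ? 1 : 0;
}

# stat works on an open filehandle, so there is no need to close the
# file and reopen it just to learn its modification time.
sub mtime_of_open_file {
	my ($fh) = @_;
	return (stat($fh))[9];
}
```

The rename is atomic only when the temporary file and the target are on
the same filesystem, which is why the temporary file is created in the
target's directory.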


>>  - href(..., -path_info => 0)          (for cache key)
> 
> You actually *really* want to have the full url vs. just the path_info.

Errr... it is not about full url (although I erroneously used relative
URL for human-readable cache key) vs only path_info, but about path_info
(for example http://git.public.com/test.git) and query (for example
http://git.public.com/?p=test.git) URLs which lead to exactly the same
view.

> While I accept that this means that you will end up with multiple
> copies of data being stored it helps dramatically if you have multiple
> sites pointing into the same caching space.  If you happen to have two
> distinct trees
> 
> http://git.public.com/?p=test.git;a=summary
> http://git.private.com/?p=test.git;a=summary
> 
> That respectively point to:
> 
> /group/public/git/test.git
> /group/private/git/test.git
> 
> you'll end up squashing the cache files needlessly and erroneously as
> what's in the cache file will depend on what last site was hit that
> generated the file.

So the key (the human-readable key, not its MD5 digest) would be

  href(-replay=>1, -full_url=>1, -path_info=>0);
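
A minimal sketch of how such a key could map to a cache file, using the
two example URLs from above.  The helper name cache_file_for_key() and the
'/tmp/cache' directory are assumptions made for illustration only.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a human-readable key (a full URL in query
# form) to a file name inside the cache directory.
sub cache_file_for_key {
	my ($cache_dir, $key) = @_;
	return "$cache_dir/" . md5_hex($key);
}

# Full URLs keep the two sites apart even though project and action match.
my $public  = cache_file_for_key('/tmp/cache',
	'http://git.public.com/?p=test.git;a=summary');
my $private = cache_file_for_key('/tmp/cache',
	'http://git.private.com/?p=test.git;a=summary');
print "distinct cache files\n" if $public ne $private;
```

Normalizing to the query form first means that the path_info and query
variants of the same view share one entry, while the host part still
separates the sites.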

>>  - simple file based caching + tests
>>  - global expire time + tests
>>  - output caching in gitweb            (WIP)
>>  - adaptive expiration time            (planned)
>>  - tee output / cache write            (planned)
> 
> You sadly can't 'tee' the output as this would re-introduce the
> stampeding herd problem which is one of the reasons the caching layer
> came about in the first place.  Suppose you could give one person the
> output but make everyone else wait for the cache to finish writing out,
> or have the waiting client processes tail the file while it's generated
> but those both seem a little excessive vs. just waiting.

The "tee" is about serving [one person] the output, while cache is being
written.  It is a bit orthogonal issue to the 'stampeding herd' protection.
 
>>  - expire time variation from CHI      (planned)

This feature is meant to prevent cache miss stampedes[1]

[1] http://search.cpan.org/~jswartz/CHI-0.33/lib/CHI.pm#expires_variance
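
A simplified sketch of the idea, not CHI's actual implementation: each
entry gets an effective lifetime picked at random from the last part of
its nominal lifetime, so that entries created together do not all expire
at the same moment.  The function name effective_expire_time() is
hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given a nominal lifetime in seconds and a variance
# between 0.0 and 1.0, return a lifetime somewhere between
# $expires_in * (1 - $variance) and $expires_in.
sub effective_expire_time {
	my ($expires_in, $variance) = @_;
	$variance = 0 unless defined $variance;
	$variance = 0 if $variance < 0;
	$variance = 1 if $variance > 1;
	my $earliest = $expires_in * (1 - $variance);
	# rand() without argument returns a value in [0, 1); note that
	# rand(0) would behave like rand(1), so scale explicitly instead
	return $earliest + rand() * ($expires_in - $earliest);
}
```

With a variance of 0 the nominal lifetime is returned unchanged; with a
variance of 0.2 and a 600 second lifetime, the result falls between 480
and 600 seconds.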

>>  - locking for single writer           (planned)

...or rather locking to avoid 'stampeding herd' problem.

BTW, I don't recall whether this is the solution you used, but if a stale
cache entry exists, the other clients can be served the stale version instead
of being forced to wait until the writer finishes updating the cache.
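
A minimal sketch of that decision, assuming one lock file per cache entry
and advisory locking via flock.  The helper name cache_action() and its
return values are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: decide what a client should do when the entry is
# missing or expired.  $lock_fh is a handle on the entry's lock file.
# Returns 'generate' when this process won the lock (it regenerates the
# entry and releases the lock by closing $lock_fh), 'stale' when another
# process holds the lock and an old copy exists, and 'wait' otherwise.
sub cache_action {
	my ($cache_file, $lock_fh) = @_;
	# LOCK_NB makes flock return immediately instead of blocking
	return 'generate' if flock($lock_fh, LOCK_EX | LOCK_NB);
	return (-e $cache_file) ? 'stale' : 'wait';
}
```

Only one writer regenerates a given entry at a time; everyone else
either gets the old copy right away or waits for the new one.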

>>  - server-side generating info         (planned)
>>  - AJAX-y generating info              (wishlist)
> 
> If it's helpful I can genuinely devote several more days to this to get
> these cleaned up.  Which would save you a fair amount of time in
> breaking this up.

I'm in the middle of working on gitweb caching, with the intent of arriving
at a (super)set of the features that are in your patch series.

See http://repo.or.cz/w/?p=git/jnareb-git.git;a=shortlog;h=refs/heads/gitweb/cache-kernel
(this branch is rebased / amended / rewritten).

-- 
Jakub Narebski
Poland


end of thread, other threads:[~2010-02-07 21:32 UTC | newest]

Thread overview: 60+ messages
2010-01-14  1:22 [PATCH 0/9] Gitweb caching v5 John 'Warthog9' Hawley
2010-01-14  1:22 ` [PATCH 1/9] gitweb: Load checking John 'Warthog9' Hawley
2010-01-14  1:22   ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information John 'Warthog9' Hawley
2010-01-14  1:22     ` [PATCH 3/9] gitweb: Add option to force version match John 'Warthog9' Hawley
2010-01-14  1:23       ` [PATCH 4/9] gitweb: Makefile improvements John 'Warthog9' Hawley
2010-01-14  1:23         ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time John 'Warthog9' Hawley
2010-01-14  1:23           ` [PATCH 6/9] gitweb: add a get function to compliment print_sort_th John 'Warthog9' Hawley
2010-01-14  1:23             ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header John 'Warthog9' Hawley
2010-01-14  1:23               ` [PATCH 8/9] gitweb: Convert output to using indirect file handle John 'Warthog9' Hawley
2010-01-14  1:23                 ` [PATCH 9/9] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
2010-01-16  2:48                   ` Jakub Narebski
2010-01-23  0:27                   ` [RFC PATCH 00/10] gitweb: Simple file based output caching Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 02/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 03/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 04/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 05/10] gitweb: Use Cache::Cache compatibile (get, set) output caching (WIP) Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 06/10] gitweb/cache.pm - Adaptive cache expiration time (WIP) Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 07/10] gitweb: Use CHI compatibile (compute method) caching (WIP) Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 08/10] gitweb/cache.pm - Use locking to avoid 'stampeding herd' problem (WIP) Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 09/10] gitweb/cache.pm - Serve stale data when waiting for filling cache (WIP) Jakub Narebski
2010-01-23  0:27                     ` [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating " Jakub Narebski
2010-01-24 22:24                       ` Petr Baudis
2010-01-25  0:03                         ` Jakub Narebski
2010-01-25  1:17                           ` Jakub Narebski
2010-01-25 11:46                         ` Jakub Narebski
2010-01-25 13:02                           ` Petr Baudis
2010-01-25 13:48                             ` Jakub Narebski
2010-01-25 13:56                               ` Petr Baudis
2010-01-25 20:32                                 ` J.H.
2010-01-26  1:49                                   ` Jakub Narebski
2010-01-28 17:39                                   ` Petr Baudis
2010-01-31 11:58                                     ` Jakub Narebski
2010-01-25 20:58                                 ` Jakub Narebski
2010-01-25 20:41                               ` J.H.
2010-01-26  2:30                                 ` Jakub Narebski
2010-01-23 19:55                     ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
2010-01-24 13:54                     ` [RFC PATCH 11/10] gitweb: Ajax-y "Generating..." page when regenerating cache (WIP) Jakub Narebski
2010-02-06  0:51                     ` [RFC PATCH 00/10] gitweb: Simple file based output caching J.H.
2010-02-06 23:56                       ` Jakub Narebski
2010-02-07 12:35                         ` Jakub Narebski
     [not found]                   ` <0dd15cb3f18e2a26fc834fd3b071e6d3ecc00557.1264198194.git.jnareb@gmail.com>
2010-01-23  0:48                     ` [RFC PATCH 01/10] gitweb: Print to explicit filehandle (preparing for caching) Jakub Narebski
2010-02-07 21:32                     ` Jakub Narebski
2010-01-16  0:43                 ` [PATCH 8/9] gitweb: Convert output to using indirect file handle Jakub Narebski
2010-01-16  0:58                   ` Junio C Hamano
2010-01-16  1:14                     ` Jakub Narebski
2010-01-16  1:41                       ` Junio C Hamano
2010-01-24 22:14                   ` Petr Baudis
2010-01-25  1:47                     ` Jakub Narebski
2010-01-25 20:48                       ` J.H.
2010-01-25 21:48                         ` Jakub Narebski
2010-01-15 23:49               ` [PATCH 7/9] gitweb: cleanup error message produced by undefined $site_header Jakub Narebski
2010-01-23 11:13           ` [PATCH 5/9] gitweb: add a get function to compliment print_local_time Jakub Narebski
2010-01-15 23:36       ` [PATCH 3/9] gitweb: Add option to force version match Jakub Narebski
2010-01-24 21:59       ` Petr Baudis
2010-01-24 23:17         ` Jakub Narebski
2010-01-15 22:40     ` [PATCH 2/9] gitweb: change die_error to take "extra" argument for extended die information Jakub Narebski
2010-01-15 22:30   ` [PATCH 1/9] gitweb: Load checking Jakub Narebski
2010-01-15  1:40 ` [PATCH 0/9] Gitweb caching v5 Jakub Narebski
2010-01-15  4:29   ` J.H.
2010-01-15 10:28     ` Jakub Narebski
