* [RFC PATCHv3 01/10] gitweb: href(..., -path_info => 0|1)
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 02/10] gitweb/cache.pm - Very simple file based cache Jakub Narebski
` (10 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
If named boolean option -path_info is passed to href() subroutine, it
would use its value to decide whether to generate path_info URL form.
If this option is not passed, href() queries 'pathinfo' feature to
check whether to generate path_info URL (if generating path_info link
is possible at all).
href(-replay=>1, -path_info=>0) is meant to be used to generate a key
for caching gitweb output; alternate solution would be to use freeze()
from Storable (core module) on %input_params hash (or its reference),
e.g.:
$key = freeze \%input_params;
or other serialization of %input_params.
While at it document extra options/flags to href().
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
There is no change in the patch with v3, as compared to v2.
Note that in the caching patch by J.H. from "Gitweb caching v5" thread
(and top commit in git://git.kernel.org/pub/scm/git/warthog9/gitweb.git,
gitweb-ml-v5 branch) the key was generated as "$my_url?".$ENV{'QUERY_STRING'}
which wouldn't work with path_info URLs, but on the other hand gitweb
at git.kernel.org doesn't use path_info URLs (perhaps even doesn't
support them).
Using href(replay=>1,full=>1,path_info=>0) has additional advantage
over using $cgi->self_url() in that it also does not depend on
ordering of parameters in handcrafted URLs.
gitweb/gitweb.perl | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 1f6978a..97ea3ec 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -970,6 +970,10 @@ exit;
## ======================================================================
## action links
+# possible values of extra options
+# -full => 0|1 - use absolute/full URL ($my_uri/$my_url as base)
+# -replay => 1 - start from a current view (replay with modifications)
+# -path_info => 0|1 - don't use/use path_info URL (if possible)
sub href {
my %params = @_;
# default is to use -absolute url() i.e. $my_uri
@@ -986,7 +990,8 @@ sub href {
}
my $use_pathinfo = gitweb_check_feature('pathinfo');
- if ($use_pathinfo and defined $params{'project'}) {
+ if (defined $params{'project'} &&
+ (exists $params{-path_info} ? $params{-path_info} : $use_pathinfo)) {
# try to put as many parameters as possible in PATH_INFO:
# - project name
# - action
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 02/10] gitweb/cache.pm - Very simple file based cache
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 01/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 03/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
` (9 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
This is first step towards implementing file based output (response)
caching layer that is used on such large sites as kernel.org.
This patch introduces GitwebCaching::SimpleFileCache package, which
follows Cache::Cache / CHI interface, although do not implement it
fully. The intent of following established convention for cache
interface is to be able to replace our simple file based cache,
e.g. by the one using memcached.
Like in original patch by John 'Warthog9' Hawley (J.H.) (the one this
commit intends to be incremental step to), the data is stored in the
case as-is, without adding metadata (like expiration date), and
without serialization (which means only scalar data).
To be implemented (from original patch by J.H.):
* cache expiration (based on file stats, current time and global
expiration time); currently elements in cache do not expire
* actually using this cache in gitweb, except error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
(using flock)
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Possible extensions (beyond what was in original patch):
* (optionally) show information about cache utilization
* AJAX (JavaScript-based) progress indicator
* JavaScript code to update relative dates in cached output
* make cache size-aware (try to not exceed specified maximum size)
* utilize X-Sendfile header (or equivalent) to show cached data
(optional, as it makes sense only if web server supports sendfile
feature and have it enabled)
* variable expiration feature from CHI, allowing items to expire a bit
earlier than the stated expiration time to prevent cache miss
stampedes (although locking, if available, should take care of
this).
The code of GitwebCaching::SimpleFileCache package in gitweb/cache.pm
was heavily based on file-based cache in Cache::Cache package, i.e.
on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on
file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver
(including implementing atomic write, something that original patch
lacks). It tries to follow more modern CHI architecture, but without
requiring Moose. It is much simplified compared to both interfaces
and their file-based drivers.
This patch does not yet enable output caching in gitweb (it doesn't
have all required features yet); on the other hand it includes tests
of cache Perl API in t/t9503-gitweb-caching.sh.
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v2:
* Updated copyright info (not fully).
* Added many comments.
* Internal: field names are now 'root' etc, not '_Root' etc.
* path_to_namespace has now shortcut for empty namespace.
* Error reporting via "... or die "Couldn't ...: $!".
* ->remove now test that file exist befor attempting unlink
* ->set returns $data
* t9503 shows $@ if command inside eval die-d.
Differences from relevant parts of J.H. patch:
* Does not use make_path, but older mkpath interface from File::Path;
this way we do not require File::Path version >= 2.0.
* Uses catfile/catdir from File::Spec (see below)
* Writes to cache file atomically, by writing to unique temporary file
and then renaming/moving this file to the cache file. This is to make
caching more robust, by preventing readers from getting partial data.
The issue this prevents should be very rare, as we write whole data
(whole page) at once, after it is generated in full, so this situation
can theoretically happen only in the case of heavy load, many clients,
and very large page (over the size of (file)system cache).
* Error reporting is done via "die", not via die_error from gitweb.
This is to avoid entangling cache.pm and gitweb.perl code (making
cache.pm dependent on gitweb). This can be improved further by
passing die handler (wrapper around die_error in case of gitweb) to
cache.
* The depth of cache hierarchy is configurable, while J.H. patch uses
hardcoded depth of 1 subdirectory. It uses unpack rather than substr.
For 06b90e786e... key digest it uses 06/b90e786e... (with cache_depth = 1)
rather than 06/06b90e786e... in J.H. patch.
* It does not have binary and non-binary cache data separately; it
does not use '.bin' suffix (extension) for binary-data cache files.
* GitwebCache::SimpleFileCache uses standard ->get, ->set, ->compute
methods, which should allow easy replacing it with CHI or Cache::Cache
cache (see later commit, adding caching support to gitweb).
* Tests of caching API in t/t9503-gitweb-caching.sh
* Better Perl style, more consistent with what gitweb itself uses.
Possible improvements:
* CHI uses fast_catpath from CHI::Util, which for Unix reduces to simple
join("/", @_). GitwebCache::SimpleFileCache uses catpath (remainder of
Cache::FileCache-based code), but gitweb itself uses just "$root/$path",
so this over-portability just wastes performance.
* We use File::Temp to generate temporary file to write data. CHI uses
instead unique ids (Data::UUID to generate initial unique id, then suffix
it with serial number, in hexadecimal) for efficiency.
Note that with locking (introduced later in series) we could write to
temporary file with pre-defined (not randomized/incremental) name.
* Separate change to t/test-lib.sh, adding exporting TEST_DIRECTORY and
TRASH_DIRECTORY in test_external to allow its use in external tests,
to a separate commit (it is required for t9503 to find cache.pm).
* Simplify constructor, and make it manipulate fields directly rather
than using accessors
gitweb/cache.pm | 306 +++++++++++++++++++++++++++++++++++++++
t/t9503-gitweb-caching.sh | 32 ++++
t/t9503/test_cache_interface.pl | 85 +++++++++++
t/test-lib.sh | 3 +
4 files changed, 426 insertions(+), 0 deletions(-)
create mode 100644 gitweb/cache.pm
create mode 100755 t/t9503-gitweb-caching.sh
create mode 100755 t/t9503/test_cache_interface.pl
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
new file mode 100644
index 0000000..231b292
--- /dev/null
+++ b/gitweb/cache.pm
@@ -0,0 +1,306 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine
+#
+
+{
+# Minimalistic cache that stores data in the filesystem, without serialization
+# and currently without any kind of cache expiration (all keys last forever till
+# they got explicitely removed).
+#
+# It follows Cache::Cache and CHI interface (but does not implement it fully)
+
+package GitwebCache::SimpleFileCache;
+
+use strict;
+use warnings;
+
+use File::Path qw(mkpath);
+use File::Spec;
+use File::Temp;
+use Digest::MD5 qw(md5_hex);
+
+# by default, the cache nests all entries on the filesystem two
+# directories deep
+
+our $DEFAULT_CACHE_DEPTH = 2;
+
+# by default, the root of the cache is located in 'cache'.
+
+our $DEFAULT_CACHE_ROOT = "cache";
+
+# ......................................................................
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+# * 'namespace'
+# The namespace associated with this cache. This allows easy separation of
+# multiple, distinct caches without worrying about key collision. Defaults
+# to '' (which does not allow for simple implementation of clear() method).
+# * 'cache_root' (Cache::FileCache compatibile),
+# 'root_dir' (CHI::Driver::File compatibile),
+# The location in the filesystem that will hold the root of the cache.
+# Defaults to 'cache', relative to gitweb.cgi directory.
+# * 'cache_depth' (Cache::FileCache compatibile),
+# 'depth' (CHI::Driver::File compatibile),
+# The number of subdirectories deep to cache object item. This should be
+# large enough that no cache directory has more than a few hundred objects.
+# Defaults to 1 unless explicitly set.
+sub new {
+ my ($proto, $p_options_hash_ref) = @_;
+
+ my $class = ref($proto) || $proto;
+ my $self = {};
+ $self = bless($self, $class);
+
+ my ($root, $depth, $ns);
+ if (defined $p_options_hash_ref) {
+ $root =
+ $p_options_hash_ref->{'cache_root'} ||
+ $p_options_hash_ref->{'root_dir'};
+ $depth =
+ $p_options_hash_ref->{'cache_depth'} ||
+ $p_options_hash_ref->{'depth'};
+ $ns = $p_options_hash_ref->{'namespace'};
+ }
+ $root = $DEFAULT_CACHE_ROOT unless defined($root);
+ $depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
+ $ns = '' unless defined($ns);
+
+ $self->set_root($root);
+ $self->set_depth($depth);
+ $self->set_namespace($ns);
+
+ return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(depth root namespace)) {
+ my $field = $i;
+ no strict 'refs';
+ *{"get_$field"} = sub {
+ my $self = shift;
+ return $self->{$field};
+ };
+ *{"set_$field"} = sub {
+ my ($self, $value) = @_;
+ $self->{$field} = $value;
+ };
+}
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# Return root dir for namespace (lazily built, cached)
+sub path_to_namespace {
+ my ($self) = @_;
+
+ if (!exists $self->{'path_to_namespace'}) {
+ if ($self->{'namespace'} ne '') {
+ $self->{'path_to_namespace'} = File::Spec->catfile(
+ $self->{'root'},
+ $self->{'namespace'}
+ );
+ } else {
+ $self->{'path_to_namespace'} = $self->{'root'};
+ }
+ }
+ return $self->{'path_to_namespace'};
+}
+
+# Take an human readable key, and return file path
+sub path_to_key {
+ my ($self, $key, $dir_ref) = @_;
+
+ my @paths = ( $self->path_to_namespace() );
+
+ # Create a unique (hashed) key from human readable key
+ my $filename = md5_hex($key); # or $digester->add($key)->hexdigest()
+
+ # Split filename so that it have DEPTH subdirectories,
+ # where each subdirectory has a two-letter name
+ push @paths, unpack("(a2)".($self->{'depth'} || 1)." a*", $filename);
+ $filename = pop @paths;
+
+ # Join paths together, computing dir separately if $dir_ref was passed.
+ my $filepath;
+ if (defined $dir_ref && ref($dir_ref)) {
+ my $dir = File::Spec->catdir(@paths);
+ $filepath = File::Spec->catfile($dir, $filename);
+ $$dir_ref = $dir;
+ } else {
+ $filepath = File::Spec->catfile(@paths, $filename);
+ }
+
+ return $filepath;
+}
+
+# ----------------------------------------------------------------------
+# worker methods
+
+sub fetch {
+ my ($self, $key) = @_;
+
+ my $file = $self->path_to_key($key);
+ return undef unless (defined $file && -f $file);
+
+ # Fast slurp, adapted from File::Slurp::read, with unnecessary options removed
+ # via CHI::Driver::File (from CHI-0.33)
+ my $buf = '';
+ open my $read_fh, '<', $file
+ or return undef;
+ binmode $read_fh, ':raw';
+
+ my $size_left = -s $read_fh;
+
+ while ($size_left > 0) {
+ my $read_cnt = sysread($read_fh, $buf, $size_left, length($buf));
+ return undef unless defined $read_cnt;
+
+ last if $read_cnt == 0;
+ $size_left -= $read_cnt;
+ #last if $size_left <= 0;
+ }
+
+ close $read_fh
+ or die "Couldn't close file '$file' opened for reading: $!";
+ return $buf;
+}
+
+sub store {
+ my ($self, $key, $data) = @_;
+
+ my $dir;
+ my $file = $self->path_to_key($key, \$dir);
+ return undef unless (defined $file && defined $dir);
+
+ # ensure that directory leading to cache file exists
+ if (!-d $dir) {
+ mkpath($dir, 0, 0777)
+ or die "Couldn't mkpath '$dir': $!";
+ }
+
+ # generate a temporary file
+ my $temp = File::Temp->new(
+ #DIR => $dir,
+ TEMPLATE => "${file}_XXXXX",
+ SUFFIX => ".tmp"
+ ) or die "Couldn't create temporary file '${file}_XXXXX': $!";
+ chmod 0666, $temp
+ or die "Couldn't change permissions to 0666 for '$temp': $!";
+
+ # Fast spew, adapted from File::Slurp::write, with unnecessary options removed
+ # via CHI::Driver::File (from CHI-0.33)
+ my $write_fh = $temp;
+ binmode $write_fh, ':raw';
+
+ my $size_left = length($data);
+ my $offset = 0;
+
+ while ($size_left > 0) {
+ my $write_cnt = syswrite($write_fh, $data, $size_left, $offset);
+ return undef unless defined $write_cnt;
+
+ $size_left -= $write_cnt;
+ $offset += $write_cnt; # == length($data);
+ }
+
+ close $temp
+ or die "Couldn't close temporary file '$temp' opened for writing: $!";
+ rename($temp, $file)
+ or die "Couldn't rename temporary file '$temp' to '$file': $!";
+}
+
+# get size of an element associated with the $key (not the size of whole cache)
+sub get_size {
+ my ($self, $key) = @_;
+
+ my $path = $self->path_to_key($key)
+ or return undef;
+ if (-f $path) {
+ return -s $path;
+ }
+ return 0;
+}
+
+# ......................................................................
+# interface methods
+
+# Removing and expiring
+
+# $cache->remove($key)
+#
+# Remove the data associated with the $key from the cache.
+sub remove {
+ my ($self, $key) = @_;
+
+ my $file = $self->path_to_key($key)
+ or return undef;
+ return undef unless -f $file;
+ unlink($file)
+ or die "Couldn't remove file '$file': $!";
+}
+
+# Getting and setting
+
+# $cache->set($key, $data);
+#
+# Associates $data with $key in the cache, overwriting any existing entry.
+# Returns $data.
+sub set {
+ my ($self, $key, $data) = @_;
+
+ return unless (defined $key && defined $data);
+
+ $self->store($key, $data);
+
+ return $data;
+}
+
+# $data = $cache->get($key);
+#
+# Returns the data associated with $key. If $key does not exist
+# or has expired, returns undef.
+sub get {
+ my ($self, $key) = @_;
+
+ my $data = $self->fetch($key)
+ or return undef;
+
+ return $data;
+}
+
+# $data = $cache->compute($key, $code);
+#
+# Combines the get and set operations in a single call. Attempts to
+# get $key; if successful, returns the value. Otherwise, calls $code
+# and uses the return value as the new value for $key, which is then
+# returned.
+sub compute {
+ my ($self, $p_key, $p_coderef) = @_;
+
+ my $data = $self->get($p_key);
+ if (!defined $data) {
+ $data = $p_coderef->($self, $p_key);
+ $self->set($p_key, $data);
+ }
+
+ return $data;
+}
+
+1;
+} # end of package GitwebCache::SimpleFileCache;
+
+1;
diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh
new file mode 100755
index 0000000..768080c
--- /dev/null
+++ b/t/t9503-gitweb-caching.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='caching interface to be used in gitweb'
+#test_description='caching interface used in gitweb, gitweb caching
+#
+#This test checks cache (interface) used in gitweb caching, caching
+#infrastructure and gitweb response (output) caching (the last by
+#running gitweb as CGI script from commandline).'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this test is present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+ say 'perl not available, skipping test'
+ test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+ say 'perl module Test::More unavailable, skipping test'
+ test_done
+}
+
+# ----------------------------------------------------------------------
+
+test_external 'GitwebCache::* Perl API (in gitweb/cache.pm)' \
+ "$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
+
+test_done
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
new file mode 100755
index 0000000..39802b7
--- /dev/null
+++ b/t/t9503/test_cache_interface.pl
@@ -0,0 +1,85 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+use Data::Dumper;
+
+# test source version; there is no installation target for gitweb
+my $cache_pm = "$ENV{TEST_DIRECTORY}/../gitweb/cache.pm";
+
+unless (-f "$cache_pm") {
+ plan skip_all => "$cache_pm not found";
+}
+
+# it is currently not a proper Perl module, so we use 'do FILE'
+# instead of: BEGIN { use_ok('GitwebCache::SimpleFileCache') }
+my $return = do "$cache_pm";
+ok(!$@, "parse gitweb/cache.pm")
+ or diag("parse error:\n", $@);
+ok(defined $return, "do gitweb/cache.pm");
+ok($return, "run gitweb/cache.pm");
+
+
+# Test creating a cache
+#
+my $cache = new_ok('GitwebCache::SimpleFileCache',
+ [ { 'cache_root' => 'cache', 'cache_depth' => 2 } ]);
+
+# Test that default values are defined
+#
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+ '$DEFAULT_CACHE_ROOT defined');
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+ '$DEFAULT_CACHE_DEPTH defined');
+
+# Test accessors and default values for cache
+#
+SKIP: {
+ skip 'default values not defined', 3
+ unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
+ $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
+
+ is($cache->get_namespace(), '', "default namespace is ''");
+ is($cache->get_root(), $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+ "default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
+ cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+ "default cache depth is $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH");
+}
+
+# Test the getting, setting, and removal of a cached value
+# (Cache::Cache interface)
+#
+my $key = 'Test Key';
+my $value = 'Test Value';
+can_ok($cache, qw(get set remove));
+#ok(!defined($cache->get($key)), 'get before set')
+# or diag("get returned '", $cache->get($key), "' for $key");
+$cache->set($key, $value);
+is($cache->get($key), $value, 'get after set, returns cached value');
+cmp_ok($cache->get_size($key), '>', 0, 'get_size after set, is greater than 0');
+$cache->remove($key);
+ok(!defined($cache->get($key)), 'get after remove, is undefined');
+eval { $cache->remove('Not-Existent Key'); };
+ok(!$@, 'remove on non-existent key doesn\'t die');
+diag($@) if $@;
+
+# Test the getting and setting of a cached value
+# (CHI interface)
+#
+my $call_count = 0;
+sub get_value {
+ $call_count++;
+ return $value;
+}
+can_ok($cache, qw(compute));
+is($cache->compute($key, \&get_value), $value, 'compute 1st time (set)');
+is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
+is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
+cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
+
+done_testing();
+
+print Dumper($cache);
diff --git a/t/test-lib.sh b/t/test-lib.sh
index afd3053..a13bdd5 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -453,6 +453,9 @@ test_external () {
# Announce the script to reduce confusion about the
# test output that follows.
say_color "" " run $test_count: $descr ($*)"
+ # Export TEST_DIRECTORY and TRASH_DIRECTORY
+ # to be able to use them in script
+ export TEST_DIRECTORY TRASH_DIRECTORY
# Run command; redirect its stderr to &4 as in
# test_run_, but keep its stdout on our stdout even in
# non-verbose mode.
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 03/10] gitweb/cache.pm - Stat-based cache expiration
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 01/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 02/10] gitweb/cache.pm - Very simple file based cache Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching Jakub Narebski
` (8 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
Add stat-based cache expiration to file-based GitwebCache::SimpleFileCache.
Contrary to the way other caching interfaces such as Cache::Cache and CHI
do it, the time cache element expires in is _global_ value associated with
cache instance, and is not local property of cache entry. (Currently cache
entry does not store any metadata associated with entry... which means that
there is no need for serialization / marshalling / freezing and thawing.)
Default expire time is -1, which means never expire.
To check if cache entry is expired, GitwebCache::SimpleFileCache compares
difference between mtime (last modify time) of a cache file and current time
with (global) time to expire. It is done using CHI-compatibile is_valid()
method.
Add some tests checking that expiring works correctly (on the level of API).
To be implemented (from original patch by J.H.):
* actually using this cache in gitweb, except error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
(using flock)
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Now that caching engine supports cache expiration, we can add caching
support to gitweb.
Differences from v2:
* More comments explaining code
* Fixed bug in ->is_valid(): stat(_) should be used after '-f $path',
and not after ->get_expires_in()... which runs -f '/proc/loadavg'
(otherwise stat(_) is about '/proc/loadavg' and not $path).
Differences from relevant parts of J.H. patch:
* It simply uses stat on last accessed file (checked for existence),
instead of opening file for reading (without error detection!), running
stat on it, and then closing it.
* One can use expire time of -1 (or to be more exact less than 0) to set
expire time to never (cache is considered fresh forever, does not expire).
* There are some tests in t9503 of cache expiration (one of those assume
that expire time of one day would be not expired in get after set).
gitweb/cache.pm | 39 +++++++++++++++++++++++++++++++++++++--
t/t9503/test_cache_interface.pl | 10 ++++++++++
2 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 231b292..7f1bd5f 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -53,6 +53,10 @@ our $DEFAULT_CACHE_ROOT = "cache";
# The number of subdirectories deep to cache object item. This should be
# large enough that no cache directory has more than a few hundred objects.
# Defaults to 1 unless explicitly set.
+# * 'default_expires_in' (Cache::Cache compatibile),
+# 'expires_in' (CHI compatibile) [seconds]
+# The expiration time for objects place in the cache.
+# Defaults to -1 (never expire) if not explicitly set.
sub new {
my ($proto, $p_options_hash_ref) = @_;
@@ -60,7 +64,7 @@ sub new {
my $self = {};
$self = bless($self, $class);
- my ($root, $depth, $ns);
+ my ($root, $depth, $ns, $expires_in);
if (defined $p_options_hash_ref) {
$root =
$p_options_hash_ref->{'cache_root'} ||
@@ -69,14 +73,19 @@ sub new {
$p_options_hash_ref->{'cache_depth'} ||
$p_options_hash_ref->{'depth'};
$ns = $p_options_hash_ref->{'namespace'};
+ $expires_in =
+ $p_options_hash_ref->{'default_expires_in'} ||
+ $p_options_hash_ref->{'expires_in'};
}
$root = $DEFAULT_CACHE_ROOT unless defined($root);
$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
$ns = '' unless defined($ns);
+ $expires_in = -1 unless defined($expires_in); # <0 means never
$self->set_root($root);
$self->set_depth($depth);
$self->set_namespace($ns);
+ $self->set_expires_in($expires_in);
return $self;
}
@@ -87,7 +96,7 @@ sub new {
# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
# creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace)) {
+foreach my $i (qw(depth root namespace expires_in)) {
my $field = $i;
no strict 'refs';
*{"get_$field"} = sub {
@@ -253,6 +262,31 @@ sub remove {
or die "Couldn't remove file '$file': $!";
}
+# $cache->is_valid($key)
+#
+# Returns a boolean indicating whether $key exists in the cache
+# and has not expired (global per-cache 'expires_in').
+sub is_valid {
+ my ($self, $key) = @_;
+
+ my $path = $self->path_to_key($key);
+
+ # does file exists in cache?
+ return 0 unless -f $path;
+ # reuse stat structure
+ my $mtime = (stat(_))[9]
+ or die "Couldn't stat file '$path': $!";
+
+ # expire time can be set to never
+ my $expires_in = $self->get_expires_in();
+ return 1 unless (defined $expires_in && $expires_in >= 0);
+
+ # is file expired?
+ my $now = time();
+
+ return (($now - $mtime) < $expires_in);
+}
+
# Getting and setting
# $cache->set($key, $data);
@@ -276,6 +310,7 @@ sub set {
sub get {
my ($self, $key) = @_;
+ return undef unless $self->is_valid($key);
my $data = $self->fetch($key)
or return undef;
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 39802b7..ec92207 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -80,6 +80,16 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
+# Test cache expiration for 'expire now'
+#
+$cache->set_expires_in(60*60*24); # set expire time to 1 day
+cmp_ok($cache->get_expires_in(), '>', 0, '"expires in" is greater than 0');
+is($cache->get($key), $value, 'get returns cached value (not expired)');
+$cache->set_expires_in(0);
+is($cache->get_expires_in(), 0, '"expires in" is set to now (0)');
+$cache->set($key, $value);
+ok(!defined($cache->get($key)), 'cache is expired');
+
done_testing();
print Dumper($cache);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (2 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 03/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 05/10] gitweb/cache.pm - Adaptive cache expiration time Jakub Narebski
` (7 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
This commit actually adds output caching to gitweb, as we have now
minimal features required for it in GitwebCache::SimpleFileCache
(a 'dumb' but fast file-based cache engine). To enable cache you need
at least set $caching_enabled to true in gitweb config, and copy
gitweb/cache.pm alongside generated gitweb.cgi - this is described
in more detail in the new "Gitweb caching" section in gitweb/README.
Gitweb itself uses directly only cache_fetch, to get page from cache
or to generate page and save it to cache, and cache_stop, to be used
in die_error subroutine, as currently error pages are not cached.
The cache_fetch subroutine captures output by using select(FILEHANDLE)
to change default filehandle for output. This means that a "print" or
a "printf" (or a "write") in gitweb source without a filehandle would
be captured. To change mode of filehandle used for capturing correctly,
"binmode STDOUT, <mode>" was changed to "binmode select(), <mode>"; this
has no change if $caching_enabled is false, and capturing is turned off.
Capturing is done using in-memory file held in Perl scalar.
Using select(FILEHANDLE) is a bit fragile as a method of capturing
output, as it assumes that we always use "print" or "printf" without
filehandle, and use select() which returns default filehandle for
output in place of explicit STDOUT. On the other hand it has the
advantage of being simple. Alternate solutions include using tie
(like in CGI::Cache), or using PerlIO layers - but that requires
non-standard PerlIO::Util module.
It is assumed that data is saved to cache _converted_, and should
therefore be read from cache and printed to STDOUT in ':raw' (binary)
mode.
Enabling caching causes the following additional changes to gitweb
output:
* Disables content-type negotiation (choosing between 'text/html'
mimetype and 'application/xhtml+xml') when caching, as there is no
content-type negotiation done when retrieving page from cache.
Use 'text/html' mimetype that can be used by all browsers.
* Disable timing info (how much time it took to generate original
page, and how many git commands it took), and in its place show when
page was originally generated (in GMT / UTC timezone).
Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project.
In t9503-gitweb-caching , test that cache_fetch behaves correctly,
namely that it saves and restores action output in cache, and that it
prints generated output or cached output.
To be implemented (from original patch by J.H.):
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
(using flock)
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v2:
* "Capturing" gitweb output is done via changing default filehandle for
'print' and 'printf' via select(FILEHANDLE), and using select() in place
of STDOUT, e.g. in calls to binmode. Not manipulating *STDOUT (which
doesn't work correctly), not using PerlIO layers (which requires
non-standard PerlIO::Util module).
* cache_fetch changed signature from cache_fetch($cache, $action) to
cache_fetch($cache, $key, $code), and is called as
cache_fetch($cache, $output_key, $actions{$action});
and not as
cache_fetch($cache, $action);
This decouples cache.pm from gitweb implementation, and allows to use
'require $cache_pm;' in place of 'do $cache_pm;', and allows to simplify
code in t/t9503/test_cache_interface.pl.
* cache_fetch code got reordered a bit to avoid duplicating code (printing
captured and saved gitweb output or data from cache).
* cache.pm is loaded using 'require' instead of 'do'. To allow this the
directory where gitweb.cgi is in is added to @INC via 'use lib'.
Differences from relevant parts of J.H. patch:
* cache_fetch() subroutine is much, much simpler. Well, it lacks most of
the features of original cache_fetch() by J.H.: it doesn't have adaptive
cache lifetime, nor locking to prevent 'stampeding herd' problem, nor
serving stale data when waiting for cache regeneration, nor background
data generation, nor activity indicator... but the cache_fetch() itself
doesn't change much, as those features are added mainly via methods
and subroutines cache_fetch() calls.
* Capturing gitweb output is done without need to modify gitweb to either
save generated output into $output variable and print it or save in cache
after it is generated in full (original J.H. patch in "Gitweb caching v2"),
or changing all print statements to print to explicit filehandle which
points to STDOUT if caching is disabled and to in-memory file if caching
is enabled (modified J.H. patch in "Gitweb caching v5").
* It introduces $cache_pm variable, to be able to test caching in
t/t9500-gitweb-standalone-no-errors.sh, but also to be able to install
cache.pm in some other place than along gitweb.cgi.
* cache_fetch() changed signature, and it does not longer depends on
subroutines (like href()) and variables (like %actions) defined in
gitweb.perl. This allows to simply 'require $cache_pm' instead of
'do $cache_pm' (which also means that it can be installed as module
in PERL5LIB etc.).
* Instead of using "binary" (sic!) valued $cache_enable (which means 0 or 1
valued $cache_enable), a set of two variables is used. The $cache
variable can be used to select alternate caching engine / caching class.
The $caching_enabled variable is used to actually enable/disable cache.
* The information about the time when page was generated is shown only if
'timed' feature is enabled in gitweb config, and it is shown in place of
usual time it took to generate page (shown when caching is not enabled).
This means that change to gitweb/gitweb.css was not needed.
* cache_fetch() is run only when $caching_enabled. Some of cache
initializations are in gitweb.perl, and not at beginning of cache_fetch()
* Cache options are contained in %cache_options hash, instead of individual
global variables (which were using non-Perlish camelCase notation).
* There is information about caching in gitweb in gitweb/README
* There are tests of gitweb caching in t9500, and caching API in t9503
gitweb/README | 70 ++++++++++++++++++
gitweb/cache.pm | 66 +++++++++++++++++
gitweb/gitweb.perl | 122 +++++++++++++++++++++++++++-----
t/gitweb-lib.sh | 2 +
t/t9500-gitweb-standalone-no-errors.sh | 19 +++++
t/t9503/test_cache_interface.pl | 56 +++++++++++++++
6 files changed, 318 insertions(+), 17 deletions(-)
diff --git a/gitweb/README b/gitweb/README
index 6c2c8e1..53759fc 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -233,6 +233,11 @@ not include variables usually directly set during build):
If server load exceed this value then return "503 Service Unavaliable" error.
Server load is taken to be 0 if gitweb cannot determine its value. Set it to
undefined value to turn it off. The default is 300.
+ * $caching_enabled
+ If true, gitweb would use caching to speed up generating response.
+ Currently supported is only output (response) caching. See "Gitweb caching"
+ section below for details on how to configure and customize caching.
+ The default is false (caching is disabled).
Projects list file format
@@ -305,6 +310,71 @@ You can use the following files in repository:
descriptions.
+Gitweb caching
+~~~~~~~~~~~~~~
+
+Currently gitweb supports only output (HTTP response) caching, similar
+to the one used on git.kernel.org. To turn it on, set $caching_enabled
+variable to true value in gitweb config file, i.e.:
+
+ our $caching_enabled = 1;
+
+You can choose what caching engine should gitweb use by setting $cache
+variable either to _initialized_ instance of cache interface, e.g.:
+
+ use CHI;
+ our $cache = CHI->new( driver => 'Memcached',
+ servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ],
+ l1_cache => { driver => 'FastMmap', root_dir => '/var/cache/gitweb' }
+ );
+
+Alternatively you can set $cache variable to the name of cache class, in
+which case caching engine should support Cache::Cache or CHI names for cache
+config (see below), and ignore unrecognized options, e.g.:
+
+ use Cache::FileCache;
+ our $cache = 'Cache::FileCache';
+
+Such caching engine should implement (at least) ->get($key) and
+->set($key, $data) methods (Cache::Cache and CHI compatible interface).
+
+If $cache is left unset (if it is left undefined), then gitweb would use
+GitwebCache::SimpleFileCache from cache.pm as caching engine. This engine
+is 'dumb' (but fast) file based caching layer, currently without any support
+for cache size limiting, or even removing expired / grossly expired entries.
+It has therefore the downside of requiring a huge amount of disk space if
+there are a number of repositories involved. It is not uncommon for
+git.kernel.org to have on the order of 80G - 120G accumulate over the course
+of a few months. It is therefore recommended that the cache directory be
+periodically completely deleted; this operation is safe to perform.
+Suggested mechanism (substitute $cachedir for actual path to gitweb cache):
+
+ # mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+
+For gitweb to use caching it must find 'cache.pm' file, which contains
+GitwebCache::SimpleFileCache and cache-related subroutines, from which
+cache_fetch and cache_stop are used in gitweb itself. Location of
+'cache.pm' file is provided in $cache_pm variable; if it is relative path,
+it is relative to the directory gitweb is run from. Default value of
+$cache_pm assumes that 'cache.pm' is copied to the same directory as
+'gitweb.cgi'.
+
+Currently 'cache.pm' is not a proper Perl module, because it is not
+encapsulated / it is not separated from details of gitweb. That is why it
+is sourced using 'do "$cache_pm"', rather than with "use" or "require"
+operators.
+
+Site-wide cache options are defined in %cache_options hash. Those options
+apply only when $cache is unset (GitwebCache::SimpleFileCache is used), or
+if $cache is name of cache class (e.g. $cache = 'Cache::FileCache'). You
+can override cache options in gitweb config, e.g.:
+
+ $cache_options{'expires_in'} = 60; # 60 seconds = 1 minute
+
+Please read comments for %cache_options entries in gitweb/gitweb.perl for
+description of available cache options.
+
+
Webserver configuration
-----------------------
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 7f1bd5f..151e029 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -338,4 +338,70 @@ sub compute {
1;
} # end of package GitwebCache::SimpleFileCache;
+our $oldfh = select();
+our $data_fh;
+
+# Start capturing data (STDOUT)
+# (printed using 'print <sth>' or 'printf <sth>')
+sub cache_start {
+ my $data_ref = shift;
+
+ $$data_ref = '';
+ $data_fh = undef;
+
+ open $data_fh, '>', $data_ref
+ or die "Couldn't open in-memory file: $!";
+ $oldfh = select($data_fh);
+}
+
+# Stop capturing data (required for die_error)
+sub cache_stop {
+ # return if we didn't start capturing
+ return unless defined $data_fh;
+
+ select($oldfh);
+ close $data_fh
+ or die "Couldn't close in-memory file: $!";
+}
+
+# Wrap caching data; capture only STDOUT
+# (works only for 'print <sth>' and 'printf <sth>')
+sub cache_capture (&) {
+ my $code = shift;
+ my $data;
+
+ cache_start(\$data);
+ $code->();
+ cache_stop();
+
+ return $data;
+}
+
+# cache_fetch($cache, $key, $action_code);
+#
+# Attempts to get $key from $cache; if successful, prints the value.
+# Otherwise, calls $action_code, capture its output, use the captured output
+# as the new value for $key in $cache, then prints captured output.
+sub cache_fetch {
+ my ($cache, $key, $code) = @_;
+
+ my $data = $cache->get($key);
+
+ if (!defined $data) {
+ $data = cache_capture {
+ $code->();
+ };
+ $cache->set($key, $data) if defined $data;
+ }
+
+ if (defined $data) {
+ # print cached data
+ #binmode STDOUT, ':raw';
+ #print STDOUT $data;
+ # for t9503 test:
+ binmode select(), ':raw';
+ print $data;
+ }
+}
+
1;
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 97ea3ec..7073e1b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -227,6 +227,44 @@ our %avatar_size = (
# Leave it undefined (or set to 'undef') to turn off load checking.
our $maxload = 300;
+# This enables/disables the caching layer in gitweb. Currently supported
+# is only output (response) caching, similar to the one used on git.kernel.org.
+our $caching_enabled = 0;
+# Set to _initialized_ instance of cache interface implementing (at least)
+# get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces).
+# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# (but fast) file based caching layer, currently without any support for
+# cache size limiting. It is therefore recommended that the cache directory
+# be periodically completely deleted; this operation is safe to perform.
+# Suggested mechanism:
+# mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+our $cache;
+# Locations of 'cache.pm' file; if it is relative path, it is relative to
+# the directory gitweb is run from
+our $cache_pm = 'cache.pm';
+# You define site-wide cache options defaults here; override them with
+# $GITWEB_CONFIG as necessary.
+our %cache_options = (
+ # The location in the filesystem that will hold the root of the cache.
+ # This directory will be created as needed (if possible) on the first
+ # cache set. Note that either this directory must exists and web server
+ # has to have write permissions to it, or web server must be able to
+ # create this directory.
+ # Possible values:
+ # * 'cache' (relative to gitweb),
+ # * File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache'),
+ # * '/var/cache/gitweb' (FHS compliant, requires being set up),
+ 'cache_root' => 'cache',
+ # The number of subdirectories deep to cache object item. This should be
+ # large enough that no cache directory has more than a few hundred
+ # objects. Each non-leaf directory contains up to 256 subdirectories
+ # (00-ff). Must be larger than 0.
+ 'cache_depth' => 1,
+ # The (global) expiration time for objects placed in the cache, in seconds.
+ 'expires_in' => 20,
+);
+
+
# You define site-wide feature defaults here; override them with
# $GITWEB_CONFIG as necessary.
our %feature = (
@@ -964,7 +1002,42 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
!$project) {
die_error(400, "Project needed");
}
-$actions{$action}->();
+
+if ($caching_enabled) {
+ die_error(500, 'Caching enabled and "'.esc_path($cache_pm).'" not found')
+ unless -f $cache_pm;
+ # __DIR__ is taken from Dir::Self __DIR__ fragment
+ sub __DIR__ () {
+ File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+ }
+ use lib __DIR__;
+ require $cache_pm;
+
+ # $cache might be initialized (instantiated) cache, i.e. cache object,
+ # or it might be name of class, or it might be undefined
+ unless (defined $cache && ref($cache)) {
+ $cache ||= 'GitwebCache::SimpleFileCache';
+ $cache = $cache->new({
+ %cache_options,
+ #'cache_root' => '/tmp/cache',
+ #'cache_depth' => 2,
+ #'expires_in' => 20, # in seconds (CHI compatibile)
+ # (Cache::Cache compatibile initialization)
+ 'default_expires_in' => $cache_options{'expires_in'},
+ # (CHI compatibile initialization)
+ 'root_dir' => $cache_options{'cache_root'},
+ 'depth' => $cache_options{'cache_depth'},
+ });
+ }
+
+ # human readable key identifying gitweb output
+ my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
+
+ cache_fetch($cache, $output_key, $actions{$action});
+} else {
+ $actions{$action}->();
+}
+
exit;
## ======================================================================
@@ -3169,7 +3242,9 @@ sub git_header_html {
# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
# we have to do this because MSIE sometimes globs '*/*', pretending to
# support xhtml+xml but choking when it gets what it asked for.
- if (defined $cgi->http('HTTP_ACCEPT') &&
+ # Disable content-type negotiation when caching (use mimetype good for all).
+ if (!$caching_enabled &&
+ defined $cgi->http('HTTP_ACCEPT') &&
$cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
$cgi->Accept('application/xhtml+xml') != 0) {
$content_type = 'application/xhtml+xml';
@@ -3342,17 +3417,25 @@ sub git_footer_html {
}
print "</div>\n"; # class="page_footer"
- if (defined $t0 && gitweb_check_feature('timed')) {
+ # timing info doesn't make much sense with output (response) caching,
+ # so when caching is enabled gitweb prints the time of page generation
+ if ((defined $t0 || $caching_enabled) &&
+ gitweb_check_feature('timed')) {
print "<div id=\"generating_info\">\n";
- print 'This page took '.
- '<span id="generating_time" class="time_span">'.
- Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
- ' seconds </span>'.
- ' and '.
- '<span id="generating_cmd">'.
- $number_of_git_cmds.
- '</span> git commands '.
- " to generate.\n";
+ if ($caching_enabled) {
+ print 'This page was generated at '.
+ gmtime( time() )." GMT\n";
+ } else {
+ print 'This page took '.
+ '<span id="generating_time" class="time_span">'.
+ Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
+ ' seconds </span>'.
+ ' and '.
+ '<span id="generating_cmd">'.
+ $number_of_git_cmds.
+ '</span> git commands '.
+ " to generate.\n";
+ }
print "</div>\n"; # class="page_footer"
}
@@ -3402,6 +3485,10 @@ sub die_error {
500 => '500 Internal Server Error',
503 => '503 Service Unavailable',
);
+
+ # Do not cache error pages (die_error() uses 'exit')
+ cache_stop() if ($caching_enabled);
+
git_header_html($http_responses{$status});
print <<EOF;
<div class="page_body">
@@ -5050,7 +5137,8 @@ sub git_blame_common {
or print "ERROR $!\n";
print 'END';
- if (defined $t0 && gitweb_check_feature('timed')) {
+ if (!$caching_enabled &&
+ defined $t0 && gitweb_check_feature('timed')) {
print ' '.
Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
' '.$number_of_git_cmds;
@@ -5309,9 +5397,9 @@ sub git_blob_plain {
($sandbox ? 'attachment' : 'inline')
. '; filename="' . $save_as . '"');
local $/ = undef;
- binmode STDOUT, ':raw';
+ binmode select(), ':raw';
print <$fd>;
- binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+ binmode select(), ':utf8'; # as set at the beginning of gitweb.cgi
close $fd;
}
@@ -5591,9 +5679,9 @@ sub git_snapshot {
open my $fd, "-|", $cmd
or die_error(500, "Execute git-archive failed");
- binmode STDOUT, ':raw';
+ binmode select(), ':raw';
print <$fd>;
- binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+ binmode select(), ':utf8'; # as set at the beginning of gitweb.cgi
close $fd;
}
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 5a734b1..b7c2937 100644
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -27,6 +27,8 @@ our \$export_ok = '';
our \$strict_export = '';
our \$maxload = undef;
+our \$cache_pm = '$TEST_DIRECTORY/../gitweb/cache.pm';
+
EOF
cat >.git/description <<EOF
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 2fc7fdb..41c1119 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -639,4 +639,23 @@ test_expect_success \
gitweb_run "p=.git;a=summary"'
test_debug 'cat gitweb.log'
+# ----------------------------------------------------------------------
+# caching
+
+cat >>gitweb_config.perl <<\EOF
+$caching_enabled = 1;
+$cache_options{'expires_in'} = -1; # never expire cache for tests
+EOF
+rm -rf cache
+
+test_expect_success \
+ 'caching enabled (project summary, first run)' \
+ 'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+ 'caching enabled (project summary, second run)' \
+ 'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
test_done
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index ec92207..7c7f00c 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -7,9 +7,17 @@ use strict;
use Test::More;
use Data::Dumper;
+# Modules that could have been used to capture output for testing cache_fetch
+#use Capture::Tiny;
+#use Test::Output qw(:stdout);
+# Modules that could have been used in $cache_pm for cache_fetch
+#use IO::Capture
+
# test source version; there is no installation target for gitweb
my $cache_pm = "$ENV{TEST_DIRECTORY}/../gitweb/cache.pm";
+# ......................................................................
+
unless (-f "$cache_pm") {
plan skip_all => "$cache_pm not found";
}
@@ -22,6 +30,7 @@ ok(!$@, "parse gitweb/cache.pm")
ok(defined $return, "do gitweb/cache.pm");
ok($return, "run gitweb/cache.pm");
+# ......................................................................
# Test creating a cache
#
@@ -89,6 +98,53 @@ $cache->set_expires_in(0);
is($cache->get_expires_in(), 0, '"expires in" is set to now (0)');
$cache->set($key, $value);
ok(!defined($cache->get($key)), 'cache is expired');
+$cache->set_expires_in(-1);
+
+# ......................................................................
+
+# Prepare for testing cache_fetch from $cache_pm
+my $action_output = <<'EOF';
+# This is data to be cached and shown
+EOF
+my $cached_output = <<"EOF";
+$action_output# (version recovered from cache)
+EOF
+
+sub action {
+ print $action_output;
+}
+
+
+# Catch output printed by cache_fetch
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_cache_fetch_output {
+ my $test_data = '';
+
+ open my $test_data_fh, '>', \$test_data;
+ my $oldfh = select($test_data_fh);
+
+ cache_fetch($cache, $key, \&action);
+
+ select($oldfh);
+ close $test_data_fh;
+
+ return $test_data;
+}
+
+# clean state
+$cache->set_expires_in(-1);
+$cache->remove($key);
+my $test_data;
+
+# first time (if there is no cache) generates cache entry
+$test_data = capture_cache_fetch_output();
+is($test_data, $action_output, 'action output is printed (generated)');
+is($cache->get($key), $action_output, 'action output is saved in cache (generated)');
+
+# second time (if cache is set/valid) reads from cache
+$cache->set($key, $cached_output);
+$test_data = capture_cache_fetch_output();
+is($test_data, $cached_output, 'action output is printed (from cache)');
done_testing();
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 05/10] gitweb/cache.pm - Adaptive cache expiration time
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (3 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 06/10] gitweb: Use CHI compatibile (compute method) caching Jakub Narebski
` (6 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
Add to GitwebCache::SimpleFileCache support for adaptive lifetime
(cache expiration) control. Cache lifetime can be increased or
decreased by any factor, e.g. load average, through the definition
of the 'check_load' callback.
Note that using ->set_expires_in, or unsetting 'check_load' via
->set_check_load(undef) turns off adaptive caching.
Make gitweb automatically adjust cache lifetime by load, using
get_loadavg() function. Define and describe default parameters for
dynamic (adaptive) cache expiration time control.
There are some very basic tests of dynamic expiration time in t9503,
namely checking if dynamic expire time is within given upper and lower
bounds.
To be implemented (from original patch by J.H.):
* optional locking interface, where only one process can update cache
(using flock)
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v2:
* Comments about new parameters for the constructor
* More comments
* Access fields directly ($self->{'check_load'}, instead of using accessors
($self->get_check_load()).
* Add ->check_load() method (note: does not check if 'check_load' exists).
Differences from relevant parts of J.H. patch:
* 'increase_factor' is configurable rather than hardcoded
* 'check_load' is passed via conctructor parameter; gitweb by default sets
it to \&get_loadavg. This means that cache.pm is not entangled with
gitweb.
* options are stored in %cache_options, e.g. as 'expires_max' (compatible
with Cache::Adaptive), and not as global variables with un-Perlish
camelCase names like $maxCacheTime.
* API tests
Possible improvements:
* Turn off adaptive cache lifetime in t9503 test by unsetting
check_load, and not only by using ->set_expires_in()
gitweb/cache.pm | 80 ++++++++++++++++++++++++++++++++++++--
gitweb/gitweb.perl | 27 ++++++++++++-
t/t9503/test_cache_interface.pl | 22 +++++++++++
3 files changed, 122 insertions(+), 7 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 151e029..dcddd28 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -57,6 +57,22 @@ our $DEFAULT_CACHE_ROOT = "cache";
# 'expires_in' (CHI compatibile) [seconds]
# The expiration time for objects place in the cache.
# Defaults to -1 (never expire) if not explicitly set.
+# Sets 'expires_min' to given value.
+# * 'expires_min' [seconds]
+# The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+# Used as lower bound in adaptive cache lifetime / expiration.
+# Defaults to 20 seconds; 'expires_in' sets it also.
+# * 'expires_max' [seconds]
+# The maximum expiration time for objects in cache.
+# Used as upper bound in adaptive cache lifetime / expiration.
+# Defaults to 1200 seconds, if not set;
+# defaults to 'expires_min' if 'expires_in' is used.
+# * 'check_load'
+# Subroutine (code) used for adaptive cache lifetime / expiration.
+# If unset, adaptive caching is turned off; defaults to unset.
+# * 'increase_factor' [seconds / 100% CPU load]
+# Factor multiplying 'check_load' result when calculating cache lietime.
+# Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
sub new {
my ($proto, $p_options_hash_ref) = @_;
@@ -64,7 +80,8 @@ sub new {
my $self = {};
$self = bless($self, $class);
- my ($root, $depth, $ns, $expires_in);
+ my ($root, $depth, $ns);
+ my ($expires_min, $expires_max, $increase_factor, $check_load);
if (defined $p_options_hash_ref) {
$root =
$p_options_hash_ref->{'cache_root'} ||
@@ -73,19 +90,31 @@ sub new {
$p_options_hash_ref->{'cache_depth'} ||
$p_options_hash_ref->{'depth'};
$ns = $p_options_hash_ref->{'namespace'};
- $expires_in =
+ $expires_min =
+ $p_options_hash_ref->{'expires_min'} ||
$p_options_hash_ref->{'default_expires_in'} ||
$p_options_hash_ref->{'expires_in'};
+ $expires_max =
+ $p_options_hash_ref->{'expires_max'};
+ $increase_factor = $p_options_hash_ref->{'expires_factor'};
+ $check_load = $p_options_hash_ref->{'check_load'};
}
$root = $DEFAULT_CACHE_ROOT unless defined($root);
$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
$ns = '' unless defined($ns);
- $expires_in = -1 unless defined($expires_in); # <0 means never
+ $expires_min = 20 unless defined($expires_min);
+ $expires_max = $expires_min
+ if (!defined($expires_max) && exists $p_options_hash_ref->{'expires_in'});
+ $expires_max = 1200 unless (defined($expires_max));
+ $increase_factor = 60 unless defined($increase_factor);
$self->set_root($root);
$self->set_depth($depth);
$self->set_namespace($ns);
- $self->set_expires_in($expires_in);
+ $self->set_expires_min($expires_min);
+ $self->set_expires_max($expires_max);
+ $self->set_increase_factor($increase_factor);
+ $self->set_check_load($check_load);
return $self;
}
@@ -96,7 +125,7 @@ sub new {
# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
# creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_in)) {
+foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load)) {
my $field = $i;
no strict 'refs';
*{"get_$field"} = sub {
@@ -109,6 +138,47 @@ foreach my $i (qw(depth root namespace expires_in)) {
};
}
+# ......................................................................
+# pseudo-accessors
+
+# returns adaptive lifetime of cache entry for given $key [seconds]
+sub get_expires_in {
+ my ($self) = @_;
+
+ # short-circuit
+ if (!defined $self->{'check_load'} ||
+ $self->{'expires_max'} <= $self->{'expires_min'}) {
+ return $self->{'expires_min'};
+ }
+
+ my $expires_in =
+ #$self->{'expires_min'} +
+ $self->{'increase_factor'} * $self->check_load();
+
+ if ($expires_in < $self->{'expires_min'}) {
+ return $self->{'expires_min'};
+ } elsif ($expires_in > $self->{'expires_max'}) {
+ return $self->{'expires_max'};
+ }
+
+ return $expires_in;
+}
+
+# sets expiration time to $duration, turns off adaptive cache lifetime
+sub set_expires_in {
+ my ($self, $duration) = @_;
+
+ $self->{'expires_min'} = $self->{'expires_max'} = $duration;
+}
+
+# runs 'check_load' subroutine, for adaptive cache lifetime.
+# Note: check in caller that 'check_load' exists.
+sub check_load {
+ my $self = shift;
+ #return &{$self->{'check_load'}}();
+ return $self->{'check_load'}->();
+}
+
# ----------------------------------------------------------------------
# utility functions and methods
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7073e1b..2b429d2 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -255,13 +255,36 @@ our %cache_options = (
# * File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache'),
# * '/var/cache/gitweb' (FHS compliant, requires being set up),
'cache_root' => 'cache',
+
# The number of subdirectories deep to cache object item. This should be
# large enough that no cache directory has more than a few hundred
# objects. Each non-leaf directory contains up to 256 subdirectories
# (00-ff). Must be larger than 0.
'cache_depth' => 1,
- # The (global) expiration time for objects placed in the cache, in seconds.
- 'expires_in' => 20,
+
+ # The (global) minimum expiration time for objects placed in the cache,
+ # in seconds. If the dynamic adaptive cache exporation time is lower
+ # than this number, we set cache timeout to this minimum.
+ 'expires_min' => 20, # 20 seconds
+
+ # The (global) maximum expiration time for dynamic (adaptive) caching
+ # algorithm, in seconds. If the adaptive cache lifetime exceeds this
+ # number, we set cache timeout to this maximum.
+ # (If 'expires_min' >= 'expires_max', there is no adaptive cache timeout,
+ # and 'expires_min' is used as expiration time for objects in cache.)
+ 'expires_max' => 1200, # 20 minutes
+
+ # Cache lifetime will be increased by applying this factor to the result
+ # from 'check_load' callback (see below).
+ 'expires_factor' => 60, # expire time in seconds for 1.0 (100% CPU) load
+
+ # User supplied callback for deciding the cache policy, usually system
+ # load. Multiplied by 'expires_factor' gives adaptive expiration time,
+ # in seconds, subject to the limits imposed by 'expires_min' and
+ # 'expires_max' bounds. Set to undef (or delete) to turn off dynamic
+ # lifetime control.
+ # (Compatibile with Cache::Adaptive.)
+ 'check_load' => \&get_loadavg,
);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 7c7f00c..d28f4df 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -100,6 +100,28 @@ $cache->set($key, $value);
ok(!defined($cache->get($key)), 'cache is expired');
$cache->set_expires_in(-1);
+# Test assertions for adaptive cache expiration
+#
+my $load = 0.0;
+sub load { return $load; }
+my $expires_min = 10;
+my $expires_max = 30;
+$cache->set_expires_min($expires_min);
+$cache->set_expires_max($expires_max);
+$cache->set_check_load(\&load);
+cmp_ok($cache->get_expires_min(), '==', $expires_min, '"expires min" set correctly');
+cmp_ok($cache->get_expires_max(), '==', $expires_max, '"expires max" set correctly');
+cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+ '"expires in" bound from down for load=0');
+cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+ '"expires in" bound from up for load=0');
+$load = 1_000;
+cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+ '"expires in" bound from down for heavy load');
+cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+ '"expires in" bound from up for heavy load');
+$cache->set_expires_in(-1);
+
# ......................................................................
# Prepare for testing cache_fetch from $cache_pm
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 06/10] gitweb: Use CHI compatibile (compute method) caching
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (4 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 05/10] gitweb/cache.pm - Adaptive cache expiration time Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 07/10] gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
` (5 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
If $cache provides CHI compatible ->compute($key, $code) method, use it
instead of Cache::Cache compatible ->get($key) and ->set($key, $data).
While at it, refactor regenerating cache into cache_calculate subroutine.
GitwebCache::SimpleFileCache provides 'compute' method, which currently
simply use 'get' and 'set' methods in proscribed manner. Nevertheless
'compute' method can be more flexible in choosing when to refresh cache,
and which process is to refresh/(re)generate cache entry. This method
would use (advisory) locking to prevent 'cache miss stampede' (aka
'stampeding herd') problem in the next commit.
The support for $cache which do not provide '->compute($key, $code)'
method is left just in case we would want to use such (external)
caching engine.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This commit could be moved one commit earlier in this patch series for
shortlog to look nicer (although commit message would have to be adjusted to
that fact) ;-)
Differences from v2:
* Update/change signature of cache_fetch_compute and cache_fetch_get_set
* cache_fetch_compute updated to use new output capturing mechanism
* Fixed cache_fetch_get_set
This patch doesn't strictly have an equivalent in J.H. patch adding caching
to gitweb.
gitweb/cache.pm | 46 ++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 42 insertions(+), 4 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index dcddd28..b828102 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -455,12 +455,50 @@ sub cache_capture (&) {
sub cache_fetch {
my ($cache, $key, $code) = @_;
- my $data = $cache->get($key);
+ if ($cache->can('compute')) {
+ #goto &cache_fetch_compute
+ cache_fetch_compute($cache, $key, $code);
+ } else {
+ #goto &cache_fetch_get_set
+ cache_fetch_get_set($cache, $key, $code);
+ }
+}
+
+# calculate data to regenerate cache via capturing output
+# (probably unnecessary level of indirection)
+sub cache_calculate {
+ my $code = shift;
+
+ my $data = cache_capture {
+ $code->();
+ };
+
+ return $data;
+}
+
+# for $cache which can ->compute($key, $code)
+sub cache_fetch_compute {
+ my ($cache, $key, $code) = @_;
+
+ my $data = $cache->compute($key, sub { cache_calculate($code) });
+ if (defined $data) {
+ # print cached data
+ #binmode STDOUT, ':raw';
+ #print STDOUT $data;
+ # for t9503 test:
+ binmode select(), ':raw';
+ print $data;
+ }
+}
+
+# for $cache which can ->get($key) and ->set($key, $data)
+sub cache_fetch_get_set {
+ my ($cache, $key, $code) = @_;
+
+ my $data = $cache->get($key);
if (!defined $data) {
- $data = cache_capture {
- $code->();
- };
+ $data = cache_calculate($code);
$cache->set($key, $data) if defined $data;
}
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 07/10] gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (5 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 06/10] gitweb: Use CHI compatibile (compute method) caching Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 08/10] gitweb/cache.pm - Serve stale data when waiting for filling cache Jakub Narebski
` (4 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
In the ->compute($key, $code) method from GitwebCache::SimpleFileCache,
use locking (via flock) to ensure that only one process would generate
data to update/fill-in cache; the rest would wait for the cache to
be (re)generated and would read data from cache. If process that was
to (re)generate data dies or exits, one of the readers would take its
role.
Currently this feature can not be disabled (via %cache_options).
Other future features (like: serving stale data while cache is being
regenerated, (re)generating cache in background, activity indicator)
all depend on locking.
A test in t9503 shows that in the case where there are two clients
trying to simultaneously access non-existent or stale cache entry,
(and generating data takes (artifically) a bit of time), if they are
using ->compute method the data is (re)generated once, as opposed to
if those clients are just using ->get/->set methods.
To be implemented (from original patch by J.H.):
* background building, and showing stale cache
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Note that there is slight inefficiency, in that filename for lockfile
depends on the filename for cache entry (it just adds '.lock' suffix),
but is recalculated from $key for both paths.
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v2:
* More comments.
* Add loop in ->compute(), so if process that was to (re)generate data dies
or exits, one of the readers would take its role. Such situation should
be rare: it can happen if two client access the same invalid URL.
* Add test for such situation in t9503
Differences from relevant parts of J.H. patch:
* Locking on separate *.lock file, for simpler work and robustness;
acquiring exclusive lock might require having file opened for possible
write, following
"File Locking Tricks and Traps" (http://perl.plover.com/yak/flock/)
J.H. patch used separate *.bg file for lock only for background caching
(to be implemented in one of next commits).
* Ensure that ->compute() delivers $data, unless $code fails. See above
about adding loop in ->compute() method.
* Single variable $lock_state, in place of $lockingStatus, $lockStatBG,
$lockStat, $lockStatus in J.H. patch.
* Use indirect (lexical) filehandles for lockfiles, instead of barewords,
i.e. global filehandles:
open my $lock_fh, '+>', $lockfile;
rather than equivalent of
open LOCK, '>:utf8', $lockfile
* Do not use LOCK_UN: closing lockfile would unlock
* In my opinion much cleaner flow control
Possible improvements:
* When using locking, only one process would be able to write to temporary
file. Therefore temporary file can now have non-random name; no need for
File::Temp nor unique_id (performance). It would probably be best done
via ->get_tempfile() method, or something like that.
* Run tests in t9503 which contain 'sleep 1' in them (which is required to
test how they handle concurrent access, e.g. testing cache miss stampede
situation) only when test is run with --long. This would require update
to t/test-lib.sh to export GIT_TEST_LONG to external program in
test_external etc.
* ->get_lockname should perhaps be ->get_lockfile which would return opened
filehandle, or filehandle + filename like File::Temp::tempfile, or objects
that functions like filehandle but stringifies to filename like
File::Temp->new().
* Equivalent of old _Make_Path?
* Would ability to turn this feature off make sense?
gitweb/cache.pm | 44 +++++++++++++++-
t/t9503/test_cache_interface.pl | 107 +++++++++++++++++++++++++++++++++++++++
2 files changed, 148 insertions(+), 3 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index b828102..25f0278 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -25,6 +25,7 @@ use File::Path qw(mkpath);
use File::Spec;
use File::Temp;
use Digest::MD5 qw(md5_hex);
+use Fcntl qw(:flock);
# by default, the cache nests all entries on the filesystem two
# directories deep
@@ -226,6 +227,13 @@ sub path_to_key {
return $filepath;
}
+# Take an human readable key, and return path to be used for lockfile
+sub get_lockname {
+ my $self = shift;
+
+ return $self->path_to_key(@_) . '.lock';
+}
+
# ----------------------------------------------------------------------
# worker methods
@@ -393,15 +401,45 @@ sub get {
# get $key; if successful, returns the value. Otherwise, calls $code
# and uses the return value as the new value for $key, which is then
# returned.
+# Uses file locking to have only one process updating value for $key
+# to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
sub compute {
my ($self, $p_key, $p_coderef) = @_;
my $data = $self->get($p_key);
- if (!defined $data) {
- $data = $p_coderef->($self, $p_key);
- $self->set($p_key, $data);
+ return $data if defined $data;
+
+ my $dir;
+ my $lockfile = $self->get_lockname($p_key, \$dir);
+
+ # ensure that directory leading to lockfile exists
+ if (!-d $dir) {
+ mkpath($dir, 0, 0777)
+ or die "Couldn't mkpath '$dir' for lockfile: $!";
}
+ # this loop is to protect against situation where process that
+ # acquired exclusive lock (writer) dies or exits (die_error)
+ # before writing data to cache
+ my $lock_state; # needed for loop condition
+ do {
+ open my $lock_fh, '+>', $lockfile
+ or die "Could't open lockfile '$lockfile': $!";
+ $lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
+ if ($lock_state) {
+ # acquired writers lock
+ $data = $p_coderef->($self, $p_key);
+ $self->set($p_key, $data);
+ } else {
+ # get readers lock
+ flock($lock_fh, LOCK_SH);
+ $data = $self->fetch($p_key);
+ }
+ # closing lockfile releases lock
+ close $lock_fh
+ or die "Could't close lockfile '$lockfile': $!";
+ } until (defined $data || $lock_state);
+ # repeat until we have data, or we tried generating data oneself and failed
return $data;
}
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index d28f4df..255fad2 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -89,6 +89,113 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
+# Test 'stampeding herd' / 'cache miss stampede' problem
+# (probably should be run only if GIT_TEST_LONG)
+#
+sub get_value_slow {
+ $call_count++;
+ sleep 1;
+ return $value;
+}
+my ($pid, $kid_fh);
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+ skip "cannot fork: $!", 1
+ unless defined $pid;
+
+ my $data = $cache->get($key);
+ if (!defined $data) {
+ $data = get_value_slow();
+ $cache->set($key, $data);
+ }
+
+ if ($pid) {
+ my $child_count = <$kid_fh>;
+ chomp $child_count;
+
+ waitpid $pid, 0;
+ close $kid_fh;
+
+ $call_count += $child_count;
+ } else {
+ print "$call_count\n";
+ exit 0;
+ }
+
+ cmp_ok($call_count, '==', 2, 'parallel get/set: get_value_slow() called twice');
+}
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+ skip "cannot fork: $!", 1
+ unless defined $pid;
+
+ my $data = $cache->compute($key, \&get_value_slow);
+
+ if ($pid) {
+ my $child_count = <$kid_fh>;
+ chomp $child_count;
+
+ waitpid $pid, 0;
+ close $kid_fh;
+
+ $call_count += $child_count;
+ } else {
+ print "$call_count\n";
+ exit 0;
+ }
+
+ cmp_ok($call_count, '==', 1, 'parallel compute: get_value_slow() called once');
+}
+
+# Test that it doesn't hang if get_action exits/dies
+#
+sub get_value_die {
+ die "get_value_die\n";
+}
+print STDERR "$$: start\n";
+my $result = eval {
+ $pid = open $kid_fh, '-|';
+ SKIP: {
+ skip "cannot fork: $!", 2
+ unless defined $pid;
+
+ local $SIG{ALRM} = sub { die "alarm\n"; };
+ alarm 4;
+
+ my ($eval_error, $child_eval_error);
+ eval { $cache->compute('No Key', \&get_value_die); };
+ $eval_error = $@;
+
+ if ($pid) {
+ local $/ = "\0";
+
+ $child_eval_error = <$kid_fh>;
+ chomp $child_eval_error;
+
+ waitpid $pid, 0;
+ close $kid_fh;
+ } else {
+
+ print "$eval_error\0";
+ exit 0;
+ }
+
+ is($eval_error, "get_value_die\n", 'get_value_die() died in parent');
+ is($child_eval_error, "get_value_die\n", 'get_value_die() died in child');
+
+ alarm 0;
+ }
+};
+ok(!$@, 'no alarm call (neither process hung)');
+diag($@) if $@;
+
+
# Test cache expiration for 'expire now'
#
$cache->set_expires_in(60*60*24); # set expire time to 1 day
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 08/10] gitweb/cache.pm - Serve stale data when waiting for filling cache
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (6 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 07/10] gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 09/10] gitweb/cache.pm - Regenerate (refresh) cache in background Jakub Narebski
` (3 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
When process fails to acquire exclusive (writers) lock, then instead
of waiting for the other process to (re)generate and fill cache, serve
stale (expired) data from cache. This is of course possible only if
there is some stale data in cache for given key.
This feature of GitwebCache::SimpleFileCache is used only for an
->update($key, $code) method. It is controlled by 'max_lifetime'
cache parameter; you can set it to -1 to always serve stale data
if it exists, and you can set it to 0 (or any value smaller than
'expires_min') to turn this feature off.
This feature, as it is implemented currently, makes ->update() method a
bit asymmetric with respect to process that acquired writers lock and
those processes that didn't, which can be seen in the new test in t9503.
The process that is to regenerate (refresh) data in cache must wait for
the data to be generated in full before showing anything to client, while
the other processes show stale (expired) data immediately. In order to
remove or reduce this asymmetry gitweb would need to employ one of the two
alternate solutions. Either data should be (re)generated in background,
so that process that acquired writers lock would generate data in
background while serving stale data, or alternatively the process that
generates data should pass output to original STDOUT while capturing it
("tee" output).
When developing this feature, ->is_valid() method acquired additional
extra optional parameter, where one can pass expire time instead of using
whole-cache global (adaptive) expire time.
To be implemented (from original patch by J.H.):
* background building,
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v2:
* 'max_lifetime' described in comment above new() in cache.pm.
* More comments describing code.
Differences from relevant parts of J.H. patch:
* Instead of using $maxCacheLife, use 'max_lifetime' option (with
'max_cache_lifetime' as optional spelling in GitwebCache::SimpleFileCache
constructor).
* Use 5*60*60 seconds for 5 hours, rather than 18000 (or 18_000) seconds.
* Serving stale data was enabled only when background cache regeneration was
enabled; here it is enabled for readers regardless whether background
generation (introduced in next commit) is turned on or not.
Possible improvements:
* Run long tests only with --long (well, they are not _that_ long).
gitweb/cache.pm | 33 +++++++++++++++-----
gitweb/gitweb.perl | 8 +++++
t/t9503/test_cache_interface.pl | 63 +++++++++++++++++++++++++++++++++++++-
3 files changed, 94 insertions(+), 10 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 25f0278..1773a7e 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -74,6 +74,11 @@ our $DEFAULT_CACHE_ROOT = "cache";
# * 'increase_factor' [seconds / 100% CPU load]
# Factor multiplying 'check_load' result when calculating cache lietime.
# Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
+# * 'max_lifetime' [seconds]
+# If it is greater than 0, and cache entry is expired but not older
+# than it, serve stale data when waiting for cache entry to be
+# regenerated (refreshed). Non-adaptive.
+# Defaults to -1 (never expire / always serve stale).
sub new {
my ($proto, $p_options_hash_ref) = @_;
@@ -83,6 +88,7 @@ sub new {
my ($root, $depth, $ns);
my ($expires_min, $expires_max, $increase_factor, $check_load);
+ my ($max_lifetime);
if (defined $p_options_hash_ref) {
$root =
$p_options_hash_ref->{'cache_root'} ||
@@ -99,6 +105,9 @@ sub new {
$p_options_hash_ref->{'expires_max'};
$increase_factor = $p_options_hash_ref->{'expires_factor'};
$check_load = $p_options_hash_ref->{'check_load'};
+ $max_lifetime =
+ $p_options_hash_ref->{'max_lifetime'} ||
+ $p_options_hash_ref->{'max_cache_lifetime'};
}
$root = $DEFAULT_CACHE_ROOT unless defined($root);
$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
@@ -108,12 +117,14 @@ sub new {
if (!defined($expires_max) && exists $p_options_hash_ref->{'expires_in'});
$expires_max = 1200 unless (defined($expires_max));
$increase_factor = 60 unless defined($increase_factor);
+ $max_lifetime = -1 unless defined($max_lifetime);
$self->set_root($root);
$self->set_depth($depth);
$self->set_namespace($ns);
$self->set_expires_min($expires_min);
$self->set_expires_max($expires_max);
+ $self->set_max_lifetime($max_lifetime);
$self->set_increase_factor($increase_factor);
$self->set_check_load($check_load);
@@ -126,7 +137,7 @@ sub new {
# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
# creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load)) {
+foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load max_lifetime)) {
my $field = $i;
no strict 'refs';
*{"get_$field"} = sub {
@@ -340,12 +351,13 @@ sub remove {
or die "Couldn't remove file '$file': $!";
}
-# $cache->is_valid($key)
+# $cache->is_valid($key[, $expires_in])
#
# Returns a boolean indicating whether $key exists in the cache
-# and has not expired (global per-cache 'expires_in').
+# and has not expired. Uses global per-cache expires time, unless
+# passed optional $expires_in argument.
sub is_valid {
- my ($self, $key) = @_;
+ my ($self, $key, $expires_in) = @_;
my $path = $self->path_to_key($key);
@@ -356,7 +368,7 @@ sub is_valid {
or die "Couldn't stat file '$path': $!";
# expire time can be set to never
- my $expires_in = $self->get_expires_in();
+ $expires_in = defined $expires_in ? $expires_in : $self->get_expires_in();
return 1 unless (defined $expires_in && $expires_in >= 0);
# is file expired?
@@ -431,9 +443,14 @@ sub compute {
$data = $p_coderef->($self, $p_key);
$self->set($p_key, $data);
} else {
- # get readers lock
- flock($lock_fh, LOCK_SH);
- $data = $self->fetch($p_key);
+ # try to retrieve stale data
+ $data = $self->fetch($p_key)
+ if $self->is_valid($p_key, $self->get_max_lifetime());
+ if (!defined $data) {
+ # get readers lock if there is no stale data to serve
+ flock($lock_fh, LOCK_SH);
+ $data = $self->fetch($p_key);
+ }
}
# closing lockfile releases lock
close $lock_fh
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2b429d2..ff249b9 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -285,6 +285,14 @@ our %cache_options = (
# lifetime control.
# (Compatibile with Cache::Adaptive.)
'check_load' => \&get_loadavg,
+
+ # Maximum cache file life, in seconds. If cache entry lifetime exceeds
+ # this value, it wouldn't be served as being too stale when waiting for
+ # cache to be regenerated/refreshed, instead of trying to display
+ # existing cache date.
+ # Set it to -1 to always serve existing data if it exists,
+ # set it to 0 to turn off serving stale data - always wait.
+ 'max_lifetime' => 5*60*60, # 5 hours
);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 255fad2..2eae9e0 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -34,8 +34,11 @@ ok($return, "run gitweb/cache.pm");
# Test creating a cache
#
-my $cache = new_ok('GitwebCache::SimpleFileCache',
- [ { 'cache_root' => 'cache', 'cache_depth' => 2 } ]);
+my $cache = new_ok('GitwebCache::SimpleFileCache', [ {
+ 'cache_root' => 'cache',
+ 'cache_depth' => 2,
+ 'max_lifetime' => 0, # turn it off
+} ]);
# Test that default values are defined
#
@@ -275,6 +278,62 @@ $cache->set($key, $cached_output);
$test_data = capture_cache_fetch_output();
is($test_data, $cached_output, 'action output is printed (from cache)');
+# Test that cache returns stale data in existing but expired cache situation
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $stale_value = 'Stale Value';
+my $child_data = '';
+$cache->set($key, $stale_value);
+$call_count = 0;
+sub run_compute_forked {
+ my $pid = shift;
+
+ my $data = $cache->compute($key, \&get_value_slow);
+
+ if ($pid) {
+ $child_data = <$kid_fh>;
+ chomp $child_data;
+
+ waitpid $pid, 0;
+ close $kid_fh;
+ } else {
+ print "$data\n";
+ exit 0;
+ }
+
+ return $data;
+}
+$cache->set_expires_in(0); # expire now
+$cache->set_max_lifetime(-1); # forever
+$pid = open $kid_fh, '-|';
+SKIP: {
+ skip "cannot fork: $!", 2
+ unless defined $pid;
+
+ my $data = run_compute_forked($pid);
+
+ # returning stale data works
+ ok($data eq $stale_value || $child_data eq $stale_value,
+ 'stale data in at least one process when expired');
+
+ $cache->set_expires_in(-1); # never expire for next ->get
+ is($cache->get($key), $value, 'value got set correctly, even if stale data returned');
+}
+$cache->set_expires_in(0); # expire now
+$cache->set_max_lifetime(0); # don't serve stale data
+$pid = open $kid_fh, '-|';
+SKIP: {
+ skip "cannot fork: $!", 1
+ unless defined $pid;
+
+ my $data = run_compute_forked($pid);
+
+ # no returning stale data
+ ok($data ne $stale_value && $child_data ne $stale_value,
+ 'configured to never return stale data');
+}
+$cache->set_expires_in(-1);
+
done_testing();
print Dumper($cache);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 09/10] gitweb/cache.pm - Regenerate (refresh) cache in background
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (7 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 08/10] gitweb/cache.pm - Serve stale data when waiting for filling cache Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-16 19:36 ` [RFC PATCHv3 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
` (2 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
This commit removes asymmetry in serving stale data (if it exists)
when regenerating cache in GitwebCache::SimpleFileCache. The process
that acquired exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in background, while serving stale data.
Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data). Otherwise there might be a
problem when gitweb is running as (part of) long-lived process, for example
from mod_perl (or in the future from FastCGI): it would leave unreaped
children as zombies (entries in process table). We don't want to wait for
background process, and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to
automatically reap child processes, because this interferes with using
open my $fd, '-|', git_cmd(), 'param', ...
or die_error(...)
# read from <$fd>
close $fd
or die_error(...)
In the above code "close" for magic "-|" open calls waitpid... and we
would would die with "No child processes". Removing 'or die' would
possibly remove ability to react to other errors.
This feature can be enabled or disabled on demand via 'background_cache'
cache parameter. It is turned on by default.
To be implemented (from original patch by J.H.):
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v2:
* More comments.
* Daemonize background process, detaching it from parent. This way
whether main process is short-lived (gitweb as CGI) or long-lived
(mod_perl), there would be no need to wait and no zombies.
Differences from relevant parts of J.H. patch:
* Fork (run generating process in background) only if there is stale data to
serve (and if background cache is turned on).
* In J.H. patch forking was done unconditionally, only generation or exit
depended on $backgroundCache.
* Lock before forking.
* Tests (currently only of API).
* In my opinion cleaner control flow.
gitweb/cache.pm | 61 ++++++++++++++++++++++++++++++++------
gitweb/gitweb.perl | 9 ++++++
t/t9503/test_cache_interface.pl | 16 ++++++----
3 files changed, 70 insertions(+), 16 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 1773a7e..a3fa6fd 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -26,6 +26,7 @@ use File::Spec;
use File::Temp;
use Digest::MD5 qw(md5_hex);
use Fcntl qw(:flock);
+use POSIX qw(setsid);
# by default, the cache nests all entries on the filesystem two
# directories deep
@@ -79,6 +80,9 @@ our $DEFAULT_CACHE_ROOT = "cache";
# than it, serve stale data when waiting for cache entry to be
# regenerated (refreshed). Non-adaptive.
# Defaults to -1 (never expire / always serve stale).
+# * 'background_cache' (boolean)
+# This enables/disables regenerating cache in background process.
+# Defaults to true.
sub new {
my ($proto, $p_options_hash_ref) = @_;
@@ -88,7 +92,7 @@ sub new {
my ($root, $depth, $ns);
my ($expires_min, $expires_max, $increase_factor, $check_load);
- my ($max_lifetime);
+ my ($max_lifetime, $background_cache);
if (defined $p_options_hash_ref) {
$root =
$p_options_hash_ref->{'cache_root'} ||
@@ -108,6 +112,7 @@ sub new {
$max_lifetime =
$p_options_hash_ref->{'max_lifetime'} ||
$p_options_hash_ref->{'max_cache_lifetime'};
+ $background_cache = $p_options_hash_ref->{'background_cache'};
}
$root = $DEFAULT_CACHE_ROOT unless defined($root);
$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
@@ -118,6 +123,7 @@ sub new {
$expires_max = 1200 unless (defined($expires_max));
$increase_factor = 60 unless defined($increase_factor);
$max_lifetime = -1 unless defined($max_lifetime);
+ $background_cache = 1 unless defined($background_cache);
$self->set_root($root);
$self->set_depth($depth);
@@ -127,6 +133,7 @@ sub new {
$self->set_max_lifetime($max_lifetime);
$self->set_increase_factor($increase_factor);
$self->set_check_load($check_load);
+ $self->set_background_cache($background_cache);
return $self;
}
@@ -137,7 +144,9 @@ sub new {
# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
# creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load max_lifetime)) {
+foreach my $i (qw(depth root namespace
+ expires_min expires_max increase_factor check_load
+ max_lifetime background_cache)) {
my $field = $i;
no strict 'refs';
*{"get_$field"} = sub {
@@ -417,8 +426,9 @@ sub get {
# to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
sub compute {
my ($self, $p_key, $p_coderef) = @_;
+ my ($data, $stale_data);
- my $data = $self->get($p_key);
+ $data = $self->get($p_key);
return $data if defined $data;
my $dir;
@@ -437,16 +447,47 @@ sub compute {
do {
open my $lock_fh, '+>', $lockfile
or die "Could't open lockfile '$lockfile': $!";
+
$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
if ($lock_state) {
- # acquired writers lock
- $data = $p_coderef->($self, $p_key);
- $self->set($p_key, $data);
+ ## acquired writers lock, have to generate data
+ my $pid;
+ if ($self->{'background_cache'}) {
+ # try to retrieve stale data
+ $stale_data = $self->fetch($p_key)
+ if $self->is_valid($p_key, $self->get_max_lifetime());
+
+ # fork if there is stale data, for background process
+ # to regenerate/refresh the cache (generate data)
+ $pid = fork() if (defined $stale_data);
+ }
+ if (!defined $pid || !$pid) {
+ ## didn't fork, or are in background process
+
+ # daemonize background process, detaching it from parent
+ # see also Proc::Daemonize, Apache2::SubProcess
+ if (defined $pid) {
+ POSIX::setsid(); # or setpgrp(0, 0);
+ fork() && exit 0;
+ }
+
+ $data = $p_coderef->($self, $p_key);
+ $self->set($p_key, $data);
+
+ if (defined $pid) {
+ ## in background process; parent will serve stale data
+ close $lock_fh
+ or die "Couldn't close lockfile '$lockfile' (background): $!";
+ exit 0;
+ }
+ }
+
} else {
# try to retrieve stale data
- $data = $self->fetch($p_key)
+ $stale_data = $self->fetch($p_key)
if $self->is_valid($p_key, $self->get_max_lifetime());
- if (!defined $data) {
+
+ if (!defined $stale_data) {
# get readers lock if there is no stale data to serve
flock($lock_fh, LOCK_SH);
$data = $self->fetch($p_key);
@@ -455,9 +496,9 @@ sub compute {
# closing lockfile releases lock
close $lock_fh
or die "Could't close lockfile '$lockfile': $!";
- } until (defined $data || $lock_state);
+ } until (defined $data || defined $stale_data || $lock_state);
# repeat until we have data, or we tried generating data oneself and failed
- return $data;
+ return defined $data ? $data : $stale_data;
}
1;
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index ff249b9..c391226 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -293,6 +293,15 @@ our %cache_options = (
# Set it to -1 to always serve existing data if it exists,
# set it to 0 to turn off serving stale data - always wait.
'max_lifetime' => 5*60*60, # 5 hours
+
+ # This enables/disables background caching. If it is set to true value,
+ # caching engine would return stale data (if it is not older than
+ # 'max_lifetime' seconds) if it exists, and launch process if regenerating
+ # (refreshing) cache into the background. If it is set to false value,
+ # the process that fills cache must always wait for data to be generated.
+ # In theory this will make gitweb seem more responsive at the price of
+ # serving possibly stale data.
+ 'background_cache' => 1,
);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 2eae9e0..9643631 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -86,6 +86,7 @@ sub get_value {
$call_count++;
return $value;
}
+local $SIG{CHLD} = 'IGNORE'; # compute can fork, don't generate zombies
can_ok($cache, qw(compute));
is($cache->compute($key, \&get_value), $value, 'compute 1st time (set)');
is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
@@ -305,32 +306,35 @@ sub run_compute_forked {
}
$cache->set_expires_in(0); # expire now
$cache->set_max_lifetime(-1); # forever
+ok($cache->get_background_cache(), 'generate cache in background by default');
$pid = open $kid_fh, '-|';
SKIP: {
- skip "cannot fork: $!", 2
+ skip "cannot fork: $!", 3
unless defined $pid;
my $data = run_compute_forked($pid);
# returning stale data works
- ok($data eq $stale_value || $child_data eq $stale_value,
- 'stale data in at least one process when expired');
+ is($data, $stale_value, 'stale data in parent when expired');
+ is($child_data, $stale_value, 'stale data in child when expired');
$cache->set_expires_in(-1); # never expire for next ->get
+ diag('waiting for background process to have time to set data');
+ sleep 1; # wait for background process to have chance to set data
is($cache->get($key), $value, 'value got set correctly, even if stale data returned');
}
$cache->set_expires_in(0); # expire now
$cache->set_max_lifetime(0); # don't serve stale data
$pid = open $kid_fh, '-|';
SKIP: {
- skip "cannot fork: $!", 1
+ skip "cannot fork: $!", 2
unless defined $pid;
my $data = run_compute_forked($pid);
# no returning stale data
- ok($data ne $stale_value && $child_data ne $stale_value,
- 'configured to never return stale data');
+ isnt($data, $stale_value, 'no stale data in parent if configured');
+ isnt($child_data, $stale_value, 'no stale data in child if configured');
}
$cache->set_expires_in(-1);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCHv3 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (8 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 09/10] gitweb/cache.pm - Regenerate (refresh) cache in background Jakub Narebski
@ 2010-02-16 19:36 ` Jakub Narebski
2010-02-18 22:01 ` [RFC PATCHv3 00/10] gitweb: Simple file based output caching J.H.
2010-02-28 2:54 ` gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching) Jakub Narebski
11 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-16 19:36 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis, Jakub Narebski
When there exist stale/expired (but not too stale) version of
(re)generated page in cache, gitweb returns stale version (and updates
cache in background, assuming 'background_cache' is set to true value).
When there is no stale version suitable to serve the client, currently
we have to wait for the data to be generated in full before showing it.
Add to GitwebCache::SimpleFileCache, via 'generating_info' callback,
the ability to show user some activity indicator / progress bar, to
show that we are working on generating data.
Gitweb itself uses "Generating..." page as activity indicator, which
redirects (via <meta http-equiv="Refresh" ...>) to refreshed version
of the page after the cache is filled (via trick of not closing page
and therefore not closing connection till data is available in cache,
checked by getting shared/readers lock on lockfile for cache entry).
The git_generating_data_html() subroutine, which is used by gitweb
to implement this feature, is highly configurable: you can choose
initial delay, frequency of writing some data so that connection
won't get closed, and maximum time to wait for data in "Generating..."
page (see %generating_options hash).
Currently git_generating_data_html() contains hardcoded "whitelist" of
actions for which such HTML "Generating..." page makes sense.
This implements final feature from the original gitweb output caching
patch by J.H.
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v2:
* More comments.
* In more places use just fields directly and not via getter (accessor).
* Before using 'generating_info' in process that writes cache entry
(in background) and has exclusive lock, reset lock by closing and
opening lockfile (flock($lockfile, LOCK_UN) doesn't work).
* Fix bug in 'startup_delay' in git_generating_data_html: the test
was switched condition (thats for taking inspiration from two different
sources with slightly different code).
* Always use "die $@" instead of simple "die" to re-throw unexpected
errors (exceptions) in git_generating_data_html.
* exit in git_generating_data_html.
* Better tests in t9503.
Differences from relevant parts of J.H. patch:
* The subroutine that is responsible for doing "Generating..." progress
info / activity indicator (cacheWaitForUpdate() subroutine in J.H. patch,
git_generating_data_html() in this patch) is in gitweb.perl, and not in
cache.pm. It is passed as callback (as code reference) to $cache
constructor.
* gitweb prints generating info in more restricted set of situations; the
set of actions where gitweb does not generate activity indicator is
larger. We could probably provide activity indicator also for (possibly)
non-HTML output, like 'blob_plain' or 'patches', provided that
'User-Agent' denotes that we are using web browser.
* "Generating..." info behavior can be configured (at least the timings) via
%generating_options hash, instead of having those options hardcoded.
* There is initial 'startup_delay' (by default 1 second); if the data is
generated by that time, the "Generating..." page is not shown.
* Waiting is done using blocking flock + alarm, rather than "busy wait"
loop with non-blocking flock + sleep.
* Basic test for "Generating..." feature.
* Removed (by accident) copyright assignment from "Generating..." page.
Needs to be updated and brought back.
* Fixed typo in DTD URL.
gitweb/cache.pm | 52 ++++++++++++-
gitweb/gitweb.perl | 154 ++++++++++++++++++++++++++++++++++++++-
t/t9503/test_cache_interface.pl | 61 +++++++++++++++
3 files changed, 260 insertions(+), 7 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index a3fa6fd..e97d697 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -83,6 +83,12 @@ our $DEFAULT_CACHE_ROOT = "cache";
# * 'background_cache' (boolean)
# This enables/disables regenerating cache in background process.
# Defaults to true.
+# * 'generating_info'
+# Subroutine (code) called when process has to wait for cache entry
+# to be (re)generated (when there is no not-too-stale data to serve
+# instead), for other process (or bacground process). It is passed
+# $cache instance, $key, and opened $lock_fh filehandle to lockfile.
+# Unset by default (which means no activity indicator).
sub new {
my ($proto, $p_options_hash_ref) = @_;
@@ -92,7 +98,7 @@ sub new {
my ($root, $depth, $ns);
my ($expires_min, $expires_max, $increase_factor, $check_load);
- my ($max_lifetime, $background_cache);
+ my ($max_lifetime, $background_cache, $generating_info);
if (defined $p_options_hash_ref) {
$root =
$p_options_hash_ref->{'cache_root'} ||
@@ -113,6 +119,7 @@ sub new {
$p_options_hash_ref->{'max_lifetime'} ||
$p_options_hash_ref->{'max_cache_lifetime'};
$background_cache = $p_options_hash_ref->{'background_cache'};
+ $generating_info = $p_options_hash_ref->{'generating_info'};
}
$root = $DEFAULT_CACHE_ROOT unless defined($root);
$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
@@ -134,6 +141,7 @@ sub new {
$self->set_increase_factor($increase_factor);
$self->set_check_load($check_load);
$self->set_background_cache($background_cache);
+ $self->set_generating_info($generating_info);
return $self;
}
@@ -146,7 +154,7 @@ sub new {
# creates get_depth() and set_depth($depth) etc. methods
foreach my $i (qw(depth root namespace
expires_min expires_max increase_factor check_load
- max_lifetime background_cache)) {
+ max_lifetime background_cache generating_info)) {
my $field = $i;
no strict 'refs';
*{"get_$field"} = sub {
@@ -200,6 +208,17 @@ sub check_load {
return $self->{'check_load'}->();
}
+# $cache->generating_info($key, $lock);
+# runs 'generating_info' subroutine, for activity indicator,
+# checking if it is defined first.
+sub generating_info {
+ my $self = shift;
+
+ if (defined $self->{'generating_info'}) {
+ $self->{'generating_info'}->($self, @_);
+ }
+}
+
# ----------------------------------------------------------------------
# utility functions and methods
@@ -459,7 +478,8 @@ sub compute {
# fork if there is stale data, for background process
# to regenerate/refresh the cache (generate data)
- $pid = fork() if (defined $stale_data);
+ $pid = fork()
+ if (defined $stale_data || $self->{'generating_info'});
}
if (!defined $pid || !$pid) {
## didn't fork, or are in background process
@@ -476,18 +496,42 @@ sub compute {
if (defined $pid) {
## in background process; parent will serve stale data
+ ## or show activity indicator, and serve data
close $lock_fh
or die "Couldn't close lockfile '$lockfile' (background): $!";
exit 0;
}
+
+ } else {
+ ## forked, in parent process
+
+ # provide "generating page..." info if there is no stale data to serve
+ # might exit, or force web browser to do redirection (refresh)
+ if (!defined $stale_data) {
+ # lock can get inherited across forks; unlock
+ # flock($lock_fh, LOCK_UN); # <-- this doesn't work
+ close $lock_fh
+ or die "Couldn't close lockfile '$lockfile' for reopen: $!";
+ open $lock_fh, '<', $lockfile
+ or die "Couldn't reopen (for reading) lockfile '$lockfile': $!";
+
+ $self->generating_info($p_key, $lock_fh);
+ # generating info may exit, so we can not get there
+ # wait for and get data from background process
+ flock($lock_fh, LOCK_SH);
+ $data = $self->fetch($p_key);
+ }
}
-
+
} else {
# try to retrieve stale data
$stale_data = $self->fetch($p_key)
if $self->is_valid($p_key, $self->get_max_lifetime());
if (!defined $stale_data) {
+ # there is no stale data to serve
+ # provide "generating page..." info
+ $self->generating_info($p_key, $lock_fh);
# get readers lock if there is no stale data to serve
flock($lock_fh, LOCK_SH);
$data = $self->fetch($p_key);
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c391226..18bfbdb 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -13,7 +13,7 @@ use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
use Encode;
-use Fcntl ':mode';
+use Fcntl qw(:mode :flock);
use File::Find qw();
use File::Basename qw(basename);
binmode STDOUT, ':utf8';
@@ -302,8 +302,33 @@ our %cache_options = (
# In theory this will make gitweb seem more responsive at the price of
# serving possibly stale data.
'background_cache' => 1,
-);
+ # Subroutine which would be called when gitweb has to wait for data to
+ # be generated (it can't serve stale data because there isn't any,
+ # or if it exists it is older than 'max_lifetime'). The default
+ # is to use git_generating_data_html(), which creates "Generating..."
+ # page, which would then redirect or redraw/rewrite the page when
+ # data is ready.
+ # Set it to `undef' to disable this feature.
+ #
+ # Such subroutine (if invoked from GitwebCache::SimpleFileCache)
+ # is passed the following parameters: $cache instance, human-readable
+ # $key to current page, and filehandle $lock_fh to lockfile.
+ 'generating_info' => \&git_generating_data_html,
+);
+# You define site-wide options for "Generating..." page (if enabled) here;
+# override them with $GITWEB_CONFIG as necessary.
+our %generating_options = (
+ # The delay before displaying "Generating..." page, in seconds. It is
+ # intended for "Generating..." page to be shown only when really needed.
+ 'startup_delay' => 1,
+ # The time before generating new piece of output, to prevent from
+ # redirection before data is ready, in seconds.
+ 'print_interval' => 2,
+ # Maximum time "Generating..." page would be present, waiting for data,
+ # before unconditional redirect, in seconds.
+ 'timeout' => $cache_options{'expires_min'},
+);
# You define site-wide feature defaults here; override them with
# $GITWEB_CONFIG as necessary.
@@ -1273,7 +1298,7 @@ sub to_utf8 {
# correct, but quoted slashes look too horrible in bookmarks
sub esc_param {
my $str = shift;
- $str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;
+ $str =~ s/([^A-Za-z0-9\-_.~()\/:@ ]+)/CGI::escape($1)/eg;#'
$str =~ s/ /\+/g;
return $str;
}
@@ -3260,6 +3285,129 @@ sub blob_contenttype {
## ======================================================================
## functions printing HTML: header, footer, error page
+# creates "Generating..." page when caching enabled and not in cache
+sub git_generating_data_html {
+ my ($cache, $key, $lock_fh) = @_;
+
+ # whitelist of actions that should get "Generating..." page
+ unless ($action =~ /(?:blame(?:|_incremental) | blobdiff | blob |
+ commitdiff | commit | forks | heads | tags |
+ log | shortlog | history | search |
+ tag | tree | summary | project_list)/x) {
+ return;
+ }
+ # blacklist of actions that should not have "Generating..." page
+ #if ($action =~ /(?:atom | rss | opml |
+ # blob_plain | blobdiff_plain | commitdiff_plain |
+ # patch | patches |
+ # blame_data | search_help | object | project_index |
+ # snapshot/x) { # binary
+ # return;
+ #}
+
+ # Stop capturing response
+ #
+ # NOTE: this would not be needed if gitweb would do 'print $out',
+ # and one could rely on printing to STDOUT to be not captured
+ cache_stop(); # or gitweb could use 'print $STDOUT' in place of 'print STDOUT'
+
+ # Initial delay
+ if ($generating_options{'startup_delay'} > 0) {
+ eval {
+ local $SIG{ALRM} = sub { die "alarm clock restart\n" }; # NB: \n required
+ alarm $generating_options{'startup_delay'};
+ flock($lock_fh, LOCK_SH); # blocking readers lock
+ alarm 0;
+ };
+ if ($@) {
+ # propagate unexpected errors
+ die $@ if $@ !~ /alarm clock restart/;
+ } else {
+ # we got response within 'startup_delay' timeout
+ return;
+ }
+ }
+
+ my $title = "[Generating...] $site_name";
+ # TODO: the following fragment of code duplicates the one
+ # in git_header_html, and it should be refactored.
+ if (defined $project) {
+ $title .= " - " . to_utf8($project);
+ if (defined $action) {
+ $title .= "/$action";
+ if (defined $file_name) {
+ $title .= " - " . esc_path($file_name);
+ if ($action eq "tree" && $file_name !~ m|/$|) {
+ $title .= "/";
+ }
+ }
+ }
+ }
+ my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
+
+ # Use the trick that 'refresh' HTTP header equivalent (set via http-equiv)
+ # with timeout of 0 seconds would redirect as soon as page is finished.
+ # This "Generating..." redirect page should not be cached (externally).
+ print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
+ -status=> '200 OK', -expires => 'now');
+ print STDOUT <<"EOF";
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+<!-- git web interface version $version -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<meta http-equiv="refresh" content="0" />
+<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
+<meta name="robots" content="noindex, nofollow" />
+<title>$title</title>
+</head>
+<body>
+EOF
+ print STDOUT 'Generating...';
+
+ local $| = 1; # autoflush
+ my $total_time = 0;
+ my $interval = $generating_options{'print_interval'} || 1;
+ my $timeout = $generating_options{'timeout'};
+ my $alarm_handler = sub {
+ print STDOUT '.';
+ $total_time += $interval;
+ if ($total_time > $timeout) {
+ die "timeout\n";
+ }
+ };
+ eval {
+ # check if we can use functions from Time::HiRes
+ if (defined $t0) {
+ local $SIG{ALRM} = $alarm_handler;
+ Time::HiRes::alarm($interval, $interval);
+ } else {
+ local $SIG{ALRM} = sub {
+ $alarm_handler->();
+ alarm($interval);
+ };
+ alarm($interval);
+ }
+ flock($lock_fh, LOCK_SH); # blocking readers lock
+ alarm 0;
+ };
+ # It doesn't really matter if we got lock, or timed-out
+ # but we should re-throw unknown (unexpected) errors
+ die $@ if ($@ and $@ !~ /timeout/);
+
+ print STDOUT <<"EOF";
+
+</body>
+</html>
+EOF
+
+ exit 0;
+ #return;
+}
+
sub git_header_html {
my $status = shift || "200 OK";
my $expires = shift;
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 9643631..33ffb9d 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -338,6 +338,67 @@ SKIP: {
}
$cache->set_expires_in(-1);
+# Test 'generating_info' feature
+#
+$cache->remove($key);
+my $progress_info = "Generating...";
+sub test_generating_info {
+ local $| = 1;
+ print "$progress_info";
+}
+$cache->set_generating_info(\&test_generating_info);
+# Catch output printed by ->compute
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_compute {
+ my $output = '';
+
+ open my $output_fh, '>', \$output;
+ my $oldfh = select($output_fh);
+
+ my $data = $cache->compute($key, \&get_value_slow);
+
+ select($oldfh);
+ close $output_fh;
+
+ return ($output, $data);
+}
+sub run_capture_compute_forked {
+ my $pid = shift;
+
+ my ($output, $data) = capture_compute();
+ my ($child_output, $child_data);
+
+ if ($pid) {
+ local $/ = "\0";
+ chomp($child_output = <$kid_fh>);
+ chomp($child_data = <$kid_fh>);
+
+ waitpid $pid, 0;
+ close $kid_fh;
+ } else {
+ local $| = 1;
+ $output = '' unless defined $output;
+ $data = '' unless defined $data;
+ print "$output\0$data\0";
+ exit 0;
+ }
+
+ return ($output, $data, $child_output, $child_data);
+}
+SKIP: {
+ $pid = open $kid_fh, '-|';
+ skip "cannot fork: $!", 4
+ unless defined $pid;
+
+ my ($output, $data, $child_output, $child_data) =
+ run_capture_compute_forked($pid);
+
+ is($output, $progress_info, 'progress info from parent');
+ is($child_output, $progress_info, 'progress info from child');
+ is($data, $value, 'data info from parent');
+ is($child_data, $value, 'data info from child');
+}
+
done_testing();
print Dumper($cache);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (9 preceding siblings ...)
2010-02-16 19:36 ` [RFC PATCHv3 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
@ 2010-02-18 22:01 ` J.H.
2010-02-19 0:14 ` Jakub Narebski
2010-02-28 2:54 ` gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching) Jakub Narebski
11 siblings, 1 reply; 16+ messages in thread
From: J.H. @ 2010-02-18 22:01 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley, Petr Baudis
> Shortlog:
> ~~~~~~~~~
> Jakub Narebski (10):
> gitweb: href(..., -path_info => 0|1)
This still looks fine to me.
> gitweb/cache.pm - Very simple file based cache
This looks fine to me, and I'm much more keen on using the CHI interface.
> gitweb/cache.pm - Stat-based cache expiration
- I think having expires time at -1 (never) is a dangerous default. I
think having it as an option is fine, but setting this to something like
6 months may be a better option. I wouldn't expect this to be an issue
in reality, but I can just see someone screwing up and having a cache
system that *never* expires. I think I'd rather see something long vs.
never. But that's just my opinion.
- This does have the problem (though this gets fixed later on, again)
that it would return invalid if there's a process already updating the
cache. This might need to be fixed later to understand the locking
structures on what is/isn't valid (this is mostly a note for me while I
read through the patches)
> gitweb: Use Cache::Cache compatibile (get, set) output caching
- It might be worth (in a later patch) enabling the PerlIO layers as
there was a significant speed improvement in using them, despite their
non-standardness.
- What if you want to use a different caching library but don't want
cache.pm ? That first bit of "if ($caching_enabled) {" might need to be
wrapped to figure out if we are using our inbuilt caching or an external
caching system. (just thinking out loud)
Looks fine to me.
> gitweb/cache.pm - Adaptive cache expiration time
is the commented:
#return &{$self->{'check_load'}}();
intended in check_load() ?
otherwise this looks fine to me.
> gitweb: Use CHI compatibile (compute method) caching
Looks fine to me. I think it's fine where it's at in the order myself.
> gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
Looks fine to me.
> gitweb/cache.pm - Serve stale data when waiting for filling cache
Looks fine to me.
> gitweb/cache.pm - Regenerate (refresh) cache in background
Looks good to me.
> gitweb: Show appropriate "Generating..." page when regenerating cache
I've got a couple of things that will need to be added on top of this
(mainly to try and guess if the client is dumb or smart) so that if it
won't support generating... as expected it doesn't get it. But that can
come in a later patch.
This looks fine to me.
>
> Diffstat:
> ~~~~~~~~~
> gitweb/README | 70 ++++
> gitweb/cache.pm | 655 ++++++++++++++++++++++++++++++++
> gitweb/gitweb.perl | 321 +++++++++++++++-
> t/gitweb-lib.sh | 2 +
> t/t9500-gitweb-standalone-no-errors.sh | 19 +
> t/t9503-gitweb-caching.sh | 32 ++
> t/t9503/test_cache_interface.pl | 404 ++++++++++++++++++++
> t/test-lib.sh | 3 +
> 8 files changed, 1486 insertions(+), 20 deletions(-)
> create mode 100644 gitweb/cache.pm
> create mode 100755 t/t9503-gitweb-caching.sh
> create mode 100755 t/t9503/test_cache_interface.pl
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching
2010-02-18 22:01 ` [RFC PATCHv3 00/10] gitweb: Simple file based output caching J.H.
@ 2010-02-19 0:14 ` Jakub Narebski
0 siblings, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-19 0:14 UTC (permalink / raw)
To: J.H.; +Cc: git, John 'Warthog9' Hawley, Petr Baudis
On Thu, 18 Feb 2010, J.H. wrote:
> > Shortlog:
> > ~~~~~~~~~
> > Jakub Narebski (10):
[removed all "Looks fine to me.]
> > gitweb/cache.pm - Very simple file based cache
>
> This looks fine to me, and I'm much more keen on using the CHI interface.
First, Cache::Cache is older and more mature than CHI, so it is more
likely for it to be installed (especially that nowadays CHI requires
Moose, modern OOP for Perl).
Second, I think that all caching engines (like Cache::Memcached and
Cache::FastMmap, which are not in Cache::Cache) provide ->get / ->set()
methods. Not all provide ->compute() method, and those that provide
its equivalent might do it differently (Cache::Memcached uses callbacks),
or name method differently (Cache::FastMmap uses ->get_and_set() instead
of ->compute()).
> > gitweb/cache.pm - Stat-based cache expiration
>
> - I think having expires time at -1 (never) is a dangerous default. I
> think having it as an option is fine, but setting this to something like
> 6 months may be a better option. I wouldn't expect this to be an issue
> in reality, but I can just see someone screwing up and having a cache
> system that *never* expires. I think I'd rather see something long vs.
> never. But that's just my opinion.
You might be right here. If it was generic cache engine, then default
to never expire is the only possible solution, as you don't know how
cache would be used, and what would be lifetimes involved. But for
gitweb we can (try to) guess good default expire time.
Please do remember however that there are *two* sets of defaults.
First is hardcoded in ->new() constructor for GitwebCache::SimpleFileCache,
second is in %cache_options in gitweb, and this is the one that can be
changed in $GITWEB_CONFIG. It is 'expires_on' value in %cache_options
that takes precedence (introduced in next commit), and it is 20 seconds.
> - This does have the problem (though this gets fixed later on, again)
> that it would return invalid if there's a process already updating the
> cache. This might need to be fixed later to understand the locking
> structures on what is/isn't valid (this is mostly a note for me while I
> read through the patches)
Yes, it is possible that more than one process would update cache.
Using locking is one of possible ways of solving or reducing this
issue, other is 'expires_variance' technique from CHI,... yet another
could be "touching" (setting mtime) of a cache file before regenerating
(refreshing it).
But GitwebCache::SimpleFileCache (err... the name could be better)
would always return correct (fully generated) cache entry, and would
not return partially filled data, even if there are more than one process
generating data in parallel (simultaneously), thanks to "atomic write"
technique (write to temporary file, then rename said file to destination).
File::Temp (or UUID + sequence number) would ensure that processes don't
stomp on each other data.
> > gitweb: Use Cache::Cache compatibile (get, set) output caching
>
> - It might be worth (in a later patch) enabling the PerlIO layers as
> there was a significant speed improvement in using them, despite their
> non-standardness.
As I wrote and said on #git, I think I was benchmarking incorrect thing;
what is important is the difference between writing large amount of data
(like e.g. snapshot) with different output capturing solutions; the
difference between PerlIO and other solutions was the difference between
12 μs (microseconds) and 18 μs _when profiling_, or something like that.
NOTE: output caching for gitweb consists of three separate issues:
* caching engine (GitwebCache::SimpleFileCache, or other caching engine
supporting ->get/->set or ->compute (see also below))
* capturing output (current 'select(FILEHANDLE)' solution, possible
tie filehandle and PerlIO layers solutions, or "tee" solutions...
but that could be added later)
* output caching driver (a la CGI::Cache, or Plack::Middleware::Cache,
which gets data from cache and prints it, or captures and prints
or "tee"s output and saves it to cache).
We can provide other options for capturing output, but this should be
IMVHO left for later commits, isn't it?
>
> - What if you want to use a different caching library but don't want
> cache.pm ? That first bit of "if ($caching_enabled) {" might need to be
> wrapped to figure out if we are using our inbuilt caching or an external
> caching system. (just thinking out loud)
It is described in gitweb/README... although I haven't tested it myself
(probably should have at least for one other engine, like
Cache::FileCache).
In short you can set $cache variable to instance (initialized object)
of selected caching engine, perhaps using %cache_options in its
constructor, or set $cache to string containing name of cache class,
and add extra options to constructor in %cache_options.
> Looks fine to me.
>
> > gitweb/cache.pm - Adaptive cache expiration time
>
> is the commented:
>
> #return &{$self->{'check_load'}}();
>
> intended in check_load() ?
>
> otherwise this looks fine to me.
Leftover from thinking about different ways of writing it. You can
write either &{$self->{'check_load'}}(), or you can write
$self->{'check_load'}->() - it is a matter of style.
Sidenote: we should probably standarise whether caller or the method
itself is responsible for checking if field is set, in the like methods
(there would be one more like that: the 'generating_info' callback).
>
> > gitweb: Use CHI compatibile (compute method) caching
>
> Looks fine to me. I think it's fine where it's at in the order myself.
O.K.
> > gitweb: Show appropriate "Generating..." page when regenerating cache
>
> I've got a couple of things that will need to be added on top of this
> (mainly to try and guess if the client is dumb or smart) so that if it
> won't support "Generating..." as expected it doesn't get it. But that can
> come in a later patch.
See subthread http://thread.gmane.org/gmane.comp.version-control.git/136913/focus=137805
in first version of [split] output caching for gitweb series.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 16+ messages in thread
* gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching)
2010-02-16 19:36 [RFC PATCHv3 00/10] gitweb: Simple file based output caching Jakub Narebski
` (10 preceding siblings ...)
2010-02-18 22:01 ` [RFC PATCHv3 00/10] gitweb: Simple file based output caching J.H.
@ 2010-02-28 2:54 ` Jakub Narebski
2010-02-28 11:51 ` gitweb: Simple file based output caching TODO Jakub Narebski
2010-02-28 12:07 ` gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching) Petr Baudis
11 siblings, 2 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-28 2:54 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis
On Tue, 16 Feb 2010, Jakub Narebski wrote:
> This 10 patches long patch series is intended (yet) as preliminary version
> for splitting large 'gitweb: File based caching layer (from git.kernel.org)'
> mega-patch by John 'Warthog9' Hawley aka J.H., by starting small and
> adding features piece by piece.
[...]
> Shortlog:
> ~~~~~~~~~
> Jakub Narebski (10):
> gitweb: href(..., -path_info => 0|1)
> gitweb/cache.pm - Very simple file based cache
> gitweb/cache.pm - Stat-based cache expiration
> gitweb: Use Cache::Cache compatibile (get, set) output caching
> gitweb/cache.pm - Adaptive cache expiration time
> gitweb: Use CHI compatibile (compute method) caching
> gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
> gitweb/cache.pm - Serve stale data when waiting for filling cache
> gitweb/cache.pm - Regenerate (refresh) cache in background
> gitweb: Show appropriate "Generating..." page when regenerating cache
Here is the list of things that needs to be addressed in the future
next (v4) version of this series, hopefully finally not an RFC.
* The caching engine (GitwebCache::SimpleFileCache) starts with default
expire time of "never" (-1), while later it uses gitweb defaults when
adaptive caching lifetime is added (20 / 1200 seconds). This (slight)
inconsistency should be fixed, either by using default of "never", or
by using gitweb defaults for caching engine defaults in both patches:
gitweb/cache.pm - Stat-based cache expiration
gitweb/cache.pm - Adaptive cache expiration time
Note that caching engine defaults are *independent* of gitweb's
defaults in %cache_options.
* Describe (better than it is now) in comments or in commit message
why Temp::File is used for 'atomic write' _without_ locking (even
when there might be more than one process (re)generating the same
cache entry simultaneously).
* [improvement]. When using locking after
gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
it is ensured that only one process would (re)generate cache entry.
Therefore Temp::File is not needed for temporary file; the temporary
file can have constant name. This should improve performance a bit.
But this might be unnecessary complication.
* Show information about when page was generated in the footer always
when caching is enabled; currently it is shown only if caching *and*
'timed' feature is enabled
gitweb: Use Cache::Cache compatibile (get, set) output caching
* Actually check that using alternate caching engine works. It can be
done (what is described in gitweb/README) by setting $cache to either
cache engine class (package) name, or to cache object (instantiated
cache).
* [cleanup] Remove commented out alternate solutions (commented out
code).
* Benchmark overhead of caching, and performance of caching, perhaps for
different caching engines including original patch by J.H.
John, Pasky, do you have any further comments / requests?
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: gitweb: Simple file based output caching TODO
2010-02-28 2:54 ` gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching) Jakub Narebski
@ 2010-02-28 11:51 ` Jakub Narebski
2010-02-28 12:07 ` gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching) Petr Baudis
1 sibling, 0 replies; 16+ messages in thread
From: Jakub Narebski @ 2010-02-28 11:51 UTC (permalink / raw)
To: git
Cc: John 'Warthog9' Hawley, John 'Warthog9' Hawley,
Petr Baudis
> Here is the list of things that needs to be addressed in the future
> next (v4) version of this series, hopefully finally not an RFC.
>
> * The caching engine (GitwebCache::SimpleFileCache) starts with default
> expire time of "never" (-1), while later it uses gitweb defaults when
> adaptive caching lifetime is added (20 / 1200 seconds). This (slight)
> inconsistency should be fixed, either by using default of "never", or
> by using gitweb defaults for caching engine defaults in both patches:
> gitweb/cache.pm - Stat-based cache expiration
> gitweb/cache.pm - Adaptive cache expiration time
>
> Note that caching engine defaults are *independent* of gitweb's
> defaults in %cache_options.
>
> * Describe (better than it is now) in comments or in commit message
> why Temp::File is used for 'atomic write' _without_ locking (even
> when there might be more than one process (re)generating the same
> cache entry simultaneously).
>
> * [improvement]. When using locking after
> gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
> it is ensured that only one process would (re)generate cache entry.
> Therefore Temp::File is not needed for temporary file; the temporary
> file can have constant name. This should improve performance a bit.
>
> But this might be unnecessary complication.
>
> * Show information about when page was generated in the footer always
> when caching is enabled; currently it is shown only if caching *and*
> 'timed' feature is enabled
> gitweb: Use Cache::Cache compatibile (get, set) output caching
>
> * Actually check that using alternate caching engine works. It can be
> done (what is described in gitweb/README) by setting $cache to either
> cache engine class (package) name, or to cache object (instantiated
> cache).
>
> * [cleanup] Remove commented out alternate solutions (commented out
> code).
>
> * Benchmark overhead of caching, and performance of caching, perhaps for
> different caching engines including original patch by J.H.
* Turn off 'blame_incremental' view (and links to it) when caching is
enabled. It doesn't make sense without support for "tee"-ing output
in caching engine (or to be more exact in the output capturing
engine), i.e. without printing output while capturing it.
Also currently it doesn't work anyway, for some reason.
* Check why git_generating_data_html, i.e. the "Generating..."
subroutine doesn't add '.' as activity indicator and fix it.
Test if the trick employed by it works in other browsers.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching)
2010-02-28 2:54 ` gitweb: Simple file based output caching TODO (was: Re: [RFC PATCHv3 00/10] gitweb: Simple file based output caching) Jakub Narebski
2010-02-28 11:51 ` gitweb: Simple file based output caching TODO Jakub Narebski
@ 2010-02-28 12:07 ` Petr Baudis
1 sibling, 0 replies; 16+ messages in thread
From: Petr Baudis @ 2010-02-28 12:07 UTC (permalink / raw)
To: Jakub Narebski
Cc: git, John 'Warthog9' Hawley,
John 'Warthog9' Hawley
On Sun, Feb 28, 2010 at 03:54:49AM +0100, Jakub Narebski wrote:
> John, Pasky, do you have any further comments / requests?
Sorry, I'm working intensively on two papers with looming deadline now
and have other time-demanding teaching duties, so I can't watch this as
closely as I'd wish - but I have no immediate further requests. I like
the general direction and hope this to be finally merged even if it
would have some minor nits. :)
--
Petr "Pasky" Baudis
"Ars longa, vita brevis." -- Hippocrates
^ permalink raw reply [flat|nested] 16+ messages in thread