git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>,
	John 'Warthog9' Hawley <warthog9@eaglescrag.net>,
	Junio C Hamano <gitster@pobox.com>, demerphq <demerphq@gmail.com>,
	Aevar Arnfjord Bjarmason <avarab@gmail.com>,
	Thomas Adam <thomas@xteddy.org>,
	Jakub Narebski <jnareb@gmail.com>
Subject: [PATCH 14/24] gitweb/lib - Serve stale data when waiting for filling cache
Date: Tue,  7 Dec 2010 00:10:59 +0100	[thread overview]
Message-ID: <1291677069-6559-15-git-send-email-jnareb@gmail.com> (raw)
In-Reply-To: <1291677069-6559-1-git-send-email-jnareb@gmail.com>

When process fails to acquire exclusive (writers) lock, then instead
of waiting for the other process to (re)generate and fill cache, serve
stale (expired) data from cache.  This is of course possible only if
there is some stale data in cache for given key.

This feature of GitwebCache::FileCacheWithLocking is used only for
->update($key, $code) and ->update_fh($key, $code_fh) methods.  It is
controlled by 'max_lifetime' cache parameter; you can set it to -1 to
always serve stale data if it exists, and you can set it to 0 (or any
value smaller than 'expires_min') to turn this feature off.

This feature, as it is implemented currently, makes ->update() method a
bit asymmetric with respect to process that acquired writers lock and
those processes that didn't, which can be seen in the new test in t9503.
The process that is to regenerate (refresh) data in cache must wait for
the data to be generated in full before showing anything to client, while
the other processes can show stale (expired) data immediately.  In order
to remove or reduce this asymmetry gitweb would need to employ one of
two alternate solutions.  Either data should be (re)generated in background,
so that process that acquired writers lock would generate data in
background while serving stale data, or alternatively the process that
generates data should pass output to original STDOUT while capturing it
("tee" output).

Note that process that got stale data serves it immediately, therefore
it wouldn't be available to regenerate data if process regenerating
data died; see commented-out TODO test in t9503.  Otherwise it would
have to wait to check if data got regenerated, which would negate the
idea of serving stale data for a fast return.

When developing this feature, ->is_valid() method in base class
GitwebCache::SimpleFileCache acquired additional extra optional
parameter, where one can pass expire time instead of using whole-cache
global (adaptive) expire time.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Compared to version in previous version of this series the parallel
access test got much improved (this actually started in earlier
commit).

This is the part that is possible _without_ regenerating cache in
background.

Note that here it is explicit that serving stale data when some
process is regenerating cache is possible only with locking enabled,
i.e. when using GitwebCache::FileCacheWithLocking.

 gitweb/gitweb.perl                             |    8 ++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  105 ++++++++++++++++++++++--
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |   22 ++++--
 t/t9503/test_cache_interface.pl                |   68 +++++++++++++++-
 4 files changed, 189 insertions(+), 14 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 72683be..454766c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -327,6 +327,14 @@ our %cache_options = (
 	# lifetime control.
 	# (Compatibile with Cache::Adaptive.)
 	'check_load' => \&get_loadavg,
+
+	# Maximum cache file life, in seconds.  If cache entry lifetime exceeds
+	# this value, it wouldn't be served as being too stale when waiting for
+	# cache to be regenerated/refreshed, instead of trying to display
+	# existing cache date.
+	# Set it to -1 to always serve existing data if it exists,
+	# set it to 0 to turn off serving stale data - always wait.
+	'max_lifetime' => 5*60*60, # 5 hours
 );
 # Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
 # engine, i.e. one implementing ->new() constructor, and ->capture($code)
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 4d8114d..1d32810 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -25,7 +25,88 @@ use File::Path qw(mkpath);
 use Fcntl qw(:flock);
 
 # ......................................................................
-# constructor is inherited from GitwebCache::SimpleFileCache
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatibile),
+#    'root_dir' (CHI::Driver::File compatibile),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatibile),
+#    'depth' (CHI::Driver::File compatibile),
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatibile),
+#    'expires_in' (CHI compatibile) [seconds]
+#    The expiration time for objects place in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set; 
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lietime.
+#    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
+#
+# (all the above are inherited from GitwebCache::SimpleFileCache)
+#
+#  * 'max_lifetime' [seconds]
+#    If it is greater than 0, and cache entry is expired but not older
+#    than it, serve stale data when waiting for cache entry to be 
+#    regenerated (refreshed).  Non-adaptive.
+#    Defaults to -1 (never expire / always serve stale).
+sub new {
+	my $class = shift;
+	my %opts = ref $_[0] ? %{ $_[0] } : @_;
+
+	my $self = $class->SUPER::new(\%opts);
+
+	my ($max_lifetime);
+	if (%opts) {
+		$max_lifetime =
+			$opts{'max_lifetime'} ||
+			$opts{'max_cache_lifetime'};
+	}
+	$max_lifetime = -1 unless defined($max_lifetime);
+
+	$self->set_max_lifetime($max_lifetime);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(max_lifetime)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
 
 # ----------------------------------------------------------------------
 # utility functions and methods
@@ -67,7 +148,7 @@ sub _tempfile_to_path {
 
 sub _compute_generic {
 	my ($self, $key,
-	    $get_code, $set_code, $get_locked) = @_;
+	    $get_code, $fetch_code, $set_code, $fetch_locked) = @_;
 
 	my @result = $get_code->();
 	return @result if @result;
@@ -91,17 +172,23 @@ sub _compute_generic {
 				or die "Could't close lockfile '$lockfile': $!";
 
 		} else {
+			# try to retrieve stale data
+			@result = $fetch_code->()
+				if $self->is_valid($key, $self->get_max_lifetime());
+			return @result if @result;
+
 			# get readers lock (wait for writer)
+			# if there is no stale data to serve
 			flock($lock_fh, LOCK_SH);
 			# closing lockfile releases lock
-			if ($get_locked) {
-				@result = $get_code->();
+			if ($fetch_locked) {
+				@result = $fetch_code->();
 				close $lock_fh
 					or die "Could't close lockfile '$lockfile': $!";
 			} else {
 				close $lock_fh
 					or die "Could't close lockfile '$lockfile': $!";
-				@result = $get_code->();
+				@result = $fetch_code->();
 			}
 		}
 	} until (@result || $lock_state);
@@ -126,6 +213,9 @@ sub compute {
 			return $self->get($key);
 		},
 		sub {
+			return $self->fetch($key);
+		},
+		sub {
 			my $data = $code->();
 			$self->set($key, $data);
 			return $data;
@@ -152,9 +242,12 @@ sub compute_fh {
 			return $self->get_fh($key);
 		},
 		sub {
+			return $self->fetch_fh($key);
+		},
+		sub {
 			return $self->set_coderef_fh($key, $code_fh);
 		},
-		1 # $self->get_fh($key); just opens file
+		1 # $self->fetch_fh($key); just opens file
 	);
 }
 
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index aeb91d4..21ec434 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -365,12 +365,13 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
-# $cache->is_valid($key)
+# $cache->is_valid($key[, $expires_in])
 #
 # Returns a boolean indicating whether $key exists in the cache
-# and has not expired (global per-cache 'expires_in').
+# and has not expired.  Uses global per-cache expires time, unless
+# passed optional $expires_in argument.
 sub is_valid {
-	my ($self, $key) = @_;
+	my ($self, $key, $expires_in) = @_;
 
 	my $path = $self->path_to_key($key);
 
@@ -383,7 +384,7 @@ sub is_valid {
 	return 0 unless ((stat(_))[7] > 0);
 
 	# expire time can be set to never
-	my $expires_in = $self->get_expires_in();
+	$expires_in = defined $expires_in ? $expires_in : $self->get_expires_in();
 	return 1 unless (defined $expires_in && $expires_in >= 0);
 
 	# is file expired?
@@ -441,18 +442,25 @@ sub compute {
 # ......................................................................
 # nonstandard interface methods
 
-sub get_fh {
+sub fetch_fh {
 	my ($self, $key) = @_;
 
 	my $path = $self->path_to_key($key);
 	return unless (defined $path);
 
-	return unless ($self->is_valid($key));
-
 	open my $fh, '<', $path or return;
 	return ($fh, $path);
 }
 
+
+sub get_fh {
+	my ($self, $key) = @_;
+
+	return unless ($self->is_valid($key));
+
+	return $self->fetch_fh($key);
+}
+
 sub set_coderef_fh {
 	my ($self, $key, $code) = @_;
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index c6a28f8..8a52261 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -22,7 +22,9 @@ BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
 diag("Using lib '$INC[0]'");
 diag("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
 
-my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
+	'max_lifetime' => 0, # turn it off
+} ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
@@ -295,6 +297,70 @@ subtest 'parallel access' => sub {
 	done_testing();
 };
 
+# Test that cache returns stale data in existing but expired cache situation
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $stale_value = 'Stale Value';
+
+subtest 'serving stale data when (re)generating' => sub {
+	$cache->set($key, $stale_value);
+	$call_count = 0;
+	$cache->set_expires_in(0);    # expire now
+	$cache->set_max_lifetime(-1); # forever (always serve stale data)
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print $data;
+	};
+	ok(scalar(grep { $_ eq $stale_value } @output),
+	   'stale data in at least one process when expired');
+
+	$cache->set_expires_in(-1); # never expire for next ->get
+	is($cache->get($key), $value,
+	   'value got set correctly, even if stale data returned');
+
+
+# 	$cache->set($key, $stale_value);
+# 	unlink($lock_file);
+# 	@output = parallel_run {
+# 		my $data = eval { cache_compute($cache, $key, \&get_value_die_once); };
+# 		my $eval_error = $@;
+# 		print "$data" if defined $data;
+# 		print "$sep";
+# 		print "$eval_error" if defined $eval_error;
+# 	};
+#  TODO: {
+# 		local $TODO = 'not implemented';
+#
+# 		is_deeply(
+# 			[sort @output],
+# 			[sort ("$value${sep}", "${sep}get_value_die_once\n")],
+# 			'return non-stale value, even if process regenerating it died'
+# 		);
+#
+# 		$cache->set_expires_in(-1); # never expire for next ->get
+# 		is($cache->get($key), $value,
+# 		   'value got regenerated, even if process regenerating it died');
+# 	};
+# 	unlink($lock_file);
+
+	$cache->set($key, $stale_value);
+	$cache->set_expires_in(0);   # expire now
+	$cache->set_max_lifetime(0); # don't serve stale data
+
+	@output = parallel_run {
+		my $data = cache_compute($cache, $key, \&get_value_slow);
+		print $data;
+	};
+	# no returning stale data
+	ok(!scalar(grep { $_ eq $stale_value } @output),
+	   'no stale data if configured');
+
+
+	done_testing();
+};
+$cache->set_expires_in(-1);
+
 done_testing();
 
 
-- 
1.7.3

  parent reply	other threads:[~2010-12-06 23:12 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-06 23:10 [PATCHv6/RFC 00/24] gitweb: Simple file based output caching Jakub Narebski
2010-12-06 23:10 ` [PATCH 01/24] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
2010-12-06 23:10 ` [PATCH 02/24] gitweb: Prepare for splitting gitweb Jakub Narebski
2010-12-06 23:10 ` [PATCH 03/24] gitweb/lib - Very simple file based cache Jakub Narebski
2010-12-06 23:10 ` [PATCH 04/24] gitweb/lib - Stat-based cache expiration Jakub Narebski
2010-12-06 23:10 ` [PATCH 05/24] gitweb/lib - Regenerate entry if the cache file has size of 0 Jakub Narebski
2010-12-06 23:10 ` [PATCH 06/24] gitweb/lib - Simple output capture by redirecting STDOUT Jakub Narebski
2010-12-06 23:10 ` [PATCH 07/24] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
2010-12-06 23:10 ` [PATCH 08/24] gitweb: Add optional output caching Jakub Narebski
2010-12-06 23:10 ` [PATCH 09/24] gitweb/lib - Adaptive cache expiration time Jakub Narebski
2010-12-06 23:10 ` [PATCH 10/24] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
2010-12-06 23:10 ` [PATCH 11/24] gitweb/lib - capture output directly to cache entry file Jakub Narebski
2010-12-06 23:10 ` [PATCH 12/24] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-12-06 23:10 ` [PATCH 13/24] gitweb/lib - No need for File::Temp when locking Jakub Narebski
2010-12-06 23:10 ` Jakub Narebski [this message]
2010-12-06 23:11 ` [PATCH 15/24] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
2010-12-06 23:11 ` [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
2010-12-06 23:11 ` [PATCH 17/24] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
2010-12-06 23:11 ` [PATCH 18/24] gitweb/lib - Configure running 'generating_info' when generating data Jakub Narebski
2010-12-06 23:11 ` [PATCH 19/24] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
2010-12-06 23:11 ` [PATCH/RFC 20/24] gitweb/lib - Add support for setting error handler in cache Jakub Narebski
2010-12-06 23:11 ` [PATCH/RFC 21/24] gitweb: Wrap die_error to use as error handler for caching engine Jakub Narebski
2010-12-06 23:11 ` [PATCH/RFC 22/24] gitweb: Support legacy options used by kernel.org " Jakub Narebski
2010-12-06 23:11 ` [RFC/PATCH 23/24] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
2010-12-06 23:11 ` [RFC PATCH 24/24] gitweb: Add beginnings of cache administration page (proof of concept) Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1291677069-6559-15-git-send-email-jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=avarab@gmail.com \
    --cc=demerphq@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=thomas@xteddy.org \
    --cc=warthog9@eaglescrag.net \
    --cc=warthog9@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).