git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: Pavan Kumar Sunkara <pavan.sss1991@gmail.com>,
	Petr Baudis <pasky@ucw.cz>,
	Christian Couder <chriscool@tuxfamily.org>,
	John 'Warthog9' Hawley <warthog9@eaglescrag.net>,
	John 'Warthog9' Hawley <warthog9@kernel.org>,
	Jakub Narebski <jnareb@gmail.com>
Subject: [RFC PATCHv4 15/17] gitweb/lib - Regenerate (refresh) cache in background
Date: Mon, 14 Jun 2010 18:08:28 +0200	[thread overview]
Message-ID: <1276531710-22945-17-git-send-email-jnareb@gmail.com> (raw)
In-Reply-To: <1276531710-22945-1-git-send-email-jnareb@gmail.com>

This commit removes asymmetry in serving stale data (if it exists)
when regenerating cache in GitwebCache::FileCacheWithLocking.  The process
that acquired exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in background, while serving stale data.

Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data).  Otherwise there might be a
problem when gitweb is running as (part of) long-lived process, for example
from mod_perl (or in the future from FastCGI): it would leave unreaped
children as zombies (entries in process table).  We don't want to wait for
background process, and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to
automatically reap child processes, because this interferes with using
  open my $fd, '-|', git_cmd(), 'param', ...
    or die_error(...)
  # read from <$fd>
  close $fd
    or die_error(...)
In the above code "close" for magic "-|" open calls waitpid...  and we
would would die with "No child processes".  Removing 'or die' would
possibly remove ability to react to other errors.

This feature can be enabled or disabled on demand via 'background_cache'
cache parameter.  It is turned on by default.


To be implemented (from original patch by J.H.):
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                             |    9 ++++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   61 +++++++++++++++++++----
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |    3 +-
 t/t9503/test_cache_interface.pl                |   12 ++++-
 4 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5aee9a1..2ca1ad7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -299,6 +299,15 @@ our %cache_options = (
 	# Set it to -1 to always serve existing data if it exists,
 	# set it to 0 to turn off serving stale data - always wait.
 	'max_lifetime' => 5*60*60, # 5 hours
+
+	# This enables/disables background caching.  If it is set to true value,
+	# caching engine would return stale data (if it is not older than
+	# 'max_lifetime' seconds) if it exists, and launch process if regenerating
+	# (refreshing) cache into the background.  If it is set to false value,
+	# the process that fills cache must always wait for data to be generated.
+	# In theory this will make gitweb seem more responsive at the price of
+	# serving possibly stale data.
+	'background_cache' => 1,
 );
 
 
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 266114c..bde1420 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -23,6 +23,7 @@ use warnings;
 
 use File::Path qw(mkpath);
 use Fcntl qw(:flock);
+use POSIX qw(setsid);
 
 # ......................................................................
 # constructor
@@ -70,21 +71,27 @@ use Fcntl qw(:flock);
 #    than it, serve stale data when waiting for cache entry to be 
 #    regenerated (refreshed).  Non-adaptive.
 #    Defaults to -1 (never expire / always serve stale).
+#  * 'background_cache' (boolean)
+#    This enables/disables regenerating cache in background process.
+#    Defaults to true.
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
 	my $class = ref($proto) || $proto;
 	my $self = $class->SUPER::new($p_options_hash_ref);
 
-	my ($max_lifetime);
+	my ($max_lifetime, $background_cache);
 	if (defined $p_options_hash_ref) {
 		$max_lifetime =
 			$p_options_hash_ref->{'max_lifetime'} ||
 			$p_options_hash_ref->{'max_cache_lifetime'};
+		$background_cache = $p_options_hash_ref->{'background_cache'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
+	$background_cache = 1 unless defined($background_cache);
 
 	$self->set_max_lifetime($max_lifetime);
+	$self->set_background_cache($background_cache);
 
 	return $self;
 }
@@ -95,7 +102,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime)) {
+foreach my $i (qw(max_lifetime background_cache)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -132,9 +139,10 @@ sub get_lockname {
 # Uses file locking to have only one process updating value for $key
 # to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
 sub compute {
-	my ($self, $key, $code) = @_;
+	my ($self, $key, $coderef) = @_;
+	my ($data, $stale_data);
 
-	my $data = $self->get($key);
+	$data = $self->get($key);
 	return $data if defined $data;
 
 	my $dir;
@@ -153,16 +161,47 @@ sub compute {
 	do {
 		open my $lock_fh, '+>', $lockfile
 			or die "Could't open lockfile '$lockfile': $!";
+
 		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
 		if ($lock_state) {
-			# acquired writers lock
-			$data = $code->($self, $key);
-			$self->set($key, $data);
+			## acquired writers lock, have to generate data
+			my $pid;
+			if ($self->{'background_cache'}) {
+				# try to retrieve stale data
+				$stale_data = $self->fetch($key)
+					if $self->is_valid($key, $self->get_max_lifetime());
+
+				# fork if there is stale data, for background process
+				# to regenerate/refresh the cache (generate data)
+				$pid = fork() if (defined $stale_data);
+			}
+			if (!defined $pid || !$pid) {
+				## didn't fork, or are in background process
+
+				# daemonize background process, detaching it from parent
+				# see also Proc::Daemonize, Apache2::SubProcess
+				if (defined $pid) {
+					POSIX::setsid();  # or setpgrp(0, 0);
+					fork() && exit 0;
+				}
+
+				$data = $coderef->($self, $key);
+				$self->set($key, $data);
+
+				if (defined $pid) {
+					## in background process; parent will serve stale data
+					close $lock_fh
+						or die "Couldn't close lockfile '$lockfile' (background): $!";
+					exit 0;
+				}
+			}
+			
 		} else {
 			# try to retrieve stale data
-			$data = $self->fetch($key)
+			$stale_data = $self->fetch($key)
 				if $self->is_valid($key, $self->get_max_lifetime());
-			if (!defined $data) {
+
+			if (!defined $stale_data) {
 				# get readers lock if there is no stale data to serve
 				flock($lock_fh, LOCK_SH);
 				$data = $self->fetch($key);
@@ -171,9 +210,9 @@ sub compute {
 		# closing lockfile releases lock
 		close $lock_fh
 			or die "Could't close lockfile '$lockfile': $!";
-	} until (defined $data || $lock_state);
+	} until (defined $data || defined $stale_data || $lock_state);
 	# repeat until we have data, or we tried generating data oneself and failed
-	return $data;
+	return defined $data ? $data : $stale_data;
 }
 
 1;
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 5289894..9eb4896 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -129,7 +129,8 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load)) {
+foreach my $i (qw(depth root namespace
+                  expires_min expires_max increase_factor check_load)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 185307e..667fb5e 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -18,6 +18,9 @@ my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
 } ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
+# compute can fork, don't generate zombies
+#local $SIG{CHLD} = 'IGNORE';
+
 # Test that default values are defined
 #
 ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
@@ -309,18 +312,21 @@ sub run_compute_forked {
 }
 $cache->set_expires_in(0);    # expire now
 $cache->set_max_lifetime(-1); # forever (always serve stale data)
+ok($cache->get_background_cache(), 'generate cache in background by default');
 $pid = open $kid_fh, '-|';
 SKIP: {
-	skip "cannot fork: $!", 2
+	skip "cannot fork: $!", 3
 		unless defined $pid;
 
 	my ($data, $child_data) = run_compute_forked($pid, $kid_fh);
 
 	# returning stale data works
-	ok($data eq $stale_value || $child_data eq $stale_value,
-	   'stale data in at least one process when expired');
+	is($data,       $stale_value, 'stale data in parent when expired');
+	is($child_data, $stale_value, 'stale data in child  when expired');
 
 	$cache->set_expires_in(-1); # never expire for next ->get
+	diag('waiting for background process to have time to set data');
+	sleep $slow_time; # wait for background process to have chance to set data
 	is($cache->get($key), $value, 'value got set correctly, even if stale data returned');
 }
 
-- 
1.7.0.1

  parent reply	other threads:[~2010-06-14 16:10 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 01/17] gitweb: Return or exit after done serving request Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 02/17] gitweb: Fix typo in hash key name in %opts in git_header_html Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 03/17] gitweb: Prepare for splitting gitweb Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 04/17] gitweb/lib - Very simple file based cache Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 05/17] gitweb/lib - Stat-based cache expiration Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 06/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 07/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 08/17] gitweb/lib - Alternate ways of capturing output Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 09/17] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 10/17] gitweb: Add optional output caching (from gitweb/cache.pm) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 10/18] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 11/17] gitweb/lib - Adaptive cache expiration time Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 12/17] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 13/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 14/17] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
2010-06-14 16:08 ` Jakub Narebski [this message]
2010-06-14 16:08 ` [RFC PATCHv4 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 17/17] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
2010-06-24  7:56 ` [RFC PATCHv4 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
2010-06-24  7:56 ` [RFC PATCHv4 19/17] gitweb: Add beginnings of cache administration page (WIP) Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1276531710-22945-17-git-send-email-jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=pasky@ucw.cz \
    --cc=pavan.sss1991@gmail.com \
    --cc=warthog9@eaglescrag.net \
    --cc=warthog9@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).