From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>,
John 'Warthog9' Hawley <warthog9@eaglescrag.net>,
Junio C Hamano <gitster@pobox.com>, demerphq <demerphq@gmail.com>,
Aevar Arnfjord Bjarmason <avarab@gmail.com>,
Thomas Adam <thomas@xteddy.org>,
Jakub Narebski <jnareb@gmail.com>
Subject: [PATCH 15/24] gitweb/lib - Regenerate (refresh) cache in background
Date: Tue, 7 Dec 2010 00:11:00 +0100
Message-ID: <1291677069-6559-16-git-send-email-jnareb@gmail.com>
In-Reply-To: <1291677069-6559-1-git-send-email-jnareb@gmail.com>
This commit removes the asymmetry in serving stale data (if stale data
exists) when regenerating the cache in GitwebCache::FileCacheWithLocking.
Previously the process that acquired the exclusive (writer's) lock, and was
therefore selected to be the one that (re)generates data to fill the cache,
had to wait for the data to be generated, while the other processes were
served stale data. That process can now generate data in the background,
while itself serving stale data.
Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data). Otherwise there might be a
problem when gitweb is running as (part of) a long-lived process, for example
from mod_perl or from FastCGI: it would leave unreaped children as zombies
(entries in the process table). We don't want to wait for the background
process, and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to automatically
reap child processes, because this interferes with using

    open my $fd, '-|', git_cmd(), 'param', ...
        or die_error(...)
    # read from <$fd>
    close $fd
        or die_error(...)

In the above code "close" for the magic "-|" open calls waitpid... and we
would die with "No child processes". Removing 'or die' would possibly
remove the ability to react to other errors.
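
The interaction described above can be tried out in isolation. What follows
is only a minimal sketch, not gitweb code: read_via_pipe() is a hypothetical
helper invented for this illustration, and the Perl interpreter itself ($^X)
stands in for git_cmd(). On typical Unix systems the second call is expected
to report the "No child processes" failure from close, but the exact
behaviour is platform dependent.

```perl
use strict;
use warnings;

# Hypothetical demonstration helper, not part of gitweb: read one line
# from a command via a piped open and report whether close() succeeded.
sub read_via_pipe {
    my @cmd = @_;
    open my $fd, '-|', @cmd
        or return (undef, "open: $!");
    my $line = <$fd>;
    if (close $fd) {
        return ($line, undef);
    }
    # $! is non-zero if a syscall (e.g. waitpid) failed; if the command
    # merely exited with non-zero status, $! is 0 and $? holds the status
    return ($line, $! != 0 ? "close: $!" : "exit status $?");
}

my @cmd = ($^X, '-e', 'print "hello\n"');

my ($out, $err) = read_via_pipe(@cmd);
print "default SIGCHLD handling: ", (defined $err ? $err : "close ok"), "\n";

{
    # ask the system to reap children automatically
    local $SIG{CHLD} = 'IGNORE';
    ($out, $err) = read_via_pipe(@cmd);
    print "SIGCHLD ignored: ", (defined $err ? $err : "close ok"), "\n";
}
```

Note that the data is still read correctly in both cases; it is only the
error check on close that can no longer tell a real failure from the
already-reaped child.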
This feature can be enabled or disabled on demand via the 'background_cache'
cache parameter. It is turned on by default.

The t9503 test was updated to test both the case with background generation
enabled and the case with background generation disabled.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Compared to the previous version of this series, ->_set_maybe_background
was extracted (refactored) from ->_compute_generic (in the earlier version
it would have been just ->compute). This hopefully makes the code easier to
understand.

Differences to the approach taken in "Gitweb caching v7" by J.H.:

* It is made explicit that background generation depends on using
  locking. It doesn't matter for the J.H. series, as you cannot turn off
  locking there.

* Forking (running the generating process in background) is done only if
  there is stale data to serve (and if background cache is turned on).
  In the J.H. series forking was done unconditionally; only generation or
  exit depended on $backgroundCache (and the technical/debugging
  $cacheDoFork).

* Locking is done before forking, as forking a background process is done
  only for the process regenerating the cache.

* Daemonizes the background process, detaching it from the parent (using
  setsid). This way, whether the main process is short-lived (gitweb as CGI)
  or long-lived (mod_perl, PSGI or FastCGI), there is no need to
  wait and there are no zombies.
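
The daemonizing step from the last point can be sketched as a standalone
helper. This is only an illustration under stated assumptions, not the code
in the patch: run_detached() is a hypothetical name, and the sketch uses
POSIX::_exit instead of CORE::exit so that the forked processes skip END
blocks and do not flush the caller's buffered output a second time.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Hypothetical helper (not part of the patch): run $code in a detached
# grandchild process, so that the caller never has to reap it.
sub run_detached {
    my ($code) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid) {
        # parent: reap the short-lived intermediate child and carry on
        waitpid $pid, 0;
        return;
    }

    # intermediate child: start a new session, fork again and exit at
    # once, so that the grandchild is orphaned and no longer our child
    setsid();
    my $grandchild = fork();
    POSIX::_exit(1) unless defined $grandchild;
    POSIX::_exit(0) if $grandchild;

    # grandchild: do the actual work in the background
    $code->();
    POSIX::_exit(0);
}
```

A caller would pass a code reference that does the slow work and closes its
own filehandles explicitly (because _exit does not flush buffers), e.g.
run_detached(sub { ... }). The only process the caller waits for is the
intermediate child, which exits immediately after the second fork.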
gitweb/gitweb.perl | 9 +++
gitweb/lib/GitwebCache/FileCacheWithLocking.pm | 64 ++++++++++++++++++++++--
t/t9503/test_cache_interface.pl | 40 ++++++++++++++-
3 files changed, 106 insertions(+), 7 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 454766c..f202d6b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -335,6 +335,15 @@ our %cache_options = (
# Set it to -1 to always serve existing data if it exists,
# set it to 0 to turn off serving stale data - always wait.
'max_lifetime' => 5*60*60, # 5 hours
+
+ # This enables/disables background caching. If it is set to true value,
+ # caching engine would return stale data (if it is not older than
+ # 'max_lifetime' seconds) if it exists, and launch process of regenerating
+ # (refreshing) cache into the background. If it is set to false value,
+ # the process that fills cache must always wait for data to be generated.
+ # In theory this will make gitweb seem more responsive at the price of
+ # serving possibly stale data.
+ 'background_cache' => 1,
);
# Set to _initialized_ instance of GitwebCache::Capture compatibile capturing
# engine, i.e. one implementing ->new() constructor, and ->capture($code)
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 1d32810..82e88f1 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -23,6 +23,7 @@ use warnings;
use File::Path qw(mkpath);
use Fcntl qw(:flock);
+use POSIX qw(setsid);
# ......................................................................
# constructor
@@ -70,21 +71,27 @@ use Fcntl qw(:flock);
# than it, serve stale data when waiting for cache entry to be
# regenerated (refreshed). Non-adaptive.
# Defaults to -1 (never expire / always serve stale).
+# * 'background_cache' (boolean)
+# This enables/disables regenerating cache in background process.
+# Defaults to true.
sub new {
my $class = shift;
my %opts = ref $_[0] ? %{ $_[0] } : @_;
my $self = $class->SUPER::new(\%opts);
- my ($max_lifetime);
+ my ($max_lifetime, $background_cache);
if (%opts) {
$max_lifetime =
$opts{'max_lifetime'} ||
$opts{'max_cache_lifetime'};
+ $background_cache = $opts{'background_cache'};
}
$max_lifetime = -1 unless defined($max_lifetime);
+ $background_cache = 1 unless defined($background_cache);
$self->set_max_lifetime($max_lifetime);
+ $self->set_background_cache($background_cache);
return $self;
}
@@ -95,7 +102,7 @@ sub new {
# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
# creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime)) {
+foreach my $i (qw(max_lifetime background_cache)) {
my $field = $i;
no strict 'refs';
*{"get_$field"} = sub {
@@ -146,6 +153,52 @@ sub _tempfile_to_path {
# ......................................................................
# interface methods
+sub _set_maybe_background {
+ my ($self, $key, $fetch_code, $set_code) = @_;
+
+ my $pid;
+ my (@result, @stale_result);
+
+ if ($self->{'background_cache'}) {
+ # try to retrieve stale data
+ @stale_result = $fetch_code->()
+ if $self->is_valid($key, $self->get_max_lifetime());
+
+ # fork if there is stale data, for background process
+ # to regenerate/refresh the cache (generate data)
+ $pid = fork() if (@stale_result);
+ }
+
+ if ($pid) {
+ ## forked and are in parent process
+ # reap child, which spawned grandchild process (detaching it)
+ waitpid $pid, 0;
+
+ } else {
+ ## didn't fork, or are in background process
+
+ # daemonize background process, detaching it from parent
+ # see also Proc::Daemonize, Apache2::SubProcess
+ if (defined $pid) {
+ ## in background process
+ POSIX::setsid(); # or setpgrp(0, 0);
+ fork() && CORE::exit(0);
+ }
+
+ @result = $set_code->();
+
+ if (defined $pid) {
+ ## in background process; parent will serve stale data
+
+ # lockfile will be automatically closed on exit,
+ # and therefore lockfile would be unlocked
+ CORE::exit(0);
+ }
+ }
+
+ return @result > 0 ? @result : @stale_result;
+}
+
sub _compute_generic {
my ($self, $key,
$get_code, $fetch_code, $set_code, $fetch_locked) = @_;
@@ -162,16 +215,19 @@ sub _compute_generic {
do {
open my $lock_fh, '+>', $lockfile
or die "Could't open lockfile '$lockfile': $!";
+
$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
if ($lock_state) {
- # acquired writers lock
- @result = $set_code->();
+ ## acquired writers lock, have to generate data
+ @result = $self->_set_maybe_background($key, $fetch_code, $set_code);
# closing lockfile releases lock
close $lock_fh
or die "Could't close lockfile '$lockfile': $!";
} else {
+ ## didn't acquire writers lock, get stale data or wait for regeneration
+
# try to retrieve stale data
@result = $fetch_code->()
if $self->is_valid($key, $self->get_max_lifetime());
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 8a52261..7f08863 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -24,9 +24,13 @@ diag("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
'max_lifetime' => 0, # turn it off
+ 'background_cache' => 0,
} ]);
isa_ok($cache, 'GitwebCache::SimpleFileCache');
+# compute can fork, don't generate zombies
+#local $SIG{CHLD} = 'IGNORE';
+
# Test that default values are defined
#
ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
@@ -303,6 +307,9 @@ subtest 'parallel access' => sub {
my $stale_value = 'Stale Value';
subtest 'serving stale data when (re)generating' => sub {
+ # without background generation
+ $cache->set_background_cache(0);
+
$cache->set($key, $stale_value);
$call_count = 0;
$cache->set_expires_in(0); # expire now
@@ -312,12 +319,39 @@ subtest 'serving stale data when (re)generating' => sub {
my $data = cache_compute($cache, $key, \&get_value_slow);
print $data;
};
- ok(scalar(grep { $_ eq $stale_value } @output),
- 'stale data in at least one process when expired');
+ # returning stale data works
+ is_deeply(
+ [sort @output],
+ [sort ($value, $stale_value)],
+ 'no background: stale data returned by one process'
+ );
+
+ $cache->set_expires_in(-1); # never expire for next ->get
+ is($cache->get($key), $value,
+ 'no background: value got set correctly, even if stale data returned');
+
+
+ # with background generation
+ $cache->set_background_cache(1);
+
+ $cache->set($key, $stale_value);
+ $cache->set_expires_in(0); # set value is now expired
+ @output = parallel_run {
+ my $data = cache_compute($cache, $key, \&get_value_slow);
+ print $data;
+ };
+ # returning stale data works
+ is_deeply(
+ \@output,
+ [ ($stale_value) x 2 ],
+ 'background: stale data returned by both processes when expired'
+ );
$cache->set_expires_in(-1); # never expire for next ->get
+ note('waiting for background process to have time to set data');
+ sleep $slow_time; # wait for background process to have chance to set data
is($cache->get($key), $value,
- 'value got set correctly, even if stale data returned');
+ 'background: value got set correctly by background process');
# $cache->set($key, $stale_value);
--
1.7.3