From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: John 'Warthog9' Hawley <warthog9@eaglescrag.net>,
John 'Warthog9' Hawley <warthog9@kernel.org>,
Petr Baudis <pasky@suse.cz>, Jakub Narebski <jnareb@gmail.com>
Subject: [RFC PATCHv2 07/10] gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem
Date: Tue, 9 Feb 2010 11:30:24 +0100 [thread overview]
Message-ID: <1265711427-15193-8-git-send-email-jnareb@gmail.com> (raw)
In-Reply-To: <1265711427-15193-1-git-send-email-jnareb@gmail.com>
In the ->compute($key, $code) method from GitwebCache::SimpleFileCache,
use locking (via flock) to ensure that only one process would generate
data to update/fill-in cache; the rest would wait for the cache to
be (re)generated and would read data from cache.
Currently this feature can not be disabled (via %cache_options).
A test in t9503 shows that in the case where there are two clients
trying to simultaneously access non-existent or stale cache entry,
(and generating data takes (artifically) a bit of time), if they are
using ->compute method the data is (re)generated once, as opposed to
if those clients are just using ->get/->set methods.
To be implemented (from original patch by J.H.):
* background building, and showing stale cache
* server-side progress indicator when waiting for filling cache,
which in turn requires separating situations (like snapshots and
other non-HTML responses) where we should not show 'please wait'
message
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Differences from v1:
* Proper commit message
* ->get_lockname($key) replaced ->_lockfile_to_key($namespace, $key)
* mkpath to ensure that path leading to lockfile exists instead of private
_Make_Path subroutine (this is a bit of code repetition in new code).
* Testing of removing non-existent key moved to earlier commit
Differences from relevant parts of J.H. patch:
* Locking on separate *.lock file, for simpler work and robustness;
acquiring exclusive lock might require having file opened for possible
write, following
"File Locking Tricks and Traps" (http://perl.plover.com/yak/flock/)
J.H. patch used separate *.bg file for lock only for background caching
(to be implemented in one of next commits).
* Single variable $lock_state, in place of $lockingStatus, $lockStatBG,
$lockStat, $lockStatus in J.H. patch.
* Use indirect filehandles for lockfiles, instead of barewords, i.e. global
filehandles:
open my $lock_fh, '+>', $lockfile;
rather than equivalent of
open LOCK, '>:utf8', $lockfile
* Do not use LOCK_UN: closing lockfile would unlock
* In my opinion much cleaner flow control
Possible improvements:
* When using locking, only one process would be able to write to temporary
file. Therefore temporary file can now have non-random name; no need for
File::Temp nor unique_id (performance). It would probably be best done
via ->get_tempfile() method, or something like that.
* Run tests which contain 'sleep 1' in them (which is required to test how
they handle concurrent access, e.g. testing cache miss stampede situation)
only when test is run with --long. This would require update to
t/test-lib.sh to export GIT_TEST_LONG to external program in test_external
etc.
* ->get_lockname should perhaps be ->get_lockfile which would return opened
filehandle, or filehandle + filename like File::Temp::tempfile, or objects
that functions like filehandle but stringifies to filename like
File::Temp->new().
* Equivalent of old _Make_Path?
* Would ability to turn this feature off make sense?
gitweb/cache.pm | 29 ++++++++++++++++-
t/t9503/test_cache_interface.pl | 65 +++++++++++++++++++++++++++++++++++++++
2 files changed, 92 insertions(+), 2 deletions(-)
diff --git a/gitweb/cache.pm b/gitweb/cache.pm
index 899a4b4..5c2de7e 100644
--- a/gitweb/cache.pm
+++ b/gitweb/cache.pm
@@ -24,6 +24,7 @@ use File::Path qw(mkpath);
use File::Spec;
use File::Temp;
use Digest::MD5 qw(md5_hex);
+use Fcntl qw(:flock);
# by default, the cache nests all entries on the filesystem two
# directories deep
@@ -195,6 +196,12 @@ sub path_to_key {
return $filepath;
}
+sub get_lockname {
+ my $self = shift;
+
+ return $self->path_to_key(@_) . '.lock';
+}
+
# ----------------------------------------------------------------------
# worker methods
@@ -330,17 +337,35 @@ sub compute {
my ($self, $p_key, $p_coderef) = @_;
my $data = $self->get($p_key);
- if (!defined $data) {
+ return $data if defined $data;
+
+ my $dir;
+ my $lockfile = $self->get_lockname($p_key, \$dir);
+
+ # ensure that directory leading to lockfile exists
+ mkpath($dir, 0, 0777) if !-d $dir;
+
+ open my $lock_fh, '+>', $lockfile;
+ # or die "Can't open lockfile '$lockfile': $!";
+ if (my $lock_state = flock($lock_fh, LOCK_EX | LOCK_NB)) {
+ # acquired writers lock
$data = $p_coderef->($self, $p_key);
$self->set($p_key, $data);
+ } else {
+ # get readers lock
+ flock($lock_fh, LOCK_SH);
+ $data = $self->fetch($p_key);
}
-
+ # closing lockfile releases lock
+ close $lock_fh;
return $data;
}
1;
} # end of package GitwebCache::SimpleFileCache;
+# ======================================================================
+
# human readable key identifying gitweb output
sub gitweb_output_key {
return href(-replay => 1, -full => 1, -path_info => 0);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 3945adc..42fe359 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -107,6 +107,71 @@ is($cache->compute($key, \&get_value), $value, 'compute 2nd time (get)');
is($cache->compute($key, \&get_value), $value, 'compute 3rd time (get)');
cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
+# Test 'stampeding herd' / 'cache miss stampede' problem
+# (probably should be run only if GIT_TEST_LONG)
+#
+sub get_value_slow {
+ $call_count++;
+ sleep 1;
+ return $value;
+}
+my ($pid, $kid_fh);
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+ skip "cannot fork: $!", 1
+ unless defined $pid;
+
+ my $data = $cache->get($key);
+ if (!defined $data) {
+ $data = get_value_slow();
+ $cache->set($key, $data);
+ }
+
+ if ($pid) {
+ my $child_count = <$kid_fh>;
+ chomp $child_count;
+
+ waitpid $pid, 0;
+ close $kid_fh;
+
+ $call_count += $child_count;
+ } else {
+ print "$call_count\n";
+ exit 0;
+ }
+
+ cmp_ok($call_count, '==', 2, 'parallel get/set: get_value_slow() called twice');
+}
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+ skip "cannot fork: $!", 1
+ unless defined $pid;
+
+ my $data = $cache->compute($key, \&get_value_slow);
+
+ if ($pid) {
+ my $child_count = <$kid_fh>;
+ chomp $child_count;
+
+ waitpid $pid, 0;
+ close $kid_fh;
+
+ $call_count += $child_count;
+ } else {
+ print "$call_count\n";
+ exit 0;
+ }
+
+ cmp_ok($call_count, '==', 1, 'parallel compute: get_value_slow() called once');
+}
+
+
# Test cache expiration for 'expire now'
#
$cache->set_expires_in(60*60*24); # set expire time to 1 day
--
1.6.6.1
next prev parent reply other threads:[~2010-02-09 10:31 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-09 10:30 [RFC PATCHv2 00/10] gitweb: Simple file based output caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 01/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 02/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 03/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching Jakub Narebski
2010-02-10 1:12 ` Jakub Narebski
2010-02-10 1:23 ` Petr Baudis
2010-02-10 11:28 ` Jakub Narebski
2010-02-10 12:02 ` Petr Baudis
2010-02-10 18:22 ` Jakub Narebski
2010-02-10 20:32 ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 05/10] gitweb/cache.pm - Adaptive cache expiration time Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 06/10] gitweb: Use CHI compatibile (compute method) caching Jakub Narebski
2010-02-09 10:30 ` Jakub Narebski [this message]
2010-02-09 10:30 ` [RFC PATCHv2 08/10] gitweb/cache.pm - Serve stale data when waiting for filling cache Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 09/10] gitweb/cache.pm - Regenerate (refresh) cache in background Jakub Narebski
2010-02-09 22:23 ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1265711427-15193-8-git-send-email-jnareb@gmail.com \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=pasky@suse.cz \
--cc=warthog9@eaglescrag.net \
--cc=warthog9@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).