git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Cc: Petr Baudis <pasky@suse.cz>, "J.H." <warthog19@eaglescrag.net>,
	Frank Lichtenheld <frank@lichtenheld.de>,
	Jakub Narebski <jnareb@gmail.com>
Subject: [RFC/PATCH 2/3] gitweb: Support caching projects list
Date: Mon, 17 Mar 2008 16:09:29 +0100
Message-ID: <1205766570-13550-3-git-send-email-jnareb@gmail.com>
In-Reply-To: <1205766570-13550-1-git-send-email-jnareb@gmail.com>

From: Petr Baudis <pasky@suse.cz>

On repo.or.cz (permanently I/O overloaded and hosting 1050 projects
plus forks), the projects list (the default gitweb page) can take more
than a minute to generate. This naive patch adds simple support for
caching the projects list data structure, so that the projects do not
all need to be rescanned on every page access.

A new gitweb configuration variable, $projlist_cache_lifetime, is
introduced, set to zero by default. If set to a non-zero value, it gives
the number of minutes for which the cache remains valid. Only a single
project root per system can use the cache. Any script running with the
same uid as gitweb can trivially modify the cache, so this is for secure
installations only.
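For example, a site could enable a five-minute cache from its gitweb
configuration file, which gitweb loads as plain Perl. The file name in
the comment and the value 5 are illustrative assumptions, not
recommendations made by this patch:

```perl
# gitweb configuration file (e.g. gitweb_config.perl); it is plain Perl.
# Cache the projects list for 5 minutes; 0 (the default) disables caching.
$projlist_cache_lifetime = 5;
```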

The cache itself is stored in /tmp/gitweb.index.cache as a
Data::Dumper dump of the Perl data structure holding the list of
project details. When reusing the cache, the file is simply eval'd
back into @projects.
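A minimal, self-contained sketch of this Data::Dumper round-trip
follows. The sample project hashes and their field names are made up
for illustration, and the guard around eval is an addition of this
sketch, not something the patch does. Because the dump is executed as
Perl code when it is read back, anyone able to write the cache file can
run code as the gitweb user, which is why the cache is meant for secure
installations only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical project list standing in for the output of
# git_get_projects_details(); field names are illustrative only.
my @projects = (
	{ path => 'git.git',  descr => 'Git itself',  owner => 'junio' },
	{ path => 'repo.git', descr => "it's a test", owner => 'pasky' },
);

# With Terse set, Dumper() emits a bare "[ ... ]" expression instead of
# "$VAR1 = [ ... ];", so the text can be eval'd straight back.
# "local" keeps the setting from leaking into other Dumper() callers.
my $dump = do {
	local $Data::Dumper::Terse = 1;
	Dumper(\@projects);
};

# Reading the cache back: eval the dump and dereference the array ref.
# If the dump is corrupt, eval returns undef; fall back to an empty list
# instead of dying on the dereference.
my $ref = eval $dump;
my @restored = ref($ref) eq 'ARRAY' ? @$ref : ();

print scalar(@restored), " projects restored\n";
```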

To prevent contention when multiple accesses coincide with cache
expiration, the expiry is postponed to time()+120 (by resetting the
cache file's mtime) when we start refreshing. When the cached version
is shown, a disclaimer giving its age is displayed at the top of the
projects list.
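The freshness test and the postponement can be sketched on their own
as follows, assuming (as the configuration comment in the patch says)
that the lifetime is given in minutes, so it has to be multiplied by 60
before being compared with Unix timestamps in seconds. The helper names
cache_is_fresh() and postpone_expiry() are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# True if $file was modified less than $lifetime_min minutes before $now.
sub cache_is_fresh {
	my ($file, $lifetime_min, $now) = @_;
	my $mtime = (stat $file)[9];
	return defined $mtime && $mtime + $lifetime_min * 60 > $now;
}

# Make the cache look fresh for another 120 seconds by choosing an mtime
# such that mtime + lifetime*60 == now + 120.
sub postpone_expiry {
	my ($file, $lifetime_min, $now) = @_;
	my $t = $now - $lifetime_min * 60 + 120;
	utime $t, $t, $file;
}

my ($fh, $cache_file) = tempfile(UNLINK => 1);
close $fh;
my $lifetime = 5;    # minutes
my $now = time();

# Pretend the cache was written 10 minutes ago, so it has expired.
utime $now - 600, $now - 600, $cache_file;
print cache_is_fresh($cache_file, $lifetime, $now) ? "fresh\n" : "expired\n";

# The first request that notices the expiry postpones it before
# regenerating, so concurrent requests keep serving the old cache.
postpone_expiry($cache_file, $lifetime, $now);
print cache_is_fresh($cache_file, $lifetime, $now) ? "fresh\n" : "expired\n";
```

If the regeneration takes longer than 120 seconds, the cache expires
again and the next request will also start refreshing.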

Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is a slightly changed version of the second part of Petr Baudis's
patch; the difference (interdiff) between this version and the original
can be found in the cover letter for this series.

The differences, besides some style changes such as using '&&'
instead of 'and', are:
 * The current version reads the cache file in full, in 'slurp' mode,
   instead of reading it line by line and then concatenating the lines.
 * The current version dumps @projects in 'terse' mode, so it can be
   eval'ed directly into @projects without the need for an extra
   variable.
 * The current version writes the cache file atomically, by first
   writing to a temporary file (here a *.lock file opened in exclusive
   mode, though a File::Temp::tempfile() temporary file could be used
   instead) and then renaming it. This way we avoid the possibility of
   reading a partially written file. Opening the file in O_EXCL mode
   should prevent writers from trampling over one another and ensure
   that only one gitweb instance fills the cache; on the other hand, if
   the *.lock file is somehow not deleted, it would prevent the cache
   from being regenerated.
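The lock-then-rename scheme can be sketched as a standalone function,
assuming the lock file lives in the same directory (and therefore on
the same filesystem) as the cache, which is what makes rename() atomic.
The function name write_cache_atomically() is hypothetical, and unlike
the patch it removes the lock file when writing fails. That narrows, but
does not close, the stale-lock problem: a process killed between
sysopen() and rename() still leaves the lock behind.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);
use Data::Dumper;

# Write $data to $cache_file atomically. Returns 1 if this process wrote
# the cache, 0 if another writer holds the lock (or a stale lock exists).
sub write_cache_atomically {
	my ($cache_file, $data) = @_;
	my $lock = "$cache_file.lock";
	# O_EXCL makes sysopen fail if the lock file already exists,
	# so only one writer gets past this point.
	sysopen(my $fd, $lock, O_WRONLY | O_CREAT | O_EXCL, 0600)
		or return 0;
	local $Data::Dumper::Terse = 1;
	unless ((print $fd Dumper($data)) && close($fd)) {
		unlink $lock;    # don't leave a stale lock behind on failure
		return 0;
	}
	# rename() replaces the target atomically on POSIX filesystems, so
	# readers see either the old cache or the complete new one.
	unless (rename $lock, $cache_file) {
		unlink $lock;
		return 0;
	}
	return 1;
}

my $dir = tempdir(CLEANUP => 1);
my $cache = "$dir/gitweb.index.cache";
print write_cache_atomically($cache, [ { path => 'git.git' } ]), "\n";
```

A second writer that finds the lock file already present simply skips
writing and serves the list it computed itself.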

Note: instead of using Data::Dumper to serialize the data we could use
the Storable module (distributed with Perl, like Data::Dumper). From
what I've checked it has a larger initial cost, but it might be better
for a larger number of projects, which is exactly the situation where
projects list caching is needed.
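For comparison, a Storable-based cache might look like the sketch
below; the file name and sample data are illustrative assumptions.
nstore() writes in network byte order so the file is portable across
architectures, and retrieve() returns a reference to the stored array.
This avoids a string eval of the cache file, although Storable's own
documentation warns against loading data from untrusted sources, so the
same-uid caveat still applies.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Hypothetical project list; field names are illustrative only.
my @projects = ( { path => 'git.git', owner => 'junio' } );

my $dir   = tempdir(CLEANUP => 1);
my $cache = "$dir/gitweb.index.storable";

# Serialize the list in a portable binary format.
nstore(\@projects, $cache) or die "cannot store cache: $!";

# Load it back; retrieve() dies on a corrupt or unreadable file.
my $ref = retrieve($cache);
my @restored = @$ref;
print "$restored[0]{path}\n";
```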

I can send a version using Storable; Pasky, could you then compare it
with Data::Dumper on the repo.or.cz set of repositories?

 gitweb/gitweb.css  |    6 ++++++
 gitweb/gitweb.perl |   51 ++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/gitweb/gitweb.css b/gitweb/gitweb.css
index 446a1c3..1e83896 100644
--- a/gitweb/gitweb.css
+++ b/gitweb/gitweb.css
@@ -85,6 +85,12 @@ div.title, a.title {
 	color: #000000;
 }
 
+div.stale_info {
+	display: block;
+	text-align: right;
+	font-style: italic;
+}
+
 div.readme {
 	padding: 8px;
 }
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 90ab894..5527378 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -118,6 +118,15 @@ our $fallback_encoding = 'latin1';
 # - one might want to include '-B' option, e.g. '-B', '-M'
 our @diff_opts = ('-M'); # taken from git_commit
 
+# projects list cache for busy sites with many projects;
+# if you set this to non-zero, it will be used as the cached
+# index lifetime in minutes;
+# the cached list version is stored in /tmp and can be tweaked
+# by other scripts running with the same uid as gitweb - use this
+# ONLY at secure installations; only single gitweb project root per
+# system is supported!
+our $projlist_cache_lifetime = 0;
+
 # information about snapshot formats that gitweb is capable of serving
 our %known_snapshot_formats = (
 	# name => {
@@ -3510,16 +3519,52 @@ sub git_get_projects_details {
 }
 
 sub git_project_list_body {
-	my ($projlist, $order, $from, $to, $extra, $no_header) = @_;
+	my ($projlist, $order, $from, $to, $extra, $no_header, $cache_lifetime) = @_;
 
 	my ($check_forks) = gitweb_check_feature('forks');
 
-	my @projects = git_get_projects_details($projlist, $check_forks);
+	use File::stat;
+	use POSIX qw(:fcntl_h);
+
+	my $cache_file =  '/tmp/gitweb.index.cache';
+
+	my @projects;
+	my $stale = 0;
+	my $now = time();
+	if ($cache_lifetime && -f $cache_file &&
+	    stat($cache_file)->mtime + $cache_lifetime * 60 > $now &&
+	    open(my $fd, '<', $cache_file)) {
+		$stale = $now - stat($cache_file)->mtime;
+		local $/ = undef;
+		my $dump = <$fd>;
+		close $fd;
+		@projects = @{ eval $dump };
+	} else {
+		if ($cache_lifetime && -f $cache_file) {
+			# Postpone timeout by two minutes so that we get
+			# enough time to do our job.
+			my $time = time() - $cache_lifetime * 60 + 120;
+			utime $time, $time, $cache_file;
+		}
+		@projects = git_get_projects_details($projlist, $check_forks);
+		if ($cache_lifetime &&
+		    sysopen(my $fd, "$cache_file.lock", O_WRONLY|O_CREAT|O_EXCL, 0600)) {
+			use Data::Dumper;
+			local $Data::Dumper::Terse = 1;
+			print $fd Dumper(\@projects);
+			close $fd;
+			rename "$cache_file.lock", $cache_file;
+		}
+	}
 
 	$order ||= $default_projects_order;
 	$from = 0 unless defined $from;
 	$to = $#projects if (!defined $to || $#projects < $to);
 
+	if ($cache_lifetime && $stale) {
+		print "<div class=\"stale_info\">Cached version (${stale}s old)</div>\n";
+	}
+
 	print "<table class=\"project_list\">\n";
 	unless ($no_header) {
 		print "<tr>\n";
@@ -3902,7 +3947,7 @@ sub git_project_list {
 		close $fd;
 		print "</div>\n";
 	}
-	git_project_list_body(\@list, $order);
+	git_project_list_body(\@list, $order, undef, undef, undef, undef, $projlist_cache_lifetime);
 	git_footer_html();
 }
 
-- 
1.5.4.3.453.gc1ad83


Thread overview: 11+ messages
2008-03-17 15:09 [PATCH 0/3 v2] gitweb: Support caching projects list Jakub Narebski
2008-03-17 15:09 ` [PATCH 1/3] gitweb: Separate @projects population into git_get_projects_details() Jakub Narebski
2008-03-17 15:09 ` Jakub Narebski [this message]
2008-03-17 16:54   ` [RFC/PATCH 2/3] gitweb: Support caching projects list Frank Lichtenheld
2008-03-17 18:52     ` Jakub Narebski
2008-03-17 19:10       ` Frank Lichtenheld
2008-03-17 20:25         ` Jakub Narebski
2008-03-17 15:09 ` [RFC/PATCH 3/3] gitweb: Fill project details lazily when caching Jakub Narebski
2008-03-18  3:14   ` Petr Baudis
2008-03-18  9:12     ` Jakub Narebski
2008-03-18  9:52       ` Frank Lichtenheld
