* [PATCH 0/3] Git::Repo API and gitweb caching
From: Lea Wiemann @ 2008-07-11  1:06 UTC (permalink / raw)
  To: Git Mailing List; +Cc: John Hawley, Jakub Narebski, Petr Baudis

Hi everyone,

As follow-ups to this message, I'll be sending three patches for

1) adding the Mechanize tests,
2) adding the Git::Repo API, and (the important part:)
3) making gitweb use the Git::Repo API, and adding caching to gitweb.

The patches apply on master or next.  They're viewable live in action,
with cache statistics temporarily enabled at the bottom of each page,
at: http://odin3.kernel.org/git-lewiemann/

Patch (3) basically makes two large changes in one patch, but it was
pretty hard to separate them during development.  I could try to split
them up after the fact, but it would take at least an hour or two, since
the changes that introduce caching are spread all over the code.  I
don't think that having separate commits ([a] use Git::Repo API, [b] add
caching) brings enough benefit to justify the effort.

There are some other changes in (3) as well, but they fell out as part
of the refactoring, so I didn't separate them either -- same thing.

The Mechanize tests succeed before and after patch (3) is applied, so
I'm reasonably confident that my refactoring didn't introduce any
(major) bugs.

And since you all are curious about the API thing :-), I've added some
notes about why I didn't use Git.pm in the patch message of patch (2).

To all reviewers: Since the patches are quite long, I suggest that for
anything but major changes that require either discussion or work on my
end, you simply send a patch that applies on top of my patches.  (Just
sending a patch with a bunch of trivial/small changes without comment
should be fine; the reasons for simple improvements are normally obvious.)

On my to-do list:

- Benchmarks.  I'm planning to time a replay of kernel.org's gitweb logs
on the test server, with and without caching.  Nothing fancy.  (The
performance of the test setup on odin3.kernel.org is not representative
of gitweb's actual performance under load.)

- Implement support for Last-Modified or ETags, since those basically
fall out for free with the current implementation.  (This will require
mod_perl, since CGI doesn't allow for accessing arbitrary request
headers AFAIK.)  That will make the site a tad more responsive, I hope,
and it will also hugely reduce the load for RSS/Atom requests, which
currently make up almost half of all requests to kernel.org's gitweb and
get served in full each time (i.e. "200 OK" instead of "304 Not Modified").

- Make gitweb use more parts of the Git::Repo API; in particular, the
commit and tag parsing code should be ripped out, and gitweb should use
the (much prettier) Git::Commit/Git::Tag API instead.  Perhaps some more
functions (like ls_tree) can be generalized into the API as well; I went
the easy route for now and simply replaced most "open '-|'" calls with
$repo->cmd_output calls.
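
As a rough illustration of the conditional-request idea in the second
item above (a sketch only, not code from this series): the server
derives a validator for the response, compares it with what the client
sent, and answers "304 Not Modified" on a match.  The helper name
conditional_status, and the choice of the newest commit ID as the ETag,
are assumptions made for this example; the header value is passed in as
a plain argument, however it is obtained.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gitweb or Git::Repo).
# $head_id:       ID of the newest commit shown on the page/feed
#                 (assumed here to be a good enough validator).
# $if_none_match: raw If-None-Match request header, or undef.
# Returns (status code, ETag to send with the response).
sub conditional_status {
	my ($head_id, $if_none_match) = @_;
	my $etag = qq{"$head_id"};
	if (defined $if_none_match) {
		for my $candidate (split /\s*,\s*/, $if_none_match) {
			$candidate =~ s/^\s+//;
			$candidate =~ s/\s+$//;
			# If-None-Match uses weak comparison, so drop "W/".
			$candidate =~ s{^W/}{};
			return (304, $etag)
			    if $candidate eq '*' || $candidate eq $etag;
		}
	}
	return (200, $etag);
}

# The client already has the current version: no body needs to be sent.
my ($status, $etag) = conditional_status('abc123', '"abc123"');
print "$status $etag\n";
```

A feed reader that polls an unchanged repository would then receive
only the short 304 response instead of the full feed.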

-- Lea


* [PATCH 1/3 v9] gitweb: add test suite with Test::WWW::Mechanize::CGI
From: Lea Wiemann @ 2008-07-11  1:10 UTC (permalink / raw)
  To: git

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint, XML::Parser, and Archive::Tar (each only if
present) to validate the HTML/XML/tgz output, and checks all links on
the tested pages if --long-tests is given.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
The previous version was v8, in another thread at
<http://mid.gmane.org/1214488126-6783-1-git-send-email-LeWiemann@gmail.com>;
major changes since v8:

- Test tag and blobdiff views.

- Test pickaxe and grep search functions.

 Makefile                    |    1 +
 t/t9503-gitweb-Mechanize.sh |  144 ++++++++++
 t/t9503/test.pl             |  607 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 752 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/Makefile b/Makefile
index e105d0d..15f210e 100644
--- a/Makefile
+++ b/Makefile
@@ -1248,6 +1248,7 @@ GIT-CFLAGS: .FORCE-GIT-CFLAGS
 
 GIT-BUILD-OPTIONS: .FORCE-GIT-BUILD-OPTIONS
 	@echo SHELL_PATH=\''$(SHELL_PATH_SQ)'\' >$@
+	@echo PERL_PATH=\''$(PERL_PATH_SQ)'\' >>$@
 
 ### Detect Tck/Tk interpreter path changes
 ifndef NO_TCLTK
diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
new file mode 100755
index 0000000..53f2a8a
--- /dev/null
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -0,0 +1,144 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Jakub Narebski
+# Copyright (c) 2008 Lea Wiemann
+#
+
+# This test supports the --long-tests option.
+
+# This test only runs on Perl 5.8 and later versions, since
+# Test::WWW::Mechanize::CGI requires Perl 5.8.
+
+test_description='gitweb tests (using WWW::Mechanize)
+
+This test uses Test::WWW::Mechanize::CGI to test gitweb.'
+
+# helper functions
+
+safe_chmod () {
+	chmod "$1" "$2" &&
+	if [ "$(git config --get core.filemode)" = false ]
+	then
+		git update-index --chmod="$1" "$2"
+	fi
+}
+
+. ./test-lib.sh
+
+# check if test can be run
+"$PERL_PATH" -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
+	test_expect_success \
+		'skipping gitweb tests, perl version is too old' :
+	test_done
+	exit
+}
+
+"$PERL_PATH" -MTest::WWW::Mechanize::CGI -e '' >/dev/null 2>&1 || {
+	test_expect_success \
+		'skipping gitweb tests, Test::WWW::Mechanize::CGI not found' :
+	test_done
+	exit
+}
+
+# set up test repository
+test_expect_success 'set up test repository' '
+
+	echo "Not an empty file." > file &&
+	git add file &&
+	test_tick && git commit -a -m "Initial commit." &&
+	git branch b &&
+
+	echo "New file" > new_file &&
+	git add new_file &&
+	test_tick && git commit -a -m "File added." &&
+
+	safe_chmod +x new_file &&
+	test_tick && git commit -a -m "Mode changed." &&
+
+	git mv new_file renamed_file &&
+	test_tick && git commit -a -m "File renamed." &&
+
+	rm renamed_file &&
+	ln -s file renamed_file &&
+	test_tick && git commit -a -m "File to symlink." &&
+	git tag with-symlink &&
+
+	git rm renamed_file &&
+	rm -f renamed_file &&
+	test_tick && git commit -a -m "File removed." &&
+
+	cp file file2 &&
+	git add file2 &&
+	test_tick && git commit -a -m "File copied." &&
+
+	echo "New line" >> file2 &&
+	safe_chmod +x file2 &&
+	test_tick && git commit -a -m "Mode change and modification." &&
+
+	mkdir dir1 &&
+	echo "New file with a \"pickaxe test string\"" > dir1/file1 &&
+	git add dir1/file1 &&
+	test_tick && git commit -a -m "File added in subdirectory." &&
+	git tag -m "creating a tag object" tag-object &&
+
+	git checkout b &&
+	echo "Branch" >> b &&
+	git add b &&
+	test_tick && git commit -a -m "On branch" &&
+	git checkout master &&
+	test_tick && git pull . b
+'
+
+# set up gitweb configuration
+safe_pwd="$("$PERL_PATH" -MPOSIX=getcwd -e 'print quotemeta(getcwd)')"
+large_cache_root="../t9503/large_cache.tmp"
+test_expect_success 'create file cache directory' \
+	'mkdir -p "$large_cache_root"'
+cat >gitweb_config.perl <<EOF
+# gitweb configuration for tests
+
+our \$version = "current";
+our \$GIT = "$GIT_EXEC_PATH/git";
+our \$projectroot = "$safe_pwd";
+our \$project_maxdepth = 8;
+our \$home_link_str = "projects";
+our \$site_name = "[localhost]";
+our \$site_header = "";
+our \$site_footer = "";
+our \$home_text = "indextext.html";
+our @stylesheets = ("file:///$safe_pwd/../../gitweb/gitweb.css");
+our \$logo = "file:///$safe_pwd/../../gitweb/git-logo.png";
+our \$favicon = "file:///$safe_pwd/../../gitweb/git-favicon.png";
+our \$projects_list = "";
+our \$export_ok = "";
+our \$strict_export = "";
+our %feature;
+\$feature{'blame'}{'default'} = [1];
+
+our \$large_cache_root = "$large_cache_root";
+if (eval { require Cache::MemoryCache; 1 }) {
+	our \$cache = Cache::MemoryCache->new;
+}
+
+
+1;
+__END__
+EOF
+
+cat >.git/description <<EOF
+t9503-gitweb-Mechanize test repository
+EOF
+
+GITWEB_CONFIG="$(pwd)/gitweb_config.perl"
+export GITWEB_CONFIG
+
+# run tests
+
+test_external \
+	'test gitweb output' \
+	"$PERL_PATH" ../t9503/test.pl
+
+test_expect_success 'remove file cache directory' \
+	'rm -rf "$large_cache_root"'
+
+test_done
diff --git a/t/t9503/test.pl b/t/t9503/test.pl
new file mode 100755
index 0000000..b0a8269
--- /dev/null
+++ b/t/t9503/test.pl
@@ -0,0 +1,607 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+# This test supports the --long-tests option.
+
+use warnings;
+use strict;
+
+use Cwd qw( abs_path );
+use File::Spec;
+use File::Temp;
+use Storable;
+
+use Test::More qw(no_plan);
+
+die "this must be run by calling the t/t*.sh shell script(s)\n"
+    if Cwd->cwd !~ /trash directory$/;
+
+our $long_tests = $ENV{GIT_TEST_LONG}; # "our" so we can use "local $long_tests"
+
+eval { require Archive::Tar; };
+my $archive_tar_installed = !$@
+    or diag('Archive::Tar is not installed; no tests for valid snapshots');
+
+eval { require HTML::Lint; };
+my $html_lint_installed = !$@
+    or diag('HTML::Lint is not installed; no HTML validation tests');
+
+eval { require XML::Parser; };
+my $xml_parser_installed = !$@
+    or diag('XML::Parser is not installed; no tests for well-formed XML');
+
+sub rev_parse {
+	my $name = shift;
+	chomp(my $sha1 = `git rev-parse $name 2> /dev/null`);
+	$sha1 or die;
+}
+
+sub get_type {
+	my $name = shift;
+	chomp(my $type = `git cat-file -t $name 2> /dev/null`);
+	$type or die;
+}
+
+
+package OurMechanize;
+
+use base qw( Test::WWW::Mechanize::CGI );
+
+my %page_cache;
+# Cache requests.
+sub _make_request {
+	my ($self, $request) = (shift, shift);
+
+	my $response;
+	unless ($response = Storable::thaw($page_cache{$request->uri})) {
+		$response = $self->SUPER::_make_request($request, @_);
+		$page_cache{$request->uri} = Storable::freeze($response);
+	}
+	return $response;
+}
+
+# Fix whitespace problem.
+sub cgi_application {
+	my ($self, $application) = @_;
+
+	# This subroutine was copied (and modified) from
+	# WWW::Mechanize::CGI 0.3, which is licensed 'under the same
+	# terms as perl itself' and thus GPL compatible.
+	my $cgi = sub {
+		# Use exec, not the shell, to support embedded
+		# whitespace in the path to $application.
+		# http://rt.cpan.org/Ticket/Display.html?id=36654
+		my $status = system $application $application;
+		my $exit_code  = $status >> 8;
+
+		die "Failed to execute application '$application'. Reason: '$!'"
+		    if ($status == -1);
+		die "Application '$application' exited with exit code $exit_code"
+		    if ($exit_code > 0);
+	};
+
+	$self->cgi($cgi);
+}
+
+package main;
+
+
+my @revisions = split /\s/, `git-rev-list --first-parent HEAD`;
+chomp(my @heads = map { (split('/', $_))[2] } `git-for-each-ref --sort=-committerdate refs/heads`);
+chomp(my @tags = map { (split('/', $_))[2] } `git-for-each-ref --sort=-committerdate refs/tags`);
+my @tag_objects = grep { get_type($_) eq 'tag' } @tags;
+chomp(my @root_entries = `git-ls-tree --name-only HEAD`);
+my @files = grep { get_type("HEAD:$_") eq 'blob' } @root_entries or die;
+my @directories = grep { get_type("HEAD:$_") eq 'tree' } @root_entries or die;
+unless ($long_tests) {
+	# Only test one of each kind.
+	@files = $files[0];
+	@directories = $directories[0];
+	@tag_objects = $tag_objects[0];
+}
+
+my $gitweb = abs_path(File::Spec->catfile('..', '..', 'gitweb', 'gitweb.cgi'));
+
+my $mech = OurMechanize->new;
+$mech->cgi_application($gitweb);
+# On some systems(?) it's necessary to have %ENV here, otherwise the
+# CGI process won't get *any* of the current environment variables
+# (not even PATH, etc.)
+$mech->env(%ENV,
+	   GITWEB_CONFIG => $ENV{'GITWEB_CONFIG'},
+	   SCRIPT_FILENAME => $gitweb,
+	   $mech->env);
+
+# import config, predeclaring config variables
+our $site_name;
+require_ok($ENV{'GITWEB_CONFIG'})
+	or diag('Could not load gitweb config; some tests would fail');
+
+# Perform non-recursive checks on the current page, but do not check
+# the status code.
+my %verified_uris;
+sub _verify_page {
+	my ($uri, $fragment) = split '#', $mech->uri;
+	TODO: {
+		local $TODO = 'line number fragments can be broken for diffs and blames'
+		    if $fragment && $fragment =~ /^l[0-9]+$/;
+		$mech->content_like(qr/(name|id)="$fragment"/,
+				    "[auto] fragment #$fragment exists ($uri)")
+		    if $fragment;
+	}
+
+	return 1 if $verified_uris{$uri};
+	$verified_uris{$uri} = 1;
+
+	# Internal errors yield 200 but cause gitweb.cgi to exit with
+	# non-zero exit code, which Mechanize::CGI translates to 500,
+	# so we don't really need to check for "Software error" here,
+	# provided that the test cases always check the status code.
+	#$mech->content_lacks('<h1>Software error:</h1>') or return 0;
+
+	# Validate.  This is fast, so we can do it even without
+	# $long_tests.
+	$mech->html_lint_ok('[auto] validate HTML') or return 0
+	    if $html_lint_installed && $mech->is_html;
+	my $content_type = $mech->response->header('Content-Type')
+	    or die "$uri does not have a Content-Type header";
+	if ($xml_parser_installed && $content_type =~ /xml/) {
+		eval { XML::Parser->new->parse($mech->content); };
+		ok(!$@, "[auto] check for XML well-formedness ($uri)") or diag($@);
+	}
+	if ($archive_tar_installed && $uri =~ /sf=tgz/) {
+		my $snapshot_file = File::Temp->new;
+		print $snapshot_file $mech->content;
+		close $snapshot_file;
+		my $t = Archive::Tar->new;
+		$t->read($snapshot_file->filename, 1);
+		ok($t->get_files, "[auto] valid tgz snapshot ($uri)");
+	}
+	# WebService::Validator::Feed::W3C would be nice to
+	# use, but it doesn't support direct input (as opposed
+	# to URIs) long enough for our feeds.
+
+	return 1;
+}
+
+# Verify and spider the current page, the latter only if --long-tests
+# (-l) is given.  Do not check the status code of the current page.
+my %spidered_uris;  # pages whose links have been checked
+my %status_checked_uris;  # verified pages whose status is known to be 2xx
+sub check_page {
+	_verify_page or return 0;
+	if ($long_tests && !$spidered_uris{$mech->uri} ) {
+		$spidered_uris{$mech->uri} = 1;
+		my $orig_url = $mech->uri;
+		TODO: {
+			local $TODO = "blame links can be broken sometimes"
+			    if $orig_url =~ /a=blame/;
+			for my $url (map { $_->url_abs } $mech->followable_links) {
+				if (!$status_checked_uris{$url}) {
+					$status_checked_uris{$url} = 1;
+					local $long_tests = 0;  # stop recursing
+					test_page($url, "[auto] check link")
+					    or diag("broken link to $url on $orig_url");
+					$mech->back;
+				}
+			}
+		}
+	}
+	return 1;
+}
+
+my $baseurl = "http://localhost";
+my ($params, $url, $pagedesc, $status);
+
+# test_page ( <params>, <page_description>, <expected_status> )
+# Example:
+# if (test_page('?p=.git;a=summary', 'repository summary')) {
+#     $mech->...;
+#     $mech->...;
+# }
+#
+# Test that the page can be opened, call _verify_page on it, and
+# return true if there was no test failure.  Also set the global
+# variables $params, $pagedesc, and $url for use in the if block.
+# Optionally pass a third parameter $status to test the HTTP status
+# code of the page (useful for error pages).  You can also pass a full
+# URL instead of just parameters as the first parameter.
+sub test_page {
+	($params, $pagedesc, $status) = @_;
+	# missing $pagedesc is usually accidental
+	die "$params: no pagedesc given" unless defined $pagedesc;
+	if($params =~ /^$baseurl/) {
+		$url = "$params";
+	} else {
+		$url = "$baseurl$params";
+	}
+	$mech->get($url);
+	like($mech->status, $status ? qr/$status/ : qr/^[23][0-9][0-9]$/,
+	     "$pagedesc: $url" . ($status ? " -- yields $status" : ""))
+	    or return 0;
+	if ($mech->status =~ /^3/) {
+		# Don't check 3xx, they tend to look funny.
+		my $location = $mech->response->header('Location');
+		$mech->back;  # compensate for history
+		return test_page($location, "follow redirect from $url");
+	} else {
+		return check_page;
+	}
+}
+
+# follow_link ( \%parms, $pagedesc )
+# Example:
+# if (follow_link( { text => 'commit' }, 'first commit link')) {
+#     $mech->...;
+#     $mech->back;
+# }
+# Like test_page, but does not support status code testing, and
+# returns true if there was a link at all, regardless of whether it
+# was [23]xx or not.
+sub follow_link {
+	(my $parms, $pagedesc) = @_;
+	my $link = $mech->find_link(%$parms);
+	my $current_url = $mech->uri;
+	ok($link, "link exists: $pagedesc (on page $current_url)") or return 0;
+	test_page($link->url, "follow link: $pagedesc (on page $current_url)");
+	return 1;
+}
+
+# like follow_link, except that only checks and goes back immediately;
+# use this instead of ok(find_link...).
+sub test_link {
+	my ($parms, $pagedesc) = @_;
+	my $current_url = $mech->uri;
+	if($long_tests) {
+		# Check status, validate, spider.
+		return follow_link($parms, $pagedesc) && $mech->back;
+	} else {
+		# Only check presence of the link (much faster).
+		return ok($mech->find_link(%$parms),
+			  "link exists: $pagedesc (on page $current_url)");
+	}
+}
+
+sub get_summary {
+	test_page('?p=.git', 'repository summary');
+}
+
+get_summary or die 'summary page failed; aborting all tests';
+
+
+if (test_page '', 'project list (implicit)') {
+	$mech->title_like(qr!$site_name!,
+		"title contains $site_name");
+	$mech->content_contains('t9503-gitweb-Mechanize test repository',
+		'lists test repository (by description)');
+}
+
+
+# Test repository summary: implicit, implicit with pathinfo, explicit.
+for my $sumparams ('?p=.git', '/.git', '?p=.git;a=summary') {
+	if (test_page $sumparams, 'repository summary') {
+		$mech->title_like(qr!$site_name.*\.git/summary!,
+				  "title contains $site_name and \".git/summary\"");
+	}
+}
+
+
+# Search form
+
+# Search commit
+if (get_summary && $mech->submit_form_ok(
+	    { form_number => 1, fields => { 's' => 'Initial' } },
+	    'submit search form (default: commit search)')) {
+	check_page;
+	$mech->content_contains('Initial commit',
+				'content contains commit we searched for');
+}
+
+# Pickaxe
+if (get_summary && $mech->submit_form_ok(
+	    { form_number => 1, fields => { 's' => 'pickaxe test string',
+					    'st' => 'pickaxe' } },
+	    'submit search form (pickaxe)')) {
+	check_page;
+	test_link( { text => 'dir1/file1' }, 'file found with pickaxe' );
+	$mech->content_contains('A U Thor', 'commit author mentioned');
+}
+
+# Grep
+# Let's hope the pickaxe test string is still present in HEAD.
+if (get_summary && $mech->submit_form_ok(
+	    { form_number => 1, fields => { 's' => 'pickaxe test string',
+					    'st' => 'grep' } },
+	    'submit search form (grep)')) {
+	check_page;
+	test_link( { text => 'dir1/file1' }, 'file found with grep' );
+}
+
+
+# Basic error handling
+test_page('?p=non-existent.git', 'non-existent project', 404);
+test_page('?p=.git;a=commit;h=non-existent', 'non-existent commit', 404);
+
+
+# Summary view
+get_summary;
+
+# Check short log.  To do: Extract into separate test_short_log
+# function since the short log occurs on several pages.
+for my $revision (@revisions) {
+	for my $link_text (qw( commit commitdiff tree snapshot )) {
+		test_link( { url_abs_regex => qr/h=$revision/, text => $link_text },
+			   "$link_text link for $revision");
+	}
+}
+
+# Check that branches and tags are highlighted in green and yellow in
+# the shortlog.  We assume here that we are on master, so it should be
+# at the top.
+$mech->content_like(qr{<span [^>]*class="head"[^>]*>master</span>},
+		    'master branch is highlighted in shortlog');
+$mech->content_like(qr{<span [^>]*class="tag"[^>]*>$tags[0]</span>},
+		    "$tags[0] (most recent tag) is highlighted in shortlog");
+
+# Check heads.  (This should be extracted as well.)
+for my $head (@heads) {
+	for my $link_text (qw( shortlog log tree )) {
+		test_link( { url_abs_regex => qr{h=refs/heads/$head}, text => $link_text },
+			   "$link_text link for head '$head'");
+	}
+}
+
+# Check tags (assume we only have tags referring to commits, not to
+# blobs or trees).
+for my $tag (@tags) {
+	my $commit = rev_parse("$tag^{commit}");
+	test_link( { url_abs_regex => qr{h=refs/tags/$tag}, text => 'shortlog' },
+		   "shortlog link for tag '$tag'");
+	test_link( { url_abs_regex => qr{h=refs/tags/$tag}, text => 'log' },
+		   "log link for tag '$tag'");
+	test_link( { url_abs_regex => qr{h=$commit}, text => 'commit' },
+		   "commit link for tag '$tag'");
+	test_link( { url_abs_regex => qr{h=$commit}, text => $tag },
+	   "'$tag' links to the commit as well");
+	# To do: Test tag link for tag objects.
+	# Why don't we have tree + snapshot links?
+}
+
+
+# RSS/Atom/OPML view
+# Simply retrieve and verify well-formedness, but don't spider.
+$mech->get_ok('?p=.git;a=atom', 'Atom feed') and _verify_page;
+$mech->get_ok('?p=.git;a=rss', 'RSS feed') and _verify_page;
+TODO: {
+	# Now spider -- but there are broken links.
+	# http://mid.gmane.org/485EB333.5070108@gmail.com
+	local $TODO = "fix broken links in Atom/RSS feeds";
+	test_page('?p=.git;a=atom', 'Atom feed');
+	test_page('?p=.git;a=rss', 'RSS feed');
+}
+test_page('?a=opml', 'OPML outline');
+
+
+# Tag view
+get_summary;
+for my $tag (@tag_objects) {
+	my $tag_sha1 = rev_parse($tag);
+	my $object_sha1 = rev_parse("$tag^{}");
+	if(follow_link( { url_abs_regex => qr{h=$tag_sha1}, text => 'tag' },
+			"tag link for tag object '$tag'" )) {
+		$mech->content_contains('C O Mitter', 'tagger mentioned');
+		test_link( { url_abs_regex => qr/h=$object_sha1/ },
+			   'link to referenced object');
+		$mech->back;
+	}
+}
+
+
+# Commit view
+if (test_page('?p=.git;a=commit;h=master', 'view HEAD commit')) {
+	my $tree_sha1 = rev_parse('master:');
+	test_link( { url_abs_regex => qr/a=tree/, text => rev_parse('master:') },
+		   "SHA1 link to tree on commit page ($url)");
+	test_link( { url_abs_regex => qr/h=$tree_sha1/, text => 'tree' },
+		   "'tree' link to tree on commit page ($url)");
+	$mech->content_like(qr/A U Thor/, "author mentioned on commit page ($url)");
+}
+
+
+# Commitdiff view
+if (get_summary &&
+    follow_link( { text_regex => qr/file added/i }, 'commit with added file') &&
+    follow_link( { text => 'commitdiff' }, 'commitdiff')) {
+	$mech->content_like(qr/new file with mode/, "commitdiff has diffstat ($url)");
+	$mech->content_like(qr/new file mode/, "commitdiff has diff ($url)");
+}
+test_page("?p=.git;a=commitdiff;h=$revisions[-1]",
+	  'commitdiff without parent');
+
+# Diff formatting problem.
+if (get_summary &&
+    follow_link( { text_regex => qr/renamed/ }, 'commit with rename') &&
+    follow_link( { text => 'commitdiff' }, 'commitdiff')) {
+	TODO: {
+		local $TODO = "bad a/* link in diff";
+		if (follow_link( { text_regex => qr!^a/! },
+				 'a/* link (probably wrong)')) {
+			# The page we land on here is broken already.
+			follow_link( { url_abs_regex => qr/a=blob_plain/ },
+				     'linked file name');  # bang
+		}
+	}
+}
+
+
+# Raw commitdiff (commitdiff_plain) view
+if (test_page('?p=.git;a=commit;h=refs/tags/tag-object',
+	      'commit view of tags/tag-object') &&
+    follow_link( { text => 'commitdiff' }, "'commitdiff'") &&
+    follow_link( { text => 'raw' }, "'raw' (commitdiff_plain)")) {
+	$mech->content_like(qr/^From: A U Thor <author\@example.com>$/m,
+			    'commitdiff_plain: From header');
+	TODO: {
+		local $TODO = 'date header mangles timezone';
+		$mech->content_like(qr/^Date: Thu, 7 Apr 2005 15:..:13 -0700$/m,
+				    'commitdiff_plain: Date header (correct)');
+	}
+	$mech->content_like(qr/^Date: Thu, 7 Apr 2005 22:..:13 \+0000 \(-0700\)$/m,
+			    'commitdiff_plain: Date header (UTC, wrong)');
+	$mech->content_like(qr/^Subject: .+$/m,
+			    'commitdiff_plain: Subject header');
+	# '$' markers inexplicably don't work here if we use like(...)
+	# or $mech->content_like().
+	ok($mech->content =~ /^X-Git-Tag: tag-object\^0$/m,
+	   'commitdiff_plain: X-Git-Tag header');
+	ok($mech->content =~ /^X-Git-Url: $baseurl\?p=\.git;a=commitdiff_plain;h=refs%2Ftags%2Ftag-object$/m,
+	   'commitdiff_plain: X-Git-Url header');
+	ok($mech->content =~ /^---$/m, 'commitdiff_plain: separator');
+	ok($mech->content =~ /^diff --git /m, 'commitdiff_plain: diff');
+}
+
+
+# Blobdiff view
+# This assumes file2 has changed at least once, so that there can be a
+# blobdiff for it.
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link') &&
+    follow_link( { text => 'history', url_abs_regex => qr/f=file2/ },
+		 'history of file2') &&
+    follow_link( { text => 'diff to current' },
+		 'some "diff to current" (blobdiff)')) {
+	$mech->content_contains('diff --git', 'blobdiff: has diff header');
+	test_link( { text => 'raw', url_abs_regex => qr/a=blobdiff_plain/ },
+		   '"raw" (blobdiff_plain) link');
+}
+
+# Tree view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files) {
+		my $sha1 = rev_parse("HEAD:$file");
+		test_link( { text => $file, url_abs_regex => qr/h=$sha1/ },
+			   "'$file' is listed and linked");
+		test_link({ url_abs_regex => qr/f=$file/, text => $_ },
+			  "'$_' link") foreach qw( blame blob history raw );
+	}
+	for my $directory (@directories) {
+		my $sha1 = rev_parse("HEAD:$directory");
+		test_link({ url_abs_regex => qr/f=$directory/, text => $_ },
+			  "'$_' link") foreach qw( tree history );
+		if(follow_link( { text => $directory, url_abs_regex => qr/h=$sha1/ },
+				"'$directory' is listed and linked" )) {
+			if(follow_link( { text => '..' }, 'parent directory')) {
+				test_link({ url_abs_regex => qr/h=$sha1/,
+					    text => $directory },
+					  'back to original tree view');
+				$mech->back;
+			}
+			$mech->back;
+		}
+	}
+}
+
+
+# Blame view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $blame_link ($mech->find_all_links(text => 'blame')) {
+		my $url = $blame_link->url;
+		$mech->get_ok($url, "get $url -- blame link on tree view")
+		    and _verify_page;
+		$mech->content_like(qr/A U Thor/,
+				    "author mentioned on blame page");
+		TODO: {
+			# Now spider -- but there are broken links.
+			# http://mid.gmane.org/485EC621.7090101@gmail.com
+			local $TODO = "fix broken links in certain blame views";
+			check_page;
+		}
+		last unless $long_tests; # only test first blame link
+	}
+}
+
+
+# History view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files, @directories) {
+		my $type = get_type("HEAD:$file");  # blob or tree
+		if (follow_link( { text => 'history', url_abs_regex => qr/f=$file/ },
+				 "history link for '$file'")) {
+			# There is at least one commit, so A U Thor is mentioned.
+			$mech->content_contains('A U Thor', 'A U Thor mentioned');
+			# The following tests test for at least *one*
+			# link of each type and are weak since we
+			# don't have any knowledge of commit hashes.
+			test_link( { text => $type, url_abs_regex => qr/f=$file/ },
+				   "$type");
+			test_link( { text => 'commitdiff' },
+				   "commitdiff");
+			test_link( { url_abs_regex => qr/a=commit;.*h=[a-f0-9]{40}/ },
+				   "subject links to commit"); # weak, brittle
+			$mech->back;
+		}
+	}
+}
+
+
+# Blob view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files) {
+		if (follow_link( { text => $file, url_abs_regex => qr/a=blob/ },
+				 "\"$file\" (blob) entry on tree view")) {
+			chomp(my $first_line_regex = (`cat "$file"`)[0]);
+			$first_line_regex =~ s/ / |&nbsp;/g;
+			# Hope that the first line doesn't contain any
+			# HTML-escapable character.
+			$mech->content_like(qr/$first_line_regex/,
+					    "blob view contains first line of file ($url)");
+			$mech->back;
+		}
+	}
+}
+
+
+# Raw (blob_plain) view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files) {
+		if (follow_link( { text => 'raw', url_abs_regex => qr/f=$file/ },
+				 "raw (blob_plain) entry for \"$file\" in tree view")) {
+			chomp(my $first_line = (`cat "$file"`)[0]);
+			$mech->content_contains(
+				$first_line, "blob_plain view contains first line of file");
+			$mech->back;
+		}
+	}
+}
+
+
+# Error handling
+# Pass valid and invalid paths to various file-based actions
+for my $action (qw( blame blob blob_plain blame )) {
+	test_page("?p=.git;a=$action;f=$files[0];hb=HEAD",
+		  "$action: look up existent file");
+	test_page("?p=.git;a=$action;f=does_not_exist;hb=HEAD",
+		  "$action: look up non-existent file", 404);
+	TODO: {
+		local $TODO = 'wrong error code (but using Git::Repo will fix this)';
+		test_page("?p=.git;a=$action;f=$directories[0];hb=HEAD",
+			  "$action: look up directory", 400);
+	}
+}
+TODO: {
+	local $TODO = 'wrong error code (but using Git::Repo will fix this)';
+	test_page("?p=.git;a=tree;f=$files[0];hb=HEAD",
+		  'tree: look up existent file', 400);
+}
+# Pass valid and invalid paths to tree action
+test_page("?p=.git;a=tree;f=does_not_exist;hb=HEAD",
+	  'tree: look up non-existent file', 404);
+test_page("?p=.git;a=tree;f=$directories[0];hb=HEAD",
+	  'tree: look up directory');
+TODO: {
+	local $TODO = 'cannot use f=/ or f= for trees';
+	test_page("?p=.git;a=tree;f=/;hb=HEAD", 'tree: look up directory');
+}
+
+
+1;
+__END__
-- 
1.5.6.2.456.g63fc0


* [PATCH 2/3] add new Git::Repo API
From: Lea Wiemann @ 2008-07-11  1:11 UTC (permalink / raw)
  To: git; +Cc: John Hawley, Jakub Narebski, Petr Baudis

This also adds the Git::Commit and Git::Tag classes (which are used by
Git::Repo), the Git::Object base class, and the Git::RepoRoot helper
factory class.
---
Here's some elaboration on why I didn't use or extend Git.pm.

Please note before starting a reply to this: This is not an argument;
I'm just explaining why I implemented it the way I did.  So please
don't try to argue with me about what I should do or should have done.
I'm not going to refactor Git::Repo to use Git.pm or vice versa; it's
really a far less trivial task than you might think at first
glance.

Anyways, the following bullet points are my reasons for not extending
Git.pm:

- Git.pm doesn't do what I want: It's designed to provide access to
  working copies.  Extending it to have more repository-access
  functions might have resulted in a mess.

  Some long-term thoughts on this issue: I don't think Git.pm's design
  approach is sensible: It tries to do (a) WC access, (b) repo access,
  and (c) frontend error handling (with sensible error messages).
  Those things should really be separated; e.g. one could write a
  Git::WC class that *has* a Git::Repo instance (since every working
  copy has a repository associated with it); so you can use $wc =
  Git::WC->new to access the working copy, and $wc->repo to access its
  repository.  Git.pm will obviously have to stay since a few git
  commands use it, but I'd probably let it die a slow death, and
  (cleanly!) copy functionality to a Git::WC module (and perhaps a
  Git::Frontend module) in the long run.
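
The "working copy *has* a repository" layout described above can be
sketched with two toy classes.  Sketch::Repo and Sketch::WC are
hypothetical stand-ins written for this example; they are not the
Git::Repo class from this series, and the constructor arguments are
assumptions.

```perl
use strict;
use warnings;

# Stand-in for a repository class (NOT the Git::Repo from this series).
package Sketch::Repo;

sub new {
	my ($class, %opts) = @_;
	return bless { directory => $opts{directory} }, $class;
}

sub directory { $_[0]{directory} }

# Hypothetical working-copy class: it owns a repository object rather
# than mixing repository access into its own methods.
package Sketch::WC;

sub new {
	my ($class, %opts) = @_;
	my $self = bless { path => $opts{path} }, $class;
	# Assumes the usual layout with the repository in "<path>/.git".
	$self->{repo} = Sketch::Repo->new(directory => "$opts{path}/.git");
	return $self;
}

sub path { $_[0]{path} }
sub repo { $_[0]{repo} }

package main;

my $wc = Sketch::WC->new(path => '/tmp/project');
print $wc->repo->directory, "\n";  # prints "/tmp/project/.git"
```

With this split, code that only needs repository access (such as a web
frontend) can use the repository class directly and never touch the
working-copy class.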

- It would have needed refactoring and more features.

  - For example, the Error module should be removed (and normal die
    and eval should be used instead).  At some point when I was trying
    to debug it through gitweb, Git.pm would error out somewhere and I
    would randomly get either a blank page (usually) or an error page
    (rarely).  I suspect that this is because of the Error module or
    some strange interaction with another module.  (Besides, Error is
    not in the Perl distribution, so it would be an unnecessary
    dependency for gitweb; the only one as far as I can see.)

  - Also, I needed something like max_exit_code and a custom path to
    the git binary; adding max_exit_code would have been non-trivial.

  Now I'm all in favor of re-using existing code, but refactoring
  Git.pm would have taken *much* longer than simply writing a new
  module.  I'm working on caching for gitweb, not on implementing the
  next great Perl API for Git.  (And Git::Repo isn't great, FTR.)

- It's buggy and untested.  Neither of these is a problem by itself,
  but the combination is deadly.  E.g. I was trying to refactor the
  'repository' constructor (to be able to do instantaneous
  instantiation) and stumbled upon a missing semicolon that rendered
  the surrounding code syntactically correct but obviously buggy (line
  214 on master).  Adding a semicolon there seemed to cause other
  errors, and given that (a) I don't understand what the code does and
  no test or comment tells me what it should, and (b) it doesn't work
  (or maybe it half-works?), I lost all my confidence that I could do
  anything resembling a (behavior-preserving) refactoring on that
  code.

  I might have been able to work with this particular problem, but
  such a problem (buggy and completely untested code) is an indication
  that the rest of the code might bear similar surprises.

- It's overly lengthy, and it's a lot of code for not much
  functionality.  Git.pm has 1200 LOC, with a large stack of tiny
  subroutines to handle pipes, and a complicated error handling
  system.  I can make out ~400 lines of actual Git functionality, plus
  a 100-line constructor (all including documentation).  The part of
  Git::Repo that overlaps with Git.pm (in terms of duplicate
  functionality) seems to be in the range of 150-200 lines, and it's
  mostly pipe-handling.  That's not a whole lot.

- When I decided I didn't want to use Git.pm, it took me 2-3 hours to
  replicate the functionality in Git.pm that I needed; it would have
  taken me *much* longer to extend Git.pm to do what I want.  Again,
  I'm not working on the next great Git Perl API.
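
The max_exit_code behaviour mentioned in the list above can be
illustrated in isolation.  What follows is a minimal sketch, not the
code from the patch: run_cmd is a hypothetical helper, and it runs the
current Perl interpreter ($^X) rather than git so that it works
without a repository.  The idea is that a non-zero exit code is only
fatal if it exceeds the limit the caller is prepared to accept.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Git::Repo): run a command given as
# a list (never as a shell string), return its output, and die only if
# its exit code is greater than $max_exit_code.
sub run_cmd {
	my ($max_exit_code, @cmd) = @_;
	open my $fh, '-|', @cmd or die "cannot open pipe: @cmd";
	my $output = do { local $/; <$fh> };  # slurp everything
	$output = '' unless defined $output;
	if (not close $fh) {
		# For a piped open, close returns false with $! set to 0
		# if the command merely exited with a non-zero status.
		die "error closing pipe ($!): @cmd" if $!;
		my $exit_code = $? >> 8;
		die "command died with exit code $exit_code: @cmd"
		    if $exit_code > $max_exit_code;
	}
	return $output;
}

# Exit code 1 is tolerated when the limit is 1 ...
my $out = run_cmd(1, $^X, '-e', 'print "partial\n"; exit 1');
print "got: $out";

# ... but is fatal when the limit is 0.
my $ok = eval { run_cmd(0, $^X, '-e', 'exit 1'); 1 };
print "died as expected: $@" unless $ok;
```

This is the kind of check that is useful for commands such as
"git rev-parse --verify --quiet", where exit code 1 just means "no
such revision" rather than a real failure.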

So where do we go with Git.pm and Git::Repo?  I would suggest that
they both stay.

Thus we'd have two APIs (both of them obviously incomplete).  If
there's enough objection to having two competing official APIs, I'll
be happy to move Git::Repo to the gitweb directory, and make it a
gitweb-specific thing (though it's not gitweb-specific at all in terms
of functionality); I really don't care, as my goal is to get caching
for gitweb working.  Again, if you're bothered by the fact that
there's duplicate functionality at all, please don't complain, but
send patches to reconcile the APIs; I will not take care of that
myself, since it's very much a non-trivial task.

Unrelatedly, should I add copyright notices at the bottom of each perl
module so they are displayed in the perldoc/man pages?  I'm not
generally a fan of such notices, since they tend to establish code
ownership, but if it's desired I can add them.

So, here's the Git::Repo (and friends) API, plus tests of course: ;-)

 perl/Git/Commit.pm       |  163 +++++++++++++++++++
 perl/Git/Object.pm       |   81 ++++++++++
 perl/Git/Repo.pm         |  397 ++++++++++++++++++++++++++++++++++++++++++++++
 perl/Git/RepoRoot.pm     |  103 ++++++++++++
 perl/Git/Tag.pm          |  155 ++++++++++++++++++
 perl/Makefile            |    1 +
 perl/Makefile.PL         |   10 +-
 t/t9710-perl-git-repo.sh |   49 ++++++
 t/t9710/test.pl          |  165 +++++++++++++++++++
 9 files changed, 1123 insertions(+), 1 deletions(-)
 create mode 100644 perl/Git/Commit.pm
 create mode 100644 perl/Git/Object.pm
 create mode 100644 perl/Git/Repo.pm
 create mode 100644 perl/Git/RepoRoot.pm
 create mode 100644 perl/Git/Tag.pm
 create mode 100755 t/t9710-perl-git-repo.sh
 create mode 100755 t/t9710/test.pl

diff --git a/perl/Git/Commit.pm b/perl/Git/Commit.pm
new file mode 100644
index 0000000..a9bc304
--- /dev/null
+++ b/perl/Git/Commit.pm
@@ -0,0 +1,163 @@
+=head1 NAME
+
+Git::Commit - Object-oriented interface to Git commit objects.
+
+=cut
+
+use strict;
+use warnings;
+
+
+package Git::Commit;
+
+use base qw(Git::Object);
+
+use constant _MESSAGE => 'M';
+use constant _ENCODING => 'E';
+use constant _TREE => 'T';
+use constant _PARENTS => 'P';
+use constant _AUTHOR => 'A';
+use constant _COMMITTER => 'C';
+
+
+# Keep documentation in one place to save space.
+
+=head1 METHODS
+
+=over
+
+=item $commit = Git::Commit->new($repo, $sha1)
+
+Return a new Git::Commit instance for a commit object with $sha1 in
+repository $repo.
+
+Calls to this method are free, since it does not check whether $sha1
+exists and has the right type.  However, accessing any of the commit
+object's properties will fail if $sha1 is not a valid commit object.
+
+Note that $sha1 must be the SHA1 of a commit object; tag objects are
+not dereferenced.
+
+=item $obj->repo
+
+Return the Git::Repo instance this object was instantiated with.
+
+=item $obj->sha1
+
+Return the SHA1 of this commit object.
+
+=item $commit->tree
+
+Return the tree this commit object refers to.
+
+=item $commit->parents
+
+Return a list of zero or more parent commit objects.  Note that commit
+objects stringify to their respective SHA1s, so you can alternatively
+treat this as a list of SHA1 strings.
+
+=item $commit->author
+
+Return the author string of this commit object.
+
+=item $commit->committer
+
+Return the committer string of this commit object.
+
+=item $commit->message
+
+Return the undecoded commit message of this commit object.
+
+=item $commit->encoding
+
+Return the encoding header of the commit object.
+
+=back
+
+=cut
+
+
+sub tree {
+	my $self = shift;
+	$self->_load;
+	$self->{_TREE()};
+}
+
+sub parents {
+	my $self = shift;
+	$self->_load;
+	map { ref($self)->new($self->repo, $_) } @{$self->{_PARENTS()}};
+}
+
+sub author {
+	my $self = shift;
+	$self->_load;
+	$self->{_AUTHOR()} or '';
+}
+
+sub committer {
+	my $self = shift;
+	$self->_load;
+	$self->{_COMMITTER()} or '';
+}
+
+sub message {
+	my $self = shift;
+	$self->_load;
+	$self->{_MESSAGE()};
+}
+
+sub encoding {
+	my $self = shift;
+	$self->_load;
+	$self->{_ENCODING()};
+}
+
+# Auxiliary method to load (and parse) the commit object from the
+# repository if it hasn't already been loaded.
+
+sub _load {
+	my($self, $raw_text) = @_;
+	return if defined $self->{_MESSAGE()};  # already loaded
+
+	my $sha1 = $self->sha1;
+	if (!defined $raw_text) {
+		# Retrieve from the repository.
+		(my $type, $raw_text) = $self->repo->cat_file($sha1);
+		die "$sha1 is a $type object (expected a commit object)"
+		    unless $type eq 'commit';
+	}
+
+	(my $header, $self->{_MESSAGE()}) = split "\n\n", $raw_text, 2;
+	# Parse header.
+	for my $line (split "\n", $header) {
+		local $/ = "\n"; # for chomp
+		chomp($line);
+		my($key, $value) = split ' ', $line, 2;
+		if ($key eq 'tree') {
+			$self->{_TREE()} = $value;
+		} elsif ($key eq 'parent') {
+			push @{$self->{_PARENTS()}}, $value;
+		} elsif ($key eq 'author') {
+			$self->{_AUTHOR()} = $value;
+		} elsif ($key eq 'committer') {
+			$self->{_COMMITTER()} = $value;
+		} elsif ($key eq 'encoding') {
+			$self->{_ENCODING()} = $value;
+		} else {
+			die "unrecognized commit header $key";
+		}
+	}
+	undef;
+}
+
+
+=head1 NOTES
+
+You will usually want to call $repo->get_commit($sha1) instead of
+instantiating this class directly; see L<Git::Repo>.
+
+=cut
+
+
+1;
diff --git a/perl/Git/Object.pm b/perl/Git/Object.pm
new file mode 100644
index 0000000..a5126f9
--- /dev/null
+++ b/perl/Git/Object.pm
@@ -0,0 +1,81 @@
+=head1 NAME
+
+Git::Object - Object-oriented interface to Git objects (base class).
+
+=head1 DESCRIPTION
+
+Git::Object is a base class that provides access to commit, tag and
+(unimplemented) tree objects.  See L<Git::Commit> and L<Git::Tag>.
+
+Objects are loaded lazily, and hence instantiation is free.  Objects
+stringify to their SHA1s.
+
+=cut
+
+
+use strict;
+use warnings;
+
+
+package Git::Object;
+
+use base qw(Exporter);
+
+our @EXPORT = qw();
+our @EXPORT_OK = qw();
+
+use overload
+    '""' => \&stringify;
+
+# Hash indices:
+use constant _REPO => 'R';
+use constant _SHA1 => 'H';
+
+=head1 METHODS
+
+=over
+
+=item Git::Object->new($repo, $sha1)
+
+Return a new Git::Object instance for the object with $sha1 in the
+repository $repo (a Git::Repo instance).
+
+Note that this method does not check whether the object exists in the
+repository.  Trying to access its properties through a subclass
+will fail if the object doesn't exist, however.
+
+=cut
+
+sub new {
+	my($class, $repo, $sha1) = @_;
+	die "$repo is not a Git::Repo instance" unless $repo->isa('Git::Repo');
+	my $self = {_REPO() => $repo, _SHA1() => $sha1};
+	return bless $self, $class;
+}
+
+=item $obj->repo
+
+Return the Git::Repo instance this object was instantiated with.
+
+=cut
+
+sub repo {
+	shift->{_REPO()}
+}
+
+=item $obj->sha1
+
+Return the SHA1 of this object.
+
+=cut
+
+sub sha1 {
+	shift->{_SHA1()}
+}
+
+sub stringify {
+	shift->{_SHA1()}
+}
+
+
+1;
diff --git a/perl/Git/Repo.pm b/perl/Git/Repo.pm
new file mode 100644
index 0000000..5a4c7e0
--- /dev/null
+++ b/perl/Git/Repo.pm
@@ -0,0 +1,397 @@
+=head1 NAME
+
+Git::Repo - Read-only access to Git repositories.
+
+WARNING: This module is in active development -- do not use it in any
+production code as the API may change at any time.
+
+=head1 SYNOPSIS
+
+  use Git::Repo;
+
+  my $repo = Git::Repo->new(
+      directory => '/path/to/repository.git',
+      git_binary => '/usr/bin/git');
+  my ($sha1, $type, $size) = $repo->get_sha1('HEAD');
+  print "Last log message:\n\n" . $repo->get_commit($sha1)->message;
+
+=cut
+
+
+use strict;
+use warnings;
+use 5.006002;
+
+
+package Git::Repo;
+
+use Git::Tag;
+use Git::Commit;
+
+use IPC::Open2 qw(open2);
+use IO::Handle;
+
+use base qw(Exporter);
+
+our @EXPORT = qw();
+our @EXPORT_OK = qw(assert_sha1 assert_opts);
+
+# Auxiliary subroutines
+
+sub assert_opts {
+	die "must have an even number of arguments for named options"
+	    unless $#_ % 2;
+}
+
+sub assert_sha1 {
+	my $sha1 = shift;
+	die "'$sha1' is not a SHA1 (need to use get_sha1?)"
+	    unless $sha1 && $sha1 =~ /^[a-f0-9]{40}$/;
+}
+
+
+=head1 METHODS
+
+=head2 Instantiation and basic methods
+
+=over
+
+=item $repo = Git::Repo->new(%opts)
+
+Return a new Git::Repo object.  The following options are supported:
+
+=over
+
+=item 'directory'
+
+The directory of the repository (mandatory).
+
+Note that Git::Repo is working-copy agnostic; for a working copy, you
+need to pass the working copy's .git directory as the 'directory'
+option.
+
+=item 'git_binary'
+
+The name or full path of the git binary (default: 'git').
+
+=back
+
+Calling this method is free, since it does not check whether the
+repository exists.  Trying to access the repository through one of the
+instance methods will fail if it doesn't exist though.
+
+Examples:
+
+    $repo = Git::Repo->new(directory => '/path/to/repository.git');
+    $repo = Git::Repo->new(directory => '/path/to/working_copy/.git');
+
+=cut
+
+sub new {
+	my $class = shift;
+	assert_opts(@_);
+	my $self = {@_};
+	bless $self, $class;
+	die 'no directory given' unless $self->{directory};
+	return $self;
+}
+
+=item $repo->repo_dir
+
+Return the directory of the repository (.../.git in case of a working
+copy).
+
+=cut
+
+sub repo_dir {
+	shift->{directory}
+}
+
+=item $repo->git_binary
+
+Return the name of or path to the git binary (used with exec).
+
+=cut
+
+sub git_binary {
+	shift->{git_binary}
+}
+
+=item $repo->version
+
+Return the output of 'git --version', with /^git version / stripped.
+
+This method does not require the repository to exist.
+
+=cut
+
+sub version{
+	my $self = shift;
+	chomp(my $version = $self->cmd_output(cmd => ['--version']));
+	$version =~ s/^git version //;  # be permissive if this does not match
+	return $version;
+}
+
+
+=back
+
+=head2 Calling the Git binary
+
+=over
+
+=item $repo->cmd_output(%opts)
+
+Return the output of the given git command as a string, or as a list
+of lines in array context.  Valid options are:
+
+=over
+
+=item 'cmd'
+
+An arrayref of arguments to pass to git (mandatory).
+
+=item 'max_exit_code'
+
+Die if the exit code of the git binary is greater than
+C<max_exit_code> (default: 0).
+
+=back
+
+Example:
+
+    $output = $repo->cmd_output(cmd => ['show', 'HEAD'])
+
+=cut
+
+# To do: According to Git.pm, this might not work with ActiveState
+# Perl on Win 32.  Need to check or wait for reports.
+
+sub cmd_output {
+	my $self = shift;
+	assert_opts @_;
+	my %opts = @_;
+	# We don't support string-commands here unless someone makes a
+	# case for them -- they are too dangerous.
+	die 'cmd parameter must be array' unless ref($opts{cmd}) eq 'ARRAY';
+	my @cmd = ($self->_get_git_cmd, @{$opts{cmd}});
+	open my $fh, '-|', @cmd or die 'cannot open pipe: ' . join(" ", @cmd);
+	local $/;
+	my $output = <$fh>;
+	if (not close $fh) {
+		if ($!) {
+			# Close failed.  Git.pm says it is OK to not
+			# die here, but doesn't provide an example.
+			# Let's die here for now.
+			die "error closing pipe ($!): " . join(' ', @cmd);
+		}
+		my $exit_code = $? >> 8;
+		die "Command died with exit code $exit_code: " . join(" ", @cmd)
+		    if $exit_code > ($opts{max_exit_code} || 0);
+	}
+	return $output;
+}
+
+=item $repo->get_bidi_pipe(%opts)
+
+Open a new bidirectional pipe and return its STDIN and STDOUT file
+handles.  Valid options are:
+
+=over
+
+=item 'cmd'
+
+An arrayref of arguments to pass to git (mandatory).
+
+=item 'reuse'
+
+If true, reuse a previously opened pipe with the same command line and
+whose C<reuse> option was true (default: false).
+
+=back
+
+=cut
+
+sub get_bidi_pipe {
+	my $self = shift;
+	assert_opts @_;
+	my %opts = @_;
+	die 'missing or empty cmd option' unless $opts{cmd} and @{$opts{cmd}};
+	my ($stdin, $stdout);
+	my $cmd_str = join ' ', @{$opts{cmd}};  # key for reusing pipes
+	if ($opts{reuse}) {
+		my $pair = $self->{bidi_pipes}->{$cmd_str};
+		return @$pair if $pair;
+	}
+	my @cmd = ($self->_get_git_cmd, @{$opts{cmd}});
+	open2($stdout, $stdin, @cmd)
+	    or die 'cannot open pipe';
+	if ($opts{reuse}) {
+		$self->{bidi_pipes}->{$cmd_str} = [$stdin, $stdout];
+	}
+	return ($stdin, $stdout);
+}
+
+# Return the first items of the git command line, for instance
+# qw(/usr/bin/git --git-dir=/path/to/repo.git).
+sub _get_git_cmd {
+	my $self = shift;
+	return ($self->git_binary || 'git', '--git-dir=' . $self->repo_dir);
+}
+
+
+=back
+
+=head2 Inspecting the Repository
+
+=over
+
+=item $repo->get_sha1($extended_object_identifier)
+
+Look up the object identified by $extended_object_identifier and
+return its SHA1 hash in scalar context or its ($sha1, $type, $size) in
+list context, or undef or () if the lookup failed, where $type is one
+of 'tag', 'commit', 'tree', or 'blob'.
+
+See L<git-rev-parse(1)>, section "Specifying Revisions", for the
+syntax of the $extended_object_identifier string.
+
+Note that even if you pass a SHA1 hash, its existence is still
+checked, and this method returns undef or () if it doesn't exist in
+the repository.
+
+=cut
+
+sub get_sha1 {
+	my ($self, $object_id) = @_;
+	die 'no object identifier given' unless $object_id;
+	die 'object identifier must not contain newlines' if $object_id =~ /\n/;
+	my ($in, $out) = $self->get_bidi_pipe(
+		cmd => ['cat-file','--batch-check'], reuse => 1);
+	print $in "$object_id\n" or die 'cannot write to pipe';
+	local $/ = "\n";
+	my $output = <$out>;
+	die 'no output from pipe' unless $output;
+	chomp $output;
+	return if $output =~ /missing$/;
+	my ($sha1, $type, $size) =
+	    ($output =~ /^([0-9a-f]{40}) ([a-z]+) ([0-9]+)$/)
+	    or die "invalid response: $output";
+	return wantarray ? ($sha1, $type, $size) : $sha1;
+}
+
+=item $repo->cat_file($sha1)
+
+Return the ($type, $content) of the object identified by $sha1, or die
+if no such object exists in the repository.
+
+Note that you may want to use the higher-level methods get_commit and
+get_tag instead.
+
+=cut
+
+# Possible to-do items: Add optional $file_handle parameter.  Guard
+# against getting huge blobs back when we don't expect it (for
+# instance, we could limit the size and send SIGPIPE to git if we get
+# a blob that is too large).
+
+sub cat_file {
+	my ($self, $sha1) = @_;
+	assert_sha1($sha1);
+
+	my ($in, $out) = $self->get_bidi_pipe(
+		cmd => ['cat-file','--batch'], reuse => 1);
+	print $in "$sha1\n" or die 'cannot write to pipe';
+	my ($ret_sha1, $type, $size) = split ' ', $out->getline;
+	die "'$sha1' not found" if $type eq 'missing';
+	$out->read(my $content, $size);
+	$out->getline;  # eat trailing newline
+	return wantarray ? ($type, $content) : $content;
+}
+
+=item $repo->get_commit($commit_sha1)
+
+Return a new L<Git::Commit> instance referring to the commit object
+with SHA1 $commit_sha1.
+
+=cut
+
+sub get_commit {
+	my ($self, $sha1) = @_;
+	assert_sha1($sha1);
+	return Git::Commit->new($self, $sha1);
+}
+
+=item $repo->get_tag($tag_sha1)
+
+Return a new L<Git::Tag> instance referring to the tag object with SHA1
+$tag_sha1.
+
+=cut
+
+sub get_tag {
+	my ($self, $sha1) = @_;
+	assert_sha1($sha1);
+	return Git::Tag->new($self, $sha1);
+}
+
+=item $repo->get_path($tree_sha1, $file_sha1)
+
+Return the path of the tree or blob identified by $file_sha1 in the
+tree identified by $tree_sha1, or undef if the tree or blob does not
+exist in the given tree.
+
+=cut
+
+sub get_path {
+	my ($self, $tree, $file_sha1) = @_;
+	assert_sha1($_) for $tree, $file_sha1;
+	my @lines = split "\n", $self->cmd_output(cmd => ['ls-tree', '-r', '-t', $tree]);
+	for (@lines) {
+		if (/^[0-9]+ [a-z]+ $file_sha1\t(.+)$/) {
+			return $1;
+		}
+	}
+	return undef;
+}
+
+=item $repo->get_refs
+
+=item $repo->get_refs($pattern)
+
+Return an arrayref of [$sha1, $object_type, $ref_name] triples.  If
+$pattern is given, only refs matching the pattern are returned; see
+L<git-for-each-ref(1)> for details.
+
+=cut
+
+sub get_refs {
+	my ($self, $pattern) = @_;
+
+	return [ map [ split ], split("\n", $self->cmd_output(
+			 cmd => [ 'for-each-ref',
+				  defined $pattern ? $pattern : () ])) ];
+}
+
+=item $repo->name_rev($committish_sha1, $tags_only = 0)
+
+Return a symbolic name for the commit identified by $committish_sha1,
+or undef if no name can be found; see L<git-name-rev(1)> for details.
+If $tags_only is true, no branch names are used to name the commit.
+
+=cut
+
+sub name_rev {
+	my ($self, $sha1, $tags_only) = @_;
+	assert_sha1($sha1);
+
+	# Note that we cannot use a bidi pipe here since git
+	# name-rev --stdin has an excessively high start-up time.
+	# http://thread.gmane.org/gmane.comp.version-control.git/85531
+	chomp(my $name = $self->cmd_output(
+		      cmd => [ 'name-rev', $tags_only ? '--tags' : (),
+			 '--name-only', $sha1 ]));
+	return $name eq 'undefined' ? undef : $name;
+}
+
+
+1;
diff --git a/perl/Git/RepoRoot.pm b/perl/Git/RepoRoot.pm
new file mode 100644
index 0000000..12e1836
--- /dev/null
+++ b/perl/Git/RepoRoot.pm
@@ -0,0 +1,103 @@
+=head1 NAME
+
+Git::RepoRoot - A factory class representing a root directory
+containing Git repositories.
+
+=head1 DESCRIPTION
+
+Git::RepoRoot is a factory class to create L<Git::Repo> instances that
+are located under a common root directory.  It also allows for
+specifying options that all Git::Repo instances will be created with.
+
+Using Git::RepoRoot to create Git::Repo instances is entirely
+optional, but can be more convenient than instantiating them directly.
+
+=cut
+
+
+use strict;
+use warnings;
+
+
+package Git::RepoRoot;
+
+use File::Spec;
+
+use Git::Repo;
+
+use base qw(Exporter);
+
+our @EXPORT = qw();
+our @EXPORT_OK = qw();
+
+=head1 METHODS
+
+=over
+
+=item $repo_root = Git::RepoRoot->new(%opts)
+
+Return a new Git::RepoRoot object.  The following options are
+supported:
+
+=over
+
+=item 'directory'
+
+The directory holding all repositories.
+
+=back
+
+All other options will be passed through to Git::Repo->new.
+
+Example:
+
+    $repo_root = Git::RepoRoot->new(directory => '/pub/git',
+                                    git_binary => '/usr/bin/git');
+
+=cut
+
+sub new {
+	my $class = shift;
+	Git::Repo::assert_opts(@_);
+	my $self = {@_};
+	bless $self, $class;
+	die 'no root directory given' unless $self->{directory};
+	return $self;
+}
+
+=item $repo_root->repo(%opts)
+
+Return a new L<Git::Repo> object.  The following options are
+supported:
+
+=over
+
+=item 'directory'
+
+The path of the repository relative to the repository root.
+
+=item 'repo_class'
+
+The Repo class to instantiate (default: 'Git::Repo').
+
+=back
+
+All other options are passed through to Git::Repo.
+
+=cut
+
+sub repo {
+	my $self = shift;
+	Git::Repo::assert_opts(@_);
+	my %opts = (%$self, @_);
+	die 'no directory given' unless $opts{directory};
+	# not technically necessary, but to guard against errors in the caller:
+	die "you passed an absolute path ($opts{directory})"
+	    if $opts{directory} =~ m!^/!;
+	my $repo_class = delete $opts{repo_class} || 'Git::Repo';
+	$opts{directory} = File::Spec->catfile($self->{directory}, $opts{directory});
+	return $repo_class->new(%opts);
+}
+
+
+1;
diff --git a/perl/Git/Tag.pm b/perl/Git/Tag.pm
new file mode 100644
index 0000000..4e0549b
--- /dev/null
+++ b/perl/Git/Tag.pm
@@ -0,0 +1,155 @@
+=head1 NAME
+
+Git::Tag - Object-oriented interface to Git tag objects.
+
+=cut
+
+use strict;
+use warnings;
+
+
+package Git::Tag;
+
+use base qw(Git::Object);
+
+use constant _MESSAGE => 'M';
+use constant _ENCODING => 'E';
+use constant _TAGGER => 'A';
+use constant _TAG => 'T';
+use constant _TYPE => 'Y';
+use constant _OBJECT => 'O';
+
+
+# Keep documentation in one place to save space.
+
+=head1 METHODS
+
+=over
+
+=item $tag = Git::Tag->new($repo, $sha1)
+
+Return a new Git::Tag instance for a tag object with $sha1 in
+repository $repo.
+
+Calls to this method are free, since it does not check whether $sha1
+exists and has the right type.  However, accessing any of the tag
+object's properties will fail if $sha1 is not a valid tag object.
+
+=item $obj->repo
+
+Return the Git::Repo instance this object was instantiated with.
+
+=item $obj->sha1
+
+Return the SHA1 of this tag object.
+
+=item $tag->object
+
+Return the SHA1 string of the object referenced by this tag.
+
+=item $tag->type
+
+Return the type of the referenced object, as claimed by the tag
+object.  This is usually 'commit', but can be any of 'tag', 'commit',
+'tree', or 'blob'.
+
+=item $tag->tagger
+
+Return the tagger string of this tag object.
+
+=item $tag->message
+
+Return the undecoded tag message of this tag object.
+
+=item $tag->encoding
+
+Return the encoding header of the tag object.
+
+=back
+
+=cut
+
+
+sub object {
+	my $self = shift;
+	$self->_load;
+	$self->{_OBJECT()};
+}
+
+sub type {
+	my $self = shift;
+	$self->_load;
+	$self->{_TYPE()} or '';
+}
+
+sub tag {
+	my $self = shift;
+	$self->_load;
+	$self->{_TAG()};
+}
+
+sub tagger {
+	my $self = shift;
+	$self->_load;
+	$self->{_TAGGER()} or '';
+}
+
+sub message {
+	my $self = shift;
+	$self->_load;
+	$self->{_MESSAGE()};
+}
+
+sub encoding {
+	my $self = shift;
+	$self->_load;
+	$self->{_ENCODING()};
+}
+
+# Auxiliary method to load (and parse) the tag object from the
+# repository if it hasn't already been loaded.
+
+sub _load {
+	my($self, $raw_text) = @_;
+	return if defined $self->{_MESSAGE()};  # already loaded
+
+	my $sha1 = $self->sha1;
+	if (!defined $raw_text) {
+		(my $type, $raw_text) = $self->repo->cat_file($sha1);
+		die "$sha1 is a $type object (expected a tag object)"
+		    unless $type eq 'tag';
+	}
+
+	(my $header, $self->{_MESSAGE()}) = split "\n\n", $raw_text, 2;
+	# Parse header.
+	for my $line (split "\n", $header) {
+		local $/ = "\n"; # for chomp
+		chomp($line);
+		my($key, $value) = split ' ', $line, 2;
+		if ($key eq 'object') {
+			$self->{_OBJECT()} = $value;
+		} elsif ($key eq 'type') {
+			$self->{_TYPE()} = $value;
+		} elsif ($key eq 'tag') {
+			$self->{_TAG()} = $value;
+		} elsif ($key eq 'tagger') {
+			$self->{_TAGGER()} = $value;
+		} elsif ($key eq 'encoding') {
+			$self->{_ENCODING()} = $value;
+		} else {
+			die "unrecognized tag header $key";
+		}
+	}
+	undef;
+}
+
+
+=head1 NOTES
+
+You will usually want to call $repo->get_tag($sha1) instead of
+instantiating this class directly; see L<Git::Repo>.
+
+=cut
+
+
+1;
diff --git a/perl/Makefile b/perl/Makefile
index 5e079ad..a70e78a 100644
--- a/perl/Makefile
+++ b/perl/Makefile
@@ -27,6 +27,7 @@ $(makfile): ../GIT-CFLAGS Makefile
 	echo install: >> $@
 	echo '	mkdir -p $(instdir_SQ)' >> $@
 	echo '	$(RM) $(instdir_SQ)/Git.pm; cp Git.pm $(instdir_SQ)' >> $@
+	echo '	mkdir -p $(instdir_SQ)/Git; $(RM) $(instdir_SQ)/Git/*.pm; cp Git/*.pm $(instdir_SQ)/Git' >> $@
 	echo '	$(RM) $(instdir_SQ)/Error.pm; \
 	cp private-Error.pm $(instdir_SQ)/Error.pm' >> $@
 	echo instlibdir: >> $@
diff --git a/perl/Makefile.PL b/perl/Makefile.PL
index 320253e..6c62160 100644
--- a/perl/Makefile.PL
+++ b/perl/Makefile.PL
@@ -8,7 +8,15 @@ instlibdir:
 MAKE_FRAG
 }
 
-my %pm = ('Git.pm' => '$(INST_LIBDIR)/Git.pm');
+# Note that when changing %pm, you'll have to remove pm_to_blib rather
+# than the blib directory to force the .pm files to be re-installed
+# to blib/lib.
+my %pm = ('Git.pm' => '$(INST_LIBDIR)/Git.pm',
+	  'Git/Repo.pm' => '$(INST_LIBDIR)/Git/Repo.pm',
+	  'Git/RepoRoot.pm' => '$(INST_LIBDIR)/Git/RepoRoot.pm',
+	  'Git/Commit.pm' => '$(INST_LIBDIR)/Git/Commit.pm',
+	  'Git/Tag.pm' => '$(INST_LIBDIR)/Git/Tag.pm',
+	  'Git/Object.pm' => '$(INST_LIBDIR)/Git/Object.pm');
 
 # We come with our own bundled Error.pm. It's not in the set of default
 # Perl modules so install it if it's not available on the system yet.
diff --git a/t/t9710-perl-git-repo.sh b/t/t9710-perl-git-repo.sh
new file mode 100755
index 0000000..ca67b87
--- /dev/null
+++ b/t/t9710-perl-git-repo.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Lea Wiemann
+#
+
+test_description='perl interface (Git/*.pm)'
+. ./test-lib.sh
+
+perl -MTest::More -e 0 2>/dev/null || {
+	say_color skip "Perl Test::More unavailable, skipping test"
+	test_done
+}
+
+# Set up test repository.  Tagging/branching is a little tricky
+# because it needs to stay unambiguous for the name_rev tests.
+
+test_expect_success \
+    'set up test repository' \
+    'echo "test file 1" > file1 &&
+     echo "test file 2" > file2 &&
+     mkdir directory1 &&
+     echo "in directory1" >> directory1/file &&
+     mkdir directory2 &&
+     echo "in directory2" >> directory2/file &&
+     git add . &&
+     git commit -m "first commit" &&
+
+     git tag -a -m "tag message 1" tag-object-1 &&
+
+     echo "changed file 1" > file1 &&
+     git commit -a -m "second commit" &&
+
+     git branch branch-2 &&
+
+     echo "changed file 2" > file2 &&
+     git commit -a -m "third commit" &&
+
+     git tag -a -m "tag message 3" tag-object-3 &&
+     git tag -a -m "indirect tag message 3" indirect-tag-3 tag-object-3 &&
+
+     echo "changed file 1 again" > file1 &&
+     git commit -a -m "fourth commit"
+     '
+
+test_external_without_stderr \
+    'Git::Repo API' \
+    perl ../t9710/test.pl
+
+test_done
diff --git a/t/t9710/test.pl b/t/t9710/test.pl
new file mode 100755
index 0000000..188abba
--- /dev/null
+++ b/t/t9710/test.pl
@@ -0,0 +1,165 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More qw(no_plan);
+use Test::Exception;
+use Carp::Always;
+
+use Cwd;
+use File::Basename;
+use File::Temp;
+use File::Spec;
+use Data::Dumper; # for debugging
+
+BEGIN { use_ok('Git::Repo') }
+
+our $old_stderr;
+sub discard_stderr {
+	open our $old_stderr, ">&", STDERR or die "cannot save STDERR";
+	close STDERR;
+}
+sub restore_stderr {
+	open STDERR, ">&", $old_stderr or die "cannot restore STDERR";
+}
+
+# set up
+our $repo_dir = "trash directory";
+our $abs_wc_dir = Cwd->cwd;
+die "this must be run by calling the t/t97* shell script(s)\n"
+    if basename(Cwd->cwd) ne $repo_dir;
+ok(our $r = Git::Repo->new(directory => "./.git"), 'open repository');
+sub rev_parse {
+	my $name = shift;
+	chomp(my $sha1 = `git rev-parse $name 2> /dev/null`);
+	$sha1 or undef;
+}
+
+my @revisions = split /\s/, `git-rev-list --first-parent HEAD`;
+my $head = $revisions[0];
+
+# cmd_output
+is($r->cmd_output(cmd => ['cat-file', '-t', 'HEAD']), "commit\n", 'cmd_output: basic');
+discard_stderr;
+dies_ok { $r->cmd_output(cmd => ['bad-cmd']); } 'cmd_output: die on error';
+restore_stderr;
+my $bad_output;
+lives_ok { $bad_output = $r->cmd_output(
+		   cmd => ['rev-parse', '--verify', '--quiet', 'badrev'],
+		   max_exit_code => 1); }
+    'cmd_output: max_exit_code';
+is($bad_output, '', 'cmd_output: return string on non-zero exit');
+# untested: get_bidi_pipe
+
+# get_sha1
+is($r->get_sha1('HEAD'), $head, 'get_sha1: scalar');
+is($r->get_sha1('HEAD'), $head, 'get_sha1: scalar, repeated');
+my($sha1, $type, $head_size) = $r->get_sha1('HEAD');
+is($sha1, $head, 'get_sha1: array (SHA1)');
+is($type, 'commit', 'get_sha1: array (commit)');
+ok($head_size > 0, 'get_sha1: array (size)');
+
+# cat_file
+is_deeply([$r->cat_file($r->get_sha1("$revisions[-1]:file1"))], ['blob', "test file 1\n"], 'cat_file: blob');
+is_deeply([$r->cat_file($r->get_sha1("$revisions[-1]:file1"))], ['blob', "test file 1\n"], 'cat_file: blob, repeated');
+dies_ok { $r->cat_file('0' x 40) } 'cat_file: non-existent sha1';
+
+# get_commit
+isa_ok($r->get_commit($revisions[-1]), 'Git::Commit',
+       'get_commit: returns Git::Commit object');
+
+# get_tag
+isa_ok($r->get_tag($r->get_sha1('tag-object-1')), 'Git::Tag',
+       'get_tag: returns Git::Tag object');
+
+# get_path
+is($r->get_path($head, $r->get_sha1('HEAD:directory1/file')),
+   'directory1/file', 'get_path: file');
+is($r->get_path($head, $r->get_sha1('HEAD:directory1')),
+   'directory1', 'get_path: directory');
+is($r->get_path($head, '0' x 40), undef, 'get_path: nonexistent');
+
+# get_refs
+my @refs = @{$r->get_refs()};
+is((grep { $_->[2] eq 'refs/heads/branch-2' } @refs), 1,
+   'get_refs: branch existence and uniqueness');
+my @branch2_info = @{(grep { $_->[2] eq 'refs/heads/branch-2' } @refs)[0]};
+is_deeply([@branch2_info], [$revisions[-2], 'commit', 'refs/heads/branch-2'],
+	  'get_refs: sub-array contents');
+@refs = @{$r->get_refs('refs/tags')};
+ok(@refs, 'get_refs: pattern');
+is((grep { $_->[2] eq 'refs/heads/branch-2' } @refs), 0, 'get_refs: pattern');
+
+# name_rev
+is($r->name_rev($revisions[-2]), 'branch-2', 'name_rev: branch');
+is($r->name_rev($head, 1), undef, 'name_rev: branch, tags only');
+is($r->name_rev($revisions[-1]), 'tags/tag-object-1^0', 'name_rev: tag object');
+is($r->name_rev($revisions[-1], 1), 'tag-object-1^0', 'name_rev: tag object, tags only');
+
+
+
+# Git::Commit
+print "# Git::Commit:\n";
+
+BEGIN { use_ok('Git::Commit') }
+
+my $invalid_commit = Git::Commit->new($r, '0' x 40);
+is($invalid_commit->sha1, '0' x 40, 'new, sha1: accept invalid SHA1');
+dies_ok { $invalid_commit->tree } 'die on accessing properties of invalid SHA1s';
+
+$invalid_commit = Git::Commit->new($r, $r->get_sha1('HEAD:')); # tree, not commit
+dies_ok { $invalid_commit->tree } 'die on accessing properties of non-commit objects';
+
+my $c = Git::Commit->new($r, $revisions[-2]);
+is($c->repo, $r, 'repo: basic');
+is($c->sha1, $revisions[-2], 'sha1: basic');
+is($c->{Git::Commit::_PARENTS}, undef,
+   'lazy loading: not loaded after reading SHA1');
+is($c->tree, $r->get_sha1("$revisions[-2]:"), 'tree: basic');
+ok($c->{Git::Commit::_PARENTS}, 'lazy loading: loaded after reading tree');
+is_deeply([$c->parents], [$revisions[-1]], 'parents: basic');
+like($c->author, qr/A U Thor <author\@example.com> [0-9]+ \+0000/, 'author: basic');
+like($c->committer, qr/C O Mitter <committer\@example.com> [0-9]+ \+0000/, 'committer: basic');
+is($c->encoding, undef, 'encoding: undef');
+is($c->message, "second commit\n", 'message: basic');
+is($c, $c->sha1, 'stringify: basic');
+
+# error handling
+dies_ok { Git::Commit->new($r, $r->get_sha1('tag-object-3'))->_load }
+    'new: pass tag SHA1 (dies)';
+dies_ok { Git::Commit->new($r, '0' x 40)->_load }
+    'new: pass invalid SHA1 (dies)';
+
+
+# Git::Tag
+print "# Git::Tag:\n";
+
+BEGIN { use_ok('Git::Tag') }
+
+# We don't test functionality inherited from Git::Object that we
+# already tested in the Git::Commit tests.
+
+my $t = Git::Tag->new($r, $r->get_sha1('tag-object-1'));
+is($t->tag, 'tag-object-1', 'tag: basic');
+is($t->object, $revisions[-1], 'object: basic');
+is($t->type, 'commit', 'tag: type');
+like($t->tagger, qr/C O Mitter <committer\@example.com> [0-9]+ \+0000/, 'tagger: basic');
+is($t->encoding, undef, 'encoding: undef');
+is($t->message, "tag message 1\n", 'message: basic');
+
+# error handling
+dies_ok { Git::Tag->new($r, $head)->_load } 'new: pass commit SHA1 (dies)';
+dies_ok { Git::Tag->new($r, '0' x 40)->_load } 'new: pass invalid SHA1 (dies)';
+
+
+# Git::RepoRoot
+print "# Git::RepoRoot:\n";
+
+BEGIN { use_ok('Git::RepoRoot'); }
+
+my $reporoot = Git::RepoRoot->new(
+	directory => File::Spec->catfile($abs_wc_dir, '..'));
+is($reporoot->repo(directory => File::Spec->catfile($repo_dir, '.git'))
+   ->get_sha1('HEAD'), $head, 'repo: basic');
-- 
1.5.6.2.456.g63fc0

* [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-11  1:06 [PATCH 0/3] Git::Repo API and gitweb caching Lea Wiemann
  2008-07-11  1:10 ` [PATCH 1/3 v9] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
  2008-07-11  1:11 ` [PATCH 2/3] add new Git::Repo API Lea Wiemann
@ 2008-07-11  1:11 ` Lea Wiemann
  2008-07-14 21:23   ` Jakub Narebski
  2008-07-11  1:21 ` [PATCH 0/3] Git::Repo API and gitweb caching Johannes Schindelin
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-11  1:11 UTC (permalink / raw)
  To: git; +Cc: John Hawley, Jakub Narebski, Petr Baudis

Gitweb now uses the Git::Repo API; this change is behavior-preserving,
except for slightly more aggressive error handling; see below.

This patch also adds an optional caching layer for caching repository
data in memory and (for larger cacheable items, like blobs, snapshots,
or diffs) on disk.

Other minor changes:

- Gitweb would previously accept invalid input and either (a) display
  nothing, (b) display an obscure error message, or (c) proceed as
  normal since the parameter happens to be unused in the particular
  code path used.  This has changed in that gitweb will check for
  parameter correctness more aggressively, and display meaningful
  error messages.  This change is only relevant if you manually edit
  gitweb's CGI parameters, since gitweb only generates valid links.

- Empty projects:

  - Only display summary link for empty projects in project list to
    avoid broken links (yielding 404).

  - Slim down summary page for empty projects to avoid some broken
    links and unnecessary vertical space.

  - Sort empty projects at the bottom of the project list when sorting
    by last change.

  - Add test for empty projects to t9503 (the Mechanize test), now
    that there are no broken links anymore.

- For HTML pages, remove the "Expires" HTTP response header, and add
  "Cache-Control: no-cache" instead.  This is because pages can
  contain dynamic content (like the subject of the latest commit), so
  the Expires headers would be wrong.

  This makes gitweb's responsiveness slightly worse, but it will get
  much better once If-Modified-Since is implemented.  It's better to be
  correct than to be convenient here, since having to press the reload
  button makes for a lousy user experience (IOW, users should always be
  able to trust gitweb's output).

  Raw diffs and blobs still get the Expires header, where appropriate.

- Add a $page_info option to display cache stats at the bottom of each
  page; the option is named generically to allow for adding non-cache
  page info there at some point (timings perhaps?).

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
It's all documented of course :-), but for the impatient here's a
snippet for gitweb_config.perl to activate caching:

use Cache::Memcached;
$cache = Cache::Memcached->new( { servers => ['localhost:11211'], compress_threshold => 1000 } );
$large_cache_root = '/home/lewiemann/gitweb-cache';
$large_cache_case_sensitive = 1;
# Invalidate cache on changes to gitweb without version number bump;
# useful for development.
$cache_key = (stat '/home/lewiemann/gitweb')[9] . (stat '/home/lewiemann/gitweb/gitweb.cgi')[9];
# Display detailed cache info at the bottom of each page.
$page_info = 2;
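
If you don't have a memcached server at hand, any other Cache::Cache
compatible cache should be usable in its place.  As an untested sketch,
assuming the Cache::FileCache module from the Cache::Cache distribution
is installed (the namespace and both paths are made-up values for
illustration):

```perl
# Hypothetical alternative for gitweb_config.perl: a file-based cache
# instead of memcached.  The namespace and paths are assumptions.
use Cache::FileCache;
$cache = Cache::FileCache->new({
	namespace  => 'gitweb',
	cache_root => '/var/cache/gitweb/small',
});
# Large items (snapshots, diffs, blobs) still go to a plain directory
# that you have to clean up yourself.
$large_cache_root = '/var/cache/gitweb/large';
```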

A live demo is here: http://odin3.kernel.org/git-lewiemann/
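
For reviewers who'd rather not dig the key scheme out of the diff, here
is a simplified, standalone sketch of how cache keys are built.  The
make_key function and the sample mtimes are made up for illustration;
the real code is CachedRepo::get_key and CachedRepo::_key_items below,
which additionally mix in the gitweb version, $cache_key, and the
repository directory.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical stand-in for CachedRepo::get_key: join the key items
# with null bytes and hash them.  Undefined items become '' and
# defined ones get a leading space, so undef and '' stay distinct.
sub make_key {
	my @items = map { defined $_ ? " $_" : '' } @_;
	return md5_base64(join "\0", @items);
}

# Transient entries carry the repository's last-modification time as
# the first key item; once that mtime changes, the key changes too,
# and the old entry is simply never looked up again.
my $before = make_key(1215738000, 'SHA1', 'HEAD');
my $after  = make_key(1215738060, 'SHA1', 'HEAD');
print "transient key changed\n" if $before ne $after;

# Non-expiring entries use undef as the token, so their key is stable.
my $stable_a = make_key(undef, 'cat-file', 'abc');
my $stable_b = make_key(undef, 'cat-file', 'abc');
print "permanent key stable\n" if $stable_a eq $stable_b;

# File-system safe variant, as used for the large cache on disk.
(my $file_key = $before) =~ tr{+/}{_@};
print "file key: $file_key\n";
```

The point of putting the mtime into the key (rather than deleting
entries) is that invalidation needs no bookkeeping: stale entries just
age out of the cache on their own.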

 Makefile                               |   27 +-
 gitweb/README                          |   14 +
 gitweb/gitweb.css                      |   16 +
 gitweb/gitweb.perl                     | 1415 +++++++++++++++++++-------------
 t/t9500-gitweb-standalone-no-errors.sh |    4 +-
 t/t9503-gitweb-Mechanize.sh            |   10 +
 t/t9503/test.pl                        |    5 +
 7 files changed, 911 insertions(+), 580 deletions(-)

diff --git a/Makefile b/Makefile
index 15f210e..a65aa4d 100644
--- a/Makefile
+++ b/Makefile
@@ -1099,25 +1099,28 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)): perl/perl.mak
 perl/perl.mak: GIT-CFLAGS perl/Makefile perl/Makefile.PL
 	$(QUIET_SUBDIR0)perl $(QUIET_SUBDIR1) PERL_PATH='$(PERL_PATH_SQ)' prefix='$(prefix_SQ)' $(@F)
 
+PERL_USE_LIB_REWRITE = \
+	-e '1{' \
+	-e '	s|\#!.*perl|\#!$(PERL_PATH_SQ)|' \
+	-e '	h' \
+	-e '	s=.*=use lib (split(/:/, \$$ENV{GITPERLLIB} || \"@@INSTLIBDIR@@\"));=' \
+	-e '	H' \
+	-e '	x' \
+	-e '}' \
+	-e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
+	-e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g'
+
 $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl
 	$(QUIET_GEN)$(RM) $@ $@+ && \
 	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C perl -s --no-print-directory instlibdir` && \
-	sed -e '1{' \
-	    -e '	s|#!.*perl|#!$(PERL_PATH_SQ)|' \
-	    -e '	h' \
-	    -e '	s=.*=use lib (split(/:/, $$ENV{GITPERLLIB} || "@@INSTLIBDIR@@"));=' \
-	    -e '	H' \
-	    -e '	x' \
-	    -e '}' \
-	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
-	    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
-	    $@.perl >$@+ && \
+	sed $(PERL_USE_LIB_REWRITE) $@.perl >$@+ && \
 	chmod +x $@+ && \
 	mv $@+ $@
 
 gitweb/gitweb.cgi: gitweb/gitweb.perl
 	$(QUIET_GEN)$(RM) $@ $@+ && \
-	sed -e '1s|#!.*perl|#!$(PERL_PATH_SQ)|' \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C perl -s --no-print-directory instlibdir` && \
+	sed $(PERL_USE_LIB_REWRITE) \
 	    -e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
 	    -e 's|++GIT_BINDIR++|$(bindir)|g' \
 	    -e 's|++GITWEB_CONFIG++|$(GITWEB_CONFIG)|g' \
@@ -1125,7 +1128,7 @@ gitweb/gitweb.cgi: gitweb/gitweb.perl
 	    -e 's|++GITWEB_HOME_LINK_STR++|$(GITWEB_HOME_LINK_STR)|g' \
 	    -e 's|++GITWEB_SITENAME++|$(GITWEB_SITENAME)|g' \
 	    -e 's|++GITWEB_PROJECTROOT++|$(GITWEB_PROJECTROOT)|g' \
-	    -e 's|"++GITWEB_PROJECT_MAXDEPTH++"|$(GITWEB_PROJECT_MAXDEPTH)|g' \
+	    -e 's|++GITWEB_PROJECT_MAXDEPTH++|$(GITWEB_PROJECT_MAXDEPTH)|g' \
 	    -e 's|++GITWEB_EXPORT_OK++|$(GITWEB_EXPORT_OK)|g' \
 	    -e 's|++GITWEB_STRICT_EXPORT++|$(GITWEB_STRICT_EXPORT)|g' \
 	    -e 's|++GITWEB_BASE_URL++|$(GITWEB_BASE_URL)|g' \
diff --git a/gitweb/README b/gitweb/README
index 6908036..fb5a027 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -214,6 +214,20 @@ not include variables usually directly set during build):
    Rename detection options for git-diff and git-diff-tree. By default
    ('-M'); set it to ('-C') or ('-C', '-C') to also detect copies, or
    set it to () if you don't want to have renames detection.
+ * $cache
+   To activate caching, set this to an instance of a Cache::Cache
+   compatible cache (e.g. Cache::Memcached).
+ * $large_cache_root
+   A directory in which gitweb will store potentially large cache
+   items (like snapshots or blobs).  Cache items in this directory are
+   not deleted, so you should clean it up yourself periodically.
+   Default: undef (no caching for large items).
+ * $large_cache_case_sensitive
+   Set this to 1 if the file system on which the large cache resides
+   is case sensitive; it will result in shorter file names.
+ * $page_info
+   Set to 1 (short) or 2 (detailed) to display cache statistics at the
+   bottom of each page; can be used to check whether your cache works.
 
 
 Projects list file format
diff --git a/gitweb/gitweb.css b/gitweb/gitweb.css
index aa0eeca..849bb8e 100644
--- a/gitweb/gitweb.css
+++ b/gitweb/gitweb.css
@@ -71,6 +71,22 @@ div.page_footer_text {
 	font-style: italic;
 }
 
+div.page_info {
+	color: #777777;
+	padding: 4px 8px;
+	margin-top: 0.5em;
+	background-color: #ecebe8;
+}
+
+div.page_info p {
+	margin: 0px;
+	padding: 0px;
+}
+
+pre.cache_list {
+	margin: 0.5em 2em;
+}
+
 div.page_body {
 	padding: 8px;
 	font-family: monospace;
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 90cd99b..01deca0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -16,6 +16,9 @@ use Encode;
 use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
+
+use Git::RepoRoot;
+
 binmode STDOUT, ':utf8';
 
 BEGIN {
@@ -161,6 +164,49 @@ our %known_snapshot_format_aliases = (
 	'x-zip' => undef, '' => undef,
 );
 
+# Cache::Cache object to cache data from the repository, or undef for
+# no cache.  You would typically use a Cache::Memcached instance here.
+our $cache = undef;
+
+# Expiration time in seconds for transient cache entries, or undef for
+# no expiration.  (Only used if $cache is defined.)
+#
+# Transient cache entries (like get_sha1('HEAD')) are automatically
+# invalidated when an mtime of either the repository's root directory
+# or of the refs directory or any subdirectory changes.  This
+# mechanism *should* detect changes to the repository reliably if you
+# only use git or rsync to access it, and hence this expiration time
+# can be set very high.  (This does not default to non-expiring
+# [undef] just in case a change goes undetected for some reason.)  You
+# might want to set this to a lower time (e.g. a few minutes) if
+# developers change files in the refs directories on your server by
+# non-standard means (i.e. manually).
+#
+# You can usually go with the default here.
+our $cache_transient_expiration_time = 60 * 60 * 24;
+
+# Directory on disk to hold potentially large cache items (in
+# particular, snapshots, diffs and blobs), or undef for no cache.
+# Cache files will be created in this directory, but they will not be
+# expired; you should periodically delete old files yourself.  Setting
+# $large_cache_root but not $cache is possible, but usually not
+# sensible.
+our $large_cache_root = undef;
+
+# Is the file system on which your large cache resides case-sensitive?
+our $large_cache_case_sensitive = 0;
+
+# Extra cache key component to use.  This should stringify to a string
+# without null bytes; it is used as a means to discard old cache
+# entries whenever this key changes.  Since Gitweb already uses its
+# own version number as a cache key component, you don't normally need
+# this unless you're developing gitweb.
+our $cache_key = undef;
+
+# Display information in the footer of each page (currently only cache
+# statistics): 0 = none, 1 = short, 2 = long.
+our $page_info = 0;
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -376,8 +422,375 @@ if (-e $GITWEB_CONFIG) {
 	do $GITWEB_CONFIG_SYSTEM if -e $GITWEB_CONFIG_SYSTEM;
 }
 
-# version of the core git binary
-our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
+
+# ======================================================================
+# caching layer
+
+package CachedRepo;
+
+use Digest::MD5 qw(md5_base64 md5_hex);
+use List::Util qw(max);
+
+use base qw(Git::Repo);
+
+# Global statistics, collected across repositories.
+# Hits, misses, sets, and failed_sets are counters, and get_list is an
+# arrayref of keys, where a key is an arrayref of key items.
+our %cache_statistics = (
+	hits => 0, misses => 0, sets => 0, failed_sets => 0, get_list => []);
+our %large_cache_statistics = (
+	hits => 0, misses => 0, sets => 0, failed_sets => 0, get_list => []);
+
+# Options: like Git::Repo->new, and the following:
+# cache: a Cache::Cache conforming cache instance
+# transient_expiration_time: expiration time in seconds for transient
+#     cache entries (like get_sha1('HEAD')), or undef; do not set to
+#     30 days or more, since it makes Cache::Memcached hiccup
+sub new {
+	my ($class, %opts) = @_;
+	my $cache = delete $opts{cache};
+	my $large_cache_root = delete $opts{large_cache_root};
+	my $transient_expiration_time = delete $opts{transient_expiration_time};
+	my $self = $class->SUPER::new(%opts);
+	$self->{cache} = $cache;
+	$self->{large_cache_root} = $large_cache_root;
+	$self->{transient_expiration_time} = $transient_expiration_time;
+	return $self;
+}
+
+sub cache { shift->{cache} }
+sub large_cache_root { shift->{large_cache_root} }
+sub transient_expiration_time { shift->{transient_expiration_time} }
+
+# Fast function to generate a unique (short) key for the cache to use.
+# None of the parameters should contain null bytes.  Example:
+# $repo->get_key('sha1', 'HEAD:file1') eq 'dK2M2Y8AsgTpgAmY7PhC3q'
+sub get_key {
+	my $self = shift;
+	# Some caches (like Cache::FileCache) hash keys themselves,
+	# but Cache::Memcached does not like long keys, so we need to
+	# hash them.  MD5 is fine here, since (as of July 2008) there
+	# are only collision attacks, but no practical preimage
+	# attacks on MD5.  Constructing two colliding keys doesn't
+	# seem to pose much of a threat for the cache.  Digest::SHA1
+	# is only in core as of Perl 5.9.
+	return md5_base64(join "\0", $self->_key_items(@_));
+}
+
+# Same as get_key, but returns a case insensitive (but longer) key.
+sub get_case_insensitive_key {
+	my $self = shift;
+	return md5_hex(join "\0", $self->_key_items(@_));
+}
+
+# Return a list of strings that can be used to generate a key.
+sub _key_items {
+	my $self = shift;
+	return map { defined $_ ? " $_" : '' }
+	    ('gitweb', $version, $cache_key, 'project', $self->repo_dir, @_);
+}
+
+# Convenience function: cache_set(\@key, $value, $expire);
+# $expire is boolean and indicates whether an expiry time should be set.
+sub cache_set {
+	my ($self, $key, $value, $expire) = @_;
+	return unless $self->cache;
+	my $expiration_token = $expire ? $self->get_last_modification() : undef;
+	my $ok = $self->cache->set(
+		$self->get_key($expiration_token, @$key), $value,
+		$expire ? $self->transient_expiration_time : ());
+	$ok ? $cache_statistics{sets}++ : $cache_statistics{failed_sets}++;
+	return $ok;
+}
+
+# Convenience function: cache_get(\@key, $expire)
+# $expire must be the same that has been used for cache_set or the
+# lookup will fail.
+sub cache_get {
+	my ($self, $key, $expire) = @_;
+	return unless $self->cache;
+	my $expiration_token = $expire ? $self->get_last_modification() : undef;
+	my $val = $self->cache->get($self->get_key($expiration_token, @$key));
+	defined $val ? $cache_statistics{hits}++ : $cache_statistics{misses}++;
+	push @{$cache_statistics{get_list}},
+	    [$self->repo_dir, $expire ? 1 : 0, @$key];
+	return $val;
+}
+
+# Like Git::Repo->cmd_output, but with an added 'cache' option to
+# indicate that the output can be cached: if 1, it is cached but
+# expires when the repo is modified, if 2, it is cached indefinitely.
+sub cmd_output {
+	my ($self, %opts) = @_;
+	my $key = ['cmd', $opts{max_exit_code}, @{$opts{cmd}}];
+	my $output;
+	unless ($opts{cache} && defined($output = $self->cache_get(
+						$key, $opts{cache} == 1))) {
+		$output = $self->SUPER::cmd_output(%opts);
+		$self->cache_set($key, $output, $opts{cache} == 1) if $opts{cache};
+	}
+	return $output;
+}
+
+# The following methods override the base class (Git::Repo) methods to
+# add caching.
+
+sub get_sha1 {
+	my ($self, $object_id) = @_;
+	my $expire = ($object_id !~ /^[0-9a-f]{40}(?![0-9a-f])/);
+	my $triple = $self->cache_get(['SHA1', $object_id], $expire);
+	unless (defined $triple) {
+		$triple = [$self->SUPER::get_sha1($object_id)];
+		# Do not cache failed lookups -- missing SHA1s would
+		# be permanently cached, but a subsequent push to the
+		# repository might add those missing SHA1s to the
+		# repository.
+		return unless $triple->[0];
+		$self->cache_set(['SHA1', $object_id], $triple, $expire);
+	}
+	return wantarray ? @$triple : $triple->[0];
+}
+
+sub cat_file {
+	my ($self, $sha1) = @_;
+	my $type_content = $self->cache_get(['cat-file', $sha1], 0);
+	unless (defined $type_content) {
+		$type_content = [$self->SUPER::cat_file($sha1)];
+		die 'unexpected empty return value' unless @$type_content;
+		$self->cache_set(['cat-file', $sha1], $type_content, 0);
+	}
+	return wantarray ? @$type_content : $type_content->[1];
+}
+
+# get_commit and get_tag only return empty Commit and Tag objects,
+# which when loaded (lazily), happen to call cat_file and thus are
+# cached as well.
+
+# Mostly copied from base class.
+sub get_path {
+	my ($self, $tree, $file_sha1) = @_;
+	assert_sha1($tree, $file_sha1);
+	# This can be quite large, so use progressive_cmd_output.
+	my $ls_tree_read = $self->progressive_cmd_output(
+		cmd => [qw(ls-tree -r -t), $tree], separator => "\n",
+		cache => 1);
+	while (my $line = $ls_tree_read->()) {
+		if ($line =~ /^[0-9]+ [a-z]+ $file_sha1\t(.+)$/) {
+			while ($ls_tree_read->()) { } # cache it
+			return $1;
+		}
+	}
+	return undef;
+}
+
+sub get_refs {
+	my ($self, $pattern) = @_;
+	my $refs = $self->cache_get(['refs', $pattern], 1);
+	unless (defined $refs ) {
+		$refs = $self->SUPER::get_refs($pattern);
+		$self->cache_set(['refs', $pattern], $refs, 1);
+	}
+	return $refs;
+}
+
+sub name_rev {
+	my ($self, $sha1, $tags_only) = @_;
+	my $name = $self->cache_get(['name-rev', $sha1, $tags_only], 1);
+	unless (defined $name) {
+		# || '' is to cache failed lookups (name_rev doesn't
+		# ever return empty names).
+		$name = $self->SUPER::name_rev($sha1, $tags_only) || '';
+		$self->cache_set(['name-rev', $sha1, $tags_only], $name, 1);
+	}
+	return $name || undef;
+}
+
+# progressive_cmd_output(%opts)
+#
+# Return a coderef that returns a chunk of the output of the given
+# command each time it is called, or undef when the output is
+# exhausted.  For the output to be cached, it must be read until the
+# coderef returns undef, otherwise it will leave a stale temporary
+# file in the cache.
+#
+# Options:
+# cmd: an arrayref or string of arguments to git; if it's a string, it will be
+#     passed to the shell
+# max_exit_code: die if the command exits with a higher exit code (default: 0)
+# separator: like $/; if undef, read the output in chunks of arbitrary size
+# cache: if true, cache the output of the command (without expiration)
+#
+# Example:
+# my $diff_read = $repo->progressive_cmd_output(
+#     cmd => ['diff', $from_sha1, $to_sha1], separator => "\n", cache => 1);
+# while (my $line = $diff_read->()) {
+# 	chomp $line;
+# 	...
+# }
+my $_file_seq = 0;
+sub progressive_cmd_output {
+	die 'must pass an odd number of arguments' unless @_ % 2;
+	my ($self, %opts) = @_;
+	local $/ = defined $opts{separator} ? delete $opts{separator} : \32768;
+	# Half of cmd_output has been copied here, but there's no
+	# pretty way to generalize it.
+	my (@cmd, $cmd_str);
+	if (ref($opts{cmd}) eq 'ARRAY') {
+		@cmd = ($self->_get_git_cmd, @{$opts{cmd}});
+		$cmd_str = join " ", @cmd;  # only used for diagnostics
+	} else {
+		$cmd_str = main::quote_command($self->_get_git_cmd) .
+		    " $opts{cmd}";  # this will be passed to the shell
+	}
+	# We read from $fh, whether it's a pipe or a cache file.  If
+	# it's a pipe, we also progressively cache it to
+	# $tmp_cache_file, and at the end move $tmp_cache_file_name to
+	# $cache_file_name.  This avoids having partially written
+	# cache entries.
+	my ($fh, $cache_file_name, $tmp_cache_file_name, $tmp_cache_file);
+	if ($opts{cache} && $self->large_cache_root) {
+		my @key_items = ('cmd', $opts{max_exit_code},
+				 ref($opts{cmd}) eq 'ARRAY' ?
+				 (@{$opts{cmd}}) : (undef, $opts{cmd}));
+		push @{$large_cache_statistics{get_list}},
+		    [$self->repo_dir, @key_items];
+		my $key = $large_cache_case_sensitive ?
+		    $self->get_key(@key_items) :
+		    $self->get_case_insensitive_key(@key_items);
+		# Make the key file-system safe; _ and @ are available
+		# on most file systems and sort after the dot as a
+		# bonus.
+		$key =~ tr{+/}{_@};
+		$cache_file_name = File::Spec->catfile(
+			$self->large_cache_root, $key);
+		$tmp_cache_file_name = File::Spec->catfile(
+			$self->large_cache_root,
+			join('.', $$, $_file_seq++, 'tmp'));
+	}
+	unless ($cache_file_name && open $fh, '<', $cache_file_name) {
+		# Not in cache -- open pipe, and open cache file to write to.
+		if (@cmd) {
+			open $fh, '-|', @cmd;
+		} else {
+			open $fh, '-|', $cmd_str;
+		}
+		die "cannot open pipe: $cmd_str" unless $fh;
+		open $tmp_cache_file, '>', $tmp_cache_file_name
+		    if $tmp_cache_file_name;
+		# Increment failed_sets; it will be decremented upon
+		# successful finalization of the cache entry.
+		$large_cache_statistics{failed_sets}++;
+		# Do not die if the open fails, it simply leaves
+		# $tmp_cache_file undefined.
+		# Record uncached calls as misses.
+		$large_cache_statistics{misses}++;
+	} else {
+		$large_cache_statistics{hits}++;
+	}
+	my $read = sub {
+		return undef unless $fh;  # already closed
+		my $output = <$fh>;
+		if ($output) {
+			# Write to cache and return.
+			if ($tmp_cache_file && ! print $tmp_cache_file $output) {
+				# Writing to cache failed; clean up
+				# and stop caching this pipe.
+				close $tmp_cache_file;
+				$tmp_cache_file = undef;
+				unlink $tmp_cache_file_name;
+			}
+			return $output;
+		}
+		# End of output; close and finalize cache.
+		if (close $fh) {
+			# We sometimes get mysterious "Bad file
+			# descriptor" errors here, but reading from
+			# the pipe worked fine, so let's not die.
+			#die "error closing pipe ($!): $cmd_str" if $!;
+			my $exit_code = $? >> 8;
+			die "Command died with exit code $exit_code: $cmd_str"
+			    if $exit_code > ($opts{max_exit_code} || 0);
+		}
+		$fh = undef;
+		if ($tmp_cache_file && close $tmp_cache_file) {
+			# Cache file written OK, move it in place.
+			if (rename $tmp_cache_file_name, $cache_file_name) {
+				$large_cache_statistics{failed_sets}--;
+				$large_cache_statistics{sets}++;
+			}
+		}
+		return undef;
+	};
+	# We can also provide a &close function here in case it
+	# becomes necessary to close pipes prematurely.
+	return $read;
+}
+
+# Return the seconds since epoch when the repository was last touched.
+sub get_last_modification {
+	my $self = shift;
+	return $self->{last_modification} if $self->{last_modification};
+	# Hashref mapping absolute paths of directories to mtimes.  We
+	# rely on the fact here that every time git updates a file, it
+	# creates it under a different name and then moves it in
+	# place, thus causing the mtime of the containing directory to
+	# be updated.  Hence it's enough to just stat the directories.
+	my $mtimes = $self->cache_get(['mtimes'], 0);
+	if ($mtimes) {
+		CHECK_CACHE: {
+			# Check if the cache response is up to date.
+			while (my ($dir, $mtime) = each %$mtimes) {
+				last CHECK_CACHE if (stat $dir)[9] != $mtime;
+			}
+			$self->{last_modification} = max(values %$mtimes);
+			return max(values %$mtimes);
+		}
+	}
+	# Either mtimes are not in cache, or at least one directory
+	# has been updated.  Traverse the whole ref tree and record
+	# all directory mtimes -- this is a bit slower than the
+	# up-to-date-ness check above since we end up stat'ing all
+	# files in the refs directory.
+	my $time = time;
+	$mtimes = { $self->repo_dir => (stat $self->repo_dir)[9] };
+	my $cacheable = 1;
+	File::Find::find(
+		sub {
+			my $time = time;  # get time first
+			# No way to avoid stat'ing unconditionally
+			# with File::Find.
+			my @stat = stat($_);
+			if (Fcntl::S_ISDIR($stat[2])) {
+				# Record the directory's mtime.
+				$mtimes->{$File::Find::name} = $stat[9];
+				# Mtimes have a 1-second granularity,
+				# so if the directory has *just* been
+				# modified, we might miss subsequent
+				# modifications in the same second if
+				# we cached it.
+				$cacheable = 0 if $stat[9] >= $time;
+			}
+		}, File::Spec->catfile($self->repo_dir, 'refs'));
+	$self->cache_set(['mtimes'], $mtimes, 0) if $cacheable;
+	$self->{last_modification} = max(values %$mtimes);
+	return max(values %$mtimes);
+}
+
+package main;
+
+
+our $repo_root = Git::RepoRoot->new(directory => $projectroot,
+				    git_binary => $GIT,
+				    cache => $cache,
+				    large_cache_root => $large_cache_root,
+				    transient_expiration_time =>
+				      $cache_transient_expiration_time,
+				    repo_class => 'CachedRepo'
+    );
+
+# Version of the core git binary.  This should normally be the same as
+# the gitweb version, but it may diverge slightly during development.
+our $git_version = $repo_root->repo(directory => 'dummy')->version;
 
 $projects_list ||= $projectroot;
 
@@ -392,6 +805,7 @@ if (defined $action) {
 
 # parameters which are pathnames
 our $project = $cgi->param('p');
+our $repo = $repo_root->repo(directory => $project) if $project;
 if (defined $project) {
 	if (!validate_pathname($project) ||
 	    !(-d "$projectroot/$project") ||
@@ -509,13 +923,14 @@ sub evaluate_path_info {
 		undef $project;
 		return;
 	}
+	$repo = $repo_root->repo(directory => $project);
 	# do not change any parameters if an action is given using the query string
 	return if $action;
 	$path_info =~ s,^\Q$project\E/*,,;
 	my ($refname, $pathname) = split(/:/, $path_info, 2);
 	if (defined $pathname) {
 		# we got "project.git/branch:filename" or "project.git/branch:dir/"
-		# we could use git_get_type(branch:pathname), but it needs $git_dir
+		# we could use git_get_type(branch:pathname) here
 		$pathname =~ s,^/+,,;
 		if (!$pathname || substr($pathname, -1) eq "/") {
 			$action  ||= "tree";
@@ -533,10 +948,6 @@ sub evaluate_path_info {
 }
 evaluate_path_info();
 
-# path to the current git repository
-our $git_dir;
-$git_dir = "$projectroot/$project" if $project;
-
 # dispatch
 my %actions = (
 	"blame" => \&git_blame,
@@ -597,8 +1008,7 @@ sub href (%) {
 	# default is to use -absolute url() i.e. $my_uri
 	my $href = $params{-full} ? $my_url : $my_uri;
 
-	# XXX: Warning: If you touch this, check the search form for updating,
-	# too.
+	# If you touch this, check the search form for updating, too.
 
 	my @mapping = (
 		project => "p",
@@ -1309,10 +1719,12 @@ sub format_diff_cc_simplified {
 	$result .= "<div class=\"diff header\">" .
 	           "diff --cc ";
 	if (!is_deleted($diffinfo)) {
-		$result .= $cgi->a({-href => href(action=>"blob",
-		                                  hash_base=>$hash,
-		                                  hash=>$diffinfo->{'to_id'},
-		                                  file_name=>$diffinfo->{'to_file'}),
+		$result .= $cgi->a(
+			{-href => href(action=>"blob",
+				       $hash && git_get_type($hash) eq 'commit' ?
+				       (hash_base=>$hash) : (),
+				       hash=>$diffinfo->{'to_id'},
+				       file_name=>$diffinfo->{'to_file'}),
 		                    -class => "path"},
 		                   esc_path($diffinfo->{'to_file'}));
 	} else {
@@ -1332,8 +1744,6 @@ sub format_diff_line {
 	my ($from, $to) = @_;
 	my $diff_class = "";
 
-	chomp $line;
-
 	if ($from && $to && ref($from->{'href'}) eq "ARRAY") {
 		# combined diff
 		my $prefix = substr($line, 0, scalar @{$from->{'href'}});
@@ -1495,11 +1905,6 @@ sub get_feed_info {
 ## ----------------------------------------------------------------------
 ## git utility subroutines, invoking git commands
 
-# returns path to the core git executable and the --git-dir parameter as list
-sub git_cmd {
-	return $GIT, '--git-dir='.$git_dir;
-}
-
 # quote the given arguments for passing them to the shell
 # quote_command("command", "arg 1", "arg with ' and ! characters")
 # => "'command' 'arg 1' 'arg with '\'' and '\!' characters'"
@@ -1509,33 +1914,55 @@ sub quote_command {
 		    map( { my $a = $_; $a =~ s/(['!])/'\\$1'/g; "'$a'" } @_ ));
 }
 
-# get HEAD ref of given project as hash
-sub git_get_head_hash {
-	my $project = shift;
-	my $o_git_dir = $git_dir;
-	my $retval = undef;
-	$git_dir = "$projectroot/$project";
-	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
-		my $head = <$fd>;
-		close $fd;
-		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
-			$retval = $1;
+# git_get_sha1_or_die ( EXTENDED_OBJECT_IDENTIFIER [, TYPE] )
+#
+# Look up the object referred to by C<EXTENDED_OBJECT_IDENTIFIER> and
+# return its SHA1 hash in scalar context or its ($hash, $type, $size)
+# in list context.  Return an error page to the browser if the object
+# couldn't be found.
+#
+# If C<TYPE> is given, resolve tag and commit objects if necessary and
+# die unless the object found has the right type.  The $type return
+# value is guaranteed to equal C<TYPE>.
+sub git_get_sha1_or_die {
+	my ($object_id, $want_type) = @_;
+	# This method shouldn't be used for checking missing
+	# parameters, since it cannot generate proper error messages.
+	# Hence we die with 500.
+	die_error(500, 'No object given') unless $object_id;
+	my ($hash, $type, $size) = $repo->get_sha1($object_id);
+	unless ($hash) {
+		my $human_type = ucfirst($want_type || 'object');
+		die_error(404, "$human_type not found: '$object_id'");
+	}
+	if ($want_type && $want_type ne $type) {
+		if ($type eq 'tag') {
+			return git_get_sha1_or_die(
+				$repo->get_tag($hash)->object, $want_type);
+		} elsif ($type eq 'commit' && $want_type eq 'tree') {
+			return git_get_sha1_or_die(
+				$repo->get_commit($hash)->tree, $want_type);
+		} else {
+			# $object_id and $type can be off due to recursion,
+			# but fixing it complicates the code too much.
+			die_error(400, "Expected a $want_type object, but " .
+				  "'$object_id' is a $type object");
 		}
 	}
-	if (defined $o_git_dir) {
-		$git_dir = $o_git_dir;
-	}
-	return $retval;
+	return wantarray ? ($hash, $type, $size) : $hash;
+}
+
+# get HEAD ref hash of current project or die if no HEAD ref was found
+sub git_get_head_hash {
+	die_error(400, 'no project given') unless $project;
+	my $sha1 = $repo->get_sha1('HEAD')
+	    or die_error(500, "HEAD ref not found for project '$project'");
+	return $sha1;
 }
 
 # get type of given object
 sub git_get_type {
-	my $hash = shift;
-
-	open my $fd, "-|", git_cmd(), "cat-file", '-t', $hash or return;
-	my $type = <$fd>;
-	close $fd or return;
-	chomp $type;
+	my($sha1, $type, $size) = $repo->get_sha1(shift);
 	return $type;
 }
 
@@ -1563,20 +1990,14 @@ sub git_parse_project_config {
 	my $section_regexp = shift;
 	my %config;
 
-	local $/ = "\0";
-
-	open my $fh, "-|", git_cmd(), "config", '-z', '-l',
-		or return;
-
-	while (my $keyval = <$fh>) {
-		chomp $keyval;
+	return unless $repo;
+	for my $keyval (split "\0", $repo->cmd_output(
+				cmd => [qw(config -z -l)], cache => 1)) {
 		my ($key, $value) = split(/\n/, $keyval, 2);
 
 		hash_set_multi(\%config, $key, $value)
 			if (!defined $section_regexp || $key =~ /^(?:$section_regexp)\./o);
 	}
-	close $fh;
-
 	return %config;
 }
 
@@ -1639,9 +2060,9 @@ sub git_get_project_config {
 
 	# get config
 	if (!defined $config_file ||
-	    $config_file ne "$git_dir/config") {
+	    $config_file ne "$projectroot/$project/config") {
 		%config = git_parse_project_config('gitweb');
-		$config_file = "$git_dir/config";
+		$config_file = "$projectroot/$project/config";
 	}
 
 	# ensure given type
@@ -1656,65 +2077,42 @@ sub git_get_project_config {
 	return $config{"gitweb.$key"};
 }
 
-# get hash of given path at given ref
-sub git_get_hash_by_path {
-	my $base = shift;
-	my $path = shift || return undef;
-	my $type = shift;
-
+# Return the SHA1 of the blob or tree at the path in the given commit,
+# or return undef if it does not exist.
+sub git_get_sha1_by_path {
+	my ($base, $path, $type) = @_;
 	$path =~ s,/+$,,;
-
-	open my $fd, "-|", git_cmd(), "ls-tree", $base, "--", $path
-		or die_error(500, "Open git-ls-tree failed");
-	my $line = <$fd>;
-	close $fd or return undef;
-
-	if (!defined $line) {
-		# there is no tree or hash given by $path at $base
-		return undef;
-	}
-
-	#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
-	$line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/;
-	if (defined $type && $type ne $2) {
-		# type doesn't match
-		return undef;
-	}
-	return $3;
+	return $repo->get_sha1("$base:$path", $type);
 }
 
-# get path of entry with given hash at given tree-ish (ref)
-# used to get 'from' filename for combined diff (merge commit) for renames
+# Get path of entry with given hash at given tree-ish (ref); used to
+# get 'from' filename for combined diff (merge commit) for renames.
+# Note that this does not resolve tag or commit objects in the $hash
+# parameter; you must pass a tree or blob object.
 sub git_get_path_by_hash {
 	my $base = shift || return;
 	my $hash = shift || return;
 
-	local $/ = "\0";
+	my $tree = git_get_sha1_or_die($base, 'tree');
+	my ($file_sha1, $file_type) = $repo->get_sha1($hash);
+	die_error(404, "object not found: '$hash'") unless $file_sha1;
+	die_error(400, "'$hash' is a $file_type object, not a tree or blob object")
+	    unless $file_type eq 'blob' || $file_type eq 'tree';
 
-	open my $fd, "-|", git_cmd(), "ls-tree", '-r', '-t', '-z', $base
-		or return undef;
-	while (my $line = <$fd>) {
-		chomp $line;
-
-		#'040000 tree 595596a6a9117ddba9fe379b6b012b558bac8423	gitweb'
-		#'100644 blob e02e90f0429be0d2a69b76571101f20b8f75530f	gitweb/README'
-		if ($line =~ m/(?:[0-9]+) (?:.+) $hash\t(.+)$/) {
-			close $fd;
-			return $1;
-		}
-	}
-	close $fd;
-	return undef;
+	return $repo->get_path($tree, $file_sha1);
 }
 
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
+# The following subroutines locally change the global $project
+# variable as a side-effect so that their calls to
+# git_get_project_config work.
+
 sub git_get_project_description {
-	my $path = shift;
+	local $project = shift;
 
-	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/description"
+	open my $fd, "$projectroot/$project/description"
 		or return git_get_project_config('description');
 	my $descr = <$fd>;
 	close $fd;
@@ -1725,10 +2123,9 @@ sub git_get_project_description {
 }
 
 sub git_get_project_url_list {
-	my $path = shift;
+	local $project = shift;
 
-	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/cloneurl"
+	open my $fd, "$projectroot/$project/cloneurl"
 		or return wantarray ?
 		@{ config_to_multi(git_get_project_config('url')) } :
 		   config_to_multi(git_get_project_config('url'));
@@ -1863,11 +2260,10 @@ sub git_get_project_list_from_file {
 }
 
 sub git_get_project_owner {
-	my $project = shift;
+	local $project = shift;
 	my $owner;
 
 	return undef unless $project;
-	$git_dir = "$projectroot/$project";
 
 	if (!defined $gitweb_project_owner) {
 		git_get_project_list_from_file();
@@ -1880,44 +2276,45 @@ sub git_get_project_owner {
 		$owner = git_get_project_config('owner');
 	}
 	if (!defined $owner) {
-		$owner = get_file_owner("$git_dir");
+		$owner = get_file_owner("$projectroot/$project");
 	}
 
 	return $owner;
 }
 
 sub git_get_last_activity {
-	my ($path) = @_;
-	my $fd;
+	my $path = shift;
 
-	$git_dir = "$projectroot/$path";
-	open($fd, "-|", git_cmd(), 'for-each-ref',
-	     '--format=%(committer)',
-	     '--sort=-committerdate',
-	     '--count=1',
-	     'refs/heads') or return;
-	my $most_recent = <$fd>;
-	close $fd or return;
-	if (defined $most_recent &&
-	    $most_recent =~ / (\d+) [-+][01]\d\d\d$/) {
-		my $timestamp = $1;
-		my $age = time - $timestamp;
-		return ($age, age_string($age));
-	}
-	return (undef, undef);
+	chomp(my $most_recent = $repo_root->repo(directory => $path)->cmd_output(
+		      cmd => [ qw(for-each-ref --count=1 --format=%(committer)),
+			       qw(--sort=-committerdate refs/heads) ],
+		      cache => 1, max_exit_code => 255)) or return;
+	$most_recent =~ / (\d+) [-+][01]\d\d\d$/ or return;
+	my $timestamp = $1;
+	my $age = time - $timestamp;
+	return ($age, age_string($age));
 }
 
+# Return a hashref from SHA1s to arrayrefs of ref names.  Example:
+# { '7e51...' => ['tags/tag-object'], # tag SHA1
+#   '51ba...' => ['tags/tag-object'], # referenced commit SHA1
+#   '3c4a...' => ['heads/master', 'tags/another-tag'] }
 sub git_get_references {
 	my $type = shift || "";
 	my %refs;
-	# 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c refs/tags/v2.6.11
-	# c39ae07f393806ccf406ef966e9a15afc43cc36a refs/tags/v2.6.11^{}
-	open my $fd, "-|", git_cmd(), "show-ref", "--dereference",
-		($type ? ("--", "refs/$type") : ()) # use -- <pattern> if $type
-		or return;
-
-	while (my $line = <$fd>) {
-		chomp $line;
+	# This is not implementable in terms of $repo->get_refs
+	# because get_refs doesn't dereference, and we cannot
+	# dereference a lot of SHA1s ourselves as long as there is no
+	# implementation that uses Cache::Memcached->get_multi.
+	# Hence, we use cmd_output.
+	my @lines = split "\n", $repo->cmd_output(
+		cmd => ['show-ref', '--dereference',
+			($type ? ("--", "refs/$type") : ())],
+		max_exit_code => 1,  # exits with status 1 on empty repos
+		cache => 1);
+	for my $line (@lines) {
+		# 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c refs/tags/v2.6.11
+		# c39ae07f393806ccf406ef966e9a15afc43cc36a refs/tags/v2.6.11^{}
 		if ($line =~ m!^([0-9a-fA-F]{40})\srefs/($type/?[^^]+)!) {
 			if (defined $refs{$1}) {
 				push @{$refs{$1}}, $2;
@@ -1926,24 +2323,15 @@ sub git_get_references {
 			}
 		}
 	}
-	close $fd or return;
 	return \%refs;
 }
 
 sub git_get_rev_name_tags {
-	my $hash = shift || return undef;
+	my $hash = shift or die_error();
 
-	open my $fd, "-|", git_cmd(), "name-rev", "--tags", $hash
-		or return;
-	my $name_rev = <$fd>;
-	close $fd;
-
-	if ($name_rev =~ m|^$hash tags/(.*)$|) {
-		return $1;
-	} else {
-		# catches also '$hash undefined' output
-		return undef;
-	}
+	my $name = $repo->name_rev($hash);
+	$name =~ s!^tags/!! if $name;
+	return $name;
 }
 
 ## ----------------------------------------------------------------------
@@ -1982,14 +2370,12 @@ sub parse_date {
 }
 
 sub parse_tag {
-	my $tag_id = shift;
+	my $sha1 = shift;
 	my %tag;
 	my @comment;
 
-	open my $fd, "-|", git_cmd(), "cat-file", "tag", $tag_id or return;
-	$tag{'id'} = $tag_id;
-	while (my $line = <$fd>) {
-		chomp $line;
+	my ($raw_header, $raw_comment) = split "\n\n", $repo->cat_file($sha1), 2;
+	for my $line (split "\n", $raw_header) {
 		if ($line =~ m/^object ([0-9a-fA-F]{40})$/) {
 			$tag{'object'} = $1;
 		} elsif ($line =~ m/^type (.+)$/) {
@@ -2000,19 +2386,9 @@ sub parse_tag {
 			$tag{'author'} = $1;
 			$tag{'epoch'} = $2;
 			$tag{'tz'} = $3;
-		} elsif ($line =~ m/--BEGIN/) {
-			push @comment, $line;
-			last;
-		} elsif ($line eq "") {
-			last;
 		}
 	}
-	push @comment, <$fd>;
-	$tag{'comment'} = \@comment;
-	close $fd or return;
-	if (!defined $tag{'name'}) {
-		return
-	};
+	$tag{'comment'} = [split "\n", $raw_comment];
 	return %tag
 }
 
@@ -2021,8 +2397,6 @@ sub parse_commit_text {
 	my @commit_lines = split '\n', $commit_text;
 	my %co;
 
-	pop @commit_lines; # Remove '\0'
-
 	if (! @commit_lines) {
 		return;
 	}
@@ -2117,48 +2491,38 @@ sub parse_commit_text {
 
 sub parse_commit {
 	my ($commit_id) = @_;
-	my %co;
-
-	local $/ = "\0";
-
-	open my $fd, "-|", git_cmd(), "rev-list",
-		"--parents",
-		"--header",
-		"--max-count=1",
-		$commit_id,
-		"--",
-		or die_error(500, "Open git-rev-list failed");
-	%co = parse_commit_text(<$fd>, 1);
-	close $fd;
+	# Callers may pass arbitrary object names, so we resolve
+	# $commit_id to a SHA1 here.
+	my $sha1 = git_get_sha1_or_die($commit_id, 'commit');
 
-	return %co;
+	# rev-list --header formats commits slightly differently than
+	# raw cat-file output, so we cannot use cat_file here.  This
+	# function should be replaced by $repo->get_commit anyway.
+	my $commit_text = $repo->cmd_output(
+		cmd => [qw(rev-list --parents --header --max-count=1), $sha1, '--'],
+		cache => 2);
+	$commit_text =~ s/\x00$//;
+	return parse_commit_text($commit_text, 1);
 }
 
 sub parse_commits {
 	my ($commit_id, $maxcount, $skip, $filename, @args) = @_;
-	my @cos;
+	# Callers may pass arbitrary object names, so we resolve
+	# $commit_id to a SHA1 here.
+	my $sha1 = git_get_sha1_or_die($commit_id);
+	git_get_sha1_or_die("$sha1:$filename") if $filename;  # check existence
 
 	$maxcount ||= 1;
 	$skip ||= 0;
 
-	local $/ = "\0";
-
-	open my $fd, "-|", git_cmd(), "rev-list",
-		"--header",
-		@args,
-		("--max-count=" . $maxcount),
-		("--skip=" . $skip),
-		@extra_options,
-		$commit_id,
-		"--",
-		($filename ? ($filename) : ())
-		or die_error(500, "Open git-rev-list failed");
-	while (my $line = <$fd>) {
-		my %co = parse_commit_text($line);
-		push @cos, \%co;
-	}
-	close $fd;
-
+	# TODO: Integrate this into Git::Repo, and get rid of @args
+	# and @extra_options.
+	my @raw_commits = split "\0", $repo->cmd_output(
+		cmd => ['rev-list', '--header', @args, "--max-count=$maxcount",
+			"--skip=$skip", @extra_options,	$sha1, "--",
+			$filename || ()],
+		cache => 2);
+	my @cos = map { { parse_commit_text($_) } } @raw_commits;
 	return wantarray ? @cos : \@cos;
 }
 
@@ -2263,7 +2627,9 @@ sub parse_from_to_diffinfo {
 		# ordinary (not combined) diff
 		$from->{'file'} = $diffinfo->{'from_file'};
 		if ($diffinfo->{'status'} ne "A") { # not new (added) file
-			$from->{'href'} = href(action=>"blob", hash_base=>$hash_parent,
+			$from->{'href'} = href(action=>"blob",
+					       $hash_parent && git_get_type($hash_parent) eq 'commit' ?
+					       (hash_base=>$hash_parent) : (),
 			                       hash=>$diffinfo->{'from_id'},
 			                       file_name=>$from->{'file'});
 		} else {
@@ -2273,7 +2639,9 @@ sub parse_from_to_diffinfo {
 
 	$to->{'file'} = $diffinfo->{'to_file'};
 	if (!is_deleted($diffinfo)) { # file exists in result
-		$to->{'href'} = href(action=>"blob", hash_base=>$hash,
+		$to->{'href'} = href(action=>"blob",
+				     $hash && git_get_type($hash) eq 'commit' ?
+				     (hash_base=>$hash) : (),
 		                     hash=>$diffinfo->{'to_id'},
 		                     file_name=>$to->{'file'});
 	} else {
@@ -2288,15 +2656,15 @@ sub git_get_heads_list {
 	my $limit = shift;
 	my @headslist;
 
-	open my $fd, '-|', git_cmd(), 'for-each-ref',
-		($limit ? '--count='.($limit+1) : ()), '--sort=-committerdate',
-		'--format=%(objectname) %(refname) %(subject)%00%(committer)',
-		'refs/heads'
-		or return;
-	while (my $line = <$fd>) {
+	my @lines = split "\n", $repo->cmd_output(
+		cmd => ['for-each-ref', '--sort=-committerdate',
+			$limit ? '--count='.($limit+1) : (),
+			'--format=%(objectname) %(refname) %(subject)%00%(committer)',
+			'refs/heads'],
+		cache => 1);
+	for my $line (@lines) {
 		my %ref_item;
 
-		chomp $line;
 		my ($refinfo, $committerinfo) = split(/\0/, $line);
 		my ($hash, $name, $title) = split(' ', $refinfo, 3);
 		my ($committer, $epoch, $tz) =
@@ -2316,7 +2684,6 @@ sub git_get_heads_list {
 
 		push @headslist, \%ref_item;
 	}
-	close $fd;
 
 	return wantarray ? @headslist : \@headslist;
 }
@@ -2325,16 +2692,16 @@ sub git_get_tags_list {
 	my $limit = shift;
 	my @tagslist;
 
-	open my $fd, '-|', git_cmd(), 'for-each-ref',
-		($limit ? '--count='.($limit+1) : ()), '--sort=-creatordate',
-		'--format=%(objectname) %(objecttype) %(refname) '.
-		'%(*objectname) %(*objecttype) %(subject)%00%(creator)',
-		'refs/tags'
-		or return;
-	while (my $line = <$fd>) {
+	my @lines = split "\n", $repo->cmd_output(
+		cmd => ['for-each-ref', '--sort=-creatordate',
+			($limit ? '--count='.($limit+1) : ()),
+			'--format=%(objectname) %(objecttype) %(refname) '.
+			'%(*objectname) %(*objecttype) %(subject)%00%(creator)',
+			'refs/tags'],
+		cache => 1);
+	for my $line (@lines) {
 		my %ref_item;
 
-		chomp $line;
 		my ($refinfo, $creatorinfo) = split(/\0/, $line);
 		my ($id, $type, $name, $refid, $reftype, $title) = split(' ', $refinfo, 6);
 		my ($creator, $epoch, $tz) =
@@ -2365,7 +2732,6 @@ sub git_get_tags_list {
 
 		push @tagslist, \%ref_item;
 	}
-	close $fd;
 
 	return wantarray ? @tagslist : \@tagslist;
 }
@@ -2429,8 +2795,16 @@ sub mimetype_guess {
 	return $mime;
 }
 
+# Replacement for (heuristic) -T operator.  (perldoc -f -T)
+sub is_ascii_text {
+	my $text = shift;
+	return !length($text) || ((grep { ord($_) > 127 } split('', $text)) / length $text) <= 0.3;
+}
+
+# Determine the MIME type of a blob based on its file name ($filename)
+# and its first n bytes ($snippet).
 sub blob_mimetype {
-	my $fd = shift;
+	my $snippet = shift;
 	my $filename = shift;
 
 	if ($filename) {
@@ -2438,10 +2812,7 @@ sub blob_mimetype {
 		$mime and return $mime;
 	}
 
-	# just in case
-	return $default_blob_plain_mimetype unless $fd;
-
-	if (-T $fd) {
+	if (is_ascii_text($snippet)) {
 		return 'text/plain';
 	} elsif (! $filename) {
 		return 'application/octet-stream';
@@ -2457,9 +2828,9 @@ sub blob_mimetype {
 }
 
 sub blob_contenttype {
-	my ($fd, $file_name, $type) = @_;
+	my ($snippet, $file_name, $type) = @_;
 
-	$type ||= blob_mimetype($fd, $file_name);
+	$type ||= blob_mimetype($snippet, $file_name);
 	if ($type eq 'text/plain' && defined $default_text_plain_charset) {
 		$type .= "; charset=$default_text_plain_charset";
 	}
@@ -2472,7 +2843,6 @@ sub blob_contenttype {
 
 sub git_header_html {
 	my $status = shift || "200 OK";
-	my $expires = shift;
 
 	my $title = "$site_name";
 	if (defined $project) {
@@ -2500,7 +2870,7 @@ sub git_header_html {
 		$content_type = 'text/html';
 	}
 	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
-	                   -status=> $status, -expires => $expires);
+	                   -status=> $status, -cache_control => 'no-cache');
 	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
 	print <<EOF;
 <?xml version="1.0" encoding="utf-8"?>
@@ -2573,7 +2943,7 @@ EOF
 	print "</head>\n" .
 	      "<body>\n";
 
-	if (-f $site_header) {
+	if ($site_header && -f $site_header) {
 		open (my $fd, $site_header);
 		print <$fd>;
 		close $fd;
@@ -2593,8 +2963,8 @@ EOF
 	}
 	print "</div>\n";
 
-	my ($have_search) = gitweb_check_feature('search');
-	if (defined $project && $have_search) {
+	if (defined $project && gitweb_check_feature('search') &&
+	    $repo->get_sha1('HEAD')) {
 		if (!defined $searchtext) {
 			$searchtext = "";
 		}
@@ -2662,6 +3032,39 @@ sub git_footer_html {
 	}
 	print "</div>\n"; # class="page_footer"
 
+	if ($page_info) {
+		print "<div class=\"page_info\">\n";
+		my $print_stats = sub {
+			my ($name, $cache_exists) = (shift, shift);
+			my %s = @_;  # statistics hash
+			if ($cache_exists) {
+				my $gets = $s{hits} + $s{misses};
+				print "<p>" . ucfirst($name) . ": " .
+				    "<b>$gets</b> gets " .
+				    "(<b>$s{hits}</b> hits + ".
+				    "<b>$s{misses}</b> misses); " .
+				    "<b>$s{sets}</b> sets, " .
+				    "<b>$s{failed_sets}</b> failed sets.</p>\n";
+				if ($page_info == 2 && @{$s{get_list}}) {
+					print "<pre class=\"cache_list\">";
+					print join("\n",
+						   map(esc_html(join ',',
+								map(defined $_ ? " $_" : '',
+								    @$_)),
+						       @{$s{get_list}}));
+					print "</pre>\n";
+				}
+			} else {
+				print "<p><i>No $name.</i></p>\n";
+			}
+		};
+		$print_stats->('main cache', $cache,
+			       %CachedRepo::cache_statistics);
+		$print_stats->('large cache', $large_cache_root,
+			       %CachedRepo::large_cache_statistics);
+		print "</div>\n"; # class="page_info"
+	}
+
 	if (-f $site_footer) {
 		open (my $fd, $site_footer);
 		print <$fd>;
@@ -2710,6 +3113,8 @@ sub git_print_page_nav {
 	my ($current, $suppress, $head, $treehead, $treebase, $extra) = @_;
 	$extra = '' if !defined $extra; # pager or formats
 
+	return unless $repo->get_sha1('HEAD');  # no navigation for empty repos
+
 	my @navs = qw(summary shortlog log commit commitdiff tree);
 	if ($suppress) {
 		@navs = grep { $_ ne $suppress } @navs;
@@ -2893,20 +3298,8 @@ sub git_print_log ($;%) {
 
 # return link target (what link points to)
 sub git_get_link_target {
-	my $hash = shift;
-	my $link_target;
-
-	# read link
-	open my $fd, "-|", git_cmd(), "cat-file", "blob", $hash
-		or return;
-	{
-		local $/;
-		$link_target = <$fd>;
-	}
-	close $fd
-		or return;
-
-	return $link_target;
+	my $sha1 = shift;
+	return $repo->cat_file($sha1);
 }
 
 # given link target, and the directory (basedir) the link is in,
@@ -3383,7 +3776,7 @@ sub git_difftree_body {
 }
 
 sub git_patchset_body {
-	my ($fd, $difftree, $hash, @hash_parents) = @_;
+	my ($read, $difftree, $hash, @hash_parents) = @_;
 	my ($hash_parent) = $hash_parents[0];
 
 	my $is_combined = (@hash_parents > 1);
@@ -3397,7 +3790,7 @@ sub git_patchset_body {
 	print "<div class=\"patchset\">\n";
 
 	# skip to first patch
-	while ($patch_line = <$fd>) {
+	while ($patch_line = $read->()) {
 		chomp $patch_line;
 
 		last if ($patch_line =~ m/^diff /);
@@ -3465,7 +3858,7 @@ sub git_patchset_body {
 		# print extended diff header
 		print "<div class=\"diff extended_header\">\n";
 	EXTENDED_HEADER:
-		while ($patch_line = <$fd>) {
+		while ($patch_line = $read->()) {
 			chomp $patch_line;
 
 			last EXTENDED_HEADER if ($patch_line =~ m/^--- |^diff /);
@@ -3484,7 +3877,7 @@ sub git_patchset_body {
 		#assert($patch_line =~ m/^---/) if DEBUG;
 
 		my $last_patch_line = $patch_line;
-		$patch_line = <$fd>;
+		$patch_line = $read->();
 		chomp $patch_line;
 		#assert($patch_line =~ m/^\+\+\+/) if DEBUG;
 
@@ -3494,7 +3887,7 @@ sub git_patchset_body {
 
 		# the patch itself
 	LINE:
-		while ($patch_line = <$fd>) {
+		while ($patch_line = $read->()) {
 			chomp $patch_line;
 
 			next PATCH if ($patch_line =~ m/^diff /);
@@ -3545,9 +3938,6 @@ sub fill_project_list_info {
  PROJECT:
 	foreach my $pr (@$projlist) {
 		my (@activity) = git_get_last_activity($pr->{'path'});
-		unless (@activity) {
-			next PROJECT;
-		}
 		($pr->{'age'}, $pr->{'age_string'}) = @activity;
 		if (!defined $pr->{'descr'}) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
@@ -3585,7 +3975,9 @@ sub print_sort_th {
 		if ($str_sort) {
 			@$list = sort {$a->{$key} cmp $b->{$key}} @$list;
 		} else {
-			@$list = sort {$a->{$key} <=> $b->{$key}} @$list;
+			# Sort undefined keys last.
+			@$list = sort { (defined $a->{$key} ? $a->{$key} : 1e30) <=>
+					(defined $b->{$key} ? $b->{$key} : 1e30)} @$list;
 		}
 		print "<th>$header</th>\n";
 	} else {
@@ -3653,16 +4045,20 @@ sub git_project_list_body {
 		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
 		                        -class => "list", -title => $pr->{'descr_long'}},
 		                        esc_html($pr->{'descr'})) . "</td>\n" .
-		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
-		print "<td class=\"". age_class($pr->{'age'}) . "\">" .
+		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n" .
+		      "<td class=\"". age_class($pr->{'age'}) . "\">" .
 		      (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
 		      "<td class=\"link\">" .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary")   . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
-		      ($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
-		      "</td>\n" .
+		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary");
+		if ($pr->{'age_string'}) {
+			# Non-empty repository.
+			print " | " .
+			    $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
+			    $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
+			    $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree");
+		}
+		print " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") if $pr->{'forks'};
+		print "</td>\n" .
 		      "</tr>\n";
 	}
 	if (defined $extra) {
@@ -3760,8 +4156,8 @@ sub git_history_body {
 		      $cgi->a({-href => href(action=>"commitdiff", hash=>$commit)}, "commitdiff");
 
 		if ($ftype eq 'blob') {
-			my $blob_current = git_get_hash_by_path($hash_base, $file_name);
-			my $blob_parent  = git_get_hash_by_path($commit, $file_name);
+			my $blob_current = git_get_sha1_by_path($hash_base, $file_name);
+			my $blob_parent = git_get_sha1_by_path($commit, $file_name);
 			if (defined $blob_current && defined $blob_parent &&
 					$blob_current ne $blob_parent) {
 				print " | " .
@@ -4016,9 +4412,9 @@ sub git_project_index {
 
 sub git_summary {
 	my $descr = git_get_project_description($project) || "none";
-	my %co = parse_commit("HEAD");
-	my %cd = %co ? parse_date($co{'committer_epoch'}, $co{'committer_tz'}) : ();
-	my $head = $co{'id'};
+	my $head = $repo->get_sha1('HEAD', 'commit');
+	my %co = $head ? parse_commit($head) : ();
+	my %cd = $head ? parse_date($co{'committer_epoch'}, $co{'committer_tz'}) : ();
 
 	my $owner = git_get_project_owner($project);
 
@@ -4037,7 +4433,7 @@ sub git_summary {
 	git_header_html();
 	git_print_page_nav('summary','', $head);
 
-	print "<div class=\"title\">&nbsp;</div>\n";
+	print "<div class=\"title\">&nbsp;</div>\n" if $head;
 	print "<table class=\"projects_list\">\n" .
 	      "<tr><td>description</td><td>" . esc_html($descr) . "</td></tr>\n" .
 	      "<tr><td>owner</td><td>" . esc_html($owner) . "</td></tr>\n";
@@ -4103,14 +4499,12 @@ sub git_summary {
 }
 
 sub git_tag {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
+	my $sha1 = git_get_sha1_or_die($hash, 'tag');
 	git_header_html();
 	git_print_page_nav('','', $head,undef,$head);
-	my %tag = parse_tag($hash);
-
-	if (! %tag) {
-		die_error(404, "Unknown tag object");
-	}
+	# TODO: This wants to become $repo->get_tag.
+	my %tag = parse_tag($sha1);
 
 	git_print_header_div('commit', esc_html($tag{'name'}), $hash);
 	print "<div class=\"title_text\">\n" .
@@ -4134,7 +4528,6 @@ sub git_tag {
 	print "<div class=\"page_body\">";
 	my $comment = $tag{'comment'};
 	foreach my $line (@$comment) {
-		chomp $line;
 		print esc_html($line, -nbsp=>1) . "<br/>\n";
 	}
 	print "</div>\n";
@@ -4145,25 +4538,19 @@ sub git_blame {
 	my $fd;
 	my $ftype;
 
-	gitweb_check_feature('blame')
-	    or die_error(403, "Blame view not allowed");
+	die_error(403, "Blame view not allowed")
+	    unless gitweb_check_feature('blame');
 
 	die_error(400, "No file name given") unless $file_name;
-	$hash_base ||= git_get_head_hash($project);
-	die_error(404, "Couldn't find base commit") unless ($hash_base);
-	my %co = parse_commit($hash_base)
-		or die_error(404, "Commit not found");
-	if (!defined $hash) {
-		$hash = git_get_hash_by_path($hash_base, $file_name, "blob")
-			or die_error(404, "Error looking up file");
-	}
-	$ftype = git_get_type($hash);
-	if ($ftype !~ "blob") {
-		die_error(400, "Object is not a blob");
-	}
-	open ($fd, "-|", git_cmd(), "blame", '-p', '--',
-	      $file_name, $hash_base)
-		or die_error(500, "Open git-blame failed");
+	$hash_base ||= git_get_head_hash();
+	my $hash_base_sha1 = git_get_sha1_or_die($hash_base, 'commit');
+	my %co = parse_commit($hash_base);
+	$hash ||= git_get_sha1_by_path($hash_base, $file_name, 'blob')
+	    or die_error(404, "Error looking up file: '$file_name'");
+	git_get_sha1_or_die($hash, 'blob');  # check existence
+	my @blame_lines = split "\n", $repo->cmd_output(
+		cmd => ['blame', '-p', '--', $file_name, $hash_base_sha1],
+		cache => 2);
 	git_header_html();
 	my $formats_nav =
 		$cgi->a({-href => href(action=>"blob", -replay=>1)},
@@ -4188,7 +4575,7 @@ sub git_blame {
 HTML
 	my %metainfo = ();
 	while (1) {
-		$_ = <$fd>;
+		$_ = shift @blame_lines;
 		last unless defined $_;
 		my ($full_rev, $orig_lineno, $lineno, $group_size) =
 		    /^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$/;
@@ -4196,14 +4583,13 @@ HTML
 			$metainfo{$full_rev} = {};
 		}
 		my $meta = $metainfo{$full_rev};
-		while (<$fd>) {
+		while ($_ = shift @blame_lines) {
 			last if (s/^\t//);
 			if (/^(\S+) (.*)$/) {
 				$meta->{$1} = $2;
 			}
 		}
 		my $data = $_;
-		chomp $data;
 		my $rev = substr($full_rev, 0, 8);
 		my $author = $meta->{'author'};
 		my %date = parse_date($meta->{'author-time'},
@@ -4224,11 +4610,9 @@ HTML
 			              esc_html($rev));
 			print "</td>\n";
 		}
-		open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^")
-			or die_error(500, "Open git-rev-parse failed");
-		my $parent_commit = <$dd>;
-		close $dd;
-		chomp($parent_commit);
+		# TODO: $parent_commit can be undef, in which case the
+		# link becomes invalid.
+		my $parent_commit = $repo->get_sha1("$full_rev^");
 		my $blamed = href(action => 'blame',
 		                  file_name => $meta->{'filename'},
 		                  hash_base => $parent_commit);
@@ -4243,13 +4627,11 @@ HTML
 	}
 	print "</table>\n";
 	print "</div>";
-	close $fd
-		or print "Reading blob failed\n";
 	git_footer_html();
 }
 
 sub git_tags {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	git_header_html();
 	git_print_page_nav('','', $head,undef,$head);
 	git_print_header_div('summary', $project);
@@ -4262,7 +4644,7 @@ sub git_tags {
 }
 
 sub git_heads {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	git_header_html();
 	git_print_page_nav('','', $head,undef,$head);
 	git_print_header_div('summary', $project);
@@ -4280,9 +4662,9 @@ sub git_blob_plain {
 
 	if (!defined $hash) {
 		if (defined $file_name) {
-			my $base = $hash_base || git_get_head_hash($project);
-			$hash = git_get_hash_by_path($base, $file_name, "blob")
-				or die_error(404, "Cannot find file");
+			my $base = $hash_base || git_get_head_hash();
+			$hash = git_get_sha1_by_path($base, $file_name, 'blob')
+			    or die_error(404, "Cannot find file: '$file_name'");
 		} else {
 			die_error(400, "No file name defined");
 		}
@@ -4291,11 +4673,15 @@ sub git_blob_plain {
 		$expires = "+1d";
 	}
 
-	open my $fd, "-|", git_cmd(), "cat-file", "blob", $hash
-		or die_error(500, "Open git-cat-file blob '$hash' failed");
+	my $sha1 = git_get_sha1_or_die($hash, 'blob');
+	my $blob_read = $repo->progressive_cmd_output(
+		cmd => ['cat-file', 'blob', $sha1], separator => undef,
+		cache => 1);
+	my $first_chunk = $blob_read->() || '';
 
 	# content-type (can include charset)
-	$type = blob_contenttype($fd, $file_name, $type);
+	$type = blob_contenttype(substr($first_chunk, 0, 1024),
+				 $file_name, $type);
 
 	# "save as" filename, even when no $file_name is given
 	my $save_as = "$hash";
@@ -4309,42 +4695,47 @@ sub git_blob_plain {
 		-type => $type,
 		-expires => $expires,
 		-content_disposition => 'inline; filename="' . $save_as . '"');
-	undef $/;
-	binmode STDOUT, ':raw';
-	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
-	$/ = "\n";
-	close $fd;
+	{
+		local $/;
+		binmode STDOUT, ':raw';
+		print $first_chunk;
+		while (my $chunk = $blob_read->()) {
+			print $chunk;
+		}
+		binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	}
 }
 
 sub git_blob {
-	my $expires;
-
 	if (!defined $hash) {
 		if (defined $file_name) {
-			my $base = $hash_base || git_get_head_hash($project);
-			$hash = git_get_hash_by_path($base, $file_name, "blob")
-				or die_error(404, "Cannot find file");
+			my $base = $hash_base || git_get_head_hash();
+			$hash = git_get_sha1_by_path($base, $file_name, 'blob')
+			    or die_error(404, "Cannot find file: '$file_name'");
 		} else {
 			die_error(400, "No file name defined");
 		}
-	} elsif ($hash =~ m/^[0-9a-fA-F]{40}$/) {
-		# blobs defined by non-textual hash id's can be cached
-		$expires = "+1d";
 	}
 
 	my ($have_blame) = gitweb_check_feature('blame');
-	open my $fd, "-|", git_cmd(), "cat-file", "blob", $hash
-		or die_error(500, "Couldn't cat $file_name, $hash");
-	my $mimetype = blob_mimetype($fd, $file_name);
-	if ($mimetype !~ m!^(?:text/|image/(?:gif|png|jpeg)$)! && -B $fd) {
-		close $fd;
-		return git_blob_plain($mimetype);
-	}
+	my $sha1 = git_get_sha1_or_die($hash, 'blob');
+	my $blob_read = $repo->progressive_cmd_output(
+		cmd => ['cat-file', 'blob', $sha1], separator => "\n",
+		cache => 1);
+	my @first_lines;
+	for my $i (0..20) {
+		my $line = $blob_read->() or last;
+		push @first_lines, $line;
+	}
+	my $test_snippet = join("\n", @first_lines);
+	my $mimetype = blob_mimetype($test_snippet, $file_name);
+	return git_blob_plain($mimetype)
+	    if ($mimetype !~ m!^(?:text/|image/(?:gif|png|jpeg)$)! &&
+		! is_ascii_text($test_snippet));
 	# we can have blame only for text/* mimetype
 	$have_blame &&= ($mimetype =~ m!^text/!);
 
-	git_header_html(undef, $expires);
+	git_header_html();
 	my $formats_nav = '';
 	if (defined $hash_base && (my %co = parse_commit($hash_base))) {
 		if (defined $file_name) {
@@ -4389,7 +4780,7 @@ sub git_blob {
 		      qq!" />\n!;
 	} else {
 		my $nr;
-		while (my $line = <$fd>) {
+		while (my $line = shift @first_lines || $blob_read->()) {
 			chomp $line;
 			$nr++;
 			$line = untabify($line);
@@ -4397,29 +4788,23 @@ sub git_blob {
 			       $nr, $nr, $nr, esc_html($line, -nbsp=>1);
 		}
 	}
-	close $fd
-		or print "Reading blob failed.\n";
 	print "</div>";
 	git_footer_html();
 }
 
 sub git_tree {
-	if (!defined $hash_base) {
-		$hash_base = "HEAD";
-	}
+	$hash_base ||= "HEAD";
 	if (!defined $hash) {
 		if (defined $file_name) {
-			$hash = git_get_hash_by_path($hash_base, $file_name, "tree");
+			$hash = git_get_sha1_by_path($hash_base, $file_name, 'tree')
+			    or die_error(404, "Cannot find file: '$file_name'");
 		} else {
 			$hash = $hash_base;
 		}
 	}
-	$/ = "\0";
-	open my $fd, "-|", git_cmd(), "ls-tree", '-z', $hash
-		or die_error(500, "Open git-ls-tree failed");
-	my @entries = map { chomp; $_ } <$fd>;
-	close $fd or die_error(404, "Reading tree failed");
-	$/ = "\n";
+	my $sha1 = git_get_sha1_or_die($hash, 'tree');
+	my @entries = split "\0", $repo->cmd_output(
+		cmd => ['ls-tree', '-z', $sha1], cache => 2);
 
 	my $refs = git_get_references();
 	my $ref = format_ref_marker($refs, $hash_base);
@@ -4520,9 +4905,8 @@ sub git_snapshot {
 		die_error(403, "Unsupported snapshot format");
 	}
 
-	if (!defined $hash) {
-		$hash = git_get_head_hash($project);
-	}
+	$hash ||= git_get_head_hash();
+	my $sha1 = git_get_sha1_or_die($hash);
 
 	my $name = $project;
 	$name =~ s,([^/])/*\.git$,$1,;
@@ -4532,9 +4916,9 @@ sub git_snapshot {
 	my $cmd;
 	$filename .= "-$hash$known_snapshot_formats{$format}{'suffix'}";
 	$cmd = quote_command(
-		git_cmd(), 'archive',
+		'archive',
 		"--format=$known_snapshot_formats{$format}{'format'}",
-		"--prefix=$name/", $hash);
+		"--prefix=$name/", $sha1);
 	if (exists $known_snapshot_formats{$format}{'compressor'}) {
 		$cmd .= ' | ' . quote_command(@{$known_snapshot_formats{$format}{'compressor'}});
 	}
@@ -4544,16 +4928,17 @@ sub git_snapshot {
 		-content_disposition => 'inline; filename="' . "$filename" . '"',
 		-status => '200 OK');
 
-	open my $fd, "-|", $cmd
-		or die_error(500, "Execute git-archive failed");
+	my $snapshot_read = $repo->progressive_cmd_output(
+		cmd => $cmd, separator => undef, cache => 1);
 	binmode STDOUT, ':raw';
-	print <$fd>;
+	while (my $chunk = $snapshot_read->()) {
+		print $chunk;
+	}
 	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
-	close $fd;
 }
 
 sub git_log {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	if (!defined $hash) {
 		$hash = $head;
 	}
@@ -4613,8 +4998,8 @@ sub git_log {
 
 sub git_commit {
 	$hash ||= $hash_base || "HEAD";
-	my %co = parse_commit($hash)
-	    or die_error(404, "Unknown commit object");
+	my $sha1 = git_get_sha1_or_die($hash, 'commit');
+	my %co = parse_commit($hash);
 	my %ad = parse_date($co{'author_epoch'}, $co{'author_tz'});
 	my %cd = parse_date($co{'committer_epoch'}, $co{'committer_tz'});
 
@@ -4649,24 +5034,15 @@ sub git_commit {
 	if (!defined $parent) {
 		$parent = "--root";
 	}
-	my @difftree;
-	open my $fd, "-|", git_cmd(), "diff-tree", '-r', "--no-commit-id",
-		@diff_opts,
-		(@$parents <= 1 ? $parent : '-c'),
-		$hash, "--"
-		or die_error(500, "Open git-diff-tree failed");
-	@difftree = map { chomp; $_ } <$fd>;
-	close $fd or die_error(404, "Reading git-diff-tree failed");
-
-	# non-textual hash id's can be cached
-	my $expires;
-	if ($hash =~ m/^[0-9a-fA-F]{40}$/) {
-		$expires = "+1d";
-	}
+	my @difftree = split "\n", $repo->cmd_output(
+		cmd => ['diff-tree', '-r', '--no-commit-id', @diff_opts,
+			(@$parents <= 1 ? $parent : '-c'), $sha1, '--'],
+		cache => 2);
+
 	my $refs = git_get_references();
 	my $ref = format_ref_marker($refs, $co{'id'});
 
-	git_header_html(undef, $expires);
+	git_header_html();
 	git_print_page_nav('commit', '',
 	                   $hash, $co{'tree'}, $hash,
 	                   $formats_nav);
@@ -4743,41 +5119,16 @@ sub git_object {
 	# - hash_base and file_name
 	my $type;
 
-	# - hash or hash_base alone
 	if ($hash || ($hash_base && !defined $file_name)) {
-		my $object_id = $hash || $hash_base;
-
-		open my $fd, "-|", quote_command(
-			git_cmd(), 'cat-file', '-t', $object_id) . ' 2> /dev/null'
-			or die_error(404, "Object does not exist");
-		$type = <$fd>;
-		chomp $type;
-		close $fd
-			or die_error(404, "Object does not exist");
-
-	# - hash_base and file_name
+		# hash or hash_base alone
+		$type = (git_get_sha1_or_die($hash || $hash_base))[1];
 	} elsif ($hash_base && defined $file_name) {
+		# hash_base and file_name
 		$file_name =~ s,/+$,,;
-
-		system(git_cmd(), "cat-file", '-e', $hash_base) == 0
-			or die_error(404, "Base object does not exist");
-
-		# here errors should not hapen
-		open my $fd, "-|", git_cmd(), "ls-tree", $hash_base, "--", $file_name
-			or die_error(500, "Open git-ls-tree failed");
-		my $line = <$fd>;
-		close $fd;
-
-		#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
-		unless ($line && $line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/) {
-			die_error(404, "File or directory for given base does not exist");
-		}
-		$type = $2;
-		$hash = $3;
+		($hash, $type) = git_get_sha1_or_die("$hash_base:$file_name");
 	} else {
 		die_error(400, "Not enough information to find object");
 	}
-
 	print $cgi->redirect(-uri => href(action=>$type, -full=>1,
 	                                  hash=>$hash, hash_base=>$hash_base,
 	                                  file_name=>$file_name),
@@ -4787,47 +5138,25 @@ sub git_object {
 sub git_blobdiff {
 	my $format = shift || 'html';
 
-	my $fd;
 	my @difftree;
 	my %diffinfo;
-	my $expires;
+	my $diff_read;
 
-	# preparing $fd and %diffinfo for git_patchset_body
+	# prepare $diff_read and %diffinfo for git_patchset_body
 	# new style URI
 	if (defined $hash_base && defined $hash_parent_base) {
-		if (defined $file_name) {
-			# read raw output
-			open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-				$hash_parent_base, $hash_base,
-				"--", (defined $file_parent ? $file_parent : ()), $file_name
-				or die_error(500, "Open git-diff-tree failed");
-			@difftree = map { chomp; $_ } <$fd>;
-			close $fd
-				or die_error(404, "Reading git-diff-tree failed");
-			@difftree
-				or die_error(404, "Blob diff not found");
-
-		} elsif (defined $hash &&
-		         $hash =~ /[0-9a-fA-F]{40}/) {
-			# try to find filename from $hash
-
-			# read filtered raw output
-			open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-				$hash_parent_base, $hash_base, "--"
-				or die_error(500, "Open git-diff-tree failed");
-			@difftree =
-				# ':100644 100644 03b21826... 3b93d5e7... M	ls-files.c'
-				# $hash == to_id
-				grep { /^:[0-7]{6} [0-7]{6} [0-9a-fA-F]{40} $hash/ }
-				map { chomp; $_ } <$fd>;
-			close $fd
-				or die_error(404, "Reading git-diff-tree failed");
-			@difftree
-				or die_error(404, "Blob diff not found");
-
-		} else {
-			die_error(400, "Missing one of the blob diff parameters");
-		}
+		my $commit_sha1 = git_get_sha1_or_die($hash_base, 'commit');
+		my $parent_base_sha1 = git_get_sha1_or_die($hash_parent_base, 'commit');
+		git_get_sha1_or_die($file_parent, 'blob') if $file_parent;
+		# There used to be code to handle hash (h) parameters,
+		# but it's not used (anymore), so we can require $file_name.
+		die_error(400, "No file name given") unless $file_name;
+		# read raw output
+		@difftree = split "\n", $repo->cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts, $parent_base_sha1,
+				$commit_sha1, '--', $file_parent || (),
+				$file_name],
+			cache => 2);
 
 		if (@difftree > 1) {
 			die_error(400, "Ambiguous blob diff specification");
@@ -4840,21 +5169,17 @@ sub git_blobdiff {
 		$hash_parent ||= $diffinfo{'from_id'};
 		$hash        ||= $diffinfo{'to_id'};
 
-		# non-textual hash id's can be cached
-		if ($hash_base =~ m/^[0-9a-fA-F]{40}$/ &&
-		    $hash_parent_base =~ m/^[0-9a-fA-F]{40}$/) {
-			$expires = '+1d';
-		}
-
 		# open patch output
-		open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			'-p', ($format eq 'html' ? "--full-index" : ()),
-			$hash_parent_base, $hash_base,
-			"--", (defined $file_parent ? $file_parent : ()), $file_name
-			or die_error(500, "Open git-diff-tree failed");
+		my $hash_base_sha1 = git_get_sha1_or_die($hash_base, 'commit');
+		$diff_read = $repo->progressive_cmd_output(  # TODO: uncovered
+			cmd => ['diff-tree', '-r', @diff_opts, '-p',
+				($format eq 'html' ? '--full-index' : ()),
+				$parent_base_sha1, $hash_base_sha1, '--',
+				$file_parent || (), $file_name],
+			separator => "\n", cache => 1);
 	}
 
-	# old/legacy style URI
+	# old/legacy style URI (still used in feed [Atom/RSS] view)
 	if (!%diffinfo && # if new style URI failed
 	    defined $hash && defined $hash_parent) {
 		# fake git-diff-tree raw output
@@ -4877,17 +5202,14 @@ sub git_blobdiff {
 			$diffinfo{'to_file'}   = $hash;
 		}
 
-		# non-textual hash id's can be cached
-		if ($hash =~ m/^[0-9a-fA-F]{40}$/ &&
-		    $hash_parent =~ m/^[0-9a-fA-F]{40}$/) {
-			$expires = '+1d';
-		}
-
 		# open patch output
-		open $fd, "-|", git_cmd(), "diff", @diff_opts,
-			'-p', ($format eq 'html' ? "--full-index" : ()),
-			$hash_parent, $hash, "--"
-			or die_error(500, "Open git-diff failed");
+		my $parent_sha1 = git_get_sha1_or_die($hash_parent, 'blob');
+		my $sha1 = git_get_sha1_or_die($hash, 'commit');
+		$diff_read = $repo->progressive_cmd_output(
+			cmd => ['diff', @diff_opts, '-p',
+				($format eq 'html' ? '--full-index' : ()),
+				$parent_sha1, $sha1, '--'],
+			separator => "\n", cache => 1);
 	} else  {
 		die_error(400, "Missing one of the blob diff parameters")
 			unless %diffinfo;
@@ -4898,7 +5220,7 @@ sub git_blobdiff {
 		my $formats_nav =
 			$cgi->a({-href => href(action=>"blobdiff_plain", -replay=>1)},
 			        "raw");
-		git_header_html(undef, $expires);
+		git_header_html();
 		if (defined $hash_base && (my %co = parse_commit($hash_base))) {
 			git_print_page_nav('','', $hash_base,$co{'tree'},$hash_base, $formats_nav);
 			git_print_header_div('commit', esc_html($co{'title'}), $hash_base);
@@ -4913,6 +5235,8 @@ sub git_blobdiff {
 		}
 
 	} elsif ($format eq 'plain') {
+		my $expires = ($hash =~ m/^[0-9a-f]{40}$/ &&
+			       $hash_parent =~ m/^[0-9a-f]{40}$/) ? '+1d' : undef;
 		print $cgi->header(
 			-type => 'text/plain',
 			-charset => 'utf-8',
@@ -4929,14 +5253,13 @@ sub git_blobdiff {
 	if ($format eq 'html') {
 		print "<div class=\"page_body\">\n";
 
-		git_patchset_body($fd, [ \%diffinfo ], $hash_base, $hash_parent_base);
-		close $fd;
+		git_patchset_body($diff_read, [ \%diffinfo ], $hash_base, $hash_parent_base);
 
 		print "</div>\n"; # class="page_body"
 		git_footer_html();
 
 	} else {
-		while (my $line = <$fd>) {
+		while (my $line = $diff_read->()) {
 			$line =~ s!a/($hash|$hash_parent)!'a/'.esc_path($diffinfo{'from_file'})!eg;
 			$line =~ s!b/($hash|$hash_parent)!'b/'.esc_path($diffinfo{'to_file'})!eg;
 
@@ -4944,9 +5267,9 @@ sub git_blobdiff {
 
 			last if $line =~ m!^\+\+\+!;
 		}
-		local $/ = undef;
-		print <$fd>;
-		close $fd;
+		while (my $line = $diff_read->()) {
+			print $line;
+		}
 	}
 }
 
@@ -4957,12 +5280,20 @@ sub git_blobdiff_plain {
 sub git_commitdiff {
 	my $format = shift || 'html';
 	$hash ||= $hash_base || "HEAD";
-	my %co = parse_commit($hash)
-	    or die_error(404, "Unknown commit object");
+	my $sha1 = git_get_sha1_or_die($hash, 'commit');
+	my %co = parse_commit($hash);
 
 	# choose format for commitdiff for merge
-	if (! defined $hash_parent && @{$co{'parents'}} > 1) {
-		$hash_parent = '--cc';
+	my $hash_parent_param = $hash_parent;
+	# Unfortunately we can pass in command line options as
+	# $hash_parent.
+	if ($hash_parent_param && $hash_parent_param ne '-c' &&
+	    $hash_parent_param ne '--cc') {
+		$hash_parent_param =
+		    git_get_sha1_or_die($hash_parent_param, 'commit');
+	}
+	if (! defined $hash_parent_param && @{$co{'parents'}} > 1) {
+		$hash_parent_param = '--cc';
 	}
 	# we need to prepare $formats_nav before almost any parameter munging
 	my $formats_nav;
@@ -4971,8 +5302,7 @@ sub git_commitdiff {
 			$cgi->a({-href => href(action=>"commitdiff_plain", -replay=>1)},
 			        "raw");
 
-		if (defined $hash_parent &&
-		    $hash_parent ne '-c' && $hash_parent ne '--cc') {
+		if (defined $hash_parent) {
 			# commitdiff with two commits given
 			my $hash_parent_short = $hash_parent;
 			if ($hash_parent =~ m/^[0-9a-fA-F]{40}$/) {
@@ -5004,7 +5334,7 @@ sub git_commitdiff {
 				')';
 		} else {
 			# merge commit
-			if ($hash_parent eq '--cc') {
+			if ($hash_parent && $hash_parent eq '--cc') {
 				$formats_nav .= ' | ' .
 					$cgi->a({-href => href(action=>"commitdiff",
 					                       hash=>$hash, hash_parent=>'-c')},
@@ -5026,7 +5356,6 @@ sub git_commitdiff {
 		}
 	}
 
-	my $hash_parent_param = $hash_parent;
 	if (!defined $hash_parent_param) {
 		# --cc for multiple parents, --root for parentless
 		$hash_parent_param =
@@ -5034,34 +5363,26 @@ sub git_commitdiff {
 	}
 
 	# read commitdiff
-	my $fd;
+	my $diff_read;
 	my @difftree;
 	if ($format eq 'html') {
-		open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			"--no-commit-id", "--patch-with-raw", "--full-index",
-			$hash_parent_param, $hash, "--"
-			or die_error(500, "Open git-diff-tree failed");
-
-		while (my $line = <$fd>) {
+		$diff_read = $repo->progressive_cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts, '--no-commit-id',
+				'--patch-with-raw', '--full-index',
+				$hash_parent_param, $sha1, '--'],
+			separator => "\n", cache => 1);
+		while (my $line = $diff_read->()) {
 			chomp $line;
 			# empty line ends raw part of diff-tree output
 			last unless $line;
 			push @difftree, scalar parse_difftree_raw_line($line);
 		}
-
-	} elsif ($format eq 'plain') {
-		open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			'-p', $hash_parent_param, $hash, "--"
-			or die_error(500, "Open git-diff-tree failed");
-
 	} else {
-		die_error(400, "Unknown commitdiff format");
-	}
-
-	# non-textual hash id's can be cached
-	my $expires;
-	if ($hash =~ m/^[0-9a-fA-F]{40}$/) {
-		$expires = "+1d";
+		die unless $format eq 'plain';
+		$diff_read = $repo->progressive_cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts, '-p',
+				$hash_parent_param, $sha1, '--'],
+			separator => "\n", cache => 1);
 	}
 
 	# write commit message
@@ -5069,7 +5390,7 @@ sub git_commitdiff {
 		my $refs = git_get_references();
 		my $ref = format_ref_marker($refs, $co{'id'});
 
-		git_header_html(undef, $expires);
+		git_header_html();
 		git_print_page_nav('commitdiff','', $hash,$co{'tree'},$hash, $formats_nav);
 		git_print_header_div('commit', esc_html($co{'title'}) . $ref, $hash);
 		git_print_authorship(\%co);
@@ -5081,10 +5402,10 @@ sub git_commitdiff {
 		}
 
 	} elsif ($format eq 'plain') {
-		my $refs = git_get_references("tags");
-		my $tagname = git_get_rev_name_tags($hash);
+		my $tagname = git_get_rev_name_tags($sha1);
 		my $filename = basename($project) . "-$hash.patch";
 
+		my $expires = ($hash =~ m/^[0-9a-f]{40}$/) ? '+1d' : undef;
 		print $cgi->header(
 			-type => 'text/plain',
 			-charset => 'utf-8',
@@ -5112,17 +5433,15 @@ sub git_commitdiff {
 		                  $use_parents ? @{$co{'parents'}} : $hash_parent);
 		print "<br/>\n";
 
-		git_patchset_body($fd, \@difftree, $hash,
+		git_patchset_body($diff_read, \@difftree, $hash,
 		                  $use_parents ? @{$co{'parents'}} : $hash_parent);
-		close $fd;
 		print "</div>\n"; # class="page_body"
 		git_footer_html();
 
 	} elsif ($format eq 'plain') {
-		local $/ = undef;
-		print <$fd>;
-		close $fd
-			or print "Reading git-diff-tree failed\n";
+		while (my $line = $diff_read->()) {
+			print $line;
+		}
 	}
 }
 
@@ -5131,37 +5450,29 @@ sub git_commitdiff_plain {
 }
 
 sub git_history {
-	if (!defined $hash_base) {
-		$hash_base = git_get_head_hash($project);
-	}
-	if (!defined $page) {
-		$page = 0;
-	}
+	$hash_base ||= git_get_head_hash();
+	$page ||= 0;
 	my $ftype;
-	my %co = parse_commit($hash_base)
-	    or die_error(404, "Unknown commit object");
+	my %co = parse_commit($hash_base);
 
 	my $refs = git_get_references();
 	my $limit = sprintf("--max-count=%i", (100 * ($page+1)));
 
 	my @commitlist = parse_commits($hash_base, 101, (100 * $page),
-	                               $file_name, "--full-history")
-	    or die_error(404, "No such file or directory on given branch");
+	                               $file_name, "--full-history");
 
 	if (!defined $hash && defined $file_name) {
 		# some commits could have deleted file in question,
 		# and not have it in tree, but one of them has to have it
 		for (my $i = 0; $i <= @commitlist; $i++) {
-			$hash = git_get_hash_by_path($commitlist[$i]{'id'}, $file_name);
+			$hash = git_get_sha1_by_path($commitlist[$i]{'id'}, $file_name);
 			last if defined $hash;
 		}
 	}
 	if (defined $hash) {
+		git_get_sha1_or_die($hash);
 		$ftype = git_get_type($hash);
 	}
-	if (!defined $ftype) {
-		die_error(500, "Unknown type of object");
-	}
 
 	my $paging_nav = '';
 	if ($page > 0) {
@@ -5202,13 +5513,8 @@ sub git_search {
 	if (!defined $searchtext) {
 		die_error(400, "Text field is empty");
 	}
-	if (!defined $hash) {
-		$hash = git_get_head_hash($project);
-	}
+	$hash ||= git_get_head_hash();
 	my %co = parse_commit($hash);
-	if (!%co) {
-		die_error(404, "Unknown commit object");
-	}
 	if (!defined $page) {
 		$page = 0;
 	}
@@ -5280,12 +5586,15 @@ sub git_search {
 		print "<table class=\"pickaxe search\">\n";
 		my $alternate = 1;
 		$/ = "\n";
-		open my $fd, '-|', git_cmd(), '--no-pager', 'log', @diff_opts,
-			'--pretty=format:%H', '--no-abbrev', '--raw', "-S$searchtext",
-			($search_use_regexp ? '--pickaxe-regex' : ());
+		my $pickaxe_read = $repo->progressive_cmd_output(
+			cmd => ['log', @diff_opts, '--pretty=format:%H',
+				'--no-abbrev', '--raw', "-S$searchtext",
+				($search_use_regexp ? '--pickaxe-regex' : ()),
+				git_get_head_hash()],
+			separator => "\n", cache => 1);
 		undef %co;
 		my @files;
-		while (my $line = <$fd>) {
+		while (my $line = $pickaxe_read->()) {
 			chomp $line;
 			next unless $line;
 
@@ -5326,7 +5635,6 @@ sub git_search {
 				      "<br/>\n";
 			}
 		}
-		close $fd;
 
 		# finish last commit (warning: repetition!)
 		if (%co) {
@@ -5349,12 +5657,13 @@ sub git_search {
 		print "<table class=\"grep_search\">\n";
 		my $alternate = 1;
 		my $matches = 0;
-		$/ = "\n";
-		open my $fd, "-|", git_cmd(), 'grep', '-n',
-			$search_use_regexp ? ('-E', '-i') : '-F',
-			$searchtext, $co{'tree'};
+		my $grep_read = $repo->progressive_cmd_output(
+			cmd => ['grep', '-n',
+				$search_use_regexp ? ('-E', '-i') : '-F',
+				$searchtext, $co{'tree'}],
+			separator => "\n", cache => 1);
 		my $lastfile = '';
-		while (my $line = <$fd>) {
+		while (my $line = $grep_read->()) {
 			chomp $line;
 			my ($file, $lno, $ltext, $binary);
 			last if ($matches++ > 1000);
@@ -5406,7 +5715,6 @@ sub git_search {
 		} else {
 			print "<div class=\"diff nodifferences\">No matches found</div>\n";
 		}
-		close $fd;
 
 		print "</table>\n";
 	}
@@ -5458,7 +5766,7 @@ EOT
 }
 
 sub git_shortlog {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	if (!defined $hash) {
 		$hash = $head;
 	}
@@ -5500,33 +5808,22 @@ sub git_feed {
 	}
 
 	# log/feed of current (HEAD) branch, log of given branch, history of file/directory
-	my $head = $hash || 'HEAD';
-	my @commitlist = parse_commits($head, 150, 0, $file_name);
+	my $head = $hash || $repo->get_sha1('HEAD');  # can be undef
+	my %last_modified = parse_date($repo->get_last_modification);
 
-	my %latest_commit;
-	my %latest_date;
 	my $content_type = "application/$format+xml";
 	if (defined $cgi->http('HTTP_ACCEPT') &&
 		 $cgi->Accept('text/xml') > $cgi->Accept($content_type)) {
 		# browser (feed reader) prefers text/xml
 		$content_type = 'text/xml';
 	}
-	if (defined($commitlist[0])) {
-		%latest_commit = %{$commitlist[0]};
-		%latest_date   = parse_date($latest_commit{'author_epoch'});
-		print $cgi->header(
-			-type => $content_type,
-			-charset => 'utf-8',
-			-last_modified => $latest_date{'rfc2822'});
-	} else {
-		print $cgi->header(
-			-type => $content_type,
-			-charset => 'utf-8');
-	}
+	print $cgi->header(
+		-type => $content_type, -charset => 'utf-8',
+		-last_modified => $last_modified{'rfc2822'});
 
 	# Optimization: skip generating the body if client asks only
 	# for Last-Modified date.
-	return if ($cgi->request_method() eq 'HEAD');
+	return if $cgi->request_method() && $cgi->request_method() eq 'HEAD';
 
 	# header variables
 	my $title = "$site_name - $project/$action";
@@ -5593,15 +5890,11 @@ XML
 			# not twice as wide as tall: 72 x 27 pixels
 			print "<logo>" . esc_url($logo) . "</logo>\n";
 		}
-		if (! %latest_date) {
-			# dummy date to keep the feed valid until commits trickle in:
-			print "<updated>1970-01-01T00:00:00Z</updated>\n";
-		} else {
-			print "<updated>$latest_date{'iso-8601'}</updated>\n";
-		}
+		print "<updated>$last_modified{'iso-8601'}</updated>\n";
 	}
 
 	# contents
+	my @commitlist = parse_commits($head, 150, 0, $file_name) if $head;
 	for (my $i = 0; $i <= $#commitlist; $i++) {
 		my %co = %{$commitlist[$i]};
 		my $commit = $co{'id'};
@@ -5612,13 +5905,11 @@ XML
 		my %cd = parse_date($co{'author_epoch'});
 
 		# get list of changed files
-		open my $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			$co{'parent'} || "--root",
-			$co{'id'}, "--", (defined $file_name ? $file_name : ())
-			or next;
-		my @difftree = map { chomp; $_ } <$fd>;
-		close $fd
-			or next;
+		my @difftree = split "\n", $repo->cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts,
+				$co{'parent'} || '--root', $co{'id'}, '--',
+				(defined $file_name ? $file_name : ())],
+			cache => 2);
 
 		# print element (entry, item)
 		my $co_url = href(-full=>1, action=>"commitdiff", hash=>$commit);
@@ -5733,16 +6024,8 @@ XML
 
 	foreach my $pr (@list) {
 		my %proj = %$pr;
-		my $head = git_get_head_hash($proj{'path'});
-		if (!defined $head) {
-			next;
-		}
-		$git_dir = "$projectroot/$proj{'path'}";
-		my %co = parse_commit($head);
-		if (!%co) {
-			next;
-		}
-
+		next unless $repo_root->repo(directory => $proj{'path'})
+		    ->get_sha1('HEAD');
 		my $path = esc_html(chop_str($proj{'path'}, 25, 5));
 		my $rss  = "$my_url?p=$proj{'path'};a=rss";
 		my $html = "$my_url?p=$proj{'path'};a=summary";
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index ae7082b..e04fb5f 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -54,7 +54,7 @@ gitweb_run () {
 	# written to web server logs, so we are not interested in that:
 	# we are interested only in properly formatted errors/warnings
 	rm -f gitweb.log &&
-	perl -- "$(pwd)/../../gitweb/gitweb.perl" \
+	"$PERL_PATH" -- "$(pwd)/../../gitweb/gitweb.cgi" \
 		>/dev/null 2>gitweb.log &&
 	if grep -q -s "^[[]" gitweb.log >/dev/null; then false; else true; fi
 
@@ -71,7 +71,7 @@ safe_chmod () {
 
 . ./test-lib.sh
 
-perl -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
+"$PERL_PATH" -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
     test_expect_success 'skipping gitweb tests, perl version is too old' :
     test_done
     exit
diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
index 53f2a8a..c0558e5 100755
--- a/t/t9503-gitweb-Mechanize.sh
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -89,6 +89,16 @@ test_expect_success 'set up test repository' '
 	test_tick && git pull . b
 '
 
+# set up empty repository
+# create this as a subdirectory of trash directory; not pretty, but simple
+test_expect_success 'set up empty repository' '
+
+	mkdir empty.git &&
+	cd empty.git &&
+	git init --bare &&
+	cd ..
+'
+
 # set up gitweb configuration
 safe_pwd="$("$PERL_PATH" -MPOSIX=getcwd -e 'print quotemeta(getcwd)')"
 large_cache_root="../t9503/large_cache.tmp"
diff --git a/t/t9503/test.pl b/t/t9503/test.pl
index b0a8269..2d83158 100755
--- a/t/t9503/test.pl
+++ b/t/t9503/test.pl
@@ -274,6 +274,11 @@ if (test_page '', 'project list (implicit)') {
 		"title contains $site_name");
 	$mech->content_contains('t9503-gitweb-Mechanize test repository',
 		'lists test repository (by description)');
+	if (follow_link( { text => 'empty.git' }, 'empty git repository')) {
+		# Just check that the empty.git summary page is linked
+		# and doesn't die.
+		$mech->back;
+	}
 }
 
 
-- 
1.5.6.2.456.g63fc0


* Re: [PATCH 0/3] Git::Repo API and gitweb caching
  2008-07-11  1:06 [PATCH 0/3] Git::Repo API and gitweb caching Lea Wiemann
                   ` (2 preceding siblings ...)
  2008-07-11  1:11 ` [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann
@ 2008-07-11  1:21 ` Johannes Schindelin
  2008-07-11  9:33 ` Jakub Narebski
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 63+ messages in thread
From: Johannes Schindelin @ 2008-07-11  1:21 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Git Mailing List, John Hawley, Jakub Narebski, Petr Baudis

Hi,

On Fri, 11 Jul 2008, Lea Wiemann wrote:

> Patch (3) basically makes two large changes in one patch, but it was 
> pretty hard to separate them during development.  I could try to split 
> them up after the fact, but it would take at least an hour or two, since 
> the changes that introduce caching are spread all over the code.  I 
> don't think that having separate commits ([a] use Git::Repo API, [b] add 
> caching) brings enough benefit to justify the effort.
> 
> There are some other changes in (3) as well, but they fell out as part 
> of the refactoring, so I didn't separate them either -- same thing.

FWIW there are a few reasons why splitting up (3) might be the thing you 
really want to do, even if it takes an hour or two:

- it makes reviewing much easier,
- it makes subsequent revisions of the patches easier to review,
- it makes it easier to cherry-pick changes, should not all be equally 
  liked,
- it makes finding bugs much easier (both spotting during review and 
  bisecting), and
- it is good for documentation purposes, should someone read the commit 
  log.

Now, after weighing the benefit (especially in terms of hours spared 
others) against the downsides, you might want to reconsider your stance.

Ciao,
Dscho


* Re: [PATCH 0/3] Git::Repo API and gitweb caching
  2008-07-11  1:06 [PATCH 0/3] Git::Repo API and gitweb caching Lea Wiemann
                   ` (3 preceding siblings ...)
  2008-07-11  1:21 ` [PATCH 0/3] Git::Repo API and gitweb caching Johannes Schindelin
@ 2008-07-11  9:33 ` Jakub Narebski
  2008-07-11 14:07   ` Lea Wiemann
  2008-07-19  5:35 ` Lea Wiemann
  2008-08-18 19:34 ` Lea Wiemann
  6 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-11  9:33 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Git Mailing List, John Hawley, Petr Baudis

On Fri, 11 July 2008, Lea Wiemann wrote:

> As follow-ups to this message, I'll be sending three patches for
> 
> 1) adding the Mechanize tests,

Somehow I didn't get Cc'ed on this patch...

> The Mechanize tests succeed before and after patch (3) is applied, so
> I'm reasonably confident that my refactoring didn't introduce any
> (major) bugs.

If I remember correctly, the Mechanize tests detected some bugs in gitweb
(nice!), but they were swept under the rug, i.e. marked as TODO.

Does that mean that those errors were corrected, or that refactoring
"just" didn't break anything more?

> On my to-do list:
> 
> - Benchmarks.  I'm planning to time a replay of kernel.org's gitweb logs
> on the test server, with and without caching.  Nothing fancy.  (The
> performance of the test setup on odin3.kernel.org is not representative
> of gitweb's actual performance under load.)

Do you plan to compare against other gitweb caching implementations?  I mean
the old kernel.org caching by J.H., and the repo.or.cz caching (only for the
projects list page) by Pasky.

> - Implement support for Last-Modified or ETags, since those basically
> fall out for free with the current implementation.  (This will require
> mod_perl, since CGI doesn't allow for accessing arbitrary request
> headers AFAIK.)

From CGI(3pm):

       http()
           Called with no arguments returns the list of HTTP environment vari-
           ables, including such things as HTTP_USER_AGENT, HTTP_ACCEPT_LANGUAGE,
           and HTTP_ACCEPT_CHARSET, corresponding to the like-named HTTP header
           fields in the request.  Called with the name of an HTTP header field,
           returns its value.  Capitalization and the use of hyphens versus under-
           scores are not significant.

           For example, all three of these examples are equivalent:

              $requested_language = http('Accept-language');
              $requested_language = http('Accept_language');
              $requested_language = http('HTTP_ACCEPT_LANGUAGE');


From http://hoohoo.ncsa.uiuc.edu/cgi/env.html

  In addition to these, the header lines received from the client, if
  any, are placed into the environment with the prefix HTTP_ followed
  by the header name. Any '-' characters in the header name are changed
  to '_' characters. The server may exclude any headers which it has
  already processed, such as Authorization, Content-type, and
  Content-length. If necessary, the server may choose to exclude any
  or all of these headers if including them would exceed any system
  environment limits.

So you _can_ access 'If-Modified-Since', 'If-None-Match' (by web browsers)
and 'If-Not-Modified-Since', 'If-Match' (by caches) from CGI.
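
A minimal sketch of how a CGI script might act on such headers, assuming
the web server really does pass them through as HTTP_IF_MODIFIED_SINCE and
HTTP_IF_NONE_MATCH.  The helper names (parse_http_date, not_modified) are
made up for illustration and are not part of gitweb or CGI.pm; the sketch
reads the environment hash directly instead of going through http().
Letting If-None-Match win when both headers are present is an assumed
policy, not something taken from the patches:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: parse "Thu, 03 Jul 2008 22:39:42 GMT" into epoch
# seconds; returns undef for anything that does not look like that.
sub parse_http_date {
	my ($str) = @_;
	return undef unless defined $str;
	my ($day, $mon, $year, $h, $m, $s) = $str =~
		/^\w{3}, (\d{2}) (\w{3}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/
		or return undef;
	return undef unless exists $MONTH{$mon};
	# timegm dies on out-of-range values, so trap that.
	return eval { timegm($s, $m, $h, $day, $MONTH{$mon}, $year) };
}

# Hypothetical helper: true if the client's copy is still current.
# $env is a hash ref such as \%ENV, $last_modified is epoch seconds,
# $etag is the current entity tag including its double quotes.
sub not_modified {
	my ($env, $last_modified, $etag) = @_;
	my $inm = $env->{HTTP_IF_NONE_MATCH};
	if (defined $inm) {
		return 1 if $inm =~ /^\s*\*\s*$/;
		# Compare weakly: ignore any W/ prefix on the client's tags.
		my @tags = $inm =~ /(?:W\/)?("[^"]*")/g;
		return scalar(grep { $_ eq $etag } @tags) ? 1 : 0;
	}
	my $since = parse_http_date($env->{HTTP_IF_MODIFIED_SINCE});
	return (defined($since) && $last_modified <= $since) ? 1 : 0;
}

# In a CGI script one would call it roughly like this:
#   if (not_modified(\%ENV, $mtime, $etag)) {
#       print "Status: 304 Not Modified\n\n";
#       exit;
#   }
```

Whether the headers arrive at all depends on the server, so the check
simply falls back to "modified" when neither variable is set.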

> That will make the site a tad more responsive, I hope,
> and it will also hugely reduce the load for RSS/Atom requests, which
> currently make up almost half of all requests to kernel.org's gitweb and
> get served in full each time (i.e. "200 OK" instead of "304 Not Modified").

Currently the only "shortcut" is that gitweb respects HEAD requests
(returning only HTTP headers) for feeds, and if I remember correctly
only for feeds.  However, I don't know how much work gitweb does before
the HEAD request shortcut.

> - Make gitweb use more parts of the Git::Repo API; in particular, the
> commit and tag parsing code should be ripped out, and gitweb should use
> the (much prettier) Git::Commit/Git::Tag API instead.  Perhaps some more
> functions (like ls_tree) can be generalized into the API as well; I went
> the easy route for now and simply replaced most "open '-|'" calls with
> $repo->cmd_output calls.

I think that ls_tree and git-ls-tree output parsing should be
generalized into Git::Tree API as well.
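
To make that concrete, here is a rough sketch of what the parsing half
could look like.  The function name parse_ls_tree and the hash keys are
hypothetical and not part of Git::Repo or any existing Git::Tree; the
sketch only assumes the record format of "git ls-tree -z", i.e.
"<mode> SP <type> SP <object> TAB <file>" terminated by NUL:

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw `git ls-tree -z` output into a list of
# hash references, one per tree entry.  Malformed records are skipped.
sub parse_ls_tree {
	my ($raw) = @_;
	my @entries;
	foreach my $record (split /\0/, $raw) {
		# e.g. '100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
		my ($mode, $type, $sha1, $name) = $record =~
			/^([0-7]{6}) (\S+) ([0-9a-fA-F]{40})\t(.+)\z/s
			or next;
		push @entries, { mode => $mode, type => $type,
		                 sha1 => $sha1, name => $name };
	}
	return @entries;
}
```

Returning plain hashes keeps the sketch small; a real API would have to
decide how callers look up a single entry by name.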


I'll try to review the rest of patches by tomorrow...
-- 
Jakub Narebski
Poland


* Re: [PATCH 0/3] Git::Repo API and gitweb caching
  2008-07-11  9:33 ` Jakub Narebski
@ 2008-07-11 14:07   ` Lea Wiemann
  2008-07-11 16:27     ` Abhijit Menon-Sen
  0 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-11 14:07 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Git Mailing List, John Hawley, Petr Baudis, Johannes Schindelin

Jakub Narebski wrote:
> On Fri, 11 July 2008, Lea Wiemann wrote:
>> 1) adding the Mechanize tests,
> 
> Somehow I didn't get Cc-ed this patch...

Yup, nobody got Cc'ed; apologies.

> If I remember correctly Mechanize tests detected some bugs in gitweb
> (nice!), but there were swept under the rug, i.e. put as TODO.
> 
> Does that mean that those errors were corrected, or that refactoring
> "just" didn't break anything more?

Refactoring really just didn't break anything more; it could be that
something got corrected accidentally, but I'd be surprised.

>> - Benchmarks.
> 
> Do you plan to compare other gitweb caching implementations? [k.org, repo.or.cz]

Yup, sure.

>> - Implementing support for Last-Modified or ETags [...] will require
>> mod_perl, since CGI doesn't allow for accessing arbitrary request headers
> 
>               $requested_language = http('Accept-language');
> 
>   the header lines received from the client, if any, are placed into the
>   environment with the prefix HTTP_ followed by the header name.

Right, you'd think the request headers should be accessible this way,
but apparently not all of them are.

If you take this script, ...

#!/usr/bin/perl
use CGI qw(http);
print "Content-type: text/plain\n";
print "Last-Modified: Thu, 03 Jul 2008 22:39:42 GMT\n\n";
print "Header: ", http('If-Last-Modified');

... then my browser (according to LiveHTTPHeaders) sends an
If-Last-Modified header, but it doesn't get through to the CGI script.
It does work if you test Accept-Language.  (Try print `env` to get an
idea of what gets through.)  It happens on Apache and thttpd.
Apparently this part somehow applies:

>   'The server may exclude any headers which it has already processed,'

I honestly have no idea why the If-Last-Modified header gets eaten (and
googling didn't help), but I assume that at least it'll be possible to
access all headers with mod_perl.
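
For anyone who wants to repeat the experiment, here is a small sketch that
lists which request headers actually reach a CGI script.  It assumes
nothing beyond the HTTP_ prefix convention quoted above; the function name
request_headers is made up, and the reconstructed names are only a guess
at the original capitalization, since the environment does not preserve it:

```perl
use strict;
use warnings;

# Hypothetical helper: map HTTP_* environment variables back to
# header-style names, e.g. HTTP_ACCEPT_LANGUAGE => "Accept-Language".
sub request_headers {
	my ($env) = @_;
	my %headers;
	foreach my $key (grep { /^HTTP_/ } keys %$env) {
		(my $name = $key) =~ s/^HTTP_//;
		$name = join '-', map { ucfirst(lc($_)) } split /_/, $name;
		$headers{$name} = $env->{$key};
	}
	return %headers;
}

# Dump whatever the web server passed through, as text/plain.
my %seen = request_headers(\%ENV);
print "Content-type: text/plain\n\n";
print "$_: $seen{$_}\n" foreach sort keys %seen;
```

A conditional request header that got through would show up here under
its own name, for example If-Modified-Since.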

> 'If-Not-Modified-Since', 'If-Match' (by caches)

Wait, are you sure caches would use those headers (I believe only the
latter actually exists BTW), or did you fall prey to a thinko? ;)

> one "shortcut" is that gitweb respects HEAD request
> (returning only HTTP headers) for feeds

Yes, and I think it does help performance-wise, but only a really small
fraction of the RSS/Atom requests actually use HEAD.  Most use GET.

> I think that ls_tree and git-ls-tree output parsing should be
> generalized into Git::Tree API as well.

True, though I'm still not sure how to make element access work pretty
and fast.  I'll keep pondering it for a while.

> I'll try to review the rest of patches by tomorrow...

Thanks!

Johannes Schindelin wrote:
> FWIW there are a few reasons why splitting up (3) might be the thing
> you really want to do

I've put splitting it on my list; I'm not sure though if I'll get around
to doing it today.  Everyone, comments on patch (3) are still
appreciated in the meantime. ;-)  I'll integrate any suggestions/patches
when I split it up.

-- Lea


* Re: [PATCH 0/3] Git::Repo API and gitweb caching
  2008-07-11 14:07   ` Lea Wiemann
@ 2008-07-11 16:27     ` Abhijit Menon-Sen
  2008-07-12 15:08       ` Jakub Narebski
  0 siblings, 1 reply; 63+ messages in thread
From: Abhijit Menon-Sen @ 2008-07-11 16:27 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

At 2008-07-11 16:07:26 +0200, lewiemann@gmail.com wrote:
>
> > 'If-Not-Modified-Since', 'If-Match' (by caches)
> 
> Wait, are you sure caches would use those headers (I believe only the
> latter actually exists BTW), or did you fall prey to a thinko? ;)

If-Not-Modified-Since should really be If-Unmodified-Since.

But where's the thinko? To send If-Modified-Since or similar with a
request, you would need to have a cached copy of the resource and use
its Last-Modified, for example. Sure, you could do it without one, but
what would be the point?

-- ams


* Re: [PATCH 0/3] Git::Repo API and gitweb caching
  2008-07-11 16:27     ` Abhijit Menon-Sen
@ 2008-07-12 15:08       ` Jakub Narebski
  0 siblings, 0 replies; 63+ messages in thread
From: Jakub Narebski @ 2008-07-12 15:08 UTC (permalink / raw)
  To: Abhijit Menon-Sen
  Cc: Lea Wiemann, git, John Hawley, Petr Baudis, Johannes Schindelin

Abhijit Menon-Sen <ams@toroid.org> writes:

> At 2008-07-11 16:07:26 +0200, lewiemann@gmail.com wrote:
> >
> > > 'If-Not-Modified-Since', 'If-Match' (by caches)
> > 
> > Wait, are you sure caches would use those headers (I believe only the
> > latter actually exists BTW), or did you fall prey to a thinko? ;)
> 
> If-Not-Modified-Since should really be If-Unmodified-Since.
> 
> But where's the thinko? To send If-Modified-Since or similar with a
> request, you would need to have a cached copy of the resource and use
> its Last-Modified, for example. Sure, you could do it without one, but
> what would be the point?

Actually the thinko was use of If-Unmodified-Since: and If-Match:.

Both If-Unmodified-Since and If-Match can be used and should be used
when requesting *partial document transfers* (when you need to ensure
that document is unchanged before you fetch next part of it).  
Additionally, If-Match can be used for concurrency control when you
do a two-way communication using GET and PUT, in a PUT request to
ensure that you are modifying correct document.

They are not used, I think, by caches to refresh cached data.
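
To make the distinction concrete, here is a rough sketch of how a
server might evaluate the two kinds of conditional headers against a
resource's current ETag.  It assumes a GET-style request and a single
strong ETag per header (no lists, no '*', no weak validators); the
function name and calling convention are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: decide the response status from conditional
# request headers, given the current ETag of the resource.
# Simplification: each header carries exactly one strong ETag.
sub conditional_status {
	my ($headers, $current_etag) = @_;

	# If-Match: "only proceed if the document is still the one I saw"
	# (e.g. a PUT or a follow-up partial request); a mismatch fails.
	if (defined $headers->{'If-Match'}) {
		return 412 if $headers->{'If-Match'} ne $current_etag;
	}
	# If-None-Match: cache revalidation; a match means the cached
	# copy is still good and the body need not be sent again.
	if (defined $headers->{'If-None-Match'}) {
		return 304 if $headers->{'If-None-Match'} eq $current_etag;
	}
	return 200;
}
```

If-Modified-Since plays the same revalidation role as If-None-Match,
only keyed on Last-Modified instead of an ETag.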
-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-11  1:11 ` [PATCH 2/3] add new Git::Repo API Lea Wiemann
@ 2008-07-13 21:38   ` Junio C Hamano
  2008-07-14  1:04     ` Lea Wiemann
  2008-07-13 23:28   ` Jakub Narebski
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 63+ messages in thread
From: Junio C Hamano @ 2008-07-13 21:38 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Jakub Narebski, Petr Baudis

Lea Wiemann <lewiemann@gmail.com> writes:
> Here's some elaboration on why I didn't use or extend Git.pm.
>
> Please note before starting a reply to this: This is not an argument;
> I'm just explaining why I implemented it the way I did.  So please
> don't try to argue with me about what I should or should have done....
> ...
> So where do we go with Git.pm and Git::Repo?  I would suggest that
> they both stay.

This probably is something the person who mentors you as a GSoC student
should tell you, and shouldn't come from me, but I am going to mention
it anyway.

I thought GSoC program was about giving participants a chance to acquire
skills to work better with open source community, and I firmly believe
that a major component of that skillset is about working with people,
probably even more so than about working with code.  Of course you need
to have certain basic coding skills and discipline, but it is clear to
everybody that you are beyond that threshold already.  I would have
preferred to see the first paragraph of yours stated more diplomatically;
it sounds rather like a unilateral ultimatum to me.

Having said that, I agree with your reasoning.  People who care enough can
help refactoring Git.pm to build on top of Git::Repo (this would include
adding missing Git::WC and possibly others), but I agree that would
largely be outside of the scope of gitweb caching.

This patch was not signed-off, but I assume it was simply forgotten and
not deliberate?


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-11  1:11 ` [PATCH 2/3] add new Git::Repo API Lea Wiemann
  2008-07-13 21:38   ` Junio C Hamano
@ 2008-07-13 23:28   ` Jakub Narebski
  2008-07-14  2:29     ` Lea Wiemann
  2008-07-14  1:40   ` Petr Baudis
  2008-07-16 18:21   ` Jakub Narebski
  3 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-13 23:28 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis

On Fri, 11 July 2008 03:11, Lea Wiemann napisał:

> This also adds the Git::Commit and Git::Tag classes, which are used by
> Git::Repo, the Git::Object base class, and the Git::RepoRoot helper
> factory class.

I think it would be perhaps better to explain relationship and purpose
of each class in more detail, including Git::Repo.

No signoff - is it deliberate, or just omission?

> ---
> Here's some elaboration on why I didn't use or extend Git.pm.
> 
> Please note before starting a reply to this: This is not an argument;
> I'm just explaining why I implemented it the way I did.  So please
> don't try to argue with me about what I should or should have done.
> I'm not going to refactor Git::Repo to use Git.pm or vice versa; it's
> really a much more non-trivial task than you might think at first
> glance.
> 
> Anyways, the following bullet points are my reasons for not extending
> Git.pm:
> 
> - Git.pm doesn't do what I want: It's designed to provide access to
>   working copies.  Extending it to have more repository-access
>   functions might have resulted in a mess.

I don't quite understand.  True, Git->repository(...) has a bit strange
[optional] parameters, but you should remember that IIRC it predates
possibility of workdir / worktree / working copy separation.  That
I think is why it supports non-bare repositories (default), and bare
repositories, not separating working copy from repository.

Also, from what I understand and remember, Git.pm was created to have
one place, one single implementation for safe and operating system 
independent (which means that it works with ActiveState Perl on 
Windows) "pipe" and "backticks".  Everything else is just convenience.
 
>   Some long-term thoughts on this issue: I don't think Git.pm's design
>   approach is sensible: It tries to do (a) WC access, (b) repo access,
>   and (c) frontend error handling (with sensible error messages).

I can see (b) and (c), but I have trouble seeing (a).

>   Those things should really be separated; e.g. one could write a
>   Git::WC class that *has* a Git::Repo instance (since every working
>   copy has a repository associated with it); so you can use $wc =
>   Git::WC->new to access the working copy, and $wc->repo to access its
>   repository.

Please remember that the opposite relation is also true.  Non-bare 
repository has working copy / worktree associated with it, either 
implicitly (.git/.. is working copy), or explicitly (core.worktree).

Also even if worktree and repository are separated, there are a few
files in non-bare repository which refer to worktree condition, namely  
$GIT_DIR/HEAD, $GIT_DIR/logs/HEAD and $GIT_DIR/index.

>   Git.pm will obviously have to stay since a few git 
>   commands use it, 

The following commands use Git.pm: contrib/examples/git-remote.perl,
git-add--interactive.perl (helper script), git-cvsexportcommit, git-svn, 
git-send-email.

The following commands are written in Perl, but *do not* use Git.pm:
git-archimport (which probably should go to contrib, and be replaced by 
fast-import / fast-export based Bazaar import), git-cvsimport, 
git-cvsserver, git-relink (rarely used now, I think).

>   but I'd probably let it die a slow death, and (cleanly!) copy
>   functionality to a Git::WC module (and perhaps a Git::Frontend
>   module) in the long run. 

I'm not so sure that is the way to go.  Most git commands just want to 
invoke other git commands safely, either creating pipeline, or reading 
whole input; sometimes command output is not important.

Non OO things, like ability to write

  print color('reset') . "\n";

is also important.

> - It would have needed refactoring and more features.
> 
>   - For example, the Error module should be removed (and normal die
>     and eval should be used instead).  At some point when I was trying
>     to debug it through gitweb, Git.pm would error out somewhere and I
>     would randomly get either a blank page (usually) or an error page
>     (rarely).  I suspect that this is because of the Error module or
>     some strange interaction with another.

I'm not sure if using Error module and syntactic sugar in the form of 
try { ... } catch <class> with { ... } was a good idea for (c), 
frontend error handling.  Or, if it was a good idea (the explanation in 
http://www.perl.com/pub/a/2002/11/14/exception.html is compelling),
whether it was implemented fully, completely, and correctly.

How would you like to catch errors from frontend in Git::Repo and 
friends?

>     [...]                                (Besides, Error is not in
>     the Perl distribution, so it would be an unnecessary dependency
>     for gitweb; the only one as far as I can see.)

Git Makefile installs Error module from local copy if one is not 
available.  perl-Git module (which would be unfortunately required to 
use gitweb; well, that or you could install it locally) has 
"perl(Error)" as one of dependencies.  perl-Error module is a standard
module and I think it can be found in any modern Linux distribution; if 
not, see first sentence in this paragraph.

>   - Also, I needed something like max_exit_code and a custom path to
>     the git binary; adding max_exit_code would have been non-trivial.

What is max_exit_code (and why for example you couldn't just create your 
own derivative of Error)?
 
>   Now I'm all in favor of re-using existing code, but refactoring
>   Git.pm would have taken *much* longer than simply writing a new
>   module.  I'm working on caching for gitweb, not on implementing the
>   next great Perl API for Git.  (And Git::Repo isn't great, FTR.)

Well, I can understand that.

Besides, having more than one implementation of some new feature was 
quite common in early stages of feature development.  See for example 
wit and gitweb, git-applymbox and git-am, git-annotate and git-blame.

> - It's buggy and untested.  Neither of these is a problem by itself,
>   but the combination is deadly.

Haven't you added t/t9700-perl-git.sh?

>   E.g. I was trying to refactor the 
>   'repository' constructor (to be able to do instantaneous
>   instantiation) and stumbled upon a missing semicolon that rendered
>   the surrounding code syntactically correct but obviously buggy (line
>   214 on master).  Adding a semicolon there seemed to cause other
>   errors, and given that (a) I don't understand what the code does and
>   no test or comment tells me what it should, and (b) it doesn't work
>   (or maybe it half-works?), I lost all my confidence that I could do
>   anything resembling a (behavior-preserving) refactoring on that
>   code.

This is not to say that the code should not be rewritten... but git-blame
(and git log -S, aka pickaxe search) can be used to find who is the 
author of said code, so that you can ask him via email (probably Cc-ing
the git mailing list).

>   I might have been able to work with this particular problem, but
>   such a problem (buggy and completely untested code) are indicative
>   that the rest of the code might bear similar surprises.

I think that the pipe and command code might be tested best, as it is 
collation and condensation of many different "safe pipe" (etc.) code 
fragments.

> - It's overly lengthy, and it's a lot of code for not much
>   functionality.  Git.pm has 1200 LOC, with a large stack of tiny
>   subroutines to handle pipes, and a complicated error handling
>   system.  I can make out ~400 lines of actual Git functionality, plus
>   a 100-lines constructor (all including documentation).  The part of
>   Git::Repo that overlaps with Git.pm (in terms of duplicate
>   functionality) seems to be in the range of 150-200 lines, and it's
>   mostly pipe-handling.  That's not a whole lot.

Pipe handling is IMHO most important part.  Well, other parts such as 
color(...) are important too, but not for gitweb.

> - When I decided I didn't want to use Git.pm, it took me 2-3 hours to
>   replicate the functionality in Git.pm that I needed; it would have
>   taken me *much* longer to extend Git.pm to do what I want.  Again,
>   I'm not working on the next great Git Perl API.
> 
> So where do we go with Git.pm and Git::Repo?  I would suggest that
> they both stay.
> 
> Thus we'd have two APIs (both of them obviously incomplete).  If
> there's enough objection to having two competing official APIs, I'll
> be happy to move Git::Repo to the gitweb directory, and make it a
> gitweb-specific thing (though it's not gitweb-specific at all in terms
> of functionality); I really don't care, as my goal is to get caching
> for gitweb working.  Again, if you're bothered by the fact that
> there's duplicate functionality at all, please don't complain, but
> send patches to reconcile the APIs; I will not take care of that
> myself, since it's a very much non-trivial task.

What I worry about is that dependence on Git.pm or Git::Repo would make 
gitweb installation too hard for some.

> Unrelatedly, should I add copyright notices at the bottom of each perl
> module so they are displayed in the perldoc/man pages?  I'm not
> generally a fan of such notices, since they tend to establish code
> ownership, but if it's desired I can add them.

Well, most manpages have information about who made them... which means 
who was initial author, usually, and/or who is current maintainer.


[Comments on patch itself in separate email, later]

-- 
Jakub Narebski
Poland


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-13 21:38   ` Junio C Hamano
@ 2008-07-14  1:04     ` Lea Wiemann
  0 siblings, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-07-14  1:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, John Hawley, Jakub Narebski, Petr Baudis

Junio C Hamano wrote:
> This probably is something the person who mentors you as a GSoC student
> should tell you, and shouldn't come from me,

No, thanks for telling me.  I'm actually keeping John quite busy with
the server side on kernel.org, and wouldn't currently expect him to look
at the patches I'm sending.

> I would have preferred to see the first paragraph of yours stated more
> diplomatically; it sounds rather like a unilateral ultimatum to me.

Apologies; I was trying to hold off developers who'd see duplicate
functionality and run off at me screaming "unacceptable! impure!!one".
I should've phrased it more carefully.

> This patch was not signed-off, but I assume it was simply forgotten

Yup.  In the unlikely case it gets applied as-is, feel free to add it.

(I'll reply to Jakub's email shortly, unless I fall asleep in the
meantime. ^^)


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-11  1:11 ` [PATCH 2/3] add new Git::Repo API Lea Wiemann
  2008-07-13 21:38   ` Junio C Hamano
  2008-07-13 23:28   ` Jakub Narebski
@ 2008-07-14  1:40   ` Petr Baudis
  2008-07-14 22:19     ` Lea Wiemann
  2008-07-14 23:41     ` [PATCH 2/3] add new Git::Repo API Jakub Narebski
  2008-07-16 18:21   ` Jakub Narebski
  3 siblings, 2 replies; 63+ messages in thread
From: Petr Baudis @ 2008-07-14  1:40 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Jakub Narebski


Thanks to Jakub for a nice writeup too. I will try to cover mainly
design issues and do not try to dive too much into technical arguments
about current Git.pm, since I agree with the general idea of designing
a new API; Jakub covered most of that pretty well, I think.

Maybe I sound too mentoring in places; I tried to tone that down as much
as possible, and of course what Lea should be working on is her and her
official mentor's call. On the other hand, if something general gets into
the Git tree, I'd like to make sure it's something we can live happily
with for a long time, not just a hack tailored for gitweb caching.


On Fri, Jul 11, 2008 at 03:11:05AM +0200, Lea Wiemann wrote:
> This also adds the Git::Commit and Git::Tag classes, which are used by
> Git::Repo, the Git::Object base class, and the Git::RepoRoot helper
> factory class.

I really miss a more detailed writeup of the envisioned design here.
And if we are redoing the API in a better way, we had better have
some vision.

Most importantly, how is Git::Repo interacting with working copies, and
how is it going to interact with them as the new OO model shapes up?
You mention briefly Git::WC later, but it is not really clear how the
interaction should work.


First, I don't think it's a good idea at all to put the pipe-related stuff
in Git::Repo - this is a botched-up API just like the current one. This
all is independent from any repository instances, in fact it's perfectly
valid to call standalone remote commands (like ls-remote or, actually,
clone).

Here is an idea: Introduce Git::Command object that will have very
general interface and look like

	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file',
		'-p', 'bla'], {pipe_out=>1});
	...
	$c->close();

and a Git::CommandFactory with a nicer interface that would look like

	my $cf = Git::CommandFactory->new('git', '--git-dir=.');
	my $c = $cf->output_pipe('cat-file', '-p', 'bla');
	$c->close();

Then, Git::Repo would have a single Git::CommandFactory instance
pre-initialized with the required calling convention, and returned by
e.g. cmd() method. Then, from the user POV, you would just:

	my $repo = Git::Repo->new;
	$repo->cmd->output_pipe('cat-file', '-p', 'bla');

Or am I overdoing it?
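
To give an idea of how little code the factory part needs, here is a
rough sketch.  It is not a proposal for the real thing: the package
name Sketch::CommandFactory and the command_line method are made up
for illustration, and only the read-pipe case is covered.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed Git::CommandFactory: it only
# remembers a command prefix and opens read pipes with it.
package Sketch::CommandFactory;

sub new {
	my ($class, @prefix) = @_;
	return bless { prefix => [@prefix] }, $class;
}

# Full argument list for a given sub-command.
sub command_line {
	my ($self, @args) = @_;
	return (@{ $self->{prefix} }, @args);
}

# Open a pipe reading the command's stdout.  With more than one
# element, the list form of open does not go through the shell,
# so arguments need no quoting.
sub output_pipe {
	my ($self, @args) = @_;
	my @cmd = $self->command_line(@args);
	open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
	return $fh;
}

package main;
```

A repository object would then just hold one such instance created
with ('git', '--git-dir=...') and hand it out from cmd().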


Git::Repo considers only bare repositories. Now, since "working copy" by
itself has nothing to do with Git and is just an ordinary directory
tree, I think Git::WC does not make that much sense, but something like
Git::Repo::Nonbare certainly would. This would be a Git::Repo subclass
with:

	* Different constructor

	* Different Git::CommandFactory instance

	* Git::Index object aside the existing ones (like Git::Config,
	  Git::RefTree, ...)

	* Some kind of wc_root() method to help directory navigation

And that pretty much covers it?


Another thing is clearly describing how error handling is going to work.
I have not much against ditching Error.pm, but just saying "die + eval"
does not cut it - how about possible sideband data? E.g. the failure
mode of Git.pm's command() method includes passing the error'd command
output in the exception object. How are we going to handle it? Now, it
might be actually okay to say that we _aren't_ going to handle this if
it is deemed unuseful, but that needs to be determined too. I don't know
off the top of my head.
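
For what it's worth, plain die + eval can carry sideband data too,
since die accepts a reference.  A minimal sketch, with made-up helper
names (run_or_die, try_run) and a hash layout chosen only for
illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its output; on a
# non-zero exit status, die with a hash reference that carries the
# command line, exit code and captured output as sideband data.
sub run_or_die {
	my (@cmd) = @_;
	open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
	my $output = do { local $/; <$fh> };
	$output = '' unless defined $output;
	close $fh;    # sets $? to the child's wait status
	my $exit_code = $? >> 8;
	die { cmd => [@cmd], exit_code => $exit_code, output => $output }
		if $exit_code != 0;
	return $output;
}

# Caller side: plain eval, then inspect $@ if it is a reference.
sub try_run {
	my (@cmd) = @_;
	my $output = eval { run_or_die(@cmd) };
	return ($output, undef) unless $@;
	die $@ unless ref($@) eq 'HASH';    # not ours, re-throw
	return (undef, $@);
}
```

Whether the output of a failed command is worth keeping is still the
open question; the sketch only shows that dropping Error.pm does not
by itself force us to drop it.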


> ---
> Here's some elaboration on why I didn't use or extend Git.pm.
> 
> Please note before starting a reply to this: This is not an argument;
> I'm just explaining why I implemented it the way I did.  So please
> don't try to argue with me about what I should or should have done.
> I'm not going to refactor Git::Repo to use Git.pm or vice versa; it's
> really a much more non-trivial task than you might think at first
> glance.

Frankly, I'm not really happy about this attitude. "In my dark corner,
I have developed this. Now behold - take it or leave it be; I'm not
going to argue."

That said, I do agree that better-structured object API would be
beneficial, and I think what you posted here is a good starting point.
However, I'm not going to settle with seeing a completely parallel and
independent code pop up alongside the existing Git.pm.

Instead, I believe the best course is to gradually translate all the
Git.pm functionality to the new OO model, leaving Git.pm as a
compatibility wrapper. Now, if you believe this is a non-trivial task,
please tell us why.

It should be actually very easy to start with moving all the pipe
functionality to Git::Command. This should let us get rid of a lot of
technical code, which can be reused in the cleanly rebuilt API. Aside
of the object management + error handling, the rest of Git.pm is then
actually rather boring and unimportant, especially given Jakub's
hopefully upcoming Git::Config.

I will be happy to send in patches factoring out the pipe handling to
Git::Command as soon as we agree about the missing design details and
the way of error handling. Then, making use of that in Git::Repo should
get us a long way ahead.

>   Now I'm all in favor of re-using existing code, but refactoring
>   Git.pm would have taken *much* longer than simply writing a new
>   module.  I'm working on caching for gitweb, not on implementing the
>   next great Perl API for Git.  (And Git::Repo isn't great, FTR.)

Wait, I can't make sense out of this paragraph. If Git.pm sucks, we can
work on new API. But we better _make_ it great. Or someone else comes by
next year and says "oh, but it's buggy and needs refactoring, let's
throw it away and redesign it!"

I agree that your main objective is caching for gitweb, but that's not
what everything revolves around for the rest of us. If you chose the way
of caching within the Git API and introducing the API to gitweb, I think
you should spend the effort to deal with the API properly now.

> diff --git a/perl/Git/Commit.pm b/perl/Git/Commit.pm
> new file mode 100644
> index 0000000..a9bc304
> --- /dev/null
> +++ b/perl/Git/Commit.pm
> @@ -0,0 +1,163 @@
> +=head1 NAME
> +
> +Git::Commit - Object-oriented interface to Git commit objects.
> +
> +=cut
> +
> +use strict;
> +use warnings;
> +
> +
> +package Git::Commit;
> +
> +use base qw(Git::Object);
> +
> +use constant _MESSAGE => 'M';
> +use constant _ENCODING => 'E';
> +use constant _TREE => 'T';
> +use constant _PARENTS => 'P';
> +use constant _AUTHOR => 'A';
> +use constant _COMMITTER => 'C';

What is the idea behind this? It may make sense to create numerical
constants like this and use arrays instead of hashes for "structs" in
order to gain extra performance. But if you are going to use hashes
anyway, why not actually key by sensible name directly?
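
The array-based variant I have in mind would look roughly like this
(a sketch only; the package name and the three fields are picked for
illustration and do not match the patch):

```perl
use strict;
use warnings;

package Sketch::ArrayCommit;

# Numeric indices into an array-based object; hypothetical layout.
use constant {
	_TREE    => 0,
	_AUTHOR  => 1,
	_MESSAGE => 2,
};

sub new {
	my ($class, %args) = @_;
	my $self = [];
	$self->[_TREE]    = $args{tree};
	$self->[_AUTHOR]  = $args{author};
	$self->[_MESSAGE] = $args{message};
	return bless $self, $class;
}

# Accessors index the array directly instead of hashing a key.
sub tree    { $_[0]->[_TREE] }
sub author  { $_[0]->[_AUTHOR] }
sub message { $_[0]->[_MESSAGE] }

package main;
```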

> +=head1 METHODS
> +
> +=over
> +
> +=item $commit = Git::Commit->new($repo, $sha1)

Are we sure we don't want hash-based arguments instead? This is badly
extensible and inconsistent with the rest of the API.

> +Return a new Git::Commit instance for a commit object with $sha1 in
> +repository $repo.
> +
> +Calls to this method are free, since it does not check whether $sha1
> +exists and has the right type.  However, accessing any of the commit
> +object's properties will fail if $sha1 is not a valid commit object.

This is nice idea, but I'd also provide a well-defined way for the user
to verify the object's validity at a good moment; basically, make load()
a public method. The user can deal with errors then and rely on
error-free behavior later.

> +Note that $sha1 must be the SHA1 of a commit object; tag objects are
> +not dereferenced.

Why not?

> +=item $obj->repo
> +
> +Return the Git::Repo instance this object was instantiated with.
> +
> +=item $obj->sha1
> +
> +Return the SHA1 of this commit object.
> +
> +=item $commit->tree
> +
> +Return the tree this commit object refers to.
> +
> +=item $commit->parents
> +
> +Return a list of zero or more parent commit objects.  Note that commit
> +objects stringify to their respective SHA1s, so you can alternatively
> +treat this as a list of SHA1 strings.
> +
> +=item $commit->authors

author

> diff --git a/perl/Git/Object.pm b/perl/Git/Object.pm
> new file mode 100644
> index 0000000..a5126f9
> --- /dev/null
> +++ b/perl/Git/Object.pm
> @@ -0,0 +1,81 @@
> +=head1 NAME
> +
> +Git::Object - Object-oriented interface to Git objects (base class).
> +
> +=head1 DESCRIPTION
> +
> +Git::Object is a base class that provides access to commit, tag and
> +(unimplemented) tree objects.  See L<Git::Commit> and L<Git::Tag>.
> +
> +Objects are loaded lazily, and hence instantiation is free.  Objects
> +stringify to their SHA1s.

Maybe use the term 'Git database objects'? This way, it seems as if we
are talking about all Git/*.pm objects.

> +=cut
> +
> +
> +use strict;
> +use warnings;
> +
> +
> +package Git::Object;
> +
> +use base qw(Exporter);
> +
> +our @EXPORT = qw();
> +our @EXPORT_OK = qw();
> +
> +use overload
> +    '""' => \&stringify;

(For the beholders: the overload pragma comes from 5.004.)

> +# Hash indices:
> +use constant _REPO => 'R';
> +use constant _SHA1 => 'H';
> +
> +=head1 METHODS
> +
> +=over
> +
> +=item Git::Object->new($repo, $sha1)
> +
> +Return a new Git::Object instance for the object with $sha1 in the
> +repository $repo (a Git::Repo instance).
> +
> +Note that this method does not check whether the object exists in the
> +repository.  Trying to accessing its properties through a subclass
> +will fail if the object doesn't exist, however.
> +
> +=cut
> +
> +sub new {
> +	my($class, $repo, $sha1) = @_;
> +	die "$repo is not a Git::Repo instance" unless $repo->isa('Git::Repo');
> +	my $self = {_REPO() => $repo, _SHA1() => $sha1};
> +	return bless $self, $class;
> +}
> +
> +=item $obj->repo
> +
> +Return the Git::Repo instance this object was instantiated with.
> +
> +=cut
> +
> +sub repo {
> +	shift->{_REPO()}
> +}
> +
> +=item $obj->sha1
> +
> +Return the SHA1 of this object.
> +
> +=cut
> +
> +sub sha1 {
> +	shift->{_SHA1()}
> +}
> +
> +sub stringify {
> +	shift->{_SHA1()}
> +}
> +
> +1;

Why not overload "" directly to sha1()?
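
That is, something along these lines (a sketch with a made-up package
name, not a patch against Git::Object):

```perl
use strict;
use warnings;

package Sketch::Object;

# Stringification calls sha1() directly; no separate stringify sub.
# The handler receives extra arguments, which sha1() simply ignores.
use overload '""' => \&sha1;

sub new {
	my ($class, $sha1) = @_;
	return bless { sha1 => $sha1 }, $class;
}

sub sha1 { $_[0]->{sha1} }

package main;
```

Note that a code reference pins the handler to this class's sha1();
passing the method name as a string would let subclasses override it.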

> diff --git a/perl/Git/Repo.pm b/perl/Git/Repo.pm
> new file mode 100644
> index 0000000..5a4c7e0
> --- /dev/null
> +++ b/perl/Git/Repo.pm
> @@ -0,0 +1,397 @@
> +=head1 NAME
> +
> +Git::Repo - Read-only access to the Git repositories.
> +
> +WARNING: This module is in active development -- do not use it in any
> +production code as the API may change at any time.
> +
> +=head1 SYNOPSIS
> +
> +  use Git::Repo;
> +
> +  my $repo = Git::Repo->new(
> +      directory => '/path/to/repository.git',
> +      git_binary => '/usr/bin/git');
> +  my ($sha1, $type, $size) = $repo->get_sha1('HEAD');
> +  print "Last log message:\n\n" . $repo->get_commit($sha1)->message;
> +
> +=cut
> +
> +
> +use strict;
> +use warnings;
> +use 5.006002;
> +
> +
> +package Git::Repo;
> +
> +use Git::Tag;
> +use Git::Commit;
> +
> +use IPC::Open2 qw(open2);
> +use IO::Handle;
> +
> +use base qw(Exporter);
> +
> +our @EXPORT = qw();
> +our @EXPORT_OK = qw(assert_sha1 assert_opts);
> +
> +# Auxiliary subroutines
> +
> +sub assert_opts {
> +	die "must have an even number of arguments for named options"
> +	    unless $#_ % 2;
> +}
> +
> +sub assert_sha1 {
> +	my $sha1 = shift;
> +	die "'$sha1' is not a SHA1 (need to use get_sha1?)"
> +	    unless $sha1 && $sha1 =~ /^[a-f0-9]{40}$/;
> +}

Prepend underscores to the names, since they are internal?

> +=head1 METHODS
> +
> +=head2 Instantiation and basic methods
> +
> +=over
> +
> +=item $repo = Git::Repo->new(%opts)
> +
> +Return a new Git::Repo object.  The following options are supported:
> +
> +=over
> +
> +=item 'directory'
> +
> +The directory of the repository (mandatory).
> +
> +Note that this option is working-copy agnostic; you need to
> +instantiate it with the working copy's .git directory as the
> +'directory' option.

I don't think making this mandatory is reasonable, since all the git
commands can automatically figure this out by themselves too; so can
Git::Repo easily by calling git rev-parse --git-dir.

> +=item 'git_binary'
> +
> +The name or full path of the git binary (default: 'git').
> +
> +=back
> +
> +Calling this method is free, since it does not check whether the
> +repository exists.  Trying to access the repository through one of the
> +instance methods will fail if it doesn't exist though.
> +
> +Examples:
> +
> +    $repo = Git::Repo->new(directory => '/path/to/repository.git');
> +    $repo = Git::Repo->new(directory => '/path/to/working_copy/.git');
> +
> +=cut
> +
> +sub new {
> +	my $class = shift;
> +	assert_opts(@_);
> +	my $self = {@_};
> +	bless $self, $class;
> +	die 'no directory given' unless $self->{directory};
> +	return $self;
> +}
> +
> +=item $repo->repo_dir
> +
> +Return the directory of the repository (.../.git in case of a working
> +copy).
> +
> +=cut
> +
> +sub repo_dir {
> +	shift->{directory}
> +}
> +
> +=item $repo->git_binary
> +
> +Return the name of or path to the git binary (used with exec).
> +
> +=cut
> +
> +sub git_binary {
> +	shift->{git_binary}
> +}

--- CUT ---

> +=item $repo->version
..snip..
> +# Return the first items of the git command line, for instance
> +# qw(/usr/bin/git --git-dir=/path/to/repo.git).
> +sub _get_git_cmd {
> +	my $self = shift;
> +	return ($self->git_binary || 'git', '--git-dir=' . $self->repo_dir);
> +}

--- CUT ---

This definitely does not belong to a Git::Repo object.

BTW, I like the idea of reuse => 1 very much.

> +=back
> +
> +=head2 Inspecting the Repository
> +
> +=over
> +
> +=item $repo->cat_file($sha1)
> +
> +Return the ($type, $content) of the object identified by $sha1, or die
> +if no such object exists in the repository.

I don't think this is a good combination of semantics and name. Since we
don't do the same thing as plain git cat-file, we might as well call it
cat_object() or even better get_object().

> +Note that you may want to use the higher-level methods get_commit and
> +get_tag instead.
..snip..

> +=item $repo->get_path($tree_sha1, $file_sha1)

Now we are quickly getting messy again. This should definitely live in
Git::Tree.

> +=item $repo->get_refs

Again, the refs should be properly integrated into the object structure.

-- 
				Petr "Pasky" Baudis
GNU, n. An animal of South Africa, which in its domesticated state
resembles a horse, a buffalo and a stag. In its wild condition it is
something like a thunderbolt, an earthquake and a cyclone. -- A. Pierce


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-13 23:28   ` Jakub Narebski
@ 2008-07-14  2:29     ` Lea Wiemann
  0 siblings, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-07-14  2:29 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John Hawley, Petr Baudis

Jakub Narebski wrote:
> I think it would be perhaps better to explain relationship and purpose
> of each class in more detail, including Git::Repo.

Noted, will do.

>>   [Git.pm] tries to do (a) WC access, (b) repo access,
>>   and (c) frontend error handling (with sensible error messages).
> 
> I can see (b) and (c), but I have trouble seeing (a).

Well, Git.pm operates on working copies in the constructor (obviously),
but also wc_{path,subdir,chdir} and hash_and_insert_object.

>>   every working copy has a repository associated with it
> 
> Please remember that the opposite relation is also true.

True. *nods*

>>   but I'd probably let [Git.pm] die a slow death
> 
> I'm not so sure if it is a way to go.  Most git commands want to just 
> invoke other git commands safely,

Good point.  Perhaps the command functionality of Git.pm and Git::Repo
could be extracted into something like Git::Cmd.

> Non OO things, like ability to write  print color('reset') . "\n";
> is also important.

Perhaps, though you might not get around some instantiation to specify
the semantics of the color command: Honor color configuration in
.gitconfig or .git/config?  Honor non-terminal stdout?  Honor command
line?  I suspect that in the end non-OO functions end up being wrappers
around OO interfaces that simply specify a set of reasonable defaults.

> I'm not sure if using Error module was a good idea for
> frontend error handling.

As a general rule, I'd try to not use program exceptions as a means to
do frontend error handling, unless you're trying hard to keep the
frontend minimalist.  Even if you don't care about i18n, different
frontends have different needs for their error reporting styles.  Also,
things like failed SHA1-lookups might be an error to one frontend but
not an error to another frontend, so you'd have to implement an
exception hierarchy to make fine-granular catching possible.

On top of that, this kind of exception handling doesn't seem very much
like typical Perl style.

> How would you like to catch errors from frontend in Git::Repo and 
> friends?

Handle them yourself -- Git::Repo doesn't die unless a fatal (i.e.
unexpected) error occurs:

($sha1, $type) = $repo->get_sha1('HEAD:/my/file');
if (! defined $sha1 || $type ne 'blob') { ... handle error ... }
$contents = $repo->cat_file($sha1);
... work with contents ...

Also note how there's one well-defined (and known) error point: $sha1
being undefined, or the $type being wrong.  The $repo methods *cannot*
throw errors unless they're fatal, so you can for instance call cat_file
and assume that everything goes right.
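
A minimal sketch of that calling pattern, for illustration only.  The
Sketch::Repo class below is a hypothetical in-memory stand-in (it is not
Git::Repo and does not call git); only the method names get_sha1 and
cat_file are taken from the example above, and read_blob is an invented
helper name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a repository object; NOT the real Git::Repo.
# Names are resolved from an in-memory table instead of calling git.
package Sketch::Repo;

sub new {
    my ($class, %objects) = @_;    # name => [sha1, type, contents]
    return bless { objects => \%objects }, $class;
}

# Returns ($sha1, $type), or an empty list if the name does not resolve.
sub get_sha1 {
    my ($self, $name) = @_;
    my $entry = $self->{objects}{$name} or return;
    return @$entry[0, 1];
}

# Assumes $sha1 came from get_sha1; dies only on an internal inconsistency.
sub cat_file {
    my ($self, $sha1) = @_;
    for my $entry (values %{ $self->{objects} }) {
        return $entry->[2] if $entry->[0] eq $sha1;
    }
    die "fatal: object $sha1 vanished\n";
}

package main;

# The single error point: resolve the name, check the result once,
# then call cat_file without further checks.
sub read_blob {
    my ($repo, $name) = @_;
    my ($sha1, $type) = $repo->get_sha1($name);
    return unless defined $sha1 && $type eq 'blob';
    return $repo->cat_file($sha1);
}
```

The frontend decides what an unresolved name means (404, empty page,
warning); the repository object itself never has to report it.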

> What is max_exit_code

It allows you to call the git binary without dying if it exits with a
non-zero status; see the cmd_output documentation for details.

The idea is that a non-zero exit status always indicates an internal
(fatal) error, unless you specify that it's OK.
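
Roughly, the idea can be sketched as follows.  This is not the
cmd_output implementation from the patch: cmd_output_sketch and its
option names (cmd, max_exit_code) are assumptions made for illustration,
using only core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command, return (output, exit code), and die
# only if the exit code exceeds the caller-supplied max_exit_code (default 0).
sub cmd_output_sketch {
    my (%opts) = @_;
    my @cmd = @{ $opts{cmd} };
    my $max = $opts{max_exit_code} || 0;

    # List-form pipe open: no shell is involved, so arguments need no quoting.
    open(my $fh, '-|', @cmd) or die "fatal: cannot run $cmd[0]: $!\n";
    my $output = do { local $/; <$fh> };    # slurp everything the child prints
    close $fh;    # reaps the child and sets $? to its wait status

    die "fatal: '@cmd' was killed by signal " . ($? & 127) . "\n" if $? & 127;
    my $exit_code = $? >> 8;
    die "fatal: '@cmd' exited with status $exit_code\n" if $exit_code > $max;

    return ($output, $exit_code);
}
```

A caller that expects, say, exit status 1 as a legitimate "not found"
answer passes max_exit_code => 1 and inspects the returned code itself.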

>> - It's buggy and untested.  Neither of these is a problem by itself,
>>   but the combination is deadly.
> 
> Haven't you added t/t9700-perl-git.sh?

Yes (and it alleviated the problem), but I couldn't test the areas where
the untestedness actually hits (e.g. the missing semicolon I mentioned).
 IOW, t9700 is only testing the parts that are working anyway.

> What I worry about is that dependence on Git.pm or Git::Repo would make 
> gitweb installation too hard for some.

If I'm not mistaken you can always drop the perl/Git directory next to
gitweb.cgi.  (I'll add that to the installation notes.)

>> Unrelatedly, should I add copyright notices at the bottom of each perl
>> module so they are displayed in the perldoc/man pages?
> 
> Well, most manpages have information about who made them... which means 
> who was initial author, usually, and/or who is current maintainer.

I don't really care about being credited as the initial author, and I'm
honestly not sure if I'll be able to maintain the modules in the long run.

Should I perhaps add some note along the lines of "Direct questions and
patches to git@vger"?

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-11  1:11 ` [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann
@ 2008-07-14 21:23   ` Jakub Narebski
  2008-07-14 23:03     ` Lea Wiemann
  0 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-14 21:23 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis

On Fri, 11 July 2008, Lea Wiemann wrote:

> Gitweb now uses the Git::Repo API; this change is behavior-preserving,
> except for slightly more aggressive error handling; see below.

Good.

It was suggested to split this into a separate commit from the following
change, to make it easier to test (you can check that behavior is
the same, with the exception of error handling) and review (a smaller
patch to read and review).
 
> This patch also adds an optional caching layer for caching repository
> data in memory and (for larger cacheable items, like blobs, snapshots,
> or diffs) on disk.

As was said, if feasible it would be a good idea to put this change
into a separate commit.

> Other minor changes:
> 
> - Gitweb would previously accept invalid input and either (a) display
>   nothing, (b) display an obscure error message, or (c) proceed as
>   normal since the parameter happens to be unused in the particular
>   code path used.  This has changed in that gitweb will check for
>   parameter correctness more aggressively, and display meaningful
>   error messages.  This change is only relevant if you manually edit
>   gitweb's CGI parameters, since gitweb only generates valid links.

I understand that this change deals with invalid specifiers: those
that point to objects that do not exist, are ambiguous, or point to
objects of the wrong type.  Gitweb already checks the "syntactic"
validity of its input (the CGI parameters), even for parameters that
are not used by the selected action.

BTW, such a check was not feasible before the --batch and/or
--batch-check options were added to git-cat-file; I think that possibly
one more fork is not much of a price to pay for better error checking.

> - Empty projects:
> 
>   - Only display summary link for empty projects in project list to
>     avoid broken links (yielding 404).
> 
>   - Slim down summary page for empty projects to avoid some broken
>     links and unnecessary vertical space.
> 
>   - Sort empty projects at the bottom of the project list when sorting
>     by last change.
> 
>   - Add test for empty projects to t9503 (the Mechanize test), now
>     that there are no broken links anymore.

Good.  The only thing that *might* be controversial is putting empty
projects at the bottom of the projects list when it is sorted by age
(by last change), instead of at the top.

> - For HTML pages, remove the "Expires" HTTP response header, and add
>   "Cache-Control: no-cache" instead.  This is because pages can
>   contain dynamic content (like the subject of the latest commit), so
>   the Expires headers would be wrong.
> 
>   This makes gitweb's responsiveness slightly worse, but it will get
>   much better once If-Modified-Since is implemented.  It's better to be
>   correct than to be convenient here, since having to press the reload
>   button makes for lousy user experience (IOW, users should be able to
>   always trust gitweb's output).
> 
>   Raw diffs and blobs still get the Expires header, where appropriate.

I don't think it is a good change.

Gitweb generates two types of views (pages): transient and immutable.
Examples of transient views are the RSS feed and the summary page: they
can change when the project (repository) is updated.

The opposite are immutable pages.  These are pages/actions/views where
all specifiers needed to reconstruct the objects shown are given by
full SHA-1.  (A sufficient check for immutability is enough: if the
check succeeds, the page is immutable, but a page that fails the check
is not necessarily transient.)

Gitweb sets expires to '+1d' (one day) for pages it considers
immutable, and does not define expires for other pages (which results,
I think, in no Expires header being sent).  We could have set it to
"forever", which in terms of the Expires: HTTP header is half a year
(from what I remember).

Now I don't see *any* reason not to set a long expiry time for immutable
pages; and I don't know if forbidding caching of transient pages, even
though they are in fact generated dynamically, is a good idea...  Note
that if caching is enabled, you can set expires to either the
time-to-expire of cache entries (simpler), or the time left until the
item in the cache is invalidated (better, but more complicated), perhaps
also setting the Age: header to an appropriate value.


Sidenote: we would probably want to use Expires: for HTTP/1.0 requests, 
and Cache-Control: max-age=<seconds> for HTTP/1.1 requests.  But that 
might be left as improvement for later...
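
A rough sketch of the distinction drawn above, assuming a deliberately
simple immutability check (every specifier is a full 40-hex SHA-1) and a
one-day lifetime.  cache_headers_sketch is a hypothetical helper, not
gitweb code, and gitweb's real check may differ.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: return HTTP caching headers for a view, or an
# empty list if the view is not provably immutable.
sub cache_headers_sketch {
    my ($http_version, @specifiers) = @_;

    # Sufficient (not necessary) check: all specifiers are full SHA-1s.
    my $immutable = @specifiers
        && !grep(!/^[0-9a-f]{40}$/i, @specifiers);
    return unless $immutable;

    my $max_age = 24 * 60 * 60;    # one day, matching '+1d'
    if ($http_version eq '1.0') {
        # HTTP/1.0 clients only understand an absolute expiry date.
        my $when = strftime('%a, %d %b %Y %H:%M:%S GMT',
                            gmtime(time() + $max_age));
        return (Expires => $when);
    }
    return ('Cache-Control' => "max-age=$max_age");
}
```

Transient views (anything addressed by a ref name such as HEAD) get no
caching headers from this helper; what to send for them is the open
question discussed above.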

> - Add a $page_info option to display cache stats at the bottom of each
>   page; the option is named generically to allow for adding non-cache
>   page info there at some point (timings perhaps?).

Great idea!
 
> ---
> It's all documented of course :-), but for the impatient here's a
> snippet for gitweb_config.perl to activate caching:

Nice.

> use Cache::Memcached;
> $cache = Cache::Memcached->new( { servers => ['localhost:11211'],
>      compress_threshold => 1000 } );

IIRC you can use any Cache::Cache compatible (is it explained later
what it means?) cache here; IMVHO it would be nice if this info were
also in the commit message.

> $large_cache_root = '/home/lewiemann/gitweb-cache';
> $large_cache_case_sensitive = 1;

Errr... I understand that it is your _private_ configuration, just 
copied here verbatim, but I don't think '/home/lewiemann/gitweb-cache'
is a good example: '/tmp/gitweb-cache' perhaps, that I can understand.

> # Invalidate cache on changes to gitweb without version number bump;
> # useful for development.
> $cache_key = (stat '/home/lewiemann/gitweb')[9] . 
>      (stat '/home/lewiemann/gitweb/gitweb.cgi')[9]; 

What should be used in production? "$cache_key = $version;"?

Besides, hardcoding those paths is not a good idea.  You can always
use $ENV{'SCRIPT_FILENAME'}, or dirname of it.

> # Display detailed cache info at the bottom of each page.
> $page_info = 2;

Errr... what does "$page_info = <n>;" mean?

> A live demo is here: http://odin3.kernel.org/git-lewiemann/

Nice.  Thanks.

[...]
>  gitweb/README                          |   14 +

Very good.


[Comments on patch itself in separate email, later]
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-14  1:40   ` Petr Baudis
@ 2008-07-14 22:19     ` Lea Wiemann
  2008-07-18 16:48       ` Petr Baudis
  2008-07-14 23:41     ` [PATCH 2/3] add new Git::Repo API Jakub Narebski
  1 sibling, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-14 22:19 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, John Hawley, Jakub Narebski

Petr Baudis wrote:
> Maybe I sound too mentoring at places; [OTOH] if something general gets into
> the Git tree, I'd like to make sure it's something we can live happily
> with for long time, not just a hack tailored for gitweb caching.

Yup, I agree (and you don't sound too mentoring ^^).  Thanks for the
review and feedback.

> I really miss some more detailed writeup on the envisioned design here. [...]
> Most importantly, how is Git::Repo interacting with working copies,

Git::Repo is not interacting with working copies at all right now.  Is
there anything you think should be considered for its design?

Here's a write-up about the design (I'll probably move this into
Git::Repo's man page):

----------
Git::Repo aims to provide low-level access to Git repositories.  For
instance, you can resolve object names (like "HEAD~2") to SHA1s, and
inspect objects.  It does not attempt to be a wrapper around the git
plumbing or porcelain commands.

Error handling is simple: On a consistent repository, the Perl interface
will never die.  You can use the get_sha1 method to resolve arbitrary
object names or check the existence of SHA1 hashes; get_sha1 will return
undef if the object does not exist in the repository.  Any SHA1 that is
returned by get_sha1 can be safely passed to the other Git::Repo methods.
----------

> First, I don't think it's good idea at all to put the pipe-related stuff
> to Git::Repo - this is botched up API just like the current one.

Well, they're more like helper methods.  Since they don't fit into the
design goals of the Git::Repo API at all, I'd suggest we just
underscore-prefix them and take them out of the man page.  (The only
reason why I hadn't done this is that gitweb uses $repo->cmd_output
extensively, so it'd end up with a lot of underscore calls.  But I
suppose we can either alias _cmd_output to cmd_output in gitweb's
CachedRepo subclass, or live with $repo->_cmd_output calls.)  Does
underscore-prefixing sound good to you?

If someone wants to come up with a consistent nice interface for calling
git commands, sure.  I wasn't actually trying to do that.

> Here is an idea: Introduce Git::Command object [and a Git::CommandFactory]
> 
> 	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file', \
> 		'-p', 'bla'], {pipe_out=>1})    [...]
> 	my $cf = Git::CommandFactory->new('git', '--git-dir=.');
> 
> Or am I overdoing it?

Yes, I think so. ;-)  All we're talking about here is a wrapper around
"open '-|'" calls (plus some workarounds for Windows I suppose).

I don't have much of a notion of a 'command' as an object in my head;
your (snipped) example makes it look like you're trying to create a
IO::Handle-compatible interface, which I think is way too much effort
(and error-prone) for simple pipes.  Also, a CommandFactory class just
to catenate lists together seems like overkill to me.

Something like a command interface *may* make sense if it's tied to
repositories or working copies, in which case it could automatically set
--git-dir or --work-tree, but it's beyond the scope of what I'm trying
to create here, and I don't think it's even overly useful.

> Instead, I believe the best course is to gradually translate all the
> Git.pm functionality to the new OO model, leaving Git.pm as a
> compatibility wrapper. Now, if you believe this is a non-trivial task,
> please tell us why.

Well, as I said, the fact that there are untested parts in Git.pm
doesn't exactly make it trivial to refactor.  Also, only very few parts
can be cleanly moved to Git::Repo.  In "grep '^sub [^_]' Git.pm" I find
only cat_blob and perhaps hash_object to be eligible to be moved (though
hash_object should probably live in a working-copy/non-bare-repo class,
with an optional insert => 1 parameter).  And even cat_blob is
non-trivial to move unless you want the whole blob to be read into memory.

That's a lot of non-trivialness for very little gain.  I doubt I'd even
have enough time till the end of GSoC (minus vacation) to do this.

> It should be actually very easy to start with moving all the pipe
> functionality to Git::Command.

Creating a new (Git::Command) API is very much non-trivial, apart from
the fact that I'm not convinced that we need Git::Command, and that a
clean command interface neither falls out of Git.pm nor Git::Repo.

>>   I'm working on caching for gitweb, not on implementing the
>>   next great Perl API for Git.  (And Git::Repo isn't great, FTR.)
> 
> Wait, I can't make sense out of this paragraph. If Git.pm sucks, we can
> work on new API. But we better _make_ it great. Or someone else comes by
> next year and says "oh, but it's buggy and needs refactoring, let's
> throw it away and redesign it!"

Sorry, I wasn't clear with my parenthesized remark: I actually think that
Git::Repo is pretty good in terms of code and interface quality.  It's
just not *complete*, even in its limited scope, and I'm not attempting
to make it complete.

I do think that someone who wants to extend Git::Repo (like Jakub with
Git::Config) won't have much trouble doing so with the existing design.

>> +use constant _MESSAGE => 'M';
>> +use constant _ENCODING => 'E';   [snip]
> 
> if you are going to use hashes
> anyway, why not actually key by sensible name directly?

Embarrassingly premature optimization here. ^_^  I'll fix it.

>> $commit = Git::Commit->new($repo, $sha1)
> 
> Are we sure we don't want hash-based arguments instead? This is badly
> extensible and inconsistent with the rest of the API.

*ponders*  Every commit needs a repo and a SHA1, so those will never get
optional.  We can always add hash-based options after the two mandatory
arguments, but I don't even see any such possible options at the moment.
 (And if I turn out to be completely wrong, we can even move to a
hash-only argument list by checking the type of the first parameter.)
Really, I wouldn't worry.

>> [Git::Commit->new, Git::Tag->new:]
>> +Calls to this method are free, since it does not check whether $sha1
>> +exists and has the right type.  However, accessing any of the commit
>> +object's properties will fail if $sha1 is not a valid commit object.
> 
> This is nice idea, but I'd also provide a well-defined way for the user
> to verify the object's validity at a good moment; basically, make load()
> a public method. The user can deal with errors then and rely on
> error-free behavior later.

No, you should never pass in an invalid SHA1 in the first place.  The
above piece of documentation is just a warning that bugs will show up
delayed.  IOW, this is not the right place to have your error handling.

If you're getting a SHA1 through the user-interface, check its existence
with get_sha1 before passing it to the constructor.

>> +Note that $sha1 must be the SHA1 of a commit object; tag objects are
>> +not dereferenced.
> 
> Why not?

Because the SHA1 might resolve to an object of the wrong type, which
means you have to do error handling in Git::Object objects; that's the
wrong place.

If tag-resolving is really needed, we can add an optional $type
parameter to get_sha1, which will cause get_sha1 to resolve the object
until a $type object is found, or return undef if the object is or
resolves to an object of the wrong type.
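
The resolving step could look roughly like this.  It is a sketch over a
hypothetical in-memory object table rather than a real repository;
resolve_to_type and the table layout are assumptions made for
illustration, and only tags are peeled.

```perl
use strict;
use warnings;

# Hypothetical object table: sha1 => [type, target], where target is the
# object a tag points at (undef for non-tags).
sub resolve_to_type {
    my ($objects, $sha1, $type) = @_;
    while (defined $sha1 && exists $objects->{$sha1}) {
        my ($obj_type, $target) = @{ $objects->{$sha1} };
        return $sha1 if $obj_type eq $type;    # found the wanted type
        return unless $obj_type eq 'tag';      # wrong type, nothing to peel
        $sha1 = $target;                       # follow the tag one level
    }
    return;    # unknown object
}
```

With this shape, the caller still has exactly one error point: an
undefined return value.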

I have resolving code in gitweb's git_get_sha1_or_die (which I didn't
implement in Git::Repo since it uses some customized error reporting).
The resolving code could conceivably be extracted and moved to get_sha1.
I think there are a few things to ponder and maybe discuss, so I'd do
that in a separate patch (if I get around to it before the end of the project).

>> +=item $commit->authors
> 
> author

Fixed.

>> +Objects are loaded lazily, and hence instantiation is free.  Objects
>> +stringify to their SHA1s.
> 
> Maybe use the term 'Git database objects'? This way, it seems as if we
> are talking about all Git/*.pm objects.

I've replaced it with: "Objects are loaded lazily, and hence
instantiation is free.  Git::Object instances stringify to their SHA1s."

>> +sub sha1 {
>> +sub stringify {
> 
> Why not overload "" directly to sha1()?

Done (and removed stringify).
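
For reference, a minimal sketch of what direct overloading looks like.
Sketch::Object is a hypothetical stand-in class, not the Git::Object
code from the patch.

```perl
use strict;
use warnings;

package Sketch::Object;    # hypothetical stand-in for Git::Object

# Route stringification straight to sha1(); no separate stringify().
# fallback => 1 lets string operators such as eq use the same conversion.
use overload '""' => \&sha1, fallback => 1;

sub new {
    my ($class, $sha1) = @_;
    return bless { sha1 => $sha1 }, $class;
}

sub sha1 { $_[0]{sha1} }

package main;
```

An instance can then be interpolated into strings or compared with eq
wherever a plain SHA1 string is expected.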

>> +sub assert_opts {
>> +sub assert_sha1 {
> 
> Prefix names with underscores, since they are internal?

Done, and removed them from @EXPORT_OK.

>> 'directory': The directory of the repository (mandatory).
> 
> I don't think making this mandatory is reasonable, since all the git
> commands can automatically figure this out by themselves too; so can
> Git::Repo easily by calling git rev-parse --git-dir.

Sure, it can be made non-mandatory if it's needed, but there are so many
possibilities for the exact time and place at which the repo directory
should be resolved using rev-parse (if at all) that I'd rather leave
this to the person who has an actual use-case for it.  I'm not a fan of
designing APIs before they are needed.

>> [Snipped a lot of quoting --LW]
>> +=item $repo->repo_dir
>> +=item $repo->git_binary
>> +=item $repo->version
>> +sub _get_git_cmd {
> 
> This definitely does not belong to a Git::Repo object.

Which of those methods are you referring to?  I think $repo->version
might reasonably be removed (and the code re-added to gitweb); I'll do
so unless you object.  _get_git_cmd is already underscored, and repo_dir
and git_binary only access attributes passed in through the constructor,
so I think those three should stay.

>> +=item $repo->cat_file($sha1)
> 
> I don't think this is a good combination of semantics and name. Since we
> don't do the same thing as plain git cat-file, we might as well call it
> cat_object() or even better get_object().

Yup; I like get_object (I think I was planning to rename it and then
didn't remember doing so before sending off the patch).  Will rename it.

>> +=item $repo->get_path($tree_sha1, $file_sha1)
> 
> Now we are quickly getting messy again. This should definitely live in
> Git::Tree.

Yup, that's true.  I'll move it into gitweb until we have Git::Tree
(with a comment that it can be moved to Git::Tree once it exists).

>> +=item $repo->get_refs
>> +=item $repo->get_refs($pattern)
> 
> Again, the refs should be properly integrated into the object structure.

Really?  I think it's generally fine for get_refs to exist and to live
in Git::Repo.

Its return value (currently an arrayref of [$sha1, $object_type,
$ref_name] arrayrefs) might need improvement though, and I find the
$pattern parameter pretty suspect (in that it smells like a for-each-ref
wrapper).  Since get_refs is unused at the moment (gitweb ended up
needing the slightly different show-ref), I'll remove it for now.  (Same
thing about me not being a fan of premature API design applies.)

I keep patches of everything I remove so other people will be able to
use them as starting points; I'll probably post them as "FYI"-patches to
the list at the end of my project, to preserve them for posterity.

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-14 21:23   ` Jakub Narebski
@ 2008-07-14 23:03     ` Lea Wiemann
  2008-07-14 23:14       ` Jakub Narebski
  0 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-14 23:03 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John Hawley, Petr Baudis

Jakub Narebski wrote:
> It was suggested to split this into separate commit

Yup; I'll probably send updated patches tomorrow night (also for patch 2/3).

>> - gitweb will check for parameter correctness more aggressively,
> 
> I understand that this change deals with treating invalid specifiers,
> which point to either object that do not exists, are ambiguous, or point 
> to object of invalid type.

Yes, that's right.  (I don't believe we have any point where ambiguity
might come up though.)

>> - Empty projects: [...]
> 
> Good.  The only thing that *might* be controversial is putting empty
> projects at the bottom of sorted by age (by last change) projects list, 
> instead of at top.

Yup; let's see if anyone objects though.  If I sort the list by "Last
Change", I usually want to see projects with recent activity, not dead
projects, at the top, which is why I changed it (since I was touching
that line anyway).

>> - For HTML pages, remove the "Expires" HTTP response header, and add
>>   "Cache-Control: no-cache" instead.  This is because pages can
>>   contain dynamic content (like the subject of the latest commit)
> 
> I don't think it is a good change.

Hm; I thought transient titles could slip in (e.g. try opening the tree
of some commit and remove the hb parameter; the URL will seem cacheable,
but the page contains the title of the HEAD commit), but I can't find
any URL right now where mainline actually sets a wrong Expires header.
I'll look into it; if you don't see me posting about it again I'll
re-add the Expires header.

> Note that if caching is enabled, you can set expires to either
> time-to-expire of cache entries (simpler), or time left to live to
> invalidation of item in cache (better, but more complicated)

Gitweb's cache is actually never out-of-date, and cache invalidation
happens automatically.  It uses some (long) expiry times to guard
against non-standard modification of the repository, but it's nothing
the HTTP client should be concerned with.

>> $cache = Cache::Memcached->new( { servers => ['localhost:11211'],
> 
> IIRC you can use any Cache::Cache compatible cache here;
> IMVHO it would be nice if this info would be also in commit message.

I'll add that.

>> $large_cache_root = '/home/lewiemann/gitweb-cache';
> 
> Errr... I understand that it is your _private_ configuration, just 
> copied here verbatim, but I don't think '/home/lewiemann/gitweb-cache'
> is a good example: '/tmp/gitweb-cache' perhaps, that I can understand.

Yup. ;-)  Or /var/cache/gitweb.

>> # Invalidate cache on changes to gitweb without version number bump;
>> # useful for development.
>> $cache_key = (stat '/home/lewiemann/gitweb')[9] . 
>>      (stat '/home/lewiemann/gitweb/gitweb.cgi')[9]; 
> 
> What should be used in production? "$cache_key = $version;"?

No, nothing.  $version is used automatically as a cache key; I'll add
that to the documentation for $cache_key.

> You can always use $ENV{'SCRIPT_FILENAME'}, or dirname of it.

That one doesn't exist with my thttpd, or any other environment variable
that'd be usable.  It's just a hack anyway, so hardcoded paths are OK.
:)  I don't think gitweb should check its own mtime by default.

>> # Display detailed cache info at the bottom of each page.
>> $page_info = 2;
> 
> Errr... what does "$page_info = <n>;" mean?

Display no (0) / short (1) / detailed (2) page (cache) info at the
bottom of each page.  It's documented in gitweb.perl.

> [Comments on patch itself in separate email, later]

Thanks!

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-14 23:03     ` Lea Wiemann
@ 2008-07-14 23:14       ` Jakub Narebski
  2008-07-14 23:56         ` Lea Wiemann
  0 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-14 23:14 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis

Lea Wiemann wrote:
> Jakub Narebski wrote:

> > Note that if caching is enabled, you can set expires to either
> > time-to-expire of cache entries (simpler), or time left to live to
> > invalidation of item in cache (better, but more complicated)
> 
> Gitweb's cache is actually never out-of-date, and cache invalidation
> happens automatically.  It uses some (long) expiry times to guard
> against non-standard modification of the repository, but it's nothing
> the HTTP client should be concerned with.

Could you explain then how gitweb cache is invalidated?

The _reasoning_ behind setting the Expires:/Cache-Control: max-age=
header for gitweb with caching is that the browser doesn't even need to
try to cache-validate or access the page for as long as we *know* that
the output would not change[*1*] because it comes from the cache.


[*1*] In significant way: changing relative dates/time doesn't count.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-14  1:40   ` Petr Baudis
  2008-07-14 22:19     ` Lea Wiemann
@ 2008-07-14 23:41     ` Jakub Narebski
  2008-07-15  0:11       ` Lea Wiemann
  2008-07-18 16:54       ` Petr Baudis
  1 sibling, 2 replies; 63+ messages in thread
From: Jakub Narebski @ 2008-07-14 23:41 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Lea Wiemann, git, John Hawley

On Mon, 14 July 2008, Petr Baudis wrote:
> On Fri, Jul 11, 2008 at 03:11:05AM +0200, Lea Wiemann wrote:
> >
> > This also adds the Git::Commit and Git::Tag classes, which are used by
> > Git::Repo, the Git::Object base class, and the Git::RepoRoot helper
> > factory class.
> 
> I really miss some more detailed writeup on the envisioned design here.
> And if we are redoing the API in a better way, we better should have
> some vision.

Once again: if you are adding some large amount of code, you'd better
describe "whys" of it.

> Most importantly, how is Git::Repo interacting with working copies, and
> how is it going to interact with them as the new OO model shapes up?
> You mention briefly Git::WC later, but it is not really clear how the
> interaction should work.
> 
> 
> First, I don't think it's good idea at all to put the pipe-related stuff
> to Git::Repo - this is botched up API just like the current one. This
> all is independent from any repository instances, in fact it's perfectly
> valid to call standalone remote commands (like ls-remote or, actually,
> clone).

There are three classes of git commands: 

 1. standalone, i.e. those that don't even require a repository, like
    e.g. git-ls-remote, git-clone or git-init (or the git wrapper, like
    for example in "git --version").
 2. those that require a repository (and should use --git-dir=<path>),
    like for example git-cat-file, git-log / git-rev-list,
    git-for-each-ref / git-show-ref, git-diff-tree, git-ls-tree;
 3. those that require both a repository and a working copy (and should
    probably use both --git-dir=<path> and --work-tree=<path>), like
    git-commit, git-clean, git-ls-files (the last one may require
    only the index).
 3'. those that require both a repository and a working copy, and whose
    behavior depends on where in the working copy we are (the current
    directory)[*1*].

*All* git commands should know path to "git" wrapper binary, or where
$GIT_EXEC_PATH is.
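
The classification above can be sketched as a function that builds the
argument list from whatever context is available.  git_argv_sketch and
its option names are hypothetical; --git-dir and --work-tree are the
real options of the git wrapper.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argv for a git invocation.
#   class 1 (standalone):       pass neither git_dir nor work_tree
#   class 2 (needs repository): pass git_dir
#   class 3 (needs working copy too): pass git_dir and work_tree
sub git_argv_sketch {
    my (%ctx) = @_;
    my @argv = ($ctx{git_binary} || 'git');
    push @argv, "--git-dir=$ctx{git_dir}"     if defined $ctx{git_dir};
    push @argv, "--work-tree=$ctx{work_tree}" if defined $ctx{work_tree};
    return (@argv, @{ $ctx{args} });
}
```

Class 3' would additionally need the current subdirectory, for example
by changing into it before running the command; that part is left out
of the sketch.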


[*1*] This is how Git::WC differs from a simple directory (besides holding
a pointer to a Git::Repo instance): it tracks where we are, and has the
wc_path(), wc_subdir(), [wc_cdup(),] and wc_chdir(<SUBDIR>) methods.

> Here is an idea: Introduce Git::Command object that will have very
> general interface and look like
> 
> 	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file', \
> 		'-p', 'bla'], {pipe_out=>1})
> 	...
> 	$c->close();

Errr... how do you read from such a pipe?  <$c> I think wouldn't work,
unless you would use some trickery...
 
> and a Git::CommandFactory with a nicer interface that would look like
> 
> 	my $cf = Git::CommandFactory->new('git', '--git-dir=.');
> 	my $c = $cf->output_pipe('cat-file', '-p', 'bla');
> 	$c->close();
> 
> Then, Git::Repo would have a single Git::CommandFactory instance
> pre-initialized with the required calling convention, and returned by
> e.g. cmd() method. Then, from the user POV, you would just:
> 
> 	my $repo = Git::Repo->new;
> 	$repo->cmd->output_pipe('cat-file', '-p', 'bla');
> 
> Or am I overdoing it?

You are probably overdoing it.


I think it would be good to have the following interface

Git->output_pipe('ls-remote', '--heads', $URL);
[...]
$r = Git::Repo->new(<git_dir>);
$r->output_pipe('ls-tree', 'HEAD');
[...]
$nb = Git::Repo::NonBare->new(<git_dir>[, <working_area>]);
$nb->output_pipe('ls-files');


How this can be done with minimal effort, unfortunately, I don't know...

> Git::Repo considers only bare repositories. Now, since "working copy" by
> itself has nothing to do with Git and is just an ordinary directory
> tree,

Well, it also provides a current-subdir pointer, a pointer to the git
repository it is associated with, and a few methods to examine and
change both.

> I think Git::WC does not make that much sense, but something like 
> Git::Repo::Nonbare certainly would. This would be a Git::Repo subclass
> with:
> 
> 	* Different constructor
> 
> 	* Different Git::CommandFactory instance
> 
> 	* Git::Index object aside the existing ones (like Git::Config,
> 	  Git::RefTree, ...)
> 
> 	* Some kind of wc_root() method to help directory navigation
> 
> And that pretty much covers it?

Good idea, I think.


> Another thing is clearly describing how error handling is going to work.
> I have not much against ditching Error.pm, but just saying "die + eval"
> does not cut it - how about possible sideband data? E.g. the failure
> mode of Git.pm's command() method includes passing the error'd command
> output in the exception object. How are we going to handle it? Now, it
> might be actually okay to say that we _aren't_ going to handle this if
> it is deemed unuseful, but that needs to be determined too. I don't know
> off the top of my head.

I think that the solution might be some output_pipe option on how to
treat command exit status, command STDERR, and errors when invoking
command (for example command not found).

The previously mentioned http://www.perl.com/pub/a/2002/11/14/exception.html
explains why one might want to use Error.pm.


> > ---
> >
> > Please note before starting a reply to this: This is not an argument;
> > I'm just explaining why I implemented it the way I did.  So please
> > don't try to argue with me about what I should or should have done.
> > I'm not going to refactor Git::Repo to use Git.pm or vice versa; it's
> > really a much more non-trivial task than you might think at first
> > glance.
[...]
 
> I agree that your main objective is caching for gitweb, but that's not
> what everything revolves around for the rest of us. If you chose the way
> of caching within the Git API and introducing the API to gitweb, I think
> you should spend the effort to deal with the API properly now.

I think the idea is that gitweb caching as it is implemented (data
caching) requires some kind of Perl API, and that the existing Git.pm
didn't cut it -- therefore Git::Repo and friends were created.  But the
focus is gitweb caching, not the Perl API (beyond the Perl API having to
be usable).

-- 
Jakub Narebski
Poland

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-14 23:14       ` Jakub Narebski
@ 2008-07-14 23:56         ` Lea Wiemann
  2008-07-15  0:52           ` Jakub Narebski
  0 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-14 23:56 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John Hawley, Petr Baudis

Jakub Narebski wrote:
> Lea Wiemann wrote:
>> Gitweb's cache is actually never out-of-date
> 
> Could you explain then how gitweb cache is invalidated?

Sure; from gitweb.perl:

# Transient cache entries (like get_sha1('HEAD')) are automatically
# invalidated when an mtime of either the repository's root directory
# or of the refs directory or any subdirectory changes.  This
# mechanism *should* detect changes to the repository reliably if you
# only use git or rsync to write to it,

IOW, gitweb will do a small number of (inexpensive) stat calls on those
directories each time it's called, and use the most recent mtime as part
of the cache key for transient entries.  Hence those transient entries
will automatically become invalid once the most recent mtime changes.

(If any of the relevant directories has been modified since the last
time gitweb checked, gitweb will re-scan the whole tree to check for new
directories, and record their mtimes as well.  See get_last_modification
if you're interested in more gory details.)

The punchline is, the cache never returns outdated data.
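
A minimal sketch of the idea, not the code in gitweb.perl: the helper
names and the key layout below are made up for illustration, and unlike
the real get_last_modification this version simply re-scans refs/ on
every call.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: newest mtime among the repository's root
# directory and every directory below refs/.
sub newest_mtime {
    my ($git_dir) = @_;
    my $newest = (stat $git_dir)[9];
    die "cannot stat $git_dir: $!" unless defined $newest;
    my $refs = "$git_dir/refs";
    if (-d $refs) {
        File::Find::find({
            no_chdir => 1,
            wanted   => sub {
                my $mtime = (stat $_)[9];
                return unless defined $mtime && -d _;   # directories only
                $newest = $mtime if $mtime > $newest;
            },
        }, $refs);
    }
    return $newest;
}

# Hypothetical key layout: the mtime is part of the key, so after a
# change the old entries are simply never looked up again.
sub transient_key {
    my ($git_dir, $what) = @_;
    return join ':', $git_dir, newest_mtime($git_dir), $what;
}
```

With such a key nothing has to be deleted explicitly; stale entries
just stop being requested and are left to the cache backend.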

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-14 23:41     ` [PATCH 2/3] add new Git::Repo API Jakub Narebski
@ 2008-07-15  0:11       ` Lea Wiemann
  2008-07-18 16:54       ` Petr Baudis
  1 sibling, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-07-15  0:11 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Petr Baudis, git, John Hawley

Jakub Narebski wrote:
> Git->output_pipe('ls-remote', $URL, '--heads');

I'd make it cmd => ['ls-remote', $URL, '--heads'] to avoid making the
interface inflexible, and perhaps add a git_binary option to specify the
binary path.  (Not that I'm suggesting something like this should be
added right now; but perhaps it'll be needed at some point.)

> I think that the solution might be some output_pipe option on how to
> treat command exit status, command STDERR, and errors when invoking
> command (for example command not found).

You read my design notes about error handling at the top of
<487BD0F3.2060508@gmail.com>, and you noticed there's max_exit_code in
cmd_output?  With the approach to error handling I described, open/exit
errors can cause the method to die, and stderr can be ignored.

If you're trying to implement a more sophisticated error handling
approach, be warned that you're opening a can of worms. ;)  If you're
aiming for something like that, it should be kept out of Git::Repo
(which has very coherent error handling as-is), and preferably be
discussed under a separate subject, since it's way beyond the scope of
this patch.
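
To illustrate the "die + max_exit_code" approach, here is a rough
sketch.  It is not the cmd_output from the patch; run_cmd is a
hypothetical stand-in that only mirrors the option names mentioned
above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a cmd_output-style method.
sub run_cmd {
    my (%opts) = @_;
    my @cmd = @{ $opts{cmd} or die 'no cmd given' };
    my $max = defined $opts{max_exit_code} ? $opts{max_exit_code} : 0;

    # List-form pipe open, so no shell is involved.  The child's
    # stderr is not captured; it goes wherever ours goes.
    open my $fh, '-|', @cmd or die "cannot run $cmd[0]: $!";
    my $output = do { local $/; <$fh> };
    $output = '' unless defined $output;
    close $fh;    # waits for the child and sets $?

    die "$cmd[0] was killed by a signal" if $? & 127;
    my $exit = $? >> 8;
    die "$cmd[0] exited with code $exit" if $exit > $max;
    return $output;
}
```

Callers that care about failures wrap the call in eval { ... }, and
callers that expect a non-zero exit code raise max_exit_code instead.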

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-14 23:56         ` Lea Wiemann
@ 2008-07-15  0:52           ` Jakub Narebski
  2008-07-15  1:16             ` Lea Wiemann
  0 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-15  0:52 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis

On Tue, 15 July 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
> > 
> > Could you explain then how gitweb cache is invalidated?
> 
> Sure; from gitweb.perl:

I'll try to read and comment on patch itself soon.
 
> # Transient cache entries (like get_sha1('HEAD')) are automatically
> # invalidated when an mtime of either the repository's root directory
> # or of the refs directory or any subdirectory changes.  This
> # mechanism *should* detect changes to the repository reliably if you
> # only use git or rsync to write to it,

Nice idea... for project pages.  I'm not so sure about the projects_list
page; wouldn't it be better to have an expire time for *this* page?  You
would have to stat a lot of files/directories to detect changes.

BTW. some summary of the above should be IMHO in the commit message.
In short, you should write, I think, that the caching mechanism uses
any Cache::Cache compatible cache for caching data, that large objects
can be cached on the filesystem (or perhaps not, as it is a detail), and
that cache validity is checked by stat-ing the refs area.
 
> IOW, gitweb will do a small number of (inexpensive) stat calls on those
> directories each time it's called, and use the most recent mtime as part
> of the cache key for transient entries.  Hence those transient entries
> will automatically become invalid once the most recent mtime changes.
> 
> (If any of the relevant directories has been modified since the last
> time gitweb checked, gitweb will re-scan the whole tree to check for new
> directories, and record their mtimes as well.  See get_last_modification
> if you're interested in more gory details.)
> 
> The punchline is, the cache never returns outdated data.

In this case you don't have any basis to set expires for transient views;
on the other hand situation doesn't differ much from serving static
files (wrt. cache validation) so perhaps no expires but no "no-cache"
would be a good solution.

P.S. is there any cache eviction mechanism (to limit cache size)
in gitweb cache, or just the one implemented by the caching backend?
-- 
Jakub Narebski
Poland

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-15  0:52           ` Jakub Narebski
@ 2008-07-15  1:16             ` Lea Wiemann
  2008-07-15  1:28               ` Johannes Schindelin
  0 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-15  1:16 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John Hawley, Petr Baudis

Jakub Narebski wrote:
> On Tue, 15 July 2008, Lea Wiemann wrote:
>> # Transient cache entries (like get_sha1('HEAD')) are automatically
>> # invalidated when an mtime [...] changes.
> 
> Nice idea... for project pages.  I'm not so sure about the projects_list
> page; wouldn't it be better to have an expire time for *this* page?  You
> would have to stat a lot of files/directories to detect changes.

It doesn't seem to be too much of a performance issue (it takes ~500ms
to generate the project list on kernel.org), and project lists aren't
requested often enough to be a good optimization target.

I'll see how it performs when the OS's page cache is cold though; if
it's too slow then I might revisit the issue.

> BTW. some summary of the above should be IMHO in the commit message.

*nods*

> In this case you don't have any basis to set expires for transient views;
> on the other hand situation doesn't differ much from serving static
> files (wrt. cache validation) so perhaps no expires but no "no-cache"
> would be a good solution.

Gitweb provides a view on a live repository, so I don't think caching is
usually what you want (e.g. static pages really tend to change much less
frequently).  (And yes, my Opera *does* seem to cache those pages unless
you add no-cache.)

Really, I don't think the performance penalty of revalidation will be
significant enough to justify risking stale data.  IOW, it's pretty
fast.  Check out repo.or.cz's performance (and repo.or.cz doesn't even
use Last-Modified).

> P.S. is there any cache eviction mechanism (to limit cache size)
> in gitweb cache, or just the one implemented by the caching backend?

For the $cache, it's in the caching backend, for the $large_cache, there
is no mechanism, so you need to clean up yourself.  See the patch. ;-)

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-15  1:16             ` Lea Wiemann
@ 2008-07-15  1:28               ` Johannes Schindelin
  2008-07-15  1:44                 ` J.H.
  2008-07-15  1:50                 ` Lea Wiemann
  0 siblings, 2 replies; 63+ messages in thread
From: Johannes Schindelin @ 2008-07-15  1:28 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Jakub Narebski, git, John Hawley, Petr Baudis

Hi,

On Tue, 15 Jul 2008, Lea Wiemann wrote:

> Jakub Narebski wrote:
> > On Tue, 15 July 2008, Lea Wiemann wrote:
> >> # Transient cache entries (like get_sha1('HEAD')) are automatically
> >> # invalidated when an mtime [...] changes.
> > 
> > Nice idea... for project pages.  I'm not so sure about the projects_list
> > page; wouldn't it be better to have an expire time for *this* page?  You
> > would have to stat a lot of files/directories to detect changes.
> 
> It doesn't seem to be too much of a performance issue (it takes ~500ms
> to generate the project list on kernel.org), and project lists aren't
> requested often enough to be a good optimization target.

Wasn't the main page (i.e. the projects list) the reason why kernel.org 
has its own little caching mechanism in the first place?

And did Pasky not report recently that repo.or.cz got substantially less 
loaded with some caching of its own?

warthog?  Pasky?

Ciao,
Dscho

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-15  1:28               ` Johannes Schindelin
@ 2008-07-15  1:44                 ` J.H.
  2008-07-15  1:50                 ` Lea Wiemann
  1 sibling, 0 replies; 63+ messages in thread
From: J.H. @ 2008-07-15  1:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Lea Wiemann, Jakub Narebski, git, Petr Baudis

On Tue, 2008-07-15 at 03:28 +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 15 Jul 2008, Lea Wiemann wrote:
> 
> > Jakub Narebski wrote:
> > > On Tue, 15 July 2008, Lea Wiemann wrote:
> > >> # Transient cache entries (like get_sha1('HEAD')) are automatically
> > >> # invalidated when an mtime [...] changes.
> > > 
> > > Nice idea... for project pages.  I'm not so sure about the projects_list
> > > page; wouldn't it be better to have an expire time for *this* page?  You
> > > would have to stat a lot of files/directories to detect changes.
> > 
> > It doesn't seem to be too much of a performance issue (it takes ~500ms
> > to generate the project list on kernel.org), and project lists aren't
> > requested often enough to be a good optimization target.
> 
> Wasn't the main page (i.e. the projects list) the reason why kernel.org 
> has its own little caching mechanism in the first place?
> 

Partially - it was by *far* the biggest problem we were facing, and it
likely has gotten a lot better since I went on a rampage and did
gitweb-caching.

That said, there's a relatively small sample-set of files that gets hit
more than others it seems (from stuff I was looking at a while ago), and
limiting the general disk i/o on the kernel.org machines is *always* a
good thing (the less we go to disk the better).  So while it started that
way it has been very beneficial, even if it does chew up something like
15-20G of disk on those machines.

- John 'Warthog9' Hawley

> And did Pasky not report recently that repo.or.cz got substantially less 
> loaded with some caching of its own?
> 
> warthog?  Pasky?
> 
> Ciao,
> Dscho

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-15  1:28               ` Johannes Schindelin
  2008-07-15  1:44                 ` J.H.
@ 2008-07-15  1:50                 ` Lea Wiemann
  2008-07-15  2:03                   ` J.H.
  1 sibling, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-15  1:50 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, git, John Hawley, Petr Baudis

Johannes Schindelin wrote:
> Wasn't the main page (i.e. the projects list) the reason why kernel.org 
> has its own little caching mechanism in the first place?
> 
> And did Pasky not report recently that repo.or.cz got substantially less 
> loaded with some caching of its own?

Yes, you need *some* caching mechanism.  Once you have that, it's fine,
at least if the page cache is hot.  (The x-thousand stat calls that my
caching implementation issues don't actually take that much time; I
suspect the ~1000 calls to memcached are the more expensive [and
optimizable] part, though I'd have to benchmark that.)

Mainline vs. my caching implementation (both with hot page cache) on
odin3.kernel.org:

$ time wget -qO/dev/null http://localhost/git-lewiemann/vanilla/
real    0m3.070s
$ time wget -qO/dev/null http://localhost/git-lewiemann/
real    0m0.719s

* Re: [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching
  2008-07-15  1:50                 ` Lea Wiemann
@ 2008-07-15  2:03                   ` J.H.
  0 siblings, 0 replies; 63+ messages in thread
From: J.H. @ 2008-07-15  2:03 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Johannes Schindelin, Jakub Narebski, git, Petr Baudis

To continue with that benchmarking it seems if gitweb-caching isn't hot
it takes about 3 seconds, if it is I'm about 7x faster than Lea's ;-)

$ time wget -qO/dev/null http://localhost/git/

real    0m2.952s
user    0m0.001s
sys     0m0.004s
$ time wget -qO/dev/null http://localhost/git/

real    0m0.108s
user    0m0.001s
sys     0m0.001s

But regardless of who's faster - I would still argue there are two
reasons to at least have some caching, even if it's crappy:

1) Prevent the thundering herd problem - many requests for the same
thing all generating the same data is bad, and kills I/O

2) Relatively static data can be generated once and stick around for a
bit and serve more requests more efficiently.  Now I agree that
invalidating the cache on a new mtime is better than my current
algorithm (which is purely time based on the cache data vs. the original
data, with some allowances for back-off due to load).
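
As a rough illustration of point 1 (this is not how gitweb-caching
does it; the helper name, the file layout and the purely time-based
freshness check are assumptions for the example), a lock file can make
sure that only one process regenerates a given page while the others
wait for the result:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: return the cached content of $file, calling
# $generate->() to rebuild it when it is older than $max_age seconds.
sub cached_or_generate {
    my ($file, $max_age, $generate) = @_;

    open my $lock, '>>', "$file.lock" or die "cannot open lock file: $!";
    flock($lock, LOCK_EX) or die "cannot lock: $!";   # waiters queue here

    my $mtime = (stat $file)[9];
    unless (defined $mtime && time() - $mtime <= $max_age) {
        my $data = $generate->();
        open my $out, '>', "$file.tmp" or die "cannot write: $!";
        print {$out} $data;
        close $out or die "cannot close: $!";
        rename "$file.tmp", $file or die "cannot rename: $!";
    }

    open my $in, '<', $file or die "cannot read: $!";
    my $content = do { local $/; <$in> };
    close $in;
    close $lock;    # closing the handle releases the lock
    return $content;
}
```

This simple version also serializes readers of a fresh entry; a real
implementation would probably check freshness before taking the lock.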

- John 'Warthog9' Hawley

On Tue, 2008-07-15 at 03:50 +0200, Lea Wiemann wrote:
> Johannes Schindelin wrote:
> > Wasn't the main page (i.e. the projects list) the reason why kernel.org 
> > has its own little caching mechanism in the first place?
> > 
> > And did Pasky not report recently that repo.or.cz got substantially less 
> > loaded with some caching of its own?
> 
> Yes, you need *some* caching mechanism.  Once you have that, it's fine,
> at least if the page cache is hot.  (The x-thousand stat calls that my
> caching implementation issues don't actually take that much time; I
> suspect the ~1000 calls to memcached are the more expensive [and
> optimizable] part, though I'd have to benchmark that.)
> 
> Mainline vs. my caching implementation (both with hot page cache) on
> odin3.kernel.org:
> 
> $ time wget -qO/dev/null http://localhost/git-lewiemann/vanilla/
> real    0m3.070s
> $ time wget -qO/dev/null http://localhost/git-lewiemann/
> real    0m0.719s

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-11  1:11 ` [PATCH 2/3] add new Git::Repo API Lea Wiemann
                     ` (2 preceding siblings ...)
  2008-07-14  1:40   ` Petr Baudis
@ 2008-07-16 18:21   ` Jakub Narebski
  2008-07-16 20:32     ` Lea Wiemann
  3 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-16 18:21 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis

[Here is promised patch review]

On Fri, 11 July 2008, Lea Wiemann wrote:

>  create mode 100644 perl/Git/Commit.pm
>  create mode 100644 perl/Git/Object.pm
>  create mode 100644 perl/Git/Repo.pm
>  create mode 100644 perl/Git/RepoRoot.pm
>  create mode 100644 perl/Git/Tag.pm

Is splitting into many small files really necessary?

>  create mode 100755 t/t9710-perl-git-repo.sh
>  create mode 100755 t/t9710/test.pl
 
Good that you provided test suite.

> diff --git a/perl/Git/Commit.pm b/perl/Git/Commit.pm
> new file mode 100644
> index 0000000..a9bc304
> --- /dev/null
> +++ b/perl/Git/Commit.pm
> @@ -0,0 +1,163 @@
> +=head1 NAME
> +
> +Git::Commit - Object-oriented interface to Git commit objects.
> +
> +=cut
> +
> +use strict;
> +use warnings;
> +
> +
> +package Git::Commit;
> +
> +use base qw(Git::Object);
> +
> +use constant _MESSAGE => 'M';
> +use constant _ENCODING => 'E';
> +use constant _TREE => 'T';
> +use constant _PARENTS => 'P';
> +use constant _AUTHOR => 'A';
> +use constant _COMMITTER => 'C';

What is this for? Why not keep content of commit headers keyed under
their names? It would be much simpler code, and I don't think the
above buys anything in terms of performance.

There is another thing to consider: commit object and tag object format
was deliberately (I think) designed to be *extensible* (in a way).
You can add new headers, and old git binary should simply ignore
unknown headers.  This is what allowed us to add 'encoding' header
to commit object, and propose 'keyword' header to tag object (which
was to be used for notes / post-factum annotations).

> +# Keep documentation in one place to save space.

Errrrr... that is a stupid idea, and even more stupid reasoning.

Embedded PODs in Perl modules serve as sort of literate programming,
serving to describe code (technical/usage documentation) in addition
to comments in code.

> +=item $commit = Git::Commit->new($repo, $sha1)
> +
> +Return a new Git::Commit instance for a commit object with $sha1 in
> +repository $repo.
> +
> +Calls to this method are free, since it does not check whether $sha1
> +exists and has the right type.  However, accessing any of the commit
> +object's properties will fail if $sha1 is not a valid commit object.
> +
> +Note that $sha1 must be the SHA1 of a commit object; tag objects are
> +not dereferenced.

The technique you use has a name, and it is (IIUC) "lazy evaluation".
By the way, wouldn't it be better to make this method internal, and
instead create Git::Commit objects with the following code?

  $commit = $repo->commit($sha1);
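
For readers unfamiliar with the pattern, a minimal sketch of lazy
loading follows.  LazyCommit and its loader callback are hypothetical
and only stand in for Git::Commit and its repository access.

```perl
use strict;
use warnings;

package LazyCommit;    # hypothetical class, for illustration only

# Construction only records its arguments; nothing is fetched yet.
sub new {
    my ($class, $loader, $sha1) = @_;
    return bless { loader => $loader, sha1 => $sha1 }, $class;
}

# Never triggers a load.
sub sha1 { $_[0]{sha1} }

# Fetch and remember the raw object on first use.
sub _load {
    my $self = shift;
    return if exists $self->{raw};    # already loaded
    $self->{raw} = $self->{loader}->($self->{sha1});
    return;
}

# Accessing a property is what triggers the load.
sub raw {
    my $self = shift;
    $self->_load;
    return $self->{raw};
}

package main;
```

The loader runs at most once per object, however often raw is called.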

> +=item $obj->repo
> +=item $obj->sha1

Those do not access the repository, do they?

> +=item $commit->tree
> +
> +Return the tree this commit object refers to.

I think you should say there explicitly that it is the SHA-1 of the tree
(the tree object identifier) that is returned.  Isn't it?

By the way I think it should be mentioned explicitly in the documentation
which methods cause "instantiation" (which cause evaluation).

[The fact that the documentation is separated from the code means that
 I cannot easily tell whether the code matches the documentation]

> +=item $commit->parents
> +
> +Return a list of zero or more parent commit objects.  Note that commit
> +objects stringify to their respective SHA1s, so you can alternatively
> +treat this as a list of SHA1 strings.

Array or arrayref?

There is a slight inconsistency: the tree is returned as a SHA1 (for
lack of a Git::Tree object), while parents are returned as objects.

NOTE that an element of a list of revisions also has _effective_ parents
in the event of history simplification, for example for the 'history'
view, or when using the '--first-parent' extra option.

> +=item $commit->authors

s/authors/author/; singular.

> +
> +Return the author string of this commit object.
> +
> +=item $commit->committer
> +
> +Return the committer string of this commit object.

It returns the whole value of the 'author' and 'committer' headers, not
something extracted from it (name, email, epoch and timezone),
doesn't it?

> +=item $commit->message
> +
> +Return the undecoded commit message of this commit object.

Just raw data?

NOTE that an element of a list of revisions (as returned by git-rev-list
or git-log) would probably have its commit message decoded to UTF-8 by git.

> +=item $commit->encoding
> +
> +Return the encoding header of the commit object.

Normalized?

> +sub tree {
> +	my $self = shift;
> +	$self->_load;
> +	$self->{_TREE()};
> +}
> +
> +sub parents {
> +	my $self = shift;
> +	$self->_load;
> +	map { ref($self)->new($self->repo, $_) } @{$self->{_PARENTS()}};
> +}

Nice... but shouldn't ->tree return Git::Tree?


> +sub author {
> +	my $self = shift;
> +	$self->_load;
> +	$self->{_AUTHOR()} or '';
> +}
> +
> +sub committer {
> +	my $self = shift;
> +	$self->_load;
> +	$self->{_COMMITTER()} or '';
> +}

Nowhere in the documentation is it mentioned that you use an empty value
for no author or no committer (isn't the commit object invalid then?).

> +# Auxiliary method to load (and parse) the commit object from the
> +# repository if it hasn't already been loaded.
> +
> +sub _load {
> +	my($self, $raw_text) = shift;

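Probably should read as follows (note that "= shift" assigns only $self
and always leaves $raw_text undefined, so "= @_" is needed):

+	my ($self, $raw_text) = @_;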


> +	return if defined $self->{_MESSAGE()};  # already loaded

Do you ensure that 'message' (or 'body') is always defined?  Why not
use "exists" instead; is there some reason to choose "defined"?

> +	my $sha1 = $self->sha1;
> +	if (!defined $raw_text) {
> +		# Retrieve from the repository.
> +		(my $type, $raw_text) = $self->repo->cat_file($sha1);
> +		die "$sha1 is a $type object (expected a commit object)"
> +		    unless $type eq 'commit';
> +	}

The above makes Git::Commit a good solution for gitweb's 'commit' and
'commitdiff' views, but a bad solution for the 'log', 'shortlog',
'history' and 'rss'/'atom' views, where you would need too many command
invocations, which is very bad on an OS with slow fork.

> +	(my $header, $self->{_MESSAGE()}) = split "\n\n", $raw_text, 2;
> +	# Parse header.

Why not simply parse the headers, then slurp the rest of the object into
the 'message' field?  I'm just asking, because it is an alternate (but I
don't know if better) solution.

> +	for my $line (split "\n", $header) {
> +		local $/ = "\n"; # for chomp
> +		chomp($line);
> +		my($key, $value) = split ' ', $line, 2;
> +		if ($key eq 'tree') {
> +			$self->{_TREE()} = $value;
> +		} elsif ($key eq 'parent') {
> +			push @{$self->{_PARENTS()}}, $value;
> +		} elsif ($key eq 'author') {
> +			$self->{_AUTHOR()} = $value;
> +		} elsif ($key eq 'committer') {
> +			$self->{_COMMITTER()} = $value;
> +		} elsif ($key eq 'encoding') {
> +			$self->{_ENCODING()} = $value;
> +		} else {
> +			die "unrecognized commit header $key";
> +		}

IMHO you should not die at unrecognized header, but simply store it
under its name (and make available using ->header('<NAME>')).  Please
remember that commit object format is extensible.

Nevertheless I think you can validate header names (there are some
restrictions on header names, aren't there?).
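
Something along these lines is what I have in mind.  This is only a
sketch: parse_commit and header are hypothetical names, not part of the
patch, and header values that continue on following lines are not
handled.

```perl
use strict;
use warnings;

# Hypothetical parser: keeps every header under its own name instead of
# dying on names it does not know.
sub parse_commit {
    my ($raw_text) = @_;
    my ($header, $message) = split /\n\n/, $raw_text, 2;
    my %headers;    # header name => arrayref of values, in file order
    for my $line (split /\n/, $header) {
        my ($key, $value) = split / /, $line, 2;
        push @{ $headers{$key} }, $value;
    }
    return {
        headers => \%headers,
        message => defined $message ? $message : '',
    };
}

# All values of a header in list context, the first one (or undef) in
# scalar context.
sub header {
    my ($commit, $name) = @_;
    my @values = @{ $commit->{headers}{$name} || [] };
    return wantarray ? @values : $values[0];
}
```

Repeated headers such as 'parent' need no special case this way, and an
unknown header is stored rather than treated as an error.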

> +=head1 NOTES
> +
> +You will usually want to call $repo->get_commit($sha1) instead of
> +instantiating this class directly; see L<Git::Repo>.
> +
> +=cut

Ahhh, O.K.

Why not $repo->commit($sha1)?


> diff --git a/perl/Git/Object.pm b/perl/Git/Object.pm
> new file mode 100644
> index 0000000..a5126f9
> --- /dev/null
> +++ b/perl/Git/Object.pm
> @@ -0,0 +1,81 @@
> +=head1 NAME
> +
> +Git::Object - Object-oriented interface to Git objects (base class).
> +
> +=head1 DESCRIPTION
> +
> +Git::Object is a base class that provides access to commit, tag and
> +(unimplemented) tree objects.  See L<Git::Commit> and L<Git::Tag>.

Is it a base class which represents all types of objects in a git
repository: commits, tags, trees and blobs?  Or just a class which
represents headers+payload objects, i.e. commits and tags?

> +Objects are loaded lazily, and hence instantiation is free.  Objects
> +stringify to their SHA1s.

This should probably be repeated in documentation for Git::Commit and
Git::Tree (possibly linking to Git::Object(3pm) manpage), to be more
self sufficient.

> +use overload
> +    '""' => \&stringify;

Nice.

> +# Hash indices:
> +use constant _REPO => 'R';
> +use constant _SHA1 => 'H';

Again: Is it necessary?

> +=head1 METHODS
> +=item Git::Object->new($repo, $sha1)
> +=item $obj->repo
> +=item $obj->sha1

Why are those methods (IIRC) repeated in Git::Commit, instead of being
inherited?

I wonder if Git::Object should provide $obj->id alias to $obj->sha1...


> diff --git a/perl/Git/Repo.pm b/perl/Git/Repo.pm
> new file mode 100644
> index 0000000..5a4c7e0

> +use strict;
> +use warnings;
> +use 5.006002;

What is this "use 5.006002" for?

By the way here you (correctly) intersperse code and its documentation,
contrary to some other files in this patch.

> +our @EXPORT = qw();
> +our @EXPORT_OK = qw(assert_sha1 assert_opts);
> +
> +# Auxiliary subroutines
> +
> +sub assert_opts {
> +	die "must have an even number of arguments for named options"
> +	    unless $#_ % 2;
> +}
> +
> +sub assert_sha1 {
> +	my $sha1 = shift;
> +	die "'$sha1' is not a SHA1 (need to use get_sha1?)"
> +	    unless $sha1 && $sha1 =~ /^[a-f0-9]{40}$/;
> +}

Hmmm... exporting auxiliary subroutines?

> +=item $repo = Git::Repo->new(%opts)
> +
> +Return a new Git::Repo object.  The following options are supported:
> +
> +=over
> +
> +=item 'directory'
> +
> +The directory of the repository (mandatory).
> +
> +Note that this option is working-copy agnostic; you need to
> +instantiate it with the working copy's .git directory as the
> +'directory' option.

Wouldn't it be better to allow the same discovery of the '.git' directory
as other git commands do, and leave 'git_dir' to set the path to the
repository itself directly?

> +=item 'git_binary'
> +
> +The name or full path of the git binary (default: 'git').

Probably should be Git::Cmd or Git object, instead.  Well,
'git_binary' could set this up indirectly, though...

> +
> +=back
> +
> +Calling this method is free, since it does not check whether the
> +repository exists.  Trying to access the repository through one of the
> +instance methods will fail if it doesn't exist though.

> +sub new {
> +	my $class = shift;
> +	assert_opts(@_);
> +	my $self = {@_};
> +	bless $self, $class;
> +	die 'no directory given' unless $self->{directory};
> +	return $self;
> +}

Not even a rudimentary check: whether the directory exists, whether it
looks like a git repository?
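
A cheap heuristic could look like the sketch below.  The function name
is made up, and the check is deliberately loose: it only looks for the
usual HEAD file and the objects/ and refs/ directories.

```perl
use strict;
use warnings;

# Hypothetical check: does $dir have the usual layout of a git
# directory?  Returns 1 or 0; it does not validate any contents.
sub looks_like_git_dir {
    my ($dir) = @_;
    return (   -d $dir
            && -f "$dir/HEAD"
            && -d "$dir/objects"
            && -d "$dir/refs") ? 1 : 0;
}
```

It costs four stat calls, so it would make the constructor no longer
free, but still much cheaper than running a git command.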

> +
> +=item $repo->repo_dir
> +
> +Return the directory of the repository (.../.git in case of a working
> +copy).

I think $repo->git_dir (perhaps in addition to above) would be better
name, as it is already established among git commands.

> +=item $repo->version
> +
> +Return the output of 'git --version', with /^git version / stripped.
> +
> +This method does not require the repository to exist.
> +
> +=cut
> +
> +sub version{
> +	my $self = shift;
> +	chomp(my $version = $self->cmd_output(cmd => ['--version']));
> +	$version =~ s/^git version //;  # be permissive if this does not match
> +	return $version;
> +}

We could rely instead on embedded (during build) version string...
or leave _that_ to $Git::Repo::VERSION / $Git::VERSION.

> +=back
> +
> +=head2 Calling the Git binary
> +
> +=over
> +
> +=item $repo->cmd_output(%opts)

Please do remember that there are git commands which do not need
access to git repository, for example git-ls-remote, git-clone,
git-init, even git-config.  So tying calling Git binary to
_repository_ instance is a bad API; as a convenience (adding
'--git-dir=<path to repo>' automatically) it is a good idea.

> +Return the output of the given git command as a string, or as a list
> +of lines in array context.  Valid options are:
> +
> +=over
> +
> +=item 'cmd'
> +
> +An arrayref of arguments to pass to git (mandatory).
> +
> +=item 'max_exit_code'
> +
> +Die if the exit code of the git binary is greater than
> +C<max_exit_code> (default: 0).
> +
> +=back
> +
> +Example:
> +
> +    $output = $repo->cmd_output(cmd => ['show', 'HEAD'])
> +

I think it would be easier on users if you provide two ways of calling
this command: simple and advanced (see for example CGI methods):

	$output = $repo->cmd_output('show', 'HEAD');

or

	$output = $repo->cmd_output(['show', 'HEAD'],
	                            max_exit_code=>undef);

(which I think should mean: do not die if git command dies).
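
Supporting both forms only takes a small amount of argument handling.
A sketch, with parse_cmd_args being a hypothetical helper rather than
anything in the patch:

```perl
use strict;
use warnings;

# Hypothetical argument normalizer.  Accepts either
#   ('show', 'HEAD')                          simple form
#   (['show', 'HEAD'], max_exit_code => 1)    advanced form
# and returns (\@cmd, \%opts) in both cases.
sub parse_cmd_args {
    my @args = @_;
    if (ref($args[0]) eq 'ARRAY') {
        my $cmd = shift @args;
        die 'options must be name => value pairs' if @args % 2;
        return ($cmd, {@args});
    }
    return ([@args], {});
}
```

The simple form cannot carry options, which is the price for not
having to write the arrayref in the common case.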

> +=cut
> +
> +# To do: According to Git.pm, this might not work with ActiveState
> +# Perl on Win 32.  Need to check or wait for reports.

Why not copy code from Git.pm, then?


> +=item $repo->get_bidi_pipe(%opts)
> +
> +Open a new bidirectional pipe and return its STDIN and STDOUT file
> +handles.  Valid options are:

What about returning context, as it was done in Git.pm?

> +
> +=over
> +
> +=item 'cmd'
> +
> +An arrayref of arguments to pass to git (mandatory).

Same comment as for cmd_output.

> +=item 'reuse'
> +
> +If true, reuse a previously opened pipe with the same command line and
> +whose C<reuse> option was true (default: false).

What is this for?  Can you show example usage of this feature?  Or is
it a bit of overengineering, perhaps?

You didn't provide equivalents for the rest of command related methods
from Git.pm, even as simple wrappers around other methods: 
command_oneline, command_output_pipe, command_input_pipe; you do error
management in different way, so I think you don't need command.*close
methods (note that most of errors, beside not finding git binary and
such, are found on close, not on open).

> +=back
> +
> +=head2 Inspecting the Repository
> +
> +=over
> +
> +=item $repo->get_sha1($extended_object_identifier)
> +
> +Look up the object identified by $extended_object_identifier and
> +return its SHA1 hash in scalar context or its ($sha1, $type, $size) in
> +list context, or undef or () if the lookup failed, where $type is one
> +of 'tag', 'commit', 'tree', or 'blob'.
> +
> +See L<git-rev-parse(1)>, section "Specifying Revisions", for the
> +syntax of the $extended_object_identifier string.
> +
> +Note that even if you pass a SHA1 hash, its existence is still
> +checked, and this method returns undef or () if it doesn't exist in
> +the repository.

I think I'd rather allow extended SHA1 syntax in Git::Commit
and Git::Tag constructors; it is one call to git command less
(I think).

Nevertheless providing equivalent of git-rev-parse (the --revs part)
is, I think, a good idea.

> +=cut
> +
> +sub get_sha1 {
> +	my ($self, $object_id) = @_;
> +	die 'no object identifier given' unless $object_id;
> +	die 'object identifier must not contain newlines' if $object_id =~ /\n/;
> +	my ($in, $out) = $self->get_bidi_pipe(
> +		cmd => ['cat-file','--batch-check'], reuse => 1);

Ahhh... here I can see what 'reuse => 1' means, and when it is useful.
But doesn't it make sense _only_ for _bi-directional pipe_?  Are you
sure that you wouldn't get deadlock?
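
For reference, "git cat-file --batch-check" answers each request line
with exactly one response line, so a writer that flushes after every
request and then reads one line should stay in step.  A sketch of
parsing such a response (the function name is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical parser for one response line of
# "git cat-file --batch-check": "<sha1> <type> <size>" for an existing
# object, "<name> missing" otherwise.
sub parse_batch_check_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ / missing\z/;    # undef or (), as get_sha1 documents
    my ($sha1, $type, $size) = $line =~ /\A([0-9a-f]{40}) (\w+) (\d+)\z/
        or die "unexpected response: $line";
    return ($sha1, $type, $size);
}
```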

> +=item $repo->cat_file($sha1)
> +
> +Return the ($type, $content) of the object identified by $sha1, or die
> +if no such object exists in the repository.
> +
> +Note that you may want to use the higher-level methods get_commit and
> +get_tag instead.

Shouldn't it be $repo->get_object($sha1) instead?  You don't need to
repeat git's strange choice of command name, git-cat-file, which is
a "hysterical raisin".

> +=cut
> +
> +# Possible to-do items: Add optional $file_handle parameter.  Guard
> +# against getting huge blobs back when we don't expect it (for
> +# instance, we could limit the size and send SIGPIPE to git if we get
> +# a blob that is too large).

If I remember correctly you do implement something like that (streamed
output) in gitweb patch.

> +=item $repo->get_commit($commit_sha1)
> +=item $repo->get_tag($tag_sha1)
> +
> +Return a new L<Git::Tag> instance referring to the tag object with SHA1
> +$tag_sha1.

Probably should mention lazy evaluation, too.

By the way, for gitweb you would need (for performance and for
rewritten parents) also get_log / get_commits / get_commits_list

> +=item $repo->get_path($tree_sha1, $file_sha1)
> +
> +Return the path of the tree or blob identified by $file_sha1 in the
> +tree identified by $tree_sha1, or undef if the tree or blob does not
> +exist in the given tree.

Note that it is *not* evaluated lazily.  Probably should go thru
Git::Tree when it gets implemented.

> +sub get_path {
> +	my ($self, $tree, $file_sha1) = @_;
> +	assert_sha1($tree, $file_sha1);
> +	my @lines = split "\n", $self->cmd_output(cmd => ['ls-tree', '-r', '-t', $tree]);
> +	for (@lines) {
> +		if (/^[0-9]+ [a-z]+ $file_sha1\t(.+)$/) {
> +			return $1;
> +		}
> +	}
> +	return undef;
> +}

You would have troubles with filename quoting!  (Probably should be
an option to get_path to get unquoted vs quoted filename).
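
One way to sidestep the quoting problem is to ask git for NUL-terminated
records: with -z, git-ls-tree terminates records with NUL and does not
quote filenames.  A minimal sketch, assuming the caller passes in the
output of "git ls-tree -r -t -z <tree>"; the helper name
find_path_in_ls_tree is hypothetical and not part of the patch:

```perl
use strict;
use warnings;

# Hypothetical helper: find the path of $sha1 in the NUL-terminated
# output of "git ls-tree -r -t -z <tree>".
sub find_path_in_ls_tree {
	my ($ls_tree_output, $sha1) = @_;
	for my $record (split /\0/, $ls_tree_output) {
		# Record format: "<mode> <type> <sha1>\t<path>".  The /s
		# modifier lets the (unquoted) path contain newlines.
		if ($record =~ /^[0-9]+ [a-z]+ \Q$sha1\E\t(.+)\z/s) {
			return $1;
		}
	}
	return undef;
}
```

Because the path is returned verbatim, any quoting for display would be
left to the caller.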

> +=item $repo->get_refs
> +
> +=item $repo->get_refs($pattern)
> +
> +Return an arrayref of [$sha1, $object_type, $ref_name] triples.  If
> +$pattern is given, only refs matching the pattern are returned; see
> +L<git-for-each-ref(1)> for details.

> +sub get_refs {
> +	my ($self, $pattern) = @_;
> +
> +	return [ map [ split ], split("\n", $self->cmd_output(
> +			 cmd => [ 'for-each-ref',
> +				  defined $pattern ? $pattern : () ])) ];
> +}

I think you would need a more complete interface to git-for-each-ref
and/or git-show-ref.

By the way, why split("\n", ...) instead of ensuring $/ = "\n" and
(en)forcing list context?


> +=item $repo->name_rev($committish_sha1, $tags_only = 0)
> +
> +Return a symbolic name for the commit identified by $committish_sha1,
> +or undef if no name can be found; see L<git-name-rev(1)> for details.
> +If $tags_only is true, no branch names are used to name the commit.

Why name_rev, and no describe?


> diff --git a/perl/Git/RepoRoot.pm b/perl/Git/RepoRoot.pm
> new file mode 100644
> index 0000000..12e1836
> --- /dev/null
> +++ b/perl/Git/RepoRoot.pm
> @@ -0,0 +1,103 @@
> +=head1 NAME
> +
> +Git::RepoRoot - A factory class representing a root directory
> +containing Git repositories.
> +
> +=head1 DESCRIPTION
> +
> +Git::RepoRoot is a factory class to create L<Git::Repo> instances that
> +are located under a common root directory.  It also allows for
> +specifying options that all Git::Repo instances will be created with.
> +
> +Using Git::RepoRoot to create Git::Repo instances is entirely
> +optional, but can be more convenient than instantiating them directly.

Nice.  Does Git::RepoRoot provide a way to scan for repositories under
a common $projectroot?

> +=over
> +
> +=item 'directory'
> +
> +The directory holding all repositories.

Probably should be named 'root', or 'reporoot', or something...

> +Example:
> +
> +    $repo_root = Git::RepoRoot->new(directory => '/pub/git',
> +                                    git_binary => '/usr/bin/git'q);
                                                                   ^
It looks like typo ------------------------------------------------|

> diff --git a/perl/Git/Tag.pm b/perl/Git/Tag.pm
> new file mode 100644
> index 0000000..4e0549b

> +use constant _MESSAGE => 'M';
> +use constant _ENCODING => 'E';
> +use constant _TAGGER => 'A';
> +use constant _TAG => 'T';
> +use constant _TYPE => 'Y';
> +use constant _OBJECT => 'O';

Again.


> +# Keep documentation in one place to save space.

Again.

Tag object format is meant to be extensible; for example (unfortunately
abandoned) git-notes implementation added 'keyword' (IIRC) header to
tag object format.

> +=item $tag->tagger
> +
> +Return the tagger string of this tag object.

We would probably want some way to extract name, email, epoch/date
(and a way to convert epoch+timezone to RFC or ISO format), timezone.

> +=item $tag->message
> +=item $tag->encoding
> +
> +Return the encoding header of the tag object.

Does tag provide 'encoding' header?

Should (for completeness) Git::Tag provide $tag->validate() method?

> +=head1 NOTES
> +
> +You will usually want to call $repo->get_tag($sha1) instead of
> +instantiating this class directly; see L<Git::Repo>.

Good, although this could be put in description of new() instead...

> diff --git a/t/t9710/test.pl b/t/t9710/test.pl
> new file mode 100755
> index 0000000..188abba
> --- /dev/null
> +++ b/t/t9710/test.pl
> @@ -0,0 +1,165 @@
> +#!/usr/bin/perl
> +use lib (split(/:/, $ENV{GITPERLLIB}));
> +
> +use warnings;
> +use strict;
> +
> +use Test::More qw(no_plan);

> +use Test::Exception;
> +use Carp::Always;

Are those necessary?  If yes, shouldn't they be checked before running
this test script?

> +our $old_stderr;
> +sub discard_stderr {
> +	open our $old_stderr, ">&", STDERR or die "cannot save STDERR";
> +	close STDERR;
> +}
> +sub restore_stderr {
> +	open STDERR, ">&", $old_stderr or die "cannot restore STDERR";
> +}

Nice.

[I haven't examined test in detail].
-- 
Jakub Narebski
Poland


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-16 18:21   ` Jakub Narebski
@ 2008-07-16 20:32     ` Lea Wiemann
  2008-07-17 23:49       ` Jakub Narebski
  0 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-16 20:32 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John Hawley, Petr Baudis

Jakub Narebski wrote:
> [Here is promised patch review]

Thanks.  A new patch series will follow today (hopefully).

For brevity, I'm incorporating all your suggested changes unless noted
otherwise.

> On Fri, 11 July 2008, Lea Wiemann wrote:
>>  create mode 100644 perl/Git/Commit.pm
>>  create mode 100644 perl/Git/Object.pm
>>  [...]
> 
> Is splitting into many small files really necessary?

I think it's better.  People shouldn't get an error message if they
write "use Git::Commit".  Also we cannot split later if we don't split
now since people would write "use Git::Repo" and then access
Git::Commit; subsequently splitting the API would (might?) cause
breakage, I believe.

> Good that you provided test suite.

;-)

> You can add new headers, and old git binary should simply ignore
> unknown headers.

Unknown headers are now ignored.

>> +# Keep documentation in one place to save space.
> 
> Embedded PODs in Perl modules serve as sort of literate programming,
> serving to describe code (technical/usage documentation) in addition
> to comments in code.

Yeah, but this part is only a bunch of trivial accessor methods.  If the
module grows and the documentation needs to be split, it can be done
later.  No need to be purist here. ;-)  Also ...

> [The fact that documentation is separated from code means that
>  I cannot easily tell and write if code match documentation]

Several of the methods actually only exist in the Git::Object base
class.  I still documented them in the Commit and Tag modules since
having to look up methods in base class documentation can be a tad
annoying, especially if the base class is never used by users of the API.

>> +=item $commit = Git::Commit->new($repo, $sha1)
>> +
>> +Calls to this method are free, [...]
> 
> The technique you use has a name, and it is (IIUC) "lazy evaluation".

You understand correctly; I'd call it "lazy loading" though.  I've added
that term for clarity.

> By the way, wouldn't it be better to make this method internal, and
> use instead the following code to generate Git::Commit object
> 
>   $commit = $repo->commit($sha1);

I think the constructor shouldn't be internal (= underscore-prefixed,
you mean?), since the Commit/Tag APIs are usable on their own.

>> +=item $obj->repo
>> +=item $obj->sha1
> 
> Those do not access the repository, isn't it?

No, they don't.  (Clarified that in the documentation.)

>> +=item $commit->parents
>> +Return a list of zero or more parent commit objects.
> 
> Array or arrayref?

Array.  I've replaced "list" with "array" in case it helps clarity.

> There is little inconsistency that tree object is (from the lack of
> Git::Tree object) returned as SHA1, and parents as objects.

If we add a Git::Tree API, the Git::Tree objects will stringify to their
SHA1s, so we shouldn't have compatibility issues.  I've changed the
documentation of $commit->tree to this:

"Return an object that stringifies to the SHA1 of the tree that this
commit object refers to.  (Currently this returns an actual string,
but don't rely on it.)"

> NOTE that element of list of revisions has in addition to that also
> _effective_ parents in the event of history simplification, for example
> for 'history' view, or when using '--first-parent' extra option.

Yes, but we don't actually care about those effective parents for the
purpose of the Git::Commit API.  IOW, the effective parent should be
managed by the code that created a list of revisions, not by the
Git::Commit API.

>> +Return the author string of this commit object. [...]
>> +Return the committer string of this commit object.
> 
> It returns whole value of 'author' and 'committer' headers, not
> something extracted from it (into name, email, epoch and timezone),
> isn't it?

Yup; that's why I wrote "{author,committer} *string*". ;)

>> +=item $commit->message
>> +
>> +Return the undecoded commit message of this commit object.
> 
> Just raw data?

Yes, just raw data.  Decoding is too tricky (i.e. not guaranteed to
work) to just add a simple method to the API; IOW, it needs error
handling and perhaps fallback encodings.

> NOTE that for element of list of revisions (as returned by git-rev-list
> or git-log) would probably have commit message decoded to UTF-8 by git.

Yes, but the API doesn't use any of those commands internally, if that's
what you're worried about.

>> +=item $commit->encoding
>> +
>> +Return the encoding header of the commit object.
> 
> Normalized?

No.  (Otherwise I'd have written that ;-).)

>> +sub author { [...]
>> +	$self->{_AUTHOR()} or ''; }
> 
> Nowhere in documentation is mentioned that you use empty value for no
> author or no committer (isn't commit object invalid then?).

Yes, I'd believe so.  I basically wanted to make sure that those methods
always return a string; do you think that this is a bad idea?

>> +	if (!defined $raw_text) {
>> +		# Retrieve from the repository.
>> +		(my $type, $raw_text) = $self->repo->cat_file($sha1);
> 
> The above makes Git::Commit good solution for gitweb's 'commit' and
> 'commitdiff' views, but bad solution for 'log', 'shortlog', 'history'
> and 'rss'/'atom' views, where you would need too many command
> invocations, which is very bad on an OS with slow fork.

$repo->cat_file (now renamed to get_object) actually doesn't fork but
uses a pipe (cat-file --batch); I don't think it should be a performance
issue.

>> +	(my $header, $self->{_MESSAGE()}) = split "\n\n", $raw_text, 2;
> 
> Why not simply parse headers, then slurp rest of object into 'message'
> field?

Because we happen to get the raw text in a single string from the Repo
API.  (It shouldn't be a performance/memory issue for Commit or Tag
objects at all. ;-))

> IMHO you should not die at unrecognized header,

Yes, changed, but ...

> but simply store it under its name (and make available using
> ->header('<NAME>')).

... let's not complicate the API unnecessarily.  If a new header pops up
we can immediately add it to the Commit/Tag API.
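
For illustration, the "split once on the first blank line, then skip
unknown headers" approach can be sketched as follows.  This is not the
code from the patch; the function name parse_commit and the list of known
headers are assumptions made for the example:

```perl
use strict;
use warnings;

# Headers this (hypothetical) parser knows about; anything else is skipped.
my %known = map { $_ => 1 } qw(tree parent author committer encoding);

sub parse_commit {
	my ($raw_text) = @_;
	# Headers and message are separated by the first blank line.
	my ($header, $message) = split /\n\n/, $raw_text, 2;
	my %commit = (
		parents => [],
		message => defined $message ? $message : '',
	);
	for my $line (split /\n/, $header) {
		my ($key, $value) = split / /, $line, 2;
		# Silently ignore unknown headers instead of dying.
		next unless defined $key && $known{$key};
		if ($key eq 'parent') {
			push @{ $commit{parents} }, $value;
		} else {
			$commit{$key} = $value;
		}
	}
	return \%commit;
}
```

Adding support for a new header then amounts to adding its name to the
list of known headers plus an accessor.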

> Nevertheless I think you can check for header name validation (there
> are some restrictions on header names, isn't it?).

I don't really care, and it's too much work to come up with a test case
for this. ;-)  If the repository is borked to the point of invalid
header names, it's fine for Git::Commit to behave undefinedly.

>> +You will usually want to call $repo->get_commit($sha1) instead of
>> +instantiating this class directly; see L<Git::Repo>.
> 
> Why not $repo->commit($sha1)?

Intuition. ;-)  I think I'd read $repo->commit as "the commit of the
repository", akin to $commit->tree, which doesn't make sense here.

>> +Git::Object - Object-oriented interface to Git objects (base class).
> 
> Is it base class which represents types of objects in git repository:
> commits, tags, trees and blobs?  Or just a class which represent
> headers+payload objects, i.e. commits and tags?

It could represent any object, though I don't see a need for Git::Blob
right now (though it's possible that it's needed later).

> I wonder if Git::Object should provide $obj->id alias to $obj->sha1...

Why?  I don't think it's necessary.

>> +use 5.006002;
> 
> Why is this "use 5.006002" for?

It signifies that this module won't run with Perl <5.6.2.  I've had to
bump it to 5.008 (Perl 5.8); more about that in the message announcing
the next version of the patch series.

> Wouldn't it be better to allow the same discovery of '.git' directory
> as other git command do, and leave 'git_dir' to set directly path to
> repository itself?

I wouldn't use discovery magic here, at least for now, since it's
non-trivial to get it right (and it interacts with possible future
extensions of the API, like Git::WC).  Such a feature can be implemented
if/when it's needed.

>> +=item 'git_binary'
>> +The name or full path of the git binary (default: 'git').
> 
> Probably should be Git::Cmd or Git object, instead.

I don't think something like Git::Cmd is a good idea (as I pointed out in my
reply to Petr, <487BD0F3.2060508@gmail.com>), or at least it shouldn't
be implemented as part of this patch series.  This method is really just
supposed to return an argument for exec*p, nothing more.

>> +Calling this method is free, since it does not check whether the
>> +repository exists.  Trying to access the repository through one of the
>> +instance methods will fail if it doesn't exist though.
> 
> Not even rudimentary check: if directory exists, if it looks like
> git repository?

No.  It's not helpful for error handling (which should happen in the
caller), and it's not helpful for bug detection (since it will die on
the first access to the repo anyway), but it causes performance penalty
that can be significant for programs like gitweb.

>> +=item $repo->repo_dir
>> +Return the directory of the repository (.../.git in case of a working
>> +copy).
> 
> I think $repo->git_dir (perhaps in addition to above) would be better
> name, as it is already established among git commands.

I find repo_dir somewhat clearer (and I don't like having more than one
name per method).  We're not trying to mimic or wrap standard git
commands here, anyway.

>> +sub version{
> 
> We could rely instead on embedded (during build) version string...

Yup; it's been deleted anyway.

>> +=item $repo->cmd_output(%opts)
> 
> Please do remember that there are git commands which do not need
> access to git repository,

As I wrote in my reply to Petr, Git::Repo is not trying to be a wrapper
around git binaries, so this method really shouldn't be part of the
official API -- it's just auxiliary; I'll underscore-prefix it.

> I think it would be easier on users if you provide two ways of calling
> [cmd_output]: simple and advanced

It's not part of the API anyway, so no need for complicated calling
conventions, IMO.

>> +# To do: According to Git.pm, this might not work with ActiveState
>> +# Perl on Win 32.  Need to check or wait for reports.
> 
> Why not copy code from Git.pm, then?

Apart from the fact that I don't do cargo-cult programming? ;-)  Git.pm
forks, whereas Git::Repo uses open, '-|', so it's actually different
(and it's not possible to copy the code).

>> +=item $repo->get_bidi_pipe(%opts)
>> +
>> +Open a new bidirectional pipe and return its STDIN and STDOUT file
>> +handles.  Valid options are:
> 
> What about returning context, as it was done in Git.pm?

Why, what should it do?  This just opens a pipe, nothing more.  No need
for introducing complicated concepts.

>> +=item 'reuse'
>> +
>> +If true, reuse a previously opened pipe with the same command line and
>> +whose C<reuse> option was true (default: false).
> 
> What is this for?  Can you show example usage of this feature?

You found it below. :-)  (If you had snipped this, I wouldn't have spent
time finding and pasting an example. ;-))

> I think I'd rather allow extended SHA1 syntax in Git::Commit
> and Git::Tag constructors; it is one call to git command less
> (I think).

I wouldn't -- see my blurb about error handling at the top of my reply
to Petr (<487BD0F3.2060508@gmail.com>).  You're not supposed to pass
anything that you didn't get from get_sha1 into Git::Commit or Git::Tag
constructors, or your error handling is invariably broken.

>> +	my ($in, $out) = $self->get_bidi_pipe(
>> +		cmd => ['cat-file','--batch-check'], reuse => 1);
> 
> Ahhh... here I can see what 'reuse => 1' means, and when it is useful.
> But doesn't it make sense _only_ for _bi-directional pipe_?  Are you
> sure that you wouldn't get deadlock?

Yes to both questions. :-)
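
A rough sketch of the request/response pattern behind a reused
bidirectional pipe, using the core IPC::Open2 module.  The names get_pipe
and query are hypothetical stand-ins, not the get_bidi_pipe code from the
patch.  The sketch assumes the child answers every request line with
exactly one flushed response line; under that assumption neither side
ever blocks on a full pipe buffer:

```perl
use strict;
use warnings;
use IPC::Open2;
use IO::Handle;

my %pipes;   # command line => [$to_child, $from_child]

# Hypothetical stand-in for get_bidi_pipe(cmd => [...], reuse => 1):
# open the pipe once per distinct command line, then hand it out again.
sub get_pipe {
	my (@cmd) = @_;
	my $key = join "\0", @cmd;
	unless ($pipes{$key}) {
		open2(my $from_child, my $to_child, @cmd);
		$to_child->autoflush(1);   # send each request immediately
		$pipes{$key} = [$to_child, $from_child];
	}
	return @{ $pipes{$key} };
}

# Write one request line, read back one response line.
sub query {
	my ($cmd, $request) = @_;
	my ($to_child, $from_child) = get_pipe(@$cmd);
	print $to_child "$request\n";
	my $response = <$from_child>;
	chomp $response if defined $response;
	return $response;
}
```

With git, $cmd would be something like
['git', '--git-dir=...', 'cat-file', '--batch-check'] and $request an
object name.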

>> [get_object:]
>> +# Possible to-do items: Add optional $file_handle parameter.
> 
> If I remember correctly you do implement something like that (streamed
> output) in gitweb patch.

Yes, but only for generic command calls, and with a somewhat unpleasant
(cache-specific) interface.  It'd need a bit of work for the API.

> By the way, for gitweb you would need (for performance and for
> rewritten parents) also get_log / get_commits / get_commits_list

No. ;-)  Doing fine without those.

>> +=item $repo->get_path($tree_sha1, $file_sha1)

This one has been removed as well since it would belong into Git::Tree.

>> +	my @lines = split "\n", $self->cmd_output(cmd => ['ls-tree', '-r', '-t', $tree]);
> 
> You would have troubles with filename quoting!

Thanks.  Since get_path isn't in the Git::Repo API anymore and gitweb's
get_path subroutine didn't handle quoted filenames even before my
patches, I'll only mark it as TODO for now. ;-)

>> +sub get_refs {

This has been removed as well, since it's not used and the interface
would need work.

>> +=item $repo->name_rev($committish_sha1, $tags_only = 0)
> 
> Why name_rev, and no describe?

Feel free to add it. ;-)  (It might take some work to come up with a
decent interface for that method.)

> Does Git::RepoRoot provide a way to scan for repositories under
> a common $projectroot?

No.  If it's needed it could reasonably be extracted from gitweb though
(I think).

>> +Return the tagger string of this tag object.
> 
> We would probably want some way to extract name, email, epoch/date
> (and a way to convert epoch+timezone to RFC or ISO format), timezone.

Yeah.  At some point. ;-)

>> +=item $tag->encoding
>> +Return the encoding header of the tag object.
> 
> Does tag provide 'encoding' header?

Sure, for the message.

> Should (for completeness) Git::Tag provide $tag->validate() method?

No, since 'validate' sounds like it would have to do error handling.

If you mean that this should check if the object exists (and has the
advertised type), the user of the API should test for "defined
$tag->repo->get_sha1($tag->object)" or somesuch and do error handling
themselves.

-- Lea


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-16 20:32     ` Lea Wiemann
@ 2008-07-17 23:49       ` Jakub Narebski
  2008-07-18 13:40         ` Lea Wiemann
  0 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-17 23:49 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis

On Wed, 16 July 2008 at 22:32 (13:32:25 -0700 (PDT)), Lea Wiemann wrote:
> Jakub Narebski wrote:
> > Lea Wiemann wrote:

> > > +# Keep documentation in one place to save space.
> > 
> > Embedded PODs in Perl modules serve as sort of literate programming,
> > serving to describe code (technical/usage documentation) in addition
> > to comments in code.
> 
> Yeah, but this part is only a bunch of trivial accessor methods.
> If the module grows and the documentation needs to be split, it can be
> done later.  No need to be purist here. ;-)  Also ...
> 
> > [The fact that documentation is separated from code means that
> >  I cannot easily tell and write if code match documentation]
> 
> Several of the methods actually only exist in the Git::Object base
> class.  I still documented them in the Commit and Tag modules since
> having to look up methods in base class documentation can be a tad
> annoying, especially if the base class is never used by users of the
> API. 

I was wondering (but forgot to ask) why for some files you had
continuous block of documentation, and in others documentation 
interspersed with documented code.

I think that providing documentation for all methods for "front-end"
class, even for those that are implemented in [abstract] base class,
is a very good idea.  Better to have everything in one place, or at
least in "porcelain" documentation.
 
> > NOTE that element of list of revisions has in addition to that also
> > _effective_ parents in the event of history simplification, for
> > example  for 'history' view, or when using '--first-parent' extra
> > option. 
> 
> Yes, but we don't actually care about those effective parents for the
> purpose of the Git::Commit API.  IOW, the effective parent should be
> managed by the code that created a list of revisions, not by the
> Git::Commit API.

True.  Or to be more exact in a Git::Revlist (or somesuch) container
class.
 
> > > +Return the author string of this commit object. [...]
> > > +Return the committer string of this commit object.
> > 
> > It returns whole value of 'author' and 'committer' headers, not
> > something extracted from it (into name, email, epoch and timezone),
> > isn't it?
> 
> Yup; that's why I wrote "{author,committer} *string*". ;)

We would probably want _in the future_ to  return some object wrapper,
which stringifies to value of author and committer headers (to author
and committer string), but allows to extract (and format) parts of it,
for example

  $commit->author->name
  $commit->author->email
  $commit->author->date(format => RFC2822)

or perhaps

  $commit->author{'email'}
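
A minimal sketch of such a wrapper, using Perl's overload pragma so that
the object still stringifies to the raw header value.  The class name
Git::Ident and its accessors are hypothetical, and date formatting is
left out:

```perl
package Git::Ident;   # hypothetical class name, not part of the patch
use strict;
use warnings;
# Stringify to the raw header value, so existing callers keep working.
use overload '""' => sub { $_[0]->{raw} }, fallback => 1;

sub new {
	my ($class, $raw) = @_;
	my $self = bless { raw => $raw }, $class;
	# Header format: "Name <email> epoch timezone"
	if ($raw =~ /^(.*?) <([^>]*)> (\d+) ([+-]\d{4})$/) {
		@$self{qw(name email epoch tz)} = ($1, $2, $3, $4);
	}
	return $self;
}

sub name  { $_[0]->{name} }
sub email { $_[0]->{email} }
sub epoch { $_[0]->{epoch} }
sub tz    { $_[0]->{tz} }

package main;

my $author = Git::Ident->new('A U Thor <author@example.com> 1215738000 +0200');
print "$author\n";            # the raw header value
print $author->email, "\n";   # just the email part
```

Code that treats the return value of $commit->author as a plain string
would keep working unchanged with such an object.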

> > > +=item $commit->message
> > > +
> > > +Return the undecoded commit message of this commit object.
> > 
> > Just raw data?
> 
> Yes, just raw data.  Decoding is too tricky (i.e. not guaranteed to
> work) to just add a simple method to the API; IOW, it needs error
> handling and perhaps fallback encodings.

I'd rather have _git_ convert it to UTF-8 for us (using the
--encoding=<encoding> option to git-log/git-rev-list); see below
on using git-log.

> > NOTE that for element of list of revisions (as returned by 
> > git-rev-list or git-log) would probably have commit message decoded
> > to UTF-8 by git.
> 
> Yes, but the API doesn't use any of those commands internally,
> if that's what you're worried about.

I'm not worried about that; I think using git-log would be better (see
also below).

> > > +sub author { [...]
> > > +	$self->{_AUTHOR()} or ''; }
> > 
> > Nowhere in documentation is mentioned that you use empty value
> > for no author or no committer (isn't commit object invalid then?).
> 
> Yes, I'd believe so.  I basically wanted to make sure that those 
> methods always return a string; do you think that this is a bad idea?

I don't know if it is a good or bad idea, but you should have mentioned
it in the documentation.


[HERE IS MAIN PART OF THIS RESPONSE]

> >> +	if (!defined $raw_text) {
> >> +		# Retrieve from the repository.
> >> +		(my $type, $raw_text) = $self->repo->cat_file($sha1);
> > 
> > The above makes Git::Commit good solution for gitweb's 'commit' and
> > 'commitdiff' views, but bad solution for 'log', 'shortlog', 
> > 'history' and 'rss'/'atom' views, where you would need too many
> > command invocations, which is very bad on an OS with slow fork.
> 
> $repo->cat_file (now renamed to get_object) actually doesn't fork but
> uses a pipe (cat-file --batch); I don't think it should be
> a performance issue.
 
It is (much) better than forking git-cat-file for each commit shown
on the list; nevertheless I think that it would be better to use git-log
to generate list (or Git::Revlist) of Git::Commit objects.  It is one
fork less but, more importantly, you don't have to access the repository
twice for the very same objects.

Let me elaborate: if I understand correctly for log-like views you 
propose to first run simple git-rev-list with appropriate starting 
point and limiters (--skip, --max-count, -- <pathname>), perhaps using 
'--parents' option to get parents in simplified/rewritten history,
which would traverse history getting commit objects, but outputting
only fragments of info, then feed list of revisions (perhaps via cache,
i.e. excluding objects which are in cache) to 'git cat-file --batch'
open two-directional pipeline.

What I propose instead is to provide an alternate method to fully
instantiate a Git::Commit object (in addition to ->_load), which would
fill fields by parsing git-log / git-rev-list --header output
(what gitweb currently does in parse_commits).

On the other hand... "git cat-file --batch" should have commits to be
accessed in filesystem cache, which means in memory; but it is possible
that they wouldn't be in cache because of I/O pressure (git-rev-list and 
git-cat-file are separate processes).  And checking if object is in
cache can be simpler... if less effective.  If you generate Git::Commit 
objects via parsing git-log / git-rev-list output, then you can limit
history further by excluding starting points from cache.

[END OF MAIN PART]

> > > +use 5.006002;
> > 
> > Why is this "use 5.006002" for?
> 
> It signifies that this module won't run with Perl <5.6.2.  I've had to
> bump it to 5.008 (Perl 5.8); more about that in the message announcing
> the next version of the patch series.

I was not asking what this means, but why you need to set a lower
bound on the Perl version.  What feature does pre-5.6.2 Perl lack?

Requiring 5.8 is bad.  What feature does pre-5.8 Perl lack that you
absolutely cannot do without?  There will be complaints.

> > > +=item 'git_binary'
> > > +The name or full path of the git binary (default: 'git').
> > 
> > Probably should be Git::Cmd or Git object, instead.
> 
> I don't think something like Git::Cmd is a good idea (as I pointed out in
> my reply to Petr, <487BD0F3.2060508@gmail.com>), or at least it
> shouldn't be implemented as part of this patch series.  This method
> is really just supposed to return an argument for exec*p, nothing
> more. 

I think that _not using_ Git::Cmd (or somesuch) API results in botched,
horrible API like (in the 3/3 patch in this series):

  our $git_version = $repo_root->repo(directory => 'dummy')->version;

Aaaaaargh! My eyes!

  our $git_version = Git->version;
 
(Unless it is not needed any longer, or not used any longer; if it is
so, then perhaps implementing Git::Cmd as generic wrapper around git
commands, hiding for example ActivePerl hack, could be left for later).

> > > +=item $repo->cmd_output(%opts)
> > 
> > Please do remember that there are git commands which do not need
> > access to git repository,
> 
> As I wrote in my reply to Petr, Git::Repo is not trying to be a 
> wrapper around git binaries, so this method really shouldn't be part
> of the official API -- it's just auxiliary; I'll underscore-prefix it.

Just a question: was this reply only to him, or to all?

I think that $repo->cmd_output(%opts) is a great shortcut for invoking
Git->cmd_output with '--git-dir=<repo>' added automatically.  So it
should be left, but perhaps under different implementation.

> > > +# To do: According to Git.pm, this might not work with ActiveState
> > > +# Perl on Win 32.  Need to check or wait for reports.
> > 
> > Why not copy code from Git.pm, then?
> 
> Apart from the fact that I don't do cargo-cult programming? ;-)   
> Git.pm forks, whereas Git::Repo uses open, '-|', so it's actually different
> (and it's not possible to copy the code).

Actually the magic open, '-|' _does_ fork, only implicitly.  So Git.pm
effectively does the same or almost the same thing, but it works (around
the problem) with ActiveState Perl.

> > I think I'd rather allow extended SHA1 syntax in Git::Commit
> > and Git::Tag constructors; it is one call to git command less
> > (I think).
> 
> I wouldn't -- see my blurb about error handling at the top of my reply
> to Petr (<487BD0F3.2060508@gmail.com>).  You're not supposed to pass
> anything that you didn't get from get_sha1 into Git::Commit or
> Git::Tag constructors, or your error handling is invariably broken.

O.K.

I can understand this: it is simpler, although less than optimal, and
geared mainly towards gitweb's needs.

> > > +	my ($in, $out) = $self->get_bidi_pipe(
> > > +		cmd => ['cat-file','--batch-check'], reuse => 1);
> > 
> > Ahhh... here I can see what 'reuse => 1' means, and when it is 
> > useful. But doesn't it make sense _only_ for _bi-directional pipe_?
> > Are you sure that you wouldn't get deadlock?
> 
> Yes to both questions. :-)

Errr... "yes" to the first question means that the 'reuse' option makes
sense _only_ for get_bidi_pipe?  If so, why is it present in other commands?


> > By the way, for gitweb you would need (for performance and for
> > rewritten parents) also get_log / get_commits / get_commits_list
> 
> No. ;-)  Doing fine without those.

See above.
 
> > > +=item $repo->name_rev($committish_sha1, $tags_only = 0)
> > 
> > Why name_rev, and no describe?
> 
> Feel free to add it. ;-)  (It might take some work to come up with a
> decent interface for that method.)

Why do you _need_ name_rev, if you are not going to include a
git-describe equivalent?


> > > +Return the tagger string of this tag object.
> > 
> > We would probably want some way to extract name, email, epoch/date
> > (and a way to convert epoch+timezone to RFC or ISO format), 
> > timezone.
> 
> Yeah.  At some point. ;-)

See above, in comment about Git::Commit.

> > Should (for completeness) Git::Tag provide $tag->validate() method?
> 
> No, since 'validate' sounds like it would have to do error handling.
> 
> If you mean that this should check if the object exists (and has the
> advertised type), the user of the API should test for "defined
> $tag->repo->get_sha1($tag->object)" or somesuch and do error handling
> themselves.

I meant here the equivalent of "git tag -v <tag>".

-- 
Jakub Narebski
Poland


* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-17 23:49       ` Jakub Narebski
@ 2008-07-18 13:40         ` Lea Wiemann
  2008-07-18 15:35           ` Jakub Narebski
  0 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-07-18 13:40 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John Hawley, Petr Baudis

Jakub Narebski wrote:
> [$commit->author:] We would probably want _in the future_ to  return some
> object wrapper, which stringifies to value of author and committer headers

Yup, good idea.  They'll even stay strings, they'll just be blessed.

>>>> +=item $commit->message
> 
> I'd rather have _git_ convert it to UTF-8 for us (using
> --encoding=<encoding> option to git-log/git-rev-list)

Yeah, I guess the API should actually decode it.  You wouldn't want to
have the message in UTF-8 but in Unicode (I suggest you read man
perlunitut if you haven't done so).  We cannot have git do the decoding,
since (apart from the fact that it doesn't smell right) it isn't
guaranteed to emit valid UTF-8 (thanks Junio for the pointer):

Lea Wiemann: Does anyone know off the top of their heads how git handles
    character decoding errors in commands like git log? [...]
Junio 'gitster' Hamano: silently punt and show the original unmolested.
Junio 'gitster' Hamano: cf. pretty.c:pretty_print_commit()

So we're not guaranteed to be able to, in turn, decode git's output into
actual characters since it might just be byte soup.

Hence, how about this fallback strategy:

1. Decode according to the encoding header.
2. Decode as UTF-8 (passing through byte soup is often equivalent to
decoding UTF-8 since many terminals use UTF-8, and trying UTF-8 is
reasonably safe).
3. Decode as Latin1.

(Not that the fallbacks will matter a lot in practice, I think.)
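
The three steps above could be sketched roughly as follows.  The function
name decode_message is hypothetical (it is not in the patch), and the sketch
assumes the caller already has the raw message bytes and the value of the
commit's encoding header, if there is one:

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: turn raw commit message bytes into a Perl character
# string, trying progressively more forgiving decodings.
sub decode_message {
    my ($bytes, $header_encoding) = @_;

    # Step 1: the encoding named in the commit header (if any and if known).
    # Step 2: strict UTF-8.
    for my $name (grep { defined } $header_encoding, 'UTF-8') {
        my $enc = find_encoding($name) or next;    # unknown encoding name
        my $copy = $bytes;    # decode() may modify its input when CHECK is set
        my $text = eval { $enc->decode($copy, Encode::FB_CROAK()) };
        return $text if defined $text;
    }

    # Step 3: Latin-1 maps every byte to a code point, so it cannot fail.
    return decode('ISO-8859-1', $bytes);
}
```

Trying strict UTF-8 before Latin-1 matters: Latin-1 accepts any byte
sequence, so it has to come last or the earlier steps would never be reached.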

> It is (much) better than forking git-cat-file for each commit shown
> on the list; nevertheless I think that it would be better to use git-log
> to generate list (or Git::Revlist) of Git::Commit objects.  It is one
> fork less, but more importantly you don't have to access the repository
> twice for the very same objects.

You're confused; it's not one fork less, it's a write to a pipe less.
(Pleeeease look at the code before you write something.  It's there, in
this very thread.)  And I don't believe the "access the repository
twice" thing is anywhere near an actual issue.  To summarize, you're
asking me to (a) write code and (presumably) (b) add something to the
interface of a public API, based on some (most probably faulty)
assumptions about performance?  You should really read
<http://c2.com/cgi/wiki?PrematureOptimization>.

> if I understand correctly for log-like views you 
> propose to first run simple git-rev-list [...], then feed list of
> revisions (perhaps via cache, i.e. excluding objects which are in
> cache) to 'git cat-file --batch' open two-directional pipeline.

Yup, it's an option, though currently it's a single cached call to git
log (or git rev-list).

> What I propose instead is to provide alternate method to fully 
> instantiate Git::Commit object (in addition to ->_load), which would 
> fill fields by parsing git-log / git-rev-list --header output

Yes, but this would need a method in the API, it's not an optimization
that falls out for free.  Cluttering an API for some obscure (= very
doubtful) optimization?  Bad Idea.(tm)

> "git cat-file --batch" should have commits to be
> accessed in filesystem cache, which means in memory; but it is possible
> that they wouldn't be in cache because of I/O pressure

No.  Page cache turnover time is at least around 10 seconds (and that's
under fairly artificial conditions), definitely not in the millisecond
range.

> I think that _not using_ Git::Cmd (or somesuch) API results in botched,
> horrible API
>   our $git_version = $repo_root->repo(directory => 'dummy')->version;
> (Unless it is not needed any longer, or not used any longer; if it is
> so, then perhaps implementing Git::Cmd as generic wrapper around git
> commands, hiding for example ActivePerl hack, could be left for later).

It isn't used any longer -- I really suggest you read the whole thread
before replying. ;-)

>> As I wrote in my reply to Petr [...]
> 
> Just a question: was this reply only to him, or to all?

To all, otherwise I wouldn't have Cc'ed the list.

>> I wouldn't -- see my blurb about error handling at the top of my reply
>> to Petr (<487BD0F3.2060508@gmail.com>).  You're not supposed to pass
>> anything that you didn't get from get_sha1 into Git::Commit or
>> Git::Tag constructors, or your error handling is invariably broken.
> 
> I can understand this simpler, although less than optimal, and geared
> mainly towards gitweb needs.

FTR, yes it is simpler, but no, it is not really geared toward gitweb
needs, and it's definitely not "less than optimal" in the sense of being
worse than the exception-based error handling Git.pm does.  Trust me
on this one. ;-)

> Errr... "yes" to first question means that 'reuse' option makes sense
> _only_ for get_bidi_pipe? If so, why is it present in other commands?

Yes, and no, it isn't present in other commands.  (Hey, could you please
check the code before posting?  Really.)

>>> Why name_rev, and no describe?
>> Feel free to add it. ;-)  (It might take some work to come up with a
>> decent interface for that method.)
> 
> Why do you _need_ name_rev, if you are not to include git-describe
> equivalent?

I needed it for gitweb.  As I said, I'm not trying to create a complete
API.  A describe_rev (or so) method can be added later, if and when it's
needed.  (As I said, I don't think writing APIs without at least one use
case is a good idea anyway.)

>>> Should (for completeness) Git::Tag provide $tag->validate() method?
> 
> I meant here equivalent of "git tag -v <tag>"

I guess it could be added.  As with describe_rev, I won't add it myself,
in particular not as part of this patch series.

-- Lea

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 13:40         ` Lea Wiemann
@ 2008-07-18 15:35           ` Jakub Narebski
  2008-07-18 16:51             ` Lea Wiemann
  0 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-18 15:35 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git, John Hawley, Petr Baudis

Lea Wiemann wrote:
> Jakub Narebski wrote:

>>>>> +=item $commit->message
>> 
>> I'd rather have _git_ convert it to UTF-8 for us (using
>> --encoding=<encoding> option to git-log/git-rev-list)
> 
> Yeah, I guess the API should actually decode it.  You wouldn't want to
> have the message in UTF-8 but in Unicode (I suggest you read man
> perlunitut if you haven't done so).

You mean perluniintro(1) here, don't you?

Besides if decoding is done in Perl API, we can convert it simply
to Perl internal form (which, IIUC, in modern Perl is UTF-8 and
marked as such).

> We cannot have git do the decoding, 
> since (apart from the fact that it doesn't smell right) it isn't
> guaranteed to emit valid UTF-8 [...]

Well, if that is the case then Perl API has to do conversion, that
is the only sensible way.

>> It is (much) better than forking git-cat-file for each commit shown
>> on the list; nevertheless I think that it would be better to use git-log
>> to generate list (or Git::Revlist) of Git::Commit objects.  It is one
>> fork less, but more importantly you don't have to access the repository
>> twice for the very same objects.
> 
> You're confused; it's not one fork less, it's a write to a pipe less.
> (Pleeeease look at the code before you write something.  It's there, in
> this very thread.)  And I don't believe the "access the repository
> twice" thing is anywhere near an actual issue.  To summarize, you're
> asking me to (a) write code and (presumably) (b) add something to the
> interface of a public API, based on some (most probably faulty)
> assumptions about performance?  You should really read
> <http://c2.com/cgi/wiki?PrematureOptimization>.

Code is there, in gitweb, in parse_commits subroutine, or rather in
parse_commit_text subroutine.

[cut]

But I can agree that possible (and possibly minuscule) performance
improvement is not worth introducing new API and complicating (I think)
gitweb code.
 
>> I think that _not using_ Git::Cmd (or somesuch) API results in botched,
>> horrible API
>>   our $git_version = $repo_root->repo(directory => 'dummy')->version;
>> (Unless it is not needed any longer, or not used any longer; if it is
>> so, then perhaps implementing Git::Cmd as generic wrapper around git
>> commands, hiding for example ActivePerl hack, could be left for later).
> 
> It isn't used any longer -- I really suggest you read the whole thread
> before replying. ;-)

O.K.  Still, I think that putting cmd_output and the others in Git::Repo
is not a good API. I'd rather route calls to git commands via a Git or
Git::Cmd object (but Git::Repo would have a Git/Git::Cmd object which
automatically adds '--git-dir=<path>', and possibly also
'--work-tree=<path>').

By the way, would you prefer if I commented on 3/3 patch as it is now,
taking into account what I remember from discussion on this and 2/3
patch (latter only as relevant), or would you rather I wait for next
round (next version) of patches?
 
>>> I wouldn't -- see my blurb about error handling at the top of my reply
>>> to Petr (<487BD0F3.2060508@gmail.com>).  You're not supposed to pass
>>> anything that you didn't get from get_sha1 into Git::Commit or
>>> Git::Tag constructors, or your error handling is invariably broken.
>> 
>> I can understand this simpler, although less than optimal, and geared
>> mainly towards gitweb needs.
> 
> FTR, yes it is simpler, but no, it is not really geared toward gitweb
> needs, and it's definitely not "less than optimal" in the sense of being
> worse than the exception-based error handling Git.pm does.  Trust me
> on this one. ;-)

[...]

>> Why do you _need_ name_rev, if you are not to include git-describe
>> equivalent?
> 
> I needed it for gitweb.  As I said, I'm not trying to create a complete
> API.  A describe_rev (or so) method can be added later, if and when it's
> needed.  (As I said, I don't think writing APIs without at least one use
> case is a good idea anyway.)

Errr... I guess I misspoke. I should not say 'geared toward gitweb
needs', as perhaps it is 'created according [at least somewhat] to
what gitweb would need'.
 
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-14 22:19     ` Lea Wiemann
@ 2008-07-18 16:48       ` Petr Baudis
  2008-07-18 17:05         ` Jakub Narebski
                           ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Petr Baudis @ 2008-07-18 16:48 UTC (permalink / raw)
  To: Lea Wiemann, Jakub Narebski; +Cc: git, John Hawley

In order to keep this mail within sensible size, I have trimmed some of
the bits - basically, the default reply is usually ranging from "Fair
enough" to "Great!" :-)

On Tue, Jul 15, 2008 at 12:19:31AM +0200, Lea Wiemann wrote:
> > First, I don't think it's good idea at all to put the pipe-related stuff
> > to Git::Repo - this is botched up API just like the current one.
> 
> Well, they're more like helper methods.  Since they don't fit into the
> design goals of the Git::Repo API at all, I'd suggest we just
> underscore-prefix them and take them out of the man page.  (The only
> reason why I hadn't done this is that gitweb uses $repo->cmd_output
> extensively, so it'd end up with a lot of underscore calls.  But I
> suppose we can either alias _cmd_output to cmd_output in gitweb's
> CachedRepo subclass, or live with $repo->_cmd_output calls.)  Does
> underscore-prefixing sound good to you?

I think this is one problematic point we keep hitting - my opinion is
that we _need_ such a wrapper _publicly_, because it tends to be
actually the main use-case of Git.pm, and that this wrapper should be:

	(i) both available standalone for commands like ls-remote

	(ii) and available as part of Git::Repo instance, to have the
	right arguments passed to Git automagically

So you define cmd_output() and seem to argue that this command should
not be called directly and is not interesting for the outside. But the
experience shows that the pipe interface is actually the _most_ used
part of the Git Perl API, and in fact you mentioned that as part of your
gitweb migration to Git::Repo, you will temporarily introduce calls to
_cmd_output(), the "internal" API. :-) Sure, it's only temporary, but
many won't have the luxury to adjust the Git::Repo API to provide all
the operations they need, and ultimately they will need to defer to the
pipe interface.

> > It should be actually very easy to start with moving all the pipe
> > functionality to Git::Command.
> 
> Creating a new (Git::Command) API is very much non-trivial, apart from
> the fact that I'm not convinced that we need Git::Command, and that a
> clean command interface neither falls out of Git.pm nor Git::Repo.

As I said, the majority of Git API usage is actually the pipe API. So we
should figure out how to provide it. I agree that it's not immediately
within your scope, but you are introducing new Perl API and this just
needs to be embedded somewhere there consistently.

> >> [Git::Commit->new, Git::Tag->new:]
> >> +Calls to this method are free, since it does not check whether $sha1
> >> +exists and has the right type.  However, accessing any of the commit
> >> +object's properties will fail if $sha1 is not a valid commit object.
> > 
> > This is nice idea, but I'd also provide a well-defined way for the user
> > to verify the object's validity at a good moment; basically, make load()
> > a public method. The user can deal with errors then and rely on
> > error-free behavior later.
> 
> No, you should never pass in an invalid SHA1 in the first place.  The
> above piece of documentation is just a warning that bugs will show up
> delayed.  IOW, this is not the right place to have your error handling.
> 
> If you're getting a SHA1 through the user-interface, check its existence
> with get_sha1 before passing it to the constructor.

But that's an expensive operation, you need extra Git exec for this,
while all the Git commands can do the checks for you, if you give them
the chance.

I was doing pretty much this thing in Cogito (initially out of
necessity) and it made it ungodly slow for any kind of batch operations.

> >> +Note that $sha1 must be the SHA1 of a commit object; tag objects are
> >> +not dereferenced.
> > 
> > Why not?
> 
> Because the SHA1 might resolve to an object of the wrong type, which
> means you have to do error handling in Git::Object objects; that's the
> wrong place.
> 
> If tag-resolving is really needed, we can add an optional $type
> parameter to get_sha1, which will cause get_sha1 to resolve the object
> until a $type object is found, or return undef if the object is or
> resolves to an object of the wrong type.

See above why I think you should reconsider requiring the explicit
"resolving" step.

> I have resolving code in gitweb's git_get_sha1_or_die (which I didn't
> implement in Git::Repo since it uses some customized error reporting).
> The resolving code could conceivably be extracted and moved to get_sha1.
>  I think there are a few things to ponder and maybe discuss, so I'd do
> that in a separate patch (if I get around it before the end of the project).

The thing that concerns me about this is that this might show that your
approach to error handling is not flexible enough for some real-world
usage and this might be a design mistake - is that not so? I didn't look
at the code.

> >> [Snipped a lot of quoting --LW]
> >> +=item $repo->repo_dir
> >> +=item $repo->git_binary
> >> +=item $repo->version
> >> +sub _get_git_cmd {
> > 
> > This definitely does not belong to a Git::Repo object.
> 
> Which of those methods are you referring to?  I think $repo->version
> might reasonably be removed (and the code re-added to gitweb); I'll do
> so unless you object.  _get_git_cmd is already underscored, and repo_dir
> and git_binary only access attributes passed in through the constructor,
> so I think those three should stay.

Sorry, you're right about repo_dir and possibly git_binary. My main
concern was about the command pipe handling itself, but I elaborated on
that above already.

> >> +=item $repo->get_refs
> >> +=item $repo->get_refs($pattern)
> > 
> > Again, the refs should be properly integrated into the object structure.
> 
> Really?  I think it's generally fine for get_refs to exist and to live
> in Git::Repo.
> 
> Its return value (currently an arrayref of [$sha1, $object_type,
> $ref_name] arrayrefs) might need improvement though, and I find the
> $pattern parameter pretty suspect (in that it smells like a for-each-ref
> wrapper).  Since get_refs is unused at the moment (gitweb ended up
> needing the slightly different show-ref), I'll remove it for now.  (Same
> thing about me not being a fan of premature API design applies.)

Just a note, the thing is that you might want to add some methods for
inspecting and mutating the refs, and at the same time this is not a
repository-specific concept, but you can get the same structure from
git ls-remote call. That's why I think it would make sense to make
a separate object out of it. But that's a moot point now that the API
won't be there yet.

-- 
				Petr "Pasky" Baudis
As in certain cults it is possible to kill a process if you know
its true name.  -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 15:35           ` Jakub Narebski
@ 2008-07-18 16:51             ` Lea Wiemann
  0 siblings, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-07-18 16:51 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, John Hawley, Petr Baudis

Jakub Narebski wrote:
> Besides if decoding is done in Perl API, we can convert it simply
> to Perl internal form (which, IIUC, in modern Perl is UTF-8 and
> marked as such).

FWIW, Perl strings actually contain non-encoded Unicode code points.
IOW, they're not byte strings.
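
A small illustration of the difference, using only the core Encode module
(the variable names are just for this example): the same text is two bytes
before decoding and one character after it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "\xc3\xa9";                 # two bytes: the UTF-8 encoding of U+00E9
my $chars = decode('UTF-8', $bytes);    # one character: code point U+00E9

printf "%d byte(s), %d character(s), U+%04X\n",
    length($bytes), length($chars), ord($chars);

# Encoding back to bytes happens only at the output boundary.
my $out = encode('UTF-8', $chars);
```

How Perl stores the decoded string internally is an implementation detail;
what the API would hand out is the sequence of code points.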

> Still I think that putting cmd_output and other in Git::Repo
> is not a good API.

Yup; that's why I'm underscore-prefixing it (and taking it out of the
man page) for the next version of this patch.

> By the way, would you prefer if I commented on 3/3 patch as it is now,
> [...] or would you rather I wait for next round (next version) of patches?

I suggest you wait for the next version of the patch series, which I'll
post in a few hours.  There are quite a few changes; I'll list them in
the parent message for the series.

-- Lea

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-14 23:41     ` [PATCH 2/3] add new Git::Repo API Jakub Narebski
  2008-07-15  0:11       ` Lea Wiemann
@ 2008-07-18 16:54       ` Petr Baudis
  2008-07-19  0:03         ` Jakub Narebski
  2008-07-19 19:07         ` Jakub Narebski
  1 sibling, 2 replies; 63+ messages in thread
From: Petr Baudis @ 2008-07-18 16:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Lea Wiemann, git, John Hawley

On Tue, Jul 15, 2008 at 01:41:38AM +0200, Jakub Narebski wrote:
> On Mon, 14 July 2008, Petr Baudis wrote:
> > Here is an idea: Introduce Git::Command object that will have very
> > general interface and look like
> > 
> > 	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file', \
> > 		'-p', 'bla'], {pipe_out=>1})
> > 	...
> > 	$c->close();
> 
> Errr... how do you read from such a pipe?  <$c> I think wouldn't work,
> unless you would use some trickery...

That's a good point; it might either be done using some trickery, or
$c->pipe. The idea behind having a special object for it though is to
have *unified* (no matter how simple) error handling. You might not
detect the command erroring out at the open time.

Is there a better approach for solving this?

> > and a Git::CommandFactory with a nicer interface that would look like
> > 
> > 	my $cf = Git::CommandFactory->new('git', '--git-dir=.');
> > 	my $c = $cf->output_pipe('cat-file', '-p', 'bla');
> > 	$c->close();
> > 
> > Then, Git::Repo would have a single Git::CommandFactory instance
> > pre-initialized with the required calling convention, and returned by
> > e.g. cmd() method. Then, from the user POV, you would just:
> > 
> > 	my $repo = Git::Repo->new;
> > 	$repo->cmd->output_pipe('cat-file', '-p', 'bla');
> > 
> > Or am I overdoing it?
> 
> You are probably overdoing it.
> 
> 
> I think it would be good to have the following interface
> 
> Git->output_pipe('ls-remote', $URL, '--heads');

This is problematic; I think mixing the new and old interface within a
single class is a very bad idea, we should have Git::Standalone or
something for this. Or, just, default Git::CommandFactory. ;-)

> [...]
> $r = Git::Repo->new(<git_dir>);
> $r->output_pipe('ls-tree', 'HEAD');
> [...]
> $nb = Git::Repo::NonBare->new(<git_dir>[, <working_area>]);
> $nb->output_pipe('ls-files');
> 
> 
> How can it be done with minimal effort, unfortunately I don't know...

Well, this interface is almost identical to what I delineated, except
that I have the extra ->cmd-> step there. But maybe, we could go with
your API and instead have Git::CommandFactory as a base of Git::Repo?
The hierarchy would be

	Git::CommandFactory - provides the cmd_pipe toolkit
		|
	    Git::Repo       - provides repository model
		|
	Git::Repo::NonBare  - additional working-copy-related methods

I think I will post a sample implementation sometime over the weekend.
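
Until then, here is a rough sketch of how the hierarchy might look.  All
three packages are hypothetical as written here; the constructor options
(directory, git_binary, work_tree) and the cmd_line helper are assumptions
made for illustration, not an agreed interface:

```perl
use strict;
use warnings;

# Hypothetical base class: knows how to build and run git command lines.
package Git::CommandFactory;

sub new {
    my ($class, @base_cmd) = @_;    # e.g. ('git', '--git-dir=/path/to/repo.git')
    return bless { base_cmd => [@base_cmd] }, $class;
}

# Full argument list: the base command followed by the given arguments.
sub cmd_line {
    my ($self, @args) = @_;
    return (@{ $self->{base_cmd} }, @args);
}

# Run the command and return a read handle on its standard output.
sub output_pipe {
    my ($self, @args) = @_;
    my @cmd = $self->cmd_line(@args);
    open(my $fh, '-|', @cmd) or die "cannot run @cmd: $!";
    return $fh;
}

# Hypothetical repository model: a factory preloaded with --git-dir.
package Git::Repo;
our @ISA = ('Git::CommandFactory');

sub new {
    my ($class, %opts) = @_;
    my $git = $opts{git_binary} || 'git';
    return $class->SUPER::new($git, "--git-dir=$opts{directory}");
}

# Hypothetical non-bare repository: additionally passes --work-tree.
package Git::Repo::NonBare;
our @ISA = ('Git::Repo');

sub new {
    my ($class, %opts) = @_;
    my $self = $class->SUPER::new(%opts);
    push @{ $self->{base_cmd} }, "--work-tree=$opts{work_tree}";
    return $self;
}

package main;

my $repo = Git::Repo->new(directory => '/path/to/repo.git');
print join(' ', $repo->cmd_line('cat-file', '-p', 'HEAD')), "\n";
```

A standalone Git::CommandFactory->new('git') would cover commands such as
ls-remote that need no repository, and the path to the git binary is set in
exactly one place for every command.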

> > Another thing is clearly describing how error handling is going to work.
> > I have not much against ditching Error.pm, but just saying "die + eval"
> > does not cut it - how about possible sideband data? E.g. the failure
> > mode of Git.pm's command() method includes passing the error'd command
> > output in the exception object. How are we going to handle it? Now, it
> > might be actually okay to say that we _aren't_ going to handle this if
> > it is deemed unuseful, but that needs to be determined too. I don't know
> > off the top of my head.
> 
> I think that the solution might be some output_pipe option on how to
> treat command exit status, command STDERR, and errors when invoking
> command (for example command not found).
> 
> Mentioned http://www.perl.com/pub/a/2002/11/14/exception.html
> explains why one might want to use Error.pm.

The arguments against its usage that popped up over the year(s?):

	(i) It is not standard practice in the Perl world

	(ii) It is syntactically ambiguous, c.f. Lea's report about
	the missing semicolon

	(iii) The usage of closures in this way has inherent memory leak
	issues

-- 
				Petr "Pasky" Baudis
As in certain cults it is possible to kill a process if you know
its true name.  -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 16:48       ` Petr Baudis
@ 2008-07-18 17:05         ` Jakub Narebski
  2008-07-18 17:17           ` Petr Baudis
  2008-07-18 18:09         ` Lea Wiemann
  2008-07-19 20:54         ` Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API) Jakub Narebski
  2 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-18 17:05 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Lea Wiemann, git, John Hawley

On Friday, 18 July 2008 18:48, Petr Baudis wrote:
> On Tue, Jul 15, 2008 at 12:19:31AM +0200, Lea Wiemann wrote:

> > No, you should never pass in an invalid SHA1 in the first place.  The
> > above piece of documentation is just a warning that bugs will show up
> > delayed.  IOW, this is not the right place to have your error handling.
> > 
> > If you're getting a SHA1 through the user-interface, check its existence
> > with get_sha1 before passing it to the constructor.
> 
> But that's an expensive operation, you need extra Git exec for this,
> while all the Git commands can do the checks for you, if you give them
> the chance.
> 
> I was doing pretty much this thing in Cogito (initially out of
> necessity) and it made it ungodly slow for any kind of batch operations.

Lea probably would point out that thanks to '--batch-check' option
to git-cat-file, and "reuse => 1" option to ->get_bidi_pipe you would
need only one extra Git exec...  BUT it is one extra Git exec per
entire Perl script (one per request, for example, for gitweb).
And it wouldn't help batching Perl script commands.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 17:05         ` Jakub Narebski
@ 2008-07-18 17:17           ` Petr Baudis
  0 siblings, 0 replies; 63+ messages in thread
From: Petr Baudis @ 2008-07-18 17:17 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Lea Wiemann, git, John Hawley

On Fri, Jul 18, 2008 at 07:05:05PM +0200, Jakub Narebski wrote:
> On Friday, 18 July 2008 18:48, Petr Baudis wrote:
> > On Tue, Jul 15, 2008 at 12:19:31AM +0200, Lea Wiemann wrote:
> 
> > > No, you should never pass in an invalid SHA1 in the first place.  The
> > > above piece of documentation is just a warning that bugs will show up
> > > delayed.  IOW, this is not the right place to have your error handling.
> > > 
> > > If you're getting a SHA1 through the user-interface, check its existence
> > > with get_sha1 before passing it to the constructor.
> > 
> > But that's an expensive operation, you need extra Git exec for this,
> > while all the Git commands can do the checks for you, if you give them
> > the chance.
> > 
> > I was doing pretty much this thing in Cogito (initially out of
> > necessity) and it made it ungodly slow for any kind of batch operations.
> 
> Lea probably would point out that thanks to '--batch-check' option
> to git-cat-file, and "reuse => 1" option to ->get_bidi_pipe you would
> need only one extra Git exec...  BUT it is one extra Git exec per
> entire Perl script (one per request, for example, for gitweb).
> And it wouldn't help batching Perl script commands.

Even so, I don't like this restriction simply since it makes the usage
more complicated - in my commandline tool, I will be forced to write

	my $csha1 = $repo->get_sha1($ARGV[0]);
	$csha1 or die "a random inconsistent error message";
	$repo->get_commit($csha1)

instead of

	$repo->get_commit($ARGV[0]) # die if unresolved

and I don't understand why. Me and 80% of the scripts don't *care* about
some more graceful error handling, and if gitweb and the other 20%
(or less) do, *they* should do the extra work, not me.

-- 
				Petr "Pasky" Baudis
As in certain cults it is possible to kill a process if you know
its true name.  -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 16:48       ` Petr Baudis
  2008-07-18 17:05         ` Jakub Narebski
@ 2008-07-18 18:09         ` Lea Wiemann
  2008-07-18 18:19           ` Petr Baudis
  2008-07-18 18:23           ` Johannes Schindelin
  2008-07-19 20:54         ` Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API) Jakub Narebski
  2 siblings, 2 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-07-18 18:09 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jakub Narebski, git, John Hawley

Petr Baudis wrote:
> [$repo->_cmd_output:]
> 
> we _need_ such a wrapper _publicly_, because it tends to be
> actually the main use-case of Git.pm,

Well, sure, I happen to not be convinced, but it *may* be useful.  The
point I'm trying to make is that it's not part of what I'm writing here.

> as part of your gitweb migration to Git::Repo, you will temporarily
> introduce calls to _cmd_output(), the "internal" API. :-) Sure, it's
> only temporary, but many won't have the luxury to adjust the Git::Repo
> API to provide all the operations they need, and ultimately they will
> need to defer to the pipe interface.

Yup, and I'm actually fine with that.  (I'll probably alias _cmd_output
to cmd_output in gitweb, just to make it clear that it is, for the
purpose of gitweb, a *supported* mode of operation.)  If the
Git::Repo::_cmd_output API goes away, you'll have to insert a few lines
of code in gitweb, but that's it.  Really, no big deal.

Also, gitweb isn't using cmd_output because it needs a pipe interface,
but because it needs a caching layer in between -- most applications
would do just fine with open calls.

> As I said, majority of Git API usage is actually the pipe API. So we
> should figure out how to provide it. I agree that it's not immediately
> within your scope, but you are introducing new Perl API and this just
> needs to be embedded somewhere there consistently.

Sure, but pleeeease not as part of this patch series! :-)  Look, our
conversation is going something like this:

Lea: Here's a Perl API that fell out of my gitweb development for free.
Petr: I want a pony with the API!
Lea: But I don't have a pony.  Can we please just go with the Perl API
as a start, even if I don't supply ponies with it?

(Cf. the very cute <http://c2.com/cgi/wiki?IwantaPony>.)

>> If you're getting a SHA1 through the user-interface, check its existence
>> with get_sha1 before passing it to the constructor.
> 
> But that's an expensive operation, you need extra Git exec for this,

For the gazillionth time in this thread, there is no extra exec.  It's a
write to a bidirectional cat-file --batch-check pipe.  It's not
expensive.  Really. ;-)
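
To make that concrete, here is a simplified sketch of the idea (not the code
from the patch): one long-lived "git cat-file --batch-check" process per
repository, reused for every lookup.  The function names and the %batch_check
cache are made up for this example; it assumes a git binary in PATH and
object names that contain no newline.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);

# Hypothetical cache: one running "cat-file --batch-check" per repository.
my %batch_check;

sub batch_check_pipe {
    my ($git_dir) = @_;
    unless ($batch_check{$git_dir}) {
        # The only fork/exec; later lookups reuse these two handles.
        my $pid = open2(my $out, my $in,
                        'git', "--git-dir=$git_dir", 'cat-file', '--batch-check');
        $batch_check{$git_dir} = { pid => $pid, in => $in, out => $out };
    }
    return $batch_check{$git_dir};
}

# A reply is "<sha1> <type> <size>" for an existing object and
# "<name> missing" otherwise; return an empty list in the latter case.
sub parse_batch_check_line {
    my ($line) = @_;
    return $line =~ /^([0-9a-f]{40}) (\w+) (\d+)$/ ? ($1, $2, $3) : ();
}

# Resolve a name: write one line to the pipe, read one line back.
sub get_sha1 {
    my ($git_dir, $name) = @_;
    my $pipe = batch_check_pipe($git_dir);
    print { $pipe->{in} } "$name\n";
    my $reply = readline($pipe->{out});
    return undef unless defined $reply;
    chomp $reply;
    my ($sha1) = parse_batch_check_line($reply);
    return $sha1;
}
```

After the first call, each get_sha1 costs one write and one read on an
already-open pipe, which is why checking a name before constructing a
Git::Commit does not add an exec per object.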

>> I have resolving code in gitweb's git_get_sha1_or_die
> 
> The thing that concerns me about this is that this might show that your
> approach to error handling is not flexible enough for some real-world
> usage and this might be a design mistake - is that not so?

I don't think so; the error handling is fine.  Given that I want
fine-granular error reporting for gitweb, there *needs* to be a
git_get_sha1_or_die function; you can't move that into the API.

-- Lea

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 18:09         ` Lea Wiemann
@ 2008-07-18 18:19           ` Petr Baudis
  2008-07-18 18:23           ` Johannes Schindelin
  1 sibling, 0 replies; 63+ messages in thread
From: Petr Baudis @ 2008-07-18 18:19 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Jakub Narebski, git, John Hawley

On Fri, Jul 18, 2008 at 08:09:48PM +0200, Lea Wiemann wrote:
> Also, gitweb isn't using cmd_output because it needs a pipe interface,
> but because it needs a caching layer in between -- most applications
> would do just fine with open calls.

One of the points of the API is to abstract these out.

> > As I said, majority of Git API usage is actually the pipe API. So we
> > should figure out how to provide it. I agree that it's not immediately
> > within your scope, but you are introducing new Perl API and this just
> > needs to be embedded somewhere there consistently.
> 
> Sure, but pleeeease not as part of this patch series! :-)  Look, our
> conversation is going something like this:
> 
> Lea: Here's a Perl API that fell out of my gitweb development for free.
> Petr: I want a pony with the API!
> Lea: But I don't have a pony.  Can we please just go with the Perl API
> as a start, even if I don't supply ponies with it?
> 
> (Cf. the very cute <http://c2.com/cgi/wiki?IwantaPony>.)

I'm fine with that, as long as the version that enters into master will
have a pony so that we stay with a single pony within the codebase in
the end, not two ponies with differently shaped saddles.

But as I said, I'm going to work on that.

> >> If you're getting a SHA1 through the user-interface, check its existence
> >> with get_sha1 before passing it to the constructor.
> > 
> > But that's an expensive operation, you need extra Git exec for this,
> 
> For the gazillionth time in this thread, there is no extra exec.  It's a
> write to a bidirectional cat-file --batch-check pipe.  It's not
> expensive.  Really. ;-)

But the API is still obnoxiously elaborate, as I complained in another
mail.

> >> I have resolving code in gitweb's git_get_sha1_or_die
> > 
> > The thing that concerns me about this is that this might show that your
> > approach to error handling is not flexible enough for some real-world
> > usage and this might be a design mistake - is that not so?
> 
> I don't think so; the error handling is fine.  Given that I want
> fine-granular error reporting for gitweb, there *needs* to be a
> git_get_sha1_or_die function; you can't move that into the API.

Wait, this doesn't compute here. The error handling is fine, but it is
actually not fine for gitweb. Can't we make it fine for everyone?

-- 
				Petr "Pasky" Baudis
As in certain cults it is possible to kill a process if you know
its true name.  -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 18:09         ` Lea Wiemann
  2008-07-18 18:19           ` Petr Baudis
@ 2008-07-18 18:23           ` Johannes Schindelin
  1 sibling, 0 replies; 63+ messages in thread
From: Johannes Schindelin @ 2008-07-18 18:23 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Petr Baudis, Jakub Narebski, git, John Hawley

Hi,

On Fri, 18 Jul 2008, Lea Wiemann wrote:

> Lea: Here's a Perl API that fell out of my gitweb development for free.
> Petr: I want a pony with the API!
> Lea: But I don't have a pony.  Can we please just go with the Perl API
> as a start, even if I don't supply ponies with it?

Johannes: *mumbles* no ponies?  *mumbles further* Them cowboy coders keep 
the ponies for themselves, eh? *mumbles and gets something to eat*

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 16:54       ` Petr Baudis
@ 2008-07-19  0:03         ` Jakub Narebski
  2008-07-19 19:07         ` Jakub Narebski
  1 sibling, 0 replies; 63+ messages in thread
From: Jakub Narebski @ 2008-07-19  0:03 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Lea Wiemann, git, John Hawley

Petr Baudis wrote:
> Jakub Narebski wrote:

>> [...]
>> $r = Git::Repo->new(<git_dir>);
>> $r->output_pipe('ls-tree', 'HEAD');
>> [...]
>> $nb = Git::Repo::NonBare->new(<git_dir>[, <working_area>]);
>> $nb->output_pipe('ls-files');
>> 
>> 
>> How can it be done with minimal effort, unfortunately I don't know...
> 
> Well, this interface is almost identical to what I delineated, except
> that I have the extra ->cmd-> step there. But maybe, we could go with
> your API and instead have Git::CommandFactory as a base of Git::Repo?
> The hierarchy would be
> 
>         Git::CommandFactory - provides the cmd_pipe toolkit
>                 |
>             Git::Repo       - provides repository model
>                 |
>         Git::Repo::NonBare  - additional working-copy-related methods
> 
> I think I will post a sample implementation sometime over the weekend.

What I would want from the solution is an easy way to set the path
to the git binary for _all_ git command calls, and an easy way to invoke
commands which do not need a git repository to run (like git-ls-remote).

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 0/3] Git::Repo API and gitweb caching
  2008-07-11  1:06 [PATCH 0/3] Git::Repo API and gitweb caching Lea Wiemann
                   ` (4 preceding siblings ...)
  2008-07-11  9:33 ` Jakub Narebski
@ 2008-07-19  5:35 ` Lea Wiemann
  2008-08-18 19:34 ` Lea Wiemann
  6 siblings, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-07-19  5:35 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Git Mailing List, John Hawley, Jakub Narebski, Petr Baudis

Lea Wiemann wrote:
> 1) adding the Mechanize tests,
> 2) adding the Git::Repo API, and (the important part:)
> 3) making gitweb use the Git::Repo API, and adding caching to gitweb.

I unfortunately didn't have enough time to finish the update to the
patch series before my vacation; I'll therefore be posting the next
series only around Aug 12th-13th.  I suggest you wait with any reviewing
till I've posted the next version, since there'll be quite a few
changes; also I'll post patch 3 (the gitweb patch) as (at least) two
separate patches, which will hopefully help with reviewability.

Alright, off to my vacation now -- I'll be completely offline till Aug 10th.

See you then!

-- Lea

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-18 16:54       ` Petr Baudis
  2008-07-19  0:03         ` Jakub Narebski
@ 2008-07-19 19:07         ` Jakub Narebski
  2008-07-20 21:36           ` Petr Baudis
  1 sibling, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-19 19:07 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Lea Wiemann, git, John Hawley

On Fri, 18 July 2008, Petr Baudis wrote:
> On Tue, Jul 15, 2008 at 01:41:38AM +0200, Jakub Narebski wrote:
> > On Mon, 14 July 2008, Petr Baudis wrote:
> > > Here is an idea: Introduce Git::Command object that will have very
> > > general interface and look like
> > > 
> > > 	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file',
> > > 		'-p', 'bla'], {pipe_out=>1});
> > > 	...
> > > 	$c->close();
> > 
> > Errr... how do you read from such a pipe?  <$c> I think wouldn't work,
> > unless you would use some trickery...
> 
> That's good point; it might either be done using some trickery, or
> $c->pipe. The idea behind having a special object for it though is to
> have *unified* (no matter how simple) error handling. You might not
> detect the command erroring out at the open time.
> 
> Is there a better approach for solving this?

I don't know if it is a _better_ approach, but an _alternative_ approach
would be to use:

	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file',
		'-p', 'bla'], {out => \my $fh, err => undef});
	...
	while (my $line = <$fh>) {
		...
	}
	$c->close();

And the trickery would be to use a blessed filehandle, or what?  Or perhaps
extending IO::Handle (but not everyone likes using object methods for I/O
handles)?
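
For illustration, here is a minimal sketch of the variant where the
object keeps the filehandle to itself and close() is the one place that
reports a failed command.  My::Command and its fh() method are
hypothetical names, not the proposed Git::Command, and the example runs
the perl interpreter ($^X) instead of git so that it works anywhere.

```perl
use strict;
use warnings;

package My::Command;    # hypothetical stand-in for the proposed Git::Command

sub new {
    my ($class, @cmd) = @_;
    # List-form piped open: no shell involved, the child's stdout comes to us.
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!\n";
    return bless { cmd => [@cmd], fh => $fh }, $class;
}

# The filehandle stays inside the object; callers read via $c->fh.
sub fh { $_[0]{fh} }

# Closing the pipe waits for the child, so this is the one place where
# a failing command is noticed and turned into an exception.
sub close {
    my ($self) = @_;
    my $fh = delete $self->{fh} or return 1;
    return 1 if CORE::close($fh);
    die sprintf("%s exited with status %d\n", $self->{cmd}[0], $? >> 8);
}

package main;

# A command that succeeds: read its output, then close.
my $ok  = My::Command->new($^X, '-e', 'print "line1\nline2\n"');
my $out = $ok->fh;
my @lines = <$out>;
$ok->close;

# A command that fails: the error only shows up at close time.
my $bad = My::Command->new($^X, '-e', 'exit 3');
my $err = eval { $bad->close; 1 } ? '' : $@;
print scalar(@lines), " lines read; failure reported as: $err";
```

The point of the sketch is only that the error surfaces at close(), not
at open time; whether the handle is exposed through a method or through
a blessed filehandle is a separate choice.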

> > > and a Git::CommandFactory with a nicer interface that would look like
> > > 
> > > 	my $cf = Git::CommandFactory->new('git', '--git-dir=.');
> > > 	my $c = $cf->output_pipe('cat-file', '-p', 'bla');
> > > 	$c->close();
> > > 
> > > Then, Git::Repo would have a single Git::CommandFactory instance
> > > pre-initialized with the required calling convention, and returned by
> > > e.g. cmd() method. Then, from the user POV, you would just:
> > > 
> > > 	my $repo = Git::Repo->new;
> > > 	$repo->cmd->output_pipe('cat-file', '-p', 'bla');
> > > 
> > > Or am I overdoing it?
> > 
> > You are probably overdoing it.
> > 
> > I think it would be good to have the following interface
> > 
> > Git->output_pipe('ls-remote', $URL, '--heads');
> 
> This is problematic; I think mixing the new and old interface within a
> single class is very bad idea, we should have Git::Standalone or
> something for this. Or, just, default Git::CommandFactory. ;-)

I forgot that we cannot obsolete / replace the old interface.  Nevertheless
it would be nice to be able to use, for example,

	Git::Cmd->output_pipe('ls-remote', $URL, '--heads');

but also

	output_pipe('myscript.sh', <arg1>, <arg2>);

See also below for alternative interfaces to Git::Cmd->output_pipe();

> > [...]
> > $r = Git::Repo->new(<git_dir>);
> > $r->output_pipe('ls-tree', 'HEAD');
> > [...]
> > $nb = Git::Repo::NonBare->new(<git_dir>[, <working_area>]);
> > $nb->output_pipe('ls-files');
> > 
> > 
> > How can it be done with minimal effort, unfortunately I don't know...
> 
> Well, this interface is almost identical to what I delineated, except
> that I have the extra ->cmd-> step there. But maybe, we could go with
> your API and instead have Git::CommandFactory as a base of Git::Repo?
> The hierarchy would be
> 
> 	Git::CommandFactory - provides the cmd_pipe toolkit
> 		|
> 	    Git::Repo       - provides repository model
> 		|
> 	Git::Repo::NonBare  - additional working-copy-related methods
> 
> I think I will post a sample implementation sometime over the weekend.

Thanks.

I think this is a very good idea.  Although... you mix relationships
somewhat here.  The relationship between Git::CommandFactory (Git::Cmd?)
and Git::Repo is a bit different from the relationship between Git::Repo
and Git::Repo::NonBare.  Git::Repo::NonBare is a case of Git::Repo which
additionally knows where its working copy (Git::WC?) is, and where
inside the working copy we are (if we are inside it).  Git::Repo
uses Git::CommandFactory to route calls to git commands, and to
provide the default '--git-dir=<repo_path>' argument.


What I'd like to have is a way to set in _one_ place where the git
binary can be found, even if we are using different repositories or
calling git commands not related to any git repository.

Whether we use

	Git::Cmd->output_pipe('ls-remote', $URL, '--heads');
or
	output_pipe(GIT, 'ls-remote', $URL, '--heads');
or
	output_pipe($GIT, 'ls-remote', $URL, '--heads');
or
	output_pipe($Git::GIT, 'ls-remote', $URL, '--heads');

we would want to be able to set where the git binary is once (and for all),
for example via

	Git::Cmd->set_git('/usr/local/bin/git');

or something like that.
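
To make this concrete, here is a rough sketch of such a hierarchy with a
single place for the git binary.  It is only an illustration: the My::*
class names, base_args() and argv() are made up for this example, the
repository paths and URL are placeholders, and the code only builds
argument lists rather than running git.

```perl
use strict;
use warnings;

# All class names below are hypothetical illustrations, not existing modules.
package My::CmdFactory;

our $GIT = 'git';                   # one process-wide path to the git binary
sub set_git { $GIT = $_[1] }

sub new       { my ($class, %args) = @_; return bless {%args}, $class }
sub base_args { () }                # no repository: nothing to add

# Build the full argv for a git command; subclasses only override base_args.
sub argv {
    my ($self, @cmd) = @_;
    return ($GIT, $self->base_args, @cmd);
}

package My::Repo;
use parent -norequire, 'My::CmdFactory';
sub base_args { my ($self) = @_; return ("--git-dir=$self->{git_dir}") }

package My::Repo::NonBare;
use parent -norequire, 'My::Repo';
sub base_args {
    my ($self) = @_;
    return ($self->SUPER::base_args, "--work-tree=$self->{work_tree}");
}

package main;

# Set the binary once; every object below picks it up.
My::CmdFactory->set_git('/usr/local/bin/git');

my @standalone = My::CmdFactory->new
    ->argv('ls-remote', '--heads', 'git://example.org/repo.git');
my @bare    = My::Repo->new(git_dir => '/srv/a.git')->argv('ls-tree', 'HEAD');
my @nonbare = My::Repo::NonBare->new(git_dir => '/w/.git', work_tree => '/w')
    ->argv('ls-files');

print "@$_\n" for \@standalone, \@bare, \@nonbare;
```

A real implementation would hand these lists to a piped open; keeping
the argv construction separate just makes the inheritance easy to see.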

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-18 16:48       ` Petr Baudis
  2008-07-18 17:05         ` Jakub Narebski
  2008-07-18 18:09         ` Lea Wiemann
@ 2008-07-19 20:54         ` Jakub Narebski
  2008-07-19 21:14           ` Petr Baudis
  2008-07-20 10:38           ` Johannes Schindelin
  2 siblings, 2 replies; 63+ messages in thread
From: Jakub Narebski @ 2008-07-19 20:54 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Lea Wiemann, git, John Hawley

On Fri, 18 July 2008, Petr Baudis wrote:

> But the experience shows that the pipe interface is actually
> the _most_ used part of the Git Perl API.

This made me curious which parts of Git.pm, the current Git Perl API,
are used most.

Here are some statistics on the usage of the Git Perl API among built-in
and contrib commands.


From git commands in Perl the following include "use Git" in their 
sources: git-cvsexportcommit, git-send-email, git-svn, and helper
script git-add--interactive.  There are Perl scripts which do not
use Git.pm: git-archimport (which I think should be obsoleted or
moved to contrib), git-cvsimport, git-cvsserver, git-relink.  This
means that half of Perl scripts use Git Perl API.

The situation is worse for scripts in 'contrib/'.  From those, only
contrib/examples/git-remote.perl uses Git.pm; neither blameview,
continuous, git-import and import-tars in fast-import, setgitperms
and update-paranoid in hooks, stats, nor other Perl scripts in
examples (git-rerere, git-svnimport) include "use Git".

Below are stats on how the different Git.pm commands are used in the
Perl scripts mentioned above:

1. git-add--interactive.perl uses Git.pm mainly to access color
   configuration: 2 ->get_colorbool, 8 ->get_color, 1 ->config.
   It also uses the ->repository() constructor once, and
   ->repo_path() once.

   This means that future Git::Config _has_ to have support for
   color or colorbool valued configuration values.

2. git-cvsexportcommit.perl uses (besides the ->repository() constructor)
   only ->config, once; so I guess that the current interface wrapping
   git-config should stay, because parsing the whole config for such
   a situation would be overkill.

   What is strange is that git-cvsexportcommit.perl defines its own
   safe_pipe_capture and xargs_safe_pipe_capture (!), and uses
   them, open pipelines, backticks, and system() to call git commands.
   
3. git-send-email.perl uses 5 config, 2 config_bool, 2 ident_person
   (for author and for committer), 1 version, and of course once
   ->repository() constructor.  

   Here we can also see how to work around the current API: it uses
   Git::config(@repo, "sendemail.identity") form, where 
   	my $repo = eval { Git->repository() };
   	my @repo = $repo ? ($repo) : ();
   to make it work both with git repository (using repo config), and
   outside/without git repository, using only user and system git
   config.

4. git-svn.perl (which is, if I checked correctly, the third largest scripted
   git command, after gitk-git/gitk-wish and gitweb/gitweb.perl) uses:

     3  x repository
     28 x command_oneline
     19 x command_noisy
     18 x command
     14 x command_output_pipe
     1  x command_input_pipe
     1  x git_try_cmd  (and many "eval { command()/command_oneline() }")
     2  x cat_blob
     1  x hash_and_insert_object
     1  x get_colorbool

   (The above is everything, unless I made a mistake when counting.)

   Side note: git-svn is the command that would gain the most from
   Git::Config, as it currently uses a combination of "git config -l"
   and "git config --get" (I guess that the code in question predates
   the machine-parseable "git config -l -z").
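
As a rough idea of what a future Git::Config could do with that
machine-parseable output, here is a small sketch.  parse_config_z is a
hypothetical helper, not part of Git.pm, and it assumes the -z format
in which each entry is "key\nvalue" terminated by NUL, with a valueless
key appearing as just the key.  The sample input is written by hand
instead of being read from git.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw "git config -l -z" output into a hash
# of key => [values], keeping multi-valued keys in order.
sub parse_config_z {
    my ($raw) = @_;
    my %config;
    for my $entry (split /\0/, $raw) {
        # Only the first newline separates key from value, so values
        # containing newlines survive intact.
        my ($key, $value) = split /\n/, $entry, 2;
        push @{ $config{$key} }, $value;    # undef marks "no value"
    }
    return \%config;
}

# Hand-written sample; in real use $raw would come from a git pipe.
my $raw = "user.name\nA U Thor\0"
        . "core.bare\0"
        . "remote.origin.fetch\nrefs/heads/a\0"
        . "remote.origin.fetch\nrefs/heads/b\0";

my $config = parse_config_z($raw);
print "$_: ", scalar @{ $config->{$_} }, " value(s)\n"
    for sort keys %$config;
```

With the whole config in one hash, repeated lookups need no further
calls to git, which is where a script like git-svn would benefit.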

Note that these statistics don't count how many times a Git.pm method
is called at run time, only how many times it occurs in the code.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-19 20:54         ` Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API) Jakub Narebski
@ 2008-07-19 21:14           ` Petr Baudis
  2008-07-20  0:16             ` Jakub Narebski
  2008-07-20 10:38           ` Johannes Schindelin
  1 sibling, 1 reply; 63+ messages in thread
From: Petr Baudis @ 2008-07-19 21:14 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Lea Wiemann, git, John Hawley

On Sat, Jul 19, 2008 at 10:54:24PM +0200, Jakub Narebski wrote:
> On Fri, 18 July 2008, Petr Baudis wrote:
> 
> > But the experience shows that the pipe interface is actually
> > the _most_ used part of the Git Perl API.

  Note that I referred both to scripts that are part of Git and
(actually more) to various third-party scripts I wrote externally.

> Here is some statistics on the usage of Git Perl API among built-in
> and contrib commands.

  Thanks!

> The situation is worse for scripts in 'contrib/'.  From those, only
> contrib/examples/git-remote.perl uses Git.pm; neither blameview,
> continuous, git-import and import-tars in fast-import, setgitperms
> and update-paranoid in hooks, stats, nor other Perl scripts in
> examples (git-rerere, git-svnimport) include "use Git".

  I have actually once converted blameview to use Git locally, but in
the end never got around to submitting it; I wonder if I still have the
source around somewhere. Not so important, I guess.

> Below there are stats on how different commands from Git.pm are
> used in mentioned Perl scripts:
> 
> 2. git-cvsexportcommit.perl uses (besides ->repository() constructor)
>    only once ->config; so I guess that current interface wrapping
>    git-config should stay, because parsing whole config for such
>    situation would be overkill.

  I don't understand. Parsing the whole config will happen either in
git-config or in Git::Config, and the performance difference is so tiny
that it is not really worth the API complexity, I believe.

> 3. git-send-email.perl uses 5 config, 2 config_bool, 2 ident_person
>    (for author and for committer), 1 version, and of course once
>    ->repository() constructor.  
> 
>    Here we can see how to work around current API to: it uses
>    Git::config(@repo, "sendemail.identity") form, where 
>    	my $repo = eval { Git->repository() };
>    	my @repo = $repo ? ($repo) : ();
>    to make it work both with git repository (using repo config), and
>    outside/without git repository, using only user and system git
>    config.

  With the envisioned model, it could use $git which would be either
a reference to a Git::Standalone singleton or Git::Repo instance.

-- 
				Petr "Pasky" Baudis
As in certain cults it is possible to kill a process if you know
its true name.  -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-19 21:14           ` Petr Baudis
@ 2008-07-20  0:16             ` Jakub Narebski
  2008-07-20 21:38               ` Petr Baudis
  0 siblings, 1 reply; 63+ messages in thread
From: Jakub Narebski @ 2008-07-20  0:16 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

On Sat, 19 July 2008, Petr Baudis wrote:
> On Sat, Jul 19, 2008 at 10:54:24PM +0200, Jakub Narebski wrote:

> > 3. git-send-email.perl uses 5 config, 2 config_bool, 2 ident_person
> >    (for author and for committer), 1 version, and of course once
> >    ->repository() constructor.  
> > 
> >    Here we can see how to work around current API to: it uses
> >    Git::config(@repo, "sendemail.identity") form, where 
> >    	my $repo = eval { Git->repository() };
> >    	my @repo = $repo ? ($repo) : ();
> >    to make it work both with git repository (using repo config), and
> >    outside/without git repository, using only user and system git
> >    config.
> 
>   With the envisioned model, it could use $git which would be either
> a reference to a Git::Standalone singleton or Git::Repo instance.

So that the code would look like the following, instead:

  my $git = Git::Cmd->new; # or Git::Standalone, or Git::CommandFactory
  my $repo = eval { Git->repository() };
  $git = $repo if $repo;

and later use

  $git->config('sendemail.identity');
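
A self-contained sketch of that fallback, under the assumption that the
repository constructor dies when there is no repository (as the eval
around Git->repository() suggests).  My::Standalone, My::Repo, scope()
and open_git() are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical classes: neither exists in Git.pm.
package My::Standalone;
sub new   { return bless {}, shift }
sub scope { 'user and system config only' }

package My::Repo;
use parent -norequire, 'My::Standalone';

# Assumed behaviour: the constructor dies when given no usable directory.
sub new {
    my ($class, $dir) = @_;
    die "not a git repository\n" unless defined $dir && -d $dir;
    return bless { git_dir => $dir }, $class;
}
sub scope { 'repo, user and system config' }

package main;

# Return a repository object if one can be opened, else the standalone one.
# Callers use the result the same way in both cases.
sub open_git {
    my ($dir) = @_;
    my $repo = eval { My::Repo->new($dir) };
    return $repo || My::Standalone->new;
}

my $git = open_git($ENV{GIT_DIR});
print ref($git), ': ', $git->scope, "\n";
```

Because both classes answer the same methods, the caller no longer
needs the @repo trick to decide how to call config().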


By the way, git-svn can use command(...) instead of $repo->command(...)
because it sets $ENV{'GIT_DIR'} if it is unset... but I don't see
where Git.pm inserts 'git' into the command list...
-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-19 20:54         ` Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API) Jakub Narebski
  2008-07-19 21:14           ` Petr Baudis
@ 2008-07-20 10:38           ` Johannes Schindelin
  2008-07-20 10:49             ` Petr Baudis
  1 sibling, 1 reply; 63+ messages in thread
From: Johannes Schindelin @ 2008-07-20 10:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Petr Baudis, Lea Wiemann, git, John Hawley

Hi,

On Sat, 19 Jul 2008, Jakub Narebski wrote:

> From git commands in Perl the following include "use Git" in their 
> sources: git-cvsexportcommit, git-send-email, git-svn, and helper
> script git-add--interactive.  There are Perl scripts which do not
> use Git.pm: git-archimport (which I think should be obsoleted or
> moved to contrib), git-cvsimport, git-cvsserver, git-relink.  This
> means that half of Perl scripts use Git Perl API.
> 
> The situation is worse for scripts in 'contrib/'.  From those, only
> contrib/examples/git-remote.perl uses Git.pm; neither blameview,
> continuous, git-import and import-tars in fast-import, setgitperms
> and update-paranoid in hooks, stats, nor other Perl scripts in
> examples (git-rerere, git-svnimport) include "use Git".

You call that "worse"?  Pasky tried to convert all Perl scripts at once 
IIRC, and my numerous problems just _making_ the Git scripts led me to 
rewrite a few Perl scripts in C, so I could safely exclude the Perl 
scripts from my personal fork.

I guess that it was this experience which prevented more of the old 
scripts from being converted.

But your mention of git-add--interactive actually brings up my pet-peeve: 
this script is the only Perl script needed for common operations, i.e. the 
only reason msysGit has to ship bloated with Perl.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-20 10:38           ` Johannes Schindelin
@ 2008-07-20 10:49             ` Petr Baudis
  2008-07-20 12:33               ` Johannes Schindelin
  0 siblings, 1 reply; 63+ messages in thread
From: Petr Baudis @ 2008-07-20 10:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, Lea Wiemann, git, John Hawley

  Hi,

On Sun, Jul 20, 2008 at 12:38:07PM +0200, Johannes Schindelin wrote:
> You call that "worse"?  Pasky tried to convert all Perl scripts at once 
> IIRC, and my numerous problems just _making_ the Git scripts led me to 
> rewrite a few Perl scripts in C, so I could safely exclude the Perl 
> scripts from my personal fork.

  I don't remember any concrete report of such problems ever reaching
me; exactly what trouble are you hitting with the Perl scripts using
Git.pm?  I will be glad to try to fix it.

> I guess that it was this experience which prevented more of the old 
> scripts from being converted.
> 
> But your mention of git-add--interactive actually brings up my pet-peeve: 
> this script is the only Perl script needed for common operations, i.e. the 
> only reason msysGit has to ship bloated with Perl.

  _Many_ people seem to be using git-svn, whether we like it or not. ;-)
Also, isn't git-send-email rather commonly used? (I wouldn't know, me
using stg mail.)

-- 
				Petr "Pasky" Baudis
As in certain cults it is possible to kill a process if you know
its true name.  -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-20 10:49             ` Petr Baudis
@ 2008-07-20 12:33               ` Johannes Schindelin
  2008-07-20 12:58                 ` Petr Baudis
  0 siblings, 1 reply; 63+ messages in thread
From: Johannes Schindelin @ 2008-07-20 12:33 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jakub Narebski, Lea Wiemann, git, John Hawley

Hi,

On Sun, 20 Jul 2008, Petr Baudis wrote:

> On Sun, Jul 20, 2008 at 12:38:07PM +0200, Johannes Schindelin wrote:
>
> > Pasky tried to convert all Perl scripts at once IIRC, and my numerous 
> > problems just _making_ the Git scripts led me to rewrite a few Perl 
> > scripts in C, so I could safely exclude the Perl scripts from my 
> > personal fork.
> 
> I don't remember any concrete report of such problems ever reaching me; 
> exactly what trouble are you hitting with the Perl scripts using Git.pm?  
> I will be glad to try to fix it.

They reached you:

http://article.gmane.org/gmane.comp.version-control.git/23153

http://thread.gmane.org/gmane.comp.version-control.git/22764/focus=22778

Yes, those are very old mails, but they _do_ explain why the old Perl 
scripts avoided Git.pm.

> > But your mention of git-add--interactive actually brings up my 
> > pet-peeve: this script is the only Perl script needed for common 
> > operations, i.e. the only reason msysGit has to ship bloated with 
> > Perl.
> 
> _Many_ people seem to be using git-svn, whether we like it or not. ;-)

Well, they will just stop using it on msysGit, since nobody fixes git-svn 
on msysGit.

> Also, isn't git-send-mail rather commonly used? (I wouldn't know, me 
> using stg mail.)

send-email does not work on Windows.  See 
http://code.google.com/p/msysgit/issues/detail?id=27

Yes, you read correctly.  The issue exists since almost the birth of 
msysGit.  Torgil seems to have stopped working on it altogether.  Two 
other people have "starred" the issue, wanting to be notified of fixes but 
not wanting to work on them.

To me, it seems that either the Windows folk is a lazy bunch of bums, or 
they just do not care enough.  Or both.

Note: I want to express very loudly here that I do not count Hannes Sixt 
as Windows folk.  Even if he seems to like cmd.exe for some perverse 
reason, it is indisputable that his effort, and his effort alone, brought 
the MinGW port to where it is now.

To sum it up, I think we can safely remove Perl from the msysGit installer 
once add--interactive is a builtin.  The download will be substantially 
smaller, and Perl on msysGit was never exactly a speed demon, so it's 
probably a good change from several angles.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-20 12:33               ` Johannes Schindelin
@ 2008-07-20 12:58                 ` Petr Baudis
  2008-07-20 13:21                   ` Johannes Schindelin
  0 siblings, 1 reply; 63+ messages in thread
From: Petr Baudis @ 2008-07-20 12:58 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jakub Narebski, Lea Wiemann, git, John Hawley

  Hi,

On Sun, Jul 20, 2008 at 02:33:46PM +0200, Johannes Schindelin wrote:
> On Sun, 20 Jul 2008, Petr Baudis wrote:
> 
> > On Sun, Jul 20, 2008 at 12:38:07PM +0200, Johannes Schindelin wrote:
> >
> > > Pasky tried to convert all Perl scripts at once IIRC, and my numerous 
> > > problems just _making_ the Git scripts led me to rewrite a few Perl 
> > > scripts in C, so I could safely exclude the Perl scripts from my 
> > > personal fork.
> > 
> > I don't remember any concrete report of such problems ever reaching me; 
> > exactly what trouble are you hitting with the Perl scripts using Git.pm?  
> > I will be glad to try to fix it.
> 
> They reached you:
> 
> http://article.gmane.org/gmane.comp.version-control.git/23153

  running Git in-place without correctly setting the prefix is not
supported anyway; git wrapper will still be using non-builtin commands
from the prefix location instead of your fresh build.

> http://thread.gmane.org/gmane.comp.version-control.git/22764/focus=22778

  It seems I fixed this right away?

				Petr "Pasky" Baudis

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-20 12:58                 ` Petr Baudis
@ 2008-07-20 13:21                   ` Johannes Schindelin
  0 siblings, 0 replies; 63+ messages in thread
From: Johannes Schindelin @ 2008-07-20 13:21 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jakub Narebski, Lea Wiemann, git, John Hawley

Hi,

On Sun, 20 Jul 2008, Petr Baudis wrote:

> On Sun, Jul 20, 2008 at 02:33:46PM +0200, Johannes Schindelin wrote:
> > On Sun, 20 Jul 2008, Petr Baudis wrote:
> > 
> > > On Sun, Jul 20, 2008 at 12:38:07PM +0200, Johannes Schindelin wrote:
> > >
> > > > Pasky tried to convert all Perl scripts at once IIRC, and my 
> > > > numerous problems just _making_ the Git scripts led me to rewrite 
> > > > a few Perl scripts in C, so I could safely exclude the Perl 
> > > > scripts from my personal fork.
> > > 
> > > I don't remember any concrete report of such problems ever reaching 
> > > me; exactly what trouble are you hitting with the Perl scripts using 
> > > Git.pm?  I will be glad to try to fix it.
> > 
> > They reached you:
> > 
> > http://article.gmane.org/gmane.comp.version-control.git/23153
> 
> running Git in-place without correctly setting the prefix is not 
> supported anyway; git wrapper will still be using non-builtin commands 
> from the prefix location instead of your fresh build.

There were none.  Anyway, because Git makes it _really_ hard to run 
in-place, I gave in.  I run Git from $HOME/bin now.

> > http://thread.gmane.org/gmane.comp.version-control.git/22764/focus=22778
> 
> It seems I fixed this right away?

That is not what I remember.  I remember that I had the next issue right 
away.  I remember that there were problems with ActiveState Perl+Cygwin. 
And I remember that the Git.xs was not merged in the end.

Anyway, this whole discussion is moot, methinks.

Let's concentrate back on the real issue: merging the two as-of-now 
incompatible Perl modules.  It really would be nice to have one pony with 
one saddle in the end, so that the use of Lea's GSoC project is not 
limited to gitweb with enabled caching.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-19 19:07         ` Jakub Narebski
@ 2008-07-20 21:36           ` Petr Baudis
  2008-07-20 21:50             ` Jakub Narebski
  0 siblings, 1 reply; 63+ messages in thread
From: Petr Baudis @ 2008-07-20 21:36 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Lea Wiemann, git, John Hawley

On Sat, Jul 19, 2008 at 09:07:55PM +0200, Jakub Narebski wrote:
> On Fri, 18 July 2008, Petr Baudis wrote:
> > On Tue, Jul 15, 2008 at 01:41:38AM +0200, Jakub Narebski wrote:
> > > On Mon, 14 July 2008, Petr Baudis wrote:
> > > > Here is an idea: Introduce Git::Command object that will have very
> > > > general interface and look like
> > > > 
> > > > 	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file',
> > > > 		'-p', 'bla'], {pipe_out=>1});
> > > > 	...
> > > > 	$c->close();
> > > 
> > > Errr... how do you read from such a pipe?  <$c> I think wouldn't work,
> > > unless you would use some trickery...
> > 
> > That's good point; it might either be done using some trickery, or
> > $c->pipe. The idea behind having a special object for it though is to
> > have *unified* (no matter how simple) error handling. You might not
> > detect the command erroring out at the open time.
> > 
> > Is there a better approach for solving this?
> 
> I don't know if it is _better_ approach, but the _alternate_ approach
> would be to use:
> 
>  	my $c = Git::Command->new(['git', '--git-dir=.', 'cat-file', \
>  		'-p', 'bla'], {out=>my $fh, err=>undef})
> 	... 	
> 	while (my $line = <$fh>) {
> 	...
>  	$c->close();

I think this is horribly ugly; you would be *much* better off keeping the
filehandle within $c if going this way.

> And trickery would be to use blessed filehandle, or what?  Or perhaps
> extending IO::Handle (but not all like using object methods for I/O
> handles)?

Maybe blessed filehandle is the simplest way; it seems that in case we
need anything more complex later, it should be possible to replace it
with an IO::Handle subclass, but that feels like overengineering now.

> I forgot that we cannot obsolete / replace old interface.  Nevertheless
> it would be nice to be able to use for example
> 
> 	Git::Cmd->output_pipe('ls-remote', $URL, '--heads');
> 
> but also
> 
> 	output_pipe('myscript.sh', <arg1>, <arg2>);

I think exported functions should all have a git_ prefix.

> > Well, this interface is almost identical to what I delineated, except
> > that I have the extra ->cmd-> step there. But maybe, we could go with
> > your API and instead have Git::CommandFactory as a base of Git::Repo?
> > The hierarchy would be
> > 
> > 	Git::CommandFactory - provides the cmd_pipe toolkit
> > 		|
> > 	    Git::Repo       - provides repository model
> > 		|
> > 	Git::Repo::NonBare  - additional working-copy-related methods
> > 
> > I think I will post a sample implementation sometime over the weekend.
> 
> Thanks.
> 
> I think this is a very good idea.  Although... you mix somewhat here
> relationships.  Relationship between Git::CommandFactory (Git::Cmd?)
> is a bit different than relationship between Git::Repo and
> Git::Repo::NonBare.  Git::Repo::NonBare is a case of Git::Repo which
> additionally knows where its working copy (Git::WC?) is, and where
> inside working copy we are (if we are inside working copy).  Git::Repo
> uses Git::CommandFactory to route calls to git commands, and to
> provide default '--git-dir=<repo_path>' argument.

Yes, but that does not mean Git::Repo must not inherit from
Git::CmdFactory. Think of Git::CmdFactory as maybe a kind of Java-sense
interface to a degree.

> What I'd like to have is a way to easily set in _one_ place where git
> binary can be found, even if we are using different repositories, call
> git commands not related to git repository.
> 
> Should we use
> 
> 	Git::Cmd->output_pipe('ls-remote', $URL, '--heads');
> or
> 	output_pipe(GIT, 'ls-remote', $URL, '--heads');
> or
> 	output_pipe($GIT, 'ls-remote', $URL, '--heads');
> or
> 	output_pipe($Git::GIT, 'ls-remote', $URL, '--heads');
> 
> we would want to be able to set where git binary is once (and for all),
> for example via
> 
> 	Git::Cmd->set_git('/usr/local/bin/git');
> 
> or something like that.

Yes, that should work fine, with the Git::Cmd subclasses looking into
the singleton.

BTW, I don't like Git::Cmd for the factory interface, since the methods
create Git::Command objects and then the naming does not make any sense.
So I'm going to use class names Git::CmdFactory and Git::Cmd for the
first prototype (since "Command" _is_ too long), unless you have better
but still clear names.

-- 
				Petr "Pasky" Baudis
As in certain cults it is possible to kill a process if you know
its true name.  -- Ken Thompson and Dennis M. Ritchie

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API)
  2008-07-20  0:16             ` Jakub Narebski
@ 2008-07-20 21:38               ` Petr Baudis
  0 siblings, 0 replies; 63+ messages in thread
From: Petr Baudis @ 2008-07-20 21:38 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

On Sun, Jul 20, 2008 at 02:16:36AM +0200, Jakub Narebski wrote:
> By the way, git-svn can use command(...) instead of $repo->command(...)
> because it sets $ENV{'GIT_DIR'} if it is unset... but I don't see
> where Git.pm inserts 'git' to commands list...

In _execv_git_cmd(), or did I misunderstand your question?

(I think that level of indirection is probably a remnant of the XS
interface.)

				Petr "Pasky" Baudis

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 2/3] add new Git::Repo API
  2008-07-20 21:36           ` Petr Baudis
@ 2008-07-20 21:50             ` Jakub Narebski
  0 siblings, 0 replies; 63+ messages in thread
From: Jakub Narebski @ 2008-07-20 21:50 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Lea Wiemann, git, John Hawley

On Sun, 20 Jul 2008, Petr Baudis wrote:
> On Sat, Jul 19, 2008 at 09:07:55PM +0200, Jakub Narebski wrote:

> > I think this is a very good idea.  Although... you mix somewhat here
> > relationships.  Relationship between Git::CommandFactory (Git::Cmd?)
> > is a bit different than relationship between Git::Repo and
> > Git::Repo::NonBare.  Git::Repo::NonBare is a case of Git::Repo which
> > additionally knows where its working copy (Git::WC?) is, and where
> > inside working copy we are (if we are inside working copy).  Git::Repo
> > uses Git::CommandFactory to route calls to git commands, and to
> > provide default '--git-dir=<repo_path>' argument.
> 
> Yes, but that does not mean Git::Repo must not inherit from
> Git::CmdFactory. Think of Git::CmdFactory as maybe a kind of Java-sense
> interface to a degree.

I agree.  Nevertheless, one kind of inheritance is "interface" and the
other is "subclass"... but I don't think that matters at all.  Just being
nitpicky.

-- 
Jakub Narebski
Poland

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 0/3] Git::Repo API and gitweb caching
  2008-07-11  1:06 [PATCH 0/3] Git::Repo API and gitweb caching Lea Wiemann
                   ` (5 preceding siblings ...)
  2008-07-19  5:35 ` Lea Wiemann
@ 2008-08-18 19:34 ` Lea Wiemann
  2008-08-18 19:39   ` [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
                     ` (2 more replies)
  6 siblings, 3 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-08-18 19:34 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: Git Mailing List, John Hawley, Jakub Narebski, Petr Baudis

Lea Wiemann wrote:
> As follow-ups to this message, I'll be sending three patches for
> 
> 1) adding the Mechanize tests,
> 2) adding the Git::Repo API, and (the important part:)
> 3) making gitweb use the Git::Repo API, and adding caching to gitweb.

I'll be sending the next version of the patch series as a follow-up to
this message.  As the GSoC pencils-down deadline is in -0.5h, I'll try
to get the code out first and then I'll post some comments on a few
aspects of it.

I unfortunately didn't end up being able to split up the third patch
(use Perl API in Gitweb, and add caching layer), since the two changes
are too intricately linked to be properly separated (I actually tried
splitting it two times, two different ways, and it just didn't work).
The only thing that's really not caching-specific to the third patch
(i.e. where using the Perl API merely prettifies the code) is where
Gitweb now uses $repo->get_sha1.  Those few occurrences are only a very
small part of the patch, and I don't think they'd warrant ripping it apart.

Here's the list of changes since the last patch series:


Patch 1/3 (Mechanize tests):

- typo


Patch 2/3 (Perl API):

- remove get_refs, get_path, and version methods

- rename cat_file to get_object

- remove unnecessary cmd_output and get_bidi_pipe methods -- they
  shouldn't be part of this API (because they don't fit into the
  design), and keeping and underscore-prefixing them isn't worth the
  code complexity -- so the Git::Repo methods now use 'open' calls
  directly

- rename the 'directory' options of RepoRoot->new, Repo->new, and
  RepoRoot->repo to 'repo_dir' and 'root_dir' respectively

- ignore (rather than die on) unrecognized header lines in tag/commit
  objects

- do not use unnecessary hash index optimization

- add design notes

- underscore-prefix assert_opts and assert_sha1

- remove dependency of the test suite (t9710) on non-standard modules;
  Git::Repo should now run on vanilla Perl 5.8

- respect PERL_PATH in test suite

- add "use 5.008" statement (so it dies for Perl <5.8) -- it
  needs Perl 5.8 for Unicode support

- do not "or" author/committer/tagger/type with '' (it's unnecessary
  and silly)

- improved documentation, per Jakub's suggestions mostly

- author, committer, tagger, and message now all return Unicode
  strings per <48809D31.5030008@gmail.com>

- some minor cleanups, typos

- remove warning about Git::Repo being unstable; this is a silly
  excuse that in theory allows you to make incompatible changes ("we
  told you so"), but in practice never holds up


Patch 3/3 (Gitweb):

- add installation note for the Git perl package to INSTALL

- accommodate removals/renamings in Git::Repo API

- remove two TODO markers in t9503 (gitweb tests) -- works now thanks
  to Git::Repo

- some minor cleanups, typos

- do not remove the Expires header (as was the case in the previous
  patch)

- remove option to assume case sensitive file system to unclutter the
  code -- the cache will now work on any file system

- elaborate on inefficient refs/ traversal code in
  get_last_modification method
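
To illustrate two of the Patch 2/3 items above (ignoring unrecognized
header lines, and returning Unicode strings), here is a rough sketch.
It is not the code from the patch: parse_commit is a hypothetical
helper, and it assumes the raw commit object is UTF-8 encoded instead of
honoring an 'encoding' header.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Git::Commit): parse the raw text of a
# commit object into a hash reference.
sub parse_commit {
	my ($raw) = @_;
	# The header block and the message are separated by one empty line.
	my ($header, $message) = split /\n\n/, $raw, 2;
	my %commit = (
		parents => [],
		# Assumption: the object is UTF-8; decode bytes to characters.
		message => decode('UTF-8', defined $message ? $message : ''),
	);
	for my $line (split /\n/, $header) {
		my ($key, $value) = split / /, $line, 2;
		if ($key eq 'tree') {
			$commit{tree} = $value;
		} elsif ($key eq 'parent') {
			push @{$commit{parents}}, $value;
		} elsif ($key eq 'author' || $key eq 'committer') {
			$commit{$key} = decode('UTF-8', $value);
		}
		# Any other header line is skipped instead of being fatal, so
		# headers added by future Git versions do not break the parser.
	}
	return \%commit;
}

my $commit = parse_commit(
	"tree abc\nparent p1\nparent p2\n" .
	"author J\xc3\xbcrgen <j\@example.com> 1 +0000\n" .
	"x-unknown some value\n\nSubject line\n");
print scalar(@{$commit->{parents}}), " parents\n";
```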

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI
  2008-08-18 19:34 ` Lea Wiemann
@ 2008-08-18 19:39   ` Lea Wiemann
  2008-08-19  1:17     ` Junio C Hamano
  2008-08-18 19:39   ` [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot Lea Wiemann
  2008-08-18 19:39   ` [PATCH 3/3 v2] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann
  2 siblings, 1 reply; 63+ messages in thread
From: Lea Wiemann @ 2008-08-18 19:39 UTC (permalink / raw)
  To: git

This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
also uses HTML::Lint, XML::Parser, and Archive::Tar (if present, each)
to validate the HTML/XML/tgz output, and checks all links on the
tested pages if --long-tests is given.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
 Makefile                    |    1 +
 t/t9503-gitweb-Mechanize.sh |  144 ++++++++++
 t/t9503/test.pl             |  604 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 749 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503-gitweb-Mechanize.sh
 create mode 100755 t/t9503/test.pl

diff --git a/Makefile b/Makefile
index ca418fc..35779a7 100644
--- a/Makefile
+++ b/Makefile
@@ -1289,6 +1289,7 @@ GIT-CFLAGS: .FORCE-GIT-CFLAGS
 GIT-BUILD-OPTIONS: .FORCE-GIT-BUILD-OPTIONS
 	@echo SHELL_PATH=\''$(subst ','\'',$(SHELL_PATH_SQ))'\' >$@
 	@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@
+	@echo PERL_PATH=\''$(subst ','\'',$(PERL_PATH_SQ))'\' >>$@
 
 ### Detect Tck/Tk interpreter path changes
 ifndef NO_TCLTK
diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
new file mode 100755
index 0000000..53f2a8a
--- /dev/null
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -0,0 +1,144 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Jakub Narebski
+# Copyright (c) 2008 Lea Wiemann
+#
+
+# This test supports the --long-tests option.
+
+# This test only runs on Perl 5.8 and later versions, since
+# Test::WWW::Mechanize::CGI requires Perl 5.8.
+
+test_description='gitweb tests (using WWW::Mechanize)
+
+This test uses Test::WWW::Mechanize::CGI to test gitweb.'
+
+# helper functions
+
+safe_chmod () {
+	chmod "$1" "$2" &&
+	if [ "$(git config --get core.filemode)" = false ]
+	then
+		git update-index --chmod="$1" "$2"
+	fi
+}
+
+. ./test-lib.sh
+
+# check if test can be run
+"$PERL_PATH" -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
+	test_expect_success \
+		'skipping gitweb tests, perl version is too old' :
+	test_done
+	exit
+}
+
+"$PERL_PATH" -MTest::WWW::Mechanize::CGI -e '' >/dev/null 2>&1 || {
+	test_expect_success \
+		'skipping gitweb tests, Test::WWW::Mechanize::CGI not found' :
+	test_done
+	exit
+}
+
+# set up test repository
+test_expect_success 'set up test repository' '
+
+	echo "Not an empty file." > file &&
+	git add file &&
+	test_tick && git commit -a -m "Initial commit." &&
+	git branch b &&
+
+	echo "New file" > new_file &&
+	git add new_file &&
+	test_tick && git commit -a -m "File added." &&
+
+	safe_chmod +x new_file &&
+	test_tick && git commit -a -m "Mode changed." &&
+
+	git mv new_file renamed_file &&
+	test_tick && git commit -a -m "File renamed." &&
+
+	rm renamed_file &&
+	ln -s file renamed_file &&
+	test_tick && git commit -a -m "File to symlink." &&
+	git tag with-symlink &&
+
+	git rm renamed_file &&
+	rm -f renamed_file &&
+	test_tick && git commit -a -m "File removed." &&
+
+	cp file file2 &&
+	git add file2 &&
+	test_tick && git commit -a -m "File copied." &&
+
+	echo "New line" >> file2 &&
+	safe_chmod +x file2 &&
+	test_tick && git commit -a -m "Mode change and modification." &&
+
+	mkdir dir1 &&
+	echo "New file with a \"pickaxe test string\"" > dir1/file1 &&
+	git add dir1/file1 &&
+	test_tick && git commit -a -m "File added in subdirectory." &&
+	git tag -m "creating a tag object" tag-object &&
+
+	git checkout b &&
+	echo "Branch" >> b &&
+	git add b &&
+	test_tick && git commit -a -m "On branch" &&
+	git checkout master &&
+	test_tick && git pull . b
+'
+
+# set up gitweb configuration
+safe_pwd="$("$PERL_PATH" -MPOSIX=getcwd -e 'print quotemeta(getcwd)')"
+large_cache_root="../t9503/large_cache.tmp"
+test_expect_success 'create file cache directory' \
+	'mkdir -p "$large_cache_root"'
+cat >gitweb_config.perl <<EOF
+# gitweb configuration for tests
+
+our \$version = "current";
+our \$GIT = "$GIT_EXEC_PATH/git";
+our \$projectroot = "$safe_pwd";
+our \$project_maxdepth = 8;
+our \$home_link_str = "projects";
+our \$site_name = "[localhost]";
+our \$site_header = "";
+our \$site_footer = "";
+our \$home_text = "indextext.html";
+our @stylesheets = ("file:///$safe_pwd/../../gitweb/gitweb.css");
+our \$logo = "file:///$safe_pwd/../../gitweb/git-logo.png";
+our \$favicon = "file:///$safe_pwd/../../gitweb/git-favicon.png";
+our \$projects_list = "";
+our \$export_ok = "";
+our \$strict_export = "";
+our %feature;
+\$feature{'blame'}{'default'} = [1];
+
+our \$large_cache_root = "$large_cache_root";
+if (eval { require Cache::MemoryCache; 1 }) {
+	our \$cache = Cache::MemoryCache->new;
+}
+
+
+1;
+__END__
+EOF
+
+cat >.git/description <<EOF
+t9503-gitweb-Mechanize test repository
+EOF
+
+GITWEB_CONFIG="$(pwd)/gitweb_config.perl"
+export GITWEB_CONFIG
+
+# run tests
+
+test_external \
+	'test gitweb output' \
+	"$PERL_PATH" ../t9503/test.pl
+
+test_expect_success 'remove file cache directory' \
+	'rm -rf "$large_cache_root"'
+
+test_done
diff --git a/t/t9503/test.pl b/t/t9503/test.pl
new file mode 100755
index 0000000..93108e7
--- /dev/null
+++ b/t/t9503/test.pl
@@ -0,0 +1,604 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+# This test supports the --long-tests option.
+
+use warnings;
+use strict;
+
+use Cwd qw( abs_path );
+use File::Spec;
+use File::Temp;
+use Storable;
+
+use Test::More qw(no_plan);
+
+our $long_tests = $ENV{GIT_TEST_LONG}; # "our" so we can use "local $long_tests"
+
+eval { require Archive::Tar; };
+my $archive_tar_installed = !$@
+    or diag('Archive::Tar is not installed; no tests for valid snapshots');
+
+eval { require HTML::Lint; };
+my $html_lint_installed = !$@
+    or diag('HTML::Lint is not installed; no HTML validation tests');
+
+eval { require XML::Parser; };
+my $xml_parser_installed = !$@
+    or diag('XML::Parser is not installed; no tests for well-formed XML');
+
+sub rev_parse {
+	my $name = shift;
+	chomp(my $sha1 = `git rev-parse $name 2> /dev/null`);
+	$sha1 or die;
+}
+
+sub get_type {
+	my $name = shift;
+	chomp(my $type = `git cat-file -t $name 2> /dev/null`);
+	$type or die;
+}
+
+
+package OurMechanize;
+
+use base qw( Test::WWW::Mechanize::CGI );
+
+my %page_cache;
+# Cache requests.
+sub _make_request {
+	my ($self, $request) = (shift, shift);
+
+	my $response;
+	unless ($response = Storable::thaw($page_cache{$request->uri})) {
+		$response = $self->SUPER::_make_request($request, @_);
+		$page_cache{$request->uri} = Storable::freeze($response);
+	}
+	return $response;
+}
+
+# Fix whitespace problem.
+sub cgi_application {
+	my ($self, $application) = @_;
+
+	# This subroutine was copied (and modified) from
+	# WWW::Mechanize::CGI 0.3, which is licensed 'under the same
+	# terms as perl itself' and thus GPL compatible.
+	my $cgi = sub {
+		# Use exec, not the shell, to support embedded
+		# whitespace in the path to $application.
+		# http://rt.cpan.org/Ticket/Display.html?id=36654
+		my $status = system $application $application;
+		my $exit_code  = $status >> 8;
+
+		die "Failed to execute application '$application'. Reason: '$!'"
+		    if ($status == -1);
+		die "Application '$application' exited with exit code $exit_code"
+		    if ($exit_code > 0);
+	};
+
+	$self->cgi($cgi);
+}
+
+package main;
+
+
+my @revisions = split /\s/, `git-rev-list --first-parent HEAD`;
+chomp(my @heads = map { (split('/', $_))[2] } `git-for-each-ref --sort=-committerdate refs/heads`);
+chomp(my @tags = map { (split('/', $_))[2] } `git-for-each-ref --sort=-committerdate refs/tags`);
+my @tag_objects = grep { get_type($_) eq 'tag' } @tags;
+chomp(my @root_entries = `git-ls-tree --name-only HEAD`);
+my @files = grep { get_type("HEAD:$_") eq 'blob' } @root_entries or die;
+my @directories = grep { get_type("HEAD:$_") eq 'tree' } @root_entries or die;
+unless ($long_tests) {
+	# Only test one of each kind.
+	@files = $files[0];
+	@directories = $directories[0];
+	@tag_objects = $tag_objects[0];
+}
+
+my $gitweb = abs_path(File::Spec->catfile('..', '..', 'gitweb', 'gitweb.cgi'));
+
+my $mech = OurMechanize->new;
+$mech->cgi_application($gitweb);
+# On some systems(?) it's necessary to have %ENV here, otherwise the
+# CGI process won't get *any* of the current environment variables
+# (not even PATH, etc.)
+$mech->env(%ENV,
+	   GITWEB_CONFIG => $ENV{'GITWEB_CONFIG'},
+	   SCRIPT_FILENAME => $gitweb,
+	   $mech->env);
+
+# import config, predeclaring config variables
+our $site_name;
+require_ok($ENV{'GITWEB_CONFIG'})
+	or diag('Could not load gitweb config; some tests would fail');
+
+# Perform non-recursive checks on the current page, but do not check
+# the status code.
+my %verified_uris;
+sub _verify_page {
+	my ($uri, $fragment) = split '#', $mech->uri;
+	TODO: {
+		local $TODO = 'line number fragments can be broken for diffs and blames'
+		    if $fragment && $fragment =~ /^l[0-9]+$/;
+		$mech->content_like(qr/(name|id)="$fragment"/,
+				    "[auto] fragment #$fragment exists ($uri)")
+		    if $fragment;
+	}
+
+	return 1 if $verified_uris{$uri};
+	$verified_uris{$uri} = 1;
+
+	# Internal errors yield 200 but cause gitweb.cgi to exit with
+	# non-zero exit code, which Mechanize::CGI translates to 500,
+	# so we don't really need to check for "Software error" here,
+	# provided that the test cases always check the status code.
+	#$mech->content_lacks('<h1>Software error:</h1>') or return 0;
+
+	# Validate.  This is fast, so we can do it even without
+	# $long_tests.
+	$mech->html_lint_ok('[auto] validate HTML') or return 0
+	    if $html_lint_installed && $mech->is_html;
+	my $content_type = $mech->response->header('Content-Type')
+	    or die "$uri does not have a Content-Type header";
+	if ($xml_parser_installed && $content_type =~ /xml/) {
+		eval { XML::Parser->new->parse($mech->content); };
+		ok(!$@, "[auto] check for XML well-formedness ($uri)") or diag($@);
+	}
+	if ($archive_tar_installed && $uri =~ /sf=tgz/) {
+		my $snapshot_file = File::Temp->new;
+		print $snapshot_file $mech->content;
+		close $snapshot_file;
+		my $t = Archive::Tar->new;
+		$t->read($snapshot_file->filename, 1);
+		ok($t->get_files, "[auto] valid tgz snapshot ($uri)");
+	}
+	# WebService::Validator::Feed::W3C would be nice to
+	# use, but it doesn't support direct input (as opposed
+	# to URIs) long enough for our feeds.
+
+	return 1;
+}
+
+# Verify and spider the current page, the latter only if --long-tests
+# (-l) is given.  Do not check the status code of the current page.
+my %spidered_uris;  # pages whose links have been checked
+my %status_checked_uris;  # verified pages whose status is known to be 2xx
+sub check_page {
+	_verify_page or return 0;
+	if ($long_tests && !$spidered_uris{$mech->uri} ) {
+		$spidered_uris{$mech->uri} = 1;
+		my $orig_url = $mech->uri;
+		TODO: {
+			local $TODO = "blame links can be broken sometimes"
+			    if $orig_url =~ /a=blame/;
+			for my $url (map { $_->url_abs } $mech->followable_links) {
+				if (!$status_checked_uris{$url}) {
+					$status_checked_uris{$url} = 1;
+					local $long_tests = 0;  # stop recursing
+					test_page($url, "[auto] check link")
+					    or diag("broken link to $url on $orig_url");
+					$mech->back;
+				}
+			}
+		}
+	}
+	return 1;
+}
+
+my $baseurl = "http://localhost";
+my ($params, $url, $pagedesc, $status);
+
+# test_page ( <params>, <page_description>, <expected_status> )
+# Example:
+# if (test_page('?p=.git;a=summary', 'repository summary')) {
+#     $mech->...;
+#     $mech->...;
+# }
+#
+# Test that the page can be opened, call _verify_page on it, and
+# return true if there was no test failure.  Also set the global
+# variables $params, $pagedesc, and $url for use in the if block.
+# Optionally pass a third parameter $status to test the HTTP status
+# code of the page (useful for error pages).  You can also pass a full
+# URL instead of just parameters as the first parameter.
+sub test_page {
+	($params, $pagedesc, $status) = @_;
+	# missing $pagedesc is usually accidental
+	die "$params: no pagedesc given" unless defined $pagedesc;
+	if($params =~ /^$baseurl/) {
+		$url = "$params";
+	} else {
+		$url = "$baseurl$params";
+	}
+	$mech->get($url);
+	like($mech->status, $status ? qr/$status/ : qr/^[23][0-9][0-9]$/,
+	     "$pagedesc: $url" . ($status ? " -- yields $status" : ""))
+	    or return 0;
+	if ($mech->status =~ /^3/) {
+		# Don't check 3xx, they tend to look funny.
+		my $location = $mech->response->header('Location');
+		$mech->back;  # compensate for history
+		return test_page($location, "follow redirect from $url");
+	} else {
+		return check_page;
+	}
+}
+
+# follow_link ( \%parms, $pagedesc )
+# Example:
+# if (follow_link( { text => 'commit' }, 'first commit link')) {
+#     $mech->...;
+#     $mech->back;
+# }
+# Like test_page, but does not support status code testing, and
+# returns true if there was a link at all, regardless of whether it
+# was [23]xx or not.
+sub follow_link {
+	(my $parms, $pagedesc) = @_;
+	my $link = $mech->find_link(%$parms);
+	my $current_url = $mech->uri;
+	ok($link, "link exists: $pagedesc (on page $current_url)") or return 0;
+	test_page($link->url, "follow link: $pagedesc (on page $current_url)");
+	return 1;
+}
+
+# Like follow_link, except that it only checks the link and goes back
+# immediately; use this instead of ok(find_link...).
+sub test_link {
+	my ($parms, $pagedesc) = @_;
+	my $current_url = $mech->uri;
+	if($long_tests) {
+		# Check status, validate, spider.
+		return follow_link($parms, $pagedesc) && $mech->back;
+	} else {
+		# Only check presence of the link (much faster).
+		return ok($mech->find_link(%$parms),
+			  "link exists: $pagedesc (on page $current_url)");
+	}
+}
+
+sub get_summary {
+	test_page('?p=.git', 'repository summary');
+}
+
+get_summary or die 'summary page failed; aborting all tests';
+
+
+if (test_page '', 'project list (implicit)') {
+	$mech->title_like(qr!$site_name!,
+		"title contains $site_name");
+	$mech->content_contains('t9503-gitweb-Mechanize test repository',
+		'lists test repository (by description)');
+}
+
+
+# Test repository summary: implicit, implicit with pathinfo, explicit.
+for my $sumparams ('?p=.git', '/.git', '?p=.git;a=summary') {
+	if (test_page $sumparams, 'repository summary') {
+		$mech->title_like(qr!$site_name.*\.git/summary!,
+				  "title contains $site_name and \".git/summary\"");
+	}
+}
+
+
+# Search form
+
+# Search commit
+if (get_summary && $mech->submit_form_ok(
+	    { form_number => 1, fields => { 's' => 'Initial' } },
+	    'submit search form (default: commit search)')) {
+	check_page;
+	$mech->content_contains('Initial commit',
+				'content contains commit we searched for');
+}
+
+# Pickaxe
+if (get_summary && $mech->submit_form_ok(
+	    { form_number => 1, fields => { 's' => 'pickaxe test string',
+					    'st' => 'pickaxe' } },
+	    'submit search form (pickaxe)')) {
+	check_page;
+	test_link( { text => 'dir1/file1' }, 'file found with pickaxe' );
+	$mech->content_contains('A U Thor', 'commit author mentioned');
+}
+
+# Grep
+# Let's hope the pickaxe test string is still present in HEAD.
+if (get_summary && $mech->submit_form_ok(
+	    { form_number => 1, fields => { 's' => 'pickaxe test string',
+					    'st' => 'grep' } },
+	    'submit search form (grep)')) {
+	check_page;
+	test_link( { text => 'dir1/file1' }, 'file found with grep' );
+}
+
+
+# Basic error handling
+test_page('?p=non-existent.git', 'non-existent project', 404);
+test_page('?p=.git;a=commit;h=non-existent', 'non-existent commit', 404);
+
+
+# Summary view
+get_summary;
+
+# Check short log.  To do: Extract into separate test_short_log
+# function since the short log occurs on several pages.
+for my $revision (@revisions) {
+	for my $link_text (qw( commit commitdiff tree snapshot )) {
+		test_link( { url_abs_regex => qr/h=$revision/, text => $link_text },
+			   "$link_text link for $revision");
+	}
+}
+
+# Check that branches and tags are highlighted in green and yellow in
+# the shortlog.  We assume here that we are on master, so it should be
+# at the top.
+$mech->content_like(qr{<span [^>]*class="head"[^>]*>master</span>},
+		    'master branch is highlighted in shortlog');
+$mech->content_like(qr{<span [^>]*class="tag"[^>]*>$tags[0]</span>},
+		    "$tags[0] (most recent tag) is highlighted in shortlog");
+
+# Check heads.  (This should be extracted as well.)
+for my $head (@heads) {
+	for my $link_text (qw( shortlog log tree )) {
+		test_link( { url_abs_regex => qr{h=refs/heads/$head}, text => $link_text },
+			   "$link_text link for head '$head'");
+	}
+}
+
+# Check tags (assume we only have tags referring to commits, not to
+# blobs or trees).
+for my $tag (@tags) {
+	my $commit = rev_parse("$tag^{commit}");
+	test_link( { url_abs_regex => qr{h=refs/tags/$tag}, text => 'shortlog' },
+		   "shortlog link for tag '$tag'");
+	test_link( { url_abs_regex => qr{h=refs/tags/$tag}, text => 'log' },
+		   "log link for tag '$tag'");
+	test_link( { url_abs_regex => qr{h=$commit}, text => 'commit' },
+		   "commit link for tag '$tag'");
+	test_link( { url_abs_regex => qr{h=$commit}, text => $tag },
+	   "'$tag' links to the commit as well");
+	# To do: Test tag link for tag objects.
+	# Why don't we have tree + snapshot links?
+}
+
+
+# RSS/Atom/OPML view
+# Simply retrieve and verify well-formedness, but don't spider.
+$mech->get_ok('?p=.git;a=atom', 'Atom feed') and _verify_page;
+$mech->get_ok('?p=.git;a=rss', 'RSS feed') and _verify_page;
+TODO: {
+	# Now spider -- but there are broken links.
+	# http://mid.gmane.org/485EB333.5070108@gmail.com
+	local $TODO = "fix broken links in Atom/RSS feeds";
+	test_page('?p=.git;a=atom', 'Atom feed');
+	test_page('?p=.git;a=rss', 'RSS feed');
+}
+test_page('?a=opml', 'OPML outline');
+
+
+# Tag view
+get_summary;
+for my $tag (@tag_objects) {
+	my $tag_sha1 = rev_parse($tag);
+	my $object_sha1 = rev_parse("$tag^{}");
+	if(follow_link( { url_abs_regex => qr{h=$tag_sha1}, text => 'tag' },
+			"tag link for tag object '$tag'" )) {
+		$mech->content_contains('C O Mitter', 'tagger mentioned');
+		test_link( { url_abs_regex => qr/h=$object_sha1/ },
+			   'link to referenced object');
+		$mech->back;
+	}
+}
+
+
+# Commit view
+if (test_page('?p=.git;a=commit;h=master', 'view HEAD commit')) {
+	my $tree_sha1 = rev_parse('master:');
+	test_link( { url_abs_regex => qr/a=tree/, text => rev_parse('master:') },
+		   "SHA1 link to tree on commit page ($url)");
+	test_link( { url_abs_regex => qr/h=$tree_sha1/, text => 'tree' },
+		   "'tree' link to tree on commit page ($url)");
+	$mech->content_like(qr/A U Thor/, "author mentioned on commit page ($url)");
+}
+
+
+# Commitdiff view
+if (get_summary &&
+    follow_link( { text_regex => qr/file added/i }, 'commit with added file') &&
+    follow_link( { text => 'commitdiff' }, 'commitdiff')) {
+	$mech->content_like(qr/new file with mode/, "commitdiff has diffstat ($url)");
+	$mech->content_like(qr/new file mode/, "commitdiff has diff ($url)");
+}
+test_page("?p=.git;a=commitdiff;h=$revisions[-1]",
+	  'commitdiff without parent');
+
+# Diff formatting problem.
+if (get_summary &&
+    follow_link( { text_regex => qr/renamed/ }, 'commit with rename') &&
+    follow_link( { text => 'commitdiff' }, 'commitdiff')) {
+	TODO: {
+		local $TODO = "bad a/* link in diff";
+		if (follow_link( { text_regex => qr!^a/! },
+				 'a/* link (probably wrong)')) {
+			# The page we land on here is broken already.
+			follow_link( { url_abs_regex => qr/a=blob_plain/ },
+				     'linked file name');  # bang
+		}
+	}
+}
+
+
+# Raw commitdiff (commitdiff_plain) view
+if (test_page('?p=.git;a=commit;h=refs/tags/tag-object',
+	      'commit view of tags/tag-object') &&
+    follow_link( { text => 'commitdiff' }, "'commitdiff'") &&
+    follow_link( { text => 'raw' }, "'raw' (commitdiff_plain)")) {
+	$mech->content_like(qr/^From: A U Thor <author\@example.com>$/m,
+			    'commitdiff_plain: From header');
+	TODO: {
+		local $TODO = 'date header mangles timezone';
+		$mech->content_like(qr/^Date: Thu, 7 Apr 2005 15:..:13 -0700$/m,
+				    'commitdiff_plain: Date header (correct)');
+	}
+	$mech->content_like(qr/^Date: Thu, 7 Apr 2005 22:..:13 \+0000 \(-0700\)$/m,
+			    'commitdiff_plain: Date header (UTC, wrong)');
+	$mech->content_like(qr/^Subject: .+$/m,
+			    'commitdiff_plain: Subject header');
+	# '$' markers inexplicably don't work here if we use like(...)
+	# or $mech->content_like().
+	ok($mech->content =~ /^X-Git-Tag: tag-object\^0$/m,
+	   'commitdiff_plain: X-Git-Tag header');
+	ok($mech->content =~ /^X-Git-Url: $baseurl\?p=\.git;a=commitdiff_plain;h=refs%2Ftags%2Ftag-object$/m,
+	   'commitdiff_plain: X-Git-Url header');
+	ok($mech->content =~ /^---$/m, 'commitdiff_plain: separator');
+	ok($mech->content =~ /^diff --git /m, 'commitdiff_plain: diff');
+}
+
+
+# Blobdiff view
+# This assumes file2 has changed at least once, so that there can be a
+# blobdiff for it.
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link') &&
+    follow_link( { text => 'history', url_abs_regex => qr/f=file2/ },
+		 'history of file2') &&
+    follow_link( { text => 'diff to current' },
+		 'some "diff to current" (blobdiff)')) {
+	$mech->content_contains('diff --git', 'blobdiff: has diff header');
+	test_link( { text => 'raw', url_abs_regex => qr/a=blobdiff_plain/ },
+		   '"raw" (blobdiff_plain) link');
+}
+
+# Tree view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files) {
+		my $sha1 = rev_parse("HEAD:$file");
+		test_link( { text => $file, url_abs_regex => qr/h=$sha1/ },
+			   "'$file' is listed and linked");
+		test_link({ url_abs_regex => qr/f=$file/, text => $_ },
+			  "'$_' link") foreach qw( blame blob history raw );
+	}
+	for my $directory (@directories) {
+		my $sha1 = rev_parse("HEAD:$directory");
+		test_link({ url_abs_regex => qr/f=$directory/, text => $_ },
+			  "'$_' link") foreach qw( tree history );
+		if(follow_link( { text => $directory, url_abs_regex => qr/h=$sha1/ },
+				"'$directory' is listed and linked" )) {
+			if(follow_link( { text => '..' }, 'parent directory')) {
+				test_link({ url_abs_regex => qr/h=$sha1/,
+					    text => $directory },
+					  'back to original tree view');
+				$mech->back;
+			}
+			$mech->back;
+		}
+	}
+}
+
+
+# Blame view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $blame_link ($mech->find_all_links(text => 'blame')) {
+		my $url = $blame_link->url;
+		$mech->get_ok($url, "get $url -- blame link on tree view")
+		    and _verify_page;
+		$mech->content_like(qr/A U Thor/,
+				    "author mentioned on blame page");
+		TODO: {
+			# Now spider -- but there are broken links.
+			# http://mid.gmane.org/485EC621.7090101@gmail.com
+			local $TODO = "fix broken links in certain blame views";
+			check_page;
+		}
+		last unless $long_tests; # only test first blame link
+	}
+}
+
+
+# History view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files, @directories) {
+		my $type = get_type("HEAD:$file");  # blob or tree
+		if (follow_link( { text => 'history', url_abs_regex => qr/f=$file/ },
+				 "history link for '$file'")) {
+			# There is at least one commit, so A U Thor is mentioned.
+			$mech->content_contains('A U Thor', 'A U Thor mentioned');
+			# The following tests test for at least *one*
+			# link of each type and are weak since we
+			# don't have any knowledge of commit hashes.
+			test_link( { text => $type, url_abs_regex => qr/f=$file/ },
+				   "$type");
+			test_link( { text => 'commitdiff' },
+				   "commitdiff");
+			test_link( { url_abs_regex => qr/a=commit;.*h=[a-f0-9]{40}/ },
+				   "subject links to commit"); # weak, brittle
+			$mech->back;
+		}
+	}
+}
+
+
+# Blob view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files) {
+		if (follow_link( { text => $file, url_abs_regex => qr/a=blob/ },
+				 "\"$file\" (blob) entry on tree view")) {
+			chomp(my $first_line_regex = (`cat "$file"`)[0]);
+			$first_line_regex =~ s/ / |&nbsp;/g;
+			# Hope that the first line doesn't contain any
+			# HTML-escapable character.
+			$mech->content_like(qr/$first_line_regex/,
+					    "blob view contains first line of file ($url)");
+			$mech->back;
+		}
+	}
+}
+
+
+# Raw (blob_plain) view
+if (get_summary && follow_link( { text => 'tree' }, 'first tree link')) {
+	for my $file (@files) {
+		if (follow_link( { text => 'raw', url_abs_regex => qr/f=$file/ },
+				 "raw (blob_plain) entry for \"$file\" in tree view")) {
+			chomp(my $first_line = (`cat "$file"`)[0]);
+			$mech->content_contains(
+				$first_line, "blob_plain view contains first line of file");
+			$mech->back;
+		}
+	}
+}
+
+
+# Error handling
+# Pass valid and invalid paths to various file-based actions
+for my $action (qw( blame blob blob_plain )) {
+	test_page("?p=.git;a=$action;f=$files[0];hb=HEAD",
+		  "$action: look up existent file");
+	test_page("?p=.git;a=$action;f=does_not_exist;hb=HEAD",
+		  "$action: look up non-existent file", 404);
+	TODO: {
+		local $TODO = 'wrong error code (but using Git::Repo will fix this)';
+		test_page("?p=.git;a=$action;f=$directories[0];hb=HEAD",
+			  "$action: look up directory", 400);
+	}
+}
+TODO: {
+	local $TODO = 'wrong error code (but using Git::Repo will fix this)';
+	test_page("?p=.git;a=tree;f=$files[0];hb=HEAD",
+		  'tree: look up existent file', 400);
+}
+# Pass valid and invalid paths to tree action
+test_page("?p=.git;a=tree;f=does_not_exist;hb=HEAD",
+	  'tree: look up non-existent file', 404);
+test_page("?p=.git;a=tree;f=$directories[0];hb=HEAD",
+	  'tree: look up directory');
+TODO: {
+	local $TODO = 'cannot use f=/ or f= for trees';
+	test_page("?p=.git;a=tree;f=/;hb=HEAD", 'tree: look up directory');
+}
+
+
+1;
+__END__
-- 
1.6.0.90.gf3f7b

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot
  2008-08-18 19:34 ` Lea Wiemann
  2008-08-18 19:39   ` [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
@ 2008-08-18 19:39   ` Lea Wiemann
  2008-08-19  1:32     ` Junio C Hamano
  2008-08-19 13:51     ` Lea Wiemann
  2008-08-18 19:39   ` [PATCH 3/3 v2] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann
  2 siblings, 2 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-08-18 19:39 UTC (permalink / raw)
  To: git

The Git::Repo class provides low-level access to Git repositories.
The Git::Commit and Git::Tag classes represent commit and tag objects
and allow for accessing their respective properties (like 'author',
'tagger', etc.); Git::Object serves as a common base class for them.
Git::RepoRoot is a helper factory class to instantiate Git::Repo
objects with common properties for repositories under a common root
directory.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
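Not part of the patch: a minimal stand-alone sketch of the decoding
fallback described in the Git::Commit and Git::Tag documentation
(declared encoding first, then UTF-8, then Latin-1).  The function name
decode_with_fallback is hypothetical.  Unlike the patch's _decode
helper, it passes Encode::LEAVE_SRC so that the input is never modified,
and it decodes Latin-1 explicitly instead of returning the octets as-is.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, for illustration only (not the patch's _decode).
# Decode $octets into a Unicode string: try $encoding if given, then
# UTF-8, then fall back to Latin-1, which accepts any byte sequence.
sub decode_with_fallback {
	my ($octets, $encoding) = @_;
	# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
	# decode() from modifying $octets in place.
	my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;
	my $string;
	# An unknown encoding name also makes decode() die, so it simply
	# falls through to the next candidate.
	$string = eval { decode($encoding, $octets, $check) }
	    if $encoding;
	$string = eval { decode('utf-8', $octets, $check) }
	    unless defined $string;
	$string = decode('iso-8859-1', $octets)
	    unless defined $string;
	return $string;
}

# Usage: the same author name stored as UTF-8 and as Latin-1 octets.
my $from_utf8   = decode_with_fallback("Andr\xc3\xa9");
my $from_latin1 = decode_with_fallback("Andr\xe9");
```

Both calls yield the same five-character Unicode string, because the
second input is not valid UTF-8 and therefore takes the Latin-1 path.
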
 perl/Git/Commit.pm       |  179 ++++++++++++++++++++++++++++++
 perl/Git/Object.pm       |   94 ++++++++++++++++
 perl/Git/Repo.pm         |  274 ++++++++++++++++++++++++++++++++++++++++++++++
 perl/Git/RepoRoot.pm     |  103 +++++++++++++++++
 perl/Git/Tag.pm          |  169 ++++++++++++++++++++++++++++
 perl/Makefile            |    1 +
 perl/Makefile.PL         |   10 ++-
 t/t9710-perl-git-repo.sh |   49 ++++++++
 t/t9710/test.pl          |  136 +++++++++++++++++++++++
 9 files changed, 1014 insertions(+), 1 deletions(-)
 create mode 100644 perl/Git/Commit.pm
 create mode 100644 perl/Git/Object.pm
 create mode 100644 perl/Git/Repo.pm
 create mode 100644 perl/Git/RepoRoot.pm
 create mode 100644 perl/Git/Tag.pm
 create mode 100755 t/t9710-perl-git-repo.sh
 create mode 100755 t/t9710/test.pl

diff --git a/perl/Git/Commit.pm b/perl/Git/Commit.pm
new file mode 100644
index 0000000..58e502f
--- /dev/null
+++ b/perl/Git/Commit.pm
@@ -0,0 +1,179 @@
+=head1 NAME
+
+Git::Commit - Object-oriented interface to Git commit objects.
+
+=head1 DESCRIPTION
+
+Git::Commit is a class representing a commit object in a Git
+repository.  It stringifies to the commit object's SHA1.
+
+=cut
+
+use strict;
+use warnings;
+
+
+package Git::Commit;
+
+use base qw(Git::Object);
+
+
+# Keep documentation in one place to save space.
+
+=head1 METHODS
+
+=head2 General Methods
+
+=over
+
+=item $commit = Git::Commit->new($repo, $sha1)
+
+Return a new Git::Commit instance for a commit object with $sha1 in
+repository $repo.
+
+The commit object is loaded lazily.  Hence, calls to this method are
+free, and it does not check whether $sha1 exists and has the right
+type.  However, accessing any of the commit object's properties will
+fail if $sha1 is not a valid commit object.
+
+Note that $sha1 must be the SHA1 of a commit object; tag objects are
+not dereferenced.
+
+The author, committer and message methods return Unicode strings,
+decoded according to the "encoding" header, with UTF-8 and then
+Latin-1 as fallbacks.  (These Unicode strings can contain code points
+greater than 255 and are *not* UTF-8 strings; see man perlunitut on
+how Perl handles Unicode.)
+
+You will usually want to call $repo->get_commit($sha1) instead of
+instantiating this class directly; see L<Git::Repo>.
+
+=item $obj->repo
+
+Return the Git::Repo instance this object was instantiated with.
+
+=item $obj->sha1
+
+Return the SHA1 of this commit object, as provided at instantiation time.
+
+=back
+
+=head2 Property Methods
+
+Calling any of these methods will cause the commit object to be loaded
+from the repository, if it hasn't been loaded already.
+
+=over
+
+=item $commit->tree
+
+Return an object that stringifies to the SHA1 of the tree that this
+commit object refers to.  (Currently this returns an actual string,
+but don't rely on it.)
+
+=item $commit->parents
+
+Return an array of zero or more parent commit objects.  Note that
+commit objects stringify to their respective SHA1s, so you can
+alternatively treat this as an array of SHA1 strings.
+
+=item $commit->author
+
+Return the author of this commit object as a Unicode string.
+
+=item $commit->committer
+
+Return the committer of this commit object as a Unicode string.
+
+=item $commit->message
+
+Return the commit message of this commit object as a Unicode string.
+
+=item $commit->encoding
+
+Return the encoding header of the commit object, or undef if no
+encoding header is present; note that Git::Commit does the necessary
+decoding for you, so you should not normally need this method.
+
+=back
+
+=cut
+
+
+sub tree {
+	my $self = shift;
+	$self->_load;
+	return $self->{tree};
+}
+
+sub parents {
+	my $self = shift;
+	$self->_load;
+	return map { ref($self)->new($self->repo, $_) } @{$self->{parents}};
+}
+
+sub author {
+	my $self = shift;
+	$self->_load;
+	return $self->_decode($self->{author});
+}
+
+sub committer {
+	my $self = shift;
+	$self->_load;
+	return $self->_decode($self->{committer});
+}
+
+sub message {
+	my $self = shift;
+	$self->_load;
+	return $self->_decode($self->{message});
+}
+
+sub encoding {
+	my $self = shift;
+	$self->_load;
+	return $self->{encoding};
+}
+
+# Auxiliary method to load (and parse) the commit object from the
+# repository if it hasn't already been loaded.  Optional parameter:
+# The raw contents of the commit object; the commit object will be
+# retrieved from the repository if that parameter is not given.
+sub _load {
+	my ($self, $raw_text) = @_;
+	return if exists $self->{message};  # already loaded
+
+	my $sha1 = $self->sha1;
+	if (!defined $raw_text) {
+		# Retrieve from the repository.
+		(my $type, $raw_text) = $self->repo->get_object($sha1);
+		die "$sha1 is a $type object (expected a commit object)"
+		    unless $type eq 'commit';
+	}
+
+	(my $header, $self->{message}) = split "\n\n", $raw_text, 2;
+	# Parse header.
+	for my $line (split "\n", $header) {
+		local $/ = "\n"; # for chomp
+		chomp($line);
+		my ($key, $value) = split ' ', $line, 2;
+		if ($key eq 'tree') {
+			$self->{tree} = $value;
+		} elsif ($key eq 'parent') {
+			push @{$self->{parents}}, $value;
+		} elsif ($key eq 'author') {
+			$self->{author} = $value;
+		} elsif ($key eq 'committer') {
+			$self->{committer} = $value;
+		} elsif ($key eq 'encoding') {
+			$self->{encoding} = $value;
+		} else {
+			# Ignore unrecognized header lines.
+		}
+	}
+	undef;
+}
+
+
+1;
diff --git a/perl/Git/Object.pm b/perl/Git/Object.pm
new file mode 100644
index 0000000..8a158b2
--- /dev/null
+++ b/perl/Git/Object.pm
@@ -0,0 +1,94 @@
+=head1 NAME
+
+Git::Object - Object-oriented interface to Git objects (base class).
+
+=head1 DESCRIPTION
+
+Git::Object is a base class that provides access to commit, tag and
+(unimplemented) tree objects.  See L<Git::Commit> and L<Git::Tag>.
+
+Objects are loaded lazily, and hence instantiation is free.
+Git::Object instances stringify to their SHA1s.
+
+=cut
+
+
+use strict;
+use warnings;
+
+
+package Git::Object;
+
+use Encode qw(decode);
+
+use base qw(Exporter);
+
+our @EXPORT = qw();
+our @EXPORT_OK = qw();
+
+use overload
+    '""' => \&sha1;
+
+=head1 METHODS
+
+=over
+
+=item Git::Object->new($repo, $sha1)
+
+Return a new Git::Object instance for the object with $sha1 in the
+repository $repo (a Git::Repo instance).
+
+Note that this method does not check whether the object exists in the
+repository.  Trying to access its properties through a subclass
+will fail if the object doesn't exist, however.
+
+=cut
+
+sub new {
+	my ($class, $repo, $sha1) = @_;
+	die "$repo is not a Git::Repo instance" unless $repo->isa('Git::Repo');
+	my $self = {repo => $repo, sha1 => $sha1};
+	return bless $self, $class;
+}
+
+=item $obj->repo
+
+Return the Git::Repo instance this object was instantiated with.
+
+=cut
+
+sub repo {
+	shift->{repo}
+}
+
+=item $obj->sha1
+
+Return the SHA1 of this object.
+
+=cut
+
+sub sha1 {
+	shift->{sha1}
+}
+
+# Helper method: Decode the given octets into a Unicode string, trying
+# the $self->{encoding} encoding first, if defined, then trying UTF-8,
+# then falling back to Latin 1.
+
+sub _decode {
+	my ($self, $octets) = @_;
+	my $string;
+	# Try $self->{encoding}:
+	eval { $string = decode($self->{encoding}, $octets, Encode::FB_CROAK) }
+	    if $self->{encoding};
+	# ... else try UTF-8:
+	eval { $string = decode('utf-8', $octets, Encode::FB_CROAK) }
+	    unless defined $string;
+	# ... else fall back to Latin 1 (the first 256 Unicode code
+	# points coincide with Latin 1):
+	$string = $octets unless defined $string;
+	return $string;
+}
+
+
+1;
diff --git a/perl/Git/Repo.pm b/perl/Git/Repo.pm
new file mode 100644
index 0000000..2224c88
--- /dev/null
+++ b/perl/Git/Repo.pm
@@ -0,0 +1,274 @@
+=head1 NAME
+
+Git::Repo - Read-only access to Git repositories.
+
+=head1 DESCRIPTION
+
+Git::Repo aims to provide low-level access to Git repositories.  For
+instance, you can resolve object names (like 'HEAD~2') to SHA1s, and
+inspect objects.  It does not attempt to be a wrapper around the git
+plumbing or porcelain commands.
+
+Error handling is simple: On a consistent repository, the Perl
+interface will never die.  You can use the get_sha1 method to resolve
+arbitrary object names or check the existence of SHA1 hashes; get_sha1
+will return undef if the object does not exist in the repository.  Any
+SHA1 that is returned by get_sha1 can be safely passed to the other
+Git::Repo methods.
+
+=head1 SYNOPSIS
+
+  use Git::Repo;
+
+  my $repo = Git::Repo->new(
+      repo_dir => '/path/to/repository.git',
+      git_binary => '/usr/bin/git');
+  my $sha1 = $repo->get_sha1('HEAD');
+  print "Last log message:\n\n" . $repo->get_commit($sha1)->message;
+
+=cut
+
+
+use strict;
+use warnings;
+# We could be compatible with Perl 5.6, but it doesn't provide sane pipe
+# handling (sane meaning does not go through shell, and allows for
+# accessing the exit code), so we require 5.8 until someone decides to
+# implement fork/exec-based pipe handling, plus possibly workarounds
+# for Windows brokenness.
+use 5.008;
+
+
+package Git::Repo;
+
+use Git::Tag;
+use Git::Commit;
+
+use IPC::Open2 qw(open2);
+use IO::Handle;
+
+use base qw(Exporter);
+
+our @EXPORT = qw();
+our @EXPORT_OK = qw();
+
+# Auxiliary subroutines
+
+sub _assert_opts {
+	die "must have an even number of arguments for named options"
+	    unless $#_ % 2;
+}
+
+sub _assert_sha1 {
+	my $sha1 = shift;
+	die "'$sha1' is not a SHA1 (need to use get_sha1?)"
+	    unless $sha1 && $sha1 =~ /^[a-f0-9]{40}$/;
+}
+
+
+=head1 METHODS
+
+=head2 General methods
+
+=over
+
+=item $repo = Git::Repo->new(%opts)
+
+Return a new Git::Repo object.  The following options are supported:
+
+=over
+
+=item 'repo_dir'
+
+The directory of the repository (mandatory).
+
+Note that Git::Repo is working-copy agnostic; for a repository with
+a working copy, pass the working copy's .git directory as the
+'repo_dir' option.
+
+=item 'git_binary'
+
+The name or full path of the git binary (default: 'git').
+
+=back
+
+Calling this method is free, since it does not check whether the
+repository exists.  Trying to access the repository through one of the
+instance methods will fail if it doesn't exist though.
+
+Examples:
+
+    $repo = Git::Repo->new(repo_dir => '/path/to/repository.git');
+    $repo = Git::Repo->new(repo_dir => '/path/to/working_copy/.git');
+
+=cut
+
+sub new {
+	my $class = shift;
+	_assert_opts @_;
+	my $self = {@_};
+	bless $self, $class;
+	die 'no repo_dir given' unless $self->{repo_dir};
+	return $self;
+}
+
+=item $repo->repo_dir
+
+Return the directory of the repository (.../.git in case of a working
+copy).
+
+=cut
+
+sub repo_dir {
+	shift->{repo_dir}
+}
+
+=item $repo->git_binary
+
+Return the name of or path to the git binary (used with exec).
+
+=cut
+
+sub git_binary {
+	shift->{git_binary}
+}
+
+# Return the first items of the git command line, for instance
+# qw(/usr/bin/git --git-dir=/path/to/repo.git).
+sub _git_cmd {
+	my $self = shift;
+	return ($self->git_binary || 'git', '--git-dir=' . $self->repo_dir);
+}
+
+
+=back
+
+=head2 Inspecting the repository
+
+=over
+
+=item $repo->get_sha1($extended_object_identifier)
+
+Look up the object identified by $extended_object_identifier and
+return its SHA1 hash in scalar context or its ($sha1, $type, $size) in
+list context, or undef or () if the lookup failed, where $type is one
+of 'tag', 'commit', 'tree', or 'blob'.
+
+See L<git-rev-parse(1)>, section "Specifying Revisions", for the
+syntax of the $extended_object_identifier string.
+
+Note that even if you pass a SHA1 hash, its existence is still
+checked, and this method returns undef or () if it doesn't exist in
+the repository.
+
+=cut
+
+sub get_sha1 {
+	my ($self, $object_id) = @_;
+	die 'no object identifier given' unless $object_id;
+	die 'object identifier must not contain newlines' if $object_id =~ /\n/;
+	unless ($self->{sha1_stdout}) {
+		# Open bidi pipe the first time get_sha1 is called.
+		# open2 raises an exception on error, no need to 'or die'.
+		open2($self->{sha1_stdout}, $self->{sha1_stdin},
+		      $self->_git_cmd, 'cat-file', '--batch-check');
+	}
+	$self->{sha1_stdin}->print("$object_id\n")
+	    or die 'cannot write to pipe';
+	my $output = $self->{sha1_stdout}->getline
+	    or die 'cannot read from pipe';
+	chomp $output;
+	return if $output =~ /missing$/;
+	my ($sha1, $type, $size) =
+	    ($output =~ /^([0-9a-f]{40}) ([a-z]+) ([0-9]+)$/)
+	    or die "invalid response: $output";
+	return wantarray ? ($sha1, $type, $size) : $sha1;
+}
+
+=item $repo->get_object($sha1)
+
+Return the content (as a string) of the object identified by $sha1, or
+die if no such object exists in the repository.  In list context,
+return the ($type, $content) of the object.
+
+Note that you may want to use the higher-level methods get_commit and
+get_tag instead.
+
+=cut
+
+# Possible to-do items: Add optional $file_handle parameter.  Guard
+# against getting huge blobs back when we don't expect it (for
+# instance, we could limit the size and send SIGPIPE to git if we get
+# a blob that is too large).
+
+sub get_object {
+	my ($self, $sha1) = @_;
+	_assert_sha1($sha1);
+
+	unless ($self->{object_stdout}) {
+		# Open bidi pipe the first time get_object is called.
+		# open2 raises an exception on error, no need to 'or die'.
+		open2($self->{object_stdout}, $self->{object_stdin},
+		      $self->_git_cmd, 'cat-file', '--batch');
+	}
+	$self->{object_stdin}->print("$sha1\n") or die 'cannot write to pipe';
+	my ($ret_sha1, $type, $size) =
+	    split ' ', $self->{object_stdout}->getline
+	    or die 'cannot read from pipe';
+	die "'$sha1' not found in repository" if $type eq 'missing';
+	$self->{object_stdout}->read(my $content, $size);
+	$self->{object_stdout}->getline;  # eat trailing newline
+	return wantarray ? ($type, $content) : $content;
+}
+
+=item $repo->get_commit($commit_sha1)
+
+Return a new L<Git::Commit> instance referring to the commit object
+with SHA1 $commit_sha1.
+
+=cut
+
+sub get_commit {
+	my ($self, $sha1) = @_;
+	_assert_sha1($sha1);
+	return Git::Commit->new($self, $sha1);
+}
+
+=item $repo->get_tag($tag_sha1)
+
+Return a new L<Git::Tag> instance referring to the tag object with SHA1
+$tag_sha1.
+
+=cut
+
+sub get_tag {
+	my ($self, $sha1) = @_;
+	_assert_sha1($sha1);
+	return Git::Tag->new($self, $sha1);
+}
+
+=item $repo->name_rev($committish_sha1, $tags_only = 0)
+
+Return a symbolic name for the commit identified by $committish_sha1,
+or undef if no name can be found; see L<git-name-rev(1)> for details.
+If $tags_only is true, no branch names are used to name the commit.
+
+=cut
+
+sub name_rev {
+	my ($self, $sha1, $tags_only) = @_;
+	_assert_sha1($sha1);
+
+	# Note that we cannot use a bidi pipe here since git
+	# name-rev --stdin has an excessively high start-up time.
+	# http://thread.gmane.org/gmane.comp.version-control.git/85531
+	open my $fh, '-|', $self->_git_cmd, 'name-rev',
+		$tags_only ? '--tags' : (), '--name-only', $sha1
+	    or die 'error calling git binary';
+	chomp(my $name = <$fh>);
+	close $fh or die 'git name-rev returned an unexpected error';
+	return $name eq 'undefined' ? undef : $name;
+}
+
+
+1;
diff --git a/perl/Git/RepoRoot.pm b/perl/Git/RepoRoot.pm
new file mode 100644
index 0000000..6c8e749
--- /dev/null
+++ b/perl/Git/RepoRoot.pm
@@ -0,0 +1,103 @@
+=head1 NAME
+
+Git::RepoRoot - A factory class representing a root directory
+containing Git repositories.
+
+=head1 DESCRIPTION
+
+Git::RepoRoot is a factory class to create L<Git::Repo> instances that
+are located under a common root directory.  It also allows for
+specifying options that all Git::Repo instances will be created with.
+
+Using Git::RepoRoot to create Git::Repo instances is entirely
+optional, but can be more convenient than instantiating them directly.
+
+=cut
+
+
+use strict;
+use warnings;
+
+
+package Git::RepoRoot;
+
+use File::Spec;
+
+use Git::Repo;
+
+use base qw(Exporter);
+
+our @EXPORT = qw();
+our @EXPORT_OK = qw();
+
+=head1 METHODS
+
+=over
+
+=item $repo_root = Git::RepoRoot->new(%opts)
+
+Return a new Git::RepoRoot object.  The following options are
+supported:
+
+=over
+
+=item 'root_dir'
+
+The directory holding all repositories.
+
+=back
+
+All other options will be passed through to Git::Repo->new.
+
+Example:
+
+    $repo_root = Git::RepoRoot->new(root_dir => '/pub/git',
+                                    git_binary => '/usr/bin/git');
+
+=cut
+
+sub new {
+	my $class = shift;
+	Git::Repo::_assert_opts(@_);
+	my $self = {@_};
+	bless $self, $class;
+	die 'no root_dir given' unless $self->{root_dir};
+	return $self;
+}
+
+=item $repo_root->repo(%opts)
+
+Return a new L<Git::Repo> object.  The following options are
+supported:
+
+=over
+
+=item 'repo_dir'
+
+The path of the repository relative to the repository root.
+
+=item 'repo_class'
+
+The Repo class to instantiate (default: 'Git::Repo').
+
+=back
+
+All other options are passed through to Git::Repo.
+
+=cut
+
+sub repo {
+	my $self = shift;
+	Git::Repo::_assert_opts(@_);
+	my %opts = (%$self, @_);
+	die 'no repo_dir given' unless $opts{repo_dir};
+	# not technically necessary, but to guard against errors in the caller:
+	die "you passed an absolute path ($opts{repo_dir})"
+	    if $opts{repo_dir} =~ m!^/!;
+	my $repo_class = delete $opts{repo_class} || 'Git::Repo';
+	$opts{repo_dir} = File::Spec->catfile($self->{root_dir}, $opts{repo_dir});
+	return $repo_class->new(%opts);
+}
+
+
+1;
diff --git a/perl/Git/Tag.pm b/perl/Git/Tag.pm
new file mode 100644
index 0000000..07559a4
--- /dev/null
+++ b/perl/Git/Tag.pm
@@ -0,0 +1,169 @@
+=head1 NAME
+
+Git::Tag - Object-oriented interface to Git tag objects.
+
+=head1 DESCRIPTION
+
+Git::Tag is a class representing a tag object in a Git repository.  It
+stringifies to the tag object's SHA1.
+
+=cut
+
+use strict;
+use warnings;
+
+
+package Git::Tag;
+
+use base qw(Git::Object);
+
+
+# Keep documentation in one place to save space.
+
+=head1 METHODS
+
+=head2 General Methods
+
+=over
+
+=item $tag = Git::Tag->new($repo, $sha1)
+
+Return a new Git::Tag instance for a tag object with $sha1 in
+repository $repo.
+
+The tag object is loaded lazily.  Hence, calls to this method are
+free, and it does not check whether $sha1 exists and has the right
+type.  However, accessing any of the tag object's properties will fail
+if $sha1 is not a valid tag object.
+
+The tagger and message methods return Unicode strings, decoded
+according to the "encoding" header, with UTF-8 and then Latin-1 as
+fallbacks.  (These Unicode strings can contain code points greater
+than 255 and are *not* UTF-8 strings; see man perlunitut on how Perl
+handles Unicode.)
+
+You will usually want to call $repo->get_tag($sha1) instead of
+instantiating this class directly; see L<Git::Repo>.
+
+=item $obj->repo
+
+Return the Git::Repo instance this object was instantiated with.
+
+=item $obj->sha1
+
+Return the SHA1 of this tag object, as provided at instantiation time.
+
+=back
+
+=head2 Property Methods
+
+Calling any of these methods will cause the tag object to be loaded
+from the repository, if it hasn't been loaded already.
+
+=over
+
+=item $tag->object
+
+Return the SHA1 string of the object referenced by this tag.
+
+=item $tag->type
+
+Return the type of the referenced object, as claimed by the tag
+object.  This is usually 'commit', but can be any of 'tag', 'commit',
+'tree', or 'blob'.
+
+=item $tag->tagger
+
+Return the tagger of this tag object as a Unicode string.
+
+=item $tag->message
+
+Return the tag message of this tag object as a Unicode string.
+
+=item $tag->encoding
+
+Return the encoding header of the tag object, or undef if no encoding
+header is present; note that Git::Tag does the necessary decoding for
+you, so you should not normally need this method.
+
+=back
+
+=cut
+
+
+sub object {
+	my $self = shift;
+	$self->_load;
+	return $self->{object};
+}
+
+sub type {
+	my $self = shift;
+	$self->_load;
+	return $self->{type};
+}
+
+sub tag {
+	my $self = shift;
+	$self->_load;
+	return $self->_decode($self->{tag});
+}
+
+sub tagger {
+	my $self = shift;
+	$self->_load;
+	return $self->_decode($self->{tagger});
+}
+
+sub message {
+	my $self = shift;
+	$self->_load;
+	return $self->_decode($self->{message});
+}
+
+sub encoding {
+	my $self = shift;
+	$self->_load;
+	return $self->{encoding};
+}
+
+# Auxiliary method to load (and parse) the tag object from the
+# repository if it hasn't already been loaded.  Optional parameter:
+# The raw contents of the tag object; the tag object will be retrieved
+# from the repository if that parameter is not given.
+sub _load {
+	my ($self, $raw_text) = @_;
+	return if exists $self->{message};  # already loaded
+
+	my $sha1 = $self->sha1;
+	if (!defined $raw_text) {
+		(my $type, $raw_text) = $self->repo->get_object($sha1);
+		die "$sha1 is a $type object (expected a tag object)"
+		    unless $type eq 'tag';
+	}
+
+	(my $header, $self->{message}) = split "\n\n", $raw_text, 2;
+	# Parse header.
+	for my $line (split "\n", $header) {
+		local $/ = "\n"; # for chomp
+		chomp($line);
+		my ($key, $value) = split ' ', $line, 2;
+		if ($key eq 'object') {
+			$self->{object} = $value;
+		} elsif ($key eq 'type') {
+			$self->{type} = $value;
+		} elsif ($key eq 'tag') {
+			$self->{tag} = $value;
+		} elsif ($key eq 'tagger') {
+			$self->{tagger} = $value;
+		} elsif ($key eq 'encoding') {
+			$self->{encoding} = $value;
+		} else {
+			# Ignore unrecognized header lines.
+		}
+	}
+	undef;
+}
+
+
+1;
diff --git a/perl/Makefile b/perl/Makefile
index e3dd1a5..4572dc5 100644
--- a/perl/Makefile
+++ b/perl/Makefile
@@ -31,6 +31,7 @@ $(makfile): ../GIT-CFLAGS Makefile
 	echo install: >> $@
 	echo '	mkdir -p "$(instdir_SQ)"' >> $@
 	echo '	$(RM) "$(instdir_SQ)/Git.pm"; cp Git.pm "$(instdir_SQ)"' >> $@
+	echo '	mkdir -p "$(instdir_SQ)/Git"; $(RM) "$(instdir_SQ)"/Git/*.pm; cp Git/*.pm "$(instdir_SQ)/Git"' >> $@
 	echo '	$(RM) "$(instdir_SQ)/Error.pm"' >> $@
 	'$(PERL_PATH_SQ)' -MError -e 'exit($$Error::VERSION < 0.15009)' || \
 	echo '	cp private-Error.pm "$(instdir_SQ)/Error.pm"' >> $@
diff --git a/perl/Makefile.PL b/perl/Makefile.PL
index 320253e..6c62160 100644
--- a/perl/Makefile.PL
+++ b/perl/Makefile.PL
@@ -8,7 +8,15 @@ instlibdir:
 MAKE_FRAG
 }
 
-my %pm = ('Git.pm' => '$(INST_LIBDIR)/Git.pm');
+# Note that when changing %pm, you'll have to remove pm_to_blib rather
+# than the blib directory to force the .pm files to be re-installed
+# to blib/lib.
+my %pm = ('Git.pm' => '$(INST_LIBDIR)/Git.pm',
+	  'Git/Repo.pm' => '$(INST_LIBDIR)/Git/Repo.pm',
+	  'Git/RepoRoot.pm' => '$(INST_LIBDIR)/Git/RepoRoot.pm',
+	  'Git/Commit.pm' => '$(INST_LIBDIR)/Git/Commit.pm',
+	  'Git/Tag.pm' => '$(INST_LIBDIR)/Git/Tag.pm',
+	  'Git/Object.pm' => '$(INST_LIBDIR)/Git/Object.pm');
 
 # We come with our own bundled Error.pm. It's not in the set of default
 # Perl modules so install it if it's not available on the system yet.
diff --git a/t/t9710-perl-git-repo.sh b/t/t9710-perl-git-repo.sh
new file mode 100755
index 0000000..eca283d
--- /dev/null
+++ b/t/t9710-perl-git-repo.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+#
+# Copyright (c) 2008 Lea Wiemann
+#
+
+test_description='perl interface (Git/*.pm)'
+. ./test-lib.sh
+
+"$PERL_PATH" -e 'use 5.008' 2>/dev/null || {
+	say_color skip "Perl version older than 5.8, skipping test"
+	test_done
+}
+
+# Set up test repository.  Tagging/branching is a little tricky
+# because it needs to stay unambiguous for the name_rev tests.
+
+test_expect_success \
+    'set up test repository' \
+    'echo "test file 1" > file1 &&
+     echo "test file 2" > file2 &&
+     mkdir directory1 &&
+     echo "in directory1" >> directory1/file &&
+     mkdir directory2 &&
+     echo "in directory2" >> directory2/file &&
+     git add . &&
+     git commit -m "first commit" &&
+
+     git tag -a -m "tag message 1" tag-object-1 &&
+
+     echo "changed file 1" > file1 &&
+     git commit -a -m "second commit" &&
+
+     git branch branch-2 &&
+
+     echo "changed file 2" > file2 &&
+     git commit -a -m "third commit" &&
+
+     git tag -a -m "tag message 3" tag-object-3 &&
+     git tag -a -m "indirect tag message 3" indirect-tag-3 tag-object-3 &&
+
+     echo "changed file 1 again" > file1 &&
+     git commit -a -m "fourth commit"
+     '
+
+test_external_without_stderr \
+    'Git::Repo API' \
+    "$PERL_PATH" ../t9710/test.pl
+
+test_done
diff --git a/t/t9710/test.pl b/t/t9710/test.pl
new file mode 100755
index 0000000..8fca725
--- /dev/null
+++ b/t/t9710/test.pl
@@ -0,0 +1,136 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More qw(no_plan);
+use Cwd;
+use File::Basename;
+use File::Temp;
+
+BEGIN { use_ok('Git::Repo') }
+
+sub dies_ok (&;$) {
+	my ($coderef, $descr) = @_;
+	eval { $coderef->(); };
+	ok($@, $descr);
+}
+
+sub lives_ok (&;$) {
+	my ($coderef, $descr) = @_;
+	eval { $coderef->(); };
+	ok(!$@, $descr);
+}
+
+our $old_stderr;
+sub discard_stderr {
+	open our $old_stderr, ">&", STDERR or die "cannot save STDERR";
+	close STDERR;
+}
+sub restore_stderr {
+	open STDERR, ">&", $old_stderr or die "cannot restore STDERR";
+}
+
+# set up
+our $abs_wc_dir = Cwd->cwd;
+ok(our $r = Git::Repo->new(repo_dir => "./.git"), 'open repository');
+sub rev_parse {
+	my $name = shift;
+	chomp(my $sha1 = `git rev-parse $name 2> /dev/null`);
+	$sha1 or undef;
+}
+
+my @revisions = split /\s/, `git rev-list --first-parent HEAD`;
+my $head = $revisions[0];
+
+# get_sha1
+is($r->get_sha1('HEAD'), $head, 'get_sha1: scalar');
+is($r->get_sha1('HEAD'), $head, 'get_sha1: scalar, repeated');
+my($sha1, $type, $head_size) = $r->get_sha1('HEAD');
+is($sha1, $head, 'get_sha1: array (SHA1)');
+is($type, 'commit', 'get_sha1: array (commit)');
+ok($head_size > 0, 'get_sha1: array (size)');
+
+# get_object
+is_deeply([$r->get_object($r->get_sha1("$revisions[-1]:file1"))], ['blob', "test file 1\n"], 'get_object: blob');
+is_deeply([$r->get_object($r->get_sha1("$revisions[-1]:file1"))], ['blob', "test file 1\n"], 'get_object: blob, repeated');
+dies_ok { $r->get_object('0' x 40) } 'get_object: non-existent sha1';
+
+# get_commit
+isa_ok($r->get_commit($revisions[-1]), 'Git::Commit',
+       'get_commit: returns Git::Commit object');
+
+# get_tag
+isa_ok($r->get_tag($r->get_sha1('tag-object-1')), 'Git::Tag',
+       'get_tag: returns Git::Tag object');
+
+# name_rev
+is($r->name_rev($revisions[-2]), 'branch-2', 'name_rev: branch');
+is($r->name_rev($head, 1), undef, 'name_rev: branch, tags only');
+is($r->name_rev($revisions[-1]), 'tags/tag-object-1^0', 'name_rev: tag object');
+is($r->name_rev($revisions[-1], 1), 'tag-object-1^0', 'name_rev: tag object, tags only');
+
+
+
+# Git::Commit
+print "# Git::Commit:\n";
+
+BEGIN { use_ok('Git::Commit') }
+
+my $invalid_commit = Git::Commit->new($r, '0' x 40);
+is($invalid_commit->sha1, '0' x 40, 'new, sha1: accept invalid SHA1');
+dies_ok { $invalid_commit->tree } 'die on accessing properties of invalid SHA1s';
+
+$invalid_commit = Git::Commit->new($r, $r->get_sha1('HEAD:')); # tree, not commit
+dies_ok { $invalid_commit->tree } 'die on accessing properties of non-commit objects';
+
+my $c = Git::Commit->new($r, $revisions[-2]);
+is($c->repo, $r, 'repo: basic');
+is($c->sha1, $revisions[-2], 'sha1: basic');
+is($c->{parents}, undef, 'lazy loading: not loaded after reading SHA1');
+is($c->tree, $r->get_sha1("$revisions[-2]:"), 'tree: basic');
+ok($c->{parents}, 'lazy loading: loaded after reading tree');
+is_deeply([$c->parents], [$revisions[-1]], 'parents: basic');
+like($c->author, qr/A U Thor <author\@example.com> [0-9]+ \+0000/, 'author: basic');
+like($c->committer, qr/C O Mitter <committer\@example.com> [0-9]+ \+0000/, 'committer: basic');
+is($c->encoding, undef, 'encoding: undef');
+is($c->message, "second commit\n", 'message: basic');
+is($c, $c->sha1, 'stringify: basic');
+
+# error handling
+dies_ok { Git::Commit->new($r, $r->get_sha1('tag-object-3'))->_load }
+    'new: pass tag SHA1 (dies)';
+dies_ok { Git::Commit->new($r, '0' x 40)->_load }
+    'new: pass invalid SHA1 (dies)';
+
+
+# Git::Tag
+print "# Git::Tag:\n";
+
+BEGIN { use_ok('Git::Tag') }
+
+# We don't test functionality inherited from Git::Object that we
+# already tested in the Git::Commit tests.
+
+my $t = Git::Tag->new($r, $r->get_sha1('tag-object-1'));
+is($t->tag, 'tag-object-1', 'tag: basic');
+is($t->object, $revisions[-1], 'object: basic');
+is($t->type, 'commit', 'tag: type');
+like($t->tagger, qr/C O Mitter <committer\@example.com> [0-9]+ \+0000/, 'tagger: basic');
+is($t->encoding, undef, 'encoding: undef');
+is($t->message, "tag message 1\n", 'message: basic');
+
+# error handling
+dies_ok { Git::Tag->new($r, $head)->_load } 'new: pass commit SHA1 (dies)';
+dies_ok { Git::Tag->new($r, '0' x 40)->_load } 'new: pass invalid SHA1 (dies)';
+
+
+# Git::RepoRoot
+print "# Git::RepoRoot:\n";
+
+BEGIN { use_ok('Git::RepoRoot'); }
+
+my $reporoot = Git::RepoRoot->new(root_dir => $abs_wc_dir);
+is($reporoot->repo(repo_dir => '.git')->get_sha1('HEAD'), $head,
+   'repo: basic');
-- 
1.6.0.90.gf3f7b

* [PATCH 3/3 v2] gitweb: use new Git::Repo API, and add optional caching
  2008-08-18 19:34 ` Lea Wiemann
  2008-08-18 19:39   ` [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
  2008-08-18 19:39   ` [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot Lea Wiemann
@ 2008-08-18 19:39   ` Lea Wiemann
  2 siblings, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-08-18 19:39 UTC (permalink / raw)
  To: git

Gitweb now uses the Git::Repo API; this change is behavior-preserving,
except for slightly more aggressive error handling; see below.

This patch also adds an optional caching layer for caching repository
data using any Cache::Cache-compatible caching engine (typically
Cache::Memcached).  Larger cacheable items, like blobs, snapshots or
diffs, can be cached on disk.

Other minor changes:

- Gitweb would previously accept invalid input and either (a) display
  nothing, (b) display an obscure error message, or (c) proceed as
  normal since the parameter happens to be unused in the particular
  code path used.  This has changed in that gitweb will check for
  parameter correctness more aggressively, and display meaningful
  error messages.  This change is only relevant if you manually edit
  gitweb's CGI parameters, since gitweb only generates valid links.

- Empty projects:

  - Only display summary link for empty projects in project list to
    avoid broken links (yielding 404).

  - Slim down summary page for empty projects to avoid some broken
    links and unnecessary vertical space.

  - Sort empty projects at the bottom of the project list when sorting
    by last change.

  - Add test for empty projects to t9503 (the Mechanize test), now
    that there are no broken links anymore.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
---
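Not part of the patch: a minimal sketch of how an optional caching
layer of this kind can wrap a lookup.  It assumes only the get/set
subset of the Cache::Cache interface.  Sketch::HashCache and cached()
are hypothetical names used for illustration, and the key layout is an
assumption rather than what gitweb actually uses.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in exposing the get/set subset of the
# Cache::Cache interface, so the sketch runs without extra modules.
package Sketch::HashCache;

sub new {
	my $class = shift;
	return bless({ store => {} }, $class);
}

sub get {
	my ($self, $key) = @_;
	return $self->{store}{$key};
}

sub set {
	my ($self, $key, $value) = @_;
	$self->{store}{$key} = $value;
	return;
}

package main;

# Return the cached value for $key, computing and storing it on a miss.
# Passing undef as $cache disables caching, which keeps the layer optional.
# Note that undef results are never cached in this sketch.
sub cached {
	my ($cache, $key, $compute) = @_;
	return $compute->() unless $cache;
	my $value = $cache->get($key);
	return $value if defined $value;
	$value = $compute->();
	$cache->set($key, $value);
	return $value;
}

# Usage: the expensive lookup runs once; the second call hits the cache.
my $cache  = Sketch::HashCache->new;
my $calls  = 0;
my $lookup = sub { $calls++; return "first commit\n" };
my $key    = 'commit-message:' . ('0' x 40);  # assumed key layout
my $first  = cached($cache, $key, $lookup);
my $second = cached($cache, $key, $lookup);
```

Keys derived from a SHA1 are a natural fit for such a cache, since the
object a SHA1 names never changes; anything keyed by a ref name would
need some form of expiry instead.
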
 Makefile                               |   25 +-
 gitweb/INSTALL                         |   11 +-
 gitweb/README                          |   11 +
 gitweb/gitweb.css                      |   16 +
 gitweb/gitweb.perl                     | 1371 +++++++++++++++++++------------
 t/t9500-gitweb-standalone-no-errors.sh |    4 +-
 t/t9503-gitweb-Mechanize.sh            |   10 +
 t/t9503/test.pl                        |   19 +-
 8 files changed, 911 insertions(+), 556 deletions(-)

diff --git a/Makefile b/Makefile
index 35779a7..cab42d8 100644
--- a/Makefile
+++ b/Makefile
@@ -1128,25 +1128,28 @@ $(patsubst %.perl,%,$(SCRIPT_PERL)): perl/perl.mak
 perl/perl.mak: GIT-CFLAGS perl/Makefile perl/Makefile.PL
 	$(QUIET_SUBDIR0)perl $(QUIET_SUBDIR1) PERL_PATH='$(PERL_PATH_SQ)' prefix='$(prefix_SQ)' $(@F)
 
+PERL_USE_LIB_REWRITE = \
+	-e '1{' \
+	-e '	s|\#!.*perl|\#!$(PERL_PATH_SQ)|' \
+	-e '	h' \
+	-e '	s=.*=use lib (split(/:/, \$$ENV{GITPERLLIB} || \"@@INSTLIBDIR@@\"));=' \
+	-e '	H' \
+	-e '	x' \
+	-e '}' \
+	-e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
+	-e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g'
+
 $(patsubst %.perl,%,$(SCRIPT_PERL)): % : %.perl
 	$(QUIET_GEN)$(RM) $@ $@+ && \
 	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C perl -s --no-print-directory instlibdir` && \
-	sed -e '1{' \
-	    -e '	s|#!.*perl|#!$(PERL_PATH_SQ)|' \
-	    -e '	h' \
-	    -e '	s=.*=use lib (split(/:/, $$ENV{GITPERLLIB} || "@@INSTLIBDIR@@"));=' \
-	    -e '	H' \
-	    -e '	x' \
-	    -e '}' \
-	    -e 's|@@INSTLIBDIR@@|'"$$INSTLIBDIR"'|g' \
-	    -e 's/@@GIT_VERSION@@/$(GIT_VERSION)/g' \
-	    $@.perl >$@+ && \
+	sed $(PERL_USE_LIB_REWRITE) $@.perl >$@+ && \
 	chmod +x $@+ && \
 	mv $@+ $@
 
 gitweb/gitweb.cgi: gitweb/gitweb.perl
 	$(QUIET_GEN)$(RM) $@ $@+ && \
-	sed -e '1s|#!.*perl|#!$(PERL_PATH_SQ)|' \
+	INSTLIBDIR=`MAKEFLAGS= $(MAKE) -C perl -s --no-print-directory instlibdir` && \
+	sed $(PERL_USE_LIB_REWRITE) \
 	    -e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
 	    -e 's|++GIT_BINDIR++|$(bindir)|g' \
 	    -e 's|++GITWEB_CONFIG++|$(GITWEB_CONFIG)|g' \
diff --git a/gitweb/INSTALL b/gitweb/INSTALL
index 26967e2..c4ab593 100644
--- a/gitweb/INSTALL
+++ b/gitweb/INSTALL
@@ -6,8 +6,15 @@ First you have to generate gitweb.cgi from gitweb.perl using
 gitweb.css, git-logo.png and git-favicon.png) to their destination.
 For example if git was (or is) installed with /usr prefix, you can do
 
-	$ make prefix=/usr gitweb/gitweb.cgi  ;# as yourself
-	# cp gitweb/git* /var/www/cgi-bin/    ;# as root
+	$ make prefix=/usr gitweb/gitweb.cgi           ;# as yourself
+	# cp gitweb/git* /var/www/cgi-bin/             ;# as root
+
+If the Git Perl package isn't already installed system-wide, you also
+need to copy the perl/blib/lib/Git directory to your cgi-bin
+directory:
+
+	$ make prefix=/usr -C perl                     ;# as yourself
+	# cp -r perl/blib/lib/Git /var/www/cgi-bin/    ;# as root
 
 Alternatively you can use autoconf generated ./configure script to
 set up path to git binaries (via config.mak.autogen), so you can write
diff --git a/gitweb/README b/gitweb/README
index 825162a..81ec2ae 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -214,6 +214,17 @@ not include variables usually directly set during build):
    Rename detection options for git-diff and git-diff-tree. By default
    ('-M'); set it to ('-C') or ('-C', '-C') to also detect copies, or
    set it to () if you don't want to have renames detection.
+ * $cache
+   To activate caching, set this to an instance of a Cache::Cache
+   compatible cache (e.g. Cache::Memcached).
+ * $large_cache_root
+   A directory in which gitweb will store potentially large cache
+   items (like snapshots or blobs).  Cache items in this directory are
+   not deleted, so you should clean it up yourself periodically.
+   Default: undef (no caching for large items).
+ * $page_info
+   Set to 1 to display cache statistics at the bottom of each page;
+   can be used to check whether your cache works.
 
 
 Projects list file format
diff --git a/gitweb/gitweb.css b/gitweb/gitweb.css
index aa0eeca..849bb8e 100644
--- a/gitweb/gitweb.css
+++ b/gitweb/gitweb.css
@@ -71,6 +71,22 @@ div.page_footer_text {
 	font-style: italic;
 }
 
+div.page_info {
+	color: #777777;
+	padding: 4px 8px;
+	margin-top: 0.5em;
+	background-color: #ecebe8;
+}
+
+div.page_info p {
+	margin: 0px;
+	padding: 0px;
+}
+
+pre.cache_list {
+	margin: 0.5em 2em;
+}
+
 div.page_body {
 	padding: 8px;
 	font-family: monospace;
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 90cd99b..c338fde 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -16,6 +16,9 @@ use Encode;
 use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
+
+use Git::RepoRoot;
+
 binmode STDOUT, ':utf8';
 
 BEGIN {
@@ -161,6 +164,47 @@ our %known_snapshot_format_aliases = (
 	'x-zip' => undef, '' => undef,
 );
 
+# Cache::Cache object to cache data from the repository, or undef for
+# no cache.  You would typically use a Cache::Memcached instance here.
+our $cache = undef;
+
+# Expiration time in seconds for transient cache entries, or undef for
+# no expiration.  (Only used if $cache is defined.)
+#
+# Transient cache entries (like get_sha1('HEAD')) are automatically
+# invalidated when an mtime of either the repository's root directory
+# or of the refs directory or any subdirectory changes.  This
+# mechanism *should* detect changes to the repository reliably if you
+# only use git or rsync to write to it, and hence this expiration time
+# can be set very high.  (This does not default to non-expiring
+# [undef] just in case a change goes undetected for some reason.)  You
+# might want to set this to a lower time (e.g. a few minutes) if
+# developers change files in the refs directories on your server by
+# non-standard means (i.e. manually).
+#
+# You can usually go with the default here.
+our $cache_transient_expiration_time = 60 * 60 * 24;
+
+# Directory on disk to hold potentially large cache items (in
+# particular, snapshots, diffs and blobs), or undef for no cache.
+# Cache files will be created in this directory, but they will not be
+# expired; you should periodically delete old files yourself.  Setting
+# $large_cache_root but not $cache is possible, but usually not
+# sensible.
+our $large_cache_root = undef;
+
+# Extra cache key component to use.  This should stringify to a string
+# without null bytes; it is used as a means to discard all cache
+# entries whenever this key changes.  Since gitweb already uses its
+# own version number as a cache key component, you don't normally need
+# this unless you're doing development work on gitweb (in which case
+# you might set it to gitweb's mtime, (stat 'gitweb.cgi')[9]).
+our $cache_key = undef;
+
+# Display information in the footer of each page (currently only cache
+# statistics): 0 = none, 1 = short, 2 = long.
+our $page_info = 0;
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -379,6 +423,363 @@ if (-e $GITWEB_CONFIG) {
 # version of the core git binary
 our $git_version = qx("$GIT" --version) =~ m/git version (.*)$/ ? $1 : "unknown";
 
+
+# ======================================================================
+# caching layer
+
+package CachedRepo;
+
+use Digest::MD5 qw(md5_hex);
+use List::Util qw(max);
+
+use base qw(Git::Repo);
+
+# Global statistics, collected across repositories.
+# Hits, misses, sets, and failed_sets are counters, and get_list is an
+# arrayref of keys, where a key is an arrayref of key items.
+our %cache_statistics = (
+	hits => 0, misses => 0, sets => 0, failed_sets => 0, get_list => []);
+our %large_cache_statistics = (
+	hits => 0, misses => 0, sets => 0, failed_sets => 0, get_list => []);
+
+# Options: like Git::Repo->new, and the following:
+# cache: a Cache::Cache conforming cache instance
+# transient_expiration_time: expiration time in seconds for transient
+#     cache entries (like get_sha1('HEAD')), or undef; do not set to
+#     30 days or more, since it makes Cache::Memcached hiccup
+sub new {
+	my ($class, %opts) = @_;
+	my $cache = delete $opts{cache};
+	my $large_cache_root = delete $opts{large_cache_root};
+	my $transient_expiration_time = delete $opts{transient_expiration_time};
+	my $self = $class->SUPER::new(%opts);
+	$self->{cache} = $cache;
+	$self->{large_cache_root} = $large_cache_root;
+	$self->{transient_expiration_time} = $transient_expiration_time;
+	return $self;
+}
+
+sub cache { shift->{cache} }
+sub large_cache_root { shift->{large_cache_root} }
+sub transient_expiration_time { shift->{transient_expiration_time} }
+
+# Fast function to generate a unique (short, hashed) key for the cache
+# to use.  None of the parameters *should* contain null bytes. Example:
+# $repo->get_key('get_sha1', 'HEAD:file1') eq '05dd723732a45fcac864787ec1897cc1'
+sub get_key {
+	my $self = shift;
+	# Some caches (like Cache::FileCache) hash keys themselves,
+	# but Cache::Memcached does not like long keys, so we need to
+	# hash them.  MD5 is fine here, since (as of July 2008) there
+	# are only collision attacks, but no practical preimage
+	# attacks on MD5.  Constructing two colliding keys doesn't
+	# seem to pose much of a threat for the cache.  Digest::SHA
+	# is only in core as of Perl 5.9.3, so we cannot use it here.
+	return md5_hex(join "\0", $self->_key_items(@_));
+}
+
+# Return a list of strings that can be used to generate a key.
+sub _key_items {
+	my $self = shift;
+	return map { defined $_ ? " $_" : '' }
+	    ('gitweb', $version, $cache_key, 'project', $self->repo_dir, @_);
+}
+
+# Convenience function: cache_set(\@key, $value, $expire);
+# $expire is boolean and indicates whether an expiry time should be set.
+sub cache_set {
+	my ($self, $key, $value, $expire) = @_;
+	return unless $self->cache;
+	my $expiration_token = $expire ? $self->get_last_modification() : undef;
+	my $ok = $self->cache->set(
+		$self->get_key($expiration_token, @$key), $value,
+		$expire ? $self->transient_expiration_time : ());
+	$ok ? $cache_statistics{sets}++ : $cache_statistics{failed_sets}++;
+	return $ok;
+}
+
+# Convenience function: cache_get(\@key, $expire)
+# $expire must have the same value that was passed to cache_set, or
+# the lookup will fail.
+sub cache_get {
+	my ($self, $key, $expire) = @_;
+	return unless $self->cache;
+	my $expiration_token = $expire ? $self->get_last_modification() : undef;
+	my $val = $self->cache->get($self->get_key($expiration_token, @$key));
+	defined $val ? $cache_statistics{hits}++ : $cache_statistics{misses}++;
+	push @{$cache_statistics{get_list}},
+	    [$self->repo_dir, $expire ? 1 : 0, @$key];
+	return $val;
+}
+
+# Return the output of the given git command as a string.
+# Valid options are:
+# cmd: An arrayref of arguments to pass to git (mandatory).
+# max_exit_code: Die if the exit code of the git binary is greater
+#     than this (default: 0).
+# cache: If 1, the output is cached but expires when the repo is
+#     modified; if 2, it is cached indefinitely.
+sub cmd_output {
+	my ($self, %opts) = @_;
+	my $key = ['cmd', $opts{max_exit_code}, @{$opts{cmd}}];
+	my $output;
+	unless ($opts{cache} && defined($output = $self->cache_get(
+						$key, $opts{cache} == 1))) {
+		my @cmd = ($self->_git_cmd, @{$opts{cmd}});
+		my $cmd = join ' ', @cmd;
+		open my $fh, '-|', @cmd or die "cannot open pipe: $cmd";
+		local $/;
+		$output = <$fh>;
+		if (!close $fh) {
+			die "error closing pipe ($!): $cmd" if $!;
+			my $exit_code = $? >> 8;
+			die "Command failed with exit code $exit_code: $cmd"
+			    if $exit_code > ($opts{max_exit_code} || 0);
+		}
+		$self->cache_set($key, $output, $opts{cache} == 1) if $opts{cache};
+	}
+	return $output;
+}
+
+# progressive_cmd_output(%opts)
+#
+# Return a coderef that returns a chunk of the output of the given
+# command each time it is called, or undef when the output is
+# exhausted.  For the output to be cached, it must be read until the
+# coderef returns undef, otherwise it will leave a stale temporary
+# file in the cache.
+#
+# Options:
+# cmd: an arrayref or string of arguments to git; if it's a string, it will be
+#     passed to the shell
+# max_exit_code: die if the command exits with a higher exit code (default: 0)
+# separator: like $/; if undef, read the output in chunks of arbitrary size
+# cache: if true, cache the output of the command (without expiration)
+#
+# Example:
+# my $diff_read = $repo->progressive_cmd_output(
+#     cmd => ['diff', $from_sha1, $to_sha1], separator => "\n", cache => 1);
+# while (my $line = $diff_read->()) {
+# 	chomp $line;
+# 	...
+# }
+my $_file_seq = 0;
+sub progressive_cmd_output {
+	die 'must pass an odd number of arguments' unless @_ % 2;
+	my ($self, %opts) = @_;
+	local $/ = defined $opts{separator} ? $opts{separator} : \32768;
+	my (@cmd, $cmd_str);
+	if (ref($opts{cmd}) eq 'ARRAY') {
+		@cmd = ($self->_git_cmd, @{$opts{cmd}});
+		$cmd_str = join " ", @cmd;  # only used for diagnostics
+	} else {
+		$cmd_str = main::quote_command($self->_git_cmd) .
+		    " $opts{cmd}";  # this will be passed to the shell
+	}
+	# We read from $fh, whether it's a pipe or a cache file.  If
+	# it's a pipe, we also progressively cache it to
+	# $tmp_cache_file, and at the end move $tmp_cache_file_name to
+	# $cache_file_name.  This avoids having partially written
+	# cache entries.
+	my ($fh, $cache_file_name, $tmp_cache_file_name, $tmp_cache_file);
+	if ($opts{cache} && $self->large_cache_root) {
+		my @key_items = ('cmd', $opts{max_exit_code},
+				 ref($opts{cmd}) eq 'ARRAY' ?
+				 (@{$opts{cmd}}) : (undef, $opts{cmd}));
+		push @{$large_cache_statistics{get_list}},
+		    [$self->repo_dir, @key_items];
+		$cache_file_name = File::Spec->catfile(
+			$self->large_cache_root, $self->get_key(@key_items));
+		$tmp_cache_file_name = File::Spec->catfile(
+			$self->large_cache_root,
+			join('.', $$, $_file_seq++, 'tmp'));
+	}
+	unless ($cache_file_name && open $fh, '<', $cache_file_name) {
+		# Not in cache -- open pipe, and open cache file to write to.
+		if (@cmd) {
+			open $fh, '-|', @cmd;
+		} else {
+			open $fh, '-|', $cmd_str;
+		}
+		die "cannot open pipe: $cmd_str" unless $fh;
+		open $tmp_cache_file, '>', $tmp_cache_file_name
+		    if $tmp_cache_file_name;
+		# Increment failed_sets; it will be decremented upon
+		# successful finalization of the cache entry.
+		$large_cache_statistics{failed_sets}++;
+		# Do not die if the open fails, it simply leaves
+		# $tmp_cache_file undefined.
+		# Record uncached calls as misses.
+		$large_cache_statistics{misses}++;
+	} else {
+		$large_cache_statistics{hits}++;
+	}
+	my $read = sub {
+		return undef unless $fh;  # already closed
+		my $output = <$fh>;
+		if (defined $output) {
+			# Write to cache and return.
+			if ($tmp_cache_file && ! print $tmp_cache_file $output) {
+				# Writing to cache failed; clean up
+				# and stop caching this pipe.
+				close $tmp_cache_file;
+				$tmp_cache_file = undef;
+				unlink $tmp_cache_file_name;
+			}
+			return $output;
+		}
+		# End of output; close and finalize cache.
+		if (!close $fh) {
+			# We sometimes get mysterious "Bad file
+			# descriptor" errors here, but reading from
+			# the pipe worked fine, so let's not die.
+			#die "error closing pipe ($!): $cmd_str" if $!;
+			my $exit_code = $? >> 8;
+			die "Command died with exit code $exit_code: $cmd_str"
+			    if $exit_code > ($opts{max_exit_code} || 0);
+		}
+		$fh = undef;
+		if ($tmp_cache_file && close $tmp_cache_file) {
+			# Cache file written OK, move it in place.
+			if (rename $tmp_cache_file_name, $cache_file_name) {
+				$large_cache_statistics{failed_sets}--;
+				$large_cache_statistics{sets}++;
+			}
+		}
+		return undef;
+	};
+	# We can also provide a &close function here in case it
+	# becomes necessary to close pipes prematurely.
+	return $read;
+}
+
+# The following methods override the base class (Git::Repo) methods to
+# add caching.
+
+sub get_sha1 {
+	my ($self, $object_id) = @_;
+	my $expire = ($object_id !~ /^[0-9a-f]{40}(?![0-9a-f])/);
+	my $triple = $self->cache_get(['SHA1', $object_id], $expire);
+	unless (defined $triple) {
+		$triple = [$self->SUPER::get_sha1($object_id)];
+		# Do not cache failed lookups -- missing SHA1s would
+		# be permanently cached, but a subsequent push to the
+		# repository might add those missing SHA1s to the
+		# repository.
+		return unless $triple->[0];
+		$self->cache_set(['SHA1', $object_id], $triple, $expire);
+	}
+	return wantarray ? @$triple : $triple->[0];
+}
+
+sub get_object {
+	my ($self, $sha1) = @_;
+	my $type_content = $self->cache_get(['cat-file', $sha1], 0);
+	unless (defined $type_content) {
+		$type_content = [$self->SUPER::get_object($sha1)];
+		die 'unexpected empty return value' unless @$type_content;
+		$self->cache_set(['cat-file', $sha1], $type_content, 0);
+	}
+	return wantarray ? @$type_content : $type_content->[1];
+}
+
+# get_commit and get_tag only return empty Commit and Tag objects,
+# which when loaded (lazily), happen to call get_object and thus are
+# cached as well.
+
+sub name_rev {
+	my ($self, $sha1, $tags_only) = @_;
+	my $name = $self->cache_get(['name-rev', $sha1, $tags_only], 1);
+	unless (defined $name) {
+		# || '' is to cache failed lookups (name_rev doesn't
+		# ever return empty names).
+		$name = $self->SUPER::name_rev($sha1, $tags_only) || '';
+		$self->cache_set(['name-rev', $sha1, $tags_only], $name, 1);
+	}
+	return $name || undef;
+}
+
+# Return the seconds since epoch when the repository was last touched.
+sub get_last_modification {
+	my $self = shift;
+	return $self->{last_modification} if $self->{last_modification};
+	# Hashref mapping absolute paths of directories to mtimes.  We
+	# rely on the fact here that every time git updates a file, it
+	# creates it under a different name and then moves it in
+	# place, thus causing the mtime of the containing directory to
+	# be updated.  Hence it's enough to just stat the directories.
+	my $mtimes = $self->cache_get(['mtimes'], 0);
+	if ($mtimes) {
+		CHECK_CACHE: {
+			# Check if the cache response is up to date.
+			while (my ($dir, $mtime) = each %$mtimes) {
+				last CHECK_CACHE if (stat $dir)[9] != $mtime;
+			}
+			$self->{last_modification} = max(values %$mtimes);
+			return max(values %$mtimes);
+		}
+	}
+	# Either mtimes are not in cache, or at least one directory
+	# has been updated.  Traverse the whole ref tree and record
+	# all directory mtimes -- this is a bit slower than the
+	# up-to-date-ness check above since we end up stat'ing all
+	# files in the refs directory.
+	my $time = time;
+	$mtimes = { $self->repo_dir => (stat $self->repo_dir)[9] };
+	my $cacheable = 1;
+	# Traverse refs/ directory and record the mtimes of all
+	# directories.  Implementation note: This stats *all* files in
+	# the tree; this can become quite inefficient for repositories
+	# with many refs (say, >100) that are pushed to frequently.
+	# There are two possible solutions:
+	# 1. If we dropped the requirement of following symlinks, it
+	#    would in theory be possible to exploit the nlinks==2
+	#    feature of many Unix file systems.  However, File::Find
+	#    doesn't allow us to only traverse directories.
+	# 2. We could traverse only those directories whose mtimes have
+	#    changed since the last time we checked (which we got from
+	#    the cache); this makes the traversal code slightly more
+	#    complicated, but should solve the worst efficiency issues.
+	File::Find::find( {
+		wanted => sub {
+			my $time = time;  # get time first
+			# No way to avoid stat'ing unconditionally
+			# with File::Find.
+			my @stat = stat($_);
+			if (Fcntl::S_ISDIR($stat[2])) {
+				# Record the directory's mtime.
+				$mtimes->{$File::Find::name} = $stat[9];
+				# Mtimes have a 1-second granularity,
+				# so if the directory has *just* been
+				# modified, we might miss subsequent
+				# modifications in the same second if
+				# we cached it.
+				$cacheable = 0 if $stat[9] >= $time;
+			}
+		},
+		# The refs/ tree can contain symlinks -- e.g. as of
+		# Aug 08 repo.or.cz uses a 'forkee' symlink, which
+		# links to another repository's refs/ directory.
+		follow => 1,
+		follow_skip => 2, # ignore duplicates
+		}, File::Spec->catfile($self->repo_dir, 'refs'));
+	$self->cache_set(['mtimes'], $mtimes, 0) if $cacheable;
+	$self->{last_modification} = max(values %$mtimes);
+	return max(values %$mtimes);
+}
+
+package main;
+
+
+our $repo_root = Git::RepoRoot->new(root_dir => $projectroot,
+				    git_binary => $GIT,
+				    cache => $cache,
+				    large_cache_root => $large_cache_root,
+				    transient_expiration_time =>
+				      $cache_transient_expiration_time,
+				    repo_class => 'CachedRepo'
+    );
+
 $projects_list ||= $projectroot;
 
 # ======================================================================
@@ -392,6 +793,7 @@ if (defined $action) {
 
 # parameters which are pathnames
 our $project = $cgi->param('p');
+our $repo = $repo_root->repo(repo_dir => $project) if $project;
 if (defined $project) {
 	if (!validate_pathname($project) ||
 	    !(-d "$projectroot/$project") ||
@@ -509,13 +911,14 @@ sub evaluate_path_info {
 		undef $project;
 		return;
 	}
+	$repo = $repo_root->repo(repo_dir => $project);
 	# do not change any parameters if an action is given using the query string
 	return if $action;
 	$path_info =~ s,^\Q$project\E/*,,;
 	my ($refname, $pathname) = split(/:/, $path_info, 2);
 	if (defined $pathname) {
 		# we got "project.git/branch:filename" or "project.git/branch:dir/"
-		# we could use git_get_type(branch:pathname), but it needs $git_dir
+		# we could use git_get_type(branch:pathname) here
 		$pathname =~ s,^/+,,;
 		if (!$pathname || substr($pathname, -1) eq "/") {
 			$action  ||= "tree";
@@ -533,10 +936,6 @@ sub evaluate_path_info {
 }
 evaluate_path_info();
 
-# path to the current git repository
-our $git_dir;
-$git_dir = "$projectroot/$project" if $project;
-
 # dispatch
 my %actions = (
 	"blame" => \&git_blame,
@@ -597,8 +996,7 @@ sub href (%) {
 	# default is to use -absolute url() i.e. $my_uri
 	my $href = $params{-full} ? $my_url : $my_uri;
 
-	# XXX: Warning: If you touch this, check the search form for updating,
-	# too.
+	# If you touch this, check the search form for updating, too.
 
 	my @mapping = (
 		project => "p",
@@ -1309,10 +1707,12 @@ sub format_diff_cc_simplified {
 	$result .= "<div class=\"diff header\">" .
 	           "diff --cc ";
 	if (!is_deleted($diffinfo)) {
-		$result .= $cgi->a({-href => href(action=>"blob",
-		                                  hash_base=>$hash,
-		                                  hash=>$diffinfo->{'to_id'},
-		                                  file_name=>$diffinfo->{'to_file'}),
+		$result .= $cgi->a(
+			{-href => href(action=>"blob",
+				       $hash && git_get_type($hash) eq 'commit' ?
+				       (hash_base=>$hash) : (),
+				       hash=>$diffinfo->{'to_id'},
+				       file_name=>$diffinfo->{'to_file'}),
 		                    -class => "path"},
 		                   esc_path($diffinfo->{'to_file'}));
 	} else {
@@ -1332,8 +1732,6 @@ sub format_diff_line {
 	my ($from, $to) = @_;
 	my $diff_class = "";
 
-	chomp $line;
-
 	if ($from && $to && ref($from->{'href'}) eq "ARRAY") {
 		# combined diff
 		my $prefix = substr($line, 0, scalar @{$from->{'href'}});
@@ -1495,11 +1893,6 @@ sub get_feed_info {
 ## ----------------------------------------------------------------------
 ## git utility subroutines, invoking git commands
 
-# returns path to the core git executable and the --git-dir parameter as list
-sub git_cmd {
-	return $GIT, '--git-dir='.$git_dir;
-}
-
 # quote the given arguments for passing them to the shell
 # quote_command("command", "arg 1", "arg with ' and ! characters")
 # => "'command' 'arg 1' 'arg with '\'' and '\!' characters'"
@@ -1509,33 +1902,56 @@ sub quote_command {
 		    map( { my $a = $_; $a =~ s/(['!])/'\\$1'/g; "'$a'" } @_ ));
 }
 
-# get HEAD ref of given project as hash
-sub git_get_head_hash {
-	my $project = shift;
-	my $o_git_dir = $git_dir;
-	my $retval = undef;
-	$git_dir = "$projectroot/$project";
-	if (open my $fd, "-|", git_cmd(), "rev-parse", "--verify", "HEAD") {
-		my $head = <$fd>;
-		close $fd;
-		if (defined $head && $head =~ /^([0-9a-fA-F]{40})$/) {
-			$retval = $1;
+# git_get_sha1_or_die ( EXTENDED_OBJECT_IDENTIFIER [, TYPE] )
+#
+# Look up the object referred to by C<EXTENDED_OBJECT_IDENTIFIER> and
+# return its SHA1 hash in scalar context or its ($hash, $type, $size)
+# in list context.  Return an error page to the browser if the object
+# couldn't be found.
+#
+# If C<TYPE> is given, resolve tag and commit objects if necessary and
+# die unless the object found has the right type.  The $type return
+# value is guaranteed to equal C<TYPE>.
+sub git_get_sha1_or_die {
+	my ($object_id, $want_type) = @_;
+	# This method shouldn't be used for checking missing
+	# parameters, since it cannot generate proper error messages.
+	# Hence we die with 500.
+	die_error(500, 'No object given') unless $object_id;
+	my ($hash, $type, $size) = $repo->get_sha1($object_id);
+	unless ($hash) {
+		my $human_type = ucfirst($want_type || 'object');
+		die_error(404, "$human_type not found: '$object_id'");
+	}
+	if ($want_type && $want_type ne $type) {
+		if ($type eq 'tag') {
+			return git_get_sha1_or_die(
+				$repo->get_tag($hash)->object, $want_type);
+		} elsif ($type eq 'commit' && $want_type eq 'tree') {
+			return git_get_sha1_or_die(
+				$repo->get_commit($hash)->tree, $want_type);
+		} else {
+			# $object_id and $type can be off due to recursion,
+			# but fixing it complicates the code too much.
+			die_error(400, "Expected a $want_type object, but " .
+				  "'$object_id' is a $type object");
 		}
 	}
-	if (defined $o_git_dir) {
-		$git_dir = $o_git_dir;
-	}
-	return $retval;
+	return wantarray ? ($hash, $type, $size) : $hash;
+}
+
+# get HEAD ref hash of current project or die if no HEAD ref was found
+sub git_get_head_hash {
+	# Need to have the global $project variable defined.
+	die_error(400, 'no project given') unless $project;
+	my $sha1 = $repo->get_sha1('HEAD')
+	    or die_error(500, "HEAD ref not found for project '$project'");
+	return $sha1;
 }
 
 # get type of given object
 sub git_get_type {
-	my $hash = shift;
-
-	open my $fd, "-|", git_cmd(), "cat-file", '-t', $hash or return;
-	my $type = <$fd>;
-	close $fd or return;
-	chomp $type;
+	my ($sha1, $type, $size) = $repo->get_sha1(shift);
 	return $type;
 }
 
@@ -1563,20 +1979,14 @@ sub git_parse_project_config {
 	my $section_regexp = shift;
 	my %config;
 
-	local $/ = "\0";
-
-	open my $fh, "-|", git_cmd(), "config", '-z', '-l',
-		or return;
-
-	while (my $keyval = <$fh>) {
-		chomp $keyval;
+	return unless $repo;
+	for my $keyval (split "\0", $repo->cmd_output(
+				cmd => [qw(config -z -l)], cache => 1)) {
 		my ($key, $value) = split(/\n/, $keyval, 2);
 
 		hash_set_multi(\%config, $key, $value)
 			if (!defined $section_regexp || $key =~ /^(?:$section_regexp)\./o);
 	}
-	close $fh;
-
 	return %config;
 }
 
@@ -1639,9 +2049,9 @@ sub git_get_project_config {
 
 	# get config
 	if (!defined $config_file ||
-	    $config_file ne "$git_dir/config") {
+	    $config_file ne "$projectroot/$project/config") {
 		%config = git_parse_project_config('gitweb');
-		$config_file = "$git_dir/config";
+		$config_file = "$projectroot/$project/config";
 	}
 
 	# ensure given type
@@ -1656,65 +2066,56 @@ sub git_get_project_config {
 	return $config{"gitweb.$key"};
 }
 
-# get hash of given path at given ref
-sub git_get_hash_by_path {
-	my $base = shift;
-	my $path = shift || return undef;
-	my $type = shift;
-
+# Return the SHA1 of the blob or tree at the path in the given commit,
+# or return undef if it does not exist.
+sub git_get_sha1_by_path {
+	my ($base, $path, $type) = @_;
 	$path =~ s,/+$,,;
-
-	open my $fd, "-|", git_cmd(), "ls-tree", $base, "--", $path
-		or die_error(500, "Open git-ls-tree failed");
-	my $line = <$fd>;
-	close $fd or return undef;
-
-	if (!defined $line) {
-		# there is no tree or hash given by $path at $base
-		return undef;
-	}
-
-	#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
-	$line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/;
-	if (defined $type && $type ne $2) {
-		# type doesn't match
-		return undef;
-	}
-	return $3;
+	return $repo->get_sha1("$base:$path", $type);
 }
 
-# get path of entry with given hash at given tree-ish (ref)
-# used to get 'from' filename for combined diff (merge commit) for renames
+# Get path of entry with given hash at given tree-ish (ref); used to
+# get 'from' filename for combined diff (merge commit) for renames.
+# Note that this does not resolve tag or commit objects in the $hash
+# parameter, you must pass a tree or blob object.
 sub git_get_path_by_hash {
 	my $base = shift || return;
 	my $hash = shift || return;
 
-	local $/ = "\0";
-
-	open my $fd, "-|", git_cmd(), "ls-tree", '-r', '-t', '-z', $base
-		or return undef;
-	while (my $line = <$fd>) {
-		chomp $line;
-
-		#'040000 tree 595596a6a9117ddba9fe379b6b012b558bac8423	gitweb'
-		#'100644 blob e02e90f0429be0d2a69b76571101f20b8f75530f	gitweb/README'
-		if ($line =~ m/(?:[0-9]+) (?:.+) $hash\t(.+)$/) {
-			close $fd;
+	# This subroutine could be extracted into the Git::Tree API
+	# once it exists.
+	my $tree = git_get_sha1_or_die($base, 'tree');
+	my ($file_sha1, $file_type) = $repo->get_sha1($hash);
+	die_error(404, "object not found: '$hash'") unless $file_sha1;
+	die_error(400, "'$hash' is a $file_type object, not a tree or blob object")
+	    unless $file_type eq 'blob' || $file_type eq 'tree';
+
+	# The ls-tree output can be quite large, so use
+	# progressive_cmd_output.
+	my $ls_tree_read = $repo->progressive_cmd_output(
+		cmd => [qw(ls-tree -r -t), $tree], separator => "\n",
+		cache => 1);
+	while (my $line = $ls_tree_read->()) {
+		if ($line =~ /^[0-9]+ [a-z]+ $file_sha1\t(.+)$/) {
+			while ($ls_tree_read->()) { } # cache it
+			# TODO: needs unquoting
 			return $1;
 		}
 	}
-	close $fd;
 	return undef;
 }
 
 ## ......................................................................
 ## git utility functions, directly accessing git repository
 
+# The following subroutines locally change the global $project
+# variable as a side-effect so that their calls to
+# git_get_project_config work.
+
 sub git_get_project_description {
-	my $path = shift;
+	local $project = shift;
 
-	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/description"
+	open my $fd, "$projectroot/$project/description"
 		or return git_get_project_config('description');
 	my $descr = <$fd>;
 	close $fd;
@@ -1725,10 +2126,9 @@ sub git_get_project_description {
 }
 
 sub git_get_project_url_list {
-	my $path = shift;
+	local $project = shift;
 
-	$git_dir = "$projectroot/$path";
-	open my $fd, "$git_dir/cloneurl"
+	open my $fd, "$projectroot/$project/cloneurl"
 		or return wantarray ?
 		@{ config_to_multi(git_get_project_config('url')) } :
 		   config_to_multi(git_get_project_config('url'));
@@ -1863,11 +2263,10 @@ sub git_get_project_list_from_file {
 }
 
 sub git_get_project_owner {
-	my $project = shift;
+	local $project = shift;
 	my $owner;
 
 	return undef unless $project;
-	$git_dir = "$projectroot/$project";
 
 	if (!defined $gitweb_project_owner) {
 		git_get_project_list_from_file();
@@ -1880,44 +2279,46 @@ sub git_get_project_owner {
 		$owner = git_get_project_config('owner');
 	}
 	if (!defined $owner) {
-		$owner = get_file_owner("$git_dir");
+		$owner = get_file_owner("$projectroot/$project");
 	}
 
 	return $owner;
 }
 
 sub git_get_last_activity {
-	my ($path) = @_;
-	my $fd;
+	my $path = shift;
 
-	$git_dir = "$projectroot/$path";
-	open($fd, "-|", git_cmd(), 'for-each-ref',
-	     '--format=%(committer)',
-	     '--sort=-committerdate',
-	     '--count=1',
-	     'refs/heads') or return;
-	my $most_recent = <$fd>;
-	close $fd or return;
-	if (defined $most_recent &&
-	    $most_recent =~ / (\d+) [-+][01]\d\d\d$/) {
-		my $timestamp = $1;
-		my $age = time - $timestamp;
-		return ($age, age_string($age));
-	}
-	return (undef, undef);
+	chomp(my $most_recent = $repo_root->repo(repo_dir => $path)->cmd_output(
+		      cmd => [ qw(for-each-ref --count=1 --format=%(committer)),
+			       qw(--sort=-committerdate refs/heads) ],
+		      cache => 1, max_exit_code => 255)) or return;
+	$most_recent =~ / (\d+) [-+][01]\d\d\d$/ or return;
+	my $timestamp = $1;
+	my $age = time - $timestamp;
+	return ($age, age_string($age));
 }
 
+# Return a hashref from SHA1s to arrayrefs of ref names.  Example:
+# { '7e51...' => ['tags/tag-object'], # tag SHA1
+#   '51ba...' => ['tags/tag-object'], # referenced commit SHA1
+#   '3c4a...' => ['heads/master', 'tags/another-tag'] }
 sub git_get_references {
 	my $type = shift || "";
 	my %refs;
-	# 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c refs/tags/v2.6.11
-	# c39ae07f393806ccf406ef966e9a15afc43cc36a refs/tags/v2.6.11^{}
-	open my $fd, "-|", git_cmd(), "show-ref", "--dereference",
-		($type ? ("--", "refs/$type") : ()) # use -- <pattern> if $type
-		or return;
-
-	while (my $line = <$fd>) {
-		chomp $line;
+	# This is not implementable in terms of a simple for-each-ref
+	# call (e.g. extractable to Git::Repo->get_refs) because it
+	# doesn't dereference, and we cannot dereference a lot of
+	# SHA1s ourselves as long as there is no implementation that
+	# uses Cache::Memcached->get_multi.  Hence, we use
+	# cmd_output.
+	my @lines = split "\n", $repo->cmd_output(
+		cmd => ['show-ref', '--dereference',
+			($type ? ("--", "refs/$type") : ())],
+		max_exit_code => 1,  # exits with status 1 on empty repos
+		cache => 1);
+	for my $line (@lines) {
+		# 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c refs/tags/v2.6.11
+		# c39ae07f393806ccf406ef966e9a15afc43cc36a refs/tags/v2.6.11^{}
 		if ($line =~ m!^([0-9a-fA-F]{40})\srefs/($type/?[^^]+)!) {
 			if (defined $refs{$1}) {
 				push @{$refs{$1}}, $2;
@@ -1926,24 +2327,15 @@ sub git_get_references {
 			}
 		}
 	}
-	close $fd or return;
 	return \%refs;
 }
 
 sub git_get_rev_name_tags {
-	my $hash = shift || return undef;
-
-	open my $fd, "-|", git_cmd(), "name-rev", "--tags", $hash
-		or return;
-	my $name_rev = <$fd>;
-	close $fd;
+	my $hash = shift or die_error();
 
-	if ($name_rev =~ m|^$hash tags/(.*)$|) {
-		return $1;
-	} else {
-		# catches also '$hash undefined' output
-		return undef;
-	}
+	my $name = $repo->name_rev($hash);
+	$name =~ s!^tags/!! if $name;
+	return $name;
 }
 
 ## ----------------------------------------------------------------------
@@ -1982,14 +2374,12 @@ sub parse_date {
 }
 
 sub parse_tag {
-	my $tag_id = shift;
+	my $sha1 = shift;
 	my %tag;
 	my @comment;
 
-	open my $fd, "-|", git_cmd(), "cat-file", "tag", $tag_id or return;
-	$tag{'id'} = $tag_id;
-	while (my $line = <$fd>) {
-		chomp $line;
+	my ($raw_header, $raw_comment) = split "\n\n", $repo->get_object($sha1), 2;
+	for my $line (split "\n", $raw_header) {
 		if ($line =~ m/^object ([0-9a-fA-F]{40})$/) {
 			$tag{'object'} = $1;
 		} elsif ($line =~ m/^type (.+)$/) {
@@ -2000,19 +2390,9 @@ sub parse_tag {
 			$tag{'author'} = $1;
 			$tag{'epoch'} = $2;
 			$tag{'tz'} = $3;
-		} elsif ($line =~ m/--BEGIN/) {
-			push @comment, $line;
-			last;
-		} elsif ($line eq "") {
-			last;
 		}
 	}
-	push @comment, <$fd>;
-	$tag{'comment'} = \@comment;
-	close $fd or return;
-	if (!defined $tag{'name'}) {
-		return
-	};
+	$tag{'comment'} = [split "\n", $raw_comment];
 	return %tag
 }
 
@@ -2021,8 +2401,6 @@ sub parse_commit_text {
 	my @commit_lines = split '\n', $commit_text;
 	my %co;
 
-	pop @commit_lines; # Remove '\0'
-
 	if (! @commit_lines) {
 		return;
 	}
@@ -2117,48 +2495,38 @@ sub parse_commit_text {
 
 sub parse_commit {
 	my ($commit_id) = @_;
-	my %co;
+	# This currently supports arbitrary object names, so we
+	# resolve it here.
+	my $sha1 = git_get_sha1_or_die($commit_id, 'commit');
 
-	local $/ = "\0";
-
-	open my $fd, "-|", git_cmd(), "rev-list",
-		"--parents",
-		"--header",
-		"--max-count=1",
-		$commit_id,
-		"--",
-		or die_error(500, "Open git-rev-list failed");
-	%co = parse_commit_text(<$fd>, 1);
-	close $fd;
-
-	return %co;
+	# This formats commits slightly differently than the raw
+	# cat-file, so we cannot use get_object here.  This function
+	# should be replaced by $repo->get_commit anyway.
+	my $commit_text = $repo->cmd_output(
+		cmd => [qw(rev-list --parents --header --max-count=1), $sha1, '--'],
+		cache => 2);
+	$commit_text =~ s/\x00$//;
+	return parse_commit_text($commit_text, 1);
 }
 
 sub parse_commits {
 	my ($commit_id, $maxcount, $skip, $filename, @args) = @_;
-	my @cos;
+	# This currently supports arbitrary object names, so we
+	# resolve it here.
+	my $sha1 = git_get_sha1_or_die($commit_id);
+	git_get_sha1_or_die("$sha1:$filename") if $filename;  # check existence
 
 	$maxcount ||= 1;
 	$skip ||= 0;
 
-	local $/ = "\0";
-
-	open my $fd, "-|", git_cmd(), "rev-list",
-		"--header",
-		@args,
-		("--max-count=" . $maxcount),
-		("--skip=" . $skip),
-		@extra_options,
-		$commit_id,
-		"--",
-		($filename ? ($filename) : ())
-		or die_error(500, "Open git-rev-list failed");
-	while (my $line = <$fd>) {
-		my %co = parse_commit_text($line);
-		push @cos, \%co;
-	}
-	close $fd;
-
+	# TODO: Integrate this into Git::Repo, and get rid of @args
+	# and @extra_options.
+	my @raw_commits = split "\0", $repo->cmd_output(
+		cmd => ['rev-list', '--header', @args, "--max-count=$maxcount",
+			"--skip=$skip", @extra_options,	$sha1, "--",
+			$filename || ()],
+		cache => 2);
+	my @cos = map { { parse_commit_text($_) } } @raw_commits;
 	return wantarray ? @cos : \@cos;
 }
 
@@ -2263,7 +2631,9 @@ sub parse_from_to_diffinfo {
 		# ordinary (not combined) diff
 		$from->{'file'} = $diffinfo->{'from_file'};
 		if ($diffinfo->{'status'} ne "A") { # not new (added) file
-			$from->{'href'} = href(action=>"blob", hash_base=>$hash_parent,
+			$from->{'href'} = href(action=>"blob",
+					       $hash_parent && git_get_type($hash_parent) eq 'commit' ?
+					       (hash_base=>$hash_parent) : (),
 			                       hash=>$diffinfo->{'from_id'},
 			                       file_name=>$from->{'file'});
 		} else {
@@ -2273,7 +2643,9 @@ sub parse_from_to_diffinfo {
 
 	$to->{'file'} = $diffinfo->{'to_file'};
 	if (!is_deleted($diffinfo)) { # file exists in result
-		$to->{'href'} = href(action=>"blob", hash_base=>$hash,
+		$to->{'href'} = href(action=>"blob",
+				     $hash && git_get_type($hash) eq 'commit' ?
+				     (hash_base=>$hash) : (),
 		                     hash=>$diffinfo->{'to_id'},
 		                     file_name=>$to->{'file'});
 	} else {
@@ -2288,15 +2660,15 @@ sub git_get_heads_list {
 	my $limit = shift;
 	my @headslist;
 
-	open my $fd, '-|', git_cmd(), 'for-each-ref',
-		($limit ? '--count='.($limit+1) : ()), '--sort=-committerdate',
-		'--format=%(objectname) %(refname) %(subject)%00%(committer)',
-		'refs/heads'
-		or return;
-	while (my $line = <$fd>) {
+	my @lines = split "\n", $repo->cmd_output(
+		cmd => ['for-each-ref', '--sort=-committerdate',
+			$limit ? '--count='.($limit+1) : (),
+			'--format=%(objectname) %(refname) %(subject)%00%(committer)',
+			'refs/heads'],
+		cache => 1);
+	for my $line (@lines) {
 		my %ref_item;
 
-		chomp $line;
 		my ($refinfo, $committerinfo) = split(/\0/, $line);
 		my ($hash, $name, $title) = split(' ', $refinfo, 3);
 		my ($committer, $epoch, $tz) =
@@ -2316,7 +2688,6 @@ sub git_get_heads_list {
 
 		push @headslist, \%ref_item;
 	}
-	close $fd;
 
 	return wantarray ? @headslist : \@headslist;
 }
@@ -2325,16 +2696,16 @@ sub git_get_tags_list {
 	my $limit = shift;
 	my @tagslist;
 
-	open my $fd, '-|', git_cmd(), 'for-each-ref',
-		($limit ? '--count='.($limit+1) : ()), '--sort=-creatordate',
-		'--format=%(objectname) %(objecttype) %(refname) '.
-		'%(*objectname) %(*objecttype) %(subject)%00%(creator)',
-		'refs/tags'
-		or return;
-	while (my $line = <$fd>) {
+	my @lines = split "\n", $repo->cmd_output(
+		cmd => ['for-each-ref', '--sort=-creatordate',
+			($limit ? '--count='.($limit+1) : ()),
+			'--format=%(objectname) %(objecttype) %(refname) '.
+			'%(*objectname) %(*objecttype) %(subject)%00%(creator)',
+			'refs/tags'],
+		cache => 1);
+	for my $line (@lines) {
 		my %ref_item;
 
-		chomp $line;
 		my ($refinfo, $creatorinfo) = split(/\0/, $line);
 		my ($id, $type, $name, $refid, $reftype, $title) = split(' ', $refinfo, 6);
 		my ($creator, $epoch, $tz) =
@@ -2365,7 +2736,6 @@ sub git_get_tags_list {
 
 		push @tagslist, \%ref_item;
 	}
-	close $fd;
 
 	return wantarray ? @tagslist : \@tagslist;
 }
@@ -2429,8 +2799,16 @@ sub mimetype_guess {
 	return $mime;
 }
 
+# Replacement for (heuristic) -T operator.  (perldoc -f -T)
+sub is_ascii_text {
+	my $text = shift;
+	return !length($text) || ((grep { ord($_) > 127 } split('', $text)) / length $text) <= 0.3;
+}
+
+# Determine the MIME type of a blob based on its file name ($filename)
+# and its first n bytes ($snippet).
 sub blob_mimetype {
-	my $fd = shift;
+	my $snippet = shift;
 	my $filename = shift;
 
 	if ($filename) {
@@ -2438,10 +2816,7 @@ sub blob_mimetype {
 		$mime and return $mime;
 	}
 
-	# just in case
-	return $default_blob_plain_mimetype unless $fd;
-
-	if (-T $fd) {
+	if (is_ascii_text($snippet)) {
 		return 'text/plain';
 	} elsif (! $filename) {
 		return 'application/octet-stream';
@@ -2457,9 +2832,9 @@ sub blob_mimetype {
 }
 
 sub blob_contenttype {
-	my ($fd, $file_name, $type) = @_;
+	my ($snippet, $file_name, $type) = @_;
 
-	$type ||= blob_mimetype($fd, $file_name);
+	$type ||= blob_mimetype($snippet, $file_name);
 	if ($type eq 'text/plain' && defined $default_text_plain_charset) {
 		$type .= "; charset=$default_text_plain_charset";
 	}
@@ -2472,7 +2847,7 @@ sub blob_contenttype {
 
 sub git_header_html {
 	my $status = shift || "200 OK";
-	my $expires = shift;
+	my $expires = shift || 'now';
 
 	my $title = "$site_name";
 	if (defined $project) {
@@ -2573,7 +2948,7 @@ EOF
 	print "</head>\n" .
 	      "<body>\n";
 
-	if (-f $site_header) {
+	if ($site_header && -f $site_header) {
 		open (my $fd, $site_header);
 		print <$fd>;
 		close $fd;
@@ -2593,8 +2968,8 @@ EOF
 	}
 	print "</div>\n";
 
-	my ($have_search) = gitweb_check_feature('search');
-	if (defined $project && $have_search) {
+	if (defined $project && gitweb_check_feature('search') &&
+	    $repo->get_sha1('HEAD')) {
 		if (!defined $searchtext) {
 			$searchtext = "";
 		}
@@ -2662,6 +3037,39 @@ sub git_footer_html {
 	}
 	print "</div>\n"; # class="page_footer"
 
+	if ($page_info) {
+		print "<div class=\"page_info\">\n";
+		my $print_stats = sub {
+			my ($name, $cache_exists) = (shift, shift);
+			my %s = @_;  # statistics hash
+			if ($cache_exists) {
+				my $gets = $s{hits} + $s{misses};
+				print "<p>" . ucfirst($name) . ": " .
+				    "<b>$gets</b> gets " .
+				    "(<b>$s{hits}</b> hits + ".
+				    "<b>$s{misses}</b> misses); " .
+				    "<b>$s{sets}</b> sets, " .
+				    "<b>$s{failed_sets}</b> failed sets.</p>\n";
+				if ($page_info == 2 && @{$s{get_list}}) {
+					print "<pre class=\"cache_list\">";
+					print join("\n",
+						   map(esc_html(join ',',
+								map(defined $_ ? " $_" : '',
+								    @$_)),
+						       @{$s{get_list}}));
+					print "</pre>\n";
+				}
+			} else {
+				print "<p><i>No $name.</i></p>\n";
+			}
+		};
+		$print_stats->('main cache', $cache,
+			       %CachedRepo::cache_statistics);
+		$print_stats->('large cache', $large_cache_root,
+			       %CachedRepo::large_cache_statistics);
+		print "</div>\n"; # class="page_info"
+	}
+
 	if (-f $site_footer) {
 		open (my $fd, $site_footer);
 		print <$fd>;
@@ -2710,6 +3118,8 @@ sub git_print_page_nav {
 	my ($current, $suppress, $head, $treehead, $treebase, $extra) = @_;
 	$extra = '' if !defined $extra; # pager or formats
 
+	return unless $repo->get_sha1('HEAD');  # no navigation for empty repos
+
 	my @navs = qw(summary shortlog log commit commitdiff tree);
 	if ($suppress) {
 		@navs = grep { $_ ne $suppress } @navs;
@@ -2891,24 +3301,6 @@ sub git_print_log ($;%) {
 	}
 }
 
-# return link target (what link points to)
-sub git_get_link_target {
-	my $hash = shift;
-	my $link_target;
-
-	# read link
-	open my $fd, "-|", git_cmd(), "cat-file", "blob", $hash
-		or return;
-	{
-		local $/;
-		$link_target = <$fd>;
-	}
-	close $fd
-		or return;
-
-	return $link_target;
-}
-
 # given link target, and the directory (basedir) the link is in,
 # return target of link relative to top directory (top tree);
 # return undef if it is not possible (including absolute links).
@@ -2970,7 +3362,7 @@ sub git_print_tree_entry {
 			                       file_name=>"$basedir$t->{'name'}", %base_key),
 			        -class => "list"}, esc_path($t->{'name'}));
 		if (S_ISLNK(oct $t->{'mode'})) {
-			my $link_target = git_get_link_target($t->{'hash'});
+			my $link_target = $repo->get_object($t->{'hash'});
 			if ($link_target) {
 				my $norm_target = normalize_link_target($link_target, $basedir, $hash_base);
 				if (defined $norm_target) {
@@ -3383,7 +3775,7 @@ sub git_difftree_body {
 }
 
 sub git_patchset_body {
-	my ($fd, $difftree, $hash, @hash_parents) = @_;
+	my ($read, $difftree, $hash, @hash_parents) = @_;
 	my ($hash_parent) = $hash_parents[0];
 
 	my $is_combined = (@hash_parents > 1);
@@ -3397,7 +3789,7 @@ sub git_patchset_body {
 	print "<div class=\"patchset\">\n";
 
 	# skip to first patch
-	while ($patch_line = <$fd>) {
+	while ($patch_line = $read->()) {
 		chomp $patch_line;
 
 		last if ($patch_line =~ m/^diff /);
@@ -3465,7 +3857,7 @@ sub git_patchset_body {
 		# print extended diff header
 		print "<div class=\"diff extended_header\">\n";
 	EXTENDED_HEADER:
-		while ($patch_line = <$fd>) {
+		while ($patch_line = $read->()) {
 			chomp $patch_line;
 
 			last EXTENDED_HEADER if ($patch_line =~ m/^--- |^diff /);
@@ -3484,7 +3876,7 @@ sub git_patchset_body {
 		#assert($patch_line =~ m/^---/) if DEBUG;
 
 		my $last_patch_line = $patch_line;
-		$patch_line = <$fd>;
+		$patch_line = $read->();
 		chomp $patch_line;
 		#assert($patch_line =~ m/^\+\+\+/) if DEBUG;
 
@@ -3494,7 +3886,7 @@ sub git_patchset_body {
 
 		# the patch itself
 	LINE:
-		while ($patch_line = <$fd>) {
+		while ($patch_line = $read->()) {
 			chomp $patch_line;
 
 			next PATCH if ($patch_line =~ m/^diff /);
@@ -3545,9 +3937,6 @@ sub fill_project_list_info {
  PROJECT:
 	foreach my $pr (@$projlist) {
 		my (@activity) = git_get_last_activity($pr->{'path'});
-		unless (@activity) {
-			next PROJECT;
-		}
 		($pr->{'age'}, $pr->{'age_string'}) = @activity;
 		if (!defined $pr->{'descr'}) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
@@ -3585,7 +3974,9 @@ sub print_sort_th {
 		if ($str_sort) {
 			@$list = sort {$a->{$key} cmp $b->{$key}} @$list;
 		} else {
-			@$list = sort {$a->{$key} <=> $b->{$key}} @$list;
+			# Sort undefined keys last.
+			@$list = sort { (defined $a->{$key} ? $a->{$key} : 1e30) <=>
+					(defined $b->{$key} ? $b->{$key} : 1e30)} @$list;
 		}
 		print "<th>$header</th>\n";
 	} else {
@@ -3653,16 +4044,20 @@ sub git_project_list_body {
 		      "<td>" . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary"),
 		                        -class => "list", -title => $pr->{'descr_long'}},
 		                        esc_html($pr->{'descr'})) . "</td>\n" .
-		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
-		print "<td class=\"". age_class($pr->{'age'}) . "\">" .
+		      "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n" .
+		      "<td class=\"". age_class($pr->{'age'}) . "\">" .
 		      (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n" .
 		      "<td class=\"link\">" .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary")   . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
-		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree") .
-		      ($pr->{'forks'} ? " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") : '') .
-		      "</td>\n" .
+		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary");
+		if ($pr->{'age_string'}) {
+			# Non-empty repository.
+			print " | " .
+			    $cgi->a({-href => href(project=>$pr->{'path'}, action=>"shortlog")}, "shortlog") . " | " .
+			    $cgi->a({-href => href(project=>$pr->{'path'}, action=>"log")}, "log") . " | " .
+			    $cgi->a({-href => href(project=>$pr->{'path'}, action=>"tree")}, "tree");
+		}
+		print " | " . $cgi->a({-href => href(project=>$pr->{'path'}, action=>"forks")}, "forks") if $pr->{'forks'};
+		print "</td>\n" .
 		      "</tr>\n";
 	}
 	if (defined $extra) {
@@ -3760,8 +4155,8 @@ sub git_history_body {
 		      $cgi->a({-href => href(action=>"commitdiff", hash=>$commit)}, "commitdiff");
 
 		if ($ftype eq 'blob') {
-			my $blob_current = git_get_hash_by_path($hash_base, $file_name);
-			my $blob_parent  = git_get_hash_by_path($commit, $file_name);
+			my $blob_current = git_get_sha1_by_path($hash_base, $file_name);
+			my $blob_parent = git_get_sha1_by_path($commit, $file_name);
 			if (defined $blob_current && defined $blob_parent &&
 					$blob_current ne $blob_parent) {
 				print " | " .
@@ -4016,9 +4411,9 @@ sub git_project_index {
 
 sub git_summary {
 	my $descr = git_get_project_description($project) || "none";
-	my %co = parse_commit("HEAD");
-	my %cd = %co ? parse_date($co{'committer_epoch'}, $co{'committer_tz'}) : ();
-	my $head = $co{'id'};
+	my $head = $repo->get_sha1('HEAD', 'commit');
+	my %co = $head ? parse_commit($head) : ();
+	my %cd = $head ? parse_date($co{'committer_epoch'}, $co{'committer_tz'}) : ();
 
 	my $owner = git_get_project_owner($project);
 
@@ -4037,7 +4432,7 @@ sub git_summary {
 	git_header_html();
 	git_print_page_nav('summary','', $head);
 
-	print "<div class=\"title\">&nbsp;</div>\n";
+	print "<div class=\"title\">&nbsp;</div>\n" if $head;
 	print "<table class=\"projects_list\">\n" .
 	      "<tr><td>description</td><td>" . esc_html($descr) . "</td></tr>\n" .
 	      "<tr><td>owner</td><td>" . esc_html($owner) . "</td></tr>\n";
@@ -4103,14 +4498,12 @@ sub git_summary {
 }
 
 sub git_tag {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
+	my $sha1 = git_get_sha1_or_die($hash, 'tag');
 	git_header_html();
 	git_print_page_nav('','', $head,undef,$head);
-	my %tag = parse_tag($hash);
-
-	if (! %tag) {
-		die_error(404, "Unknown tag object");
-	}
+	# TODO: This wants to become $repo->get_tag.
+	my %tag = parse_tag($sha1);
 
 	git_print_header_div('commit', esc_html($tag{'name'}), $hash);
 	print "<div class=\"title_text\">\n" .
@@ -4134,7 +4527,6 @@ sub git_tag {
 	print "<div class=\"page_body\">";
 	my $comment = $tag{'comment'};
 	foreach my $line (@$comment) {
-		chomp $line;
 		print esc_html($line, -nbsp=>1) . "<br/>\n";
 	}
 	print "</div>\n";
@@ -4145,25 +4537,19 @@ sub git_blame {
 	my $fd;
 	my $ftype;
 
-	gitweb_check_feature('blame')
-	    or die_error(403, "Blame view not allowed");
+	die_error(403, "Blame view not allowed")
+	    unless gitweb_check_feature('blame');
 
 	die_error(400, "No file name given") unless $file_name;
-	$hash_base ||= git_get_head_hash($project);
-	die_error(404, "Couldn't find base commit") unless ($hash_base);
-	my %co = parse_commit($hash_base)
-		or die_error(404, "Commit not found");
-	if (!defined $hash) {
-		$hash = git_get_hash_by_path($hash_base, $file_name, "blob")
-			or die_error(404, "Error looking up file");
-	}
-	$ftype = git_get_type($hash);
-	if ($ftype !~ "blob") {
-		die_error(400, "Object is not a blob");
-	}
-	open ($fd, "-|", git_cmd(), "blame", '-p', '--',
-	      $file_name, $hash_base)
-		or die_error(500, "Open git-blame failed");
+	$hash_base ||= git_get_head_hash();
+	my $hash_base_sha1 = git_get_sha1_or_die($hash_base, 'commit');
+	my %co = parse_commit($hash_base);
+	$hash ||= git_get_sha1_by_path($hash_base, $file_name, 'blob')
+	    or die_error(404, "Error looking up file: '$file_name'");
+	git_get_sha1_or_die($hash, 'blob');  # check existence
+	my @blame_lines = split "\n", $repo->cmd_output(
+		cmd => ['blame', '-p', '--', $file_name, $hash_base_sha1],
+	    cache => 2);
 	git_header_html();
 	my $formats_nav =
 		$cgi->a({-href => href(action=>"blob", -replay=>1)},
@@ -4188,7 +4574,7 @@ sub git_blame {
 HTML
 	my %metainfo = ();
 	while (1) {
-		$_ = <$fd>;
+		$_ = shift @blame_lines;
 		last unless defined $_;
 		my ($full_rev, $orig_lineno, $lineno, $group_size) =
 		    /^([0-9a-f]{40}) (\d+) (\d+)(?: (\d+))?$/;
@@ -4196,14 +4582,13 @@ HTML
 			$metainfo{$full_rev} = {};
 		}
 		my $meta = $metainfo{$full_rev};
-		while (<$fd>) {
+		while ($_ = shift @blame_lines) {
 			last if (s/^\t//);
 			if (/^(\S+) (.*)$/) {
 				$meta->{$1} = $2;
 			}
 		}
 		my $data = $_;
-		chomp $data;
 		my $rev = substr($full_rev, 0, 8);
 		my $author = $meta->{'author'};
 		my %date = parse_date($meta->{'author-time'},
@@ -4224,11 +4609,9 @@ HTML
 			              esc_html($rev));
 			print "</td>\n";
 		}
-		open (my $dd, "-|", git_cmd(), "rev-parse", "$full_rev^")
-			or die_error(500, "Open git-rev-parse failed");
-		my $parent_commit = <$dd>;
-		close $dd;
-		chomp($parent_commit);
+		# TODO: $parent_commit can be undef, in which case the
+		# link becomes invalid.
+		my $parent_commit = $repo->get_sha1("$full_rev^");
 		my $blamed = href(action => 'blame',
 		                  file_name => $meta->{'filename'},
 		                  hash_base => $parent_commit);
@@ -4243,13 +4626,11 @@ HTML
 	}
 	print "</table>\n";
 	print "</div>";
-	close $fd
-		or print "Reading blob failed\n";
 	git_footer_html();
 }
 
 sub git_tags {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	git_header_html();
 	git_print_page_nav('','', $head,undef,$head);
 	git_print_header_div('summary', $project);
@@ -4262,7 +4643,7 @@ sub git_tags {
 }
 
 sub git_heads {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	git_header_html();
 	git_print_page_nav('','', $head,undef,$head);
 	git_print_header_div('summary', $project);
@@ -4280,9 +4661,9 @@ sub git_blob_plain {
 
 	if (!defined $hash) {
 		if (defined $file_name) {
-			my $base = $hash_base || git_get_head_hash($project);
-			$hash = git_get_hash_by_path($base, $file_name, "blob")
-				or die_error(404, "Cannot find file");
+			my $base = $hash_base || git_get_head_hash();
+			$hash = git_get_sha1_by_path($base, $file_name, 'blob')
+			    or die_error(404, "Cannot find file: '$file_name'");
 		} else {
 			die_error(400, "No file name defined");
 		}
@@ -4291,11 +4672,15 @@ sub git_blob_plain {
 		$expires = "+1d";
 	}
 
-	open my $fd, "-|", git_cmd(), "cat-file", "blob", $hash
-		or die_error(500, "Open git-cat-file blob '$hash' failed");
+	my $sha1 = git_get_sha1_or_die($hash, 'blob');
+	my $blob_read = $repo->progressive_cmd_output(
+		cmd => ['cat-file', 'blob', $sha1], separator => undef,
+		cache => 1);
+	my $first_chunk = $blob_read->() || '';
 
 	# content-type (can include charset)
-	$type = blob_contenttype($fd, $file_name, $type);
+	$type = blob_contenttype(substr($first_chunk, 0, 1024),
+				 $file_name, $type);
 
 	# "save as" filename, even when no $file_name is given
 	my $save_as = "$hash";
@@ -4309,12 +4694,15 @@ sub git_blob_plain {
 		-type => $type,
 		-expires => $expires,
 		-content_disposition => 'inline; filename="' . $save_as . '"');
-	undef $/;
-	binmode STDOUT, ':raw';
-	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
-	$/ = "\n";
-	close $fd;
+	{
+		local $/;
+		binmode STDOUT, ':raw';
+		print $first_chunk;
+		while (my $chunk = $blob_read->()) {
+			print $chunk;
+		}
+		binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	}
 }
 
 sub git_blob {
@@ -4322,9 +4710,9 @@ sub git_blob {
 
 	if (!defined $hash) {
 		if (defined $file_name) {
-			my $base = $hash_base || git_get_head_hash($project);
-			$hash = git_get_hash_by_path($base, $file_name, "blob")
-				or die_error(404, "Cannot find file");
+			my $base = $hash_base || git_get_head_hash();
+			$hash = git_get_sha1_by_path($base, $file_name, 'blob')
+			    or die_error(404, "Cannot find file: '$file_name'");
 		} else {
 			die_error(400, "No file name defined");
 		}
@@ -4334,13 +4722,20 @@ sub git_blob {
 	}
 
 	my ($have_blame) = gitweb_check_feature('blame');
-	open my $fd, "-|", git_cmd(), "cat-file", "blob", $hash
-		or die_error(500, "Couldn't cat $file_name, $hash");
-	my $mimetype = blob_mimetype($fd, $file_name);
-	if ($mimetype !~ m!^(?:text/|image/(?:gif|png|jpeg)$)! && -B $fd) {
-		close $fd;
-		return git_blob_plain($mimetype);
-	}
+	my $sha1 = git_get_sha1_or_die($hash, 'blob');
+	my $blob_read = $repo->progressive_cmd_output(
+		cmd => ['cat-file', 'blob', $sha1], separator => "\n",
+		cache => 1);
+	my @first_lines;
+	for my $i (0..20) {
+		my $line = $blob_read->() or last;
+		push @first_lines, $line;
+	}
+	my $test_snippet = join("\n", @first_lines);
+	my $mimetype = blob_mimetype($test_snippet, $file_name);
+	return git_blob_plain($mimetype)
+	    if ($mimetype !~ m!^(?:text/|image/(?:gif|png|jpeg)$)! &&
+		! is_ascii_text($test_snippet));
 	# we can have blame only for text/* mimetype
 	$have_blame &&= ($mimetype =~ m!^text/!);
 
@@ -4389,7 +4784,7 @@ sub git_blob {
 		      qq!" />\n!;
 	} else {
 		my $nr;
-		while (my $line = <$fd>) {
+		while (my $line = shift @first_lines || $blob_read->()) {
 			chomp $line;
 			$nr++;
 			$line = untabify($line);
@@ -4397,29 +4792,23 @@ sub git_blob {
 			       $nr, $nr, $nr, esc_html($line, -nbsp=>1);
 		}
 	}
-	close $fd
-		or print "Reading blob failed.\n";
 	print "</div>";
 	git_footer_html();
 }
 
 sub git_tree {
-	if (!defined $hash_base) {
-		$hash_base = "HEAD";
-	}
+	$hash_base ||= "HEAD";
 	if (!defined $hash) {
 		if (defined $file_name) {
-			$hash = git_get_hash_by_path($hash_base, $file_name, "tree");
+			$hash = git_get_sha1_by_path($hash_base, $file_name, 'tree')
+			    or die_error(404, "Cannot find file: '$file_name'");
 		} else {
 			$hash = $hash_base;
 		}
 	}
-	$/ = "\0";
-	open my $fd, "-|", git_cmd(), "ls-tree", '-z', $hash
-		or die_error(500, "Open git-ls-tree failed");
-	my @entries = map { chomp; $_ } <$fd>;
-	close $fd or die_error(404, "Reading tree failed");
-	$/ = "\n";
+	my $sha1 = git_get_sha1_or_die($hash, 'tree');
+	my @entries = split "\0", $repo->cmd_output(
+		cmd => ['ls-tree', '-z', $sha1], cache => 2);
 
 	my $refs = git_get_references();
 	my $ref = format_ref_marker($refs, $hash_base);
@@ -4520,9 +4909,8 @@ sub git_snapshot {
 		die_error(403, "Unsupported snapshot format");
 	}
 
-	if (!defined $hash) {
-		$hash = git_get_head_hash($project);
-	}
+	$hash ||= git_get_head_hash();
+	my $sha1 = git_get_sha1_or_die($hash);
 
 	my $name = $project;
 	$name =~ s,([^/])/*\.git$,$1,;
@@ -4532,9 +4920,9 @@ sub git_snapshot {
 	my $cmd;
 	$filename .= "-$hash$known_snapshot_formats{$format}{'suffix'}";
 	$cmd = quote_command(
-		git_cmd(), 'archive',
+		'archive',
 		"--format=$known_snapshot_formats{$format}{'format'}",
-		"--prefix=$name/", $hash);
+		"--prefix=$name/", $sha1);
 	if (exists $known_snapshot_formats{$format}{'compressor'}) {
 		$cmd .= ' | ' . quote_command(@{$known_snapshot_formats{$format}{'compressor'}});
 	}
@@ -4544,16 +4932,17 @@ sub git_snapshot {
 		-content_disposition => 'inline; filename="' . "$filename" . '"',
 		-status => '200 OK');
 
-	open my $fd, "-|", $cmd
-		or die_error(500, "Execute git-archive failed");
+	my $snapshot_read = $repo->progressive_cmd_output(
+		cmd => $cmd, separator => undef, cache => 1);
 	binmode STDOUT, ':raw';
-	print <$fd>;
+	while (my $chunk = $snapshot_read->()) {
+		print $chunk;
+	}
 	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
-	close $fd;
 }
 
 sub git_log {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	if (!defined $hash) {
 		$hash = $head;
 	}
@@ -4613,8 +5002,8 @@ sub git_log {
 
 sub git_commit {
 	$hash ||= $hash_base || "HEAD";
-	my %co = parse_commit($hash)
-	    or die_error(404, "Unknown commit object");
+	my $sha1 = git_get_sha1_or_die($hash, 'commit');
+	my %co = parse_commit($hash);
 	my %ad = parse_date($co{'author_epoch'}, $co{'author_tz'});
 	my %cd = parse_date($co{'committer_epoch'}, $co{'committer_tz'});
 
@@ -4649,14 +5038,10 @@ sub git_commit {
 	if (!defined $parent) {
 		$parent = "--root";
 	}
-	my @difftree;
-	open my $fd, "-|", git_cmd(), "diff-tree", '-r', "--no-commit-id",
-		@diff_opts,
-		(@$parents <= 1 ? $parent : '-c'),
-		$hash, "--"
-		or die_error(500, "Open git-diff-tree failed");
-	@difftree = map { chomp; $_ } <$fd>;
-	close $fd or die_error(404, "Reading git-diff-tree failed");
+	my @difftree = split "\n", $repo->cmd_output(
+		cmd => ['diff-tree', '-r', '--no-commit-id', @diff_opts,
+			(@$parents <= 1 ? $parent : '-c'), $sha1, '--'],
+		cache => 2);
 
 	# non-textual hash id's can be cached
 	my $expires;
@@ -4743,41 +5128,16 @@ sub git_object {
 	# - hash_base and file_name
 	my $type;
 
-	# - hash or hash_base alone
 	if ($hash || ($hash_base && !defined $file_name)) {
-		my $object_id = $hash || $hash_base;
-
-		open my $fd, "-|", quote_command(
-			git_cmd(), 'cat-file', '-t', $object_id) . ' 2> /dev/null'
-			or die_error(404, "Object does not exist");
-		$type = <$fd>;
-		chomp $type;
-		close $fd
-			or die_error(404, "Object does not exist");
-
-	# - hash_base and file_name
+		# hash or hash_base alone
+		$type = (git_get_sha1_or_die($hash || $hash_base))[1];
 	} elsif ($hash_base && defined $file_name) {
+		# hash_base and file_name
 		$file_name =~ s,/+$,,;
-
-		system(git_cmd(), "cat-file", '-e', $hash_base) == 0
-			or die_error(404, "Base object does not exist");
-
-		# here errors should not hapen
-		open my $fd, "-|", git_cmd(), "ls-tree", $hash_base, "--", $file_name
-			or die_error(500, "Open git-ls-tree failed");
-		my $line = <$fd>;
-		close $fd;
-
-		#'100644 blob 0fa3f3a66fb6a137f6ec2c19351ed4d807070ffa	panic.c'
-		unless ($line && $line =~ m/^([0-9]+) (.+) ([0-9a-fA-F]{40})\t/) {
-			die_error(404, "File or directory for given base does not exist");
-		}
-		$type = $2;
-		$hash = $3;
+		($hash, $type) = git_get_sha1_or_die("$hash_base:$file_name");
 	} else {
 		die_error(400, "Not enough information to find object");
 	}
-
 	print $cgi->redirect(-uri => href(action=>$type, -full=>1,
 	                                  hash=>$hash, hash_base=>$hash_base,
 	                                  file_name=>$file_name),
@@ -4787,47 +5147,26 @@ sub git_object {
 sub git_blobdiff {
 	my $format = shift || 'html';
 
-	my $fd;
 	my @difftree;
 	my %diffinfo;
 	my $expires;
+	my $diff_read;
 
-	# preparing $fd and %diffinfo for git_patchset_body
+	# prepare $diff_read and %diffinfo for git_patchset_body
 	# new style URI
 	if (defined $hash_base && defined $hash_parent_base) {
-		if (defined $file_name) {
-			# read raw output
-			open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-				$hash_parent_base, $hash_base,
-				"--", (defined $file_parent ? $file_parent : ()), $file_name
-				or die_error(500, "Open git-diff-tree failed");
-			@difftree = map { chomp; $_ } <$fd>;
-			close $fd
-				or die_error(404, "Reading git-diff-tree failed");
-			@difftree
-				or die_error(404, "Blob diff not found");
-
-		} elsif (defined $hash &&
-		         $hash =~ /[0-9a-fA-F]{40}/) {
-			# try to find filename from $hash
-
-			# read filtered raw output
-			open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-				$hash_parent_base, $hash_base, "--"
-				or die_error(500, "Open git-diff-tree failed");
-			@difftree =
-				# ':100644 100644 03b21826... 3b93d5e7... M	ls-files.c'
-				# $hash == to_id
-				grep { /^:[0-7]{6} [0-7]{6} [0-9a-fA-F]{40} $hash/ }
-				map { chomp; $_ } <$fd>;
-			close $fd
-				or die_error(404, "Reading git-diff-tree failed");
-			@difftree
-				or die_error(404, "Blob diff not found");
-
-		} else {
-			die_error(400, "Missing one of the blob diff parameters");
-		}
+		my $commit_sha1 = git_get_sha1_or_die($hash_base, 'commit');
+		my $parent_base_sha1 = git_get_sha1_or_die($hash_parent_base, 'commit');
+		git_get_sha1_or_die($file_parent, 'blob') if $file_parent;
+		# There used to be code to handle hash (h) parameters,
+		# but it's not used (anymore), so we can require $file_name.
+		die_error(400, "No file name given") unless $file_name;
+		# read raw output
+		@difftree = split "\n", $repo->cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts, $parent_base_sha1,
+				$commit_sha1, '--', $file_parent || (),
+				$file_name],
+			cache => 2);
 
 		if (@difftree > 1) {
 			die_error(400, "Ambiguous blob diff specification");
@@ -4847,14 +5186,17 @@ sub git_blobdiff {
 		}
 
 		# open patch output
-		open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			'-p', ($format eq 'html' ? "--full-index" : ()),
-			$hash_parent_base, $hash_base,
-			"--", (defined $file_parent ? $file_parent : ()), $file_name
-			or die_error(500, "Open git-diff-tree failed");
+		# TODO: uncovered (?)
+		my $hash_base_sha1 = git_get_sha1_or_die($hash_base, 'commit');
+		$diff_read = $repo->progressive_cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts, '-p',
+				($format eq 'html' ? '--full-index' : ()),
+				$parent_base_sha1, $hash_base_sha1, '--',
+				$file_parent || (), $file_name],
+			separator => "\n", cache => 1);
 	}
 
-	# old/legacy style URI
+	# old/legacy style URI (still used in feed [Atom/RSS] view)
 	if (!%diffinfo && # if new style URI failed
 	    defined $hash && defined $hash_parent) {
 		# fake git-diff-tree raw output
@@ -4884,10 +5226,13 @@ sub git_blobdiff {
 		}
 
 		# open patch output
-		open $fd, "-|", git_cmd(), "diff", @diff_opts,
-			'-p', ($format eq 'html' ? "--full-index" : ()),
-			$hash_parent, $hash, "--"
-			or die_error(500, "Open git-diff failed");
+		my $parent_sha1 = git_get_sha1_or_die($hash_parent, 'blob');
+		my $sha1 = git_get_sha1_or_die($hash, 'commit');
+		$diff_read = $repo->progressive_cmd_output(
+			cmd => ['diff', @diff_opts, '-p',
+				($format eq 'html' ? '--full-index' : ()),
+				$parent_sha1, $sha1, '--'],
+			separator => "\n", cache => 1);
 	} else  {
 		die_error(400, "Missing one of the blob diff parameters")
 			unless %diffinfo;
@@ -4929,14 +5274,13 @@ sub git_blobdiff {
 	if ($format eq 'html') {
 		print "<div class=\"page_body\">\n";
 
-		git_patchset_body($fd, [ \%diffinfo ], $hash_base, $hash_parent_base);
-		close $fd;
+		git_patchset_body($diff_read, [ \%diffinfo ], $hash_base, $hash_parent_base);
 
 		print "</div>\n"; # class="page_body"
 		git_footer_html();
 
 	} else {
-		while (my $line = <$fd>) {
+		while (my $line = $diff_read->()) {
 			$line =~ s!a/($hash|$hash_parent)!'a/'.esc_path($diffinfo{'from_file'})!eg;
 			$line =~ s!b/($hash|$hash_parent)!'b/'.esc_path($diffinfo{'to_file'})!eg;
 
@@ -4944,9 +5288,9 @@ sub git_blobdiff {
 
 			last if $line =~ m!^\+\+\+!;
 		}
-		local $/ = undef;
-		print <$fd>;
-		close $fd;
+		while (my $line = $diff_read->()) {
+			print $line;
+		}
 	}
 }
 
@@ -4957,12 +5301,20 @@ sub git_blobdiff_plain {
 sub git_commitdiff {
 	my $format = shift || 'html';
 	$hash ||= $hash_base || "HEAD";
-	my %co = parse_commit($hash)
-	    or die_error(404, "Unknown commit object");
+	my $sha1 = git_get_sha1_or_die($hash, 'commit');
+	my %co = parse_commit($hash);
 
 	# choose format for commitdiff for merge
-	if (! defined $hash_parent && @{$co{'parents'}} > 1) {
-		$hash_parent = '--cc';
+	my $hash_parent_param = $hash_parent;
+	# Unfortunately we can pass in command line options as
+	# $hash_parent.
+	if ($hash_parent_param && $hash_parent_param ne '-c' &&
+	    $hash_parent_param ne '--cc') {
+		$hash_parent_param =
+		    git_get_sha1_or_die($hash_parent_param, 'commit');
+	}
+	if (! defined $hash_parent_param && @{$co{'parents'}} > 1) {
+		$hash_parent_param = '--cc';
 	}
 	# we need to prepare $formats_nav before almost any parameter munging
 	my $formats_nav;
@@ -4971,8 +5323,7 @@ sub git_commitdiff {
 			$cgi->a({-href => href(action=>"commitdiff_plain", -replay=>1)},
 			        "raw");
 
-		if (defined $hash_parent &&
-		    $hash_parent ne '-c' && $hash_parent ne '--cc') {
+		if (defined $hash_parent) {
 			# commitdiff with two commits given
 			my $hash_parent_short = $hash_parent;
 			if ($hash_parent =~ m/^[0-9a-fA-F]{40}$/) {
@@ -5004,7 +5355,7 @@ sub git_commitdiff {
 				')';
 		} else {
 			# merge commit
-			if ($hash_parent eq '--cc') {
+			if ($hash_parent && $hash_parent eq '--cc') {
 				$formats_nav .= ' | ' .
 					$cgi->a({-href => href(action=>"commitdiff",
 					                       hash=>$hash, hash_parent=>'-c')},
@@ -5026,7 +5377,6 @@ sub git_commitdiff {
 		}
 	}
 
-	my $hash_parent_param = $hash_parent;
 	if (!defined $hash_parent_param) {
 		# --cc for multiple parents, --root for parentless
 		$hash_parent_param =
@@ -5034,28 +5384,26 @@ sub git_commitdiff {
 	}
 
 	# read commitdiff
-	my $fd;
+	my $diff_read;
 	my @difftree;
 	if ($format eq 'html') {
-		open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			"--no-commit-id", "--patch-with-raw", "--full-index",
-			$hash_parent_param, $hash, "--"
-			or die_error(500, "Open git-diff-tree failed");
-
-		while (my $line = <$fd>) {
+		$diff_read = $repo->progressive_cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts, '--no-commit-id',
+				'--patch-with-raw', '--full-index',
+				$hash_parent_param, $sha1, '--'],
+			separator => "\n", cache => 1);
+		while (my $line = $diff_read->()) {
 			chomp $line;
 			# empty line ends raw part of diff-tree output
 			last unless $line;
 			push @difftree, scalar parse_difftree_raw_line($line);
 		}
-
-	} elsif ($format eq 'plain') {
-		open $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			'-p', $hash_parent_param, $hash, "--"
-			or die_error(500, "Open git-diff-tree failed");
-
 	} else {
-		die_error(400, "Unknown commitdiff format");
+		die unless $format eq 'plain';
+		$diff_read = $repo->progressive_cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts, '-p',
+				$hash_parent_param, $sha1, '--'],
+			separator => "\n", cache => 1);
 	}
 
 	# non-textual hash id's can be cached
@@ -5081,8 +5429,7 @@ sub git_commitdiff {
 		}
 
 	} elsif ($format eq 'plain') {
-		my $refs = git_get_references("tags");
-		my $tagname = git_get_rev_name_tags($hash);
+		my $tagname = git_get_rev_name_tags($sha1);
 		my $filename = basename($project) . "-$hash.patch";
 
 		print $cgi->header(
@@ -5112,17 +5459,15 @@ sub git_commitdiff {
 		                  $use_parents ? @{$co{'parents'}} : $hash_parent);
 		print "<br/>\n";
 
-		git_patchset_body($fd, \@difftree, $hash,
+		git_patchset_body($diff_read, \@difftree, $hash,
 		                  $use_parents ? @{$co{'parents'}} : $hash_parent);
-		close $fd;
 		print "</div>\n"; # class="page_body"
 		git_footer_html();
 
 	} elsif ($format eq 'plain') {
-		local $/ = undef;
-		print <$fd>;
-		close $fd
-			or print "Reading git-diff-tree failed\n";
+		while (my $line = $diff_read->()) {
+			print $line;
+		}
 	}
 }
 
@@ -5131,37 +5476,29 @@ sub git_commitdiff_plain {
 }
 
 sub git_history {
-	if (!defined $hash_base) {
-		$hash_base = git_get_head_hash($project);
-	}
-	if (!defined $page) {
-		$page = 0;
-	}
+	$hash_base ||= git_get_head_hash();
+	$page ||= 0;
 	my $ftype;
-	my %co = parse_commit($hash_base)
-	    or die_error(404, "Unknown commit object");
+	my %co = parse_commit($hash_base);
 
 	my $refs = git_get_references();
 	my $limit = sprintf("--max-count=%i", (100 * ($page+1)));
 
 	my @commitlist = parse_commits($hash_base, 101, (100 * $page),
-	                               $file_name, "--full-history")
-	    or die_error(404, "No such file or directory on given branch");
+	                               $file_name, "--full-history");
 
 	if (!defined $hash && defined $file_name) {
 		# some commits could have deleted file in question,
 		# and not have it in tree, but one of them has to have it
 		for (my $i = 0; $i <= @commitlist; $i++) {
-			$hash = git_get_hash_by_path($commitlist[$i]{'id'}, $file_name);
+			$hash = git_get_sha1_by_path($commitlist[$i]{'id'}, $file_name);
 			last if defined $hash;
 		}
 	}
 	if (defined $hash) {
+		git_get_sha1_or_die($hash);
 		$ftype = git_get_type($hash);
 	}
-	if (!defined $ftype) {
-		die_error(500, "Unknown type of object");
-	}
 
 	my $paging_nav = '';
 	if ($page > 0) {
@@ -5202,13 +5539,8 @@ sub git_search {
 	if (!defined $searchtext) {
 		die_error(400, "Text field is empty");
 	}
-	if (!defined $hash) {
-		$hash = git_get_head_hash($project);
-	}
+	$hash ||= git_get_head_hash();
 	my %co = parse_commit($hash);
-	if (!%co) {
-		die_error(404, "Unknown commit object");
-	}
 	if (!defined $page) {
 		$page = 0;
 	}
@@ -5280,12 +5612,15 @@ sub git_search {
 		print "<table class=\"pickaxe search\">\n";
 		my $alternate = 1;
 		$/ = "\n";
-		open my $fd, '-|', git_cmd(), '--no-pager', 'log', @diff_opts,
-			'--pretty=format:%H', '--no-abbrev', '--raw', "-S$searchtext",
-			($search_use_regexp ? '--pickaxe-regex' : ());
+		my $pickaxe_read = $repo->progressive_cmd_output(
+			cmd => ['log', @diff_opts, '--pretty=format:%H',
+				'--no-abbrev', '--raw', "-S$searchtext",
+				($search_use_regexp ? '--pickaxe-regex' : ()),
+				git_get_head_hash()],
+			separator => "\n", cache => 1);
 		undef %co;
 		my @files;
-		while (my $line = <$fd>) {
+		while (my $line = $pickaxe_read->()) {
 			chomp $line;
 			next unless $line;
 
@@ -5326,7 +5661,6 @@ sub git_search {
 				      "<br/>\n";
 			}
 		}
-		close $fd;
 
 		# finish last commit (warning: repetition!)
 		if (%co) {
@@ -5349,12 +5683,13 @@ sub git_search {
 		print "<table class=\"grep_search\">\n";
 		my $alternate = 1;
 		my $matches = 0;
-		$/ = "\n";
-		open my $fd, "-|", git_cmd(), 'grep', '-n',
-			$search_use_regexp ? ('-E', '-i') : '-F',
-			$searchtext, $co{'tree'};
+		my $grep_read = $repo->progressive_cmd_output(
+			cmd => ['grep', '-n',
+				$search_use_regexp ? ('-E', '-i') : '-F',
+				$searchtext, $co{'tree'}],
+			separator => "\n", cache => 1);
 		my $lastfile = '';
-		while (my $line = <$fd>) {
+		while (my $line = $grep_read->()) {
 			chomp $line;
 			my ($file, $lno, $ltext, $binary);
 			last if ($matches++ > 1000);
@@ -5406,7 +5741,6 @@ sub git_search {
 		} else {
 			print "<div class=\"diff nodifferences\">No matches found</div>\n";
 		}
-		close $fd;
 
 		print "</table>\n";
 	}
@@ -5458,7 +5792,7 @@ EOT
 }
 
 sub git_shortlog {
-	my $head = git_get_head_hash($project);
+	my $head = git_get_head_hash();
 	if (!defined $hash) {
 		$hash = $head;
 	}
@@ -5500,33 +5834,22 @@ sub git_feed {
 	}
 
 	# log/feed of current (HEAD) branch, log of given branch, history of file/directory
-	my $head = $hash || 'HEAD';
-	my @commitlist = parse_commits($head, 150, 0, $file_name);
+	my $head = $hash || $repo->get_sha1('HEAD');  # can be undef
+	my %last_modified = parse_date($repo->get_last_modification);
 
-	my %latest_commit;
-	my %latest_date;
 	my $content_type = "application/$format+xml";
 	if (defined $cgi->http('HTTP_ACCEPT') &&
 		 $cgi->Accept('text/xml') > $cgi->Accept($content_type)) {
 		# browser (feed reader) prefers text/xml
 		$content_type = 'text/xml';
 	}
-	if (defined($commitlist[0])) {
-		%latest_commit = %{$commitlist[0]};
-		%latest_date   = parse_date($latest_commit{'author_epoch'});
-		print $cgi->header(
-			-type => $content_type,
-			-charset => 'utf-8',
-			-last_modified => $latest_date{'rfc2822'});
-	} else {
-		print $cgi->header(
-			-type => $content_type,
-			-charset => 'utf-8');
-	}
+	print $cgi->header(
+		-type => $content_type, -charset => 'utf-8',
+		-last_modified => $last_modified{'rfc2822'});
 
 	# Optimization: skip generating the body if client asks only
 	# for Last-Modified date.
-	return if ($cgi->request_method() eq 'HEAD');
+	return if $cgi->request_method() && $cgi->request_method() eq 'HEAD';
 
 	# header variables
 	my $title = "$site_name - $project/$action";
@@ -5593,15 +5916,11 @@ XML
 			# not twice as wide as tall: 72 x 27 pixels
 			print "<logo>" . esc_url($logo) . "</logo>\n";
 		}
-		if (! %latest_date) {
-			# dummy date to keep the feed valid until commits trickle in:
-			print "<updated>1970-01-01T00:00:00Z</updated>\n";
-		} else {
-			print "<updated>$latest_date{'iso-8601'}</updated>\n";
-		}
+		print "<updated>$last_modified{'iso-8601'}</updated>\n";
 	}
 
 	# contents
+	my @commitlist = parse_commits($head, 150, 0, $file_name) if $head;
 	for (my $i = 0; $i <= $#commitlist; $i++) {
 		my %co = %{$commitlist[$i]};
 		my $commit = $co{'id'};
@@ -5612,13 +5931,11 @@ XML
 		my %cd = parse_date($co{'author_epoch'});
 
 		# get list of changed files
-		open my $fd, "-|", git_cmd(), "diff-tree", '-r', @diff_opts,
-			$co{'parent'} || "--root",
-			$co{'id'}, "--", (defined $file_name ? $file_name : ())
-			or next;
-		my @difftree = map { chomp; $_ } <$fd>;
-		close $fd
-			or next;
+		my @difftree = split "\n", $repo->cmd_output(
+			cmd => ['diff-tree', '-r', @diff_opts,
+				$co{'parent'} || '--root', $co{'id'}, '--',
+				(defined $file_name ? $file_name : ())],
+			cache => 2);
 
 		# print element (entry, item)
 		my $co_url = href(-full=>1, action=>"commitdiff", hash=>$commit);
@@ -5733,16 +6050,8 @@ XML
 
 	foreach my $pr (@list) {
 		my %proj = %$pr;
-		my $head = git_get_head_hash($proj{'path'});
-		if (!defined $head) {
-			next;
-		}
-		$git_dir = "$projectroot/$proj{'path'}";
-		my %co = parse_commit($head);
-		if (!%co) {
-			next;
-		}
-
+		next unless $repo_root->repo(repo_dir => $proj{'path'})
+		    ->get_sha1('HEAD');
 		my $path = esc_html(chop_str($proj{'path'}, 25, 5));
 		my $rss  = "$my_url?p=$proj{'path'};a=rss";
 		my $html = "$my_url?p=$proj{'path'};a=summary";
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 46ba19b..8edbfbd 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -54,7 +54,7 @@ gitweb_run () {
 	# written to web server logs, so we are not interested in that:
 	# we are interested only in properly formatted errors/warnings
 	rm -f gitweb.log &&
-	perl -- "$TEST_DIRECTORY/../gitweb/gitweb.perl" \
+	"$PERL_PATH" -- "$TEST_DIRECTORY/../gitweb/gitweb.perl" \
 		>/dev/null 2>gitweb.log &&
 	if grep -q -s "^[[]" gitweb.log >/dev/null; then false; else true; fi
 
@@ -71,7 +71,7 @@ safe_chmod () {
 
 . ./test-lib.sh
 
-perl -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
+"$PERL_PATH" -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
     test_expect_success 'skipping gitweb tests, perl version is too old' :
     test_done
     exit
diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
index 53f2a8a..c0558e5 100755
--- a/t/t9503-gitweb-Mechanize.sh
+++ b/t/t9503-gitweb-Mechanize.sh
@@ -89,6 +89,16 @@ test_expect_success 'set up test repository' '
 	test_tick && git pull . b
 '
 
+# set up empty repository
+# create this as a subdirectory of trash directory; not pretty, but simple
+test_expect_success 'set up empty repository' '
+
+	mkdir empty.git &&
+	cd empty.git &&
+	git init --bare &&
+	cd ..
+'
+
 # set up gitweb configuration
 safe_pwd="$("$PERL_PATH" -MPOSIX=getcwd -e 'print quotemeta(getcwd)')"
 large_cache_root="../t9503/large_cache.tmp"
diff --git a/t/t9503/test.pl b/t/t9503/test.pl
index 93108e7..fca81f7 100755
--- a/t/t9503/test.pl
+++ b/t/t9503/test.pl
@@ -271,6 +271,11 @@ if (test_page '', 'project list (implicit)') {
 		"title contains $site_name");
 	$mech->content_contains('t9503-gitweb-Mechanize test repository',
 		'lists test repository (by description)');
+	if (follow_link( { text => 'empty.git' }, 'empty git repository')) {
+		# Just check that the empty.git summary page is linked
+		# and doesn't die.
+		$mech->back;
+	}
 }
 
 
@@ -578,18 +583,12 @@ for my $action qw( blame blob blob_plain ) {
 		  "$action: look up existent file");
 	test_page("?p=.git;a=$action;f=does_not_exist;hb=HEAD",
 		  "$action: look up non-existent file", 404);
-	TODO: {
-		local $TODO = 'wrong error code (but using Git::Repo will fix this)';
-		test_page("?p=.git;a=$action;f=$directories[0];hb=HEAD",
-			  "$action: look up directory", 400);
-	}
-}
-TODO: {
-	local $TODO = 'wrong error code (but using Git::Repo will fix this)';
-	test_page("?p=.git;a=tree;f=$files[0];hb=HEAD",
-		  'tree: look up existent file', 400);
+	test_page("?p=.git;a=$action;f=$directories[0];hb=HEAD",
+		  "$action: look up directory", 400);
 }
 # Pass valid and invalid paths to tree action
+test_page("?p=.git;a=tree;f=$files[0];hb=HEAD",
+	  'tree: look up existent file', 400);
 test_page("?p=.git;a=tree;f=does_not_exist;hb=HEAD",
 	  'tree: look up non-existent file', 404);
 test_page("?p=.git;a=tree;f=$directories[0];hb=HEAD",
-- 
1.6.0.90.gf3f7b


* Re: [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI
  2008-08-18 19:39   ` [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
@ 2008-08-19  1:17     ` Junio C Hamano
  2008-08-19 14:37       ` Lea Wiemann
  0 siblings, 1 reply; 63+ messages in thread
From: Junio C Hamano @ 2008-08-19  1:17 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann <lewiemann@gmail.com> writes:

> This test uses Test::WWW::Mechanize::CGI to check gitweb's output.  It
> also uses HTML::Lint, XML::Parser, and Archive::Tar (if present, each)
> to validate the HTML/XML/tgz output, and checks all links on the
> tested pages if --long-tests is given.
>
> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
> Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>

This s-o-b chain is a bit confusing; was this authored by you or Jakub?

> diff --git a/Makefile b/Makefile
> index ca418fc..35779a7 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1289,6 +1289,7 @@ GIT-CFLAGS: .FORCE-GIT-CFLAGS
>  GIT-BUILD-OPTIONS: .FORCE-GIT-BUILD-OPTIONS
>  	@echo SHELL_PATH=\''$(subst ','\'',$(SHELL_PATH_SQ))'\' >$@
>  	@echo TAR=\''$(subst ','\'',$(subst ','\'',$(TAR)))'\' >>$@
> +	@echo PERL_PATH=\''$(subst ','\'',$(PERL_PATH_SQ))'\' >>$@
>  
>  ### Detect Tck/Tk interpreter path changes
>  ifndef NO_TCLTK
> diff --git a/t/t9503-gitweb-Mechanize.sh b/t/t9503-gitweb-Mechanize.sh
> new file mode 100755
> index 0000000..53f2a8a
> --- /dev/null
> +++ b/t/t9503-gitweb-Mechanize.sh
> @@ -0,0 +1,144 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2008 Jakub Narebski
> +# Copyright (c) 2008 Lea Wiemann
> +#
> +
> +# This test supports the --long-tests option.
> +
> +# This test only runs on Perl 5.8 and later versions, since
> +# Test::WWW::Mechanize::CGI requires Perl 5.8.
> +
> +test_description='gitweb tests (using WWW::Mechanize)
> +
> +This test uses Test::WWW::Mechanize::CGI to test gitweb.'
> +
> +# helper functions
> +
> +safe_chmod () {
> +	chmod "$1" "$2" &&
> +	if [ "$(git config --get core.filemode)" = false ]
> +	then
> +		git update-index --chmod="$1" "$2"
> +	fi
> +}

You have this in t9500 as well.  Perhaps it can go to test-lib?

> +. ./test-lib.sh
> +
> +# check if test can be run
> +"$PERL_PATH" -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
> +	test_expect_success \
> +		'skipping gitweb tests, perl version is too old' :
> +	test_done
> +	exit
> +}

It may be helpful to say what exactly is lacking (either "Please upgrade
your Perl to 5.8", or "We want decode_utf8() that can CROAK").

> +"$PERL_PATH" -MTest::WWW::Mechanize::CGI -e '' >/dev/null 2>&1 || {
> +	test_expect_success \
> +		'skipping gitweb tests, Test::WWW::Mechanize::CGI not found' :
> +	test_done
> +	exit
> +}

This one is better than the previous one.  t3300, t4000, t5540, t9113,
t9113, t9600, and t9700 use "say" (or say_color), t3902, t4016, t5000, and
t7004 just use "echo", and t9200, t9400, t9401 and t9500 do this phoney
"success".  We should standardize these by introducing "test_stop_early
$msg".  Then we can lose test_done and exit from these places.

> +# set up test repository
> +test_expect_success 'set up test repository' '
> ...
> +	test_tick && git pull . b
> +'

That "pull . b" is somewhat old fashioned, but is Ok.

> +# set up gitweb configuration
> +safe_pwd="$("$PERL_PATH" -MPOSIX=getcwd -e 'print quotemeta(getcwd)')"
> +large_cache_root="../t9503/large_cache.tmp"

Please use $TEST_DIRECTORY without relying on the location of "t/trash
directory"; it was painful to fix all of them.

> +test_expect_success 'create file cache directory' \
> +	'mkdir -p "$large_cache_root"'
> +cat >gitweb_config.perl <<EOF
> +# gitweb configuration for tests
> ...
> +our @stylesheets = ("file:///$safe_pwd/../../gitweb/gitweb.css");
> +our \$logo = "file:///$safe_pwd/../../gitweb/git-logo.png";
> +our \$favicon = "file:///$safe_pwd/../../gitweb/git-favicon.png";

These also assume "t/trash directory" not being "t/trash/t9503".

> +test_external \
> +	'test gitweb output' \
> +	"$PERL_PATH" ../t9503/test.pl

So does this, and you have catfile('..', '..', ...) in the perl part of
this test.

> +# Search form
> +
> +# Search commit
> +if (get_summary && $mech->submit_form_ok(
> +	    { form_number => 1, fields => { 's' => 'Initial' } },
> +	    'submit search form (default: commit search)')) {
> +	check_page;
> +	$mech->content_contains('Initial commit',
> +				'content contains commit we searched for');
> +}
> +
> +# Pickaxe
> +if (get_summary && $mech->submit_form_ok(
> +	    { form_number => 1, fields => { 's' => 'pickaxe test string',
> +					    'st' => 'pickaxe' } },
> +	    'submit search form (pickaxe)')) {
> +	check_page;
> +	test_link( { text => 'dir1/file1' }, 'file found with pickaxe' );
> +	$mech->content_contains('A U Thor', 'commit author mentioned');
> +}
> +
> +# Grep
> +# Let's hope the pickaxe test string is still present in HEAD.
> +if (get_summary && $mech->submit_form_ok(
> +	    { form_number => 1, fields => { 's' => 'pickaxe test string',
> +					    'st' => 'grep' } },
> +	    'submit search form (grep)')) {
> +	check_page;
> +	test_link( { text => 'dir1/file1' }, 'file found with grep' );
> +}

With these search oriented tests, making sure that you would find what you
expect to find is obviously important, but shouldn't you be also making
sure that irrelevant entries are not found?

It is great that there are tests for each view we care about, even though
the way the individual views are tested looks somewhat sketchy.


* Re: [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot
  2008-08-18 19:39   ` [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot Lea Wiemann
@ 2008-08-19  1:32     ` Junio C Hamano
  2008-08-19 15:06       ` Lea Wiemann
  2008-08-19 13:51     ` Lea Wiemann
  1 sibling, 1 reply; 63+ messages in thread
From: Junio C Hamano @ 2008-08-19  1:32 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Lea Wiemann <lewiemann@gmail.com> writes:

> +=head1 DESCRIPTION
> +...
> +The author, committer and message methods return Unicoded strings,

Unicoded?

> +decoded according to the "encoding" header, with UTF-8 and then
> +Latin-1 as fallbacks.  (These Unicode strings can contain code points
> +...
> +decoding for you, so you should not normally need this method.

Anyway, nicely written description.

> +=back
> +
> +=cut
> +
> +
> +sub tree {
> +	my $self = shift;
> +	$self->_load;
> +	return $self->{tree};
> +}
> +
> +sub parents {
> +	my $self = shift;
> +	$self->_load;
> +	return map { ref($self)->new($self->repo, $_) } @{$self->{parents}};
> +}
> +
> +sub author {
> +	my $self = shift;
> +	$self->_load;
> +	return $self->_decode($self->{author});
> +}
> +
> +sub committer {
> +	my $self = shift;
> +	$self->_load;
> +	return $self->_decode($self->{committer});
> +}
> +
> +sub message {
> +	my $self = shift;
> +	$self->_load;
> +	return $self->_decode($self->{message});
> +}
> +
> +sub encoding {
> +	my $self = shift;
> +	$self->_load;
> +	return $self->{encoding};
> +}
> +
> +# Auxiliary method to load (and parse) the commit object from the
> +# repository if it hasn't already been loaded.  Optional parameter:
> +# The raw contents of the commit object; the commit object will be
> +# retrieved from the repository if that parameter is not given.
> +sub _load {
> +	my ($self, $raw_text) = @_;
> +	return if exists $self->{message};  # already loaded
> +
> +	my $sha1 = $self->sha1;
> +	if (!defined $raw_text) {
> +		# Retrieve from the repository.
> +		(my $type, $raw_text) = $self->repo->get_object($sha1);
> +		die "$sha1 is a $type object (expected a commit object)"
> +		    unless $type eq 'commit';
> +	}
> +
> +	(my $header, $self->{message}) = split "\n\n", $raw_text, 2;
> +	# Parse header.
> +	for my $line (split "\n", $header) {
> +		local $/ = "\n"; # for chomp
> +		chomp($line);
> +		my ($key, $value) = split ' ', $line, 2;
> +		if ($key eq 'tree') {
> +			$self->{tree} = $value;
> +		} elsif ($key eq 'parent') {
> +			push @{$self->{parents}}, $value;
> +		} elsif ($key eq 'author') {
> +			$self->{author} = $value;
> +		} elsif ($key eq 'committer') {
> +			$self->{committer} = $value;
> +		} elsif ($key eq 'encoding') {
> +			$self->{encoding} = $value;
> +		} else {
> +			# Ignore unrecognized header lines.
> +		}
> +	}
> +	undef;
> +}

Aside from seeming repetitive and quite similar to Git::Tag::_load(), I
have to wonder how parent rewriting and grafts come into the picture.
Git::Repo::get_object is just a cat-file which means you are ignoring any
grafts.  As a design decision it is fine, but it needs to be documented.
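
On the repetition between the two _load methods, here is a minimal
sketch of what shared header parsing might look like.  It assumes a
hypothetical parse_object_headers helper that is not part of the patch;
the caller names the headers that may repeat (such as 'parent'), and
the commit text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical shared helper (not part of the patch): split a raw
# commit or tag object into headers and message.  Keys listed in
# $multi are collected into array refs; all others keep one value.
sub parse_object_headers {
	my ($raw_text, $multi) = @_;
	my ($header, $message) = split "\n\n", $raw_text, 2;
	my %multi = map { $_ => 1 } @{ $multi || [] };
	my %headers;
	for my $line (split "\n", $header) {
		my ($key, $value) = split ' ', $line, 2;
		if ($multi{$key}) {
			push @{ $headers{$key} }, $value;
		} else {
			$headers{$key} = $value;
		}
	}
	return (\%headers, $message);
}

# Made-up commit text, for illustration only.
my $raw = join "\n",
	'tree ' . ('1' x 40),
	'parent ' . ('2' x 40),
	'parent ' . ('3' x 40),
	'author A U Thor <author@example.com> 1218000000 +0000',
	'',
	"Merge branch 'b'\n";
my ($headers, $message) = parse_object_headers($raw, ['parent']);
print scalar @{ $headers->{parent} }, " parents\n";
```

Whether this is cleaner than two separate loops is a judgment call; it
only shows that the array-valued 'parent' header need not rule out
sharing the loop.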

Also if you run "per path history" using "rev-list $head -- $path" to
collect commits that touch the named $path, you might want to get
rewritten parents for each commit and use it in the presentation, but it
is something you cannot cache easily (i.e. you should not be reusing the
parent list rewritten with respect to other paths).
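
To illustrate the caching caveat, here is a rough sketch that assumes a
hypothetical in-memory cache (none of these names exist in the patch).
The path limiter is made part of the cache key, so a parent list
rewritten for one path cannot be handed out for another.

```perl
use strict;
use warnings;

# Hypothetical cache of rewritten parent lists, for illustration only.
my %rewritten_parents;

sub parents_cache_key {
	my ($sha1, $path) = @_;
	# "\0" cannot occur in a SHA1 or in a path, so the key is unambiguous.
	return join "\0", $sha1, $path;
}

sub remember_parents {
	my ($sha1, $path, @parents) = @_;
	$rewritten_parents{ parents_cache_key($sha1, $path) } = [@parents];
	return;
}

# Returns an array ref, or undef if nothing is cached for this path.
sub cached_parents {
	my ($sha1, $path) = @_;
	return $rewritten_parents{ parents_cache_key($sha1, $path) };
}

# Made-up SHA1s: a list stored for one path is not found under another.
my $commit = 'c' x 40;
remember_parents($commit, 'dir1/file1', 'a' x 40);
my $other = cached_parents($commit, 'dir2/file2');
print defined $other ? "hit\n" : "miss\n";
```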

The tests looked fine, too.


* Re: [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot
  2008-08-18 19:39   ` [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot Lea Wiemann
  2008-08-19  1:32     ` Junio C Hamano
@ 2008-08-19 13:51     ` Lea Wiemann
  1 sibling, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-08-19 13:51 UTC (permalink / raw)
  To: Lea Wiemann; +Cc: git

Since I didn't mention it: The patch series applies on next.


Lea Wiemann wrote:
> +Git::Repo - Read-only access to the Git repositories.
> +
> +Error handling is simple: On a consistent repository, the Perl
> +interface will never die.  You can use the get_sha1 method to resolve
> +arbitrary object names or check the existence of SHA1 hashes; get_sha1
> +will return undef if the object does not exist in the repository.  Any
> +SHA1 that is returned by get_sha1 can be safely passed to the other
> +Git::Repo methods.

Here's some elaboration on the rationale behind the error handling.  As
a reminder, what we do is force developers to resolve object identifiers
(such as HEAD^) into SHA1s first, rather than allowing them to pass in
arbitrary object identifiers into functions.  Here's why:

a) There's really just one point where errors can occur: at the input
boundary (like the command line).  Hence, you usually need one to three
get_sha1 calls to resolve your input object names, and the rest of your
program will be error-handling free with regard to Git::Repo (that is,
if it dies it's either a bug or an error in the repository structure).

On the other hand, if you don't have such an explicit error-checking
boundary at the beginning of your program (where you resolve all
identifiers), you basically allow invalid object identifiers to "creep"
into your code.  For instance, gitweb oftentimes would have statements
like "or die 'ls-tree failed'" or "or die 'commit not found'" deep
inside a function -- in some cases, I found out that these failures
*could* not even happen since the objects were guaranteed to exist by
earlier calls (and hence it was basically dead code); and in many cases
the error messages were simply nondescript -- which brings me to the
next point:

b) Error reporting is really hard to implement: For instance, if
diff-tree returns non-zero, then unless you scrape its STDERR, you can't
tell which of the two to-be-diffed objects didn't exist (or had the
wrong type), and hence you're oftentimes stuck with a generic 'diff-tree
failed' message.  In other words, if a simple diff-tree call goes wrong,
there are three possible causes: (1) The left object is invalid, (2) the
right object is invalid, (3) something fatal happened (bug or repository
breakage).  Distinguishing these cases is hard, and moving the
object-resolving code to the beginning of the API user's program means
that diff-tree failure can only indicate case (3).

c) The error messages you could get from an API are not usually what you
want anyway.  So if you write

     my $diff = diff_tree($obj_identifier_1, $obj_identifier_2)

and expect it to die with a descriptive error message if one of the two
identifiers doesn't point to a valid tree object, the best your error
message can possibly be is "git diff-tree: HEAD^:foo/bar is not a valid
tree object".  Which leaves users puzzled because (1) they didn't call
diff-tree, and (2) the program might have constructed the
"HEAD^:foo/bar" string, and they only passed in parts of it, so it's not
clear to them where "HEAD^:foo/bar" comes from.

d) Last but not least, using exceptions to communicate errors is against
Perl coding conventions -- I would even have to look up the syntax to
check for a specific exception type after an 'eval' block, because I
simply never needed it.

So I think those four reasons really outweigh the extra work of
having to make 1-3 get_sha1 calls at the beginning of your program (and
providing proper error messages if they fail).
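
A minimal sketch of this pattern follows.  It uses a made-up FakeRepo
class that only mimics the get_sha1 contract described above (it is not
Git::Repo itself), and resolve_inputs is a hypothetical helper.  All
object names are resolved once at the input boundary, and any error
message refers to the input the user actually supplied.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Git::Repo: get_sha1 returns a SHA1 string,
# or undef if the name cannot be resolved.  The name table is made up.
package FakeRepo;

sub new {
	my ($class) = @_;
	my $self = { names => { 'HEAD' => 'a' x 40, 'HEAD^' => 'b' x 40 } };
	return bless $self, $class;
}

sub get_sha1 {
	my ($self, $name) = @_;
	return $self->{names}{$name};
}

package main;

# Resolve every user-supplied name up front and record which one failed.
sub resolve_inputs {
	my ($repo, %inputs) = @_;
	my (%sha1, @errors);
	for my $label (sort keys %inputs) {
		my $resolved = $repo->get_sha1($inputs{$label});
		if (defined $resolved) {
			$sha1{$label} = $resolved;
		} else {
			push @errors, "$label: '$inputs{$label}' does not name an object";
		}
	}
	return (\%sha1, \@errors);
}

my $repo = FakeRepo->new;
my ($sha1, $errors) = resolve_inputs($repo,
	from => 'HEAD^', to => 'no-such-branch');
if (@$errors) {
	print "error: $_\n" for @$errors;
} else {
	# From here on, only resolved SHA1s are passed around.
	print "diffing $sha1->{from} against $sha1->{to}\n";
}
```

Past this point the program only handles SHA1s that are known to exist,
so a later failure would point to a bug or a broken repository rather
than to bad input.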


* Re: [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI
  2008-08-19  1:17     ` Junio C Hamano
@ 2008-08-19 14:37       ` Lea Wiemann
  0 siblings, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-08-19 14:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Lea Wiemann <lewiemann@gmail.com> writes:
> 
>> Signed-off-by: Jakub Narebski <jnareb@gmail.com>
>> Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
> 
> This s-o-b chain is a bit confusing; was this authored by you or Jakub?

Jakub started it, I extended it.  Should we have different SOB lines?

>> +safe_chmod () {
>> +	chmod "$1" "$2" &&
>> +	if [ "$(git config --get core.filemode)" = false ]
>> +	then
>> +		git update-index --chmod="$1" "$2"
>> +	fi
>> +}
> 
> You have this in t9500 as well.  Perhaps it can go to test-lib?

Will do in the next version of this patch.

>> +# check if test can be run
>> +"$PERL_PATH" -MEncode -e 'decode_utf8("", Encode::FB_CROAK)' >/dev/null 2>&1 || {
>> +	test_expect_success \
>> +		'skipping gitweb tests, perl version is too old' :
> 
> It may be helpful to say what exactly is lacking

Right.  Since Encode doesn't run on older Perl versions anyway, I'm
changing it to

"$PERL_PATH" -e 'use 5.008' >/dev/null 2>&1 || {
	test_expect_success \
		'skipping gitweb tests, Perl 5.8 or newer required' :

> t3300, t4000, t5540, t9113,
> t9113, t9600, and t9700 use "say" (or say_color), t3902, t4016, t5000, and
> t7004 just use "echo", and t9200, t9400, t9401 and t9500 do this phoney
> "success".  We should standardize these by introducing "test_stop_early
> $msg".

Yup; maybe "test_skip_all" is clearer though.  I think this should be
done in a separate patch.

>> +	test_tick && git pull . b
> 
> That "pull . b" is somewhat old fashioned, but is Ok.

Is "git merge b" equivalent?  (The test still passes with it.)

>> +large_cache_root="../t9503/large_cache.tmp"
> 
> Please use $TEST_DIRECTORY without relying on the location of "t/trash
> directory"; it was painful to fix all of them.

Ok, fixed all of those.  I'll also move the cache-setup code to patch 3
(gitweb caching), since it doesn't belong here as long as caching isn't
implemented.

>> +# Grep
> 
> With these search oriented tests, making sure that you would find what you
> expect to find is obviously important, but shouldn't you be also making
> sure that irrelevant entries are not found?

Technically yes, but I'm not inclined at the moment to write that test
(at least while I'm not hacking the search part of gitweb).  The test is
basically only there to exercise the code and make sure it returns
*something* sensible, which is where most breakages would occur.
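
For reference, the kind of check suggested above (expected entries are
found, irrelevant ones are not) could be sketched in shell as follows. The
helper name and its arguments are hypothetical; the actual test suite
inspects pages through Test::WWW::Mechanize::CGI, not through this function.

```shell
# Hypothetical helper: succeed only if page text $1 contains the fixed
# string $2 and does not contain the fixed string $3.
search_finds_only () {
	printf '%s\n' "$1" | grep -F -q -e "$2" &&
	! printf '%s\n' "$1" | grep -F -q -e "$3"
}
```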

Thanks for all your feedback!  I'll hold off on sending a new patch series
until I've collected all feedback.

-- Lea


* Re: [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot
  2008-08-19  1:32     ` Junio C Hamano
@ 2008-08-19 15:06       ` Lea Wiemann
  0 siblings, 0 replies; 63+ messages in thread
From: Lea Wiemann @ 2008-08-19 15:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> Lea Wiemann <lewiemann@gmail.com> writes:
>> +The author, committer and message methods return Unicoded strings,
> 
> Unicoded?

Fixed. :)

>> +# Auxiliary method to load (and parse) the commit object from the
>> +# repository if it hasn't already been loaded. [...]
>> +sub _load {
>
> Aside from seeming repetitive and quite similar to Git::Tag::_load(),

I didn't see a clean way to extract common code since there are
differences in how some headers are parsed (e.g. 'parents' is an array).

> I have to wonder how parent rewriting and grafts come into the picture.
> Git::Repo::get_object is just a cat-file which means you are ignoring any
> grafts.  As a design decision it is fine, but it needs to be documented.

I'm not sure if I understand grafts correctly, but would something like
the following piece of text in the documentation for
Git::Commit->parents cover it?

"Note that this method does not currently respect grafts.  Hence, if you
are operating on a repository that contains grafts (e.g. a shallow
clone), the commit objects returned by the 'parents' method may not
exist in the repository; your program will die if you call an attribute
accessor method (such as 'author' or 'message') of any parent of an
object with grafted parents.  Thus, if you want your program to work
with shallow clones, you will have to check the .git/info/grafts file
yourself before accessing any parents.

You must not rely on this behavior.  In the future, the 'parents' method
may return the grafted parents instead of the parents recorded in the
commit object."
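
A minimal sketch of the "check the grafts file yourself" step, written in
shell, not as part of the Perl API. It assumes the grafts file holds one
entry per line, a commit ID followed by its space-separated replacement
parent IDs; the function name "grafted_parents" is hypothetical.

```shell
# Hypothetical helper: print the grafted parents recorded for commit $2
# in grafts file $1, one per line.  Returns non-zero if the file does
# not exist or has no entry for that commit.
# Assumes each entry line is "<commit> <parent>..."; lines whose first
# field is not the commit ID are ignored.
grafted_parents () {
	test -f "$1" || return 1
	awk -v commit="$2" '
		$1 == commit {
			for (i = 2; i <= NF; i++) print $i
			found = 1
			exit
		}
		END { exit !found }
	' "$1"
}
```

A caller would try grafted_parents "$GIT_DIR/info/grafts" "$commit" first
and fall back to the parents recorded in the commit object when it returns
non-zero.  An entry with no parents succeeds and prints nothing, which
marks the commit as having no parents under the graft.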

> Also if you run "per path history" using "rev-list $head -- $path" to
> collect commits that touch the named $path, you might want to get
> rewritten parents for each commit and use it in the presentation,

You're referring to gitweb here, right?  (The perl API doesn't use
rev-list.)  I don't think gitweb is currently tested with grafted
repositories at all, but in general it should respect rewritten parents,
since the parse_commit function uses rev-list without the 'raw' format,
and thus it should get the rewritten parents.

-- Lea


Thread overview: 63+ messages
2008-07-11  1:06 [PATCH 0/3] Git::Repo API and gitweb caching Lea Wiemann
2008-07-11  1:10 ` [PATCH 1/3 v9] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
2008-07-11  1:11 ` [PATCH 2/3] add new Git::Repo API Lea Wiemann
2008-07-13 21:38   ` Junio C Hamano
2008-07-14  1:04     ` Lea Wiemann
2008-07-13 23:28   ` Jakub Narebski
2008-07-14  2:29     ` Lea Wiemann
2008-07-14  1:40   ` Petr Baudis
2008-07-14 22:19     ` Lea Wiemann
2008-07-18 16:48       ` Petr Baudis
2008-07-18 17:05         ` Jakub Narebski
2008-07-18 17:17           ` Petr Baudis
2008-07-18 18:09         ` Lea Wiemann
2008-07-18 18:19           ` Petr Baudis
2008-07-18 18:23           ` Johannes Schindelin
2008-07-19 20:54         ` Statictics on Git.pm usage in git commands (was: [PATCH 2/3] add new Git::Repo API) Jakub Narebski
2008-07-19 21:14           ` Petr Baudis
2008-07-20  0:16             ` Jakub Narebski
2008-07-20 21:38               ` Petr Baudis
2008-07-20 10:38           ` Johannes Schindelin
2008-07-20 10:49             ` Petr Baudis
2008-07-20 12:33               ` Johannes Schindelin
2008-07-20 12:58                 ` Petr Baudis
2008-07-20 13:21                   ` Johannes Schindelin
2008-07-14 23:41     ` [PATCH 2/3] add new Git::Repo API Jakub Narebski
2008-07-15  0:11       ` Lea Wiemann
2008-07-18 16:54       ` Petr Baudis
2008-07-19  0:03         ` Jakub Narebski
2008-07-19 19:07         ` Jakub Narebski
2008-07-20 21:36           ` Petr Baudis
2008-07-20 21:50             ` Jakub Narebski
2008-07-16 18:21   ` Jakub Narebski
2008-07-16 20:32     ` Lea Wiemann
2008-07-17 23:49       ` Jakub Narebski
2008-07-18 13:40         ` Lea Wiemann
2008-07-18 15:35           ` Jakub Narebski
2008-07-18 16:51             ` Lea Wiemann
2008-07-11  1:11 ` [PATCH 3/3] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann
2008-07-14 21:23   ` Jakub Narebski
2008-07-14 23:03     ` Lea Wiemann
2008-07-14 23:14       ` Jakub Narebski
2008-07-14 23:56         ` Lea Wiemann
2008-07-15  0:52           ` Jakub Narebski
2008-07-15  1:16             ` Lea Wiemann
2008-07-15  1:28               ` Johannes Schindelin
2008-07-15  1:44                 ` J.H.
2008-07-15  1:50                 ` Lea Wiemann
2008-07-15  2:03                   ` J.H.
2008-07-11  1:21 ` [PATCH 0/3] Git::Repo API and gitweb caching Johannes Schindelin
2008-07-11  9:33 ` Jakub Narebski
2008-07-11 14:07   ` Lea Wiemann
2008-07-11 16:27     ` Abhijit Menon-Sen
2008-07-12 15:08       ` Jakub Narebski
2008-07-19  5:35 ` Lea Wiemann
2008-08-18 19:34 ` Lea Wiemann
2008-08-18 19:39   ` [PATCH 1/3 v10] gitweb: add test suite with Test::WWW::Mechanize::CGI Lea Wiemann
2008-08-19  1:17     ` Junio C Hamano
2008-08-19 14:37       ` Lea Wiemann
2008-08-18 19:39   ` [PATCH 2/3 v2] add new Perl API: Git::Repo, Git::Commit, Git::Tag, and Git::RepoRoot Lea Wiemann
2008-08-19  1:32     ` Junio C Hamano
2008-08-19 15:06       ` Lea Wiemann
2008-08-19 13:51     ` Lea Wiemann
2008-08-18 19:39   ` [PATCH 3/3 v2] gitweb: use new Git::Repo API, and add optional caching Lea Wiemann
