git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Petr Baudis <pasky@suse.cz>
Cc: git@vger.kernel.org,
	John 'Warthog9' Hawley <warthog9@eaglescrag.net>,
	John 'Warthog9' Hawley <warthog9@kernel.org>
Subject: Re: [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching
Date: Wed, 10 Feb 2010 12:28:14 +0100	[thread overview]
Message-ID: <201002101228.15732.jnareb@gmail.com> (raw)
In-Reply-To: <20100210012343.GS9553@machine.or.cz>

[-- Attachment #1: Type: text/plain, Size: 4118 bytes --]

On Wed, 10 Feb 2010, Petr Baudis wrote:
> On Wed, Feb 10, 2010 at 02:12:24AM +0100, Jakub Narebski wrote:
> > On Tue, 9 Feb 2010 at 11:30 +0100, Jakub Narebski wrote:
> > 
> > > The cache_fetch subroutine captures output (from STDOUT only, as
> > > STDERR is usually logged) using either ->push_layer()/->pop_layer()
> > > from PerlIO::Util submodule (if it is available), or by setting and
> > > restoring *STDOUT.  Note that only the former could be tested reliably
> > > to be reliable in t9503 test!
> > 
> > Scratch that, I have just checked that (at least for Apache + mod_cgi,
> > but I don't think that it matters) the latter solution, with setting
> > and restoring *STDOUT doesn't work: I would get data in cache (so it
> > can be restored later), but instead of output I would get Internal Server
> > Error ("The server encountered an internal error or misconfiguration and
> > was unable to complete your request.") without even a hint what the
> > problem was.  Sprinkling "die ...: $!" didn't help to catch this error:
> > I suspect that the problem is with capturing.
> > 
> > So we either would have to live with non-core PerlIO::Util or (pure Perl)
> > Capture::Tiny, or do the 'print -> print $out' patch...
> 
> All the magic methods seem to be troublesome, but in that case I'd
> really prefer a level of indirection instead of filehandle - as is,
> 'print (...) -> output (...)' ins. of 'print (...) -> print $out (...)'
> (or whatever). That should be really flexible and completely
> futureproof, and I don't think the level of indirection would incur any
> measurable overhead, would it?

First, it is not only 'print (...) -> print $out (...)'; you need to
do all those:

   print  <sth>            ->  print  $out <sth>
   printf <sth>            ->  printf $out <sth>
   binmode STDOUT, <mode>  ->  binmode $out, <mode>

Second, using "tie" on filehandle (on *STDOUT) can be used also for 
just capturing output, not only for "tee"-ing; what's more to print
while capturing one has to do extra work.  It is quite similar to
replacing 'print (...)' with 'output (...)' etc., but using
tie/untie doesn't require large patch to gitweb.

Third, as you can see below tie-ing is about 1% slower than using
'output (...)', which in turn is less than 10% slower than explicit
filehandle solution i.e. 'print $out (...)'... and is almost twice
slower than solution using PerlIO::Util

Benchmark: timing 50000 iterations of output, perlio, print \$out, tie *STDOUT...
     output: 1.81462 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 28248.59/s (n=50000)
     perlio: 1.05585 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 48543.69/s (n=50000)
print \$out: 1.70027 wallclock secs ( 1.66 usr +  0.00 sys =  1.66 CPU) @ 30120.48/s (n=50000)
tie *STDOUT: 1.82248 wallclock secs ( 1.79 usr +  0.00 sys =  1.79 CPU) @ 27932.96/s (n=50000)
               Rate tie *STDOUT      output print \$out      perlio
tie *STDOUT 27933/s          --         -1%         -7%        -42%
output      28249/s          1%          --         -6%        -42%
print \$out 30120/s          8%          7%          --        -38%
perlio      48544/s         74%         72%         61%          --

Benchmark: running output, perlio, print \$out, tie *STDOUT for at least 10 CPU seconds...
     output: 10.7199 wallclock secs (10.53 usr +  0.00 sys = 10.53 CPU) @ 28029.63/s (n=295152)
     perlio: 11.2884 wallclock secs (10.46 usr +  0.00 sys = 10.46 CPU) @ 49967.11/s (n=522656)
print \$out: 10.5978 wallclock secs (10.43 usr +  0.00 sys = 10.43 CPU) @ 30318.79/s (n=316225)
tie *STDOUT: 11.3525 wallclock secs (10.68 usr +  0.00 sys = 10.68 CPU) @ 27635.96/s (n=295152)
               Rate tie *STDOUT      output print \$out      perlio
tie *STDOUT 27636/s          --         -1%         -9%        -45%
output      28030/s          1%          --         -8%        -44%
print \$out 30319/s         10%          8%          --        -39%
perlio      49967/s         81%         78%         65%          --
need

Attached there is script that was used to produce those results.
-- 
Jakub Narebski
Poland

[-- Attachment #2: Benchmark of different ways of capturing output in Perl code --]
[-- Type: text/plain, Size: 3491 bytes --]

#!/usr/bin/perl

use strict;
use warnings;

use Test::More;
use Time::HiRes;
use Benchmark qw(:all :hireswallclock);
use Data::Dumper;

use PerlIO::Util;

my $chunk = "test\n";

my $lorem = <<'EOF';
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

Sed ut perspiciatis unde omnis iste natus error sit voluptatem
accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae
ab illo inventore veritatis et quasi architecto beatae vitae dicta
sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit
aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos
qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui
dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed
quia non numquam eius modi tempora incidunt ut labore et dolore magnam
aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum
exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex
ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in
ea voluptate velit esse quam nihil molestiae consequatur, vel illum
qui dolorem eum fugiat quo voluptas nulla pariatur?
EOF

sub capture_perlio {
	my $data = '';

	*STDOUT->push_layer(scalar => \$data);

	print $chunk;

	*STDOUT->pop_layer();

	return $data;
}

my $out = \*STDOUT;

sub capture_filehandle {
	my $data = '';

	open my $data_fh, '>', \$data;
	$out = $data_fh;

	print $out $chunk;

	$out = \*STDOUT;
	close $data_fh;

	return $data;
}

sub output {
	print $out @_;
}

sub capture_indirection {
	my $data = '';

	open my $data_fh, '>', \$data;
	$out = $data_fh;

	output $chunk;

	$out = \*STDOUT;
	close $data_fh;

	return $data;
}

sub capture_tied {
	my $data = '';

	tie *STDOUT, 'CatchSTDOUT', \$data;

	print $chunk;

	untie *STDOUT;

	return $data;
}

{
	package CatchSTDOUT;

	use strict;
	use warnings;
	use Data::Dumper;

	sub TIEHANDLE {
		my ($proto, $dataref) = @_;
		my $class = ref($proto) || $proto;

		my $self = {};
		$self = bless($self, $class);
		$self->{'scalar'} = $dataref;
		return $self;
	}

	sub PRINT {
		my $self = shift;
		${$self->{'scalar'}} .= join('',@_);
	}
}

# ----------------------------------------------------------------------

print 'capture_perlio      = '.capture_perlio();
print 'capture_filehandle  = '.capture_filehandle();
print 'capture_indirection = '.capture_indirection();
print 'capture_tied        = '.capture_tied();

$chunk = "another test\n";
print 'capture_perlio      = '.capture_perlio();
print 'capture_filehandle  = '.capture_filehandle();
print 'capture_indirection = '.capture_indirection();
print 'capture_tied        = '.capture_tied();

print "\n";


$chunk = $lorem;

my $result = timethese(50_000, {
	'perlio'      => \&capture_perlio,
	'print \$out' => \&capture_filehandle,
	'output'      => \&capture_indirection,
	'tie *STDOUT' => \&capture_indirection,
});
cmpthese($result);
print "\n";

$result = timethese(-10, {
	'perlio'      => \&capture_perlio,
	'print \$out' => \&capture_filehandle,
	'output'      => \&capture_indirection,
	'tie *STDOUT' => \&capture_indirection,
});
cmpthese($result);
print "\n";

  reply	other threads:[~2010-02-10 11:28 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-09 10:30 [RFC PATCHv2 00/10] gitweb: Simple file based output caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 01/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 02/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 03/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching Jakub Narebski
2010-02-10  1:12   ` Jakub Narebski
2010-02-10  1:23     ` Petr Baudis
2010-02-10 11:28       ` Jakub Narebski [this message]
2010-02-10 12:02         ` Petr Baudis
2010-02-10 18:22           ` Jakub Narebski
2010-02-10 20:32             ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 05/10] gitweb/cache.pm - Adaptive cache expiration time Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 06/10] gitweb: Use CHI compatibile (compute method) caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 07/10] gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 08/10] gitweb/cache.pm - Serve stale data when waiting for filling cache Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 09/10] gitweb/cache.pm - Regenerate (refresh) cache in background Jakub Narebski
2010-02-09 22:23   ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201002101228.15732.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pasky@suse.cz \
    --cc=warthog9@eaglescrag.net \
    --cc=warthog9@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).