git.vger.kernel.org archive mirror
From: Petr Baudis <pasky@suse.cz>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org,
	John 'Warthog9' Hawley <warthog9@eaglescrag.net>,
	John 'Warthog9' Hawley <warthog9@kernel.org>
Subject: Re: [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching
Date: Wed, 10 Feb 2010 13:02:57 +0100
Message-ID: <20100210120257.GP4159@machine.or.cz>
In-Reply-To: <201002101228.15732.jnareb@gmail.com>

On Wed, Feb 10, 2010 at 12:28:14PM +0100, Jakub Narebski wrote:
> On Wed, 10 Feb 2010, Petr Baudis wrote:
> > On Wed, Feb 10, 2010 at 02:12:24AM +0100, Jakub Narebski wrote:
> > > On Tue, 9 Feb 2010 at 11:30 +0100, Jakub Narebski wrote:
> > > 
> > > > The cache_fetch subroutine captures output (from STDOUT only, as
> > > > STDERR is usually logged) using either ->push_layer()/->pop_layer()
> > > > from PerlIO::Util submodule (if it is available), or by setting and
> > > > restoring *STDOUT.  Note that only the former could be reliably
> > > > tested in the t9503 test!
> > > 
> > > Scratch that, I have just checked that (at least for Apache + mod_cgi,
> > > but I don't think that it matters) the latter solution, setting
> > > and restoring *STDOUT, doesn't work: I would get data in cache (so it
> > > can be restored later), but instead of output I would get Internal Server
> > > Error ("The server encountered an internal error or misconfiguration and
> > > was unable to complete your request.") without even a hint what the
> > > problem was.  Sprinkling "die ...: $!" didn't help to catch this error:
> > > I suspect that the problem is with capturing.
> > > 
> > > So we either would have to live with non-core PerlIO::Util or (pure Perl)
> > > Capture::Tiny, or do the 'print -> print $out' patch...
> > 
> > All the magic methods seem to be troublesome, but in that case I'd
> > really prefer a level of indirection instead of a filehandle, that is,
> > 'print (...) -> output (...)' instead of 'print (...) -> print $out (...)'
> > (or whatever). That should be really flexible and completely
> > future-proof, and I don't think the level of indirection would incur any
> > measurable overhead, would it?
> 
> First, it is not only 'print (...) -> print $out (...)'; you need to
> do all of these:
> 
>    print  <sth>            ->  print  $out <sth>
>    printf <sth>            ->  printf $out <sth>
>    binmode STDOUT, <mode>  ->  binmode $out, <mode>
> 
> Second, "tie" on a filehandle (on *STDOUT) can also be used for
> just capturing output, not only for "tee"-ing; in fact, printing
> while capturing requires extra work.  It is quite similar to
> replacing 'print (...)' with 'output (...)' etc., but using
> tie/untie doesn't require a large patch to gitweb.
> 
> Third, as you can see below, tie-ing is about 1% slower than using
> 'output (...)', which in turn is less than 10% slower than the explicit
> filehandle solution, i.e. 'print $out (...)'... and the solution using
> PerlIO::Util is about 1.6 to 1.8 times as fast as any of them.
> 
> Benchmark: timing 50000 iterations of output, perlio, print \$out, tie *STDOUT...
>      output: 1.81462 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 28248.59/s (n=50000)
>      perlio: 1.05585 wallclock secs ( 1.03 usr +  0.00 sys =  1.03 CPU) @ 48543.69/s (n=50000)
> print \$out: 1.70027 wallclock secs ( 1.66 usr +  0.00 sys =  1.66 CPU) @ 30120.48/s (n=50000)
> tie *STDOUT: 1.82248 wallclock secs ( 1.79 usr +  0.00 sys =  1.79 CPU) @ 27932.96/s (n=50000)
>                Rate tie *STDOUT      output print \$out      perlio
> tie *STDOUT 27933/s          --         -1%         -7%        -42%
> output      28249/s          1%          --         -6%        -42%
> print \$out 30120/s          8%          7%          --        -38%
> perlio      48544/s         74%         72%         61%          --
> 
> Benchmark: running output, perlio, print \$out, tie *STDOUT for at least 10 CPU seconds...
>      output: 10.7199 wallclock secs (10.53 usr +  0.00 sys = 10.53 CPU) @ 28029.63/s (n=295152)
>      perlio: 11.2884 wallclock secs (10.46 usr +  0.00 sys = 10.46 CPU) @ 49967.11/s (n=522656)
> print \$out: 10.5978 wallclock secs (10.43 usr +  0.00 sys = 10.43 CPU) @ 30318.79/s (n=316225)
> tie *STDOUT: 11.3525 wallclock secs (10.68 usr +  0.00 sys = 10.68 CPU) @ 27635.96/s (n=295152)
>                Rate tie *STDOUT      output print \$out      perlio
> tie *STDOUT 27636/s          --         -1%         -9%        -45%
> output      28030/s          1%          --         -8%        -44%
> print \$out 30319/s         10%          8%          --        -39%
> perlio      49967/s         81%         78%         65%          --
> 
> Attached is the script that was used to produce those results.

Ok, on my machine it's similar:

                Rate      output tie *STDOUT print \$out
output      150962/s          --         -1%         -7%
tie *STDOUT 152769/s          1%          --         -6%
print \$out 162604/s          8%          6%          --

so the overall picture is roughly consistent.
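The attached script is not reproduced here. As a rough, hypothetical reconstruction, a comparison of this kind can be sketched with the core Benchmark module, whose timethese() and cmpthese() print reports in the format quoted above. The CaptureHandle class, the output() helper and the sample lines are illustrative assumptions, not gitweb code; note also that the tie variant ties and unties *STDOUT on every iteration, so its numbers include that overhead and are not directly comparable to the figures in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Hypothetical tied-handle class: print/printf on the tied glob append to
# an in-memory buffer instead of writing anywhere ($, and $\ are ignored).
package CaptureHandle;
sub TIEHANDLE { my ($class, $buf_ref) = @_; return bless { buf => $buf_ref }, $class }
sub PRINT     { my $self = shift; ${ $self->{buf} } .= join('', @_); return 1 }
sub PRINTF    { my $self = shift; my $fmt = shift; ${ $self->{buf} } .= sprintf($fmt, @_); return 1 }

package main;

# Sample output lines (illustrative only).
my @lines = map { "<tr><td>row $_</td></tr>\n" } 1 .. 10;

# Explicit filehandle: an in-memory file standing in for the cache target.
my $buf = '';
open my $out, '>', \$buf or die "cannot open in-memory handle: $!";

# Level of indirection: every piece of output goes through one sub.
sub output { print {$out} @_ }

my $tie_buf = '';

my $results = timethese(10_000, {
    'output'      => sub { output($_) for @lines },
    'print $out'  => sub { print {$out} $_ for @lines },
    'tie *STDOUT' => sub {
        # tie/untie on every iteration, so this overhead is measured too
        tie *STDOUT, 'CaptureHandle', \$tie_buf;
        print $_ for @lines;
        untie *STDOUT;
    },
});

# Prints the relative "Rate ... -1% ... -7%" comparison chart.
cmpthese($results);
```

Absolute rates will differ from machine to machine, as the two sets of numbers above already show; only the relative ordering is of interest.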

I guess the time spent here is generally negligible in gitweb anyway...
I suggested using output() because I think hacking on it would be _very_
_slightly_ easier than on a tied filehandle, but you are right that the
latter is also really easy. Being able to use PerlIO::Util when it is
available would be a nice (non-essential) extra, but requiring it in stock
gitweb is not reasonable, especially since it's not even packaged
for Debian. ;-)
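The output() indirection can be sketched roughly as follows, assuming a package variable $out that holds the current destination; output() and capture_output() are hypothetical names for illustration, not gitweb's actual code. In-memory filehandles (open on a scalar reference) are core Perl since 5.8, so this variant needs no non-core module such as PerlIO::Util.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: all page output goes through output(), which prints to
# whatever handle $out currently holds (real STDOUT by default).
our $out = \*STDOUT;

sub output { print {$out} @_ }

# Hypothetical helper: run $code with output() redirected into a string,
# e.g. so that the generated page can be stored in the cache.
sub capture_output {
    my ($code) = @_;
    my $data = '';
    open my $fh, '>', \$data or die "cannot open in-memory handle: $!";
    {
        local $out = $fh;    # restored automatically when the block exits
        $code->();
    }
    close $fh or die "cannot close in-memory handle: $!";
    return $data;
}

my $page = capture_output(sub {
    output("Content-Type: text/html\n\n");
    output("<p>generated page</p>\n");
});

# The captured page could now be written to the cache; here it is just sent.
output($page);
```

Because only $out changes, printf and binmode calls can be routed the same way (printf {$out} ..., binmode $out, ...), which covers the three kinds of replacement listed earlier in the thread.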

-- 
				Petr "Pasky" Baudis
If you can't see the value in jet powered ants you should turn in
your nerd card. -- Dunbal (464142)


Thread overview: 18+ messages
2010-02-09 10:30 [RFC PATCHv2 00/10] gitweb: Simple file based output caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 01/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 02/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 03/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching Jakub Narebski
2010-02-10  1:12   ` Jakub Narebski
2010-02-10  1:23     ` Petr Baudis
2010-02-10 11:28       ` Jakub Narebski
2010-02-10 12:02         ` Petr Baudis [this message]
2010-02-10 18:22           ` Jakub Narebski
2010-02-10 20:32             ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 05/10] gitweb/cache.pm - Adaptive cache expiration time Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 06/10] gitweb: Use CHI compatibile (compute method) caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 07/10] gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 08/10] gitweb/cache.pm - Serve stale data when waiting for filling cache Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 09/10] gitweb/cache.pm - Regenerate (refresh) cache in background Jakub Narebski
2010-02-09 22:23   ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
