Re: [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching

git.vger.kernel.org archive mirror

From: Jakub Narebski <jnareb@gmail.com>
To: Petr Baudis <pasky@suse.cz>
Cc: git@vger.kernel.org,
	John 'Warthog9' Hawley <warthog9@eaglescrag.net>,
	John 'Warthog9' Hawley <warthog9@kernel.org>
Subject: Re: [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching
Date: Wed, 10 Feb 2010 19:22:48 +0100
Message-ID: <201002101922.50010.jnareb@gmail.com>
In-Reply-To: <20100210120257.GP4159@machine.or.cz>

On Wednesday, 10 February 2010 13:02, Petr Baudis wrote:
> On Wed, Feb 10, 2010 at 12:28:14PM +0100, Jakub Narebski wrote:
>> On Wed, 10 Feb 2010, Petr Baudis wrote:
>>> On Wed, Feb 10, 2010 at 02:12:24AM +0100, Jakub Narebski wrote:

[...]
>>>> So we either would have to live with non-core PerlIO::Util or (pure Perl)
>>>> Capture::Tiny, or do the 'print -> print $out' patch...
>>> 
>>> All the magic methods seem to be troublesome, but in that case I'd
>>> really prefer a level of indirection instead of a filehandle, i.e.
>>> 'print (...) -> output (...)' instead of 'print (...) -> print $out (...)'
>>> (or whatever). That should be really flexible and completely
>>> futureproof, and I don't think the level of indirection would incur any
>>> measurable overhead, would it?
>> 
>> First, it is not only 'print (...) -> print $out (...)'; [...]
>> 
>> Second, "tie" on a filehandle (on *STDOUT) can also be used just for
>> capturing output, not only for "tee"-ing; [...]
>> 
>> Third, as you can see below, tie-ing is about 1% slower than using
>> 'output (...)', which in turn is less than 10% slower than the explicit
>> filehandle solution, i.e. 'print $out (...)'... and the solution using
>> PerlIO::Util is almost twice as fast as tie-ing (81% faster).
[...]
>>                Rate tie *STDOUT      output print \$out      perlio
>> tie *STDOUT 27636/s          --         -1%         -9%        -45%
>> output      28030/s          1%          --         -8%        -44%
>> print \$out 30319/s         10%          8%          --        -39%
>> perlio      49967/s         81%         78%         65%          --

> Ok, on my machine it's similar:
> 
>                 Rate      output tie *STDOUT print \$out
> output      150962/s          --         -1%         -7%
> tie *STDOUT 152769/s          1%          --         -6%
> print \$out 162604/s          8%          6%          --

I wonder why in my case the 'output (...)' solution was faster than tie,
while in yours tie is faster than 'output'... but I guess 1% is within
the noise level.

> so a roughly consistent picture emerges.
> 
> I guess the time spent here is generally negligible in gitweb anyway...
> I suggested using output() because I think hacking it would be _very_
> _slightly_ easier than a tied filehandle, but you are right that doing
> that is also really easy; having the possibility to use PerlIO::Util if
> available would be nice but non-essential, while requiring it in stock
> gitweb is not reasonable, especially seeing that it's not packaged even
> for Debian. ;-)

Well, the idea was to use PerlIO::Util if possible, checking for it via

  our $use_perlio_layers = eval { require PerlIO::Util; 1 };

and to fall back to a generic mechanism if it is not present.  Only the
generic mechanism would have to be changed from manipulating *STDOUT
(*STDOUT = $data_fh etc.) to a tied filehandle.
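A minimal sketch of that runtime check and fallback.  The helper
capture_backend() is hypothetical; it only reports which mechanism would
be chosen, and the capturing itself is not shown:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# PerlIO::Util is not a core module, so stock gitweb must not require it.
# require() dies if the module is missing; eval { ...; 1 } turns that
# failure into a false value instead of aborting the script.
our $use_perlio_layers = eval { require PerlIO::Util; 1 } ? 1 : 0;

# Hypothetical helper: which output capture mechanism would be used.
sub capture_backend {
	return $use_perlio_layers ? 'PerlIO::Util layers' : 'tied *STDOUT';
}

print "output capture via: ", capture_backend(), "\n";
```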

What we need to be careful about is ':utf8' vs ':raw' mode (IO layer).
In the PerlIO layers solution, and in the 'print <sth> -> print $out <sth>'
solution where $out = $data_fh and $data_fh was opened to an in-memory
file, the data saved in the variable has already been converted (it has
already passed through the 'utf8' layer) and is saved as bytes.  If we
use binary mode, the data passes through unchanged and is also saved as
bytes.  Therefore we can write the cache file in ':raw' mode and read it
back in ':raw' mode, and that is why we don't need separate files for
text and for binary output.
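A small sketch of the point above, assuming a temporary file stands in
for the cache file: text printed through the ':utf8' layer into an
in-memory file is already UTF-8 bytes, so a plain ':raw' write and read
round-trips it exactly:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Capture output into an in-memory file with the ':utf8' layer,
# as in the 'print $out <sth>' approach with $out = $data_fh.
my $data = '';
open my $data_fh, '>', \$data or die "cannot open in-memory file: $!";
binmode $data_fh, ':utf8';
print $data_fh "gitweb \x{142}\n";   # U+0142 LATIN SMALL LETTER L WITH STROKE
close $data_fh;

# The layer has already encoded the character: $data now holds bytes.
printf "captured %d bytes\n", length $data;

# Since $data is bytes, the cache file can be written and read in ':raw'
# mode, whether the captured output was text or binary.
my ($cache_fh, $cache_file) = tempfile(UNLINK => 1);
binmode $cache_fh, ':raw';
print $cache_fh $data;
close $cache_fh or die "cannot write cache file: $!";

open my $in, '<:raw', $cache_file or die "cannot read cache file: $!";
my $cached = do { local $/; <$in> };
close $in;

print "round trip ok\n" if $cached eq $data;
```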

The PRINT method of a class tied to a filehandle gets _untransformed_
arguments, so we have to use utf8::encode($str) if in ':utf8' mode, and
either use PerlIO::get_layers on *STDOUT or provide a BINMODE method in
the tied class to watch for mode changes.  (Note that e.g. for snapshots
we print HTTP headers in ':utf8' mode, and the snapshot itself in ':raw',
i.e. binary, mode.)
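A minimal sketch of such a tied class, taking the BINMODE route.  The
class name CaptureOutput and its constructor arguments (buffer reference,
initial ':utf8' flag) are hypothetical, not existing gitweb code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class: captures everything printed to the tied handle,
# encoding text to UTF-8 bytes while in ':utf8' mode.
package CaptureOutput;

sub TIEHANDLE {
	my ($class, $bufref, $utf8) = @_;
	return bless { buf => $bufref, utf8 => $utf8 ? 1 : 0 }, $class;
}

# PRINT/PRINTF get untransformed arguments, so encode them ourselves.
sub _append {
	my ($self, $str) = @_;
	utf8::encode($str) if $self->{utf8};
	${ $self->{buf} } .= $str;
	return 1;
}

sub PRINT {
	my $self = shift;
	# A tied PRINT has to honour $, and $\ itself.
	my $str = join((defined($,) ? $, : ''), @_);
	$str .= $\ if defined($\);
	return $self->_append($str);
}

sub PRINTF {
	my $self = shift;
	my $fmt  = shift;
	return $self->_append(sprintf($fmt, @_));
}

# Watch for mode changes, e.g. binmode STDOUT, ':raw' for a snapshot body.
sub BINMODE {
	my ($self, $layer) = @_;
	$layer = ':raw' unless defined $layer;
	$self->{utf8} = ($layer =~ /utf-?8/i) ? 1 : 0;
	return 1;
}

package main;

my $captured = '';
tie *STDOUT, 'CaptureOutput', \$captured, 1;   # start in ':utf8' mode
print "Content-Type: text/plain; charset=utf-8\n\n";
print "\x{142}";                                # text: encoded to UTF-8
binmode STDOUT, ':raw';                         # switch to binary mode
print "\xFF";                                   # binary: passed unchanged
untie *STDOUT;

printf "captured %d bytes\n", length $captured;
```

Either way the captured scalar ends up as bytes, so it can be cached in
':raw' mode just like the in-memory file data.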

But all of that is doable, and not that much work.  Well, perhaps more
than in the case of 'print -> print $out' etc. plus opening an in-memory
file via

  open $data_fh, '>', \$data;

but not much more, and we don't need an extra global variable $out.  And
there is no large gitweb patch, and no worry about somebody accidentally
using 'print <sth>;' or 'printf <fmt>, <sth>;' instead of, respectively,
'print $out <sth>;' and 'printf $out <fmt>, <sth>;'.

As to how I installed PerlIO::Util for myself (this might be interesting
to other people): in short, I use local::lib bootstrapping and the cpan
client.  From the start I could install some Perl modules from CPAN
locally using the 'cpan' client (included in the perl RPM).  I asked on
the #perl channel on FreeNode what to do, and they recommended local::lib.
After following the bootstrapping technique described in the local::lib
manpage (see e.g. http://p3rl.org/local::lib), installing PerlIO::Util in
~/perl5 is as simple as 'cpan -i PerlIO::Util' (or using the 'cpan' client
interactively).

You can always put
  use lib '/path/to/perl5/lib';
in your $GITWEB_CONFIG file.

Perhaps it would be a good idea to add something like
"use lib __DIR__.'/lib';" near the beginning of gitweb (where __DIR__ is
an appropriate expression that expands to the directory gitweb.cgi or
gitweb.perl is in)?  Then you could make __DIR__/lib a symlink to your
local Perl modules, or put extra modules under __DIR__/lib by hand.
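One possible spelling of "__DIR__" using only core modules, as a sketch;
this is an assumption, not something gitweb does today (the core FindBin
module is a common alternative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "__DIR__": the absolute directory this script lives in.  The 'use'
# statements run at compile time, in order, so File::Basename and
# File::Spec are already loaded when 'use lib' evaluates its argument.
use File::Basename ();
use File::Spec ();
use lib File::Spec->catdir(
	File::Spec->rel2abs(File::Basename::dirname(__FILE__)), 'lib');

# Modules under __DIR__/lib (or a symlink there) are now found first.
print "first \@INC entry: $INC[0]\n";
```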
-- 
Jakub Narebski
Poland


Thread overview: 18+ messages
2010-02-09 10:30 [RFC PATCHv2 00/10] gitweb: Simple file based output caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 01/10] gitweb: href(..., -path_info => 0|1) Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 02/10] gitweb/cache.pm - Very simple file based caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 03/10] gitweb/cache.pm - Stat-based cache expiration Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 04/10] gitweb: Use Cache::Cache compatibile (get, set) output caching Jakub Narebski
2010-02-10  1:12   ` Jakub Narebski
2010-02-10  1:23     ` Petr Baudis
2010-02-10 11:28       ` Jakub Narebski
2010-02-10 12:02         ` Petr Baudis
2010-02-10 18:22           ` Jakub Narebski [this message]
2010-02-10 20:32             ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 05/10] gitweb/cache.pm - Adaptive cache expiration time Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 06/10] gitweb: Use CHI compatibile (compute method) caching Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 07/10] gitweb/cache.pm - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 08/10] gitweb/cache.pm - Serve stale data when waiting for filling cache Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 09/10] gitweb/cache.pm - Regenerate (refresh) cache in background Jakub Narebski
2010-02-09 22:23   ` Jakub Narebski
2010-02-09 10:30 ` [RFC PATCHv2 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski

