* [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
@ 2010-12-22 23:55 ` Jakub Narebski
2010-12-23 1:55 ` Jonathan Nieder
2010-12-22 23:55 ` [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling Jakub Narebski
` (10 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:55 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
End the request after die_error finishes, rather than exiting the gitweb
instance (which may be wrapped, as in the ModPerl::Registry or gitweb.psgi
case).
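(A toy illustration, not gitweb code: die_error relies on Perl's
dynamic-scope 'goto LABEL', which can jump out of nested subroutines to a
label in a caller; in gitweb the DONE_REQUEST label sits inside the
per-request loop in run(), so the jump ends only the current request.)

    use strict;
    use warnings;

    sub fail {
        no warnings 'exiting';   # jumping out of a sub via goto triggers a warning
        print "in fail()\n";
        goto DONE_REQUEST;
    }
    sub request { fail(); print "never reached\n"; }

    request();
    DONE_REQUEST:
    print "back at the label; only the 'request' was abandoned\n";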
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 4779618..724287b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1169,6 +1169,7 @@ sub run {
run_request();
+ DONE_REQUEST:
$post_dispatch_hook->()
if $post_dispatch_hook;
$first_request = 0;
@@ -3767,7 +3768,7 @@ EOF
print "</div>\n";
git_footer_html();
- goto DONE_GITWEB
+ goto DONE_REQUEST
unless ($opts{'-error_handler'});
}
* Re: [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error
2010-12-22 23:55 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jakub Narebski
@ 2010-12-23 1:55 ` Jonathan Nieder
2010-12-25 22:14 ` Jakub Narebski
0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-23 1:55 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, J.H., John 'Warthog9' Hawley
Jakub Narebski wrote:
> End the request after die_error finishes, rather than exiting the gitweb
> instance
[...]
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1169,6 +1169,7 @@ sub run {
>
> run_request();
>
> + DONE_REQUEST:
> $post_dispatch_hook->()
> if $post_dispatch_hook;
> $first_request = 0;
> @@ -3767,7 +3768,7 @@ EOF
[side note: the "@@ EOF" line above would say "@@ sub die_error {" if
userdiff.c had perl support and gitattributes used it.]
> print "</div>\n";
>
> git_footer_html();
> - goto DONE_GITWEB
> + goto DONE_REQUEST
> unless ($opts{'-error_handler'});
This seems to remove the last user of the DONE_GITWEB label. Why not
delete the label, too?
When die_error is called by CGI::Carp (via handle_errors_html), it
does not rearm the error handler afaict. Previously that did not
matter because die_error kills gitweb; now should it be set up
again?
die_error gets called when server load is too high; I wonder whether
it is right to go back for another request in that case.
A broken per-request (or other) configuration could potentially leave
a gitweb process in a broken state, and until now the state would be
reset on the first error. I wonder if an escape valve would be needed
--- e.g., does the CGI harness take care of starting a new gitweb
process after every couple hundred requests or so?
Aside from those (minor) worries, this patch seems like a good idea.
* Re: [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error
2010-12-23 1:55 ` Jonathan Nieder
@ 2010-12-25 22:14 ` Jakub Narebski
2010-12-26 9:07 ` [RFC/PATCH] diff: funcname and word patterns for perl Jonathan Nieder
2010-12-26 9:50 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jonathan Nieder
0 siblings, 2 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-25 22:14 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, J.H., John 'Warthog9' Hawley
On Thu, 23 Dec 2010, Jonathan Nieder wrote:
> Jakub Narebski wrote:
>
> > End the request after die_error finishes, rather than exiting the gitweb
> > instance
> [...]
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -1169,6 +1169,7 @@ sub run {
> >
> > run_request();
> >
> > + DONE_REQUEST:
> > $post_dispatch_hook->()
> > if $post_dispatch_hook;
> > $first_request = 0;
> > @@ -3767,7 +3768,7 @@ EOF
>
> [side note: the "@@ EOF" line above would say "@@ sub die_error {" if
> userdiff.c had perl support and gitattributes used it.]
Hmmm, I thought that git had a Perl-specific diff driver (xfuncname), but
I see that it doesn't. The default funcname pattern works quite well for
Perl code... with the exception of here-documents (or rather their ending).
BTW, do you know what such perl support should look like?
> > print "</div>\n";
> >
> > git_footer_html();
> > - goto DONE_GITWEB
> > + goto DONE_REQUEST
> > unless ($opts{'-error_handler'});
>
> This seems to remove the last user of the DONE_GITWEB label. Why not
> delete the label, too?
Well, actually this patch is in this series only for the label ;-)
Anyway, I can simply drop this patch, and have next one in series
(adding exception-based error handling, making die_error work like
'die') delete DONE_GITWEB label...
> When die_error is called by CGI::Carp (via handle_errors_html), it
> does not rearm the error handler afaict. Previously that did not
> matter because die_error kills gitweb; now should it be set up
> again?
Thanks, I missed this (but after examining it, it turns out to be a
non-issue). That will teach me not to leave code outside of the run()
subroutine; one of the reasons behind creating c2394fe (gitweb: Put all
per-connection code in run() subroutine, 2010-05-07) was to clarify the
code flow.
A note: using set_message inside handle_errors_html was necessary
because if there was a fatal error in die_error itself, then
handle_errors_html would be called recursively - this was fixed in
CGI.pm 3.45, but we cannot rely on that; we cannot rely on having a new
enough version of CGI::Carp that supports set_die_handler either.
But actually handle_errors_html gets called only from fatalsToBrowser,
which in turn gets called from CGI::Carp::die... which ends up calling
CORE::die (aka realdie), which ends the CGI process anyway.
That is why die_error ends with
	goto DONE_GITWEB
		unless ($opts{'-error_handler'});
i.e. it doesn't goto DONE_GITWEB nor DONE_REQUEST if called from
handle_errors_html anyway.
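For context, a minimal standalone illustration (my sketch, based on the
documented CGI::Carp behaviour) of the constraint described above: by the
time a set_message handler runs, fatalsToBrowser has already printed the
HTTP headers, so the handler can only add to the body of the error page.

    use strict;
    use warnings;
    use CGI::Carp qw(fatalsToBrowser set_message);

    set_message(sub {
        my $msg = shift;   # already HTML-escaped by CGI::Carp
        # headers (and so the HTTP status) are already out at this point;
        # this is why gitweb's handle_errors_html passes -no_http_header
        print "<p>custom error page for: $msg</p>\n";
    });

    die "something went wrong";   # handled by CGI::Carp, then the process ends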
> die_error gets called when server load is too high; I wonder whether
> it is right to go back for another request in that case.
If a client (web browser) requests a connection, we have to tell it
something anyway. Note that each request might come from a different
client. As long as die_error(503, "The load average on the server is too
high") doesn't generate much load by itself, all should be all right.
>
> A broken per-request (or other) configuration could potentially leave
> a gitweb process in a broken state, and until now the state would be
> reset on the first error. I wonder if escape valve would be needed
> --- e.g., does the CGI harness take care of starting a new gitweb
> process after every couple hundred requests or so?
'die $@ if $@' would call CORE::die, which means it would end gitweb
process.
For a CGI server it doesn't matter anyway, as the process is respawned for
each request (together with the Perl interpreter), and I think that
ModPerl::Registry and FastCGI servers monitor the process that is supposed
to serve requests, and respawn it if/when it dies.
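For reference, the persistent-server request loop being discussed has
roughly this shape (a minimal sketch assuming the CGI::Fast module;
handle_one_request is a made-up stand-in for gitweb's dispatch):

    use strict;
    use warnings;
    use CGI::Fast;

    sub handle_one_request {
        my $cgi = shift;
        print $cgi->header('text/plain'), "hello\n";
    }

    # one iteration per request; if the worker exits or dies, it is up to
    # the FastCGI server to decide whether and when to respawn it
    while (my $cgi = CGI::Fast->new) {
        eval { handle_one_request($cgi) };
        warn "request failed: $@" if $@;
    }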
> Aside from those (minor) worries, this patch seems like a good idea.
Thanks a lot for your comments.
--
Jakub Narebski
Poland
* [RFC/PATCH] diff: funcname and word patterns for perl
2010-12-25 22:14 ` Jakub Narebski
@ 2010-12-26 9:07 ` Jonathan Nieder
[not found] ` <201012261143.33190.trast@student.ethz.ch>
2010-12-26 23:14 ` Jakub Narebski
2010-12-26 9:50 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jonathan Nieder
1 sibling, 2 replies; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-26 9:07 UTC (permalink / raw)
To: Jakub Narebski
Cc: git, J.H., John 'Warthog9' Hawley, Thomas Rast, Jeff King
The default function name discovery already works quite well for Perl
code... with the exception of here-documents (or rather their ending).
sub foo {
	print <<END
here-document
END
	return 1;
}
The default funcname pattern treats the unindented END line as a
function declaration and puts it in the @@ line of diff and "grep
--show-function" output.
With a little knowledge of perl syntax, we can do better. You can
try it out by adding "*.perl diff=perl" to the gitattributes file.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Jakub Narebski wrote:
> BTW, do you know what such perl support should look like?
Maybe something like this?
Documentation/gitattributes.txt | 2 ++
t/t4018-diff-funcname.sh | 2 +-
userdiff.c | 15 +++++++++++++++
3 files changed, 18 insertions(+), 1 deletions(-)
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 5a7f936..e59b878 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -494,6 +494,8 @@ patterns are available:
- `pascal` suitable for source code in the Pascal/Delphi language.
+- `perl` suitable for source code in the Perl language.
+
- `php` suitable for source code in the PHP language.
- `python` suitable for source code in the Python language.
diff --git a/t/t4018-diff-funcname.sh b/t/t4018-diff-funcname.sh
index 0a61b57..3646930 100755
--- a/t/t4018-diff-funcname.sh
+++ b/t/t4018-diff-funcname.sh
@@ -32,7 +32,7 @@ EOF
sed 's/beer\\/beer,\\/' < Beer.java > Beer-correct.java
-builtin_patterns="bibtex cpp csharp fortran html java objc pascal php python ruby tex"
+builtin_patterns="bibtex cpp csharp fortran html java objc pascal perl php python ruby tex"
for p in $builtin_patterns
do
test_expect_success "builtin $p pattern compiles" '
diff --git a/userdiff.c b/userdiff.c
index 2d54536..fc2afe3 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -61,6 +61,21 @@ PATTERNS("pascal",
"|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
"|<>|<=|>=|:=|\\.\\."
"|[^[:space:]]|[\x80-\xff]+"),
+PATTERNS("perl",
+ "^[ \t]*package .*;\n"
+ "^[ \t]*sub .* \\{",
+ /* -- */
+ "[[:alpha:]_'][[:alnum:]_']*"
+ "|0[xb]?[0-9a-fA-F_]*"
+ /* taking care not to interpret 3..5 as (3.)(.5) */
+ "|[0-9a-fA-F_]+(\\.[0-9a-fA-F_]+)?([eE][-+]?[0-9_]+)?"
+ "|=>|-[rwxoRWXOezsfdlpSugkbctTBMAC>]|~~|::"
+ "|&&=|\\|\\|=|//=|\\*\\*="
+ "|&&|\\|\\||//|\\+\\+|--|\\*\\*|\\.\\.\\.?"
+ "|[-+*/%.^&<>=!|]="
+ "|=~|!~"
+ "|<<|<>|<=>|>>"
+ "|[^[:space:]]"),
PATTERNS("php",
"^[\t ]*(((public|protected|private|static)[\t ]+)*function.*)$\n"
"^[\t ]*(class.*)$",
--
1.7.2.3.554.gc9b5c.dirty
[parent not found: <201012261143.33190.trast@student.ethz.ch>]
* Re: [RFC/PATCH] diff: funcname and word patterns for perl
[not found] ` <201012261143.33190.trast@student.ethz.ch>
@ 2010-12-26 10:54 ` Jonathan Nieder
[not found] ` <201012261206.11942.trast@student.ethz.ch>
0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-26 10:54 UTC (permalink / raw)
To: Thomas Rast
Cc: Jakub Narebski, git, J.H., John 'Warthog9' Hawley,
Jeff King
Thomas Rast wrote:
> Jonathan Nieder wrote:
>> + "|[^[:space:]]"),
>
> I think it should get the |[\x80-\xff]+ arm, too. That one was
> designed to avoid splitting UTF-8 characters. At the risk of gluing
> together too many of them, of course, but I think confusing the
> terminal would be worse.
Hmm. Should it be
	|([\x80-\xff]+[\x00-\x7f])
then, to match exactly one multibyte UTF-8 character?
* Re: [RFC/PATCH] diff: funcname and word patterns for perl
2010-12-26 9:07 ` [RFC/PATCH] diff: funcname and word patterns for perl Jonathan Nieder
[not found] ` <201012261143.33190.trast@student.ethz.ch>
@ 2010-12-26 23:14 ` Jakub Narebski
2010-12-27 17:18 ` Junio C Hamano
1 sibling, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2010-12-26 23:14 UTC (permalink / raw)
To: Jonathan Nieder
Cc: git, J.H., John 'Warthog9' Hawley, Thomas Rast, Jeff King
On Sun, 26 Dec 2010 10:07, Jonathan Nieder wrote:
> The default function name discovery already works quite well for Perl
> code... with the exception of here-documents (or rather their ending).
>
> sub foo {
> 	print <<END
> here-document
> END
> 	return 1;
> }
>
> The default funcname pattern treats the unindented END line as a
> function declaration and puts it in the @@ line of diff and "grep
> --show-function" output.
>
> With a little knowledge of perl syntax, we can do better. You can
> try it out by adding "*.perl diff=perl" to the gitattributes file.
>
> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
> ---
> Jakub Narebski wrote:
>
> > BTW, do you know what such perl support should look like?
>
> Maybe something like this?
Thanks a lot.
Besides here-docs, there are some tricky things that such code should
be aware of.
1. BEGIN {
       ...
   }

   and similar code blocks (END, CHECK, INIT, ...) which I think should
   be marked as 'BEGIN' in the diff chunk header.

2. sub foo {
       FOO: while (1) {
           ...
       }
   }

   which should be marked with 'sub foo {', I think.

3. =head1 NAME

   Git - Perl interface to the Git version control system

   =cut

   i.e. POD... which I don't know what to do about.
I have not checked what your code does wrt those.
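If it helps, a little self-contained test file along these lines (my own
throwaway sketch) makes it easy to eyeball what "git diff" and
"git grep --show-function" report for each case once "*.perl diff=perl"
is set:

    #!/usr/bin/perl
    # throwaway file exercising cases 1 and 2 above (POD is left out here
    # only because its =head1/=cut markers must sit in column 0 of the real file)
    use strict;
    use warnings;

    BEGIN {
        our $loaded = 1;            # case 1: special code block
    }

    sub foo {
        FOO: while (1) {            # case 2: loop label inside a sub
            last FOO;
        }
        return 1;
    }

    print foo(), "\n";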
--
Jakub Narebski
Poland
* Re: [RFC/PATCH] diff: funcname and word patterns for perl
2010-12-26 23:14 ` Jakub Narebski
@ 2010-12-27 17:18 ` Junio C Hamano
2010-12-27 22:44 ` Jakub Narebski
2010-12-28 3:52 ` Jeff King
0 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2010-12-27 17:18 UTC (permalink / raw)
To: Jakub Narebski
Cc: Jonathan Nieder, git, J.H., John 'Warthog9' Hawley,
Thomas Rast, Jeff King
Jakub Narebski <jnareb@gmail.com> writes:
> 2. sub foo {
>        FOO: while (1) {
>            ...
>        }
>    }
>
> which should be marked with 'sub foo {', I think
I do not think Jonathan's patterns would be fooled by this; it wants to
catch only "package <anything>;" and "sub <anything> {".
Jonathan's pattern set allows them to be indented, and followed by some
garbage at the end, which we might want to tighten. How many people
start 'package' and the outermost 'sub' indented?
userdiff.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/userdiff.c b/userdiff.c
index fc2afe3..79569c4 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -62,8 +62,10 @@ PATTERNS("pascal",
"|<>|<=|>=|:=|\\.\\."
"|[^[:space:]]|[\x80-\xff]+"),
PATTERNS("perl",
- "^[ \t]*package .*;\n"
- "^[ \t]*sub .* \\{",
+ "^package .*;\n"
+ "^sub .* \\{\n"
+ "^[A-Z]+ \\{\n" /* BEGIN, END, ... */
+ "^=head[0-9] ", /* POD */
/* -- */
"[[:alpha:]_'][[:alnum:]_']*"
"|0[xb]?[0-9a-fA-F_]*"
* Re: [RFC/PATCH] diff: funcname and word patterns for perl
2010-12-27 17:18 ` Junio C Hamano
@ 2010-12-27 22:44 ` Jakub Narebski
2010-12-28 3:52 ` Jeff King
1 sibling, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-27 22:44 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jonathan Nieder, git, J.H., John 'Warthog9' Hawley,
Thomas Rast, Jeff King
On Mon, 27 Dec 2010, Junio C Hamano wrote:
> Jakub Narebski <jnareb@gmail.com> writes:
>
> > 2. sub foo {
> >        FOO: while (1) {
> >            ...
> >        }
> >    }
> >
> > which should be marked with 'sub foo {', I think
>
> I do not think Jonathan's patterns would be fooled by this; it wants to
> catch only "package <anything>;" and "sub <anything> {".
All right.
> Jonathan's pattern set allows them to be indented, and followed by some
> garbage at the end., which we might want to tighten. How many people
> start 'package' and the outermost 'sub' indented?
>
> userdiff.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/userdiff.c b/userdiff.c
> index fc2afe3..79569c4 100644
> --- a/userdiff.c
> +++ b/userdiff.c
> @@ -62,8 +62,10 @@ PATTERNS("pascal",
> "|<>|<=|>=|:=|\\.\\."
> "|[^[:space:]]|[\x80-\xff]+"),
> PATTERNS("perl",
> - "^[ \t]*package .*;\n"
> - "^[ \t]*sub .* \\{",
> + "^package .*;\n"
Note that the upcoming Perl 5.14 will have a 'package NAME {' form,
so perhaps it would be better to future-proof this and use
+ "^package .*[;{]\n"
> + "^sub .* \\{\n"
Using "sub foo {" is just a recommended programming convention (like e.g.
GNU convention or K&R convention for C code). I think it would be better
to relax it a bit, either
+ "^sub "
or
+ "^sub .*( \\{)?\n"
> + "^[A-Z]+ \\{\n" /* BEGIN, END, ... */
Won't we list the possible blocks here?
> + "^=head[0-9] ", /* POD */
> /* -- */
> "[[:alpha:]_'][[:alnum:]_']*"
> "|0[xb]?[0-9a-fA-F_]*"
--
Jakub Narebski
Poland
* Re: [RFC/PATCH] diff: funcname and word patterns for perl
2010-12-27 17:18 ` Junio C Hamano
2010-12-27 22:44 ` Jakub Narebski
@ 2010-12-28 3:52 ` Jeff King
1 sibling, 0 replies; 34+ messages in thread
From: Jeff King @ 2010-12-28 3:52 UTC (permalink / raw)
To: Junio C Hamano
Cc: Jakub Narebski, Jonathan Nieder, git, J.H.,
John 'Warthog9' Hawley, Thomas Rast
On Mon, Dec 27, 2010 at 09:18:17AM -0800, Junio C Hamano wrote:
> Jonathan's pattern set allows them to be indented, and followed by some
> garbage at the end., which we might want to tighten. How many people
> start 'package' and the outermost 'sub' indented?
FWIW, I sometimes do:
{
	my $static;
	sub foo {
		... use $static ...
	}
}
so perhaps allowing whitespace in front of the keyword is worthwhile
there. I have never indented "package", though.
-Peff
* Re: [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error
2010-12-25 22:14 ` Jakub Narebski
2010-12-26 9:07 ` [RFC/PATCH] diff: funcname and word patterns for perl Jonathan Nieder
@ 2010-12-26 9:50 ` Jonathan Nieder
2010-12-26 22:25 ` Jakub Narebski
1 sibling, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-26 9:50 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, J.H., John 'Warthog9' Hawley
Jakub Narebski wrote:
> On Thu, 23 Dec 2010, Jonathan Nieder wrote:
>> This seems to remove the last user of the DONE_GITWEB label. Why not
>> delete the label, too?
>
> Well, actually this patch is in this series only for the label ;-)
>
> Anyway, I can simply drop this patch, and have next one in series
> (adding exception-based error handling, making die_error work like
> 'die') delete DONE_GITWEB label...
I like the current order (first the brief patch to change the
semantics, then the more ambitious change to an eval {} based error
handling implementation), but it doesn't matter so much.
>> die_error gets called when server load is too high; I wonder whether
>> it is right to go back for another request in that case.
>
> If a client (web browser) requests a connection, we have to tell it
> something anyway.
Right, I should have thought a few seconds more. Respawning
gitweb.perl would generate _more_ load[1].
>> A broken per-request (or other) configuration could potentially leave
>> a gitweb process in a broken state,
[...]
> 'die $@ if $@' would call CORE::die, which means it would end gitweb
> process.
This is referring to a later patch?
> For CGI server it doesn't matter anyway, as for each request the process
> is respawned anyway (together with respawning Perl interpreter), and I
> think that ModPerl::Registry and FastCGI servers monitor process that it
> is to serve requests, and respawn it if/when it dies.
Sorry, that was unclear of me. I meant that a buggy configuration could
leave a gitweb process in a buggy but alive state, and frequent failing
requests might be a way to notice that. Contrived example (just to
illustrate what I mean):
our $version .= ".custom";
if (length $version >= 1000) { # untested, buggy code goes here.
@diff_opts = ("--nonsense");
}
I think I was not right to worry about this, either. It is better to
make such unusual and buggy configurations as noticeable as possible
so they can be fixed.
[...]
> But actually handle_errors_html gets called only from fatalsToBrowser,
> which in turn gets called from CGI::Carp::die... which ends up calling
> CORE::die (aka realdie), which ends the CGI process anyway.
>
> That is why die_error ends with
>
> 	goto DONE_GITWEB
> 		unless ($opts{'-error_handler'});
>
> i.e. it doesn't goto DONE_GITWEB nor DONE_REQUEST if called from
> handle_errors_html anyway.
[...]
> Thanks a lot for your comments.
Thanks for a thorough explanation. For what it's worth, with or
without removal of the DONE_GITWEB: label,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
[1] I can imagine scenarios in which exiting gitweb would help
alleviate the load, involving:
- large memory footprint for each gitweb process forcing the system
into swapping (e.g., from a memory leak), or
- FastCGI-like server noticing the load and choosing to decrease the
number of gitweb instances.
In the usual case, presumably gitweb memory footprint is small and
FastCGI-like servers limit the number of gitweb instances to a modest
fixed number.
* Re: [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error
2010-12-26 9:50 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jonathan Nieder
@ 2010-12-26 22:25 ` Jakub Narebski
0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-26 22:25 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, J.H., John 'Warthog9' Hawley
On Sun, 26 Dec 2010, Jonathan Nieder wrote:
> Jakub Narebski wrote:
>> On Thu, 23 Dec 2010, Jonathan Nieder wrote:
>>> die_error gets called when server load is too high; I wonder whether
>>> it is right to go back for another request in that case.
>>
>> If a client (web browser) requests a connection, we have to tell it
>> something anyway.
>
> Right, I should have thought a few seconds more. Respawning
> gitweb.perl would generate _more_ load[1].
> [1] I can imagine scenarios in which exiting gitweb would help
> alleviate the load, involving:
>
> - large memory footprint for each gitweb process forcing the system
> into swapping (e.g., from a memory leak), or
> - FastCGI-like server noticing the load and choosing to decrease the
> number of gitweb instances.
>
> In the usual case, presumably gitweb memory footprint is small and
> FastCGI-like servers limit the number of gitweb instances to a modest
> fixed number.
I assume that a CGI / FastCGI / mod_perl (+ ModPerl::Registry) web server
would know how to regulate the number of workers according to the server
load.
>>> A broken per-request (or other) configuration could potentially leave
>>> a gitweb process in a broken state,
> [...]
>> 'die $@ if $@' would call CORE::die, which means it would end gitweb
>> process.
>
> This is referring to a later patch?
I'm sorry I haven't made myself clear.
What I meant here is that gitweb includes the following code
	if (-e $GITWEB_CONFIG) {
		do $GITWEB_CONFIG;
		die $@ if $@;
	}
which means that CGI::Carp::die is called, which might call
handle_errors_html, and which ends in CORE::die, which ends the gitweb
process. So there is no way for a broken configuration to leave
gitweb in a broken state _at this point in the series_.
Thank you for thinking about this, because it could cause problems
(could, because I have not checked whether it does or doesn't) in the
following patch, where gitweb uses eval / die for error handling.
Then it might happen, when $per_request_config is false or a CODE ref,
that instead of trying to re-read the broken config on subsequent
requests, we would keep running with the broken config. It depends on
whether "die"-ing in evaluate_gitweb_config would prevent setting
$first_request to false. I'd have to check that.
>> For a CGI server it doesn't matter anyway, as the process is respawned for
>> each request (together with the Perl interpreter), and I think that
>> ModPerl::Registry and FastCGI servers monitor the process that is supposed
>> to serve requests, and respawn it if/when it dies.
>
> Sorry, that was unclear of me. I meant that a buggy configuration could
> leave a gitweb process in a buggy but alive state, and frequent failing
> requests might be a way to notice that. Contrived example (just to
> illustrate what I mean):
>
> 	our $version .= ".custom";
> 	if (length $version >= 1000) { # untested, buggy code goes here.
> 		@diff_opts = ("--nonsense");
> 	}
>
> I think I was not right to worry about this, either. It is better to
> make such unusual and buggy configurations as noticeable as possible
> so they can be fixed.
See above.
> [...]
>> But actually handle_errors_html gets called only from fatalsToBrowser,
>> which in turn gets called from CGI::Carp::die... which ends up calling
>> CORE::die (aka realdie), which ends the CGI process anyway.
>>
>> That is why die_error ends with
>>
>> 	goto DONE_GITWEB
>> 		unless ($opts{'-error_handler'});
>>
>> i.e. it doesn't goto DONE_GITWEB nor DONE_REQUEST if called from
>> handle_errors_html anyway.
> [...]
>> Thanks a lot for your comments.
Which should make it into either the commit message or the comments, I guess.
> Thanks for a thorough explanation. For what it's worth, with or
> without removal of the DONE_GITWEB: label,
>
> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Thanks.
--
Jakub Narebski
Poland
* [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
2010-12-22 23:55 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jakub Narebski
@ 2010-12-22 23:55 ` Jakub Narebski
2010-12-23 2:08 ` Jonathan Nieder
2011-01-04 0:35 ` [RFC PATCH v7 2.5/9] gitweb: Make die_error just die, and use send_error to create error pages Jakub Narebski
2010-12-22 23:55 ` [RFC PATCH v7 3/9] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
` (9 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:55 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
In gitweb code it is assumed that calling the die_error() subroutine ends
the request, just like running "die" would. Up till now this was done by
having die_error() jump to DONE_REQUEST (earlier DONE_GITWEB), or in even
earlier versions just 'exit' (under mod_perl via ModPerl::Registry this
ends the request instead of exiting the worker).
Instead of using 'goto DONE_REQUEST' as a longjmp-like nonlocal jump, or
using 'exit', gitweb now uses Perl's native exception handling in the
form of the eval / die pair ("eval BLOCK" to trap exceptions, "die LIST" to
raise/throw them).
Up till now the "goto DONE_REQUEST" solution was enough, but with the
coming output caching support and it adding modular structure to gitweb,
it would be difficult to continue to keep using this solution
(e.g. interaction with capturing output).
Because gitweb now traps all exceptions occurring in run_request(), the
handle_errors_html handler (set via set_message from CGI::Carp) is not
needed; gitweb can call die_error in -error_handler mode itself. This
has the advantage that we can now set the correct HTTP headers (the handler
set via CGI::Carp::set_message was run after HTTP headers were already sent).
Gitweb assumes here that exceptions thrown by Perl are simple
strings; die_error() throws a hash reference (if not for the goal of
minimal external dependencies, it would probably throw an object of an
Exception::Class or Throwable subclass).
Note: in newer versions of CGI::Carp there is set_die_handler(), where
the handler has to send HTTP headers to the browser itself, but we cannot
rely on a new enough CGI::Carp version having been installed. Also,
set_die_handler interferes with fatalsToBrowser.
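The core of the scheme, boiled down to a standalone toy (my sketch;
die_error_demo and run_request_demo are made-up stand-ins for gitweb's
subroutines):

    use strict;
    use warnings;

    sub die_error_demo {
        my ($status, $message) = @_;
        # throw a hash reference so the catcher can tell gitweb errors
        # from plain string exceptions thrown by Perl itself
        die { 'status' => $status || 500, 'error' => $message };
    }

    sub run_request_demo {
        die_error_demo(404, "No such project");
    }

    eval { run_request_demo() };
    if (my $err = $@) {
        if (ref($err) eq 'HASH') {
            print "error page: $err->{'status'} - $err->{'error'}\n";
        } else {
            print "internal error page for: $err";
        }
    }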
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 26 ++++++++------------------
1 files changed, 8 insertions(+), 18 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 724287b..c7a1892 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -12,7 +12,7 @@ use strict;
use warnings;
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
-use CGI::Carp qw(fatalsToBrowser set_message);
+use CGI::Carp qw(fatalsToBrowser);
use Encode;
use Fcntl ':mode';
use File::Find qw();
@@ -1045,21 +1045,6 @@ sub configure_gitweb_features {
}
}
-# custom error handler: 'die <message>' is Internal Server Error
-sub handle_errors_html {
- my $msg = shift; # it is already HTML escaped
-
- # to avoid infinite loop where error occurs in die_error,
- # change handler to default handler, disabling handle_errors_html
- set_message("Error occured when inside die_error:\n$msg");
-
- # you cannot jump out of die_error when called as error handler;
- # the subroutine set via CGI::Carp::set_message is called _after_
- # HTTP headers are already written, so it cannot write them itself
- die_error(undef, undef, $msg, -error_handler => 1, -no_http_header => 1);
-}
-set_message(\&handle_errors_html);
-
# dispatch
sub dispatch {
if (!defined $action) {
@@ -1167,7 +1152,11 @@ sub run {
$pre_dispatch_hook->()
if $pre_dispatch_hook;
- run_request();
+ eval { run_request() };
+ if (defined $@ && !ref($@)) {
+ # some Perl error, but not one thrown by die_error
+ die_error(undef, undef, $@, -error_handler => 1);
+ }
DONE_REQUEST:
$post_dispatch_hook->()
@@ -3768,7 +3757,8 @@ EOF
print "</div>\n";
git_footer_html();
- goto DONE_REQUEST
+
+ die {'status' => $status, 'error' => $error}
unless ($opts{'-error_handler'});
}
* Re: [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling
2010-12-22 23:55 ` [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling Jakub Narebski
@ 2010-12-23 2:08 ` Jonathan Nieder
2010-12-25 23:17 ` Jakub Narebski
2011-01-04 0:35 ` [RFC PATCH v7 2.5/9] gitweb: Make die_error just die, and use send_error to create error pages Jakub Narebski
1 sibling, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-23 2:08 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, J.H., John 'Warthog9' Hawley
Jakub Narebski wrote:
> Gitweb assumes here that exceptions thrown by Perl are simple
> strings; die_error() throws a hash reference (if not for the goal of
> minimal external dependencies, it would probably throw an object of an
> Exception::Class or Throwable subclass).
Hmm, why not throw an object of a new type, Gitweb::Exception?
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -1045,21 +1045,6 @@ sub configure_gitweb_features {
> }
> }
>
> -# custom error handler: 'die <message>' is Internal Server Error
> -sub handle_errors_html {
> - my $msg = shift; # it is already HTML escaped
> -
> - # to avoid infinite loop where error occurs in die_error,
> - # change handler to default handler, disabling handle_errors_html
> - set_message("Error occured when inside die_error:\n$msg");
> -
> - # you cannot jump out of die_error when called as error handler;
> - # the subroutine set via CGI::Carp::set_message is called _after_
> - # HTTP headers are already written, so it cannot write them itself
> - die_error(undef, undef, $msg, -error_handler => 1, -no_http_header => 1);
> -}
> -set_message(\&handle_errors_html);
> -
Hoorah!
> # dispatch
> sub dispatch {
> if (!defined $action) {
> @@ -1167,7 +1152,11 @@ sub run {
> $pre_dispatch_hook->()
> if $pre_dispatch_hook;
>
> - run_request();
> + eval { run_request() };
> + if (defined $@ && !ref($@)) {
> + # some Perl error, but not one thrown by die_error
> + die_error(undef, undef, $@, -error_handler => 1);
> + }
The !ref($@) seems overzealous, which is why I am wondering if it
would be possible to use bless() for a finer-grained check.
>
> DONE_REQUEST:
> $post_dispatch_hook->()
> @@ -3768,7 +3757,8 @@ EOF
> print "</div>\n";
>
> git_footer_html();
> - goto DONE_REQUEST
> +
> + die {'status' => $status, 'error' => $error}
> unless ($opts{'-error_handler'});
Is the DONE_REQUEST label still needed?
Thanks, I am happy to see the semantics becoming less thorny.
Jonathan
* Re: [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling
2010-12-23 2:08 ` Jonathan Nieder
@ 2010-12-25 23:17 ` Jakub Narebski
0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-25 23:17 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, J.H., John 'Warthog9' Hawley
On Thu, 23 Dec 2010, Jonathan Nieder wrote:
> Jakub Narebski wrote:
>
> > Gitweb assumes here that exceptions thrown by Perl are simple
> > strings; die_error() throws a hash reference (if not for the goal of
> > minimal external dependencies, it would probably throw an object of an
> > Exception::Class or Throwable subclass).
>
> Hmm, why not throw an object of a new type, Gitweb::Exception?
First, the 'gitweb: Prepare for splitting gitweb' commit comes only later
in the series... ;-) but that of course is not a serious issue.
Second, and more important, I'd rather gitweb didn't go the "reinvent
the wheel" route. I'd rather (re)use Exception::Class (like e.g.
SVN::Web does) if we go the OO exception handling route.
But if we are going to use Exception::Class, then we can also use
Try::Tiny, I think.
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -1045,21 +1045,6 @@ sub configure_gitweb_features {
> > }
> > }
> >
> > -# custom error handler: 'die <message>' is Internal Server Error
> > -sub handle_errors_html {
> > - my $msg = shift; # it is already HTML escaped
> > -
> > - # to avoid infinite loop where error occurs in die_error,
> > - # change handler to default handler, disabling handle_errors_html
> > - set_message("Error occured when inside die_error:\n$msg");
> > -
> > - # you cannot jump out of die_error when called as error handler;
> > - # the subroutine set via CGI::Carp::set_message is called _after_
> > - # HTTP headers are already written, so it cannot write them itself
> > - die_error(undef, undef, $msg, -error_handler => 1, -no_http_header => 1);
> > -}
> > -set_message(\&handle_errors_html);
> > -
>
> Hoorah!
Yeah, that is very nice.
> > # dispatch
> > sub dispatch {
> > if (!defined $action) {
> > @@ -1167,7 +1152,11 @@ sub run {
> > $pre_dispatch_hook->()
> > if $pre_dispatch_hook;
> >
> > - run_request();
> > + eval { run_request() };
> > + if (defined $@ && !ref($@)) {
Ooops, it should be 'if ($@ ...)', not 'if (defined $@ ...)'.
> > + # some Perl error, but not one thrown by die_error
> > + die_error(undef, undef, $@, -error_handler => 1);
> > + }
>
> The !ref($@) seems overzealous, which is why I am wondering if it
> would be possible to use bless() for a finer-grained check.
You meant Scalar::Util::blessed here, didn't you? Fortunately Scalar::Util
is a core Perl module.
By 'overzealous' do you mean the possibility of catching what we
shouldn't, i.e. a non-gitweb error (not thrown by die_error)? We could
narrow it to "ref($@) eq 'HASH'", but I don't think it would ever be
necessary: Perl itself throws string exceptions.
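A quick standalone comparison of the checks being discussed (my sketch;
Gitweb::Exception is just a made-up class name here):

    use strict;
    use warnings;
    use Scalar::Util qw(blessed);

    my $plain   = "some Perl error\n";
    my $hashref = { 'status' => 404, 'error' => "Not Found" };
    my $object  = bless({ 'status' => 404 }, 'Gitweb::Exception');

    foreach my $e ($plain, $hashref, $object) {
        printf "ref: %-18s blessed: %s\n",
               ref($e) || "(not a ref)",
               defined blessed($e) ? blessed($e) : "(not blessed)";
    }
    # ref() is true for both the plain hashref and the blessed object;
    # blessed() is true only for the object, hence the finer-grained check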
> >
> > DONE_REQUEST:
> > $post_dispatch_hook->()
> > @@ -3768,7 +3757,8 @@ EOF
> > print "</div>\n";
> >
> > git_footer_html();
> > - goto DONE_REQUEST
> > +
> > + die {'status' => $status, 'error' => $error}
> > unless ($opts{'-error_handler'});
>
> Is the DONE_REQUEST label still needed?
No it isn't.
> Thanks, I am happy to see the semantics becoming less thorny.
Now I should check that this doesn't affect gitweb performance too badly.
IIRC I chose 'goto DONE_GITWEB' because I didn't know about
ModPerl::Registry redefining 'exit' (which is why this was done at all),
and because some microbenchmark showed that it performs better than
eval/die (which is why this specific solution was chosen)...
But I think that the performance hit would be negligible in practice;
making gitweb more maintainable is, I think, worth the cost.
--
Jakub Narebski
Poland
* [RFC PATCH v7 2.5/9] gitweb: Make die_error just die, and use send_error to create error pages
2010-12-22 23:55 ` [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling Jakub Narebski
2010-12-23 2:08 ` Jonathan Nieder
@ 2011-01-04 0:35 ` Jakub Narebski
1 sibling, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2011-01-04 0:35 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley, Jonathan Nieder
Unify error handling by treating errors from Perl (thrown early in the
process using 'die STRING') and errors from gitweb in the same way.
This means that in both cases the error page is generated after the error
is caught in the run() subroutine.
The die_error() subroutine is now split into three: gen_error(), which
massages parameters (escaping HTML, turning the HTTP status number into
a full HTTP status line); die_error(), which uses gen_error() and just
throws an error (and does not generate an error page); and
send_error(), which actually generates the error page based on the
provided error / exception.
Sidenote: in the future, instead of throwing a simple hash as the gitweb
exception, gitweb would probably use some custom error class,
e.g. a derivative of Exception::Class (like SVN::Web does).
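In miniature, the split has this shape (my compressed sketch, not the
patch itself):

    use strict;
    use warnings;

    sub gen_error {                       # massage parameters into an error object
        my ($status, $error) = @_;
        return { 'status' => $status || 500, 'error' => $error };
    }
    sub die_error  { die gen_error(@_) }  # just throw; no page generation here
    sub send_error {                      # actually render the error page
        my $err = shift;
        print "$err->{'status'} - $err->{'error'}\n";
    }

    eval { die_error(403, "Forbidden") };
    send_error($@) if ref($@);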
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is sent early to facilitate early comments. It passes the test suite,
but it was not extensively tested.
Now die_error() functions more like 'die'...
gitweb/gitweb.perl | 47 ++++++++++++++++++++++++++++++++++++-----------
1 files changed, 36 insertions(+), 11 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c7a1892..5854f73 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -1153,9 +1154,13 @@ sub run {
if $pre_dispatch_hook;
eval { run_request() };
- if (defined $@ && !ref($@)) {
+ my $error = $@;
+ if ($error) {
# some Perl error, but not one thrown by die_error
- die_error(undef, undef, $@, -error_handler => 1);
+ $error = gen_error(undef, undef, $error)
+ unless ref($error);
+
+ send_error($error);
}
DONE_REQUEST:
@@ -3730,11 +3735,14 @@ sub git_footer_html {
# an unknown error occurred (e.g. the git binary died unexpectedly).
# 503: The server is currently unavailable (because it is overloaded,
# or down for maintenance). Generally, this is a temporary state.
-sub die_error {
+
+# gen_error() generates error object from parameters
+# die_error() uses gen_error() to generate error object and dies
+# send_error() generates an error page from provided error object
+sub gen_error {
my $status = shift || 500;
my $error = esc_html(shift) || "Internal Server Error";
my $extra = shift;
- my %opts = @_;
my %http_responses = (
400 => '400 Bad Request',
@@ -3743,23 +3751,40 @@ sub die_error {
500 => '500 Internal Server Error',
503 => '503 Service Unavailable',
);
- git_header_html($http_responses{$status}, undef, %opts);
+
+ my $err = {
+ 'status' => $status,
+ 'http_status' => $http_responses{$status},
+ 'error' => $error,
+ 'extra' => $extra,
+ };
+ return $err;
+}
+
+sub die_error {
+ my $error = gen_error(@_);
+ print STDERR Dumper($error);
+ die $error;
+}
+
+sub send_error {
+ my $error = shift;
+
+ git_header_html($error->{'http_status'}, undef);
+
print <<EOF;
<div class="page_body">
<br /><br />
-$status - $error
+$error->{'status'} - $error->{'error'}
<br />
EOF
- if (defined $extra) {
+ if (defined $error->{'extra'}) {
print "<hr />\n" .
- "$extra\n";
+ "$error->{'extra'}\n";
}
print "</div>\n";
git_footer_html();
-
- die {'status' => $status, 'error' => $error}
- unless ($opts{'-error_handler'});
}
## ----------------------------------------------------------------------
--
1.7.3
* [RFC PATCH v7 3/9] gitweb: Introduce %actions_info, gathering information about actions
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
2010-12-22 23:55 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jakub Narebski
2010-12-22 23:55 ` [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling Jakub Narebski
@ 2010-12-22 23:55 ` Jakub Narebski
2010-12-22 23:56 ` [RFC PATCH v7 4/9] gitweb: Prepare for splitting gitweb Jakub Narebski
` (8 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:55 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
The new hash gathers, for each action, information such as its output
format, whether it needs the $project parameter, whether it is a log-like
action, whether it has no specific feed, and which extra options are
allowed. This is an easier and faster way of checking such properties
than listing all matching or all non-matching actions at each appropriate
place.
The 'output_format' information is not used yet; it will be used in the
next commit to check whether the current action produces HTML output,
and therefore whether displaying an HTML-based progress indicator about
(re)generating the cache makes sense.
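The kind of substitution this enables, as a toy sketch (the table below is
a made-up two-entry excerpt; gitweb fills the real one in
evaluate_actions_info):

    use strict;
    use warnings;

    my %actions_info = (
        'shortlog' => { 'output_format' => 'html', 'needs_project' => 1, 'log_like' => 1 },
        'opml'     => { 'output_format' => 'feed', 'needs_project' => 0 },
    );

    my $action = 'shortlog';
    # before this patch the check was a regexp on the action name:
    print "needs project (regexp check)\n"
        if $action !~ m/^(?:opml|project_list|project_index)$/;
    # after it, the same question is a table lookup:
    print "needs project (table lookup)\n"
        if $actions_info{$action}{'needs_project'};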
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/gitweb.perl | 57 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 53 insertions(+), 4 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c7a1892..e50654b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -749,6 +749,54 @@ our %allowed_options = (
"--no-merges" => [ qw(rss atom log shortlog history) ],
);
+# action => {
+# # what is the output format (content-type) of action
+# 'output_format' => ('html' | 'text' | 'feed' | 'binary' | undef),
+# # does action require $project parameter to work
+# 'needs_project' => (boolean | undef),
+# # log-like action, can start with arbitrary ref or revision
+# 'log_like' => (boolean | undef),
+# # has no specific feed, or should link to OPML / generic project feed
+# 'no_feed' => (boolean | undef),
+# # allowed options to be passed using 'opt' parameter
+# 'allowed_options' => { 'option_1' => 1 [, ... ] },
+# }
+our %actions_info = ();
+sub evaluate_actions_info {
+ our %actions_info;
+ our (%actions);
+
+ # unless explicitly stated otherwise, default output format is html
+ # most actions need $project parameter
+ foreach my $action (keys %actions) {
+ $actions_info{$action}{'output_format'} = 'html';
+ $actions_info{$action}{'needs_project'} = 1;
+ }
+ # list all exceptions; undef means variable format (no definite format)
+ $actions_info{$_}{'output_format'} = 'text'
+ foreach qw(commitdiff_plain patch patches project_index blame_data);
+ $actions_info{$_}{'output_format'} = 'feed'
+ foreach qw(rss atom opml); # there are different types (document formats) of XML
+ $actions_info{$_}{'output_format'} = undef
+ foreach qw(blob_plain object);
+ $actions_info{'snapshot'}{'output_format'} = 'binary';
+
+ $actions_info{$_}{'needs_project'} = 0
+ foreach qw(opml project_list project_index);
+
+ $actions_info{$_}{'log_like'} = 1
+ foreach qw(log shortlog history);
+
+ $actions_info{$_}{'no_feed'} = 1
+ foreach qw(tags heads forks tag search);
+
+ foreach my $opt (keys %allowed_options) {
+ foreach my $act (@{$allowed_options{$opt}}) {
+ $actions_info{$act}{'allowed_options'}{$opt} = 1;
+ }
+ }
+}
+
# fill %input_params with the CGI parameters. All values except for 'opt'
# should be single values, but opt can be an array. We should probably
# build an array of parameters that can be multi-valued, but since for the time
@@ -980,7 +1028,7 @@ sub evaluate_and_validate_params {
if (not exists $allowed_options{$opt}) {
die_error(400, "Invalid option parameter");
}
- if (not grep(/^$action$/, @{$allowed_options{$opt}})) {
+ if (!$actions_info{$action}{'allowed_options'}{$opt}) {
die_error(400, "Invalid option parameter for this action");
}
}
@@ -1061,7 +1109,7 @@ sub dispatch {
if (!defined($actions{$action})) {
die_error(400, "Unknown action");
}
- if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
+ if ($actions_info{$action}{'needs_project'} &&
!$project) {
die_error(400, "Project needed");
}
@@ -1142,6 +1190,7 @@ sub evaluate_argv {
sub run {
evaluate_argv();
+ evaluate_actions_info();
$first_request = 1;
$pre_listen_hook->()
@@ -1803,7 +1852,7 @@ sub format_ref_marker {
if ($indirect) {
$dest_action = "tag" unless $action eq "tag";
- } elsif ($action =~ /^(history|(short)?log)$/) {
+ } elsif ($actions_info{$action}{'log_like'}) {
$dest_action = $action;
}
@@ -2277,7 +2326,7 @@ sub get_feed_info {
return unless (defined $project);
# some views should link to OPML, or to generic project feed,
# or don't have specific feed yet (so they should use generic)
- return if ($action =~ /^(?:tags|heads|forks|tag|search)$/x);
+ return if ($actions_info{$action}{'no_feed'});
my $branch;
# branches refs uses 'refs/heads/' prefix (fullname) to differentiate
* [RFC PATCH v7 4/9] gitweb: Prepare for splitting gitweb
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (2 preceding siblings ...)
2010-12-22 23:55 ` [RFC PATCH v7 3/9] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
@ 2010-12-22 23:56 ` Jakub Narebski
2010-12-24 9:29 ` Jonathan Nieder
2010-12-22 23:56 ` [RFC PATCH v7 5/9] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
` (7 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:56 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
Prepare gitweb for being split into modules that are to be
installed alongside gitweb in the 'lib/' subdirectory, by adding
	use lib __DIR__.'/lib';
to gitweb.perl (the main gitweb script), and by preparing for listing
modules (as relative paths) in $(GITWEB_MODULES) in gitweb/Makefile.
This preparatory work allows adding a new module to gitweb by simply
adding
	GITWEB_MODULES += <module>
to gitweb/Makefile (assuming that the module lives in the 'gitweb/lib/'
directory).
While at it, pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to
allow testing the installed version of gitweb together with the installed
version of the modules (for future tests which would check individual
(sub)modules).
Using __DIR__ from the Dir::Self module (which is not in core; that is why
gitweb currently includes an excerpt of the Dir::Self code defining __DIR__)
was chosen over a FindBin-based solution (in core since perl 5.00307,
while gitweb itself requires at least perl 5.8.0) because FindBin uses a
BEGIN block, which is a problem under mod_perl and other persistent
Perl environments (though there are workarounds).
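For comparison, the rejected FindBin-based alternative would have been
just (a sketch; the problem referred to above is that $Bin is computed
once at BEGIN time and is not refreshed under a persistent interpreter
unless worked around with e.g. FindBin::again):

    use FindBin qw($Bin);        # $Bin: directory of the running script
    use lib "$Bin/lib";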
At Pavan Kumar Sankara's suggestion, gitweb/Makefile uses the
	install [OPTION]... SOURCE... DIRECTORY
format (2nd format) with a single SOURCE rather than the
	install [OPTION]... SOURCE DEST
format (1st format), for security reasons (race conditions).
Modern GNU install has the `-T' / `--no-target-directory' option, but we
cannot rely on the $(INSTALL) we are using supporting this option.
The install-modules target in gitweb/Makefile uses a shell 'for' loop,
instead of make's $(foreach) function, to avoid a possible problem with
generating a command line that exceeds the maximum argument list
length.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/Makefile | 17 +++++++++++++++--
gitweb/gitweb.perl | 8 ++++++++
2 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/gitweb/Makefile b/gitweb/Makefile
index 0a6ac00..e6029e1 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -57,6 +57,7 @@ PERL_PATH ?= /usr/bin/perl
bindir_SQ = $(subst ','\'',$(bindir))#'
gitwebdir_SQ = $(subst ','\'',$(gitwebdir))#'
gitwebstaticdir_SQ = $(subst ','\'',$(gitwebdir)/static)#'
+gitweblibdir_SQ = $(subst ','\'',$(gitwebdir)/lib)#'
SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))#'
PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))#'
DESTDIR_SQ = $(subst ','\'',$(DESTDIR))#'
@@ -153,20 +154,32 @@ test:
test-installed:
GITWEB_TEST_INSTALLED='$(DESTDIR_SQ)$(gitwebdir_SQ)' \
+ GITWEBLIBDIR='$(DESTDIR_SQ)$(gitweblibdir_SQ)' \
$(MAKE) -C ../t gitweb-test
### Installation rules
-install: all
+install: all install-modules
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebdir_SQ)'
$(INSTALL) -m 755 $(GITWEB_PROGRAMS) '$(DESTDIR_SQ)$(gitwebdir_SQ)'
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
$(INSTALL) -m 644 $(GITWEB_FILES) '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
+install-modules:
+ install_dirs="$(sort $(dir $(GITWEB_MODULES)))" && \
+ for dir in $$install_dirs; do \
+ test -d '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/"$$dir" || \
+ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/"$$dir"; \
+ done
+ gitweb_modules="$(GITWEB_MODULES)" && \
+ for mod in $$gitweb_modules; do \
+ $(INSTALL) -m 644 "lib/$$mod" '$(DESTDIR_SQ)$(gitweblibdir_SQ)'/"$$(dirname $$mod)"; \
+ done
+
### Cleaning rules
clean:
$(RM) gitweb.cgi static/gitweb.min.js static/gitweb.min.css GITWEB-BUILD-OPTIONS
-.PHONY: all clean install test test-installed .FORCE-GIT-VERSION-FILE FORCE
+.PHONY: all clean install install-modules test test-installed .FORCE-GIT-VERSION-FILE FORCE
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e50654b..880fdf2 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -10,6 +10,14 @@
use 5.008;
use strict;
use warnings;
+
+use File::Spec;
+# __DIR__ is taken from Dir::Self __DIR__ fragment
+sub __DIR__ () {
+ File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__ . '/lib';
+
use CGI qw(:standard :escapeHTML -nosticky);
use CGI::Util qw(unescape);
use CGI::Carp qw(fatalsToBrowser);
* Re: [RFC PATCH v7 4/9] gitweb: Prepare for splitting gitweb
2010-12-22 23:56 ` [RFC PATCH v7 4/9] gitweb: Prepare for splitting gitweb Jakub Narebski
@ 2010-12-24 9:29 ` Jonathan Nieder
2010-12-26 22:54 ` Jakub Narebski
0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-24 9:29 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, J.H., John 'Warthog9' Hawley
Jakub Narebski wrote:
> Prepare gitweb for having been split into modules that are to be
> installed alongside gitweb in 'lib/' subdirectory, by adding
>
> use lib __DIR__.'/lib';
>
> to gitweb.perl (to main gitweb script), and preparing for putting
> modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.
Spelled out, this means modules would typically go in
/usr/share/gitweb/lib
Is that the right place? I suspect something like
/usr/lib/gitweb/
could make sense in some installations for two reasons:
- even braindamaged webserver configurations would not serve lib/
as static files in that case;
- if some modules are implemented in C for speed, they would need
to go in /usr/lib anyway to follow usual filesystem conventions.
Does the Makefile let us override the directory with such a setting?
> While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to
> allow testing installed version of gitweb and installed version of
> modules (for future tests which would check individual (sub)modules).
>
> Using __DIR__ from Dir::Self module (not in core, that's why currently
> gitweb includes excerpt of code from Dir::Self defining __DIR__) was
> chosen over using FindBin-based solution (in core since perl 5.00307,
> while gitweb itself requires at least perl 5.8.0) because FindBin uses
> BEGIN block
This explanation and the code below leave me nervous that the answer
might be "no". ;-)
[...]
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -10,6 +10,14 @@
> use 5.008;
> use strict;
> use warnings;
> +
> +use File::Spec;
> +# __DIR__ is taken from Dir::Self __DIR__ fragment
> +sub __DIR__ () {
> + File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
> +}
> +use lib __DIR__ . '/lib';
> +
> use CGI qw(:standard :escapeHTML -nosticky);
> use CGI::Util qw(unescape);
> use CGI::Carp qw(fatalsToBrowser);
* Re: [RFC PATCH v7 4/9] gitweb: Prepare for splitting gitweb
2010-12-24 9:29 ` Jonathan Nieder
@ 2010-12-26 22:54 ` Jakub Narebski
0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-26 22:54 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, J.H., John 'Warthog9' Hawley
On Fri, 24 Dec 2010 10:29, Jonathan Nieder wrote:
> Jakub Narebski wrote:
>
> > Prepare gitweb for having been split into modules that are to be
> > installed alongside gitweb in 'lib/' subdirectory, by adding
> >
> > use lib __DIR__.'/lib';
> >
> > to gitweb.perl (to main gitweb script), and preparing for putting
> > modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.
>
> Spelled out, this means modules would typically go in
>
> /usr/share/gitweb/lib
Yes, that's true. It is mainly to support the situation where one can
install files in a (subdirectory of) cgi-bin, but nowhere else. That is
why the default is to install the modules alongside gitweb.
The additional advantage is that t/gitweb-lib.sh, used by the gitweb tests,
can very simply test the source version of gitweb, with gitweb finding
the source version of the modules. But it is not a very large obstacle
to change this.
> Is that the right place? I suspect something like
>
> /usr/lib/gitweb/
>
> could make sense in some installations for two reasons:
>
> - even braindamaged webserver configurations would not serve lib/
> as static files in that case;
Actually it doesn't matter what the web server does with those files when
they are accessed directly, except for the client (user) confusion if
he/she goes where not invited. The modules are used by Perl (by gitweb),
not by the web server.
>
> - if some modules are implemented in C for speed, they would need
> to go in /usr/lib anyway to follow usual filesystem conventions.
Ugh, XS! I sincerely hope that if there were a decision to implement
some features in C for speed, we would be able to use a Perl version of
ctypes for the C-to-Perl interface, not XS.
Anyway, the most probable candidate to be implemented in C would be Git.pm,
or rather a Perl interface to libgit2. It is probable that at some point
gitweb would be converted to use Git.pm or its successor. But I guess that
the Git Perl module would be installed somewhere in PERL5LIB, so it would
be found even without "use lib __DIR__ . '/lib';" or its replacement.
> Does the Makefile let us override the directory with such a setting?
>
I thought that I had provided 'gitweblibdir' as a configurable knob,
but I see that in the version I sent I don't do this:
# Shell quote;
bindir_SQ = $(subst ','\'',$(bindir))#'
gitwebdir_SQ = $(subst ','\'',$(gitwebdir))#'
gitwebstaticdir_SQ = $(subst ','\'',$(gitwebdir)/static)#'
gitweblibdir_SQ = $(subst ','\'',$(gitwebdir)/lib)#'
But if we are to allow a custom gitweblibdir, we would have to change the
way gitweb finds its modules. One solution would be, instead of the
current
	# __DIR__ is taken from Dir::Self __DIR__ fragment
	sub __DIR__ () {
		File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
	}
	use lib __DIR__ . '/lib';
to simply use
	use lib $ENV{GITWEBLIBDIR} || "++GITWEBLIBDIR++";
Of course both gitweb/Makefile and t/gitweb-lib.sh would have to be
updated: gitweb/Makefile to include a replacement rule for '++GITWEBLIBDIR++'
in GITWEB_REPLACE, and t/gitweb-lib.sh to declare and export the GITWEBLIBDIR
environment variable so that gitweb/gitweb.perl would be able to find
its modules when used for the gitweb tests (see the comment earlier).
> > While at it pass GITWEBLIBDIR in addition to GITWEB_TEST_INSTALLED to
> > allow testing installed version of gitweb and installed version of
> > modules (for future tests which would check individual (sub)modules).
> >
> > Using __DIR__ from Dir::Self module (not in core, that's why currently
> > gitweb includes excerpt of code from Dir::Self defining __DIR__) was
> > chosen over using FindBin-based solution (in core since perl 5.00307,
> > while gitweb itself requires at least perl 5.8.0) because FindBin uses
> > BEGIN block
>
> This explanation and the code below leave me nervous that the answer
> might be "no". ;-)
No it doesn't, but yes it could (see above).
> [...]
> > --- a/gitweb/gitweb.perl
> > +++ b/gitweb/gitweb.perl
> > @@ -10,6 +10,14 @@
> > use 5.008;
> > use strict;
> > use warnings;
> > +
> > +use File::Spec;
> > +# __DIR__ is taken from Dir::Self __DIR__ fragment
> > +sub __DIR__ () {
> > + File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
> > +}
> > +use lib __DIR__ . '/lib';
> > +
> > use CGI qw(:standard :escapeHTML -nosticky);
> > use CGI::Util qw(unescape);
> > use CGI::Carp qw(fatalsToBrowser);
--
Jakub Narebski
Poland
* [RFC PATCH v7 5/9] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (3 preceding siblings ...)
2010-12-22 23:56 ` [RFC PATCH v7 4/9] gitweb: Prepare for splitting gitweb Jakub Narebski
@ 2010-12-22 23:56 ` Jakub Narebski
2010-12-22 23:57 ` [RFC PATCH v7 6/9] gitweb/lib - Simple output capture by redirecting STDOUT to file Jakub Narebski
` (6 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:56 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
This way we can use it in test scripts written in other languages
(e.g. in Perl), and not have to rely on "$TEST_DIRECTORY/..".
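For example, a Perl test script can then locate the gitweb modules with
(this is what the test scripts added later in this series do):

  use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";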
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
t/test-lib.sh | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 48fa516..c077fa4 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -552,9 +552,9 @@ test_external () {
# Announce the script to reduce confusion about the
# test output that follows.
say_color "" "# run $test_count: $descr ($*)"
- # Export TEST_DIRECTORY, TRASH_DIRECTORY and GIT_TEST_LONG
+ # Export TEST_DIRECTORY, GIT_BUILD_DIR, TRASH_DIRECTORY and GIT_TEST_LONG
# to be able to use them in script
- export TEST_DIRECTORY TRASH_DIRECTORY GIT_TEST_LONG
+ export TEST_DIRECTORY GIT_BUILD_DIR TRASH_DIRECTORY GIT_TEST_LONG
# Run command; redirect its stderr to &4 as in
# test_run_, but keep its stdout on our stdout even in
# non-verbose mode.
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [RFC PATCH v7 6/9] gitweb/lib - Simple output capture by redirecting STDOUT to file
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (4 preceding siblings ...)
2010-12-22 23:56 ` [RFC PATCH v7 5/9] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
@ 2010-12-22 23:57 ` Jakub Narebski
2010-12-24 9:49 ` Jonathan Nieder
2010-12-22 23:57 ` [RFC PATCH v7 7/9] gitweb/lib - Very simple file based cache Jakub Narebski
` (5 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:57 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
Add the GitwebCache::Capture::ToFile package, which captures output by
redirecting STDOUT to a given file (specified by filename, or by an already
opened filehandle), first saving the original STDOUT so that it can be
restored when capturing is finished.
GitwebCache::Capture::ToFile preserves PerlIO layers, both those set
before started capturing output, and those set during capture.
No care was taken to handle the following special cases (prior to
starting capture): closed STDOUT, STDOUT reopened to a scalar reference,
tied STDOUT. You shouldn't modify STDOUT during capture.
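For illustration, a minimal use of the module as described above (the
filename is just an example):

  use GitwebCache::Capture::ToFile;

  my $capture = GitwebCache::Capture::ToFile->new();
  $capture->capture(sub { print "Hello, world!\n" }, 'output.txt');
  # 'output.txt' now contains "Hello, world!\n"; STDOUT is restored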
Includes separate tests for capturing output in
t9510/test_capture_interface.pl, which is run as an external test from
t9510-gitweb-capture-interface.sh. It tests capturing of utf8 data
printed in :utf8 mode, and of binary data (containing invalid utf8) in
:raw mode.
This patch was based on "gitweb: add output buffering and associated
functions" patch by John 'Warthog9' Hawley (J.H.) in "Gitweb caching v7"
series, and on code of Capture::Tiny by David Golden (Apache License 2.0).
Based-on-work-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/lib/GitwebCache/Capture/ToFile.pm | 109 +++++++++++++++++++++++++
t/t9510-gitweb-capture-interface.sh | 34 ++++++++
t/t9510/test_capture_interface.pl | 132 ++++++++++++++++++++++++++++++
3 files changed, 275 insertions(+), 0 deletions(-)
create mode 100644 gitweb/lib/GitwebCache/Capture/ToFile.pm
create mode 100755 t/t9510-gitweb-capture-interface.sh
create mode 100755 t/t9510/test_capture_interface.pl
diff --git a/gitweb/lib/GitwebCache/Capture/ToFile.pm b/gitweb/lib/GitwebCache/Capture/ToFile.pm
new file mode 100644
index 0000000..d2dbf0f
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/ToFile.pm
@@ -0,0 +1,109 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Simple output capturing via redirecting STDOUT to given file.
+#
+
+# This is the same mechanism that Capture::Tiny uses, only simpler;
+# we don't capture STDERR at all, we don't tee, we capture to
+# explicitly provided file (or filehandle).
+
+package GitwebCache::Capture::ToFile;
+
+use strict;
+use warnings;
+
+use PerlIO;
+use Symbol qw(qualify_to_ref);
+
+# Constructor
+sub new {
+ my $class = shift;
+
+ my $self = {};
+ $self = bless($self, $class);
+
+ return $self;
+}
+
+sub capture {
+ my $self = shift;
+ my $code = shift;
+
+ $self->capture_start(@_); # pass rest of params
+ eval { $code->(); 1; };
+ my $exit_code = $?; # save this for later
+ my $error = $@; # save this for later
+
+ my $got_out = $self->capture_stop();
+ $? = $exit_code;
+ die $error if $error;
+
+ return $got_out;
+}
+
+# ----------------------------------------------------------------------
+
+# Start capturing data (STDOUT)
+sub capture_start {
+ my $self = shift;
+ my $to = shift;
+
+ # save copy of real STDOUT via duplicating it
+ my @layers = PerlIO::get_layers(\*STDOUT);
+ open $self->{'orig_stdout'}, ">&", \*STDOUT
+ or die "Couldn't dup STDOUT for capture: $!";
+
+ # close STDOUT, so that it isn't used anymore (and its file descriptor is freed for the reopen below)
+ close STDOUT;
+
+ $self->{'to'} = $to;
+ my $fileno = fileno(qualify_to_ref($to));
+ if (defined $fileno) {
+ # if $to is filehandle, redirect
+ open STDOUT, '>&', $fileno;
+ } elsif (! ref($to)) {
+ # if $to is name of file, open it
+ open STDOUT, '>', $to;
+ }
+ _relayer(\*STDOUT, \@layers);
+
+ # started capturing
+ $self->{'capturing'} = 1;
+}
+
+# Stop capturing data (required for die_error)
+sub capture_stop {
+ my $self = shift;
+
+ # return if we didn't start capturing
+ return unless delete $self->{'capturing'};
+
+ # close capture file, and restore original STDOUT
+ my @layers = PerlIO::get_layers(\*STDOUT);
+ close STDOUT;
+ open STDOUT, '>&', fileno($self->{'orig_stdout'});
+ _relayer(\*STDOUT, \@layers);
+
+ return exists $self->{'to'} ? $self->{'to'} : $self->{'data'};
+}
+
+# taken from Capture::Tiny by David Golden, Apache License 2.0
+# with debugging stripped out
+sub _relayer {
+ my ($fh, $layers) = @_;
+
+ my %seen = ( unix => 1, perlio => 1); # filter these out
+ my @unique = grep { !$seen{$_}++ } @$layers;
+
+ binmode($fh, join(":", ":raw", @unique));
+}
+
+
+1;
+__END__
+# end of package GitwebCache::Capture::ToFile
diff --git a/t/t9510-gitweb-capture-interface.sh b/t/t9510-gitweb-capture-interface.sh
new file mode 100755
index 0000000..9151454
--- /dev/null
+++ b/t/t9510-gitweb-capture-interface.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='gitweb capturing interface
+
+This test checks capturing interface used for capturing gitweb output
+in gitweb caching (GitwebCache::Capture* modules).'
+
+# for now we are running only capture interface tests
+. ./test-lib.sh
+
+# this check is also present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+ skip_all='perl not available, skipping test'
+ test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+ skip_all='perl module Test::More unavailable, skipping test'
+ test_done
+}
+
+# ----------------------------------------------------------------------
+
+# The external test will output its own plan
+test_external_has_tap=1
+
+test_external \
+ 'GitwebCache::Capture* Perl API (in gitweb/lib/)' \
+ "$PERL_PATH" "$TEST_DIRECTORY"/t9510/test_capture_interface.pl
+
+test_done
diff --git a/t/t9510/test_capture_interface.pl b/t/t9510/test_capture_interface.pl
new file mode 100755
index 0000000..6d90497
--- /dev/null
+++ b/t/t9510/test_capture_interface.pl
@@ -0,0 +1,132 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+use utf8;
+
+use Test::More;
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+# ....................................................................
+
+use_ok('GitwebCache::Capture::ToFile');
+note("Using lib '$INC[0]'");
+note("Testing '$INC{'GitwebCache/Capture/ToFile.pm'}'");
+
+# Test setting up capture
+#
+my $capture = new_ok('GitwebCache::Capture::ToFile' => [], 'The $capture');
+
+
+# Test capturing to file (given by filename) and to filehandle
+#
+sub capture_block (&;$) {
+ $capture->capture(shift, shift || 'actual');
+
+ open my $fh, '<', 'actual' or return;
+ local $/ = undef;
+ my $result = <$fh>;
+ close $fh;
+ return $result;
+}
+
+diag('Should not print anything except test results and diagnostic');
+my $test_data = 'Capture this';
+my $captured = capture_block {
+ print $test_data;
+};
+is($captured, $test_data, 'capture simple data: filename');
+
+open my $fh, '>', 'actual';
+$captured = capture_block(sub {
+ print $test_data;
+}, $fh);
+close $fh;
+is($captured, $test_data, 'capture simple data: filehandle');
+
+
+# Test capturing :utf8 and :raw data
+#
+binmode STDOUT, ':utf8';
+$test_data = <<'EOF';
+Zażółć gęsią jaźń
+EOF
+utf8::decode($test_data);
+$captured = capture_block {
+ binmode STDOUT, ':utf8';
+
+ print $test_data;
+};
+utf8::decode($captured);
+is($captured, $test_data, 'capture utf8 data');
+
+$test_data = '|\x{fe}\x{ff}|\x{9F}|\000|'; # invalid utf-8
+$captured = capture_block {
+ binmode STDOUT, ':raw';
+
+ print $test_data;
+};
+is($captured, $test_data, 'capture raw data');
+
+
+# Test nested capturing, useful for future GitwebCache::CacheOutput tests
+#
+sub read_file {
+ my $filename = shift;
+
+ open my $fh, '<', $filename or return;
+ local $/ = undef;
+ my $result = <$fh>;
+ close $fh;
+
+ return $result;
+}
+
+my $outer_capture = GitwebCache::Capture::ToFile->new();
+$captured = $outer_capture->capture(sub {
+ print "pre|";
+ my $captured = $capture->capture(sub {
+ print "INNER";
+ }, 'inner_actual');
+ print "|post";
+}, 'outer_actual');
+
+my $inner = read_file('inner_actual');
+my $outer = read_file('outer_actual');
+
+is($inner, "INNER", 'nested capture: inner');
+is($outer, "pre||post", 'nested capture: outer');
+
+
+# Testing capture when code dies
+#
+$captured = $outer_capture->capture(sub {
+ print "pre|";
+ eval {
+ my $captured = $capture->capture(sub {
+ print "INNER:pre|";
+ die "die from inner\n";
+ print "INNER:post|"
+ }, 'inner_actual');
+ };
+ print "@=$@" if $@;
+ print "|post";
+}, 'outer_actual');
+
+my $inner = read_file('inner_actual');
+my $outer = read_file('outer_actual');
+
+is($inner, "INNER:pre|",
+ 'nested capture with die: inner output captured up to die');
+is($outer, "pre|@=die from inner\n|post",
+ 'nested capture with die: outer caught rethrown exception from inner');
+
+
+done_testing();
+
+# Local Variables:
+# coding: utf-8
+# End:
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [RFC PATCH v7 6/9] gitweb/lib - Simple output capture by redirecting STDOUT to file
2010-12-22 23:57 ` [RFC PATCH v7 6/9] gitweb/lib - Simple output capture by redirecting STDOUT to file Jakub Narebski
@ 2010-12-24 9:49 ` Jonathan Nieder
2010-12-26 23:03 ` Jakub Narebski
0 siblings, 1 reply; 34+ messages in thread
From: Jonathan Nieder @ 2010-12-24 9:49 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, J.H., John 'Warthog9' Hawley
Jakub Narebski wrote:
> This patch was based on "gitweb: add output buffering and associated
> functions" patch by John 'Warthog9' Hawley (J.H.) in "Gitweb caching v7"
> series, and on code of Capture::Tiny by David Golden (Apache License 2.0).
Micronit: if the license of Capture::Tiny were relevant then we would be
in trouble, I think. (Apache-2.0 and GPLv2 aren't compatible licenses.)
Luckily
[...]
> +# taken from Capture::Tiny by David Golden, Apache License 2.0
> +# with debugging stripped out
> +sub _relayer {
> + my ($fh, $layers) = @_;
> +
> + my %seen = ( unix => 1, perlio => 1); # filter these out
> + my @unique = grep { !$seen{$_}++ } @$layers;
> +
> + binmode($fh, join(":", ":raw", @unique));
> +}
looks trivial enough. Maybe either avoiding mention of the license or
clarifying that that is not intended to be the sole license for the
stripped-down code would help?
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH v7 6/9] gitweb/lib - Simple output capture by redirecting STDOUT to file
2010-12-24 9:49 ` Jonathan Nieder
@ 2010-12-26 23:03 ` Jakub Narebski
0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-26 23:03 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: git, J.H., John 'Warthog9' Hawley
On Fri, 24 Dec 2010, Jonathan Nieder wrote:
> Jakub Narebski wrote:
>
> > This patch was based on "gitweb: add output buffering and associated
> > functions" patch by John 'Warthog9' Hawley (J.H.) in "Gitweb caching v7"
> > series, and on code of Capture::Tiny by David Golden (Apache License 2.0).
>
> Micronit: if the license of Capture::Tiny were relevant then we would be
> in trouble, I think. (Apache-2.0 and GPLv2 aren't compatible licenses.)
Damn, I had thought that Apache-2.0 and GPLv2 are compatible. This is
the only reason that I explicitly mentioned the license (that, and it is
not the usual "licensed like Perl", i.e. dual Artistic License / GPL
licensing). I should have checked whether Apache-2.0 and GPLv2 are compatible.
> Luckily
>
> [...]
> > +# taken from Capture::Tiny by David Golden, Apache License 2.0
> > +# with debugging stripped out
> > +sub _relayer {
> > + my ($fh, $layers) = @_;
> > +
> > + my %seen = ( unix => 1, perlio => 1); # filter these out
> > + my @unique = grep { !$seen{$_}++ } @$layers;
> > +
> > + binmode($fh, join(":", ":raw", @unique));
> > +}
>
> looks trivial enough. Maybe either avoiding mention of the license or
> clarifying that that is not intended to be the sole license for the
> stripped-down code would help?
You are right. I have done a similar thing for the PerlIO::Util based capture,
though I didn't know about the 'binmode($fh, join(":", ":raw", @unique));'
trick.
So I think we would be in the clear by changing the comment to read:
+# see also _relayer in Capture::Tiny by David Golden
or something like that.
Or we can try to change the gitweb license to GPLv3 / AGPLv3, which is
compatible (one way only) with Apache-2.0... just kidding :-)
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH v7 7/9] gitweb/lib - Very simple file based cache
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (5 preceding siblings ...)
2010-12-22 23:57 ` [RFC PATCH v7 6/9] gitweb/lib - Simple output capture by redirecting STDOUT to file Jakub Narebski
@ 2010-12-22 23:57 ` Jakub Narebski
2010-12-22 23:57 ` [RFC PATCH v7 8/9] gitweb/lib - Cache captured output (using compute_fh) Jakub Narebski
` (4 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:57 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
This is the first step towards implementing the file-based output (response)
caching layer that is used on such large sites as kernel.org.
This patch introduces the GitwebCache::FileCacheWithLocking package, which
follows the Cache::Cache / CHI interface, although it does not implement it
fully. The intent of following an established convention for the cache
interface is to be able to replace our simple file based cache,
e.g. by one using memcached.
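For illustration, a minimal (untested) example of using the ->compute_fh()
interface added by this patch; the key, options and generating code are
made up:

  use GitwebCache::FileCacheWithLocking;

  my $cache = GitwebCache::FileCacheWithLocking->new(
          'cache_root' => 'cache',
          'expires_in' => 20, # seconds
  );
  # generate data once, read it from cache afterwards
  my ($fh, $filename) = $cache->compute_fh('example key', sub {
          my ($out_fh, $out_filename) = @_;
          print {$out_fh} "expensive-to-generate data\n";
  });
  if (defined $fh) {
          local $/ = undef;
          print <$fh>;
  }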
The data is stored in the cache as-is, without adding metadata (like
an expiration date), and without serialization (which means that one can
store only scalar data). At this point there is no support for
removing expired cache entries.
The code of the GitwebCache::FileCacheWithLocking package in gitweb/lib
was heavily based on the file-based cache in the Cache::Cache distribution,
i.e. on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on
the file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver
(including implementing atomic writes, something that the original patch
lacks). It tries to follow the more modern CHI architecture, but without
requiring Moose. It is much simplified compared to both interfaces
and their file-based drivers.
This patch does not yet enable output caching in gitweb (it doesn't
have all the required features yet); on the other hand it includes tests
of the cache Perl API in t/t9511-gitweb-caching-interface.sh.
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/lib/GitwebCache/FileCacheWithLocking.pm | 452 ++++++++++++++++++++++++
t/t9511-gitweb-caching-interface.sh | 34 ++
t/t9511/test_cache_interface.pl | 381 ++++++++++++++++++++
3 files changed, 867 insertions(+), 0 deletions(-)
create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm
create mode 100755 t/t9511-gitweb-caching-interface.sh
create mode 100755 t/t9511/test_cache_interface.pl
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
new file mode 100644
index 0000000..ecd0e18
--- /dev/null
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -0,0 +1,452 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine, file-based cache with flock-based entry locking
+#
+
+# Minimalistic cache that stores data in the filesystem, without serialization.
+# It uses file locks (flock) to have only one process generating data and
+# writing to the cache, when using the CHI-like ->compute_fh() interface method.
+
+package GitwebCache::FileCacheWithLocking;
+
+use strict;
+use warnings;
+
+use Carp;
+use File::Path qw(mkpath);
+use Digest::MD5 qw(md5_hex);
+use Fcntl qw(:flock);
+use POSIX qw(setsid);
+
+# by default, the cache nests all entries on the filesystem single
+# directory deep, i.e. '60/b725f10c9c85c70d97880dfe8191b3' for
+# key name (key digest) 60b725f10c9c85c70d97880dfe8191b3.
+#
+our $DEFAULT_CACHE_DEPTH = 1;
+
+# by default, the root of the cache is located in 'cache'.
+#
+our $DEFAULT_CACHE_ROOT = "cache";
+
+# by default we don't use cache namespace (empty namespace);
+# empty namespace does not allow for simple implementation of clear() method.
+#
+our $DEFAULT_NAMESPACE = '';
+
+# anything less than 0 means to not expire
+#
+our $NEVER_EXPIRE = -1;
+
+# cache expiration of 0 means that entry is expired
+#
+our $EXPIRE_NOW = 0;
+
+# ......................................................................
+# constructor
+
+# The options are set by passing in hash or a reference to a hash containing
+# any of the following keys:
+# * 'namespace'
+# The namespace associated with this cache. This allows easy separation of
+# multiple, distinct caches without worrying about key collision. Defaults
+# to $DEFAULT_NAMESPACE. Might be empty string.
+# * 'cache_root' (Cache::FileCache compatible),
+# 'root_dir' (CHI::Driver::File compatible),
+# The location in the filesystem that will hold the root of the cache.
+# Defaults to $DEFAULT_CACHE_ROOT.
+# * 'cache_depth' (Cache::FileCache compatible),
+# 'depth' (CHI::Driver::File compatible),
+# The number of subdirectory levels deep at which to store a cache item.
+# This should be large enough that no cache directory has more than a few
+# hundred objects. Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+# * 'default_expires_in' (Cache::Cache compatible),
+# 'expires_in' (CHI compatible) [seconds]
+# The expiration time for objects placed in the cache.
+# Defaults to -1 (never expire) if not explicitly set.
+# * 'max_lifetime' [seconds]
+# If it is greater than 0, and a cache entry is expired but not older
+# than this value, serve stale data while waiting for the cache entry
+# to be regenerated (refreshed). Non-adaptive.
+# * 'on_error' (similar to CHI 'on_get_error'/'on_set_error')
+# How to handle runtime errors occurring during cache gets and cache
+# sets, which may or may not be considered fatal in your application.
+# Options are:
+# * "die" (the default) - call die() with an appropriate message
+# * "warn" - call warn() with an appropriate message
+# * "ignore" - do nothing
+# * <coderef> - call this code reference with an appropriate message
+sub new {
+ my $class = shift;
+ my %opts = ref $_[0] ? %{ $_[0] } : @_;
+
+ my $self = {};
+ $self = bless($self, $class);
+
+ $self->{'root'} =
+ exists $opts{'cache_root'} ? $opts{'cache_root'} :
+ exists $opts{'root_dir'} ? $opts{'root_dir'} :
+ $DEFAULT_CACHE_ROOT;
+ $self->{'depth'} =
+ exists $opts{'cache_depth'} ? $opts{'cache_depth'} :
+ exists $opts{'depth'} ? $opts{'depth'} :
+ $DEFAULT_CACHE_DEPTH;
+ $self->{'namespace'} =
+ exists $opts{'namespace'} ? $opts{'namespace'} :
+ $DEFAULT_NAMESPACE;
+ $self->{'expires_in'} =
+ exists $opts{'default_expires_in'} ? $opts{'default_expires_in'} :
+ exists $opts{'expires_in'} ? $opts{'expires_in'} :
+ $NEVER_EXPIRE;
+ $self->{'max_lifetime'} =
+ exists $opts{'max_lifetime'} ? $opts{'max_lifetime'} :
+ exists $opts{'max_cache_lifetime'} ? $opts{'max_cache_lifetime'} :
+ $NEVER_EXPIRE;
+ $self->{'on_error'} =
+ exists $opts{'on_error'} ? $opts{'on_error'} :
+ exists $opts{'on_get_error'} ? $opts{'on_get_error'} :
+ exists $opts{'on_set_error'} ? $opts{'on_set_error'} :
+ exists $opts{'error_handler'} ? $opts{'error_handler'} :
+ 'die';
+
+ # validation could be put here
+
+ return $self;
+}
+
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(depth root namespace expires_in max_lifetime
+ on_error)) {
+ my $field = $i;
+ no strict 'refs';
+ *{"get_$field"} = sub {
+ my $self = shift;
+ return $self->{$field};
+ };
+ *{"set_$field"} = sub {
+ my ($self, $value) = @_;
+ $self->{$field} = $value;
+ };
+}
+
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# $path = $self->path_to_namespace();
+#
+# Return root dir for namespace (lazily built, cached)
+sub path_to_namespace {
+ my ($self) = @_;
+
+ if (!exists $self->{'path_to_namespace'}) {
+ if (defined $self->{'namespace'} &&
+ $self->{'namespace'} ne '') {
+ $self->{'path_to_namespace'} = "$self->{'root'}/$self->{'namespace'}";
+ } else {
+ $self->{'path_to_namespace'} = $self->{'root'};
+ }
+ }
+ return $self->{'path_to_namespace'};
+}
+
+# $path = $cache->path_to_key($key);
+# $path = $cache->path_to_key($key, \$dir);
+#
+# Take a human-readable key, and return the file path.
+# Puts the dirname of the file path in the second argument, if it is provided.
+sub path_to_key {
+ my ($self, $key, $dir_ref) = @_;
+
+ my @paths = ( $self->path_to_namespace() );
+
+ # Create a unique (hashed) key from human readable key
+ my $filename = md5_hex($key); # or $digester->add($key)->hexdigest();
+
+ # Split filename so that it has DEPTH subdirectories,
+ # where each subdirectory has a two-letter name
+ push @paths, unpack("(a2)[$self->{'depth'}] a*", $filename);
+ $filename = pop @paths;
+
+ # Join paths together, computing dir separately if $dir_ref was passed.
+ my $filepath;
+ if (defined $dir_ref && ref($dir_ref)) {
+ my $dir = join('/', @paths);
+ $filepath = "$dir/$filename";
+ $$dir_ref = $dir;
+ } else {
+ $filepath = join('/', @paths, $filename);
+ }
+
+ return $filepath;
+}
+
+# $self->ensure_path($dir);
+#
+# create $dir (directory) if it does not exist, thus ensuring that the path exists
+sub ensure_path {
+ my $self = shift;
+ my $dir = shift || return;
+
+ if (!-d $dir) {
+ # mkpath will croak()/die() if there is an error
+ mkpath($dir, 0, 0777);
+ }
+}
+
+# $filename = $self->get_lockname($key);
+#
+# Take a human-readable key, and return the path to be used for the lockfile.
+# Ensures that the file can be created, if needed.
+sub get_lockname {
+ my ($self, $key) = @_;
+
+ my $lockfile = $self->path_to_key($key, \my $dir) . '.lock';
+
+ # ensure that directory leading to lockfile exists
+ $self->ensure_path($dir);
+
+ return $lockfile;
+}
+
+# ----------------------------------------------------------------------
+# "private" utility functions and methods
+
+# ($fh, $filename) = $self->_tempfile_to_path($path_for_key, $dir_for_key);
+#
+# take a file path to a cache entry, and its directory;
+# return the filehandle and filename of an open temporary file,
+# like File::Temp::tempfile
+sub _tempfile_to_path {
+ my ($self, $file, $dir) = @_;
+
+ my $tempname = "$file.tmp";
+ open my $temp_fh, '>', $tempname
+ or die "Couldn't open temporary file '$tempname' for writing: $!";
+
+ return ($temp_fh, $tempname);
+}
+
+# ($fh, $filename) = $self->_wait_for_data($key, $code);
+#
+# Wait for data to be available using (blocking) $code,
+# then return filehandle and filename to read from for $key.
+sub _wait_for_data {
+ my ($self, $key, $sync_coderef) = @_;
+ my @result;
+
+ # wait for data to be available
+ $sync_coderef->();
+ # fetch data
+ @result = $self->fetch_fh($key);
+
+ return @result;
+}
+
+# $self->_handle_error($raw_error)
+#
+# based on _handle_get_error and _dispatch_error_msg from CHI::Driver
+sub _handle_error {
+ my ($self, $error) = @_;
+
+ for ($self->get_on_error()) {
+ (ref($_) eq 'CODE') && do { $_->($error) };
+ /^ignore$/ && do { };
+ /^warn$/ && do { carp $error };
+ /^die$/ && do { croak $error };
+ }
+}
+
+# ----------------------------------------------------------------------
+# nonstandard worker and semi-interface methods
+
+# ($fh, $filename) = $self->fetch_fh($key);
+#
+# Get filehandle to read from for given $key, and filename of cache file.
+# Doesn't check if entry expired.
+sub fetch_fh {
+ my ($self, $key) = @_;
+
+ my $path = $self->path_to_key($key);
+ return unless (defined $path);
+
+ open my $fh, '<', $path or return;
+ return ($fh, $path);
+}
+
+# ($fh, $filename) = $self->get_fh($key, [option => value, ...])
+#
+# Returns filehandle to read from for given $key, and filename of cache file.
+# Returns empty list if entry expired.
+#
+# $key may be followed by one or more name/value parameters:
+# * expires_in [DURATION] - override global expiration time
+sub get_fh {
+ my ($self, $key, %opts) = @_;
+
+ return unless ($self->is_valid($key, $opts{'expires_in'}));
+
+ return $self->fetch_fh($key);
+}
+
+# [($fh, $filename) =] $self->set_coderef_fh($key, $code_fh);
+#
+# Runs $code_fh, passing to it $fh and $filename of file to write to;
+# the contents of this file would be contents of cache entry.
+# Returns what $self->fetch_fh($key) would return.
+sub set_coderef_fh {
+ my ($self, $key, $code) = @_;
+
+ my $path = $self->path_to_key($key, \my $dir);
+ return unless (defined $path && defined $dir);
+
+ # ensure that directory leading to cache file exists
+ $self->ensure_path($dir);
+
+ # generate a temporary file / file to write to
+ my ($fh, $tempfile) = $self->_tempfile_to_path($path, $dir);
+
+ # code writes to filehandle or file
+ $code->($fh, $tempfile);
+
+ close $fh;
+ rename($tempfile, $path)
+ or die "Couldn't rename temporary file '$tempfile' to '$path': $!";
+
+ open $fh, '<', $path or return;
+ return ($fh, $path);
+}
+
+# ======================================================================
+# ......................................................................
+# interface methods
+#
+# note that only those methods use 'on_error' handler;
+# all the rest just use "die"
+
+# Removing and expiring
+
+# $cache->remove($key)
+#
+# Remove the data associated with the $key from the cache.
+sub remove {
+ my ($self, $key) = @_;
+
+ my $file = $self->path_to_key($key)
+ or return;
+ return unless -f $file;
+ unlink($file)
+ or $self->_handle_error("Couldn't remove cache entry file '$file' for key '$key': $!");
+}
+
+# $cache->is_valid($key[, $expires_in])
+#
+# Returns a boolean indicating whether $key exists in the cache
+# and has not expired. Uses global per-cache expires time, unless
+# passed optional $expires_in argument.
+sub is_valid {
+ my ($self, $key, $expires_in) = @_;
+
+ my $path = $self->path_to_key($key);
+
+ # does file exists in cache?
+ return 0 unless -f $path;
+ # get its modification time
+ my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
+ or $self->_handle_error("Couldn't stat file '$path' for key '$key': $!");
+
+ # expire time can be set to never
+ $expires_in = defined $expires_in ? $expires_in : $self->get_expires_in();
+ return 1 unless (defined $expires_in && $expires_in >= 0);
+
+ # is file expired?
+ my $now = time();
+
+ return (($now - $mtime) < $expires_in);
+}
+
+# Getting and setting
+
+# ($fh, $filename) = $cache->compute_fh($key, $code);
+#
+# Combines the get and set operations in a single call. Attempts to
+# get $key; if successful, returns the filehandle it can be read from.
+# Otherwise, calls $code passing filehandle to write to as a
+# parameter; contents of this file is then used as the new value for
+# $key; returns filehandle from which one can read newly generated data.
+#
+# Uses file locking to have only one process updating value for $key
+# to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
+sub compute_fh {
+ my ($self, $key, $code_fh) = @_;
+
+ my @result = eval { $self->get_fh($key) };
+ return @result if @result;
+ $self->_handle_error($@) if $@;
+
+ my $lockfile = $self->get_lockname($key);
+
+ # this loop is to protect against situation where process that
+ # acquired exclusive lock (writer) dies or exits
+ # before writing data to cache
+ my $lock_state; # needed for loop condition
+ do {
+ open my $lock_fh, '+>', $lockfile
+ or $self->_handle_error("Could't open lockfile '$lockfile': $!");
+
+ $lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
+ if ($lock_state) {
+ ## acquired writers lock, have to generate data
+ @result = eval { $self->set_coderef_fh($key, $code_fh) };
+ $self->_handle_error($@) if $@;
+
+ # closing lockfile releases writer lock
+ flock($lock_fh, LOCK_UN);
+ close $lock_fh
+ or $self->_handle_error("Could't close lockfile '$lockfile': $!");
+
+ } else {
+ ## didn't acquire writers lock, get stale data or wait for regeneration
+
+ # try to retrieve stale data
+ eval {
+ @result = $self->get_fh($key,
+ 'expires_in' => $self->get_max_lifetime());
+ };
+ return @result if @result;
+ $self->_handle_error($@) if $@;
+
+ # wait for regeneration if no stale data to serve,
+ # using shared / readers lock to sync (wait for data)
+ @result = eval {
+ $self->_wait_for_data($key, sub {
+ flock($lock_fh, LOCK_SH);
+ });
+ };
+ $self->_handle_error($@) if $@;
+ # closing lockfile releases readers lock
+ flock($lock_fh, LOCK_UN);
+ close $lock_fh
+ or $self->_handle_error("Could't close lockfile '$lockfile': $!");
+
+ }
+ } until (@result || $lock_state);
+ # repeat until we have data, or we tried generating the data ourselves and failed
+ return @result;
+}
+
+
+1;
+__END__
+# end of package GitwebCache::FileCacheWithLocking;
diff --git a/t/t9511-gitweb-caching-interface.sh b/t/t9511-gitweb-caching-interface.sh
new file mode 100755
index 0000000..d8fc946
--- /dev/null
+++ b/t/t9511-gitweb-caching-interface.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='gitweb caching interface
+
+This test checks caching interface used in gitweb caching, and caching
+infrastructure (GitwebCache::* modules).'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this check is also present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+ skip_all='perl not available, skipping test'
+ test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+ skip_all='perl module Test::More unavailable, skipping test'
+ test_done
+}
+
+# ----------------------------------------------------------------------
+
+# The external test will output its own plan
+test_external_has_tap=1
+
+test_external \
+ 'GitwebCache::*Cache* Perl API (in gitweb/lib/)' \
+ "$PERL_PATH" "$TEST_DIRECTORY"/t9511/test_cache_interface.pl
+
+test_done
diff --git a/t/t9511/test_cache_interface.pl b/t/t9511/test_cache_interface.pl
new file mode 100755
index 0000000..a2b006c
--- /dev/null
+++ b/t/t9511/test_cache_interface.pl
@@ -0,0 +1,381 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use POSIX qw(dup2);
+use Fcntl qw(:DEFAULT);
+use IO::Handle;
+use IO::Select;
+use IO::Pipe;
+use File::Basename;
+
+use Test::More;
+
+# test installed version or source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+
+# Test creating a cache
+#
+BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
+note("Using lib '$INC[0]'");
+note("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
+
+my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+
+# Test that default values are defined
+#
+ok(defined $GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_ROOT,
+ '$GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_ROOT defined');
+ok(defined $GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_DEPTH,
+ '$GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_DEPTH defined');
+
+# Test some accessors and some default values for cache
+#
+SKIP: {
+ skip 'default values not defined', 2
+ unless ($GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_ROOT &&
+ $GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_DEPTH);
+
+ cmp_ok($cache->get_root(), 'eq', $GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_ROOT,
+ "default cache root is '$GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_ROOT'");
+ cmp_ok($cache->get_depth(), '==', $GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_DEPTH,
+ "default cache depth is $GitwebCache::FileCacheWithLocking::DEFAULT_CACHE_DEPTH");
+}
+
+# Test the getting and setting of a cached value,
+# and removal of a cached value
+#
+my $key = 'Test Key';
+my $value = 'Test Value';
+
+my $call_count = 0;
+sub get_value_fh {
+ my $fh = shift;
+ $call_count++;
+ print {$fh} $value;
+}
+
+# use ->compute_fh($key, $code_fh) interface
+sub cache_compute_fh {
+ my ($cache, $key, $code_fh) = @_;
+
+ my ($fh, $filename) = $cache->compute_fh($key, $code_fh);
+ return unless $fh;
+
+ local $/ = undef;
+ return <$fh>;
+}
+
+# use ->get_fh($key) interface
+sub cache_get_fh {
+ my ($cache, $key) = @_;
+
+ my ($fh, $filename) = $cache->get_fh($key);
+ return unless $fh;
+
+ local $/ = undef;
+ return <$fh>;
+}
+
+# use ->set_coderef_fh($key, $code_fh) to set $key to $value
+sub cache_set_fh {
+ my ($cache, $key, $value) = @_;
+
+ $cache->set_coderef_fh($key, sub { print {$_[0]} $value });
+ return $value;
+}
+
+subtest 'compute_fh interface' => sub {
+ foreach my $method (qw(remove compute_fh)) {
+ can_ok($cache, $method);
+ }
+
+ eval { $cache->remove('Non-Existent Key'); };
+ ok(!$@, 'remove on non-existent key doesn\'t die');
+ diag($@) if $@;
+
+ $cache->remove($key); # just in case
+ is(cache_compute_fh($cache, $key, \&get_value_fh), $value,
+ "compute_fh 1st time (set) returns '$value'");
+ is(cache_compute_fh($cache, $key, \&get_value_fh), $value,
+ "compute_fh 2nd time (get) returns '$value'");
+ is(cache_compute_fh($cache, $key, \&get_value_fh), $value,
+ "compute_fh 3rd time (get) returns '$value'");
+ cmp_ok($call_count, '==', 1, 'get_value_fh() is called once from compute_fh');
+
+ done_testing();
+};
+
+
+# Test cache expiration
+#
+subtest 'cache expiration' => sub {
+ $cache->set_expires_in(60*60*24); # set expire time to 1 day
+ cmp_ok($cache->get_expires_in(), '>', 0, '"expires in" is greater than 0 (set to 1d)');
+ $call_count = 0;
+ cache_compute_fh($cache, $key, \&get_value_fh);
+ cmp_ok($call_count, '==', 0, 'compute_fh didn\'t need to compute data (not expired in 1d)');
+ is(cache_get_fh($cache, $key), $value, 'get_fh returns cached value (not expired in 1d)');
+
+ $cache->set_expires_in(-1); # set expire time to never expire
+ is($cache->get_expires_in(), -1, '"expires in" is set to never (-1)');
+ is(cache_get_fh($cache, $key), $value, 'get returns cached value (not expired)');
+
+ $cache->set_expires_in(0);
+ is($cache->get_expires_in(), 0, '"expires in" is set to now (0)');
+ ok(!defined(cache_get_fh($cache, $key)), 'cache is expired, get_fh returns undef');
+ cache_compute_fh($cache, $key, \&get_value_fh);
+ cmp_ok($call_count, '==', 1, 'compute_fh computed and set data');
+
+ done_testing();
+};
+
+
+# ----------------------------------------------------------------------
+# CONCURRENT ACCESS
+sub parallel_run (&); # forward declaration of prototype
+
+# Test 'stampeding herd' / 'cache miss stampede' problem
+#
+my $slow_time = 1; # how many seconds to sleep in mockup of slow generation
+sub get_value_slow_fh {
+ my $fh = shift;
+
+ $call_count++;
+ sleep $slow_time;
+ print {$fh} $value;
+}
+sub get_value_die {
+ $call_count++;
+ die "get_value_die\n";
+}
+my $lock_file = "$0.$$.lock"; # if exists then get_value_die_once_fh was already called
+sub get_value_die_once_fh {
+ if (sysopen my $lock_fh, $lock_file, (O_WRONLY | O_CREAT | O_EXCL)) {
+ close $lock_fh;
+ die "get_value_die_once_fh\n";
+ } else {
+ get_value_slow_fh(@_);
+ }
+}
+
+my @output; # gathers output from concurrent invocations
+my $sep = '|'; # separate different parts of data for tests
+my $total_count = 0; # number of calls around all concurrent invocations
+
+note("Following tests contain artifical delay of $slow_time seconds");
+subtest 'parallel access' => sub {
+
+ $cache->remove($key);
+ @output = parallel_run {
+ $call_count = 0;
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print $data if defined $data;
+ print "$sep$call_count";
+ };
+ $total_count = 0;
+ foreach (@output) {
+ my ($child_out, $child_count) = split(quotemeta $sep, $_);
+ $total_count += $child_count;
+ }
+ cmp_ok($total_count, '==', 1, 'parallel compute_fh: get_value_slow_fh() called only once');
+ # extract only data, without child count
+ @output = map { s/\Q$sep\E.*$//; $_ } @output;
+ is_deeply(
+ \@output,
+ [ ($value) x 2 ],
+ "parallel compute_fh: both returned '$value'"
+ );
+
+ $cache->set_on_error(sub { die @_; });
+ eval {
+ local $SIG{ALRM} = sub { die "alarm\n"; };
+ alarm 4*$slow_time;
+
+ @output = parallel_run {
+ $call_count = 0;
+ my $data = eval { cache_compute_fh($cache, 'No Key', \&get_value_die); };
+ my $eval_error = $@;
+ print "$data" if defined $data;
+ print "$sep";
+ print "$eval_error" if $eval_error;
+ };
+ is_deeply(
+ \@output,
+ [ ( "${sep}get_value_die\n" ) x 2 ],
+ 'parallel compute_fh: get_value_die() died in both'
+ );
+
+ alarm 0;
+ };
+ ok(!$@, 'parallel compute_fh: no alarm call (neither process hung)');
+ diag($@) if $@;
+
+ $cache->remove($key);
+ unlink($lock_file);
+ @output = parallel_run {
+ my $data = eval { cache_compute_fh($cache, $key, \&get_value_die_once_fh); };
+ my $eval_error = $@;
+ print "$data" if defined $data;
+ print "$sep";
+ print "$eval_error" if $eval_error;
+ };
+ is_deeply(
+ [sort @output],
+ [sort ("$value$sep", "${sep}get_value_die_once_fh\n")],
+ 'parallel compute_fh: return correct value even if other process died'
+ );
+ unlink($lock_file);
+
+ done_testing();
+};
+
+
+# Test that cache returns stale data in existing but expired cache situation
+#
+my $stale_value = 'Stale Value';
+
+subtest 'serving stale data when regenerating' => sub {
+ cache_set_fh($cache, $key, $stale_value);
+ $cache->set_expires_in(-1); # never expire, for next check
+ is(cache_get_fh($cache, $key), $stale_value,
+ 'stale value set (prepared) correctly');
+
+ $call_count = 0;
+ $cache->set_expires_in(0); # expire now (so there are no fresh data)
+ $cache->set_max_lifetime(-1); # forever (always serve stale data)
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print "$call_count$sep";
+ print $data if defined $data;
+ };
+ # returning stale data works
+ is_deeply(
+ [sort @output],
+ [sort ("0$sep$stale_value", "1$sep$value")],
+ 'no background: stale data returned by one process (the one not generating data)'
+ );
+ $cache->set_expires_in(-1); # never expire for next ->get
+ is(cache_get_fh($cache, $key), $value,
+ 'no background: value got set correctly, even if stale data returned');
+
+
+ cache_set_fh($cache, $key, $stale_value);
+ $cache->set_expires_in(0); # expire now
+ $cache->set_max_lifetime(0); # don't serve stale data
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print $data;
+ };
+ # no returning stale data
+ ok(!scalar(grep { $_ eq $stale_value } @output),
+ 'no stale data if configured');
+
+
+ done_testing();
+};
+$cache->set_expires_in(-1);
+
+
+done_testing();
+
+
+#######################################################################
+#######################################################################
+#######################################################################
+
+# from http://aaroncrane.co.uk/talks/pipes_and_processes/
+sub fork_child (&) {
+ my ($child_process_code) = @_;
+
+ my $pid = fork();
+ die "Failed to fork: $!\n" if !defined $pid;
+
+ return $pid if $pid != 0;
+
+ # Now we're in the new child process
+ $child_process_code->();
+ exit;
+}
+
+sub parallel_run (&) {
+ my $child_code = shift;
+ my $nchildren = 2;
+
+ my %children;
+ my (%pid_for_child, %fd_for_child);
+ my $sel = IO::Select->new();
+ foreach my $child_idx (1..$nchildren) {
+ my $pipe = IO::Pipe->new()
+ or die "Failed to create pipe: $!\n";
+
+ my $pid = fork_child {
+ $pipe->writer()
+ or die "$$: Child \$pipe->writer(): $!\n";
+ dup2(fileno($pipe), fileno(STDOUT))
+ or die "$$: Child $child_idx failed to reopen stdout to pipe: $!\n";
+ close $pipe
+ or die "$$: Child $child_idx failed to close pipe: $!\n";
+
+ # From Test-Simple-0.96/t/subtest/fork.t
+ #
+ # Force all T::B output into the pipe (redirected to STDOUT),
+ # for the parent builder as well as the current subtest builder.
+ {
+ no warnings 'redefine';
+ *Test::Builder::output = sub { *STDOUT };
+ *Test::Builder::failure_output = sub { *STDOUT };
+ *Test::Builder::todo_output = sub { *STDOUT };
+ }
+
+ $child_code->();
+
+ *STDOUT->flush();
+ close(STDOUT);
+ };
+
+ $pid_for_child{$pid} = $child_idx;
+ $pipe->reader()
+ or die "Failed to \$pipe->reader(): $!\n";
+ $fd_for_child{$pipe} = $child_idx;
+ $sel->add($pipe);
+
+ $children{$child_idx} = {
+ 'pid' => $pid,
+ 'stdout' => $pipe,
+ 'output' => '',
+ };
+ }
+
+ while (my @ready = $sel->can_read()) {
+ foreach my $fh (@ready) {
+ my $buf = '';
+ my $nread = sysread($fh, $buf, 1024);
+
+ exists $fd_for_child{$fh}
+ or die "Cannot find child for fd: $fh\n";
+
+ if ($nread > 0) {
+ $children{$fd_for_child{$fh}}{'output'} .= $buf;
+ } else {
+ $sel->remove($fh);
+ }
+ }
+ }
+
+ while (%pid_for_child) {
+ my $pid = waitpid -1, 0;
+ warn "Child $pid_for_child{$pid} ($pid) failed with status: $?\n"
+ if $? != 0;
+ delete $pid_for_child{$pid};
+ }
+
+ return map { $children{$_}{'output'} } keys %children;
+}
+
+__END__
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [RFC PATCH v7 8/9] gitweb/lib - Cache captured output (using compute_fh)
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (6 preceding siblings ...)
2010-12-22 23:57 ` [RFC PATCH v7 7/9] gitweb/lib - Very simple file based cache Jakub Narebski
@ 2010-12-22 23:57 ` Jakub Narebski
2010-12-22 23:58 ` [RFC PATCH v7 9/9] gitweb: Add optional output caching Jakub Narebski
` (3 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:57 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
Add the GitwebCache::CacheOutput package, which introduces the cache_output
subroutine. If data for a given key is present in the cache, then
cache_output gets the data from the cache and prints it. If data is not
present in the cache, then cache_output runs the provided subroutine (code
reference), captures its output, saves this output in the cache, and prints it.
It requires that the provided $cache supports the ->compute_fh() method, like
GitwebCache::FileCacheWithLocking introduced in an earlier commit, and that
the provided $capture supports capturing to a file or filehandle via the
->capture($code, $file) method, like GitwebCache::Capture::ToFile
introduced in an earlier commit.
Exceptions in $code should be thrown using 'die' (the Perl exception
mechanism); one can choose whether error output (the output printed before
the exception is raised) should be saved to the cache or not.
By default error output is not cached.
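For illustration, the intended calling convention (a sketch; $cache and
$capture would be instances of the packages from the earlier commits, and
the key and page body are made up):

  use GitwebCache::CacheOutput; # exports cache_output

  cache_output($cache, $capture, 'page key',
               sub { print "generated page body\n" },
               '-cache_errors' => 0);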
Gitweb would use cache_output to get a page from the cache, or to generate
a page and save it to the cache. The die_error subroutine throws an
exception, which will be caught and by default rethrown; error pages will
not be cached.
It is assumed that data is saved to cache _converted_, and should
therefore be read from cache and printed to STDOUT in ':raw' (binary)
mode.
Add the t9512/test_cache_output.pl test, run as an external test from
t9512-gitweb-cache-output-interface.sh. It checks that cache_output behaves
correctly, namely that it saves and restores action output in the cache, and
that it prints either freshly generated output or cached output, depending
on whether data exists in the cache.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/lib/GitwebCache/CacheOutput.pm | 84 ++++++++++++++++
t/t9512-gitweb-cache-output-interface.sh | 34 ++++++
t/t9512/test_cache_output.pl | 162 ++++++++++++++++++++++++++++++
3 files changed, 280 insertions(+), 0 deletions(-)
create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
create mode 100755 t/t9512-gitweb-cache-output-interface.sh
create mode 100755 t/t9512/test_cache_output.pl
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
new file mode 100644
index 0000000..4a75a7f
--- /dev/null
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -0,0 +1,84 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Capturing and caching (gitweb) output
+#
+
+# Capture output, save it in cache and print it, or retrieve it from
+# cache and print it.
+
+package GitwebCache::CacheOutput;
+
+use strict;
+use warnings;
+
+use File::Copy qw();
+use Symbol qw(qualify_to_ref);
+
+use Exporter qw(import);
+our @EXPORT = qw(cache_output);
+our %EXPORT_TAGS = (all => [ @EXPORT ]);
+
+# cache_output($cache, $capture, $key, $action_code, [ option => value ]);
+#
+# Attempts to get $key from $cache; if successful, prints the value.
+# Otherwise, calls $action_code, captures its output using $capture,
+# and uses the captured output as the new value for $key in $cache,
+# then prints the captured output.
+#
+# It is assumed that captured data is already converted and it is
+# in ':raw' format (and thus restored in ':raw' from cache)
+#
+# Supported options:
+# * -cache_errors => 0|1 - whether error output should be cached
+sub cache_output {
+ my ($cache, $capture, $key, $code, %opts) = @_;
+
+ my ($fh, $filename);
+ my ($capture_fh, $capture_filename);
+ eval { # this `eval` is to catch rethrown error, so we can print captured output
+ ($fh, $filename) = $cache->compute_fh($key, sub {
+ ($capture_fh, $capture_filename) = @_;
+
+ # this `eval` is to be able to cache error output (up till 'die')
+ eval { $capture->capture($code, $capture_fh); };
+
+ # note that $cache can catch this error itself (like e.g. CHI);
+ # use "die"-ing error handler to rethrow this exception to outside
+ die $@ if ($@ && ! $opts{'-cache_errors'});
+ });
+ };
+ my $error = $@;
+
+ # if an exception was rethrown, and not caught by caching engine (by $cache)
+ # then ->compute_fh will not set $fh nor $filename; use those used for capture
+ if (!defined $fh) {
+ $filename ||= $capture_filename;
+ }
+
+ if (defined $fh || defined $filename) {
+ # set binmode only if $fh is defined (is a filehandle)
+ # File::Copy::copy opens files given by filename in binary mode
+ binmode $fh, ':raw' if (defined $fh);
+ binmode STDOUT, ':raw';
+ File::Copy::copy($fh || $filename, \*STDOUT);
+ }
+
+ # rethrow error if captured in outer `eval` (i.e. no -cache_errors),
+ # removing temporary file (exception thrown out of cache)
+ if ($error) {
+ unlink $capture_filename
+ if (defined $capture_filename && -e $capture_filename);
+ die $error;
+ }
+ return;
+}
+
+1;
+__END__
+# end of package GitwebCache::CacheOutput
diff --git a/t/t9512-gitweb-cache-output-interface.sh b/t/t9512-gitweb-cache-output-interface.sh
new file mode 100755
index 0000000..fb9525a
--- /dev/null
+++ b/t/t9512-gitweb-cache-output-interface.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='gitweb cache
+
+This test checks GitwebCache::CacheOutput Perl module that is
+responsible for capturing and caching gitweb output.'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this check is also present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+ skip_all='perl not available, skipping test'
+ test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+ skip_all='perl module Test::More unavailable, skipping test'
+ test_done
+}
+
+# ----------------------------------------------------------------------
+
+# The external test will output its own plan
+test_external_has_tap=1
+
+test_external \
+ 'GitwebCache::CacheOutput Perl API (in gitweb/lib/)' \
+ "$PERL_PATH" "$TEST_DIRECTORY"/t9512/test_cache_output.pl
+
+test_done
diff --git a/t/t9512/test_cache_output.pl b/t/t9512/test_cache_output.pl
new file mode 100755
index 0000000..758848c
--- /dev/null
+++ b/t/t9512/test_cache_output.pl
@@ -0,0 +1,162 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version
+use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
+
+# ....................................................................
+
+# prototypes must be known at compile time, otherwise they do not work
+BEGIN { use_ok('GitwebCache::CacheOutput'); }
+
+require_ok('GitwebCache::FileCacheWithLocking');
+require_ok('GitwebCache::Capture::ToFile');
+
+note("Using lib '$INC[0]'");
+note("Testing '$INC{'GitwebCache/CacheOutput.pm'}'");
+note("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
+note("Testing '$INC{'GitwebCache/Capture/ToFile.pm'}'");
+
+
+# Test setting up $cache and $capture
+my ($cache, $capture);
+subtest 'setup' => sub {
+ $cache = new_ok('GitwebCache::FileCacheWithLocking' => [], 'The $cache ');
+ $capture = new_ok('GitwebCache::Capture::ToFile' => [], 'The $capture');
+
+ done_testing();
+};
+
+# ......................................................................
+
+# Prepare for testing cache_output
+my $key = 'Key';
+my $action_output = <<'EOF';
+# This is data to be cached and shown
+EOF
+my $cached_output = <<"EOF";
+$action_output# (version recovered from cache)
+EOF
+my $call_count = 0;
+sub action {
+ $call_count++;
+ print $action_output;
+}
+
+my $die_output = <<"EOF";
+$action_output# (died)
+EOF
+sub die_action {
+ print $die_output;
+ die "die_action\n";
+}
+
+# Catch output printed by cache_output
+sub capture_output_of_cache_output {
+ my ($code, @args) = @_;
+
+ GitwebCache::Capture::ToFile->new()->capture(sub {
+ cache_output($cache, $capture, $key, $code, @args);
+ }, 'actual');
+
+ return get_actual();
+}
+
+sub get_actual {
+ open my $fh, '<', 'actual' or return;
+ local $/ = undef;
+ my $result = <$fh>;
+ close $fh;
+ return $result;
+}
+
+# use ->get_fh($key) interface
+sub cache_get_fh {
+ my ($cache, $key) = @_;
+
+ my ($fh, $filename) = $cache->get_fh($key);
+ return unless $fh;
+
+ local $/ = undef;
+ return <$fh>;
+}
+
+# use ->set_coderef_fh($key, $code_fh) to set $key to $value
+sub cache_set_fh {
+ my ($cache, $key, $value) = @_;
+
+ $cache->set_coderef_fh($key, sub { print {$_[0]} $value });
+ return $value;
+}
+
+
+# ......................................................................
+
+# clean state
+$cache->set_expires_in(-1);
+$cache->remove($key);
+my $test_data;
+
+# first time (if there is no cache) generates cache entry
+subtest '1st time (generate data)' => sub {
+ $call_count = 0;
+ $test_data = capture_output_of_cache_output(\&action);
+ is($test_data, $action_output, 'action() output is printed');
+ is(cache_get_fh($cache, $key), $action_output, 'action() output is saved in cache');
+ cmp_ok($call_count, '==', 1, 'action() was called to generate data');
+
+ done_testing();
+};
+
+# second time (if cache is set/valid) reads from cache
+subtest '2nd time (retrieve from cache)' => sub {
+ cache_set_fh($cache, $key, $cached_output);
+ $call_count = 0;
+ $test_data = capture_output_of_cache_output(\&action);
+ is(cache_get_fh($cache, $key), $cached_output, 'correct value is prepared in cache');
+ is($test_data, $cached_output, 'output is printed from cache');
+ cmp_ok($call_count, '==', 0, 'action() was not called');
+
+ done_testing();
+};
+
+# caching output and error handling
+subtest 'errors (exceptions) are not cached by default' => sub {
+ $cache->remove($key);
+ ok(!defined cache_get_fh($cache, $key), 'cache is prepared correctly (no data in cache)');
+ eval {
+ $test_data = capture_output_of_cache_output(\&die_action);
+ };
+ my $error = $@;
+ $test_data = get_actual();
+ is($test_data, $die_output, 'output of an error is printed');
+ ok(!defined cache_get_fh($cache, $key), 'output is not captured and not cached');
+ like($error, qr/^die_action\n/m, 'exception made it to outside, correctly');
+
+ done_testing();
+};
+
+subtest 'errors are cached with -cache_errors => 1' => sub {
+ $cache->remove($key);
+ ok(!defined cache_get_fh($cache, $key), 'cache is prepared correctly (no data in cache)');
+ eval {
+ $test_data = capture_output_of_cache_output(\&die_action, -cache_errors => 1);
+ };
+ my $error = $@;
+ $test_data = get_actual();
+ is($test_data, $die_output, 'output of an error is printed');
+ is(cache_get_fh($cache, $key), $die_output, 'output is captured and cached');
+ ok(! $error, 'exception didn\'t make it to outside');
+ diag($error) if $error;
+
+ done_testing();
+};
+
+
+done_testing();
+__END__
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [RFC PATCH v7 9/9] gitweb: Add optional output caching
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (7 preceding siblings ...)
2010-12-22 23:57 ` [RFC PATCH v7 8/9] gitweb/lib - Cache captured output (using compute_fh) Jakub Narebski
@ 2010-12-22 23:58 ` Jakub Narebski
2010-12-31 18:03 ` [RFC PATCH v7 10/9] gitweb: Background cache generation and progress indicator Jakub Narebski
` (2 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-22 23:58 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
This commit actually adds output caching to gitweb, as we now have the
minimal features required for it in GitwebCache::FileCacheWithLocking
(a 'dumb' but fast file-based cache engine). To enable the cache you need
to (at least) set $caching_enabled to true in the gitweb config, and copy
the required modules alongside the generated gitweb.cgi - this is described
in more detail in the new "Gitweb caching" section in gitweb/README.
"make install-gitweb" would install all modules alongside gitweb
itself.
Capturing and caching are designed in such a way that there is no
behaviour change if $caching_enabled is false. If caching is not
enabled, then capturing is also turned off.
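The resulting code path in gitweb.perl has roughly the following shape
(a simplified sketch, not the actual patch; cache_key() stands in for
however the cache key is derived from the current request):

  if ($caching_enabled) {
          # capture run_request() output, store it in cache, print it
          cache_output($cache, $capture, cache_key(), \&run_request);
  } else {
          run_request();
  }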
Enabling caching causes the following additional changes to gitweb
output:
* Disables content-type negotiation (choosing between the 'text/html'
mimetype and 'application/xhtml+xml') when caching, as no content-type
negotiation is done when retrieving a page from the cache. Uses the
lowest common denominator of the 'text/html' mimetype, which can be
used by all browsers. This may change in the future.
* Disables the optional timing info (how much time it took to generate
the original page, and how many git commands it took), and in its place
unconditionally shows when the page was originally generated (in the
GMT / UTC timezone).
* Disables the 'blame_incremental' view, as it doesn't make sense without
printing data as soon as it is generated (which would require tee-ing
when capturing output for caching)... and it doesn't work currently
anyway. An alternate solution would be to run the 'blame_incremental'
view with caching disabled.
Add basic tests of caching support to the t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors on the first
run (generating the cache) and on the second run (retrieving from the
cache) for a single view - the summary view for a project.
Check in the t9501-gitweb-standalone-http-status test that gitweb at
least correctly handles "404 Not Found" error pages also in the case
when gitweb caching is enabled.
Check in the t9502-gitweb-standalone-parse-output test that gitweb
produces the same output with and without caching, for the first and
second run, with binary or text output.
All those tests make use of the new gitweb_enable_caching subroutine added
to gitweb-lib.sh.
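A possible shape of that helper (an illustrative sketch, assuming the
gitweb_config.perl file that gitweb-lib.sh writes the test configuration to):

  gitweb_enable_caching () {
          echo 'our $caching_enabled = 1;' >>gitweb_config.perl
  }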
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
gitweb/Makefile | 5 +
gitweb/README | 46 +++++++
gitweb/gitweb.perl | 190 ++++++++++++++++++++++++++---
gitweb/lib/GitwebCache/CacheOutput.pm | 2
t/gitweb-lib.sh | 11 ++
t/t9500-gitweb-standalone-no-errors.sh | 20 +++
t/t9501-gitweb-standalone-http-status.sh | 13 ++
t/t9502-gitweb-standalone-parse-output.sh | 33 +++++
8 files changed, 299 insertions(+), 21 deletions(-)
mode change 100644 => 100755 t/gitweb-lib.sh
diff --git a/gitweb/Makefile b/gitweb/Makefile
index e6029e1..d67c138 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -113,6 +113,11 @@ endif
GITWEB_FILES += static/git-logo.png static/git-favicon.png
+# gitweb output caching
+GITWEB_MODULES += GitwebCache/CacheOutput.pm
+GITWEB_MODULES += GitwebCache/SimpleFileCache.pm
+GITWEB_MODULES += GitwebCache/Capture/Simple.pm
+
GITWEB_REPLACE = \
-e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
-e 's|++GIT_BINDIR++|$(bindir)|g' \
diff --git a/gitweb/README b/gitweb/README
index 4a67393..efe3b2c 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -258,6 +258,12 @@ not include variables usually directly set during build):
their default values before every request, so if you want to change
them, be sure to set this variable to true or a code reference effecting
the desired changes. The default is true.
+ * $caching_enabled
+ If true, gitweb would use caching to speed up generating the response.
+ Currently only output (response) caching is supported. See the "Gitweb
+ caching" section below for details on how to configure and customize caching.
+ The default is false (caching is disabled).
+
Projects list file format
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -329,6 +335,46 @@ You can use the following files in repository:
descriptions.
+Gitweb caching
+~~~~~~~~~~~~~~
+
+Currently gitweb supports only output (HTTP response) caching, similar
+to the one used on http://git.kernel.org. To turn it on, set
+$caching_enabled variable to true value in gitweb config file, i.e.:
+
+ our $caching_enabled = 1;
+
+You can choose which caching engine gitweb should use by setting the
+$cache variable to an _initialized_ instance of a cache interface, or to
+the name of a cache class.
+
+Currently though only a cache which implements the non-standard ->compute_fh()
+method is supported. The provided GitwebCache::FileCacheWithLocking implements
+this method; it is the default caching engine used if $cache is not defined.
+
+GitwebCache::FileCacheWithLocking is a 'dumb' (but fast) file-based
+caching engine, currently without any support for cache size limiting, or
+even for removing expired / grossly expired entries. It therefore has the
+downside of requiring a huge amount of disk space if there are a number of
+repositories involved. It is not uncommon for git.kernel.org to have on the
+order of 80G - 120G accumulate over the course of a few months. It is
+therefore recommended that the cache directory be periodically completely
+deleted; this operation is safe to perform. Suggested mechanism (substitute
+the actual path to the gitweb cache for $cachedir):
+
+ # mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+
+Site-wide cache options are defined in the %cache_options hash. Those options
+apply only when $cache is unset (GitwebCache::FileCacheWithLocking is used),
+or if $cache is the name of a cache class. You can override cache options in
+the gitweb config, e.g.:
+
+ $cache_options{'expires_in'} = 60; # 60 seconds = 1 minute
+
+Please read comments for %cache_options entries in gitweb/gitweb.perl for
+description of available cache options.
+
+
Webserver configuration
-----------------------
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 880fdf2..eb02b6b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -268,6 +268,71 @@ our %highlight_ext = (
map { $_ => 'xml' } qw(xhtml html htm),
);
+
+# This enables/disables the caching layer in gitweb. Currently supported
+# is only output (response) caching, similar to the one used on git.kernel.org.
+our $caching_enabled = 0;
+# Set to an _initialized_ instance of a cache interface implementing (for now)
+# the compute_fh($key, $code) method (non-standard CHI-inspired interface),
+# or to the name of a class of cache interface implementing said method.
+# If unset, GitwebCache::FileCacheWithLocking would be used, which is a 'dumb'
+# (but fast) file-based caching layer, currently without any support for
+# cache size limiting. It is therefore recommended that the cache directory
+# be periodically completely deleted; this operation is safe to perform.
+#
+# Suggested mechanism to clear cache:
+# mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+# where $cachedir is directory where cache is, i.e. $cache_options{'cache_root'}
+our $cache;
+# You define site-wide cache options defaults here; override them with
+# $GITWEB_CONFIG as necessary.
+our %cache_options = (
+ # The location in the filesystem that will hold the root of the cache.
+ # This directory will be created as needed (if possible) on the first
+ # cache set. Note that either this directory must exist and the web server
+ # has to have write permissions to it, or the web server must be able to
+ # create this directory.
+ # Possible values:
+ # * 'cache' (relative to gitweb),
+ # * File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache'),
+ # * '/var/cache/gitweb' (FHS compliant, requires being set up),
+ 'cache_root' => 'cache',
+
+ # The number of subdirectories deep to cache object item. This should be
+ # large enough that no cache directory has more than a few hundred
+ # objects. Each non-leaf directory contains up to 256 subdirectories
+ # (00-ff). Must be larger than 0.
+ 'cache_depth' => 1,
+
+ # The (global) expiration time for objects placed in the cache, in seconds.
+ 'expires_in' => 20,
+
+ # How to handle runtime errors occurring during cache gets and cache
+ # sets. Options are:
+ # * "die" (the default) - call die() with an appropriate message
+ # * "warn" - call warn() with an appropriate message
+ # * "ignore" - do nothing
+ # * <coderef> - call this code reference with an appropriate message
+ # Note that gitweb catches 'die <message>' via custom handle_errors_html
+ # handler, set via set_message() from CGI::Carp. 'warn <message>' messages
+ # are written to the web server logs.
+ #
+ # The default is to use cache_error_handler, which wraps die_error.
+ # Only the first argument passed to cache_error_handler is used (cf. CHI)
+ 'on_error' => \&cache_error_handler,
+
+ # Extra options passed to GitwebCache::CacheOutput::cache_output subroutine
+ 'cache_output' => {
+ # Enable caching of error pages (boolean). Default is false.
+ '-cache_errors' => 0,
+ },
+);
+# Set to _initialized_ instance of GitwebCache::Capture::ToFile
+# compatible capturing engine, i.e. one implementing ->new()
+# constructor, and ->capture($code, $file) method. If unset
+# (default), GitwebCache::Capture::ToFile would be used.
+our $capture;
+
# You define site-wide feature defaults here; override them with
# $GITWEB_CONFIG as necessary.
our %feature = (
@@ -1121,7 +1186,16 @@ sub dispatch {
!$project) {
die_error(400, "Project needed");
}
- $actions{$action}->();
+
+ if ($caching_enabled) {
+ # human readable key identifying gitweb output
+ my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
+
+ cache_output($cache, $capture, $output_key, $actions{$action},
+ %{$cache_options{'cache_output'}});
+ } else {
+ $actions{$action}->();
+ }
}
sub reset_timer {
@@ -1147,6 +1221,8 @@ sub run_request {
}
}
check_loadavg();
+ configure_caching()
+ if ($caching_enabled);
# $projectroot and $projects_list might be set in gitweb config file
$projects_list ||= $projectroot;
@@ -1210,7 +1286,7 @@ sub run {
if $pre_dispatch_hook;
eval { run_request() };
- if (defined $@ && !ref($@)) {
+ if ($@ && !ref($@)) {
# some Perl error, but not one thrown by die_error
die_error(undef, undef, $@, -error_handler => 1);
}
@@ -1227,6 +1303,49 @@ sub run {
1;
}
+sub configure_caching {
+ if (!eval { require GitwebCache::CacheOutput; 1; }) {
+ die_error(500,
+ "Caching enabled and error loading GitwebCache::CacheOutput",
+ esc_html($@));
+
+ # turn off caching and warn instead
+ #$caching_enabled = 0;
+ #warn "Caching enabled and GitwebCache::CacheOutput not found";
+ }
+ GitwebCache::CacheOutput->import();
+
+ # $cache might be initialized (instantiated) cache, i.e. cache object,
+ # or it might be name of class, or it might be undefined
+ unless (defined $cache && ref($cache)) {
+ $cache ||= 'GitwebCache::FileCacheWithLocking';
+ eval "require $cache";
+ if ($@) {
+ die_error(500,
+ "Error loading $cache",
+ esc_html($@));
+ }
+
+ $cache = $cache->new({
+ %cache_options,
+ #'cache_root' => '/tmp/cache',
+ #'cache_depth' => 2,
+ #'expires_in' => 20, # in seconds (CHI compatible)
+ # (Cache::Cache compatible initialization)
+ 'default_expires_in' => $cache_options{'expires_in'},
+ # (CHI compatible initialization)
+ 'root_dir' => $cache_options{'cache_root'},
+ 'depth' => $cache_options{'cache_depth'},
+ 'on_get_error' => $cache_options{'on_error'},
+ 'on_set_error' => $cache_options{'on_error'},
+ });
+ }
+ unless (defined $capture && ref($capture)) {
+ require GitwebCache::Capture::ToFile;
+ $capture = GitwebCache::Capture::ToFile->new();
+ }
+}
+
run();
if (defined caller) {
@@ -3597,7 +3716,9 @@ sub git_header_html {
# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
# we have to do this because MSIE sometimes globs '*/*', pretending to
# support xhtml+xml but choking when it gets what it asked for.
- if (defined $cgi->http('HTTP_ACCEPT') &&
+ # Disable content-type negotiation when caching (use mimetype good for all).
+ if (!$caching_enabled &&
+ defined $cgi->http('HTTP_ACCEPT') &&
$cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
$cgi->Accept('application/xhtml+xml') != 0) {
$content_type = 'application/xhtml+xml';
@@ -3622,7 +3743,9 @@ sub git_header_html {
EOF
# the stylesheet, favicon etc urls won't work correctly with path_info
# unless we set the appropriate base URL
- if ($ENV{'PATH_INFO'}) {
+ # if caching is enabled, a page generated without path_info may be served
+ # from the cache for a path_info request, so always set the base URL
+ if ($ENV{'PATH_INFO'} || $caching_enabled) {
print "<base href=\"".esc_url($base_url)."\" />\n";
}
# print out each stylesheet that exist, providing backwards capability
@@ -3739,17 +3862,25 @@ sub git_footer_html {
}
print "</div>\n"; # class="page_footer"
- if (defined $t0 && gitweb_check_feature('timed')) {
+ # timing info doesn't make much sense with output (response) caching,
+ # so when caching is enabled gitweb prints the time of page generation
+ if ((defined $t0 || $caching_enabled) &&
+ gitweb_check_feature('timed')) {
print "<div id=\"generating_info\">\n";
- print 'This page took '.
- '<span id="generating_time" class="time_span">'.
- tv_interval($t0, [ gettimeofday() ]).
- ' seconds </span>'.
- ' and '.
- '<span id="generating_cmd">'.
- $number_of_git_cmds.
- '</span> git commands '.
- " to generate.\n";
+ if ($caching_enabled) {
+ print 'This page was generated at '.
+ gmtime( time() )." GMT\n";
+ } else {
+ print 'This page took '.
+ '<span id="generating_time" class="time_span">'.
+ tv_interval($t0, [ gettimeofday() ]).
+ ' seconds </span>'.
+ ' and '.
+ '<span id="generating_cmd">'.
+ $number_of_git_cmds.
+ '</span> git commands '.
+ " to generate.\n";
+ }
print "</div>\n"; # class="page_footer"
}
@@ -3758,8 +3889,8 @@ sub git_footer_html {
}
print qq!<script type="text/javascript" src="!.esc_url($javascript).qq!"></script>\n!;
- if (defined $action &&
- $action eq 'blame_incremental') {
+ if (!$caching_enabled &&
+ defined $action && $action eq 'blame_incremental') {
print qq!<script type="text/javascript">\n!.
qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!.
qq! "!. href() .qq!");\n!.
@@ -3800,6 +3931,7 @@ sub die_error {
500 => '500 Internal Server Error',
503 => '503 Service Unavailable',
);
+
git_header_html($http_responses{$status}, undef, %opts);
print <<EOF;
<div class="page_body">
@@ -3819,6 +3951,22 @@ EOF
unless ($opts{'-error_handler'});
}
+# custom error handler for caching engine (Internal Server Error)
+sub cache_error_handler {
+ my $error = shift;
+
+ # just rethrow error that came from die_error
+ # thrown from $actions{$action}->()
+ die $error if (ref $error);
+
+ $error = to_utf8($error);
+ $error =
+ "Error in caching layer: <i>".ref($cache)."</i><br>\n".
+ CGI::escapeHTML($error);
+ # die_error() would exit
+ die_error(undef, undef, $error);
+}
+
## ----------------------------------------------------------------------
## functions printing or outputting HTML: navigation
@@ -5554,7 +5702,8 @@ sub git_tag {
sub git_blame_common {
my $format = shift || 'porcelain';
- if ($format eq 'porcelain' && $cgi->param('js')) {
+ if ($format eq 'porcelain' && $cgi->param('js') &&
+ !$caching_enabled) {
$format = 'incremental';
$action = 'blame_incremental'; # for page title etc
}
@@ -5608,7 +5757,8 @@ sub git_blame_common {
or print "ERROR $!\n";
print 'END';
- if (defined $t0 && gitweb_check_feature('timed')) {
+ if (!$caching_enabled &&
+ defined $t0 && gitweb_check_feature('timed')) {
print ' '.
tv_interval($t0, [ gettimeofday() ]).
' '.$number_of_git_cmds;
@@ -5628,7 +5778,7 @@ sub git_blame_common {
$formats_nav .=
$cgi->a({-href => href(action=>"blame", javascript=>0, -replay=>1)},
"blame") . " (non-incremental)";
- } else {
+ } elsif (!$caching_enabled) {
$formats_nav .=
$cgi->a({-href => href(action=>"blame_incremental", -replay=>1)},
"blame") . " (incremental)";
@@ -5787,7 +5937,7 @@ sub git_blame {
}
sub git_blame_incremental {
- git_blame_common('incremental');
+ git_blame_common(!$caching_enabled ? 'incremental' : undef);
}
sub git_blame_data {
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 4a75a7f..792ddb7 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -64,7 +64,7 @@ sub cache_output {
if (defined $fh || defined $filename) {
# set binmode only if $fh is defined (is a filehandle)
# File::Copy::copy opens files given by filename in binary mode
- binmode $fh, ':raw' if (defined $h);
+ binmode $fh, ':raw' if (defined $fh);
binmode STDOUT, ':raw';
File::Copy::copy($fh || $filename, \*STDOUT);
}
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
old mode 100644
new mode 100755
index b9bb95f..4ce067f
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -52,6 +52,17 @@ EOF
export SCRIPT_NAME
}
+gitweb_enable_caching () {
+ test_expect_success 'enable caching' '
+ cat >>gitweb_config.perl <<-\EOF &&
+ $caching_enabled = 1;
+ $cache_options{"expires_in"} = -1; # never expire cache for tests
+ $cache_options{"cache_root"} = "cache"; # to clear the right thing
+ EOF
+ rm -rf cache/
+ '
+}
+
gitweb_run () {
GATEWAY_INTERFACE='CGI/1.1'
HTTP_ACCEPT='*/*'
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 21cd286..cc9cee5 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -677,4 +677,24 @@ test_expect_success HIGHLIGHT \
gitweb_run "p=.git;a=blob;f=test.sh"'
test_debug 'cat gitweb.log'
+# ----------------------------------------------------------------------
+# caching
+
+gitweb_enable_caching
+
+test_expect_success \
+ 'caching enabled (project summary, first run, generating cache)' \
+ 'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+ 'caching enabled (project summary, second run, cached version)' \
+ 'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+ 'caching enabled (non-existent commit, not cached error page)' \
+ 'gitweb_run "p=.git;a=commit;h=non-existent"'
+test_debug 'cat gitweb.log'
+
test_done
diff --git a/t/t9501-gitweb-standalone-http-status.sh b/t/t9501-gitweb-standalone-http-status.sh
index 2487da1..168e494 100755
--- a/t/t9501-gitweb-standalone-http-status.sh
+++ b/t/t9501-gitweb-standalone-http-status.sh
@@ -134,5 +134,18 @@ cat >>gitweb_config.perl <<\EOF
our $maxload = undef;
EOF
+# ----------------------------------------------------------------------
+# output caching
+
+gitweb_enable_caching
+
+test_expect_success 'caching enabled (non-existent commit, 404 error)' '
+ gitweb_run "p=.git;a=commit;h=non-existent" &&
+ grep "Status: 404 Not Found" gitweb.headers &&
+ grep "404 - Unknown commit object" gitweb.body
+'
+test_debug 'echo "headers" && cat gitweb.headers'
+test_debug 'echo "body" && cat gitweb.body'
+
test_done
diff --git a/t/t9502-gitweb-standalone-parse-output.sh b/t/t9502-gitweb-standalone-parse-output.sh
index dd83890..bc8cb92 100755
--- a/t/t9502-gitweb-standalone-parse-output.sh
+++ b/t/t9502-gitweb-standalone-parse-output.sh
@@ -112,4 +112,37 @@ test_expect_success 'snapshot: hierarchical branch name (xx/test)' '
'
test_debug 'cat gitweb.headers'
+
+# ----------------------------------------------------------------------
+# whether gitweb with caching enabled produces the same output
+
+test_expect_success 'setup for caching tests (utf8 commit, binary file)' '
+ . "$TEST_DIRECTORY"/t3901-utf8.txt &&
+ cp "$TEST_DIRECTORY"/test9200a.png image.png &&
+ git add image.png &&
+ git commit -F "$TEST_DIRECTORY"/t3900/1-UTF-8.txt &&
+ gitweb_run "p=.git;a=patch" &&
+ mv gitweb.body no_cache.html &&
+ gitweb_run "p=.git;a=blob_plain;f=image.png" &&
+ mv gitweb.body no_cache.png
+'
+
+gitweb_enable_caching
+
+for desc in 'generating cache' 'cached version'; do
+ test_expect_success "caching enabled, HTML output, $desc" '
+ gitweb_run "p=.git;a=patch" &&
+ mv gitweb.body cache.html &&
+ test_cmp no_cache.html cache.html
+ '
+done
+
+for desc in 'generating cache' 'cached version'; do
+ test_expect_success "caching enabled, binary output, $desc" '
+ gitweb_run "p=.git;a=blob_plain;f=image.png" &&
+ mv gitweb.body cache.png &&
+ cmp no_cache.png cache.png
+ '
+done
+
test_done
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [RFC PATCH v7 10/9] gitweb: Background cache generation and progress indicator
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (8 preceding siblings ...)
2010-12-22 23:58 ` [RFC PATCH v7 9/9] gitweb: Add optional output caching Jakub Narebski
@ 2010-12-31 18:03 ` Jakub Narebski
2011-01-03 21:33 ` [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation Jakub Narebski
2011-01-05 2:26 ` [RFC PATCH 11/9] [PoC] gitweb/lib - HTTP-aware output caching Jakub Narebski
11 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2010-12-31 18:03 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
This commit removes the asymmetry in serving stale data (if stale data exists)
when regenerating the cache in GitwebCache::FileCacheWithLocking. The process
that acquired the exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in the background, while stale data is being served.
Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data). Otherwise there might be a
problem when gitweb is running as (part of) a long-lived process, for example
under mod_perl or FastCGI: it would leave unreaped children as zombies
(entries in the process table). We don't want to wait for the background
process, and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to automatically
reap child processes, because this interferes with using
open my $fd, '-|', git_cmd(), 'param', ...
or die_error(...)
# read from <$fd>
close $fd
or die_error(...)
In the above code, "close" on a handle from the magic "-|" open calls
waitpid... and we would die with "No child processes". Removing the
'or die' would possibly remove the ability to react to other errors.
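A minimal standalone sketch of that failure mode (illustrative, not gitweb
code): with $SIG{CHLD} set to 'IGNORE' the child is reaped automatically,
so the implicit waitpid done by close() on a '-|' handle typically finds
no child and close() fails with ECHILD:
    #!/usr/bin/perl
    use strict;
    use warnings;
    $SIG{CHLD} = 'IGNORE';                  # auto-reap children
    open my $fd, '-|', 'echo', 'hello' or die "open: $!";
    my @lines = <$fd>;
    close $fd or die "close: $!";           # usually dies: "close: No child processes"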
This feature can be enabled or disabled on demand via the 'background_cache'
cache parameter. It is turned on by default.
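For example, it could be turned off from the gitweb config with a single
line like:
    $cache_options{'background_cache'} = 0;  # always generate data in the foreground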
When there is no stale version suitable to serve the client, currently
we have to wait for the data to be generated in full before showing it.
Add to GitwebCache::FileCacheWithLocking, via the 'generating_info'
callback, the ability to show the user some activity indicator /
progress bar, to show that we are working on generating the data.
Note that without generating data in the background, the process generating
data wouldn't print progress info, because 'generating_info' can exit
(and in the case of gitweb's git_generating_data_html it does exit).
We don't need to daemonize the background process in this case, where
there is no stale data to serve but progress info is on. This is
because we have to wait for the background process to finish
generating data anyway.
Gitweb itself uses a "Generating..." page as the activity indicator, which
redirects (via <meta http-equiv="Refresh" ...>) to a refreshed version
of the page after the cache is filled (via the trick of not closing the page,
and therefore not closing the connection, till data is available in the cache).
The git_generating_data_html() subroutine, which is used by gitweb to
implement this feature, is highly configurable: you can choose the
frequency of writing some data so that the connection won't get closed,
the maximum time to wait for data in the "Generating..." page (see
comments in the %generating_options hash definition), and the initial
delay before starting the progress indicator page.
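As a sketch, those knobs could be tweaked in the gitweb config like this
(the values are illustrative, not recommendations):
    $generating_options{'startup_delay'}  = 2;  # seconds before "Generating..." page
    $generating_options{'print_interval'} = 1;  # seconds between progress dots
    $cache_options{'generating_info'}     = undef;  # or disable the feature entirely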
The git_generating_data_html() subroutine would return early (not showing
the HTML-based progress indicator) if the action does not return HTML output,
or if the web browser / user agent is a robot / web crawler (or gitweb is run
as a standalone script). In such cases an HTML "Generating..." page does not
make much sense.
For this purpose a new subroutine, browser_is_robot() (which uses
HTTP::BrowserDetect if possible, and falls back on a simple check of the
User-Agent string), was added.
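For reference, the HTTP::BrowserDetect part of that check boils down to
something like the following sketch:
    use HTTP::BrowserDetect;
    my $browser  = HTTP::BrowserDetect->new($ENV{'HTTP_USER_AGENT'});
    my $is_robot = $browser->robot() ? 1 : 0;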
The default behavior of cache_output() from GitwebCache::CacheOutput was
changed so that it would cache error pages if they were generated in a
detached background process. This, together with the default initial delay
in the git_generating_data_html progress_info subroutine, should ensure that
there are no problems with the interaction between error pages and progress
info.
The t9511 test got updated to test both the case with background generation
enabled and the case with it disabled. Also added a test for a simple
(non-exiting) 'generating_info' subroutine, again for both the background
generation disabled and enabled cases.
Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
There are a few changes included in this commit which should really be
made in the original commits/patches earlier in the series. Namely:
* the fix to gitweb/Makefile to use the newer names for the gitweb
caching modules, i.e. GitwebCache/FileCacheWithLocking.pm instead of
GitwebCache/SimpleFileCache.pm, and GitwebCache/Capture/ToFile.pm
instead of GitwebCache/Capture/Simple.pm
* 'max_lifetime' was introduced in a previous commit, so its use in
%cache_options (setting the default value for gitweb) should also be
done in that commit.
* the gitweb_enable_caching function in t/gitweb-lib.sh should use
"$TRASH_DIRECTORY/cache" as 'cache_root' from the beginning, just in
case
It is worth mentioning that git_generating_data_html does not need to
end with 'die'; it could as well end with 'goto DONE_REQUEST'. The
'generating_info' subroutine is outside capture anyway.
The issue with error pages should be solved, even in the case when
they are not cached. There are three layers of defense:
1. git_generating_data_html has an initial delay of 1 second, by default.
This means that if die_error finishes within this initial delay,
then redirection (and ending the request) wouldn't take place. The
error page would be printed by the parent process and not cached.
2. In the case where there is no stale data to serve, but there is a
'generating_info' subroutine and it would exit / end the request before
the error page is fully generated, the background process would not be
detached, and it would print the error page. The error page would not
be cached.
Though I wonder if the exit from git_generating_data_html should be
trapped, so that we can wait for the background process; another solution
would be to use a reaper SIGCHLD signal handler for this process...
Huh, something still to think about...
3. In the case where there is stale data for what is now an error
condition (e.g. a deleted branch or a deleted project), and the background
process would generate data while detached from the originating
process, the error page would be captured and cached.
gitweb/Makefile | 4 +-
gitweb/gitweb.perl | 171 +++++++++++++++++++++++-
gitweb/lib/GitwebCache/CacheOutput.pm | 10 ++-
gitweb/lib/GitwebCache/FileCacheWithLocking.pm | 113 +++++++++++++++-
t/gitweb-lib.sh | 4 +-
t/t9511/test_cache_interface.pl | 149 ++++++++++++++++++++-
6 files changed, 439 insertions(+), 12 deletions(-)
diff --git a/gitweb/Makefile b/gitweb/Makefile
index d67c138..7a5c85f 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -115,8 +115,8 @@ GITWEB_FILES += static/git-logo.png static/git-favicon.png
# gitweb output caching
GITWEB_MODULES += GitwebCache/CacheOutput.pm
-GITWEB_MODULES += GitwebCache/SimpleFileCache.pm
-GITWEB_MODULES += GitwebCache/Capture/Simple.pm
+GITWEB_MODULES += GitwebCache/FileCacheWithLocking.pm
+GITWEB_MODULES += GitwebCache/Capture/ToFile.pm
GITWEB_REPLACE = \
-e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index eb02b6b..5ef668d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -307,6 +307,38 @@ our %cache_options = (
# The (global) expiration time for objects placed in the cache, in seconds.
'expires_in' => 20,
+ # Maximum cache file life, in seconds. If a cache entry's lifetime exceeds
+ # this value, it is considered too stale to be served while waiting for
+ # the cache to be regenerated/refreshed; gitweb then waits instead of
+ # displaying the existing cached data.
+ #
+ # Set it to -1 to always serve existing data if it exists.
+ # Set it to 0 to turn off serving stale data, i.e. always wait.
+ 'max_lifetime' => 5*60*60, # 5 hours
+
+ # This enables/disables background caching. If it is set to a true value,
+ # the caching engine would return stale data (if it is not older than
+ # 'max_lifetime' seconds) if it exists, and launch the process regenerating
+ # (refreshing) the cache in the background. If it is set to a false value,
+ # the process that fills the cache must always wait for data to be generated.
+ # In theory this will make gitweb seem more responsive, at the price of
+ # serving possibly stale data.
+ 'background_cache' => 1,
+
+ # Subroutine which would be called when gitweb has to wait for data to
+ # be generated (it can't serve stale data because there isn't any,
+ # or if it exists it is older than 'max_lifetime'). The default
+ # is to use git_generating_data_html(), which creates "Generating..."
+ # page, which would then redirect or redraw/rewrite the page when
+ # data is ready.
+ # Set it to `undef' to disable this feature.
+ #
+ # Such subroutine (if invoked from GitwebCache::FileCacheWithLocking)
+ # is passed the following parameters: $cache instance, human-readable
+ # $key to current page, and $sync_coderef subroutine to invoke to wait
+ # (in a blocking way) for data.
+ 'generating_info' => \&git_generating_data_html,
+
# How to handle runtime errors occurring during cache gets and cache
# sets. Options are:
# * "die" (the default) - call die() with an appropriate message
@@ -323,10 +355,27 @@ our %cache_options = (
# Extra options passed to GitwebCache::CacheOutput::cache_output subroutine
'cache_output' => {
- # Enable caching of error pages (boolean). Default is false.
- '-cache_errors' => 0,
+ # Enable caching of error pages (tristate, with undef meaning that error
+ # pages will be cached if they were generated in a detached process).
+ # Default is undef.
+ '-cache_errors' => undef,
},
);
+# You define site-wide options for "Generating..." page (if enabled) here
+# (which means that $cache_options{'generating_info'} is set to coderef);
+# override them with $GITWEB_CONFIG as necessary.
+our %generating_options = (
+ # The delay before displaying the "Generating..." page, in seconds. It is
+ # intended that the "Generating..." page be shown only when really needed.
+ 'startup_delay' => 1,
+ # The time between generating new pieces of output, to prevent redirection
+ # before data is ready, i.e. the time between printing each
+ # dot in the activity indicator / progress info, in seconds.
+ 'print_interval' => 2,
+ # Maximum time the "Generating..." page would be present, waiting for data,
+ # before an unconditional redirect, in seconds.
+ 'timeout' => $cache_options{'expires_min'},
+);
# Set to _initialized_ instance of GitwebCache::Capture::ToFile
# compatible capturing engine, i.e. one implementing ->new()
# constructor, and ->capture($code, $file) method. If unset
@@ -870,6 +919,18 @@ sub evaluate_actions_info {
}
}
+sub browser_is_robot {
+ return 1 if !exists $ENV{'HTTP_USER_AGENT'}; # gitweb run as script
+ if (eval { require HTTP::BrowserDetect; }) {
+ my $browser = HTTP::BrowserDetect->new();
+ return $browser->robot();
+ }
+ # fallback on detecting known web browsers
+ return 0 if ($ENV{'HTTP_USER_AGENT'} =~ /\b(?:Mozilla|Opera|Safari|IE)\b/);
+ # be conservative; if not sure, assume non-interactive
+ return 1;
+}
+
# fill %input_params with the CGI parameters. All values except for 'opt'
# should be single values, but opt can be an array. We should probably
# build an array of parameters that can be multi-valued, but since for the time
@@ -3660,6 +3721,112 @@ sub get_page_title {
return $title;
}
+# creates "Generating..." page when caching enabled and not in cache
+sub git_generating_data_html {
+ my ($cache, $key, $sync_coderef) = @_;
+
+ # when should gitweb show "Generating..." page
+ if ((defined $actions_info{$action}{'output_format'} &&
+ $actions_info{$action}{'output_format'} eq 'feed') ||
+ browser_is_robot()) {
+ return;
+ }
+
+ # Initial delay
+ if ($generating_options{'startup_delay'} > 0) {
+ eval {
+ local $SIG{ALRM} = sub { die "alarm clock restart\n" }; # NB: \n required
+ alarm $generating_options{'startup_delay'};
+ $sync_coderef->(); # wait for data
+ alarm 0; # turn off the alarm
+ };
+ if ($@) {
+ # propagate unexpected errors
+ die $@ if $@ !~ /alarm clock restart/;
+ } else {
+ # we got response within 'startup_delay' timeout
+ return;
+ }
+ }
+
+ my $title = "[Generating...] " . get_page_title();
+ # TODO: the following line of code duplicates the one
+ # in git_header_html, and it should probably be refactored.
+ my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
+
+ # Use the trick that 'refresh' HTTP header equivalent (set via http-equiv)
+ # with timeout of 0 seconds would redirect as soon as page is finished.
+ # It assumes that browser would display partially received page.
+ # This "Generating..." redirect page should not be cached (externally).
+ my %no_cache = (
+ # HTTP/1.0
+ -Pragma => 'no-cache',
+ # HTTP/1.1
+ -Cache_Control => join(', ', qw(private no-cache no-store must-revalidate
+ max-age=0 pre-check=0 post-check=0)),
+ );
+ print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
+ -status=> '200 OK', -expires => 'now',
+ %no_cache);
+ print STDOUT <<"EOF";
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+<!-- git web interface version $version -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<meta http-equiv="refresh" content="0" />
+<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
+<meta name="robots" content="noindex, nofollow" />
+<title>$title</title>
+</head>
+<body>
+EOF
+
+ local $| = 1; # autoflush
+ print STDOUT 'Generating...';
+
+ my $total_time = 0;
+ my $interval = $generating_options{'print_interval'} || 1;
+ my $timeout = $generating_options{'timeout'};
+ my $alarm_handler = sub {
+ local $! = 1;
+ print STDOUT '.';
+ $total_time += $interval;
+ if ($total_time > $timeout) {
+ die "timeout\n";
+ }
+ };
+ eval {
+ local $SIG{ALRM} = $alarm_handler;
+ Time::HiRes::alarm($interval, $interval);
+ my $sync_ok;
+ do {
+ # loop is needed here because SIGALRM (from 'alarm')
+ # can interrupt waiting (process of acquiring lock)
+ $sync_ok = $sync_coderef->(); # blocking wait for data
+ } until ($sync_ok);
+ alarm 0;
+ };
+ # It doesn't really matter if we got lock, or timed-out
+ # but we should re-throw unknown (unexpected) errors
+ die $@ if ($@ and $@ !~ /timeout/);
+
+ print STDOUT <<"EOF";
+
+</body>
+</html>
+EOF
+
+ # after refresh web browser would reload page and send new request
+ die { 'status' => 200 }; # to end request
+ #goto DONE_REQUEST;
+ #exit 0;
+ #return;
+}
+
sub print_feed_meta {
if (defined $project) {
my %href_params = get_feed_info();
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 792ddb7..188d4ab 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -35,16 +35,24 @@ our %EXPORT_TAGS = (all => [ @EXPORT ]);
# in ':raw' format (and thus restored in ':raw' from cache)
#
# Supported options:
-# * -cache_errors => 0|1 - whether error output should be cached
+# * -cache_errors => undef|0|1 - whether error output should be cached,
+# undef means cache if we are in detached process
sub cache_output {
my ($cache, $capture, $key, $code, %opts) = @_;
+
+ my $pid = $$;
my ($fh, $filename);
my ($capture_fh, $capture_filename);
eval { # this `eval` is to catch rethrown error, so we can print captured output
($fh, $filename) = $cache->compute_fh($key, sub {
($capture_fh, $capture_filename) = @_;
+ if (!defined $opts{'-cache_errors'}) {
+ # cache errors if we are in detached process
+ $opts{'-cache_errors'} = ($$ != $pid && getppid() != $pid);
+ }
+
# this `eval` is to be able to cache error output (up till 'die')
eval { $capture->capture($code, $capture_fh); };
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index ecd0e18..291526e 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -73,6 +73,16 @@ our $EXPIRE_NOW = 0;
# If it is greater than 0, and cache entry is expired but not older
# than it, serve stale data when waiting for cache entry to be
# regenerated (refreshed). Non-adaptive.
+# * 'background_cache' (boolean)
+# This enables/disables regenerating cache in background process.
+# Defaults to true.
+# * 'generating_info'
+# Subroutine (code) called when process has to wait for cache entry
+# to be (re)generated (when there is no not-too-stale data to serve
+# instead), for another process (or a background process). It is passed
+# $cache instance, $key, and $wait_code subroutine (code reference)
+# to invoke (to call) to wait for cache entry to be ready.
+# Unset by default (which means no activity indicator).
# * 'on_error' (similar to CHI 'on_get_error'/'on_set_error')
# How to handle runtime errors occurring during cache gets and cache
# sets, which may or may not be considered fatal in your application.
@@ -107,6 +117,11 @@ sub new {
exists $opts{'max_lifetime'} ? $opts{'max_lifetime'} :
exists $opts{'max_cache_lifetime'} ? $opts{'max_cache_lifetime'} :
$NEVER_EXPIRE;
+ $self->{'background_cache'} =
+ exists $opts{'background_cache'} ? $opts{'background_cache'} :
+ 1;
+ $self->{'generating_info'} = $opts{'generating_info'}
+ if exists $opts{'generating_info'};
$self->{'on_error'} =
exists $opts{'on_error'} ? $opts{'on_error'} :
exists $opts{'on_get_error'} ? $opts{'on_get_error'} :
@@ -127,6 +142,7 @@ sub new {
# creates get_depth() and set_depth($depth) etc. methods
foreach my $i (qw(depth root namespace expires_in max_lifetime
+ background_cache generating_info
on_error)) {
my $field = $i;
no strict 'refs';
@@ -140,6 +156,16 @@ foreach my $i (qw(depth root namespace expires_in max_lifetime
};
}
+# $cache->generating_info($wait_code);
+# runs 'generating_info' subroutine, for activity indicator,
+# checking if it is defined first.
+sub generating_info {
+ my $self = shift;
+
+ if (defined $self->{'generating_info'}) {
+ $self->{'generating_info'}->($self, @_);
+ }
+}
# ----------------------------------------------------------------------
# utility functions and methods
@@ -246,6 +272,10 @@ sub _wait_for_data {
my ($self, $key, $sync_coderef) = @_;
my @result;
+ # provide "generating page..." info, if exists
+ $self->generating_info($key, $sync_coderef);
+ # generating info may exit, so we can not get there
+
# wait for data to be available
$sync_coderef->();
# fetch data
@@ -254,6 +284,57 @@ sub _wait_for_data {
return @result;
}
+sub _set_maybe_background {
+ my ($self, $key, $code) = @_;
+
+ my ($pid, $detach);
+ my (@result, @stale_result);
+
+ if ($self->{'background_cache'}) {
+ # try to retrieve stale data
+ @stale_result = $self->get_fh($key,
+ 'expires_in' => $self->get_max_lifetime());
+
+ # fork if there is stale data, for background process
+ # to regenerate/refresh the cache (generate data),
+ # or if main process would show progress indicator
+ $detach = @stale_result;
+ $pid = fork()
+ if (@stale_result || $self->{'generating_info'});
+ }
+
+ if ($pid) {
+ ## forked and are in parent process
+ # reap child, which spawned grandchild process (detaching it)
+ waitpid $pid, 0
+ if $detach;
+
+ } else {
+ ## didn't fork, or are in background process
+
+ # daemonize background process, detaching it from parent
+ # see also Proc::Daemonize, Apache2::SubProcess
+ if (defined $pid && $detach) {
+ ## in background process
+ POSIX::setsid(); # or setpgrp(0, 0);
+ fork() && CORE::exit(0);
+ }
+
+ @result = $self->set_coderef_fh($key, $code);
+
+ if (defined $pid) { # && !$pid
+ ## in background process; parent or grandparent
+ ## will serve stale data, or just generated data
+
+ # lockfile will be automatically closed on exit,
+ # and therefore lockfile would be unlocked
+ CORE::exit(0);
+ }
+ }
+
+ return @result > 0 ? @result : @stale_result;
+}
+
# $self->_handle_error($raw_error)
#
# based on _handle_get_error and _dispatch_error_msg from CHI::Driver
@@ -408,14 +489,37 @@ sub compute_fh {
$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
if ($lock_state) {
## acquired writers lock, have to generate data
- @result = eval { $self->set_coderef_fh($key, $code_fh) };
+ eval { @result = $self->_set_maybe_background($key, $code_fh) };
$self->_handle_error($@) if $@;
# closing lockfile releases writer lock
- flock($lock_fh, LOCK_UN);
+ #flock($lock_fh, LOCK_UN); # it would unlock here and in background process
close $lock_fh
or $self->_handle_error("Could't close lockfile '$lockfile': $!");
+ if (!@result) {
+ # wait for background process to finish generating data
+ open $lock_fh, '<', $lockfile
+ or $self->_handle_error("Couldn't reopen (for reading) lockfile '$lockfile': $!");
+
+ eval {
+ @result = $self->_wait_for_data($key, sub {
+ flock($lock_fh, LOCK_SH);
+ # or 'waitpid -1, 0;', or 'wait;', as we don't detach now in this situation
+ });
+ };
+ $self->_handle_error($@) if $@;
+
+ # closing lockfile releases readers lock used to wait for data
+ flock($lock_fh, LOCK_UN);
+ close $lock_fh
+ or $self->_handle_error("Couldn't close reopened lockfile '$lockfile': $!");
+
+ # we didn't detach, so wait for the child to reap it
+ # (it should finish working, according to lock status)
+ wait;
+ }
+
} else {
## didn't acquire writers lock, get stale data or wait for regeneration
@@ -429,12 +533,13 @@ sub compute_fh {
# wait for regeneration if no stale data to serve,
# using shared / readers lock to sync (wait for data)
- @result = eval {
- $self->_wait_for_data($key, sub {
+ eval {
+ @result = $self->_wait_for_data($key, sub {
flock($lock_fh, LOCK_SH);
});
};
$self->_handle_error($@) if $@;
+
# closing lockfile releases readers lock
flock($lock_fh, LOCK_UN);
close $lock_fh
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 4ce067f..8652c91 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -57,7 +57,9 @@ gitweb_enable_caching () {
cat >>gitweb_config.perl <<-\EOF &&
$caching_enabled = 1;
$cache_options{"expires_in"} = -1; # never expire cache for tests
- $cache_options{"cache_root"} = "cache"; # to clear the right thing
+ $cache_options{"cache_root"} = "$TRASH_DIRECTORY/cache"; # to clear the right thing
+ $cache_options{"background_cache"} = 0; # no background processes in test suite
+ $cache_options{"generating_info"} = undef; # tests do not use web browser
EOF
rm -rf cache/
'
diff --git a/t/t9511/test_cache_interface.pl b/t/t9511/test_cache_interface.pl
index a2b006c..1e8feb3 100755
--- a/t/t9511/test_cache_interface.pl
+++ b/t/t9511/test_cache_interface.pl
@@ -23,7 +23,13 @@ BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
note("Using lib '$INC[0]'");
note("Testing '$INC{'GitwebCache/FileCacheWithLocking.pm'}'");
-my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking', [
+ 'max_lifetime' => 0, # turn it off
+ 'background_cache' => 0,
+]);
+
+# ->compute_fh() can fork, don't generate zombies
+#local $SIG{CHLD} = 'IGNORE';
# Test that default values are defined
#
@@ -239,6 +245,7 @@ subtest 'parallel access' => sub {
my $stale_value = 'Stale Value';
subtest 'serving stale data when regenerating' => sub {
+ $cache->remove($key);
cache_set_fh($cache, $key, $stale_value);
$cache->set_expires_in(-1); # never expire, for next check
is(cache_get_fh($cache, $key), $stale_value,
@@ -246,7 +253,10 @@ subtest 'serving stale data when regenerating' => sub {
$call_count = 0;
$cache->set_expires_in(0); # expire now (so there are no fresh data)
- $cache->set_max_lifetime(-1); # forever (always serve stale data)
+ $cache->set_max_lifetime(-1); # stale data is valid forever
+
+ # without background generation
+ $cache->set_background_cache(0);
@output = parallel_run {
my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
@@ -264,6 +274,31 @@ subtest 'serving stale data when regenerating' => sub {
'no background: value got set correctly, even if stale data returned');
+ # with background generation
+ $cache->set_background_cache(1);
+ $call_count = 0;
+ cache_set_fh($cache, $key, $stale_value);
+ $cache->set_expires_in(0); # expire now (so there are no fresh data)
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print "$call_count$sep";
+ print $data if defined $data;
+ };
+ # returning stale data works
+ is_deeply(
+ [sort @output],
+ [sort ("0$sep$stale_value", "0$sep$stale_value")],
+ 'background: stale data returned by both processes'
+ );
+ $cache->set_expires_in(-1); # never expire for next ->get
+ note("waiting $slow_time sec. for background process to have time to set data");
+ sleep $slow_time; # wait for background process to have chance to set data
+ is(cache_get_fh($cache, $key), $value,
+ 'background: value got set correctly by background process');
+ $cache->set_expires_in(0); # expire now (so there are no fresh data)
+
+
cache_set_fh($cache, $key, $stale_value);
$cache->set_expires_in(0); # expire now
$cache->set_max_lifetime(0); # don't serve stale data
@@ -282,6 +317,116 @@ subtest 'serving stale data when regenerating' => sub {
$cache->set_expires_in(-1);
+# Test 'generating_info' feature
+#
+$cache->remove($key);
+my $progress_info = "Generating...";
+sub test_generating_info {
+ local $| = 1;
+ print "$progress_info";
+}
+$cache->set_generating_info(\&test_generating_info);
+
+subtest 'generating progress info' => sub {
+ my @progress;
+
+ # without background generation, and without stale value
+ $cache->set_background_cache(0);
+ $cache->remove($key); # no data and no stale data
+ $call_count = 0;
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print "$sep$call_count$sep";
+ print $data if defined $data;
+ };
+ # split progress and output
+ @progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+ is_deeply(
+ [sort @progress],
+ [sort ("${sep}1", "$progress_info${sep}0")],
+ 'no background, no stale data: the process waiting for data prints progress info'
+ );
+ is_deeply(
+ \@output,
+ [ ($value) x 2 ],
+ 'no background, no stale data: both processes return correct value'
+ );
+
+
+ # without background generation, with stale value
+ cache_set_fh($cache, $key, $stale_value);
+ $cache->set_expires_in(0); # set value is now expired
+ $cache->set_max_lifetime(-1); # stale data never expire
+ $call_count = 0;
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print "$sep$call_count$sep";
+ print $data if defined $data;
+ };
+ @progress = map { s/^(.*?)\Q${sep}\E//o && $1 } @output;
+ is_deeply(
+ \@progress,
+ [ ('') x 2],
+ 'no background, stale data: neither process prints progress info'
+ );
+ is_deeply(
+ [sort @output],
+ [sort ("1$sep$value", "0$sep$stale_value")],
+ 'no background, stale data: generating gets data, other gets stale data'
+ );
+ $cache->set_expires_in(-1);
+
+
+ # with background generation
+ $cache->set_background_cache(1);
+ $cache->remove($key); # no data and no stale value
+ $call_count = 0;
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print $sep;
+ print $data if defined $data;
+ };
+ @progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+ is_deeply(
+ \@progress,
+ [ ($progress_info) x 2],
+ 'background, no stale data: both processes print progress info'
+ );
+ is_deeply(
+ \@output,
+ [ ($value) x 2 ],
+ 'background, no stale data: both processes return correct value'
+ );
+
+
+ # with background generation, with stale value
+ cache_set_fh($cache, $key, $stale_value);
+ $cache->set_expires_in(0); # set value is now expired
+ $cache->set_max_lifetime(-1); # stale data never expire
+ $call_count = 0;
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&get_value_slow_fh);
+ print $sep;
+ print $data if defined $data;
+ };
+ @progress = map { s/^(.*)\Q${sep}\E//o && $1 } @output;
+ is_deeply(
+ \@progress,
+ [ ('') x 2],
+ 'background, stale data: neither process prints progress info'
+ );
+ note("waiting $slow_time sec. for background process to have time to set data");
+ sleep $slow_time; # wait for background process to have chance to set data
+
+
+ done_testing();
+};
+$cache->set_expires_in(-1);
+
done_testing();
--
1.7.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (9 preceding siblings ...)
2010-12-31 18:03 ` [RFC PATCH v7 10/9] gitweb: Background cache generation and progress indicator Jakub Narebski
@ 2011-01-03 21:33 ` Jakub Narebski
2011-01-03 23:31 ` J.H.
2011-01-05 2:26 ` [RFC PATCH 11/9] [PoC] gitweb/lib - HTTP-aware output caching Jakub Narebski
11 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2011-01-03 21:33 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
Instead of having gitweb use a progress info indicator / throbber to
notify the user that data is being generated by the current process, gitweb
can now (provided that PerlIO::tee from PerlIO::Util is available)
send the page to the web browser while simultaneously saving it to the cache
(print and capture, i.e. tee), thus having the incremental generation of
the page serve as a progress indicator.
To do this, the GitwebCache::Capture::ToFile module acquired a ->tee()
subroutine, similar to ->capture(), but it prints while capturing
output. The ->tee() method (and its worker methods ->tee_start() and
->tee_stop()) are available only if PerlIO::tee from the PerlIO::Util
distribution is present. Tests checking if this feature works as
expected were added to the t9510 test.
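For those unfamiliar with PerlIO::Util, the underlying mechanism is roughly
the following (a standalone sketch; the file name is just an example):
    require PerlIO::Util;
    *STDOUT->push_layer(tee => '/tmp/captured.out');  # printed data is also written to the file
    print "some output\n";
    *STDOUT->pop_layer();                             # stop tee-ing; the file keeps the copy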
An alternative would be to provide two versions of
GitwebCache::Capture::ToFile. Note also that PerlIO::tee is not
strictly necessary, as Capture::Tiny shows, but in the most generic case
(like the one handled by Capture::Tiny) one needs a separate process
functioning as a multiplexer.
Because tee-ing (printing while capturing) can function as a kind of
progress indicator only for the process generating the data for a cache
entry, and not for the processes waiting for the data to be generated,
'generating_info' got split into 'get_progress_info' and
'set_progress_info'. You can now set those two separately in
GitwebCache::FileCacheWithLocking. You are expected to unset
'set_progress_info' when using the tee-ing capturing engine. Some tests
covering a tee-like situation were added to t9511.
As a proof of concept gitweb now uses two slightly different versions
of the "Generating..." page; if you worry about the interaction between
the progress indicator and non-cacheable error pages, you can separately
set 'set_progress_info' to undef.
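In gitweb config terms that would be, for example (a sketch using the
coderefs introduced by this patch):
    $cache_options{'get_progress_info'} = \&git_get_progress_info_html;
    $cache_options{'set_progress_info'} = undef;  # rely on tee-ing as the progress indicator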
The cache_output subroutine from GitwebCache::CacheOutput got updated
to use the ->tee() subroutine if $capture supports it. If ->tee() is
used, then of course the generated data doesn't need to be (and shouldn't
be) printed again; also, cache_output unsets 'set_progress_info' locally.
Note that ->tee() is used only if we are not in a background process; if we
are in a background process, simple ->capture() is used. No new tests
for now.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
Note: the change to t/gitweb-lib.sh and some of the changes to t9510 are
incidental fixes; the original commits should be fixed instead.
This is a proof of concept (PoC) patch, showing how one can use
"tee"-ing in the capturing engine together with gitweb output caching.
Because we don't need and don't use the 'generating_info' subroutine for
the process that is writing data (the one that acquired the writers lock),
we are (or at least should be) now safe to have error pages not cached.
Currently the "tee"-ing support requires the PerlIO::tee module from the
PerlIO::Util distribution, as it was the easiest way to add such a feature.
In the future we would probably have to do something similar to what
'tee' in Capture::Tiny (or in other capture modules) does. I'm not
sure if PerlIO::Util is packaged as an RPM anywhere...; well, I have
googled that ALT Linux has it: http://sisyphus.ru/en/srpm/perl-PerlIO-Util
I have only run the tests, I haven't actually run gitweb with those
changes... :-P
gitweb/gitweb.perl | 32 +++++++----
gitweb/lib/GitwebCache/CacheOutput.pm | 18 ++++++-
gitweb/lib/GitwebCache/Capture/ToFile.pm | 67 ++++++++++++++++++++++-
gitweb/lib/GitwebCache/FileCacheWithLocking.pm | 52 +++++++++++++-----
t/gitweb-lib.sh | 2 +-
t/t9510/test_capture_interface.pl | 28 +++++++++-
t/t9511/test_cache_interface.pl | 29 ++++++++++
7 files changed, 194 insertions(+), 34 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5ef668d..de283a0 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -325,19 +325,17 @@ our %cache_options = (
# serving possibly stale data.
'background_cache' => 1,
- # Subroutine which would be called when gitweb has to wait for data to
+ # Subroutines which would be called when gitweb has to wait for data to
# be generated (it can't serve stale data because there isn't any,
- # or if it exists it is older than 'max_lifetime'). The default
- # is to use git_generating_data_html(), which creates "Generating..."
- # page, which would then redirect or redraw/rewrite the page when
- # data is ready.
- # Set it to `undef' to disable this feature.
+ # or if it exists it is older than 'max_lifetime').
+ # Set them to `undef' to disable this feature.
#
- # Such subroutine (if invoked from GitwebCache::FileCacheWithLocking)
+ # Such subroutines (if invoked from GitwebCache::FileCacheWithLocking)
# is passed the following parameters: $cache instance, human-readable
# $key to current page, and $sync_coderef subroutine to invoke to wait
# (in a blocking way) for data.
- 'generating_info' => \&git_generating_data_html,
+ 'get_progress_info' => \&git_get_progress_info_html,
+ 'set_progress_info' => \&git_set_progress_info_html,
# How to handle runtime errors occurring during cache gets and cache
# sets. Options are:
@@ -3721,9 +3719,21 @@ sub get_page_title {
return $title;
}
+sub git_get_progress_info_html {
+ git_generating_data_html("Waiting", @_);
+}
+
+sub git_set_progress_info_html {
+ # minimum startup delay is 2 seconds, just in case, for error handling
+ local $generating_options{'startup_delay'} =
+ $generating_options{'startup_delay'} > 2 ? $generating_options{'startup_delay'} : 2;
+
+ git_generating_data_html("Generating", @_);
+}
+
# creates "Generating..." page when caching enabled and not in cache
sub git_generating_data_html {
- my ($cache, $key, $sync_coderef) = @_;
+ my ($msg, $cache, $key, $sync_coderef) = @_;
# when should gitweb show "Generating..." page
if ((defined $actions_info{$action}{'output_format'} &&
@@ -3749,7 +3759,7 @@ sub git_generating_data_html {
}
}
- my $title = "[Generating...] " . get_page_title();
+ my $title = "[$msg...] " . get_page_title();
# TODO: the following line of code duplicates the one
# in git_header_html, and it should probably be refactored.
my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
@@ -3786,7 +3796,7 @@ sub git_generating_data_html {
EOF
local $| = 1; # autoflush
- print STDOUT 'Generating...';
+ print STDOUT "$msg...";
my $total_time = 0;
my $interval = $generating_options{'print_interval'} || 1;
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 188d4ab..3bcd35b 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -42,6 +42,14 @@ sub cache_output {
my $pid = $$;
+ my $can_tee = $capture->can('tee');
+ # if $capture can tee, we don't need progress info for generating (on set).
+ # the below breaks encapsulation, but it is a bit simpler than
+ # $old = $cache->get_...; $cache->set_...(...); ...; $cache->set_...($old);
+ local $cache->{'set_progress_info'} = undef
+ if ($can_tee);
+
+ my $printed = 0;
my ($fh, $filename);
my ($capture_fh, $capture_filename);
eval { # this `eval` is to catch rethrown error, so we can print captured output
@@ -54,7 +62,12 @@ sub cache_output {
}
# this `eval` is to be able to cache error output (up till 'die')
- eval { $capture->capture($code, $capture_fh); };
+ if ($can_tee && $$ == $pid) {
+ $printed = 1;
+ eval { $capture->tee($code, $capture_fh); };
+ } else {
+ eval { $capture->capture($code, $capture_fh); };
+ }
# note that $cache can catch this error itself (like e.g. CHI);
# use "die"-ing error handler to rethrow this exception to outside
@@ -69,7 +82,8 @@ sub cache_output {
$filename ||= $capture_filename;
}
- if (defined $fh || defined $filename) {
+ if ((defined $fh || defined $filename) &&
+ !$printed) { # did we tee, i.e. already printed output?
# set binmode only if $fh is defined (is a filehandle)
# File::Copy::copy opens files given by filename in binary mode
binmode $fh, ':raw' if (defined $fh);
diff --git a/gitweb/lib/GitwebCache/Capture/ToFile.pm b/gitweb/lib/GitwebCache/Capture/ToFile.pm
index d2dbf0f..0290ec4 100644
--- a/gitweb/lib/GitwebCache/Capture/ToFile.pm
+++ b/gitweb/lib/GitwebCache/Capture/ToFile.pm
@@ -20,6 +20,10 @@ use warnings;
use PerlIO;
use Symbol qw(qualify_to_ref);
+BEGIN {
+ eval { require PerlIO::Util; }; # require, not use: 'use' would escape the eval at compile time
+}
+
# Constructor
sub new {
my $class = shift;
@@ -30,22 +34,41 @@ sub new {
return $self;
}
-sub capture {
+sub capture_or_tee {
my $self = shift;
my $code = shift;
+ my ($start, $stop) = @{ shift() };
- $self->capture_start(@_); # pass rest of params
+ $self->$start(@_); # pass rest of params
eval { $code->(); 1; };
my $exit_code = $?; # save this for later
my $error = $@; # save this for later
- my $got_out = $self->capture_stop();
+ my $got_out = $self->$stop();
$? = $exit_code;
die $error if $error;
return $got_out;
}
+sub capture {
+ my ($self, $code, @args) = @_;
+
+ return
+ $self->capture_or_tee($code, ['capture_start', 'capture_stop'], @args);
+}
+
+BEGIN {
+ if ($INC{'PerlIO/Util.pm'}) {
+ *tee = sub {
+ my ($self, $code, @args) = @_;
+
+ return
+ $self->capture_or_tee($code, ['tee_start', 'tee_stop'], @args);
+ };
+ }
+}
+
# ----------------------------------------------------------------------
# Start capturing data (STDOUT)
@@ -92,6 +115,44 @@ sub capture_stop {
return exists $self->{'to'} ? $self->{'to'} : $self->{'data'};
}
+# ......................................................................
+
+BEGIN {
+ if ($INC{'PerlIO/Util.pm'}) {
+ *tee_start = sub {
+ my ($self, $to) = @_;
+
+ # save layers, to replay them on top of 'tee' layer (?)
+ my @layers = PerlIO::get_layers(\*STDOUT);
+
+ $self->{'to'} = $to;
+ *STDOUT->push_layer('tee' => $to);
+
+ _relayer(\*STDOUT, \@layers); # is it necessary?
+
+ # started tee-ing
+ $self->{'teeing'} = 1;
+ };
+ *tee_stop = sub {
+ my $self = shift;
+
+ # return if we didn't start tee-ing
+ return unless delete $self->{'teeing'};
+
+ my @top_layers;
+ while ((my $layer = *STDOUT->pop_layer()) ne 'tee') {
+ push @top_layers, $layer;
+ }
+ binmode(STDOUT, join(":", ":", @top_layers));
+ # or is it binmode(STDOUT, join(":", ":raw", @top_layers));
+
+ return exists $self->{'to'} ? $self->{'to'} : $self->{'data'};
+ };
+ }
+}
+
+# ----------------------------------------------------------------------
+
# taken from Capture::Tiny by David Golden, Apache License 2.0
# with debugging stripped out
sub _relayer {
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 291526e..09ae7b2 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -76,12 +76,17 @@ our $EXPIRE_NOW = 0;
# * 'background_cache' (boolean)
# This enables/disables regenerating cache in background process.
# Defaults to true.
-# * 'generating_info'
+# * 'get_progress_info',
+# 'set_progress_info',
+# 'generating_info' (code reference)
# Subroutine (code) called when process has to wait for cache entry
# to be (re)generated (when there is no not-too-stale data to serve
# instead), for other process (or bacground process). It is passed
# $cache instance, $key, and $wait_code subroutine (code reference)
# to invoke (to call) to wait for cache entry to be ready.
+# 'get_progress_info' gets called on getting data from cache, i.e.
+# when waiting for data to be generated, 'set_progress_info' gets
+# called when waiting to generate data; 'generating_info' sets both.
# Unset by default (which means no activity indicator).
# * 'on_error' (similar to CHI 'on_get_error'/'on_set_error')
# How to handle runtime errors occurring during cache gets and cache
@@ -120,8 +125,14 @@ sub new {
$self->{'background_cache'} =
exists $opts{'background_cache'} ? $opts{'background_cache'} :
1;
- $self->{'generating_info'} = $opts{'generating_info'}
- if exists $opts{'generating_info'};
+ $self->{'get_progress_info'} =
+ exists $opts{'get_progress_info'} ? $opts{'get_progress_info'} :
+ exists $opts{'generating_info'} ? $opts{'generating_info'} :
+ undef;
+ $self->{'set_progress_info'} =
+ exists $opts{'set_progress_info'} ? $opts{'set_progress_info'} :
+ exists $opts{'generating_info'} ? $opts{'generating_info'} :
+ undef;
$self->{'on_error'} =
exists $opts{'on_error'} ? $opts{'on_error'} :
exists $opts{'on_get_error'} ? $opts{'on_get_error'} :
@@ -142,7 +153,7 @@ sub new {
# creates get_depth() and set_depth($depth) etc. methods
foreach my $i (qw(depth root namespace expires_in max_lifetime
- background_cache generating_info
+ background_cache get_progress_info set_progress_info
on_error)) {
my $field = $i;
no strict 'refs';
@@ -156,14 +167,25 @@ foreach my $i (qw(depth root namespace expires_in max_lifetime
};
}
-# $cache->generating_info($wait_code);
-# runs 'generating_info' subroutine, for activity indicator,
-# checking if it is defined first.
-sub generating_info {
+sub set_generating_info {
my $self = shift;
- if (defined $self->{'generating_info'}) {
- $self->{'generating_info'}->($self, @_);
+ $self->set_get_progress_info(@_);
+ $self->set_set_progress_info(@_);
+}
+
+# $cache->{get,set}_progress_info($key, $wait_code);
+# runs '{get,set}_progress_info' subroutine, for activity indicator,
+# checking if it is defined first.
+foreach my $name qw(get_progress_info set_progress_info) {
+ my $method = $name;
+ no strict 'refs';
+ *{"$method"} = sub {
+ my $self = shift;
+
+ if (defined $self->{$name}) {
+ $self->{$name}->($self, @_);
+ }
}
}
@@ -269,11 +291,11 @@ sub _tempfile_to_path {
# Wait for data to be available using (blocking) $code,
# then return filehandle and filename to read from for $key.
sub _wait_for_data {
- my ($self, $key, $sync_coderef) = @_;
+ my ($self, $key, $progress_info, $sync_coderef) = @_;
my @result;
# provide "generating page..." info, if exists
- $self->generating_info($key, $sync_coderef);
+ $self->$progress_info($key, $sync_coderef);
# generating info may exit, so we can not get there
# wait for data to be available
@@ -300,7 +322,7 @@ sub _set_maybe_background {
# or if main process would show progress indicator
$detach = @stale_result;
$pid = fork()
- if (@stale_result || $self->{'generating_info'});
+ if (@stale_result || $self->{'set_progress_info'});
}
if ($pid) {
@@ -503,7 +525,7 @@ sub compute_fh {
or $self->_handle_error("Couldn't reopen (for reading) lockfile '$lockfile': $!");
eval {
- @result = $self->_wait_for_data($key, sub {
+ @result = $self->_wait_for_data($key, 'set_progress_info', sub {
flock($lock_fh, LOCK_SH);
# or 'waitpid -1, 0;', or 'wait;', as we don't detach now in this situation
});
@@ -534,7 +556,7 @@ sub compute_fh {
# wait for regeneration if no stale data to serve,
# using shared / readers lock to sync (wait for data)
eval {
- @result = $self->_wait_for_data($key, sub {
+ @result = $self->_wait_for_data($key, 'get_progress_info', sub {
flock($lock_fh, LOCK_SH);
});
};
diff --git a/t/gitweb-lib.sh b/t/gitweb-lib.sh
index 8652c91..f0ef009 100755
--- a/t/gitweb-lib.sh
+++ b/t/gitweb-lib.sh
@@ -57,7 +57,7 @@ gitweb_enable_caching () {
cat >>gitweb_config.perl <<-\EOF &&
$caching_enabled = 1;
$cache_options{"expires_in"} = -1; # never expire cache for tests
- $cache_options{"cache_root"} = "$TRASH_DIRECTORY/cache"; # to clear the right thing
+ $cache_options{"cache_root"} = "cache"; # to clear the right thing
$cache_options{"background_cache"} = 0; # no background processes in test suite
$cache_options{"generating_info"} = undef; # tests do not use web browser
EOF
diff --git a/t/t9510/test_capture_interface.pl b/t/t9510/test_capture_interface.pl
index 6d90497..35e46ad 100755
--- a/t/t9510/test_capture_interface.pl
+++ b/t/t9510/test_capture_interface.pl
@@ -116,8 +116,8 @@ $captured = $outer_capture->capture(sub {
print "|post";
}, 'outer_actual');
-my $inner = read_file('inner_actual');
-my $outer = read_file('outer_actual');
+$inner = read_file('inner_actual');
+$outer = read_file('outer_actual');
is($inner, "INNER:pre|",
'nested capture with die: inner output captured up to die');
@@ -125,6 +125,30 @@ is($outer, "pre|@=die from inner\n|post",
'nested capture with die: outer caught rethrown exception from inner');
+# Testing tee feature, if available
+#
+SKIP: {
+ skip "PerlIO::Util module not found", 3
+ unless eval { require PerlIO::Util; 1 };
+
+ can_ok($capture, 'tee');
+
+ $captured = $outer_capture->capture(sub {
+ print "pre|";
+ my $captured = $capture->tee(sub {
+ print "INNER";
+ }, 'inner_actual');
+ print "|post";
+ }, 'outer_actual');
+
+ $inner = read_file('inner_actual');
+ $outer = read_file('outer_actual');
+
+ is($inner, "INNER", 'tee: captured');
+ is($outer, "pre|INNER|post", 'tee: printed');
+};
+
+
done_testing();
# Local Variables:
diff --git a/t/t9511/test_cache_interface.pl b/t/t9511/test_cache_interface.pl
index 1e8feb3..83f3894 100755
--- a/t/t9511/test_cache_interface.pl
+++ b/t/t9511/test_cache_interface.pl
@@ -154,6 +154,14 @@ sub get_value_slow_fh {
sleep $slow_time;
print {$fh} $value;
}
+sub tee_value_slow_fh {
+ my $fh = shift;
+
+ $call_count++;
+ sleep $slow_time;
+ print $value;
+ print {$fh} $value;
+}
sub get_value_die {
$call_count++;
die "get_value_die\n";
@@ -402,6 +410,26 @@ subtest 'generating progress info' => sub {
);
+ # with background generation, tee-like, no stale data
+ $cache->set_set_progress_info(undef);
+ $cache->set_background_cache(1);
+ $cache->remove($key); # no data and no stale value
+ $call_count = 0;
+
+ @output = parallel_run {
+ my $data = cache_compute_fh($cache, $key, \&tee_value_slow_fh);
+ print "$sep$call_count$sep";
+ print $data if defined $data;
+ };
+ my $getting_output = (grep /\Q${sep}0${sep}\E/, @output)[0];
+ my $setting_output = (grep /\Q${sep}1${sep}\E/, @output)[0];
+ is($getting_output, "$progress_info${sep}0$sep$value",
+ 'background, no stale, tee: waiting process prints progress, gets data');
+ is($setting_output, "$value${sep}1$sep$value",
+ 'background, no stale, tee: generating process prints data, sets data');
+ $cache->set_generating_info(\&test_generating_info); # restore
+
+
# with background generation, with stale value
cache_set_fh($cache, $key, $stale_value);
$cache->set_expires_in(0); # set value is now expired
@@ -427,6 +455,7 @@ subtest 'generating progress info' => sub {
};
$cache->set_expires_in(-1);
+
done_testing();
--
1.7.3
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation
2011-01-03 21:33 ` [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation Jakub Narebski
@ 2011-01-03 23:31 ` J.H.
2011-01-04 0:28 ` Jakub Narebski
0 siblings, 1 reply; 34+ messages in thread
From: J.H. @ 2011-01-03 23:31 UTC (permalink / raw)
To: Jakub Narebski; +Cc: git, John 'Warthog9' Hawley
On 01/03/2011 01:33 PM, Jakub Narebski wrote:
> Instead of having gitweb use progress info indicator / throbber to
> notify user that data is being generated by current process, gitweb
> can now (provided that PerlIO::tee from PerlIO::Util is available)
> send page to web browser while simultaneously saving it to cache
> (print and capture, i.e. tee), thus having incremental generating of
> page serve as a progress indicator.
In general, and particularly for the large sites that caching is
targeted at, teeing is a really bad idea. I've mentioned this several
times before, and the progress indicator is a *MUCH* better idea. I'm
not sure how many times I can say that. Even if this were added, it would
have the potential to exacerbate disk thrashing and overall make things
a lot more complex.
1) Errors may still be generated in flight as the cache is being
generated. It would be better to let the cache run with a progress
indicator and should an error occur, display the error instead of giving
any output that may have been generated (and thus likely a broken page).
2) Having multiple clients all waiting on the same page (in particular
the index page) can lead to invalid output. In particular if you are
teeing the output a reading client now must come in, read the current
contents of the file (as written), then pick up on the the tee after
that. It's actually possible for the reading client to miss data as it
may be in flight to be written and the client is switching from reading
the file to reading the tee. I don't see anything in your code to
handle that kind of switch over.
3) This makes no allowance for the file to be generated completely in
the background while serving stale data in the interim. Keep in mind
that it can (as Fedora has experienced) take *HOURS* to generate the
index page, teeing that output just means brokenness and isn't useful.
It's much better to have a simple, lightweight waiting message get
displayed while things happen. When they are done, output the completed
page to all waiting clients.
- John 'Warthog9' Hawley
P.S. I'm back to work full-time on Wednesday, when I'll be catching up
on gitweb and trying to make forward progress on my gitweb code again.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation
2011-01-03 23:31 ` J.H.
@ 2011-01-04 0:28 ` Jakub Narebski
2011-01-04 13:20 ` Jakub Narebski
0 siblings, 1 reply; 34+ messages in thread
From: Jakub Narebski @ 2011-01-04 0:28 UTC (permalink / raw)
To: J.H.; +Cc: git, John 'Warthog9' Hawley
On Tue, 4 Jan 2011, J.H. wrote:
> On 01/03/2011 01:33 PM, Jakub Narebski wrote:
> > Instead of having gitweb use progress info indicator / throbber to
> > notify user that data is being generated by current process, gitweb
> > can now (provided that PerlIO::tee from PerlIO::Util is available)
> > send page to web browser while simultaneously saving it to cache
> > (print and capture, i.e. tee), thus having incremental generating of
> > page serve as a progress indicator.
>
> In general, and particularly for the large sites that caching is
> targeted at, teeing is a really bad idea. I've mentioned this several
> times before, and the progress indicator is a *MUCH* better idea. I'm
> not sure how many times I can say that. Even if this were added, it would
> have the potential to exacerbate disk thrashing and overall make things
> a lot more complex.
It might be true that tee-ing is bad for very large sites, as it
increases load a bit in those (I think) extremely rare cases where
clients concurrently access the very same error page. But it might
be a solution for those in-between cases. I think that an incrementally
generated page is a better progress indicator than just a "Generating..."
page.
Anyway, this proof-of-concept patch is to show how such a thing should
be implemented. I don't think that it makes things a lot more complex;
in this rewrite everything is quite well modularized, encapsulated, and
isolated.
But the main intent behind this patch was to avoid a bad interaction
between the 'progress info' indicator (in the process that is generating
the page, see below) and non-cached error pages.
>
> 1) Errors may still be generated in flight as the cache is being
> generated. It would be better to let the cache run with a progress
> indicator and should an error occur, display the error instead of giving
> any output that may have been generated (and thus likely a broken page).
On the contrary, with tee-ing (and zero size sanity check) you would be
able to see pages even if there are errors saving cache entry. Though
this wouldn't help very large sites which cannot function without caching,
it could be useful for smaller sites.
But see below.
> 2) Having multiple clients all waiting on the same page (in particular
> the index page) can lead to invalid output. In particular if you are
> teeing the output a reading client now must come in, read the current
> contents of the file (as written), then pick up on the tee after
> that. It's actually possible for the reading client to miss data as it
> may be in flight to be written and the client is switching from reading
> the file to reading the tee. I don't see anything in your code to
> handle that kind of switch over.
Err... could you explain what you mean by "client is switching from
reading the file to reading the tee"?
Hmmm... I thought that the code was clear. Generating data, whether it
is captured to be displayed later, or tee-ed, i.e. printed and captured
to cache, happens inside a critical section, protected by an exclusive
lock. Only after the cache entry is written (in full) is the lock
released, and clients waiting for data can access it; they use a shared
(readers) lock for sync.
Note that in my rewrite (and I think also in _some_ cases in your version)
files are written atomically, by writing to a temporary file and then
renaming it to the final destination.
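Schematically, the writer/readers protocol looks more or less like this
(just a rough sketch -- $lockfile, $path, $cache_dir and generate_page()
are placeholders, not the actual FileCacheWithLocking API):

    use Fcntl qw(:flock);
    use File::Temp qw(tempfile);

    open my $lock_fh, '>', $lockfile or die "cannot open lockfile: $!";

    if (flock($lock_fh, LOCK_EX | LOCK_NB)) {
        # we are the writer: generate into a temporary file, then
        # rename it into place (atomic on the same filesystem)
        my ($tmp_fh, $tmp_name) = tempfile(DIR => $cache_dir);
        print {$tmp_fh} generate_page();
        close $tmp_fh or die "error writing cache entry: $!";
        rename $tmp_name, $path or die "cannot rename into place: $!";
        flock($lock_fh, LOCK_UN);  # released only after entry is complete
    } else {
        # somebody else is generating: block on a shared (readers) lock,
        # which is granted only after the writer drops its exclusive lock
        flock($lock_fh, LOCK_SH);
        open my $fh, '<', $path or die "cache entry still missing: $!";
        print while <$fh>;         # serve the complete cached page
    }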
> 3) This makes no allowance for the file to be generated completely in
> the background while serving stale data in the interim. Keep in mind
> that it can (as Fedora has experienced) take *HOURS* to generate the
> index page, teeing that output just means brokenness and isn't useful.
It does make such an allowance. cache_output from GitwebCache::CacheOutput
uses capturing and not tee-ing if we are in a background process. When
there is stale data to serve, the cache entry is (re)generated in the
background, in a detached process.
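Roughly, mirroring the CacheOutput hunk in the patch (with names slightly
abbreviated; $code and $capture_fh are as in cache_output):

    my $parent_pid = $$;    # recorded before a possible fork()
    # ... the cache may fork a detached child to regenerate in background ...
    if ($capture->can('tee') && $$ == $parent_pid) {
        # foreground process, still attached to the client:
        # print to the browser and save to the cache entry at once
        eval { $capture->tee($code, $capture_fh); };
    } else {
        # detached background child (or no PerlIO::Util):
        # capture only, there is no client to print to
        eval { $capture->capture($code, $capture_fh); };
    }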
Moreover, by default cache_output has a safety measure in that error pages
generated by such a detached process are cached.
Note also that in my rewrite you can simply (by changing one single
configuration knob) configure gitweb to also cache error pages. This
might be the best and safest solution for very large sites with very large
disk space, but not so good for smaller sites.
>
> It's much better to have a simple, lightweight waiting message get
> displayed while things happen. When they are done, output the completed
> page to all waiting clients.
The problem with the 'lightweight waiting message', as it is implemented in
your code, and as I stole it ;-), is that it doesn't provide any indication
of how much work is already done, and how much work might be left.
Well, at least for now.
With tee-ing, the client (well, at least the one that is generating data;
others would get the "Generating...", or rather "Waiting...", page) can
estimate how long he/she would have to wait, and literally see progress,
not just some progress indicator.
P.S. In my rewrite clients would retry if the page was not generated
while they were waiting for it, until they try their own hand at
generating it. This protects against the process generating data being
killed; see also the test suite for the caching interface.
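Schematically (with placeholder helper names, not the real method names):

    my @result;
    until (@result) {
        if (try_exclusive_lock()) {
            @result = generate_and_store($key);  # we became the writer
        } else {
            wait_for_writer();                   # shared lock / progress info
            @result = fetch_from_cache($key);    # still empty if the writer died
        }
    }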
> - John 'Warthog9' Hawley
>
> P.S. I'm back to work full-time on Wednesday, which I'll be catching up
> on gitweb and trying to make forward progress on my gitweb code again.
I'll try to send much simplified (and easier to use in caching) error
handling using exceptions (die / eval used as throw / catch) today.
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation
2011-01-04 0:28 ` Jakub Narebski
@ 2011-01-04 13:20 ` Jakub Narebski
0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2011-01-04 13:20 UTC (permalink / raw)
To: J.H.; +Cc: git, John 'Warthog9' Hawley
Jakub Narebski wrote:
> On Tue, 4 Jan 2011, J.H. wrote:
> > On 01/03/2011 01:33 PM, Jakub Narebski wrote:
>
> > > Instead of having gitweb use progress info indicator / throbber to
> > > notify user that data is being generated by current process, gitweb
> > > can now (provided that PerlIO::tee from PerlIO::Util is available)
> > > send page to web browser while simultaneously saving it to cache
> > > (print and capture, i.e. tee), thus having incremental generating of
> > > page serve as a progress indicator.
> >
> > In general, and particularly for the large sites that caching is
> > targeted at, teeing is a really bad idea.
[...]
> > 1) Errors may still be generated in flight as the cache is being
> > generated. It would be better to let the cache run with a progress
> > indicator and should an error occur, display the error instead of giving
> > any output that may have been generated (and thus likely a broken page).
>
> On the contrary, with tee-ing (and zero size sanity check) you would be
> able to see pages even if there are errors saving cache entry. Though
> this wouldn't help very large sites which cannot function without caching,
> it could be useful for smaller sites.
I was not sure how Perl reacts to ENOSPC (No space left on device),
which I think is the only error that can be generated in flight as the
cache is being generated (or as gitweb output is printed, i.e. sent
to the browser, and captured/tee-ed, i.e. saved to the cache entry file),
so I have checked this (using a loopback device to create a small filesystem).
The outcomes one might worry about are the following:
* Perl dies during printing - this leads to a broken page sent to the
browser, and no cache entry generated
* Perl prints output without dying at all; the page sent to the browser
via tee-ing is all right, but the cache entry is truncated, which results
in a broken page shown to other clients.
But what actually happens is different, and quite safe:
* Perl prints output without dying, and dies on closing the cache entry
file with ENOSPC. This means that the client generating data gets correct
output, and the cache entry is not generated. Other clients, with my code,
try their hand at generation and also get a correct page, but do not save
it to the cache.
This means that no error page about problems with the cache is shown, which
is bad. On the other hand, at least for smaller sites, gitweb keeps
working for newer entries as if there were no cache.
Note that the observed behaviour might depend on operating system /
filesystem parameters, such as buffer sizes.
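The effect is easy to reproduce with a few lines like these (assuming
$path lives on the small, nearly full test filesystem):

    open my $fh, '>', $path or die "cannot open '$path': $!";
    # with PerlIO buffering a small print() normally "succeeds" even when
    # the filesystem is full; the data only sits in the buffer for now
    print {$fh} 'x' x 4096;
    unless (close $fh) {
        # the buffer is flushed here, so this is where ENOSPC shows up
        warn "writing '$path' failed on close: $!";
    }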
> But see below.
[...]
> Note also that in my rewrite you can simply (by changing one single
> configuration knob) configure gitweb to also cache error pages. This
> might be the best and safest solution for very large sites with very large
> disk space, but not so good for smaller sites.
Errr... now after rereading your email I see that caching error pages
has one problem: errors that come from the caching engine or capturing
engine - those errors you cannot cache. Sorry, my mistake.
> > - John 'Warthog9' Hawley
> >
> > P.S. I'm back to work full-time on Wednesday, which I'll be catching up
> > on gitweb and trying to make forward progress on my gitweb code again.
>
> I'll try to send much simplified (and easier to use in caching) error
> handling using exceptions (die / eval used as throw / catch) today.
Sent as
[RFC PATCH v7 2.5/9] gitweb: Make die_error just die, and use send_error
to create error pages
Message-ID: <201101040135.08638.jnareb@gmail.com>
http://permalink.gmane.org/gmane.comp.version-control.git/164466
--
Jakub Narebski
Poland
^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC PATCH 11/9] [PoC] gitweb/lib - HTTP-aware output caching
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
` (10 preceding siblings ...)
2011-01-03 21:33 ` [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation Jakub Narebski
@ 2011-01-05 2:26 ` Jakub Narebski
11 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2011-01-05 2:26 UTC (permalink / raw)
To: git; +Cc: J.H., John 'Warthog9' Hawley
This commit adds a new option, -http_output, to the cache_output()
subroutine from the GitwebCache::CacheOutput module. When this subroutine
is called as cache_output(..., -http_output => 1), it assumes that the
cached output is an HTTP response, consisting of HTTP headers separated
from the HTTP body (contents of the page) by a CR LF pair. It then adds
Expires and Cache-Control: max-age headers, if they do not exist, based
on the current cache entry expiration time, and a Content-Length header
based on the size of the cache entry file.
New subtest in t9512 includes basic tests for this feature.
Enable it in gitweb, via $cache_options{'cache_output'} hashref.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This patch is intended as a proof of concept about making output caching
in gitweb make use of the fact that we cache an HTTP response. In this
patch the "smarts" (the "HTTP awareness") were added to the part of code
responsible for sending the response to the client. An alternate solution
would be to add such "smarts" to the saving of captured output to the
cache file, or even to the caching engine itself. Each of those solutions
has its advantages and disadvantages.
J.H., among other things this patch is meant to illustrate that you don't
need to treat the output of 'snapshot' and 'blob_plain' views in a special
way; you can add the Content-Length header in an action-agnostic way.
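Stripped down, the idea is just this (a sketch only; it assumes $fh is
already positioned past the cached headers, and it uses the relative
'+Ns' form when calling CGI::Util's expires()):

    use CGI::Util qw(expires);

    my $expires_in = $cache->expires_in($key);  # seconds until entry goes stale
    my $length     = (-s $fh) - (tell $fh);     # what is left in the file = body

    print "Expires: " . expires("+${expires_in}s", 'http') . "\r\n";
    print "Cache-Control: max-age=$expires_in\r\n";
    print "Content-Length: $length\r\n" if $length;
    print "\r\n";    # empty line separates headers from body

None of this depends on which action produced the output.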
This patch replaces the controversial "[RFC PATCH v7 11/9] [PoC] gitweb/lib
- tee, i.e. print and capture during cache entry generation" for
simplicity, though it is fairly independent, and probably would apply
without problems after it.
NOTE: This is only an RFC, and while it passes t9512, I haven't done extensive
testing with it.
gitweb/gitweb.perl | 3 ++
gitweb/lib/GitwebCache/CacheOutput.pm | 32 ++++++++++++++++++++++++
gitweb/lib/GitwebCache/FileCacheWithLocking.pm | 24 ++++++++++++++++++
t/t9512/test_cache_output.pl | 25 ++++++++++++++++++
4 files changed, 84 insertions(+), 0 deletions(-)
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index e3f02b0..f3c14a9 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -360,6 +360,9 @@ our %cache_options = (
# pages will be cached if were generated in detached process).
# Default is undef.
'-cache_errors' => undef,
+ # Mark that we are caching HTTP response, and that we want extra treatment,
+ # i.e. automatic adding of Expires/Cache-Control and Content-Length headers
+ '-http_output' => 1,
},
);
# You define site-wide options for "Generating..." page (if enabled) here
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 188d4ab..e9884ca 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -19,6 +19,7 @@ use warnings;
use File::Copy qw();
use Symbol qw(qualify_to_ref);
+use CGI::Util qw(expires);
use Exporter qw(import);
our @EXPORT = qw(cache_output);
@@ -69,6 +70,37 @@ sub cache_output {
$filename ||= $capture_filename;
}
+ if ($opts{'-http_output'}) {
+ # we need filehandle; filename is not enough
+ open $fh, '<', $filename unless defined $fh;
+
+ # get HTTP headers first
+ my (@headers, %norm_headers);
+ while (my $line = <$fh>) {
+ last if $line eq "\r\n";
+ push @headers, $line;
+ if ($line =~ /^([^:]+:)\s+(.*)$/) {
+ (my $header = lc($1)) =~ s/_/-/;
+ $norm_headers{$header} = $2;
+ }
+ }
+ print join('', @headers);
+
+ # extra headers
+ if (!exists $norm_headers{lc('Expires')} &&
+ !exists $norm_headers{lc('Cache-Control')}) {
+ my $expires_in = $cache->expires_in($key);
+ print "Expires: " . expires($expires_in, 'http')."\r\n".
+ "Cache-Control: max-age=$expires_in\r\n";
+ }
+ if (!exists $norm_headers{lc('Content-Length')}) {
+ my $length = (-s $fh) - (tell $fh);
+ print "Content-Length: $length\r\n" if $length;
+ }
+
+ print "\r\n"; # separates headers from body
+ }
+
if (defined $fh || defined $filename) {
# set binmode only if $fh is defined (is a filehandle)
# File::Copy::copy opens files given by filename in binary mode
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 291526e..813331a 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -457,6 +457,30 @@ sub is_valid {
return (($now - $mtime) < $expires_in);
}
+# $cache->expires_in($key)
+#
+# Returns number of seconds an entry would be valid, or undef
+# if the cache entry for the given $key does not exist.
+sub expires_in {
+ my ($self, $key) = @_;
+
+ my $path = $self->path_to_key($key);
+
+ # does file exists in cache?
+ return undef unless -f $path;
+ # get its modification time
+ my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
+ or $self->_handle_error("Couldn't stat file '$path' for key '$key': $!");
+
+ my $expires_in = $self->get_expires_in();
+
+ my $now = time();
+ print STDERR __PACKAGE__."now=$now; mtime=$mtime; ".
+ "expires_in=$expires_in; diff=".($now - $mtime)."\n";
+
+ return $expires_in == 0 ? 0 : ($self->get_expires_in() - ($now - $mtime));
+}
+
# Getting and setting
# ($fh, $filename) = $cache->compute_fh($key, $code);
diff --git a/t/t9512/test_cache_output.pl b/t/t9512/test_cache_output.pl
index 758848c..3492dcf 100755
--- a/t/t9512/test_cache_output.pl
+++ b/t/t9512/test_cache_output.pl
@@ -6,6 +6,8 @@ use strict;
use Test::More;
+use CGI qw(:standard);
+
# test source version
use lib $ENV{GITWEBLIBDIR} || "$ENV{GIT_BUILD_DIR}/gitweb/lib";
@@ -158,5 +160,28 @@ subtest 'errors are cached with -cache_errors => 1' => sub {
};
+# caching HTTP output
+subtest 'HTTP output' => sub {
+ $cache->remove($key);
+ $cache->set_expires_in(60);
+
+ my $header =
+ header(-status=>'200 OK', -type=>'text/plain', -charset => 'utf-8');
+ my $data = "1234567890";
+ $action_output = $header.$data;
+ $test_data = capture_output_of_cache_output(\&action, '-http_output' => 1);
+
+ $header =~ s/\r\n$//;
+ my $length = do { use bytes; length($data); };
+ like($test_data, qr/^\Q$header\E/, 'http: starts with provided http header');
+ like($test_data, qr/\Q$data\E$/, 'http: ends with body (payload)');
+ like($test_data, qr/^Expires: /m, 'http: some "Expires:" header added');
+ like($test_data, qr/^Cache-Control: max-age=\d+\r\n/m,
+ 'http: "Cache-Control:" with max-age added');
+ like($test_data, qr/^Content-Length: $length\r\n/m,
+ 'http: "Content-Length:" header with correct value');
+};
+
+
done_testing();
__END__
--
1.7.3
^ permalink raw reply related [flat|nested] 34+ messages in thread