From: Jakub Narebski <jnareb@gmail.com>
To: "J.H." <warthog9@eaglescrag.net>
Cc: git@vger.kernel.org, "John 'Warthog9' Hawley" <warthog9@kernel.org>
Subject: Re: [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation
Date: Tue, 4 Jan 2011 01:28:25 +0100 [thread overview]
Message-ID: <201101040128.26826.jnareb@gmail.com> (raw)
In-Reply-To: <4D225C6E.9000108@eaglescrag.net>
On Tue, 4 Jan 2011, J.H. wrote:
> On 01/03/2011 01:33 PM, Jakub Narebski wrote:
> > Instead of having gitweb use progress info indicator / throbber to
> > notify user that data is being generated by current process, gitweb
> > can now (provided that PerlIO::tee from PerlIO::Util is available)
> > send page to web browser while simultaneously saving it to cache
> > (print and capture, i.e. tee), thus having incremental generating of
> > page serve as a progress indicator.
>
> In general, and particularly for the large sites that caching is
> targeted at, teeing is a really bad idea. I've mentioned this several
> times before, and the progress indicator is a *MUCH* better idea. I'm
> not sure how many times I can say that, even if this was added it would
> have the potential to exacerbate disk thrashing and overall make things
> a lot more complex.
It might be true that tee-ing is bad for very large sites, as it
increases load a bit in those (I think) extremly rare cases where
clients concurrently access the very same error page. But it might
be a solution for those in between cases. I think that incrementally
generated page is better progress indicator than just "Generating..."
page.
Anyway this proof of concept patch is to show how such thing should
be implemented. I don't think that it makes things a lot more complex;
in this rewrite everything is quite well modularized, encapsulated, and
isolated.
But the main intent behind this patch was to avoid bad interaction between
'progress info' indicator (in the process that is generating page, see
below), and non-cached error pages.
>
> 1) Errors may still be generated in flight as the cache is being
> generated. It would be better to let the cache run with a progress
> indicator and should an error occur, display the error instead of giving
> any output that may have been generated (and thus likely a broken page).
On the contrary, with tee-ing (and zero size sanity check) you would be
able to see pages even if there are errors saving cache entry. Though
this wouldn't help very large sites which cannot function without caching,
it could be useful for smaller sites.
But see below.
> 2) Having multiple clients all waiting on the same page (in particular
> the index page) can lead to invalid output. In particular if you are
> teeing the output a reading client now must come in, read the current
> contents of the file (as written), then pick up on the the tee after
> that. It's actually possible for the reading client to miss data as it
> may be in flight to be written and the client is switching from reading
> the file to reading the tee. I don't see anything in your code to
> handle that kind of switch over.
Err... could you explain what do you mean by "client is switching from
reading the file to reading the tee"?
Hmmm... I thought that the code is clear. Generating data, whether it
is captured to be displayed later, or tee-ed i.e. printed and captured
to cache, is inside critical section, protected by exclusive lock. Only
after cache entry is written (in full), the lock is released, and clients
waiting for data can access it; they use shared (readers) lock for sync.
Note that in my rewrite (and I think also in _some_ cases in your version)
files are written atomically, by writing to temporary file then renaming
it to final destination.
> 3) This makes no allowance for the file to be generated completely in
> the background while serving stale data in the interim. Keep in mind
> that it can (as Fedora has experienced) take *HOURS* to generate the
> index page, teeing that output just means brokenness and isn't useful.
It does make allowance. cache_output from GitwebCache::CacheOutput uses
capturing and not tee-ing if we are in background process. When there
is stale data to serve, cache entry is (re)generated in background in
detached process.
Moreover by default cache_output has safety in that error pages generated
by such detached process are cached.
Note also that in my rewrite you can simply (by changing one single
configuration knob) configure gitweb to also cache error pages. This
might be best and safest solution for very large sites with very large
disk space, but not so good for smaller sites.
>
> It's much better to have a simple, lightweight waiting message get
> displayed while things happen. When they are done, output the completed
> page to all waiting clients.
The problem with 'lightweight waiting message', as it is implemented in
your code, and as I stole it ;-), is that it doesn't provide any indicator
how much work is already done, and how much work might there be left.
Well, at least for now.
With tee-ing client (well, at least the one that is generating data; other
would get "Generating...", or rather "Waiting..." page) can estimate how
long would he/she had to wait, and literally see progress, not just some
progress indicator.
P.S. In my rewrite clients would retry generating page if it was not
generated when they were waiting for it, till they try their own hand
at generating. This protects against process generating data being
killed; see also test suite for caching interface.
> - John 'Warthog9' Hawley
>
> P.S. I'm back to work full-time on Wednesday, which I'll be catching up
> on gitweb and trying to make forward progress on my gitweb code again.
I'll try to send much simplified (and easier to use in caching) error
handling using exceptions (die / eval used as throw / catch) today.
--
Jakub Narebski
Poland
next prev parent reply other threads:[~2011-01-04 0:28 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-22 23:54 [RFC PATCH v7 0/9] gitweb: Output caching, with eval/die based error handling Jakub Narebski
2010-12-22 23:55 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jakub Narebski
2010-12-23 1:55 ` Jonathan Nieder
2010-12-25 22:14 ` Jakub Narebski
2010-12-26 9:07 ` [RFC/PATCH] diff: funcname and word patterns for perl Jonathan Nieder
[not found] ` <201012261143.33190.trast@student.ethz.ch>
2010-12-26 10:54 ` Jonathan Nieder
[not found] ` <201012261206.11942.trast@student.ethz.ch>
2010-12-26 11:22 ` Jonathan Nieder
2010-12-26 23:14 ` Jakub Narebski
2010-12-27 17:18 ` Junio C Hamano
2010-12-27 22:44 ` Jakub Narebski
2010-12-28 3:52 ` Jeff King
2010-12-26 9:50 ` [RFC PATCH v7 1/9] gitweb: Go to DONE_REQUEST rather than DONE_GITWEB in die_error Jonathan Nieder
2010-12-26 22:25 ` Jakub Narebski
2010-12-22 23:55 ` [RFC PATCH v7 2/9] gitweb: use eval + die for error (exception) handling Jakub Narebski
2010-12-23 2:08 ` Jonathan Nieder
2010-12-25 23:17 ` Jakub Narebski
2011-01-04 0:35 ` [RFC PATCH v7 2.5/9] gitweb: Make die_error just die, and use send_error to create error pages Jakub Narebski
2010-12-22 23:55 ` [RFC PATCH v7 3/9] gitweb: Introduce %actions_info, gathering information about actions Jakub Narebski
2010-12-22 23:56 ` [RFC PATCH v7 4/9] gitweb: Prepare for splitting gitweb Jakub Narebski
2010-12-24 9:29 ` Jonathan Nieder
2010-12-26 22:54 ` Jakub Narebski
2010-12-22 23:56 ` [RFC PATCH v7 5/9] t/test-lib.sh: Export also GIT_BUILD_DIR in test_external Jakub Narebski
2010-12-22 23:57 ` [RFC PATCH v7 6/9] gitweb/lib - Simple output capture by redirecting STDOUT to file Jakub Narebski
2010-12-24 9:49 ` Jonathan Nieder
2010-12-26 23:03 ` Jakub Narebski
2010-12-22 23:57 ` [RFC PATCH v7 7/9] gitweb/lib - Very simple file based cache Jakub Narebski
2010-12-22 23:57 ` [RFC PATCH v7 8/9] gitweb/lib - Cache captured output (using compute_fh) Jakub Narebski
2010-12-22 23:58 ` [RFC PATCH v7 9/9] gitweb: Add optional output caching Jakub Narebski
2010-12-31 18:03 ` [RFC PATCH v7 10/9] gitweb: Background cache generation and progress indicator Jakub Narebski
2011-01-03 21:33 ` [RFC PATCH v7 11/9] [PoC] gitweb/lib - tee, i.e. print and capture during cache entry generation Jakub Narebski
2011-01-03 23:31 ` J.H.
2011-01-04 0:28 ` Jakub Narebski [this message]
2011-01-04 13:20 ` Jakub Narebski
2011-01-05 2:26 ` [RFC PATCH 11/9] [PoC] gitweb/lib - HTTP-aware output caching Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201101040128.26826.jnareb@gmail.com \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=warthog9@eaglescrag.net \
--cc=warthog9@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).