From: "J.H." <warthog9@eaglescrag.net>
To: Petr Baudis <pasky@suse.cz>
Cc: Jakub Narebski <jnareb@gmail.com>,
	git@vger.kernel.org,
	"John 'Warthog9' Hawley" <warthog9@kernel.org>
Subject: Re: [RFC PATCH 10/10] gitweb: Show appropriate "Generating..." page when regenerating cache (WIP)
Date: Mon, 25 Jan 2010 12:32:37 -0800
Message-ID: <4B5DFFE5.6060908@eaglescrag.net>
In-Reply-To: <20100125135653.GN4159@machine.or.cz>

On 01/25/2010 05:56 AM, Petr Baudis wrote:
> On Mon, Jan 25, 2010 at 02:48:26PM +0100, Jakub Narebski wrote:
>> Now those patches (mine and J.H. both) make gitweb use locking
>> (it is IIRC configurable in J.H. patch) to make only one process
>> generate the page if it is missing from cache, or is stale.  Now
>> if it is missing, we have to wait until it is generated in full
>> before being able to show it to client.  While it is possible to
>> "tee" output (using PerlIO::tee, or Capture::Tiny, or tie like
>> CGI::Cache) writing it simultaneously to browser and to cache for 
>> the process that is generating data, it is as far as I understand
>> it impossible for processes which are waiting for data.  Therefore
>> the need for "Generating..." page, so the user does not think that
>> web server hung or something, and is not generating output.
> 
> Ah, ok, so the message is there to cover up for a technical problem. ;-)
> I didn't quite realize. Then, it would be great to tweak the mechanisms
> so that the user does not really have to wait.

No, that is an incorrect assumption about how the 'Generating...' page
works, and you're missing a bit of the point.

(1) The 'Generating...' message itself is a cue to the user that
something is happening and that the browser is not actually hanging.
Web users are at the point where, if things are not instantaneous and
do not show up immediately, they will either browse away completely or
hit the refresh button incessantly until content does appear.  While the
page is usually only seen for about a second, and I'll admit it can be
annoying, it's nothing more than a 'sit tight a second'.  The front
page, for example, can take upwards of 7 seconds to generate for a
single user, which is a lot to ask of someone who is getting no response.

(2) It prevents the stampeding herd problem, which was very vehemently
discussed 4 years ago by HPA and myself and roughly boils down to this:

When a single user comes into the site, in particular the front page, it
kicks off a process that will start to generate that page, causing a
huge number of git requests into individual repositories and a lot of
disk i/o.  A second user will then come in and the same requests will
start to be done from the beginning again, and so on until you basically
kill the machine because the disk i/o goes up enough that it can't ever
service the requests fast enough.

So the 'Generating...' page, together with the locking behind it, does
2 things in the end:

1) it means there's only ever 1 copy of the page being generated, so
there isn't extraneous and dangerous disk i/o going on on the system

2) it keeps users from reporting that the website is broken, by giving
them a visual cue that things aren't broken.
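
The 'only 1 copy of the page ever being generated' idea can be sketched
roughly as below.  This is a minimal illustration in Python, not the
gitweb code: it uses threads and an in-memory dict where gitweb uses
separate processes and a file based cache, and the names
SingleFlightCache, slow_front_page and demo are hypothetical.

```python
import threading
import time


class SingleFlightCache:
    """Hypothetical in-memory cache: one generator per key, others wait."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the two dicts below
        self._pages = {}                # key -> generated page
        self._locks = {}                # key -> per-page lock

    def get(self, key, generate):
        with self._guard:
            if key in self._pages:
                return self._pages[key]
            lock = self._locks.setdefault(key, threading.Lock())
        # Only one caller per key gets past this point at a time.
        with lock:
            with self._guard:
                if key in self._pages:  # filled while we were waiting
                    return self._pages[key]
            page = generate()           # the expensive work, done once
            with self._guard:
                self._pages[key] = page
            return page


def demo(num_clients=5):
    """Send several simultaneous clients at the same uncached page."""
    calls = []

    def slow_front_page():
        calls.append(1)                 # count how often we really generate
        time.sleep(0.05)                # stand-in for the heavy git and disk i/o
        return "<html>project list</html>"

    cache = SingleFlightCache()
    results = []

    def client():
        results.append(cache.get("front", slow_front_page))

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results
```

Calling demo() sends five clients at the same missing page; the waiting
clients are the ones who would be shown the 'Generating...' page.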


> So, I wonder about two things:
> 
> (i) How often does it happen that two requests for the same page are
> received? Has anyone measured it? Or is at least able to make
> a minimally educated guess? IOW, isn't this premature optimization?

For most pages, not often, but it happens more often than you'd think.
The data I have is much too old to be useful now, but the front page
could, at times, have up to 30 people waiting for it without caching.
This is a very important patch, believe it or not.  A site the size of
kernel.org cannot exist without it.

But here's a quick stat: in 36 hours git.kernel.org has had
156099 accesses worldwide, or about 1.2 accesses a second.

android.git.kernel.org, in the same time period has had 115818 accesses.

If the first request takes 7 seconds to generate, by the time it's done
there are now 3 additional requests running.  If it again takes 7
seconds to generate there are now another 3 requests running, etc.  Very
quickly you've got so much i/o running that the box is more or less
useless.
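
The rate arithmetic above can be checked with a few lines of Python.
This is only a sketch: the access counts, the 36 hour window and the 7
second generation time are the figures quoted in this message, while the
function names are made up for illustration.  It gives site-wide numbers
only; how many of those arrivals hit the same page is not derived here.

```python
GIT_KERNEL_ORG_ACCESSES = 156099   # git.kernel.org, figure quoted above
ANDROID_ACCESSES = 115818          # android.git.kernel.org, same period
WINDOW_SECONDS = 36 * 3600         # the 36 hour window
GENERATION_SECONDS = 7             # front page generation time quoted above


def accesses_per_second(accesses, window=WINDOW_SECONDS):
    """Average request rate over the measurement window."""
    return accesses / window


def arrivals_during_generation(rate, generation=GENERATION_SECONDS):
    """Site-wide requests that arrive while one slow page is generated."""
    return rate * generation


rate = accesses_per_second(GIT_KERNEL_ORG_ACCESSES)
android_rate = accesses_per_second(ANDROID_ACCESSES)
arrivals = arrivals_during_generation(rate)
```

Without locking, every arrival for a page that is still being generated
starts its own copy of the same work, which is how the i/o piles up.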

> (ii) Can't the locked gitwebs do the equivalent of tail -f?

That's not really going to help much: most of the gitweb operations
won't output much of anything beyond the header until they have
collected all of the data they need anyway, and then there will be a
flurry of output.  It also means that this 'Generating...' page will
only work for caching schemes that tail can read out of.  I'm not sure
that would work all that well with things like memcached or a non-custom
caching layer where we don't necessarily have direct access to the file
being written to.

At least the way I had it (and I'll admit I haven't read through Jakub's
re-working of my patches so I don't know if it's still there) is that
with background caching you only get the 'Generating...' page if the
page is new or the content is grossly out of date.  If it's a popular
page and it's not grossly out of date, it shows you the 'stale' data
while it generates the new content in the background anyway, only
locking you out when the new file is being written.  Or at least that's
how I had it.
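
The decision described above could be sketched like this.  It is an
illustration only, in Python rather than gitweb's Perl, and the names
(Action, choose_action) and both thresholds (expires_after, max_stale)
are hypothetical values picked for the example, not settings from the
patches.

```python
from enum import Enum


class Action(Enum):
    SERVE_FRESH = "serve cached page as-is"
    SERVE_STALE_AND_REFRESH = "serve stale page, regenerate in background"
    SHOW_GENERATING = "show 'Generating...' and wait for the new page"


def choose_action(age_seconds, expires_after=20, max_stale=300):
    """Pick what to send, given the cached page's age in seconds.

    age_seconds is None when the page is not in the cache at all.
    expires_after and max_stale are assumed thresholds, in seconds.
    """
    if age_seconds is None:
        return Action.SHOW_GENERATING          # new page, nothing to show yet
    if age_seconds <= expires_after:
        return Action.SERVE_FRESH              # still considered current
    if age_seconds <= max_stale:
        return Action.SERVE_STALE_AND_REFRESH  # stale, but good enough to show
    return Action.SHOW_GENERATING              # grossly out of date
```

With these assumed thresholds, a popular page that is a minute old is
served stale right away, and only a missing or very old page makes the
user wait.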

- John 'Warthog9' Hawley
