git.vger.kernel.org archive mirror
* Re: kernel.org mirroring (Re: [GIT PULL] MMC update)
@ 2006-12-11  3:40 linux
From: linux @ 2006-12-11  3:40 UTC (permalink / raw)
  To: koreth; +Cc: git

>>> I posted separately about those. And I've been mulling about whether
>>> the thundering herd is really such a big problem that we need to
>>> address it head-on.
>>
>> Uhm... yes it is.
> 
> Got some more info, discussion points or links to stuff I should read
> to appreciate why that is? I am trying to articulate why I consider it
> is not a high-payoff task, as well as describing how to tackle it.
> 
> To recap, the reasons it is not high payoff is that:
>
>  - the main benefit comes from being cacheable and able to revalidate
>    the cache cheaply (with the ETags-based strategy discussed above)
>  - highly distributed caches/proxies means we'll seldom see a true
>    cold cache situation
>  - we have a huge set of URLs which are seldom hit, and will never see
>    a thundering anything
>  - we have a tiny set of very popular URLs that are the key target for
>    the thundering herd - (projects page, summary page, shortlog, fulllog)
>  - but those are in the clear as soon as the caches are populated
> 
> Why do we have to take it head-on? :-)

I think I agree with you, but not as strongly.  Certainly, having any
kind of effective caching (heck, just comparing the timestamp of the
relevant ref(s) with the If-Modified-Since: header) will help kernel.org
enormously.
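
To make the If-Modified-Since idea concrete, here is a minimal sketch in
Python (gitweb itself is Perl, so this is only an illustration).  The
function name and the ref_mtime input are assumptions of mine; where the
ref timestamp actually comes from is left open.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def not_modified(if_modified_since, ref_mtime):
    """Return True if a 304 could be sent instead of regenerating the page.

    if_modified_since: raw header value, or None if the client sent none.
    ref_mtime: POSIX timestamp of the newest relevant ref (hypothetical
               input; how it is obtained is outside this sketch).
    """
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable header: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    changed = datetime.fromtimestamp(int(ref_mtime), tz=timezone.utc)
    return changed <= since
```

The point is that the check touches only the ref timestamp, never the
page generator, so a revalidation costs close to nothing.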

But as soon as there's a push, particularly a release push, that
invalidates *all* of the popular pages *and* the thundering herd arrives.

The result is that all of the popular "what's new?" summary pages get
fetched 15 times in parallel and, because the front end doesn't serialize
them, populating the caches can be a painful process involving a lot of
repeated work.
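
What "serializing" could look like, as a rough Python sketch: the first
request for a page generates it, and any request for the same page that
arrives meanwhile waits for that result instead of repeating the work.
The class and method names are invented for this example; nothing like
this exists in gitweb as far as I know.

```python
import threading


class SingleFlight:
    """Coalesce concurrent requests for the same key into one generation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> shared state of the running generation

    def do(self, key, generate):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "value": None, "error": None}
                self._inflight[key] = entry
        if leader:
            try:
                entry["value"] = generate()
            except Exception as exc:
                entry["error"] = exc
            finally:
                # Unregister first so later requests start a fresh generation,
                # then wake everyone who was waiting on this one.
                with self._lock:
                    del self._inflight[key]
                entry["done"].set()
        else:
            entry["done"].wait()
        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]
```

With this in front of the generator, 15 parallel fetches of the summary
page cost one generation rather than 15.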

I tend to agree that for the basic project summary pages, generating them
preemptively as static pages out of the push script seems best.
("find /usr/src/linux -type d -print | wc -l" is 1492.  Dear me.
Oh!  There is no per-directory shortlog page; that simplifies things.
But there *should* be.)

The only tricky thing is the "n minutes/hours/days ago" timestamps.
Basically, you want to generate a half-formatted, indefinitely-cacheable
page that contains them as absolute timestamps, and have a system for
regenerating the fully-formatted page from that (and the current time).
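
A sketch of that split, in Python for illustration.  The "@@AGE:...@@"
placeholder syntax and the formatting thresholds are made up for this
example; they are not what gitweb emits.

```python
import re

# Assumed placeholder syntax: "@@AGE:<unix time>@@" in the cached page.
_AGE = re.compile(r"@@AGE:(\d+)@@")


def age_string(age):
    """Format an age in seconds as a coarse human-readable string."""
    for unit, size in (("days", 86400), ("hours", 3600), ("min", 60)):
        if age >= 2 * size:
            return "%d %s ago" % (age // size, unit)
    return "right now"


def render(half_formatted, now):
    """Fill in relative timestamps; the input page itself never goes stale."""
    return _AGE.sub(lambda m: age_string(now - int(m.group(1))), half_formatted)
```

The half-formatted page depends only on the repository contents, so it
can be cached indefinitely; only the cheap substitution pass depends on
the current time.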

The ideas that people have been posting seem excellent.  Give a page
two timeouts.  If a GET arrives before the first timeout, and no
prerequisites have changed, it's served directly from cache.  If it
arrives after the second timeout, or the prerequisites have changed,
it blocks until the page is regenerated.  But if it arrives between
those two times, it serves the stale data and starts generating fresh
data in the background.
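
The decision logic for the two timeouts is small enough to write down.
This is a Python sketch with invented names; "prereq" stands for
whatever the page depends on, for instance the hash a ref pointed at
when the page was built.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: str
    fresh_until: float  # first timeout
    stale_until: float  # second timeout
    prereq: str         # what the page was generated from


def decide(entry, now, current_prereq):
    """Return what to do with a GET arriving at time `now`."""
    if entry is None or entry.prereq != current_prereq:
        return "block"  # nothing usable cached: wait for regeneration
    if now < entry.fresh_until:
        return "serve"  # still fresh: straight from cache
    if now < entry.stale_until:
        return "serve_stale_and_refresh"  # stale copy now, rebuild in background
    return "block"      # too old even for a stale reply
```

Only the "block" cases make a client wait, and with the background
refresh most clients should never see one.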

So for the fully-formed timestamps, the first timeout is when the next
human-readable timestamp on the page ticks over.  But the second timeout
can be past that by, say, 5% of the timeout value.  It's okay to display
"3 hours ago" until 12 minutes past the 4 hour mark.

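The arithmetic for those two timeouts, as a Python sketch.  It assumes a
single timestamp displayed in whole hours; a real page has many
timestamps in mixed units and would take the earliest timeout over all
of them.  The function and parameter names are mine.

```python
def timeouts(commit_time, now, unit=3600, slack_percent=5):
    """Return (first, second) timeouts as absolute times in seconds.

    first:  when the displayed "N hours ago" would tick over to N+1.
    second: first plus slack_percent of the age at that tick.
    """
    age = now - commit_time
    next_tick_age = (age // unit + 1) * unit
    first = commit_time + next_tick_age
    second = first + next_tick_age * slack_percent // 100
    return first, second
```

For a commit that is 3 hours and 10 minutes old, the first timeout falls
at the 4 hour mark and the second 12 minutes after it, which matches the
example above.
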
It might be okay to allow even the prerequisites to be slightly stale when
serving old data; it's okay if it takes 30 seconds for the kernel.org
web page to notice that Linus pushed.  But on my office gitweb, I'm not
sure that it's okay to take 30 seconds to notice that *I* just pushed.
(I'm also not sure about consistency issues.  If I link from one page
that shows the new release to another, it would be a bit disconcerting
if it disappeared.)


The nasty problem with built-in caching is that you need a whole cache
reclaim infrastructure; it would be so much nicer to let Squid deal
with that whole mess.  But it can't deal with anything other than fully


