git.vger.kernel.org archive mirror
From: Rogan Dawes <discard@dawes.za.net>
To: Jakub Narebski <jnareb@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	Linus Torvalds <torvalds@osdl.org>,
	Kernel Org Admin <ftpadmin@kernel.org>,
	Git Mailing List <git@vger.kernel.org>,
	Petr Baudis <pasky@ucw.cz>
Subject: Re: kernel.org mirroring (Re: [GIT PULL] MMC update)
Date: Fri, 08 Dec 2006 16:31:56 +0200
Message-ID: <4579775C.2010608@dawes.za.net>
In-Reply-To: <200612081438.25493.jnareb@gmail.com>

Jakub Narebski wrote:
> On Friday, 8 December 2006 13:57, Rogan Dawes wrote:

>> How about extending gitweb to check to see if there already exists a 
>> cached version of these pages, before recreating them?
>>
>> e.g. structure the temp dir in such a way that each project has a place 
>> for cached pages. Then, before performing expensive operations, check to 
>> see if a file corresponding to the requested page already exists. If it 
>> does, simply return the contents of the file, otherwise go ahead and 
>> create the page dynamically, and return it to the user. Do not create 
>> cached pages in gitweb dynamically.
> 
> This would add the need for a directory for temporary files... well,
> it would be optional now...
> 
It would still be optional. If the "cache" directory structure exists, 
then use it, otherwise, continue as usual. All it would cost is a stat() 
or two, I guess.
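
A minimal sketch of that lookup, written in shell only to illustrate the control flow (gitweb itself is a Perl CGI). The names serve_page and generate_page are hypothetical, and generate_page merely stands in for gitweb's normal dynamic path:

```shell
# Hypothetical stand-in for gitweb's usual dynamic page generation.
# $1 = project, $2 = action (e.g. rss)
generate_page() {
    printf 'dynamic %s for %s\n' "$2" "$1"
}

# Serve a page from the cache if a cached copy exists, else generate it.
# $1 = cache root (e.g. $git_temp), $2 = project, $3 = action
serve_page() {
    cache_file="$1/$2/$3"
    if [ -f "$cache_file" ]; then
        # Cache hit: one file test, then stream the file unchanged.
        cat "$cache_file"
    else
        # No cached copy (or no cache directory at all): behave as usual.
        # Nothing is written to the cache here; only the hook populates it.
        generate_page "$2" "$3"
    fi
}
```

If the cache directory was never created, the test simply fails and every request takes the dynamic path, which is what keeps the mechanism optional.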

>> Then, in a post-update hook, for each of the expensive pages, invoke 
>> something like:
>>
>> # delete the cached copy of the file, to force gitweb to recreate it
>> rm -f "$git_temp/$project/rss"
>> # get gitweb to recreate the page appropriately
>> # use a tmp file to prevent gitweb from getting confused
>> # (the URL is quoted: an unquoted ';' would end the command)
>> wget -O "$git_temp/$project/rss.tmp" \
>>    "http://kernel.org/gitweb.cgi?p=$project;a=rss"
>> # move the tmp file into place
>> mv "$git_temp/$project/rss.tmp" "$git_temp/$project/rss"
> 
> Good idea... although there are some page views which shouldn't change
> at all... well, with the possible exception of changes in gitweb output,
> and even then there are some (blob_plain and snapshot views) which
> don't change at all.
> 
> It would be good to avoid removing them on push, and only remove
> them using some tmpwatch-like removal.

Well, my theory was that we would only cache pages that change when new 
data enters the repo. So, using the push as the trigger is almost 
guaranteed to be the right thing to do. New data indicates new rss 
items, indicates an updated shortlog page, etc.
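
As a rough sketch, the post-update hook could loop over the pages that a push invalidates. The function names refresh_cache and fetch_page are hypothetical; the URL is the one from the example quoted above, and the list of actions passed in is up to the administrator:

```shell
# Hypothetical fetcher: $1 = project, $2 = action; writes the page to stdout.
fetch_page() {
    wget -q -O - "http://kernel.org/gitweb.cgi?p=$1;a=$2"
}

# Refresh the cached copies of the given actions for one project.
# $1 = cache root, $2 = project, remaining arguments = actions
refresh_cache() {
    root=$1
    project=$2
    shift 2
    # The cache is optional: do nothing unless its directory already exists.
    [ -d "$root/$project" ] || return 0
    for action in "$@"; do
        target="$root/$project/$action"
        # Drop the stale copy first so gitweb regenerates the page
        # instead of replaying the old file.
        rm -f "$target"
        # Write under a temporary name and rename afterwards, so that
        # a reader never sees a partially written file.
        if fetch_page "$project" "$action" > "$target.tmp"; then
            mv "$target.tmp" "$target"
        else
            rm -f "$target.tmp"
        fi
    done
}
```

From the hook this might be invoked as: refresh_cache "$git_temp" "$project" rss shortlog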

NOTE: This caching could be problematic for the "changed 2 hours ago" 
notation for various branches/files, etc. But however we implement the 
caching, we'd have this problem.

>> This way, we get the exact output returned from the usual gitweb 
>> invocation, but we can now cache the result, and only update it when 
>> there is a new commit that would affect the page output.
>>
>> This would also not affect those who do not wish to use this mechanism. 
>> If the file does not exist, gitweb.cgi will simply revert to its usual 
>> behaviour.
> 
> Good idea. Perhaps I should add it to gitweb TODO file.
> 
> Hmmm... perhaps it is time for the next "[RFC] gitweb wishlist and TODO
> list" thread?
>  
>> Possible complications are the content-type headers, etc, but you could 
>> use the -s flag to wget, and store the server headers as well in the 
>> file, and get the necessary headers from the file as you stream it.
>>
>> i.e. read the headers looking for ones that are "interesting" 
>> (Content-Type, charset, expires) until you get a blank line, print out 
>> the interesting headers using $cgi->header(), then just dump the 
>> remainder of the file to the caller via stdout.
> 
> No need for that. $cgi->header() is to _generate_ the headers, so if
> a file is saved with headers, we can just dump it to STDOUT; the possible
> exception is a need to rewrite 'expires' header, if it is used.

Good point. I guess one thing that will be incorrect in the headers is 
the server date, but I doubt that anyone cares much. As you say, though, 
this might relate to the expiry of cached content in upstream caches.
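
For illustration, here is one way the replay could look if the stale date does turn out to matter. This is a sketch under assumptions: the cached file was saved together with the server headers, so it starts with an HTTP status line and uses CRLF line endings in the header block; and the CGI specification (RFC 3875) expects a non-NPH script to emit a "Status:" field rather than a raw status line. The function name replay_cached is hypothetical, and the approach only suits text pages:

```shell
# Replay a cached response that was saved with its headers.
# $1 = cached file. Headers are passed through, except that the status
# line becomes a CGI "Status:" field and the stale Date header is dropped.
replay_cached() {
    awk '
        BEGIN { in_headers = 1 }
        in_headers {
            line = $0
            sub(/\r$/, "", line)          # strip the CR of a CRLF ending
            if (line == "") {             # blank line ends the headers
                in_headers = 0
                print ""
                next
            }
            if (NR == 1 && line ~ /^HTTP\//) {
                # "HTTP/1.1 200 OK" becomes "Status: 200 OK"
                sub(/^HTTP\/[0-9.]+ /, "Status: ", line)
                print line
                next
            }
            if (tolower(line) ~ /^date:/) next   # let the server set Date
            print line
            next
        }
        { print }                         # body lines pass through as-is
    ' "$1"
}
```

Everything after the blank line is copied unchanged, so an 'expires' header would be the only other field that might still need rewriting.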

> 
> Perhaps gitweb should generate its own ETag instead of messing with
> 'expires' header?

Well, we can possibly eliminate the expires header entirely for dynamic 
pages, and check the If-Modified-Since value against the timestamp of 
the cached file, or the server date in the cached file, and return "304 
Not Modified" responses. That would also help to reduce the load on the 
server, by only returning the headers, and not the entire response.

The downside is that it would prevent upstream proxies from caching this 
data for us.
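
A small sketch of the timestamp comparison, again in shell purely for illustration. It assumes GNU date (-d) and GNU stat (-c %Y), and the function name conditional_status is hypothetical. In a CGI the header value would normally arrive in the HTTP_IF_MODIFIED_SINCE environment variable:

```shell
# Decide between "304 Not Modified" and a full response.
# $1 = cached file, $2 = If-Modified-Since value (may be empty)
# Prints 304 if the cached file is no newer than the client's copy,
# 200 otherwise. Assumes GNU date and GNU stat.
conditional_status() {
    if [ -n "$2" ] && [ -f "$1" ]; then
        # Convert the HTTP date to seconds since the epoch; an
        # unparseable date leaves $since empty and falls through to 200.
        since=$(date -u -d "$2" +%s 2>/dev/null) || since=
        mtime=$(stat -c %Y "$1")
        if [ -n "$since" ] && [ "$mtime" -le "$since" ]; then
            echo 304
            return
        fi
    fi
    echo 200
}
```

Because the hook rewrites the cached file on every push, its modification time doubles as the "last changed" marker for the page.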

Regards,

