git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: Jeff Garzik <jeff@garzik.org>
Cc: Martin Langhoff <martin.langhoff@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Linus Torvalds <torvalds@osdl.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Rogan Dawes <discard@dawes.za.net>,
	Kernel Org Admin <ftpadmin@kernel.org>
Subject: Re: kernel.org mirroring (Re: [GIT PULL] MMC update)
Date: Sun, 10 Dec 2006 23:59:19 +0100
Message-ID: <200612102359.20083.jnareb@gmail.com>
In-Reply-To: <457C84AC.7060105@garzik.org>

Jeff Garzik wrote:
> Jakub Narebski wrote:
>>
>> And in CGI standard there is a way to access additional HTTP headers
>> info from CGI script: the environment variables are named HTTP_<HEADER>,
>> for example if the browser sent an If-Modified-Since: header, its value
>> can be found in the HTTP_IF_MODIFIED_SINCE environment variable.
> 
> The CGI spec does not at all guarantee that the CGI environment will 
> contain all the HTTP headers sent by the client.  That was the point of 
> the environment dump script -- you can see exactly which headers are, 
> and are not, passed through to CGI.
> 
> CGI only /guarantees/ a bare minimum (things like QUERY_STRING, 
> PATH_INFO, etc.)
> 
> Even basic server info environment variables are optional.

I have checked that Apache 2.0.54, at least, passes HTTP_IF_MODIFIED_SINCE
to the CGI script when the request carries an If-Modified-Since: header
(tested with my own script + netcat/nc).
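
A minimal sketch of the naming convention discussed above, written in
Python purely for illustration (gitweb itself is Perl). The helper names
cgi_header_var and request_header are hypothetical. Note that the lookup
returns None when the server did not pass the header through, which is
exactly the case Jeff is worried about:

```python
import os


def cgi_header_var(header_name):
    """Map an HTTP header name to the CGI variable name it would use."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def request_header(header_name, environ=None):
    """Return the header value if the server passed it through, else None."""
    if environ is None:
        environ = os.environ
    return environ.get(cgi_header_var(header_name))
```

For example, request_header("If-Modified-Since") looks up
HTTP_IF_MODIFIED_SINCE in the script's environment.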
 
>> It is ETag, not E-tag. Besides, I don't see what the attached script is
>> meant to do: it does not output the sample file anyway.
> 
> It's not meant to output the sample file.  It outputs the server 
> metadata sent to the CGI script (the environment variables).  The sample 
> file was simply a way to play around with etag and last-modified metadata.

Ah. 
 
>> The idea is of course to stop processing in CGI script / mod_perl script
>> as soon as possible if cache validates.
> 
> Certainly.  That should help cut down on I/O.  FWIW though the projects 
> list is particularly painful, with its File::Find call, which you'll 
> need to do in order to return 304-not-modified.

First, it is better to set $projects_list to a projects index file,
which has the format:
  <project path> SPC <project owner>
where <project path> is relative to $projectroot and is URI encoded (at
least SPC has to be percent-encoded), and <project owner> is the owner
of the given project, also URI encoded (one would usually use '+' in
place of SPC here).
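
To make the index format concrete, here is a rough parser sketch in
Python (illustration only, not gitweb code). The function name
parse_projects_index is hypothetical, and treating '+' as a space only in
the owner field, not in the path, is my assumption based on the
description above:

```python
from urllib.parse import unquote, unquote_plus


def parse_projects_index(text):
    """Parse '<project path> SPC <project owner>' lines into (path, owner) pairs."""
    projects = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The first literal space separates the two fields; any space
        # inside a field must already be encoded.
        path, _, owner = line.partition(" ")
        # Assumption: '+' stands for SPC only in the owner field, so a
        # path such as "gtk+.git" keeps its plus sign.
        projects.append((unquote(path), unquote_plus(owner)))
    return projects
```

Given a (made-up) line "my%20repo.git Some+One", this yields the pair
("my repo.git", "Some One").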

Gitweb can now generate the projects list in the above format, by using
the "project_index" action ("a=project_index" query string), or by clicking
the 'TXT' link at the bottom of the projects list page in new gitweb: see
http://repo.or.cz by Petr Baudis. The problem is that it generates the
projects list from the list of projects it sees, so to generate the index
from scratch from the filesystem, $projects_list has to be a directory
when "project_index" runs (changing it to something that evaluates to
false, e.g. undef or "", makes gitweb use $projectroot for
$projects_list). I have posted how to do this.

The project list changes rarely, only on addition or removal of a project
and on a change of project owner, so it can be generated on demand.


Second, even with $projects_list set to a projects index file, gitweb
as of now runs git-for-each-ref (which scans refs and accesses the
pack file for the commit date), and checks for the description file and
reads it; when $projects_list is a directory it also checks the project
directory owner. I plan to make it configurable whether last activity is
read from all heads (all branches) as it is now, from HEAD (current
branch) as it was before, or from a given branch (for example 'master').

Assuming that gitweb is configured to read last activity from a single
defined branch, generating ETag = checksum(sha1 of heads of projects)
requires reading at least one file from each project.
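
A sketch of what such an ETag computation could look like, in Python for
illustration only. Both function names are hypothetical. It assumes bare
repositories under the project root whose branch head is stored as a loose
ref file (a packed ref would need different handling), and SHA-1 as the
checksum is just one possible choice:

```python
import hashlib
import os


def read_branch_head(projectroot, project, branch="master"):
    """Read the sha1 of one branch head: one small file per project.

    Assumes a bare repository with a loose (unpacked) ref file.
    """
    ref_path = os.path.join(projectroot, project, "refs", "heads", branch)
    with open(ref_path) as ref_file:
        return ref_file.read().strip()


def projects_etag(head_ids):
    """Checksum a mapping of project path -> head sha1 into a quoted ETag."""
    digest = hashlib.sha1()
    # Sort so the result does not depend on the order projects were listed.
    for project in sorted(head_ids):
        digest.update(project.encode("utf-8"))
        digest.update(b"\0")
        digest.update(head_ids[project].encode("ascii"))
        digest.update(b"\n")
    return '"' + digest.hexdigest() + '"'
```

The ETag changes whenever any project's chosen branch head moves, or when
a project is added or removed, and stays the same otherwise.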
 
>> I don't know if Apache intercepts and remembers ETag and Last-Modified
>> headers, adds 304 Not Modified HTTP response on finding that cache validates
>> and cuts out CGI script output. I.e. if browser provided If-Modified-Since:,
>> script wrote Last-Modified: header, If-Modified-Since: is no earlier than
>> Last-Modified: (usually is equal in the case of cache validation), then
>> Apache provides 304 Not Modified response instead of CGI script output.
> 
> This wanders into the realm of mod_cache configuration, I think.  (which 
> I have tried to get working as reverse proxy, and failed several times) 
>   If you are not using mod_*_cache, then Apache must execute the CGI 
> script every time AFAICS, regardless of etag/[if-]last-mod headers.

No, it wanders into the realm of header parsing by Apache, and the NPH
(Non-Parsed Headers) option.

Even if Apache does execute the CGI script to completion every time, it
might send an HTTP 304 Not Modified reply instead of the script's output.
Might. I don't know if it does.
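
Independently of what Apache does, the script can validate the cache
itself and stop early. Below is a rough Python sketch of that decision
(illustration only, not gitweb code). The function name
cgi_response_headers is hypothetical, last_modified is assumed to be a
timezone-aware UTC datetime, and the ETag comparison is a plain string
match:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def cgi_response_headers(environ, last_modified, etag):
    """Return CGI header lines, with a 304 status if the client cache validates.

    last_modified must be an aware datetime in UTC (assumption of this sketch).
    """
    if_none_match = environ.get("HTTP_IF_NONE_MATCH")
    if_modified_since = environ.get("HTTP_IF_MODIFIED_SINCE")
    validated = False
    if if_none_match is not None:
        # If-None-Match takes precedence when the client sent it.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        validated = etag in candidates or "*" in candidates
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
            # HTTP dates have one-second resolution.
            validated = last_modified.replace(microsecond=0) <= since
        except (TypeError, ValueError):
            validated = False  # unparsable date: send the full response
    headers = [
        "ETag: " + etag,
        "Last-Modified: " + format_datetime(last_modified, usegmt=True),
    ]
    status = "Status: 304 Not Modified" if validated else "Status: 200 OK"
    return [status] + headers
```

When the first line is the 304 status, the script would print the headers
and a blank line and exit without generating the page body.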

-- 
Jakub Narebski
