* Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-27 18:03 UTC
  To: git; +Cc: John Hawley, Junio C Hamano, Jakub Narebski

Hi everyone,

I just wanted to let everyone know that I'm currently getting started on 
a Google Summer of Code project to improve the caching mechanism in gitweb.

Sorry for not posting about this earlier...  Anyways, some key data: 
John 'warthog9' Hawley (who wrote the current caching system for 
kernel.org) is my mentor, and GSoC is from May 26 to Aug 18, minus a 
vacation from Jul 19 to Aug 9.

While I'm planning to keep much of it on the list, if anyone else is 
particularly interested in helping or providing input, please notify me. 
  (Looking at the logs, Jakub maybe?  Cc'ing him just in case.)

The current plan is basically to get the gitweb caching fork that's been 
implemented for kernel.org back to the gitweb mainline, and then 
optimize it (probably move to memcached).  I'm not yet sure how to 
approach this (e.g. whether to merge from the fork to the mainline or 
vice versa), but I'll probably figure this out together with John and 
might post separately about that later.  In any case, expect patches and 
messages from me on the list. :)

I'm lea_w (or lea_1) on #git on Freenode, if anyone wants to contact me 
in real time (provided my Pidgin doesn't hiccup).

Best,

     Lea


* Re: Gitweb caching: Google Summer of Code project
From: Jakub Narebski @ 2008-05-27 21:53 UTC
  To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

On Tue, 27 May 2008, Lea Wiemann wrote:

> I just wanted to let everyone know that I'm currently getting started on 
> a Google Summer of Code project to improve the caching mechanism in gitweb.
> 
> Sorry for not posting about this earlier...  Anyways, some key data: 
> John 'warthog9' Hawley (who wrote the current caching system for 
> kernel.org) is my mentor, and GSoC is from May 26 to Aug 18, minus a 
> vacation from Jul 19 to Aug 9.

Thanks for the info.

> While I'm planning to keep much of it on the list, if anyone else is 
> particularly interested in helping or providing input, please notify me. 
>   (Looking at the logs, Jakub maybe?  Cc'ing him just in case.)

I'm certainly interested, at least from a theoretical point of view, and
I think I can help (as one of the main gitweb contributors).

I guess that Petr Baudis would also be interested, because he maintains
repo.or.cz, a public Git hosting site.  Lately he posted a patch
implementing projects list caching, in a bit different way from how it
is done on kernel.org, namely by caching data and not final output:
  http://thread.gmane.org/gmane.comp.version-control.git/77151
AFAIK it is implemented in repo.or.cz gitweb: 
  http://repo.or.cz/w/git/repo.git

This indirectly led to a bit of research on caching in Perl by yours
truly:
  http://thread.gmane.org/gmane.comp.version-control.git/77529
(mentioned in http://git.or.cz/gitwiki/SoC2008Projects#gitweb-caching).


I think that you can also get some help on caching from Lars Hjemli,
author of cgit, which is a caching git web interface written in C.


(I have added both Petr Baudis and Lars Hjemli to Cc:)

> The current plan is basically to get the gitweb caching fork that's been 
> implemented for kernel.org back to the gitweb mainline, and then 
> optimize it (probably move to memcached).  I'm not yet sure how to 
> approach this (e.g. whether to merge from the fork to the mainline or 
> vice versa), but I'll probably figure this out together with John and 
> might post separately about that later.  In any case, expect patches and 
> messages from me on the list. :)

If I remember the discussion surrounding the implementation of caching
for kernel.org's gitweb correctly, the main culprit of having it remain
separate from mainline was splitting gitweb into many, many files.
While it helped John in understanding gitweb, it made it
difficult to merge changes back to mainline.

Note also that it is easier to make site-specific changes than to
make a generic, cross-platform and cross-operating-system change.

-- 
Jakub Narebski
Poland


* Re: Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-27 22:54 UTC
  To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

Jakub Narebski wrote:
> Lately he posted a patch
> implementing projects list caching, in a bit different way from how it
> is done on kernel.org, namely by caching data and not final output:

Thanks for this and all the other pointers.

Caching data and not final output is actually what I'm about to try 
next.  If I'm not mistaken, the HTML output is significantly larger than 
the source (repository) data; however, kernel.org still seems to benefit 
from caching the HTML, rather than letting Linux's page cache cache the 
source data.  That leads me to think that the page cache somehow fails 
to cache the source data properly -- I'm not sure why (wild speculation: 
perhaps because of the pack format).  Anyway, I'd hope that I can 
encapsulate the 30-40 git_cmd calls in gitweb.perl and cache their 
results (or, to save memory, only the parts of their results that are 
actually used) using memcached.  If that works well, we can stop 
worrying about frontend (HTML) caching, unless CPU becomes an issue, 
since all HTML pages are generated from cacheable source data.

I'm *kind of* hoping that in the end there will be only a few issues with 
cache expiry, since most calls are uniquely identified through hashes. 
(And the ones that are not, like getting the hash of the most recent 
commit, can perhaps be cached with some fairly low expiry time.)
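
Roughly, I'm imagining a wrapper along these lines (just a sketch: the
function name and key scheme are made up, and git_cmd() is gitweb's
existing helper that returns the git command prefix):

use Cache::Memcached;
use Digest::MD5 qw(md5_hex);

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

# Cache the output of one git command, keyed by the full command line
# (hashed, since memcached keys must be short and space-free).
sub cached_git_output {
    my ($expiry, @cmd) = @_;
    my $key = 'gitweb:' . md5_hex(join("\0", @cmd));

    my $cached = $memd->get($key);
    return $$cached if defined $cached;

    open my $fd, '-|', @cmd or die "cannot run @cmd";
    my $output = do { local $/; <$fd> };
    close $fd;

    # Output keyed by a full commit hash never changes, so it could get a
    # very long $expiry; ref lookups etc. would get a short one.
    $memd->set($key, \$output, $expiry);
    return $output;
}

# e.g.: my $log = cached_git_output(60, git_cmd(), 'rev-list', $hash);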

So that's what I'll try next.  If you have any comments or warnings off 
the top of your heads, feel free to send email of course. :)

> the main culprit of [the fork] was splitting gitweb into many, many
> files.  While it helped John in understanding gitweb, it made it
> difficult to merge changes back to mainline.

Interesting point, thanks for letting me know.  (I might have gone ahead 
and tried to split the mainline gitweb myself... ^^)  I think it would 
be nice if gitweb.perl could be split at some point, but I assume there 
are too many patches out there for that to be worth the merge problems, 
right?

-- Lea


* Re: Gitweb caching: Google Summer of Code project
From: Jakub Narebski @ 2008-05-28 12:14 UTC
  To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

On Wed, 28 May 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
>>
>> Lately he posted a patch
>> implementing projects list caching, in a bit different way from how it
>> is done on kernel.org, namely by caching data and not final output:
> 
> Thanks for this and all the other pointers.
> 
> Caching data and not final output is actually what I'm about to try 
> next.

Caching data has its advantages and disadvantages, just like caching
HTML output (or parts of the HTML output).  I have written about
it in
  http://thread.gmane.org/gmane.comp.version-control.git/77529

Let me summarize here the advantages and disadvantages of caching data
and of caching HTML output.

1. Caching data
 * advantages:
   - smaller than caching HTML output
   - you can use the same data to generate different pages
     ('summary', 'shortlog', 'log', 'rss'/'atom'; pages or search
      results of projects list)
   - you can generate pages with variable data, such as relative dates
     ("5 minutes ago"), staleness info ("cached data, 5 minutes old"),
     or content type: text/html vs application/xhtml+xml
 * disadvantages:
   - more CPU
   - need to serialize and deserialize (parse) data
   - more complicated

2. Caching HTML output
 * advantages:
   - simple, no need for serialization (pay attention to that in mixed
     data + output caching solutions)
   - low CPU (although supposedly[1] gitweb performance is I/O bound,
     and not CPU bound)
   - web servers deal very well with static pages
   - web servers handle HTTP caching for us (giving ETag and
     Last-Modified headers, responding to If-Modified-Since,
     If-None-Match, etc. headers from web browsers and caching proxies)
 * disadvantages:
   - large size of cached data (if most clients support compression, you
     can store the pages compressed, at the cost of CPU for clients that
     don't)
   - variable output is difficult to impossible (for example, you can
     still rewrite some HTTP headers for text/html vs
     application/xhtml+xml or store the headers separately, and you can
     use JavaScript to turn absolute dates into relative ones)

I'm sure John, Lars and Petr can tell you more, and have more experience.

[1] Some evidence both from warthog9 and pasky, but no hard data[2]
[2] I think it would be good to start with an analysis of gitweb
    statistics, e.g. from Apache logs, from kernel.org and repo.or.cz.

> If I'm not mistaken, the HTML output is significantly larger than  
> the source (repository) data; however, kernel.org still seems to benefit 
> from caching the HTML, rather than letting Linux's page cache cache the 
> source data.

I don't think kernel.org caches _all_ pages, only the most requested
(correct me if I'm wrong here, John, please).

> That leads me to think that the page cache somehow fails  
> to cache the source data properly -- I'm not sure why (wild speculation: 
> perhaps because of the pack format).

From what I remember, one of the most costly pages to generate is the
projects list page (that is why Petr Baudis implemented caching for this
page in repo.or.cz's gitweb, using data caching there).  With 1000+
projects (repositories), gitweb has to hit at least 1000+ packfiles, not
to mention refs, to generate the "Last Change" column from
git-for-each-ref output (and, incidentally, also to check whether each
directory is truly a git repository).  In the kernel.org case, with
gitweb working similarly to the mod_userdir module but for git
repositories (as a service, rather than as part of repository hosting),
gitweb has to hit 1000+ 'summary' files...  All of that is interspersed
with other requests.

How can the page cache and filesystem buffers deal with that?
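
(For reference, the per-project "last change" lookup is roughly the
following; this is a sketch from memory, not the exact gitweb code.)

# One git process and at least one ref/packfile read *per project*; with
# 1000+ repositories this is where the projects page spends its time.
sub last_activity {
    my ($git_dir) = @_;
    open my $fd, '-|', 'git', "--git-dir=$git_dir", 'for-each-ref',
        '--format=%(committerdate)', '--sort=-committerdate',
        '--count=1', 'refs/heads'
        or return undef;
    my $most_recent = <$fd>;
    close $fd;
    chomp $most_recent if defined $most_recent;
    return $most_recent;   # date of the newest commit, or undef
}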

BTW I'm not sure if kernel.org uses CGI or the "legacy" mod_perl gitweb;
currently there is no support for FastCGI in gitweb (although you can
find some patches in the archive).

(But I'm not an expert in those matters, so please take the above
with a pinch of salt, or two).


By the way, using pack files not only reduces repository size, it also
improves git performance thanks to better I/O patterns and better use of
the filesystem cache (some say that git is optimized for the warm-cache
case).

> Anyway, I'd hope that I can
> encapsulate the 30-40 git_cmd calls in gitweb.perl and cache their
> results (or, to save memory, only the parts of their results that are
> actually used) using memcached.  If that works well, we can stop
> worrying about frontend (HTML) caching, unless CPU becomes an issue,
> since all HTML pages are generated from cacheable source data.

I don't think caching _everything_, including rarely requested pages,
would be a good idea.

> I'm *kind of* hoping that in the end there will be only a few issues
> with cache expiry, since most calls are uniquely identified through
> hashes.  (And the ones that are not, like getting the hash of the most
> recent commit, can perhaps be cached with some fairly low expiry time.)

The trouble is with those requests which are _not_ uniquely identified
by the hashes requested, such as 'summary' or 'log' for a given branch
(not for a given hash), or the web feed for a given branch.  For the
non-changing requests you can just (as gitweb does even now) set a large
HTTP expiry (Expires or max-age) and let web browsers or proxies cache
them.
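
(Gitweb already does something like the following for pages addressed by
a full sha1; a sketch, with '+1d' as an example policy and $cgi / $hash
being gitweb's existing globals:)

my %header = (-type => 'text/html', -charset => 'utf-8');
# A page generated from a fixed commit hash never changes, so browsers
# and proxies may keep it; branch-addressed pages get no Expires at all.
$header{-expires} = '+1d' if defined $hash && $hash =~ /^[0-9a-fA-F]{40}$/;
print $cgi->header(%header);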

> So that's what I'll try next.  If you have any comments or warnings off 
> the top of your heads, feel free to send email of course. :)

I'm afraid that implementing kernel.org caching in mainline in
a generic way would be enough work for a whole GSoC 2008.  I hope
I am mistaken and you will have time to analyse and implement a wider
range of caching solutions in gitweb...
 
>> the main culprit of [the fork] was splitting gitweb into many, many
>> files.  While it helped John in understanding gitweb, it made it
>> difficult to merge changes back to mainline.
> 
> Interesting point, thanks for letting me know.  (I might have gone ahead 
> and tried to split the mainline gitweb myself... ^^)  I think it would 
> be nice if gitweb.perl could be split at some point, but I assume there 
> are too many patches out there for that to be worth the merge problems, 
> right?

On one hand, gitweb.perl being a single file makes it easy to install; on
the other hand, if it were split into modules (like git-gui now is) it
would, I think, be easier to understand and modify...  I think, however,
that it would be better to first make gitweb use Git.pm, improving Git.pm
where necessary (for example adding the eager config parsing used in
gitweb, i.e. read the whole config into a Perl hash at the first request,
then access the hash instead of making further calls to git-config).
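
(By eager config parsing I mean something like the following sketch;
load_config() and the %config hash are just for illustration.)

our %config;

# Read the whole repository config in one go; with -z each entry is
# "key\nvalue" terminated by a NUL byte.
sub load_config {
    my ($git_dir) = @_;
    %config = ();
    open my $fd, '-|', 'git', "--git-dir=$git_dir", 'config', '-z', '--list'
        or return;
    local $/ = "\0";
    while (my $entry = <$fd>) {
        chomp $entry;                        # strip the trailing NUL
        my ($key, $value) = split /\n/, $entry, 2;
        $config{$key} = $value;
    }
    close $fd;
}

# later: $config{'gitweb.description'} instead of another git-config call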

-- 
Jakub Narebski
Poland


* Re: Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-28 18:33 UTC
  To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

Jakub Narebski wrote:
> 1. Caching data
>  * disadvantages:
>    - more CPU
>    - need to serialize and deserialize (parse) data
>    - more complicated

CPU: John told me that so far CPU has *never* been an issue on k.org. 
Unless someone tells me they've had CPU problems, I'll assume that CPU 
is a non-issue until I actually run into it (and then I can optimize the 
particular pieces where CPU is actually an issue).

Serialization: I was planning to use Storable (memcached's Perl API uses 
it transparently I think).  I'm hoping that this'll just solve it.
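
(As far as I understand, for anything that isn't a plain string this
boils down to a freeze/thaw round trip like the following; the field
names are invented:)

use Storable qw(nfreeze thaw);

# Parsed commit info as gitweb might keep it in memory.
my %commit = (
    id      => 'deadbeef' x 5,
    author  => 'A U Thor <author@example.com>',
    parents => [ 'cafebabe' x 5 ],
);

my $frozen = nfreeze(\%commit);   # byte string, safe to store anywhere
my $copy   = thaw($frozen);       # back to a hashref
print $copy->{author}, "\n";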

It's true that it's more complicated.  It'll require quite a bit of 
refactoring, and maybe I'll just back off if I find that it's too hard.

> I'm afraid that implementing kernel.org caching in mainline in
> a generic way would be enough work for a whole GSoC 2008.

I probably won't reimplement the current caching mechanism.  Do you 
think that a solution using memcached is generic enough?  I'll still 
need to add some abstraction layer in the code, but when I'm finished 
the user will either get the normal uncached gitweb, or activate 
memcached caching with some configuration setting.
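
(In gitweb_config.perl that could look something like the following;
these are hypothetical knobs, not existing settings:)

# gitweb_config.perl -- possible caching configuration
our $cache_backend     = 'memcached';          # or 'none'
our @memcached_servers = ('127.0.0.1:11211');
our $cache_expiry      = 60;                   # seconds, for non-hash keys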

By the way, I'll be posting about gitweb on this mailing list 
occasionally.  If any of you would like to receive CCs on such 
messages, please let me know; otherwise I'll assume you get them through 
the mailing list.

-- Lea


* Re: Gitweb caching: Google Summer of Code project
From: Jakub Narebski @ 2008-05-29 23:27 UTC
  To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

On Wed, 28 May 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
> >
> > 1. Caching data
> >  * disadvantages:
> >    - more CPU
> >    - need to serialize and deserialize (parse) data
> >    - more complicated
> 
> CPU: John told me that so far CPU has *never* been an issue on k.org. 
> Unless someone tells me they've had CPU problems, I'll assume that CPU 
> is a non-issue until I actually run into it (and then I can optimize the 
> particular pieces where CPU is actually an issue).

True.

What you do have to take care about (although I don't think it would be
particularly difficult) is not to repeat bad I/O patterns with the
cache...

> Serialization: I was planning to use Storable (memcached's Perl API uses 
> it transparently I think).  I'm hoping that this'll just solve it.

While Storable is part of, I think, any modern Perl installation, there
might be a problem with the memcached API, and with memcached API
wrappers such as the CHI one.  Namely, you cannot assume that memcached API
is installed, so you have to provide some kind of fallback.
 
> It's true that it's more complicated.  It'll require quite a bit of 
> refactoring, and maybe I'll just back off if I find that it's too hard.

What's more, if you want to implement If-Modified-Since and
If-None-Match, you would have to implement them yourself, while
for static pages (caching HTML output) the web server would do this
for us "for free".

> > I'm afraid that implementing kernel.org caching in mainline in
> > a generic way would be enough work for a whole GSoC 2008.
> 
> I probably won't reimplement the current caching mechanism.  Do you 
> think that a solution using memcached is generic enough?  I'll still 
> need to add some abstraction layer in the code, but when I'm finished 
> the user will either get the normal uncached gitweb, or activate 
> memcached caching with some configuration setting.

That's good enough, although I think that the current caching mechanism
in kernel.org's gitweb (your approach follows more closely what
repo.or.cz's gitweb does) has some good ideas, for example adaptive
(load-dependent) expiry times.

By the way what do you think about adding (as an option) information
about gitweb performance to the output, in the form of
  "Site generated in 0.01 seconds, 2 calls to git commands"
or
  "Site generated in 0.0023 seconds, cached output, 1m31s old"
line somewhere in the page footer?

I hope you can get some insight into gitweb access statistics from
kernel.org, repo.or.cz, and perhaps other large git hosting sites (e.g.
freedesktop.org), and that you plan on benchmarking gitweb caching using
average / amortized page generation time, ApacheBench or an equivalent,
server load average as a function of the number of requests, I/O load
(using the fio tool, for example) as a function of the number of
requests, etc.

> By the way, I'll be posting about gitweb on this mailing list 
> occasionally.  If any of you would like to receive CCs on such 
> messages, please let me know; otherwise I'll assume you get them through 
> the mailing list.

I read the git mailing list via the Usenet / news interface (NNTP
gateway) from GMane.

-- 
Jakub Narebski
Poland


* Re: Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-30  7:24 UTC
  To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

Jakub Narebski wrote:
> you cannot assume that memcached API is installed, so
> you have to provide some kind of fallback.

That fallback would be to have no caching. :)  I think that's acceptable 
-- I'm not too willing to implement caching for two APIs. 
(Incidentally, memcached takes two shell commands to install and get 
running on my machine; I think that's acceptably easy.)

> What's more, if you want to implement If-Modified-Since and
> If-None-Match, you would have to implement them yourself, while
> for static pages (caching HTML output) the web server would do this
> for us "for free".

Are web servers doing anything that we can't easily reimplement in a few 
lines (and, on top of that, more easily tailored to different actions, 
projects, etc.)?

> By the way what do you think about adding (as an option) information
> about gitweb performance to the [HTML] output,

Definitely a good idea!

> I hope you can get some insight into gitweb access statistics from kernel.org,

I'm waiting for John to give me SSH access and/or send them my way. :)

> and that you plan on benchmarking gitweb caching using [snip]

Absolutely -- thanks for the suggestions!

-- Lea


* Re: Gitweb caching: Google Summer of Code project
From: Jakub Narebski @ 2008-05-30 10:02 UTC
  To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

On Fri, 30 May 2008, Lea Wiemann wrote:
> Jakub Narebski wrote:
> >
> > you cannot assume that memcached API is installed, so
> > you have to provide some kind of fallback.
> 
> That fallback would be to have no caching.  I think that's acceptable 
> -- I'm not too willing to implement caching for two APIs.

I hope that you will make a wrapper around the memcached (caching
engine) API (it's a pity we cannot use the CHI unified Perl caching
interface), so that it would be easy, for example, to switch to a
filesystem-based cache, a size-aware filesystem-based cache, mmap,
etc...  I mean that even if you don't implement two caching APIs, at
least make it possible to easily change the caching backend.

Note also that memcached may not make sense for a single machine
(a single-server installation), and does not make sense for
memory-starved machines... and one may want gitweb caching even in
those situations.
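
(I mean a thin wrapper along these lines; just a sketch, and
cache_get/cache_set are invented names:)

my $memd;
if (eval { require Cache::Memcached; 1 }) {
    $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });
}

# All of gitweb would go through these two calls, so swapping in a
# filesystem (or CHI) backend later only touches this one spot.
sub cache_get {
    my ($key) = @_;
    return $memd ? $memd->get($key) : undef;      # no backend: cache miss
}

sub cache_set {
    my ($key, $value, $expiry) = @_;
    $memd->set($key, $value, $expiry) if $memd;   # no backend: no caching
}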

> (Incidentally, memcached takes two shell commands to install and get 
> running on my machine; I think that's acceptably easy.)

As John 'Warthog9' said wrt. using additional Perl modules for gitweb
caching, most sites that are used as web servers (and gitweb servers)
have strict requirements on the stability of installed programs,
libraries and modules.  IIRC the policy usually is that one can install
packages from the main (base) repository of the Linux distribution used
on the server, also from the extras repository, and sometimes from a
trusted contrib package repository.  Modules which are only on CPAN, and
programs which require compilation, are unfortunately out of the
question.

I think there is no problem wrt. memcached itself; I'm not so sure
about the Perl APIs: Cache::Memcached and/or Cache::Memcached::Fast (and
optionally the appropriate CHI modules/backends).
 
> > What's more, if you want to implement If-Modified-Since and
> > If-None-Match, you would have to implement them yourself, while
> > for static pages (caching HTML output) the web server would do this
> > for us "for free".
> 
> Are web servers doing anything that we can't easily reimplement in a few 
> lines (and, on top of that, more easily tailored to different actions, 
> projects, etc.)?

Can we reimplement it?  I think we can.  Easily?  I'm not sure.
HTTP/1.0 If-Modified-Since should be fairly easy; it would be harder to
fully and correctly support ETag (weak vs. strong tags), If-None-Match
(from web browsers, I think) and If-Match (from web caches); that would
take some work.
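
(For the easy part, a sketch of If-Modified-Since handling; HTTP::Date
comes from libwww-perl, which would be an extra dependency, $cgi is
gitweb's existing CGI object, and $commit_time is assumed to hold the
timestamp the page is generated from:)

use HTTP::Date qw(str2time time2str);

my $since      = $ENV{HTTP_IF_MODIFIED_SINCE};
my $since_time = defined $since ? str2time($since) : undef;
if (defined $since_time && $commit_time <= $since_time) {
    # Nothing changed since the client's copy: short-circuit the request.
    print $cgi->header(-status => '304 Not Modified');
    exit;
}
print $cgi->header(-type => 'text/html', -charset => 'utf-8',
                   -last_modified => time2str($commit_time));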

> > By the way what do you think about adding (as an option) information
> > about gitweb performance to the [HTML] output,
> 
> Definitely a good idea!

I'd try to add it when I have a bit more free time, unless you
do it first.

-- 
Jakub Narebski
Poland


* Re: Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-30 14:59 UTC
  To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli

Jakub Narebski wrote:
> even if you don't implement two caching API's at least make it
> possible to easy change caching backend.

Sure, I'll keep that in mind.

> Note also that memcached may not have sense for single machine [...],
>  and does not make sense for memory starved machines...

For single machines, memcached certainly works fine.  For 
memory-starved machines with hard-disk caches, you'd have to cache the 
aggregate HTML output, not the data in the backend.  So as long as I'm 
working on the backend (repository) cache, memcached should be fine.

> IIRC the policy usually is that one can install packages
> from the main (base) repository of the Linux distribution used on the server,

libcache-memcached-perl is in Debian stable; that's fair enough I think. 
  Cache::Memcached::Fast doesn't seem to be in Debian as of now, but I 
wouldn't worry about performance unless it comes up.

>>> By the way what do you think about adding (as an option) information
>>> about gitweb performance to the [HTML] output,
> 
> I'd try to add it when I have a bit more free time

I'd probably hold off on this until I've written the Perl Git API.

-- Lea


* Re: Gitweb caching: Google Summer of Code project
From: Petr Baudis @ 2008-05-30 15:07 UTC
  To: Lea Wiemann; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli

On Fri, May 30, 2008 at 04:59:56PM +0200, Lea Wiemann wrote:
> Jakub Narebski wrote:
>> IIRC the policy usually is that one can install packages
>> from the main (base) repository of the Linux distribution used on the server,
>
> libcache-memcached-perl is in Debian stable; that's fair enough I think.  
> Cache::Memcached::Fast doesn't seem to be in Debian as of now, but I 
> wouldn't worry about performance unless it comes up.

Still, please make this optional. It is fine for gitweb not to do any
caching in the bare setup, but you should be able to get the simple
version running without any external dependencies.

>>>> By the way what do you think about adding (as an option) information
>>>> about gitweb performance to the [HTML] output,
>> I'd try to add it when I have a bit more free time
>
> I'd probably hold off on this until I've written the Perl Git API.

Hmm, it shouldn't depend on that in any way, should it?

use Time::HiRes qw(gettimeofday tv_interval);
my $t0 = [gettimeofday];
...
print "<p>This page took ".tv_interval($t0, [gettimeofday])."s to generate.</p>";

I wonder: what is the oldest Perl version we aim to support? If <5.8, we
need to be more careful about Time::HiRes. It would be useful to
document the requirement with a 'use VERSION' statement at the top of
the script.
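
E.g. (spelled as a plain number so that even ancient perls can parse it):

use 5.008;   # refuse to run on anything older than Perl 5.8.0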

-- 
				Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.	-- J. W. von Goethe


* Re: Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-30 15:27 UTC
  To: Petr Baudis; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli

Petr Baudis wrote:
> please make [memcached] optional.

Oh, absolutely.  I was definitely planning to keep Gitweb runnable 
without having Cache::Memcached installed.

> print "<p>This page took ".tv_interval($t0, [gettimeofday])."s to generate.</p>";

Sure -- I'm not sure how useful bare timings are, though.  When I look 
at individual pages, the page cache is usually warm anyway, so the only 
thing I might be interested in is advanced statistics like the number of 
calls to git or number of cache hits/misses.  To find out how the cache 
performs timing-wise, you'll have to do larger benchmarks; individual 
page generation times won't help that much.

> I wonder: what is the oldest Perl version we aim to support?

I'm thinking about 5.8 or 5.10.  Looking at Debian, Perl 5.10 is not in 
stable (etch), but it's in lenny, which is planned to become stable in 
Sept. 08.  So by the time the updated Gitweb/Git.pm has stabilized (and 
shows up as a package in Debian), Perl 5.10 will definitely be available 
widely enough.

-- Lea


* Re: Gitweb caching: Google Summer of Code project
From: Petr Baudis @ 2008-05-30 15:38 UTC
  To: Lea Wiemann; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli

On Fri, May 30, 2008 at 05:27:59PM +0200, Lea Wiemann wrote:
> Petr Baudis wrote:
>> I wonder: what is the oldest Perl version we aim to support?
>
> I'm thinking about 5.8 or 5.10.  Looking at Debian, Perl 5.10 is not in 
> stable (etch), but it's in lenny, which is planned to become stable in 
> Sept. 08.  So by the time the updated Gitweb/Git.pm has stabilized (and 
> shows up as a package in Debian), Perl 5.10 will definitely be available 
> widely enough.

Wow, and here I was wondering if requiring at least 5.6 was not too
liberal. ;-) I believe 5.8 is the newest possible candidate though, it
is still too widespread; e.g. Debian-wise, many servers run on Etch and
are going to stay there even for quite some time after Lenny gets
released. Heck, I still have accounts on plenty of Sarge machines. ;-)
(Sarge seems to have Perl-5.8.4.)

-- 
				Petr "Pasky" Baudis
Whatever you can do, or dream you can, begin it.
Boldness has genius, power, and magic in it.	-- J. W. von Goethe


* Re: Gitweb caching: Google Summer of Code project
From: Rafael Garcia-Suarez @ 2008-05-30 16:04 UTC
  To: Petr Baudis
  Cc: Lea Wiemann, Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli

2008/5/30 Petr Baudis <pasky@suse.cz>:
>
> Wow, and here I was wondering if requiring at least 5.6 was not too
> liberal. ;-) I believe 5.8 is the newest possible candidate though, it
> is still too widespread; e.g. Debian-wise, many servers run on Etch and
> are going to stay there even for quite some time after Lenny gets
> released. Heck, I still have accounts on plenty of Sarge machines. ;-)
> (Sarge seems to have Perl-5.8.4.)

I think 5.8.2 is a good _minimum_ perl to support. Before that one,
Unicode support is next to null (5.6 and below) or too buggy, and
gitweb needs that.


* Re: Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-30 18:47 UTC
  To: Petr Baudis; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli

Petr Baudis wrote:
> [5.8] is still too widespread;

Okay; I'll keep testing Git.pm and Gitweb with Perl 5.8 then.

-- Lea


* Re: Gitweb caching: Google Summer of Code project
From: J.H. @ 2008-05-30 18:56 UTC
  To: Rafael Garcia-Suarez
  Cc: Petr Baudis, Lea Wiemann, Jakub Narebski, git, Junio C Hamano, Lars Hjemli

I would agree - let's try and shoot for 5.8 as a baseline minimum (there
are lots of people who are slow to upgrade, and it would be nice for
them to be able to make use of newer gitwebs on things like CentOS /
RHEL 4).

- John


On Fri, 2008-05-30 at 18:04 +0200, Rafael Garcia-Suarez wrote:
> 2008/5/30 Petr Baudis <pasky@suse.cz>:
> >
> > Wow, and here I was wondering if requiring at least 5.6 was not too
> > liberal. ;-) I believe 5.8 is the newest possible candidate though, it
> > is still too widespread; e.g. Debian-wise, many servers run on Etch and
> > are going to stay there even for quite some time after Lenny gets
> > released. Heck, I still have accounts on plenty of Sarge machines. ;-)
> > (Sarge seems to have Perl-5.8.4.)
> 
> I think 5.8.2 is a good _minimum_ perl to support. Before that one,
> Unicode support is next to null (5.6 and below) or too buggy, and
> gitweb needs that.


* Re: Gitweb caching: Google Summer of Code project
From: Junio C Hamano @ 2008-05-30 20:28 UTC
  To: J.H.
  Cc: Rafael Garcia-Suarez, Petr Baudis, Lea Wiemann, Jakub Narebski, git, Lars Hjemli

"J.H." <warthog19@eaglescrag.net> writes:

> On Fri, 2008-05-30 at 18:04 +0200, Rafael Garcia-Suarez wrote:
>> 2008/5/30 Petr Baudis <pasky@suse.cz>:
>> >
>> > Wow, and here I was wondering if requiring at least 5.6 was not too
>> > liberal. ;-) I believe 5.8 is the newest possible candidate though, it
>> > is still too widespread; e.g. Debian-wise, many servers run on Etch and
>> > are going to stay there even for quite some time after Lenny gets
>> > released. Heck, I still have accounts on plenty of Sarge machines. ;-)
>> > (Sarge seems to have Perl-5.8.4.)
>> 
>> I think 5.8.2 is a good _minimum_ perl to support. Before that one,
>> Unicode support is next to null (5.6 and below) or too buggy, and
>> gitweb needs that.

> I would agree - let's try and shoot for 5.8 as a baseline minimum (there
> are lots of people who are slow to upgrade, and it would be nice for
> them to be able to make use of newer gitwebs on things like CentOS /
> RHEL 4).

I do not think it is unreasonable to require a recent Perl for a machine
that runs gitweb, as it is not something you would run on your "customer
site that needs to be ultra sta(b)le" nor on your "development machine
that needs to run the same version as that ultra sta(b)le customer
installation."  In other words, gitweb is primarily a developer tool, and
you can assume that people can afford to have a dedicated machine whose
Perl they can update to a recent version.

However, introducing a dependency on 5.8 for any and all Git.pm users may
have a much wider impact.  Right now, these "use Git":

    git-add--interactive.perl
    git-cvsexportcommit.perl
    git-send-email.perl
    git-svn.perl

If you are doing development for some customer application whose end
product needs to land on a machine with a pre-5.8 Perl, it is conceivable
that you may pin the Perl running on that development machine to that old
version, say 5.6.  Introducing a 5.8 dependency to Git.pm in such a way
that "use Git" from these fails might make these people somewhat unhappy.


* Re: Gitweb caching: Google Summer of Code project
From: Lea Wiemann @ 2008-05-30 21:32 UTC
  To: Junio C Hamano
  Cc: J.H., Rafael Garcia-Suarez, Petr Baudis, Jakub Narebski, git, Lars Hjemli

Junio C Hamano wrote:
> Right now, these "use Git": git-add--interactive.perl
> git-cvsexportcommit.perl git-send-email.perl git-svn.perl
> 
> Introducing a 5.8 dependency to Git.pm in such a way that
> "use Git" from these fails might make these people somewhat unhappy.

Git seems to generally work with Perl 5.6 after installing Scalar::Util 
through CPAN.  I'm happy with (sporadically) testing it with 5.6.2, 
though I don't have any older version installed.  Also, if at some point 
Perl 5.6 compatibility gets in the way (due to lack of Unicode support), 
we'll have to revisit this issue, but for now that should be fine.

Gitweb relies on Unicode support (e.g. "use Encode") and will continue 
to be compatible with 5.8 and 5.10 only.

I'm not sure how much changes between Perl's micro versions.  Should we 
boldly claim "use 5.6.0" or only "use 5.6.2"?  Are people still using 
versions 5.6.0/1 at all?  And for Gitweb, use 5.8.0, or use 5.8.8 (which 
is the version I'm testing with, currently)?  Or should I downgrade?

-- Lea


* Re: Gitweb caching: Google Summer of Code project
From: Jakub Narebski @ 2008-05-31 10:15 UTC
  To: Lea Wiemann, Petr Baudis; +Cc: git, John Hawley, Junio C Hamano, Lars Hjemli

On Fri, 30 May 2008, Lea Wiemann wrote:
> Petr Baudis wrote:
> >
> > please make [memcached] optional.
> 
> Oh, absolutely.  I was definitely planning to keep Gitweb runnable 
> without having Cache::Memcached installed.

I think the idea was to have the following options:
 * cache using memcached (Cache::Memcached installed, and memcached on)
 * cache using filesystem, perhaps size aware (with limited cache size)
 * no caching

It is quite possible that one would want/need gitweb caching, but
either does not want the hassle of memcached, or memcached is not
feasible (for example on a memory-starved machine).
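
For the filesystem option something as small as the following would do
for a start (a sketch: no locking or size limiting here, $cache_dir is
just an example path, and values must be references because that is
what Storable's store() expects):

use Storable qw(store retrieve);
use Digest::MD5 qw(md5_hex);

my $cache_dir = '/var/cache/gitweb';

sub fs_cache_set {
    my ($key, $value) = @_;
    store($value, "$cache_dir/" . md5_hex($key));
}

sub fs_cache_get {
    my ($key, $max_age) = @_;
    my $file = "$cache_dir/" . md5_hex($key);
    return undef unless -e $file && time() - (stat(_))[9] <= $max_age;
    return retrieve($file);   # a real version would wrap this in eval,
                              # since retrieve() dies on corrupt files
}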

> > print "<p>This page took ".tv_interval($t0, [gettimeofday])."s to generate.</p>";
> 
> Sure -- I'm not sure how useful bare timings are, though.  When I look 
> at individual pages, the page cache is usually warm anyway, so the only 
> thing I might be interested in is advanced statistics like the number of 
> calls to git

This should be fairly easy: just modify git_cmd() to count the number of
calls.
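
(Something like this; a sketch, where the counter name is made up and
$GIT / $git_dir are gitweb's existing globals:)

our $number_of_git_commands = 0;

sub git_cmd {
    # every caller builds its git command line through this helper,
    # so counting here catches them all
    $number_of_git_commands++;
    return $GIT, '--git-dir='.$git_dir;
}

# in the page footer:
# print "<div>generated with $number_of_git_commands git commands</div>\n";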

> or number of cache hits/misses.

And I don't think this one would be easy.

-- 
Jakub Narebski
Poland
