* Gitweb caching: Google Summer of Code project @ 2008-05-27 18:03 Lea Wiemann 2008-05-27 21:53 ` Jakub Narebski 0 siblings, 1 reply; 18+ messages in thread From: Lea Wiemann @ 2008-05-27 18:03 UTC (permalink / raw) To: git; +Cc: John Hawley, Junio C Hamano, Jakub Narebski Hi everyone, I just wanted to let everyone know that I'm currently getting started on a Google Summer of Code project to improve the caching mechanism in gitweb. Sorry for not posting about this earlier... Anyways, some key data: John 'warthog9' Hawley (who wrote the current caching system for kernel.org) is my mentor, and GSoC is from May 26 to Aug 18, minus a vacation from Jul 19 to Aug 9. While I'm planning to keep much of it on the list, if anyone else is particularly interested in helping or providing input, please notify me. (Looking at the logs, Jakub maybe? Cc'ing him just in case.) The current plan is basically to get the gitweb caching fork that's been implemented for kernel.org back to the gitweb mainline, and then optimize it (probably move to memcached). I'm not yet sure how to approach this (e.g. whether to merge from the fork to the mainline or vice versa), but I'll probably figure this out together with John and might post separately about that later. In any case, expect patches and messages from me on the list. :) I'm lea_w (or lea_1) on #git on Freenode, if anyone wants to contact me in real time (provided my Pidgin doesn't hiccup). Best, Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-27 18:03 Gitweb caching: Google Summer of Code project Lea Wiemann @ 2008-05-27 21:53 ` Jakub Narebski 2008-05-27 22:54 ` Lea Wiemann 0 siblings, 1 reply; 18+ messages in thread From: Jakub Narebski @ 2008-05-27 21:53 UTC (permalink / raw) To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli On Tue, 27 May 2008, Lea Wiemann wrote: > I just wanted to let everyone know that I'm currently getting started on > a Google Summer of Code project to improve the caching mechanism in gitweb. > > Sorry for not posting about this earlier... Anyways, some key data: > John 'warthog9' Hawley (who wrote the current caching system for > kernel.org) is my mentor, and GSoC is from May 26 to Aug 18, minus a > vacation from Jul 19 to Aug 9. Thanks for the info. > While I'm planning to keep much of it on the list, if anyone else is > particularly interested in helping or providing input, please notify me. > (Looking at the logs, Jakub maybe? Cc'ing him just in case.) I'm certainly interested, at least from a theoretical point of view, and I think I can help (as one of the main gitweb contributors). I guess that Petr Baudis would also be interested, because he maintains repo.or.cz, a public Git hosting site. Lately he posted a patch implementing projects list caching, in a slightly different way from how it is done on kernel.org, namely by caching data and not final output: http://thread.gmane.org/gmane.comp.version-control.git/77151 AFAIK it is implemented in repo.or.cz gitweb: http://repo.or.cz/w/git/repo.git This indirectly led to a bit of research on caching in Perl by yours truly: http://thread.gmane.org/gmane.comp.version-control.git/77529 (mentioned in http://git.or.cz/gitwiki/SoC2008Projects#gitweb-caching). I think that you can also get some help on caching from Lars Hjemli, author of cgit, which is a caching git web interface written in C.
(I have added both Petr Baudis and Lars Hjemli to Cc:) > The current plan is basically to get the gitweb caching fork that's been > implemented for kernel.org back to the gitweb mainline, and then > optimize it (probably move to memcached). I'm not yet sure how to > approach this (e.g. whether to merge from the fork to the mainline or > vice versa), but I'll probably figure this out together with John and > might post separately about that later. In any case, expect patches and > messages from me on the list. :) From what I remember correctly from the discussion surrounding implementing caching for kernel.org gitweb, the main culprit of having it remain separate from mainline was splitting gitweb into many, many files. While it helped John in understanding gitweb, it made it difficult to merge changes back to mainline. Note also that it is easier to make site-specific changes than to make a generic, cross-platform and cross-operating-system change. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-27 21:53 ` Jakub Narebski @ 2008-05-27 22:54 ` Lea Wiemann 2008-05-28 12:14 ` Jakub Narebski 0 siblings, 1 reply; 18+ messages in thread From: Lea Wiemann @ 2008-05-27 22:54 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli Jakub Narebski wrote: > Lately he posted a patch > implementing projects list caching, in a bit different way from how it > is done on kernel.org, namely by caching data and not final output: Thanks for this and all the other pointers. Caching data and not final output is actually what I'm about to try next. If I'm not mistaken, the HTML output is significantly larger than the source (repository) data; however, kernel.org still seems to benefit from caching the HTML, rather than letting Linux' page cache cache the source data. That leads me to think that the page cache somehow fails to cache the source data properly -- I'm not sure why (wild speculation: perhaps because of the pack format). Anyway, I'd hope that I can encapsulate the 30-40 git_cmd calls in gitweb.perl and cache their results (or, to save memory, only the parts of their results that are actually used) using memcached. If that works well, we can stop bothering about frontend (HTML) caching, unless CPU becomes an issue, since all HTML pages are generated from cacheable source data. I'm *kindof* hoping that in the end there will be only a few issues with cache expiry, since most calls are uniquely identified through hashes. (And the ones that are not, like getting the hash of the most recent commit, can perhaps be cached with some fairly low expiry time.) So that's what I'll try next. If you have any comments or warnings off the top of your heads, feel free to send email of course. :) > the main culprit of [the fork] was splitting gitweb into many, many > files. 
While it helped John in understanding gitweb, it made it > difficult to merge changes back to mainline. Interesting point, thanks for letting me know. (I might have gone ahead and tried to split the mainline gitweb myself... ^^) I think it would be nice if gitweb.perl could be split at some point, but I assume there are too many patches out there for that to be worth the merge problems, right? -- Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
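A minimal sketch of the memoization Lea describes above -- wrapping gitweb's git command invocations and caching their output in memcached. The cached_git_cmd helper and the key scheme are hypothetical, but Cache::Memcached's new/get/set and Digest::MD5 are the real APIs:

```perl
use Cache::Memcached;
use Digest::MD5 qw(md5_hex);

my $memd = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

# Hypothetical wrapper: call sites that currently do
#   open my $fd, '-|', git_cmd(), 'cat-file', ...
# would go through this instead.
sub cached_git_cmd {
    my @cmd = @_;
    # memcached keys must be short and free of whitespace, so hash the
    # command line; the output of a command that names an object by its
    # SHA-1 never changes, which makes the command line itself a sound key.
    my $key = 'gitweb:' . md5_hex(join("\0", @cmd));
    my $cached = $memd->get($key);
    return @$cached if defined $cached;

    open my $fd, '-|', @cmd or die "cannot run @cmd: $!";
    my @lines = <$fd>;
    close $fd;
    # Cache::Memcached serializes the array reference (via Storable)
    # transparently; expire after an hour as a safety valve.
    $memd->set($key, \@lines, 3600);
    return @lines;
}
```

For requests that are not hash-addressed (branch tips, 'summary' pages) the expiry would have to be much shorter, since their output can change at any push.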
* Re: Gitweb caching: Google Summer of Code project 2008-05-27 22:54 ` Lea Wiemann @ 2008-05-28 12:14 ` Jakub Narebski 2008-05-28 18:33 ` Lea Wiemann 0 siblings, 1 reply; 18+ messages in thread From: Jakub Narebski @ 2008-05-28 12:14 UTC (permalink / raw) To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli On Wed, 28 May 2008, Lea Wiemann wrote: > Jakub Narebski wrote: >> >> Lately he posted a patch >> implementing projects list caching, in a bit different way from how it >> is done on kernel.org, namely by caching data and not final output: > > Thanks for this and all the other pointers. > > Caching data and not final output is actually what I'm about to try > next. Caching data has its advantages and disadvantages, as does caching HTML output (or parts of HTML output). I wrote about it in http://thread.gmane.org/gmane.comp.version-control.git/77529 Let me summarize here the advantages and disadvantages of caching data and of caching HTML output.
1. Caching data
 * advantages:
 - smaller than caching HTML output
 - you can use the same data to generate different pages ('summary', 'shortlog', 'log', 'rss'/'atom'; pages or search results of the projects list)
 - you can generate pages with variable data, such as relative dates ("5 minutes ago"), staleness info ("cached data, 5 minutes old"), or content type: text/html vs application/xhtml+xml
 * disadvantages:
 - more CPU
 - need to serialize and deserialize (parse) data
 - more complicated
2. Caching HTML output
 * advantages:
 - simple, no need for serialization (pay attention to that in mixed data + output caching solutions)
 - low CPU (although supposedly[1] gitweb performance is I/O bound, and not CPU bound)
 - web servers deal very well with static pages
 - web servers handle HTTP caching support (giving ETag and Last-Modified headers, responding to If-Modified-Since, If-None-Match etc. headers from web browsers and caching proxies)
 * disadvantages:
 - large size of cached data (if most clients support compression, you can store it compressed, at the cost of CPU for non-supporting ones)
 - variable output is difficult or impossible (for example you can still rewrite some HTTP headers for text/html vs application/xhtml+xml or store headers separately; you can use JavaScript to change visible times from absolute to relative dates)
I'm sure John, Lars and Petr can tell you more, and have more experience. [1] Some evidence both from warthog9 and pasky, but no hard data[2] [2] I think it would be good to start with an analysis of gitweb statistics, e.g. from Apache logs, from kernel.org and repo.or.cz. > If I'm not mistaken, the HTML output is significantly larger than > the source (repository) data; however, kernel.org still seems to benefit > from caching the HTML, rather than letting Linux' page cache cache the > source data. I don't think kernel.org caches _all_ pages, only the most requested (correct me if I'm wrong here, John, please). > That leads me to think that the page cache somehow fails > to cache the source data properly -- I'm not sure why (wild speculation: > perhaps because of the pack format). From what I remember, one of the most costly pages to generate is the projects list page (that is why Petr Baudis implemented caching for this page in repo.or.cz gitweb, using data caching here). With 1000+ projects (repositories) gitweb has to hit at best 1000+ packfiles, not to mention refs, to generate the "Last Changed" column from git-for-each-ref output (incidentally, also to check whether it is truly a git repository). In kernel.org's case, with gitweb working similarly to the mod_userdir module but for git repositories (as a service, rather than as part of repo hosting), gitweb has to hit 1000+ 'summary' files... That is interspersed with other requests. How can the page cache and filesystem buffers deal with that? 
BTW I'm not sure if kernel.org uses CGI or "legacy" mod_perl gitweb; currently there is no support for FastCGI in gitweb (although you can find some patches in the archive). (But I'm not an expert in those matters, so please take the above with a pinch of salt, or two). By the way, using pack files, besides reducing repository size, also improved git performance thanks to better I/O patterns and better interaction with the filesystem cache (some say that git is optimized for a warm cache). > Anyway, I'd hope that I can > encapsulate the 30-40 git_cmd calls in gitweb.perl and cache > their results (or, to save memory, only the parts of their results that > are actually used) using memcached. If that works well, we > can stop bothering about frontend (HTML) caching, unless CPU becomes an > issue, since all HTML pages are generated from cacheable source data. I don't think caching _everything_, including rarely requested pages, would be a good idea. > I'm *kindof* hoping that in the end there will be only a few issues with > cache expiry, since most calls are uniquely identified through hashes. > (And the ones that are not, like getting the hash of the most recent > commit, can perhaps be cached with some fairly low expiry time.) The trouble is with those requests which are _not_ uniquely identified by the hashes requested, such as 'summary', 'log' from a given branch (not from a given hash), or a web feed for a given branch. For those which are non-changing you can just (as gitweb does even now) give a large HTTP expiry (Expires or max-age) and allow web browsers or proxies to cache them. > So that's what I'll try next. If you have any comments or warnings off > the top of your heads, feel free to send email of course. :) I'm afraid that implementing kernel.org caching in mainline in a generic way would be enough work for a whole GSoC 2008. I hope I am mistaken and you will have time to analyse and implement a wider range of caching solutions in gitweb... 
>> the main culprit of [the fork] was splitting gitweb into many, many >> files. While it helped John in understanding gitweb, it made it >> difficult to merge changes back to mainline. > > Interesting point, thanks for letting me know. (I might have gone ahead > and tried to split the mainline gitweb myself... ^^) I think it would > be nice if gitweb.perl could be split at some point, but I assume there > are too many patches out there for that to be worth the merge problems, > right? On one hand, gitweb.perl as a single file makes it easy to install; on the other hand, if it were split into modules (like git-gui now is) it would, I think, be easier to understand and modify... I think, however, that it would be better to first make gitweb use Git.pm, adding to and improving Git.pm where necessary (for example adding the eager config parsing used in gitweb, i.e. reading the whole config into a Perl hash at the first request, then accessing the hash instead of making further calls to git-config). -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
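The eager config parsing Jakub suggests could be sketched as below: one git-config invocation per request, parsed into a hash. The helper name is hypothetical; note that this naive line-based parse keeps only the last value of multi-valued keys and mishandles values containing newlines:

```perl
my %config;          # filled once per request
my $config_loaded = 0;

# Hypothetical replacement for per-key "git config <key>" calls.
sub git_get_config {
    my ($key) = @_;
    unless ($config_loaded) {
        open my $fd, '-|', qw(git config --list)
            or die "cannot run git config: $!";
        while (my $line = <$fd>) {
            chomp $line;
            # --list prints "section.name=value", one entry per line.
            my ($k, $v) = split /=/, $line, 2;
            $config{$k} = $v;
        }
        close $fd;
        $config_loaded = 1;
    }
    return $config{$key};
}
```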
* Re: Gitweb caching: Google Summer of Code project 2008-05-28 12:14 ` Jakub Narebski @ 2008-05-28 18:33 ` Lea Wiemann 2008-05-29 23:27 ` Jakub Narebski 0 siblings, 1 reply; 18+ messages in thread From: Lea Wiemann @ 2008-05-28 18:33 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli Jakub Narebski wrote: > 1. Caching data > * disadvantages: > - more CPU > - need to serialize and deserialize (parse) data > - more complicated CPU: John told me that so far CPU has *never* been an issue on k.org. Unless someone tells me they've had CPU problems, I'll assume that CPU is a non-issue until I actually run into it (and then I can optimize the particular pieces where CPU is actually an issue). Serialization: I was planning to use Storable (memcached's Perl API uses it transparently I think). I'm hoping that this'll just solve it. It's true that it's more complicated. It'll require quite a bit of refactoring, and maybe I'll just back off if I find that it's too hard. > I'm afraid that implementing kernel.org caching in mainline in > a generic way would be enough work for a whole GSoC 2008. I probably won't reimplement the current caching mechanism. Do you think that a solution using memcached is generic enough? I'll still need to add some abstraction layer in the code, but when I'm finished the user will either get the normal uncached gitweb, or activate memcached caching with some configuration setting. By the way, I'll be posting about gitweb on this mailing list occasionally. If any of you would like to receive CC's on such messages, please let me know, otherwise I'll assume you get them through the mailing list. -- Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-28 18:33 ` Lea Wiemann @ 2008-05-29 23:27 ` Jakub Narebski 2008-05-30 7:24 ` Lea Wiemann 0 siblings, 1 reply; 18+ messages in thread From: Jakub Narebski @ 2008-05-29 23:27 UTC (permalink / raw) To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli On Wed, 28 May 2008, Lea Wiemann wrote: > Jakub Narebski wrote: > > > > 1. Caching data > > * disadvantages: > > - more CPU > > - need to serialize and deserialize (parse) data > > - more complicated > > CPU: John told me that so far CPU has *never* been an issue on k.org. > Unless someone tells me they've had CPU problems, I'll assume that CPU > is a non-issue until I actually run into it (and then I can optimize the > particular pieces where CPU is actually an issue). True. What you have to care about (although I don't think it would be particularly difficult) is not to repeat bad I/O patterns with the cache... > Serialization: I was planning to use Storable (memcached's Perl API uses > it transparently I think). I'm hoping that this'll just solve it. While Storable is part of, I think, any modern Perl installation, there might be a problem with the memcached API, and with memcached API wrappers such as the CHI one. Namely, you cannot assume that the memcached API is installed, so you have to provide some kind of fallback. > It's true that it's more complicated. It'll require quite a bit of > refactoring, and maybe I'll just back off if I find that it's too hard. What's more, if you want to implement If-Modified-Since and If-None-Match, you would have to implement it by yourself, while for static pages (caching HTML output) the web server would do this for us "for free". > > I'm afraid that implementing kernel.org caching in mainline in > a generic way would be enough work for a whole GSoC 2008. > > I probably won't reimplement the current caching mechanism. Do you > think that a solution using memcached is generic enough? 
I'll still > need to add some abstraction layer in the code, but when I'm finished > the user will either get the normal uncached gitweb, or activate > memcached caching with some configuration setting. That's good enough, although I think that the current caching mechanism in kernel.org's gitweb (your implementation follows more closely what repo.or.cz's gitweb does) has some good ideas, like for example adaptive (load-dependent) expiry times. By the way, what do you think about adding (as an option) information about gitweb performance to the output, in the form of a "Site generated in 0.01 seconds, 2 calls to git commands" or "Site generated in 0.0023 seconds, cached output, 1m31s old" line somewhere in the page footer? I hope you can get some insight into gitweb access statistics from kernel.org, repo.or.cz, and perhaps other large git hosting sites (e.g. freedesktop.org), and that you plan on benchmarking gitweb caching using average / amortized time to generate a page, ApacheBench or equivalent, load average on the server depending on the number of requests, I/O load (using the fio tool, for example) depending on the number of requests, etc. > By the way, I'll be posting about gitweb on this mailing list > occasionally. If any of you would like to receive CC's on such > messages, please let me know, otherwise I'll assume you get them through > the mailing list. I read the git mailing list via the Usenet / news interface (NNTP gateway) from GMane. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
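The adaptive expiry idea Jakub mentions from the kernel.org fork amounts to stretching cache lifetimes when the machine is busy, trading staleness for load. A rough sketch; the scaling policy here is invented for illustration:

```perl
# Scale a base cache lifetime by the 1-minute load average, so that a
# loaded server serves staler pages instead of regenerating them.
sub adaptive_expiry {
    my ($base) = @_;                 # base lifetime in seconds
    open my $fd, '<', '/proc/loadavg' or return $base;
    my ($load) = split ' ', scalar <$fd>;
    close $fd;
    return $load > 1 ? int($base * $load) : $base;
}
```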
* Re: Gitweb caching: Google Summer of Code project 2008-05-29 23:27 ` Jakub Narebski @ 2008-05-30 7:24 ` Lea Wiemann 2008-05-30 10:02 ` Jakub Narebski 0 siblings, 1 reply; 18+ messages in thread From: Lea Wiemann @ 2008-05-30 7:24 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli Jakub Narebski wrote: > you cannot assume that memcached API is installed, so > you have to provide some kind of fallback. That fallback would be to have no caching. :) I think that's acceptable -- I'm not too willing to implement caching for two API's. (Incidentally, memcached takes two shell commands to install and get running on my machine; I think that's acceptably easy.) > What's more, if you want to implement If-Modified-Since and > If-None-Match, you would have to implement it by yourself, while > for static pages (caching HTML output) web server would do this > for us "for free". Are web servers doing anything that we can't easily reimplement in a few lines (and, on top of that, more easily tailored to different actions, projects, etc.)? > By the way what do you think about adding (as an option) information > about gitweb performance to the [HTML] output, Definitely a good idea! > I hope you have some ideas in gitweb access statistics from kernel.org, I'm waiting for John to give me SSH access and/or send them my way. :) > and you plan on benchmarking gitweb caching using [snip] Absolutely -- thanks for the suggestions! -- Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
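For the simple HTTP/1.0 case, reimplementing what the web server does is indeed only a few lines. A sketch using CGI.pm and HTTP::Date (from libwww-perl); the helper name is hypothetical, and, as Jakub notes in his reply, full ETag/If-None-Match semantics are considerably more work:

```perl
use CGI qw(header);
use HTTP::Date qw(time2str str2time);

# Hypothetical helper: answer 304 if the client's copy is current,
# otherwise emit a Last-Modified validator and fall through to
# normal page generation.
sub check_if_modified_since {
    my ($last_changed) = @_;      # epoch time the page's data last changed
    my $ims = $ENV{HTTP_IF_MODIFIED_SINCE};
    if (defined $ims) {
        my $since = str2time($ims);
        if (defined $since && $last_changed <= $since) {
            print header(-status => '304 Not Modified');
            exit 0;
        }
    }
    print header(-last_modified => time2str($last_changed));
}
```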
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 7:24 ` Lea Wiemann @ 2008-05-30 10:02 ` Jakub Narebski 2008-05-30 14:59 ` Lea Wiemann 0 siblings, 1 reply; 18+ messages in thread From: Jakub Narebski @ 2008-05-30 10:02 UTC (permalink / raw) To: Lea Wiemann; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli On Fri, 30 May 2008, Lea Wiemann wrote: > Jakub Narebski wrote: > > > > you cannot assume that memcached API is installed, so > > you have to provide some kind of fallback. > > That fallback would be to have no caching. I think that's acceptable > -- I'm not too willing to implement caching for two API's. I hope that you would make a wrapper around the memcached (caching engine) API (it's a pity we cannot use the CHI unified Perl caching interface), so it would be easy, for example, to change to a filesystem-based cache, or a size-aware filesystem-based cache, or mmap, etc... I mean here that even if you don't implement two caching API's, at least make it possible to easily change the caching backend. Note also that memcached may not make sense for a single machine (single-server installation), and does not make sense for memory-starved machines... and one can want gitweb caching even in that situation. > (Incidentally, memcached takes two shell commands to install and get > running on my machine; I think that's acceptably easy.) As John 'Warthog9' said wrt. using additional Perl modules for gitweb caching, most sites that are used as web servers (and gitweb servers) have strict requirements on stability of installed programs, libraries and modules. IIRC the policy usually is that one can install packages from the main (base) repository for the Linux distribution used on the server, also from the extras repository; sometimes from a trusted contrib package repository. Modules which are only in CPAN, and programs which require compilation, are out of the question, unfortunately. I think there is no problem wrt. memcached itself; I'm not so sure about the Perl APIs: Cache::Memcached and/or Cache::Memcached::Fast (and optionally appropriate CHI modules/backends). > > What's more, if you want to implement If-Modified-Since and > > If-None-Match, you would have to implement it by yourself, while > > for static pages (caching HTML output) web server would do this > > for us "for free". > > Are web servers doing anything that we can't easily reimplement in a few > lines (and, on top of that, more easily tailored to different actions, > projects, etc.)? Can we reimplement it? I think we can. Easily? I'm not sure. HTTP/1.0 If-Modified-Since should be fairly easy; it would be harder to support ETag fully and correctly (weak vs. strong tags), and If-None-Match (from web browsers, I think) and If-Match (from web caches) would take some work. > > By the way what do you think about adding (as an option) information > > about gitweb performance to the [HTML] output, > > Definitely a good idea! I'll try to add it when I have a bit more free time, unless you do it first. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
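The backend-neutral layer Jakub asks for, combined with the "no caching" fallback Lea proposed, could be as small as this (the cache_get/cache_set names are hypothetical; require-inside-eval is the standard way to probe for an optional module):

```perl
# Gitweb code calls only cache_get/cache_set, never Cache::Memcached
# directly, so the backend can later be swapped for a filesystem or
# mmap cache -- and a missing module degrades to no caching at all.
my $cache;
if (eval { require Cache::Memcached; 1 }) {
    $cache = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });
}

sub cache_get {
    my ($key) = @_;
    return defined $cache ? $cache->get($key) : undef;
}

sub cache_set {
    my ($key, $value, $expiry) = @_;
    $cache->set($key, $value, $expiry) if defined $cache;
}
```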
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 10:02 ` Jakub Narebski @ 2008-05-30 14:59 ` Lea Wiemann 2008-05-30 15:07 ` Petr Baudis 0 siblings, 1 reply; 18+ messages in thread From: Lea Wiemann @ 2008-05-30 14:59 UTC (permalink / raw) To: Jakub Narebski; +Cc: git, John Hawley, Junio C Hamano, Petr Baudis, Lars Hjemli Jakub Narebski wrote: > even if you don't implement two caching API's at least make it > possible to easy change caching backend. Sure, I'll keep that in mind. > Note also that memcached may not have sense for single machine [...], > and does not make sense for memory starved machines... For single machines, memcached certainly works fine. On memory-starved machines with HD caches, you'd have to cache the aggregate HTML data, not the data in the backend. So as long as I'm working on the backend (repository) cache, memcached should be fine. > IIRC the policy usually is that one can install packages > from main (base) repository for Linux distribution used on server, libcache-memcached-perl is in Debian stable; that's fair enough I think. Cache::Memcached::Fast doesn't seem to be in Debian as of now, but I wouldn't worry about performance unless it comes up. >>> By the way what do you think about adding (as an option) information >>> about gitweb performance to the [HTML] output, > > I'd try to add it when I'd have a bot more of free time I'd probably hold off on this until I've written the Perl Git API. -- Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 14:59 ` Lea Wiemann @ 2008-05-30 15:07 ` Petr Baudis 2008-05-30 15:27 ` Lea Wiemann 0 siblings, 1 reply; 18+ messages in thread From: Petr Baudis @ 2008-05-30 15:07 UTC (permalink / raw) To: Lea Wiemann; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli On Fri, May 30, 2008 at 04:59:56PM +0200, Lea Wiemann wrote: > Jakub Narebski wrote: >> IIRC the policy usually is that one can install packages >> from main (base) repository for Linux distribution used on server, > > libcache-memcached-perl is in Debian stable; that's fair enough I think. > Cache::Memcached::Fast doesn't seem to be in Debian as of now, but I > wouldn't worry about performance unless it comes up. Still, please make this optional. It is fine for gitweb not to do any caching in the bare setup, but you should be able to get the simple version running without any external dependencies. >>>> By the way what do you think about adding (as an option) information >>>> about gitweb performance to the [HTML] output, >> I'd try to add it when I'd have a bot more of free time > > I'd probably wait with this until I've written the Perl Git API. Hmm, it shouldn't depend on that in any way, should it? use Time::HiRes qw(gettimeofday tv_interval); my $t0 = [gettimeofday]; ... print "<p>This page took ".tv_interval($t0, [gettimeofday])."s to generate.</p>"; I wonder what is the oldest Perl version we aim to support? If it's older than 5.8, we need to be more careful about Time::HiRes. It would be useful to document the requirement with an explicit 'use 5.008;' (or similar) statement at the top of the script. -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it. -- J. W. von Goethe ^ permalink raw reply [flat|nested] 18+ messages in thread
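If pre-5.8 installations must keep working, the timing footer Petr sketches can probe for Time::HiRes at runtime instead of failing at compile time. A sketch (Time::HiRes has shipped with core Perl since 5.7.3, so only genuinely old installations take the coarse one-second fallback):

```perl
my $have_hires = eval { require Time::HiRes; 1 };
my $t0 = $have_hires ? [Time::HiRes::gettimeofday()] : time();

# ... generate the page ...

# tv_interval with one argument measures up to the current time.
my $elapsed = $have_hires ? Time::HiRes::tv_interval($t0) : time() - $t0;
printf "<p>This page took %.3fs to generate.</p>\n", $elapsed;
```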
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 15:07 ` Petr Baudis @ 2008-05-30 15:27 ` Lea Wiemann 2008-05-30 15:38 ` Petr Baudis 2008-05-31 10:15 ` Jakub Narebski 0 siblings, 2 replies; 18+ messages in thread From: Lea Wiemann @ 2008-05-30 15:27 UTC (permalink / raw) To: Petr Baudis; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli Petr Baudis wrote: > please make [memcached] optional. Oh, absolutely. I was definitely planning to keep Gitweb runnable without having Cache::Memcached installed. > print "<p>This page took ".tv_interval($t0, [gettimeofday])."s to generate.</p>"; Sure -- I'm not sure how useful bare timings are, though. When I look at individual pages, the page cache is usually warm anyway, so the only thing I might be interested in is advanced statistics like the number of calls to git or number of cache hits/misses. To find out how the cache performs timing-wise, you'll have to do larger benchmarks, individual page generation times won't help that much. > I wonder what oldest Perl versions do we aim to support? I'm thinking about 5.8 or 5.10. Looking at Debian, Perl 5.10 is not in stable (etch), but it's in lenny, which is planned to become stable in Sept. 08. So by the time the updated Gitweb/Git.pm has stabilized (and shows up as a package in Debian), Perl 5.10 will definitely be available widely enough. -- Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 15:27 ` Lea Wiemann @ 2008-05-30 15:38 ` Petr Baudis 2008-05-30 16:04 ` Rafael Garcia-Suarez 2008-05-30 18:47 ` Lea Wiemann 1 sibling, 2 replies; 18+ messages in thread From: Petr Baudis @ 2008-05-30 15:38 UTC (permalink / raw) To: Lea Wiemann; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli On Fri, May 30, 2008 at 05:27:59PM +0200, Lea Wiemann wrote: > Petr Baudis wrote: >> I wonder what oldest Perl versions do we aim to support? > > I'm thinking about 5.8 or 5.10. Looking at Debian, Perl 5.10 is not in > stable (etch), but it's in lenny, which is planned to become stable in > Sept. 08. So by the time the updated Gitweb/Git.pm has stabilized (and > shows up as a package in Debian), Perl 5.10 will definitely be available > widely enough. Wow, and here I was wondering if requiring at least 5.6 was not too liberal. ;-) I believe 5.8 is the newest possible candidate though; it is still very widespread -- e.g. Debian-wise, many servers run on Etch and are going to stay there even for quite some time after Lenny gets released. Heck, I still have accounts on plenty of Sarge machines. ;-) (Sarge seems to have Perl-5.8.4.) -- Petr "Pasky" Baudis Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it. -- J. W. von Goethe ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 15:38 ` Petr Baudis @ 2008-05-30 16:04 ` Rafael Garcia-Suarez 2008-05-30 18:56 ` J.H. 2008-05-30 18:47 ` Lea Wiemann 1 sibling, 1 reply; 18+ messages in thread From: Rafael Garcia-Suarez @ 2008-05-30 16:04 UTC (permalink / raw) To: Petr Baudis Cc: Lea Wiemann, Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli 2008/5/30 Petr Baudis <pasky@suse.cz>: > > Wow, and here I was wondering if requiring at least 5.6 was not too > liberal. ;-) I believe 5.8 is the newest possible candidate though, it > is still too widespread; e.g. Debian-wise, many servers run on Etch and > are going to stay there even for quite some time after Lenny gets > released. Heck, I still have accounts on plenty of Sarge machines. ;-) > (Sarge seems to have Perl-5.8.4.) I think 5.8.2 is a good _minimum_ perl to support. Before that one, Unicode support is next to null (5.6 and below) or too buggy, and gitweb needs that. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 16:04 ` Rafael Garcia-Suarez @ 2008-05-30 18:56 ` J.H. 2008-05-30 20:28 ` Junio C Hamano 0 siblings, 1 reply; 18+ messages in thread From: J.H. @ 2008-05-30 18:56 UTC (permalink / raw) To: Rafael Garcia-Suarez Cc: Petr Baudis, Lea Wiemann, Jakub Narebski, git, Junio C Hamano, Lars Hjemli I would agree - let's try to shoot for 5.8 as a baseline minimum (there are lots of people who are slow to upgrade, and it would be nice for them to be able to make use of newer gitwebs on things like CentOS / RHEL 4). - John On Fri, 2008-05-30 at 18:04 +0200, Rafael Garcia-Suarez wrote: > 2008/5/30 Petr Baudis <pasky@suse.cz>: > > > > Wow, and here I was wondering if requiring at least 5.6 was not too > > liberal. ;-) I believe 5.8 is the newest possible candidate though, it > > is still too widespread; e.g. Debian-wise, many servers run on Etch and > > are going to stay there even for quite some time after Lenny gets > > released. Heck, I still have accounts on plenty of Sarge machines. ;-) > > (Sarge seems to have Perl-5.8.4.) > > I think 5.8.2 is a good _minimum_ perl to support. Before that one, > Unicode support is next to null (5.6 and below) or too buggy, and > gitweb needs that. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 18:56 ` J.H. @ 2008-05-30 20:28 ` Junio C Hamano 2008-05-30 21:32 ` Lea Wiemann 0 siblings, 1 reply; 18+ messages in thread From: Junio C Hamano @ 2008-05-30 20:28 UTC (permalink / raw) To: J.H. Cc: Rafael Garcia-Suarez, Petr Baudis, Lea Wiemann, Jakub Narebski, git, Lars Hjemli "J.H." <warthog19@eaglescrag.net> writes: > On Fri, 2008-05-30 at 18:04 +0200, Rafael Garcia-Suarez wrote: >> 2008/5/30 Petr Baudis <pasky@suse.cz>: >> > >> > Wow, and here I was wondering if requiring at least 5.6 was not too >> > liberal. ;-) I believe 5.8 is the newest possible candidate though, it >> > is still too widespread; e.g. Debian-wise, many servers run on Etch and >> > are going to stay there even for quite some time after Lenny gets >> > released. Heck, I still have accounts on plenty of Sarge machines. ;-) >> > (Sarge seems to have Perl-5.8.4.) >> >> I think 5.8.2 is a good _minimum_ perl to support. Before that one, >> Unicode support is next to null (5.6 and below) or too buggy, and >> gitweb needs that. > I would agree - lets try and shoot for 5.8 as a baseline minimum (there > are lots of people who are slow to upgrade, and it would be nice to be > able for them to make use of newer gitweb's on things like Centos / RHEL > 4 I do not think it is unreasonable to require a recent Perl for a machine that runs gitweb, as it is not something you would run on your "customer site that needs to be ultra sta(b)le" nor on your "development machine that needs to run the same version as that ultra sta(b)le customer installation." In other words, gitweb is primarily a developer tool, and you can assume that people can afford a dedicated machine whose Perl they can update to a recent version. However, introducing a dependency on 5.8 for any and all Git.pm users may have a much wider impact. 
Right now, these "use Git": git-add--interactive.perl git-cvsexportcommit.perl git-send-email.perl git-svn.perl If you are doing development for some customer application whose end product needs to land on a machine with a pre-5.8 Perl, it is conceivable that you may pin the Perl running on that development machine to that old version, say 5.6. Introducing 5.8 dependency to Git.pm in such a way that "use Git" from these fail might make these people somewhat unhappy. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 20:28 ` Junio C Hamano @ 2008-05-30 21:32 ` Lea Wiemann 0 siblings, 0 replies; 18+ messages in thread From: Lea Wiemann @ 2008-05-30 21:32 UTC (permalink / raw) To: Junio C Hamano Cc: J.H., Rafael Garcia-Suarez, Petr Baudis, Jakub Narebski, git, Lars Hjemli Junio C Hamano wrote: > Right now, these "use Git": git-add--interactive.perl > git-cvsexportcommit.perl git-send-email.perl git-svn.perl > > Introducing 5.8 dependency to Git.pm in such a way that > "use Git" from these fail might make these people somewhat unhappy. Git seems to generally work with Perl 5.6 after installing Scalar::Util through CPAN. I'm happy with (sporadically) testing it with 5.6.2, though I don't have any older version installed. Also, if at some point Perl 5.6 compatibility gets in the way (due to lack of Unicode support), we'll have to revisit this issue, but for now that should be fine. Gitweb relies on Unicode support (e.g. "use Encode") and will continue to be compatible with 5.8 and 5.10 only. I'm not sure how much changes between Perl's micro versions. Should we boldly claim "use 5.6.0" or only "use 5.6.2"? Are people still using versions 5.6.0/1 at all? And for Gitweb, use 5.8.0, or use 5.8.8 (which is the version I'm testing with, currently)? Or should I downgrade? -- Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
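[For reference, a `use VERSION` line is how such a floor is declared in Perl: it dies at compile time on an older interpreter. The version numbers below are purely illustrative, not the choice this thread settled on:]

```perl
#!/usr/bin/perl
# Sketch of declaring a minimum Perl version. The three-digit decimal
# form is understood by every perl, so an old interpreter fails with a
# clean "Perl v5.8.0 required" error instead of a confusing parse error.
use 5.008;            # equivalent to requiring perl 5.8.0 or newer
use strict;
use warnings;

# $] holds the running interpreter's version as a decimal,
# e.g. 5.008008 for perl 5.8.8.
print "running on perl $]\n";
```

A micro-level floor such as 5.8.2 would be written `use 5.008_002;`; the v-string form `use v5.8.2;` also exists, but parsing it correctly depends on the perl reading it, which is why the decimal form was the conservative choice at the time.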
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 15:38 ` Petr Baudis 2008-05-30 16:04 ` Rafael Garcia-Suarez @ 2008-05-30 18:47 ` Lea Wiemann 1 sibling, 0 replies; 18+ messages in thread From: Lea Wiemann @ 2008-05-30 18:47 UTC (permalink / raw) To: Petr Baudis; +Cc: Jakub Narebski, git, John Hawley, Junio C Hamano, Lars Hjemli Petr Baudis wrote: > [5.8] is still too widespread; Okay; I'll keep testing Git.pm and Gitweb with Perl 5.8 then. -- Lea ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Gitweb caching: Google Summer of Code project 2008-05-30 15:27 ` Lea Wiemann 2008-05-30 15:38 ` Petr Baudis @ 2008-05-31 10:15 ` Jakub Narebski 1 sibling, 0 replies; 18+ messages in thread From: Jakub Narebski @ 2008-05-31 10:15 UTC (permalink / raw) To: Lea Wiemann, Petr Baudis; +Cc: git, John Hawley, Junio C Hamano, Lars Hjemli On Fri, 30 May 2008, Lea Wiemann wrote: > Petr Baudis wrote: > > > > please make [memcached] optional. > > Oh, absolutely. I was definitely planning to keep Gitweb runnable > without having Cache::Memcached installed. I think the idea was to have the following options: * cache using memcached (Cache::Memcached installed, and memcached on) * cache using filesystem, perhaps size aware (with limited cache size) * no caching It is quite possible that one would want/need gitweb caching, but either does not want the hassle of memcached, or memcached is not feasible (for example on a memory-starved machine). > > print "<p>This page took ".tv_interval($t0, [gettimeofday])."s to generate.</p>"; > > Sure -- I'm not sure how useful bare timings are, though. When I look > at individual pages, the page cache is usually warm anyway, so the only > thing I might be interested in is advanced statistics like the number of > calls to git This should be fairly easy, just modify git_cmd() to count the number of calls. > or number of cache hits/misses. And this I don't think would be easy. -- Jakub Narebski Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
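[A rough sketch of the two ideas above combined — the quoted tv_interval timing plus a call counter in a git_cmd()-style helper. Note that git_cmd here is a simplified stand-in for gitweb's real helper, and the counter variable is hypothetical:]

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $git_calls = 0;    # hypothetical per-page counter, as suggested above

# Simplified stand-in for gitweb's git_cmd(): it normally returns the
# git command list (binary plus --git-dir option); here it additionally
# counts how many times the page requested a git invocation.
sub git_cmd {
    $git_calls++;
    return ('git', '--git-dir=/path/to/repo.git');
}

my $t0 = [gettimeofday];    # arrayref of [seconds, microseconds]

# ... page generation would call git_cmd() once per git invocation ...
my @cmd = git_cmd();
@cmd = git_cmd();

# tv_interval() returns the elapsed time in floating-point seconds.
printf "<p>This page took %.3fs to generate (%d call(s) to git).</p>\n",
    tv_interval($t0, [gettimeofday]), $git_calls;
```

Counting cache hits and misses, by contrast, would need hooks inside whichever cache backend ends up being used, which is why Jakub expects it to be harder.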
end of thread, other threads:[~2008-05-31 10:16 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-05-27 18:03 Gitweb caching: Google Summer of Code project Lea Wiemann 2008-05-27 21:53 ` Jakub Narebski 2008-05-27 22:54 ` Lea Wiemann 2008-05-28 12:14 ` Jakub Narebski 2008-05-28 18:33 ` Lea Wiemann 2008-05-29 23:27 ` Jakub Narebski 2008-05-30 7:24 ` Lea Wiemann 2008-05-30 10:02 ` Jakub Narebski 2008-05-30 14:59 ` Lea Wiemann 2008-05-30 15:07 ` Petr Baudis 2008-05-30 15:27 ` Lea Wiemann 2008-05-30 15:38 ` Petr Baudis 2008-05-30 16:04 ` Rafael Garcia-Suarez 2008-05-30 18:56 ` J.H. 2008-05-30 20:28 ` Junio C Hamano 2008-05-30 21:32 ` Lea Wiemann 2008-05-30 18:47 ` Lea Wiemann 2008-05-31 10:15 ` Jakub Narebski