From: Jakub Narebski <jnareb@gmail.com>
To: Bruno Cesar Ribas <ribas@c3sl.ufpr.br>
Cc: git@vger.kernel.org, Petr Baudis <pasky@suse.cz>
Subject: Re: Another bench on gitweb (also on gitweb caching)
Date: Mon, 11 Feb 2008 16:44:23 -0800 (PST)
Message-ID: <m363wvdmxr.fsf@localhost.localdomain>
In-Reply-To: <20080210030919.GA32733@c3sl.ufpr.br>

Bruno Cesar Ribas <ribas@c3sl.ufpr.br> writes:

> I made another SIMPLE bench on gitweb. Testing time on git-for-each-ref.
> 
> Using my 1000 projects I ran:
> 8<----------------
> #!/bin/bash
> PEGAR_ref() { 
>     PROJ=projeto$1.git; 
>     cd $PROJ; 
>     printf "\tlastref = $(git-for-each-ref --sort=-committerdate --count=1\
>             --format='%(committer)')\n" >> config; 
>     cd -; 
> }
> cd $HOME/scm
> for((i=1;i<=1000;i++)){ PEGAR_ref $i & }
> 8<----------------

Could you please not mix English and your native language
(Portuguese?) in the examples you post? Mixing two languages in one
identifier name (unless "ref" is the same in Portuguese) is especially
bad form... TIA.

Besides, what I'm more interested in is the script used to generate
those 1000 projects...
 
> And in "git_get_last_activity", instead of running git-for-each-ref, I
> asked it to get gitweb.lastref
> 
> Here are the results:
> "dd" means: dd if=/dev/zero of=$HOME/dd/$i bs=1M count=400000
> 
> Running 2 dd to generate disk IO.  Here are the results:
> NO projects_list  projects_list
> 7m56s55           6m11s95        cached last change, using gitweb.lastref
> 16m30s69          15m10s74       default gitweb, using FS's owner
> 16m07s40          15m24s34       patched to get gitweb.owner
> 
> Now the results for the 1000 projects on an idle machine (no dd running
> to generate IO):
> NO projects_list  projects_list
> 0m26s79           0m38s70       cached last change, using gitweb.lastref
> 1m19s08           1m09s55       default gitweb, using FS's owner
> 1m17s58           1m09s55       patched to get gitweb.owner

Are those the results of running gitweb as a standalone script, or of
your script running git-for-each-ref?

Besides, I'd rather see results of running ApacheBench. On Linux it
usually comes with the installed Apache, and it is invoked as 'ab'.
Instead of adding artificial load, your tests could use concurrent
requests, and more than one request, to get a better average.
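
Something along these lines is what I have in mind. This is only a
sketch: the URL is a placeholder for wherever your gitweb is served,
and the request counts are arbitrary defaults, not recommendations.

```shell
#!/bin/sh
# Hypothetical URL; point it at your own gitweb installation.
GITWEB_URL=${GITWEB_URL:-http://localhost/cgi-bin/gitweb.cgi}

# Run ApacheBench against the projects list page.
#   $1: total number of requests (ab -n), default 100
#   $2: number of concurrent requests (ab -c), default 10
bench_gitweb() {
    ab -n "${1:-100}" -c "${2:-10}" "$GITWEB_URL"
}
```

Calling "bench_gitweb 100 10" would then issue 100 requests, 10 at a
time, and ab reports the mean time per request at the end.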
 
> I found those VERY interesting, so instead of trying to think of a
> new way to store gitweb config, we should think of a way to cache that
> information.

Below are my thoughts about caching information for gitweb:

First, the basis of any optimisation is finding the bottlenecks.
I think it was posted here some time ago that the pages generating
the most load are the projects list and the feeds.

Kernel.org even runs a modified version of gitweb with some caching
support; Cgit (a git web interface in C) also has caching support.


Because gitweb outputs relative times on the projects list page and
on the project summary page, it is unfortunately not easy to simply
cache the HTML output: one would have to either give up relative
times, or rewrite them from relative to absolute, either on the server
(in gitweb) or on the client (in JavaScript). So perhaps it would be
better to cache the generated (costly to obtain) information instead,
such as the last-changed time of each project.

Or we can, for example, assume (i.e. do this only if the appropriate
gitweb feature is set) that projects are bare repositories which are
pushed to, and that git-update-server-info is run on every repository
update (as is needed for the HTTP transport, for example), and then
stat $GIT_DIR/info/refs and/or $GIT_DIR/objects/info/packs instead of
running git-for-each-ref. Of course the column would then be called
something like "Last Update" instead of "Last Change".
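
To illustrate the stat idea in shell (gitweb itself would of course do
this in Perl): the function name below is made up for this example,
and "stat -c %Y" assumes GNU coreutils.

```shell
#!/bin/sh
# Print the modification time (seconds since the epoch) of info/refs
# in the given bare repository, as a cheap "last update" value.
# Prints nothing and returns non-zero if the file does not exist,
# e.g. because git-update-server-info was never run there.
last_update() {
    repo=$1
    stat -c %Y "$repo/info/refs" 2>/dev/null
}
```

This costs a single stat call per project instead of spawning
git-for-each-ref for each one.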

The "Last Update" information is especially easy to handle because it
can be invalidated / updated externally, by the update / post-receive
hook, outside gitweb. So gitweb doesn't need to implement any cache
invalidation mechanism for it.

We can store the lastref / lastchange information in the repository
config, for example under the "gitweb.lastref" key. We can store it in
a gitweb-wide config, for example in a $projectroot/gitwebconfig file,
under a "gitweb.<project>.lastref" key. Or we can store it as a hash
initializer in some sourced Perl file, read from gitweb_config.perl
(this, I think, can be done even now without touching gitweb code at
all); we can use Data::Dumper to save such information.
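
For the first variant, a hook could keep the per-repository key up to
date roughly like this. It is a sketch only: the function name is
invented for the example, and "gitweb.lastref" is just the key name
used in this thread, which stock gitweb does not read.

```shell
#!/bin/sh
# Record the committer line of the most recently committed ref under
# the gitweb.lastref key in the config file of the given git directory.
# Intended to be called from a post-update or post-receive hook.
record_lastref() {
    git_dir=$1
    lastref=$(git --git-dir="$git_dir" for-each-ref \
        --sort=-committerdate --count=1 \
        --format='%(committer)') || return 1
    git --git-dir="$git_dir" config gitweb.lastref "$lastref"
}
```

Using "git config" instead of appending to the config file by hand
means the key ends up in the [gitweb] section no matter which section
happens to be last in the file.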

The possibilities are many.

-- 
Jakub Narebski
Poland
ShadeHawk on #git
