From: "Constantine A. Murenin" <mureninc@gmail.com>
To: Charles McGarvey <chazmcgarvey@brokenzipper.com>
Cc: Fredrik Gustafsson <iveqy@iveqy.com>, git@vger.kernel.org
Subject: Re: is there a fast web-interface to git for huge repos?
Date: Fri, 7 Jun 2013 13:21:30 -0700
Message-ID: <CAPKkNb460fNJcwt6084xkuDa2sWMRnF+FBu+i_G01aJMMiRevA@mail.gmail.com>
In-Reply-To: <51B23F01.5020608@brokenzipper.com>

On 7 June 2013 13:13, Charles McGarvey <chazmcgarvey@brokenzipper.com> wrote:
> On 06/07/2013 01:02 PM, Constantine A. Murenin wrote:
>>> That's a one-time penalty. Why would that be a problem? And why is wget
>>> even mentioned? Did we misunderstand each other?
>>
>> `wget` or `curl --head` would be used to trigger the caching.
>>
>> I don't understand how it's a one-time penalty. No one wants to look
>> at an old copy of the repository, so, pretty much, if, say, I want to
>> have a gitweb of all 4 BSDs, updated daily, then, pretty much, even
>> with lots of ram (e.g. to eliminate the cold-case 5s penalty, and
>> reduce each page to 0.5s), on a quad-core box, I'd kinda be lucky
>> to complete a generation of all the pages within 12h or so, obviously
>> using the machine at, or above, 50% capacity just for the caching. Or
>> several days or even a couple of weeks on an Intel Atom or VIA Nano
>> with 2GB of RAM or so. Obviously not acceptable, there has to be a
>> better solution.
>>
>> One could, I guess, only regenerate the pages which have changed, but
>> it still sounds like an ugly solution, where you'd have to be
>> generating a list of files that have changed between one gen and the
>> next, and you'd still have to have very high CPU, cache and storage
>> requirements.
>
> Have you already ruled out caching on a proxy? Pages would only be generated
> on demand, so the first visitor would still experience the delay but the rest
> would be fast until the page expires. Even expiring pages as often as five
> minutes or less would probably provide significant processing savings
> (depending on how many users you have), and that level of staleness and the
> occasional delays may be acceptable to your users.
>
> As you say, generating the entire cache upfront and continuously is wasteful
> and probably unrealistic, but any type of caching, by definition, is going to
> involve users seeing stale content, and I don't see that you have any other
> option but some type of caching. Well, you could reproduce what git does in a
> bunch of distributed algorithms and run your app on a farm--which, I guess, is
> probably what GitHub is doing--but throwing up a caching reverse proxy is a
> lot quicker if you can accept the caveats.

I don't think GitHub / Gitorious / whatever have solved this problem
at all. They're terribly slow on big repos; some pages don't even
generate the first time you click on the link.

I'm totally fine with daily updates, but I think there has to be a
better way of doing this than spending 0.5s of CPU time and 5s of
HDD time (if completely cold) on each blame / log. I'd be willing to
pay for it with more storage, some pre-caching, and fine-grained
incremental updates (daily, in my use case).

C.
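
For a rough sense of the numbers in this thread, here is a
back-of-the-envelope sketch, not a measurement. The 0.5s warm cost, the
5s cold disk penalty and the two busy cores out of four (50% capacity)
come from the messages above. The page count of 170,000 and the 1% daily
change rate are assumptions picked purely for illustration, and
`regen_hours` is a hypothetical helper, not part of gitweb or git.

```python
def regen_hours(pages, cpu_s_per_page, workers=1, io_s_per_page=0.0):
    """Wall-clock hours to render `pages` pages, split evenly over `workers`."""
    return pages * (cpu_s_per_page + io_s_per_page) / workers / 3600.0


PAGES = 170_000        # assumption: total blame/log pages across all repos
CHANGED_SHARE = 0.01   # assumption: share of pages that change per day

# Warm cache, 0.5s per page, two of four cores kept busy (50% capacity).
warm = regen_hours(PAGES, 0.5, workers=2)

# Completely cold, one slow core: 0.5s of CPU plus 5s of disk per page.
cold = regen_hours(PAGES, 0.5, workers=1, io_s_per_page=5.0)

# Regenerating only the pages that changed since the previous run.
incremental = regen_hours(PAGES * CHANGED_SHARE, 0.5, workers=2)

print(f"warm full pass:   {warm:.1f} h")
print(f"cold full pass:   {cold / 24:.1f} days")
print(f"incremental pass: {incremental * 60:.1f} min")
```

With these assumed inputs, the warm full pass comes out just under 12
hours, the cold single-worker pass takes over ten days, and the
incremental pass takes a few minutes. The real figures depend entirely
on the actual page count and change rate.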