Re: [RFD] My thoughts about implementing gitweb output caching

git.vger.kernel.org archive mirror

From: Jonathan Nieder <jrnieder@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org, "J.H." <warthog9@eaglescrag.net>,
	John 'Warthog9' Hawley <warthog9@kernel.org>
Subject: Re: [RFD] My thoughts about implementing gitweb output caching
Date: Fri, 7 Jan 2011 18:26:43 -0600
Message-ID: <20110108002643.GD15495@burratino>
In-Reply-To: <201101080042.36156.jnareb@gmail.com>

Hi,

Thanks for these design notes.  A few uninformed reactions.

Jakub Narebski wrote:

> There was a request to support installing gitweb modules in a separate
> directory, but that would require changes to the "gitweb: Prepare for
> splitting gitweb" patch (but it is doable).  Is there wider interest
> in supporting such a feature?

If you are referring to my suggestion, I see no reason to wait on
that.  The lib/ dir can be made configurable later.

> The simplest solution is to use $cgi->self_url().  Note that what J.H.'s v8
> uses, i.e. "$my_url?" . $ENV{'QUERY_STRING'} where $my_url is just
> $cgi->url(), is not enough: it doesn't take path_info into account.
>
> An alternative solution, which I used in my rewrite, is to come up with a
> "canonical" URL, e.g. href(-replay => 1, -full => 1, -path_info => 0);
> with this solution, using path_info vs. query parameters, or reordering
> query parameters, still gives the same key.

It is easy to miss dependencies on parts of the URL that are being
fuzzed out.  For example, the <base href...> tag is only inserted with
path_info.  Maybe it would be less risky to first use self_url(), then
canonicalize it in a separate patch?
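To make the "canonical URL" idea quoted above concrete, here is a minimal sketch in Python (not gitweb's Perl, and not how href() actually works). It assumes a simplified path_info form of /<project>[/<action>], mapped onto the 'p' and 'a' parameters; the function name, the default script name "/gitweb.cgi", and that mapping are all hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def canonical_cache_key(url: str, script_name: str = "/gitweb.cgi") -> str:
    """Build an order-independent cache key from a gitweb-style URL.

    Hypothetical simplification: path_info is assumed to look like
    /<project>[/<action>] and is mapped onto the 'p' and 'a' parameters.
    Real gitweb path_info parsing handles many more forms than this.
    """
    parts = urlsplit(url)
    # dict() keeps only the last value of a repeated parameter.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))

    # Strip the script name to get whatever path_info remains.
    path = parts.path
    if path.startswith(script_name):
        path = path[len(script_name):]
    segments = [s for s in path.split("/") if s]
    # Explicit query parameters win over values taken from path_info.
    if segments:
        params.setdefault("p", segments[0])
    if len(segments) > 1:
        params.setdefault("a", segments[1])

    # Sorting makes the key independent of parameter order.
    query = urlencode(sorted(params.items()))
    return f"{parts.scheme}://{parts.netloc}{script_name}?{query}"
```

For example, "http://example.com/gitweb.cgi/git.git/summary" and "http://example.com/gitweb.cgi?p=git.git&a=summary" both map to "http://example.com/gitweb.cgi?a=summary&p=git.git".  Such a key deliberately forgets whether path_info was used, which is exactly where a dependency like the <base href...> tag could slip through.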

> J.H.'s patches up to and including v7, and my rewrite up to and including
> v6, excluded error pages from caching.  I think that the original reasoning
> behind choosing to do it this way was that A) each specific error page
> is usually accessed only once, so caching them would only bloat the
> cache, but, more importantly, B) you can't cache errors from the caching
> engine itself.

Perhaps there is a user experience reason?  If I receive an error page
due to a problem with my repository, all else being equal, I would
prefer that the next time I reload it is fixed.  By comparison, having
to reload multiple times to forget an obsolete non-error response
would be less aggravating and perhaps expected.

But the benefit from caching e.g. a response from a broken link would
outweigh that.

> The second case is when there is no stale data to serve (or the data is
> too stale), but we have a progress indicator.  In this case the foreground
> process is responsible for rendering the progress indicator, and the
> background process is responsible for generating data.  The foreground
> process waits for the data to be generated (unless the progress info
> subroutine exits), so strictly speaking we don't need to detach the
> background process in this case.

What happens when the client gets tired of waiting and closes the
connection?
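The foreground/background split quoted above can be sketched roughly as follows.  This is a Python illustration under stated assumptions: a thread stands in for the forked background process, and the callback names generate and emit_progress are hypothetical.  It also shows where the question above bites: if emit_progress raises because the client went away, the exception propagates out of the foreground loop, but nothing here cancels the worker.

```python
import threading
from typing import Callable, Dict


def generate_with_progress(generate: Callable[[], bytes],
                           emit_progress: Callable[[int], None],
                           interval: float = 0.05) -> bytes:
    """Run `generate` in a background worker while the foreground calls
    `emit_progress` every `interval` seconds until the data is ready.

    Assumption: a thread stands in for gitweb's background process.
    """
    result: Dict[str, bytes] = {}
    done = threading.Event()

    def worker() -> None:
        try:
            result["data"] = generate()
        finally:
            # Signal completion even if generation failed.
            done.set()

    threading.Thread(target=worker, daemon=True).start()

    ticks = 0
    # Event.wait() returns False on timeout and True once the worker is done.
    while not done.wait(interval):
        ticks += 1
        emit_progress(ticks)  # a closed client connection would raise here

    if "data" not in result:
        raise RuntimeError("background generation failed")
    return result["data"]
```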

> With output caching gitweb can also support 'Range' requests, which
> means that it would support resumable downloads.  This would mean that we
> would be able to resume downloading a snapshot (or, in the future, a
> bundle)... if we cannot do this already.  This would require some more
> code to be added.

Exciting stuff.
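With the full output cached, its length is known up front, so a single byte range can be served by seeking into the cache file.  A minimal Python sketch of the header parsing, following the HTTP byte-range syntax (RFC 7233, now RFC 9110); the function name is hypothetical and multi-range requests are deliberately rejected.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header: str, length: int) -> Optional[Tuple[int, int]]:
    """Parse a single 'Range: bytes=...' value against a cached response
    of `length` bytes.  Returns an inclusive (start, end) pair, or None
    if the range is malformed, unsatisfiable, or a multi-range request."""
    m = _RANGE_RE.match(header.strip())
    if not m or length <= 0:
        return None
    first, last = m.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix range: "bytes=-N" asks for the last N bytes.
        n = int(last)
        if n == 0:
            return None
        return (max(0, length - n), length - 1)
    start = int(first)
    if start >= length:
        return None
    # An open-ended or overlong range is clamped to the end of the data.
    end = length - 1 if last == "" else min(int(last), length - 1)
    if end < start:
        return None
    return (start, end)
```

A satisfiable range would then be answered with "206 Partial Content" and a "Content-Range: bytes start-end/length" header.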

Teaching gitweb to generate bundles sounds like a recipe for high server
loads, though.  I suspect manual (or by cronjob) generation would work
better, with a possible exception of very frequently cloned and
infrequently pushed-to repos like Linus's linux-2.6.

Jonathan
