Re: [RFD] My thoughts about implementing gitweb output caching


From: Jonathan Nieder <jrnieder@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>
Cc: git@vger.kernel.org, "J.H." <warthog9@eaglescrag.net>,
	John 'Warthog9' Hawley <warthog9@kernel.org>
Subject: Re: [RFD] My thoughts about implementing gitweb output caching
Date: Fri, 7 Jan 2011 18:26:43 -0600
Message-ID: <20110108002643.GD15495@burratino>
In-Reply-To: <201101080042.36156.jnareb@gmail.com>

Hi,

Thanks for these design notes.  A few uninformed reactions.

Jakub Narebski wrote:

> There was request to support installing gitweb modules in a separate
> directory, but that would require changes to "gitweb: Prepare for
> splitting gitweb" patch (but it is doable).  Is there wider interest
> in supporting such feature?

If you are referring to my suggestion, I see no reason to wait on
that.  The lib/ dir can be made configurable later.

> Simplest solution is to use $cgi->self_url() (note that what J.H. v8
> uses, i.e.: "$my_url?". $ENV{'QUERY_STRING'}, where $my_url is just
> $cgi->url() is not enough - it doesn't take path_info into account).
>
> Alternate solution, which I used in my rewrite, is to come up with
> "canonical" URL, e.g. href(-replay => 1, -full => 1, -path_info => 0);
> with this solution using path_info vs query parameters or reordering
> query parameters still gives the same key.

It is easy to miss dependencies on parts of the URL that are being
fuzzed out.  For example, the <base href...> tag is only inserted with
path_info.  Maybe it would be less risky to first use self_url(), then
canonicalize it in a separate patch?
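
To illustrate the trade-off, here is a minimal sketch in Python (not
gitweb's Perl, and not the href() code from either patch series).  The
names raw_key and canonical_key are hypothetical.  It only shows that a
raw-URL key treats reordered parameters as different pages, while a
canonicalized key merges them; it does not attempt the path_info to
query-parameter mapping, which is exactly the part that is easy to get
wrong.

```python
import re

def raw_key(url):
    """Key on the URL exactly as requested (roughly what keying on
    self_url() would give): safe, but reordered parameters miss the cache."""
    return url

def canonical_key(url):
    """Hypothetical canonical key: split the query on ';' or '&' and
    sort the parameters so that their order does not matter."""
    path, _, query = url.partition("?")
    params = sorted(p for p in re.split(r"[;&]", query) if p)
    return path + "?" + ";".join(params)

a = "/gitweb.cgi?p=git.git;a=summary"
b = "/gitweb.cgi?a=summary;p=git.git"

print(raw_key(a) == raw_key(b))              # different keys
print(canonical_key(a) == canonical_key(b))  # same key
```

Anything the canonical key drops or merges must not influence the
generated page, which is why starting from the raw key seems lower
risk.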

> J.H. patches up and including v7, and my rewrite up and including v6,
> excluded error pages from caching.  I think that the original reasoning
> behind choosing to do it this way was that A.), each of specific error
> pages is usually accessed only once, so caching them would only take up
> space bloating cache, but what is more important B.) that you can't
> cache errors from caching engine.

Perhaps there is a user experience reason?  If I receive an error page
due to a problem with my repository, all else being equal, I would
prefer that it be fixed the next time I reload.  By comparison, having
to reload multiple times to forget an obsolete non-error response
would be less aggravating and perhaps expected.

But the benefit from caching e.g. a response from a broken link would
outweigh that.

> Second is if there is no stale data to serve (or data is too stale), but
> we have progress indicator.  In this case the foreground process is
> responsible for rendering progress indicator, and background process is
> responsible for generating data.  In this case foreground process waits
> for data to be generated (unless progress info subroutine exits), so
> strictly speaking we don't need to detach background process in this
> case.

What happens when the client gets tired of waiting and closes the
connection?

> With output caching gitweb can also support 'Range' requests, which
> means that it would support resumable download.  This would mean that we
> would be able to resume downloading of snapshot (or in the future
> bundle)... if we cannot do this now.  This would require some more code
> to be added.

Exciting stuff.
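
For what it's worth, once the full response sits in the cache, serving
a single byte range is mostly header parsing.  A rough sketch in
Python, assuming the cached entry is available as a bytes object; the
function names parse_byte_range and serve_range are made up for
illustration and nothing here comes from the gitweb patches.  For
brevity it handles only one range and reports malformed headers as 416,
where a real server would ignore them and send the whole entity.

```python
import re

def parse_byte_range(header, size):
    """Return an inclusive (start, end) pair for a single 'bytes=' range,
    or None if the header is malformed or cannot be satisfied."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not m or size <= 0:
        return None
    first, last = m.groups()
    if first == "":
        # Suffix form 'bytes=-N': the last N bytes of the entity.
        if last == "" or int(last) == 0:
            return None
        return (max(size - int(last), 0), size - 1)
    start = int(first)
    if start >= size:
        return None
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None
    return (start, end)

def serve_range(cached, header):
    """Return (status, headers, body) for a Range request against cached bytes."""
    size = len(cached)
    rng = parse_byte_range(header, size)
    if rng is None:
        return 416, {"Content-Range": "bytes */%d" % size}, b""
    start, end = rng
    headers = {"Content-Range": "bytes %d-%d/%d" % (start, end, size)}
    return 206, headers, cached[start:end + 1]

# Resuming a 10-byte "snapshot" from offset 4.
status, headers, body = serve_range(b"0123456789", "bytes=4-")
print(status, headers["Content-Range"], body)
```

A resumed download is only valid if the cached entry has not been
regenerated in between, so a real implementation would also need a
validator such as ETag or Last-Modified.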

Teaching gitweb to generate bundles sounds like a recipe for high server
loads, though.  I suspect manual (or by cronjob) generation would work
better, with a possible exception of very frequently cloned and
infrequently pushed-to repos like Linus's linux-2.6.

Jonathan


