From: Jakub Narebski <jnareb@gmail.com>
To: "J.H." <warthog9@eaglescrag.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 00/18] Gitweb caching v8
Date: Fri, 10 Dec 2010 02:27:27 +0100 [thread overview]
Message-ID: <201012100227.27903.jnareb@gmail.com> (raw)
In-Reply-To: <4D017796.4030506@eaglescrag.net>
On Fri, 10 Dec 2010, J.H. wrote:
> On 12/09/2010 03:26 PM, Jakub Narebski wrote:
>> John, could you please in the future Cc me? I am interested in gitweb
>> output caching development. Thanks in advance.
>
> Apologies, apparently screwed up on my git send-email line. I'll get
> that right one of these eons.
Ah, I can understand this.
>> I guess that in the final submission (i.e. the one that is to be
>> merged in into git.git repository) those changes would be squashed in,
>> isn't it?
>
> I have no objections to squashing the reversions into a single patch,
> just figured it was easier to break them out for the time being.
I guess that interdiff in comments would work as well, or almost as well...
>>> There is one additional script I've written that the Fedora folks are using,
>>> and that might be useful to include, which is an 'offline' cache file generator.
>>> It basically wraps gitweb.cgi and at the end moves the cache file into the right
>>> place. The Fedora folks were finding it took hours to generate their front
>>> page, and that doing a background generation almost never completed (due to
>>> process death). This was a simple way to handle that. If people would like
>>> I can add it in as an additional patch.
>>
>> Are you detaching the background process?
Errr... what I meant here is that perhaps detaching background process
would make it not die, but I am guessing here.
> No, in fact I completely turn off forking (using the $cacheDoFork variable.)
BTW. what I don't like is your code forking indiscriminately even if it
is not needed (e.g. background cache generation is turned off).
>
>> It would be nice to have it as separate patch.
>
> I can add it easily enough.
It is only about caching most IO intensive page, i.e. projects_list page,
isn't it? Why doesn't _it_ die, like background process?
>
>>> v8:
>>> - Reverting several changes from Jakub's change set that make no sense
>>> - is_cacheable changed to always return true - nothing special about
>>> blame or blame_incremental as far as the caching engine is concerned
>>
>> 'blame_incremental' is just another version of 'blame' view. I have
>> disabled it when caching is enabled in my rewrite (you instead disabled
>> caching for 'blame_incremental' in your v7 and mine v7.x) because I
>> couldn't get it to work together with caching. Did you check that it
>> works?
>
> blame works fine, blame_incremental generates but doesn't..... ohhhh
> someone added ajaxy kinda stuff and doesn't mention it anywhere.
Errr... I thought that the 'incremental' part is self-explaining that
it is Ajax-y stuff. Well, while commit is 4af819d (gitweb: Incremental
blame (using JavaScript), 2009-09-01), perhaps I should have added some
comment in the code.
>
> Exciting.
>
> blame_data needs to not get a 'generating...' page in all likelihood,
> generating a blame_incremental page, letting it load and then refreshing
> the whole thing gets me what I'm expecting.
Hmmm... I wonder why it didn't work for me at that time...
>
> Is enough to mask.
>
> Guess I'm looking at a v9 now.
>
>> Besides, withou "tee"-ing, i.e. printing output as it is captured,
>> cached 'blame_data' means that 'blame_incremental' is not incremental,
>> and therefore it vanishes its advantage over 'blame'.
I mean here that with current state of caching 'blame_incremental' stops
to be incremental...
> There are only 2 ways to get to a blame_incremental page
>
> 1) By going to a blame page and clicking on the incremental link in the nav
>
> 2) By enabling it by default so when you click 'blame' it goes to
> incremental first.
3) By having JavaScript add ';js=1' to all links, so clicking on
'blame' link (with action set to 'blame') would result in
'blame_incremental' view.
>
>> In the case data is in cache, then 'blame_inremental' doesn't have
>> advantage over 'blame' either.
>
> Agreed, though it's easy enough to support in the caching engine,
> basically don't return 'Generating...' and wait for that data to cache.
> Not really an advantage except that your not waiting for the whole
> generation to get a page back at all.
>
>>> - Reverted change to reset_output as
>>> open STDOUT, ">&", \*STDOUT_REAL;
>>> causes assertion failures:
>>> Assertion !((((s->var)->sv_flags & (0x00004000|0x00008000)) == 0x00008000) && (((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVGV || ((svtype)((s->var)->sv_flags & 0xff)) == SVt_PVLV)) failed: file "scalar.xs", line 49 at gitweb.cgi line 1221.
>>> if we encounter an error *BEFORE* we've ever changed the output.
>>
>> Which Perl version are you using? Because I think you found error in Perl.
>> Well, at least I have not happen on this bug.
>
> This is perl, v5.10.0 built for x86_64-linux-thread-multi
Could you check with newer perl? I don't get this error.
>> I have nothing againts using
>>
>> open STDOUT, ">&STDOUT_REAL";
>>
>> though I really prefer that you used lexical filehandles, instead of
>> "globs" which are global variables.
And using 'print STDOUT_REAL "";' protects against spurious warning
(the warning is really wrong in this case).
>> The following works:
>>
>> open STDOUT, '>&', fileno($fh);
>>
>> Note that fileno(Symbol::qualify_to_ref($fh)) might be needed...
>
> I see 0 advantage to shifting around STDOUT and STDERR to a lexical
> filehandle vs. a glob in this case. STDOUT_REAL retains all the
> properties of STDOUT should it be needed elsewhere, including what it
> was going and what it was doing.
>
> I have no objection to shifting the file handles I'm using to lexical
> variables, if nothing else the argument about them closing when falling
> out of scope is worth it, but for STDOUT, STDERR, etc I don't think
> switching to lexicals makes a lot of sense
Well... I'd have to agree that in current case (capturing engine embedded
in gitweb, and gitweb-specific; no need for recursive capture) it would
be enough to use such globs.
>
>>> - Cleanups there were indirectly mentioned by Jakub
>>> - Elimination of anything even remotely looking like duplicate code
>>> - Creation of isBinaryAction() and isFeedAction()
>>
>> Could you please do not use mixedCase names?
>
> I'm fine with renaming those if you wish.
>
>> First, that is what %actions_info from
>>
>> [PATCH 16/24] gitweb: Introduce %actions_info, gathering information about actions
>> http://thread.gmane.org/gmane.comp.version-control.git/163052/focus=163038
>> http://repo.or.cz/w/git/jnareb-git.git/commitdiff/305a10339b33d56b4a50708d71e8f42453c8cb1f
>>
>> I have invented for.
>
> I have not based any of my caching engine, right now, on anything you've
> done for your rewrite.
What I meant here that if you will be doing yet another version, you
can take a look at it as a way to avoiding not very clear and nice
long alternatives in condition, or in regexp matched.
>
>> Second, why 'isBinaryAction()'? there isn't something inherently
>> different between binary (':raw') and text (':utf8') output, as I have
>> repeatedly said before.
>
> It's a binary action in that you are shoving something down the pipe
> with the intention of sending the bits completely raw. You read the
> data raw, and write the data raw. There is no interpretation of the
> data as being anything but straight raw.
>
> Right now, in gitweb already, there are two places that treat output
> completely differently:
>
> - snapshot
> - blob_plain
>
> The only reason isBinaryAction() (or any other function name or process
> you want to grant it) exists is so that I can figure out if it's one of
> those actions so I can deal with the cache and output handling
> differently for each.
>
> Yes, I could flip the entire caching engine over to following the same
> mantra for everything and thus there is no need to care, but gitweb
> itself isn't really setup to handle that separation cleanly right now,
> and I'm trying to make as few bigger changes right now as is.
Always reading from cache in ':raw' mode and always printing from cache
in ':raw' mode (i.e. setting STDOUT to ':raw' before printing / copying
cache entry) would be in gitweb case enough to not special-case binary
files.
In gitweb you always do "binmode STDOUT, ':raw';" _after_ starting capture,
which means that it gets applied to cache file; and gitweb always do
"binmode STDOUT, ':utf8';" before stopping capture.
If you print text data to file using ':utf8' layer (applied at beginning
to cache file) it is in this file as correct sequence of bytes. Therefore
you can dump said cache file to STDOUT in ':raw' mode (or in ':utf8' mode)
- both STDOUT and read cache file has to have the same mode.
>>> - Added better error handling
>>> - Creation of .err file in the cache directory
>>> - Trap STDERR output into $output_err as this was spewing data prior
>>> to any header information being sent
>>
>> Why it is needed? We capture output of "die" via CGI::Util::set_message,
>> and "warn" output is captured to web server logs... unless you explicitely
>> use "print STDERR <sth>" -- don't do that instead.
>
> I have seen, in several instances, a case where git itself will generate
> an error, it shoves it to STDERR which makes it to the client before
> anything else, thus causing 500 level errors.
>
> Added this so that STDERR got trapped and those messages didn't make it out.
Could you give examples when it happens? Anything that happens after
"use CGI::Carp" is parsed should have STDERR redirected to web server
errors log.
I'll read the actual patch and comment on it.
>
>>> - Added hidden field in footer for url & hash of url, which is extremely useful
>>> for debugging
>>
>> Nice idea, I'll see it. Can it be disabled (information leakage)?
>
> There's not really any information leakage per-se, unless you call
> md5suming the url information leakage.
Ah, sorry, I send this comment before actually reading patch in question.
--
Jakub Narebski
Poland
next prev parent reply other threads:[~2010-12-10 1:27 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-09 21:57 [PATCH 00/18] Gitweb caching v8 John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 01/18] gitweb: Prepare for splitting gitweb John 'Warthog9' Hawley
2010-12-09 23:30 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 02/18] gitweb: add output buffering and associated functions John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 03/18] gitweb: File based caching layer (from git.kernel.org) John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 04/18] gitweb: Minimal testing of gitweb caching John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 05/18] gitweb: Regression fix concerning binary output of files John 'Warthog9' Hawley
2010-12-09 23:33 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 06/18] gitweb: Add more explicit means of disabling 'Generating...' page John 'Warthog9' Hawley
2010-12-09 21:57 ` [PATCH 07/18] gitweb: Revert back to $cache_enable vs. $caching_enabled John 'Warthog9' Hawley
2010-12-09 23:38 ` Jakub Narebski
2010-12-10 2:38 ` J.H.
2010-12-10 13:48 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 08/18] gitweb: Change is_cacheable() to return true always John 'Warthog9' Hawley
2010-12-09 23:46 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 09/18] gitweb: Revert reset_output() back to original code John 'Warthog9' Hawley
2010-12-09 23:58 ` Jakub Narebski
2010-12-10 2:43 ` J.H.
2010-12-09 21:57 ` [PATCH 10/18] gitweb: Adding isBinaryAction() and isFeedAction() to determine the action type John 'Warthog9' Hawley
2010-12-10 0:06 ` Jakub Narebski
2010-12-10 3:39 ` J.H.
2010-12-10 12:10 ` Jakub Narebski
2010-12-10 12:25 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 11/18] gitweb: add isDumbClient() check John 'Warthog9' Hawley
2010-12-10 0:12 ` Jakub Narebski
2010-12-10 4:00 ` J.H.
2010-12-11 0:07 ` Junio C Hamano
2010-12-11 0:15 ` Jakub Narebski
2010-12-11 1:15 ` J.H.
2010-12-11 1:40 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 12/18] gitweb: Change file handles (in caching) to lexical variables as opposed to globs John 'Warthog9' Hawley
2010-12-10 0:16 ` Jakub Narebski
2010-12-10 0:32 ` Junio C Hamano
2010-12-10 0:47 ` Jakub Narebski
2010-12-10 5:56 ` J.H.
2010-12-09 21:57 ` [PATCH 13/18] gitweb: Add commented url & url hash to page footer John 'Warthog9' Hawley
2010-12-10 0:26 ` Jakub Narebski
2010-12-10 6:10 ` J.H.
2010-12-09 21:57 ` [PATCH 14/18] gitweb: add print_transient_header() function for central header printing John 'Warthog9' Hawley
2010-12-10 0:36 ` Jakub Narebski
2010-12-10 6:18 ` J.H.
2010-12-09 21:57 ` [PATCH 15/18] gitweb: Add show_warning() to display an immediate warning, with refresh John 'Warthog9' Hawley
2010-12-10 1:01 ` Jakub Narebski
2010-12-10 7:38 ` J.H.
2010-12-10 14:10 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 16/18] gitweb: When changing output (STDOUT) change STDERR as well John 'Warthog9' Hawley
2010-12-10 1:36 ` Jakub Narebski
2010-12-12 5:25 ` J.H.
2010-12-12 15:17 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 17/18] gitweb: Prepare for cached error pages & better error page handling John 'Warthog9' Hawley
2010-12-10 1:49 ` Jakub Narebski
2010-12-10 8:33 ` J.H.
2010-12-10 20:33 ` Jakub Narebski
2010-12-09 21:57 ` [PATCH 18/18] gitweb: Add better error handling for gitweb caching John 'Warthog9' Hawley
2010-12-10 1:56 ` Jakub Narebski
2010-12-09 23:26 ` [PATCH 00/18] Gitweb caching v8 Jakub Narebski
2010-12-10 0:43 ` J.H.
2010-12-10 1:27 ` Jakub Narebski [this message]
2010-12-10 0:39 ` Junio C Hamano
2010-12-10 0:45 ` J.H.
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201012100227.27903.jnareb@gmail.com \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=warthog9@eaglescrag.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).