From: Jakub Narebski <jnareb@gmail.com>
To: Lea Wiemann <lewiemann@gmail.com>
Cc: Rafael Garcia-Suarez <rgarciasuarez@gmail.com>,
git@vger.kernel.org, Luben Tuikov <ltuikov@yahoo.com>
Subject: Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame
Date: Wed, 4 Jun 2008 02:11:28 +0200 [thread overview]
Message-ID: <200806040211.29430.jnareb@gmail.com> (raw)
In-Reply-To: <4845CF9F.10604@gmail.com>
Lea Wiemann wrote:
> Jakub Narebski wrote:
>>
>> I don't think %parent_commits hash is suitable for caching; it is only
>> intermediate step, reducing number of git command calls (and forks) [...]
>>
>> ATTENTION! This example shows where caching [parsed] data have problems
>> compared to front-end caching (caching output).
>
> ATTENTION! Could we please stop having this discussion?!
Yeah, yeah, I know. "Talk is cheap, show me the code" (or at least
pseudocode).
> Your argument
> is completely bogus. If the parent commit hashes are in cache, it's an
> almost zero-time cache lookup.
You have cut a bit too much (quoted a bit too little) for me to decide
if I made myself clear wrt. saving %parent_commits hash into cache.
What I wanted to say that in caching intermediate data for 'blame' view
you have to save to cache something like @blocks (or @lines) array.
This array can contain parents of blamed commits, so there is no need
for saving %parent_commits separately: it would be duplication of
information. This hash is needed to reduce number of calls to
git-rev-parse, and is used to generate parsed info, which info in turn
(I think) can be cached.
> The only difference it might make
> compared front-end caching is the CPU time it takes to generate the
> page, and *I want to see benchmarks before I even start thinking about
> CPU*. Okay? Good, thanks.
The only place where I think front-end caching could be better is
'blob' view with syntax highlighting (using some external filter, like
GNU Source Highlight)... which is not implemented yet.
I thought that snapshots (if enabled) would fall in this category, but
this is the case where data cache is almost identical to output cache
(the same happens for [almost] all "raw" / *_plain views).
> Sorry I'm a little indignant, but you seem to be somehow trying to tell
> me what to implement, and that gets annoying after a while. I don't
> mind your input, but at some point the discussion just doesn't go any
> further.
>
>> Problems occur when we try to cache page with _streaming_ output, such
>> as blob view, blame view, diff part of commitdiff etc.
>
> We can still stream backend-cache-backed data, though it's a little
> harder. It's mostly a memory, not a performance issue though -- the
> only point where I think it actually would be performance-relevant is
> blame, and blame doesn't stream anyway (see below).
And snapshots. We certainly want to stream snapshots, as they can be
quite large.
Also blob_plain view might be difficult, if there are extremely large
binary files in the repository (it should not happen often, but it can
happen).
[...]
>>> 2) Major point: You're still forking a lot. The Right Thing is to
>>> condense everything into a single call
>>
>> This is not a good solution for 'blame' view, which is generated "on the
>> fly", by streaming git-blame output via filter.
>
> No, whether you have your "while <$fd>" loop or not doesn't make a
> difference.
It perhaps makes no difference performance wise (solution with
"git rev-list --parents --no-walk" has one fork more), but it might
make code unnecessarily more complicated. In the rev-list solution
you have to browse git-blame output to gather all blamed commits one
want to find parents of; in the case of extending git-blame you can
just process block after block of code.
> Blame first calculates the whole blame and then dumps it
> out in zero-time, unless you use --incremental.
There is some code in the mailing list archive (and perhaps used by
repo's gitweb, but I might be mistaken), which adds
git_blame_incremental and use AJAX together with "git blame --incremental"
to reduce latency. It was done by having JavaScript check if browser
is AJAX-capable, and if it was rewriting 'blame' links to
'blame_incremental'. But if there exist cached blame, I think it would
be as fast (in terms of latency) to generate 'blame' from cache as to
generate 'blame_incremental'.
> So there's no
> performance difference in getting all blame output and then dumping it
> out vs. reading and outputting it line-by-line.
Performance wise, perhaps not. Memory wise, perhaps yes; better not
to use more memory than needed, especially if memcached is to share
machine.
> And regarding memory,
> if your blame output doesn't fit into your RAM, you have different kinds
> of issues.
True.
> JFTR, I don't have any opinion about extending the porcelain output of
> git-blame (apart from the fact that happens to not be useful for gitweb
> for the reason I outlined in the previous paragraph).
It would be/might be (I haven't examined corner cases yet) important in
the case of file history which both contains evil merges, and it's
simplified history is different than full history.
--
Jakub Narebski
Poland
next prev parent reply other threads:[~2008-06-04 0:12 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-03 10:46 [PATCH] Avoid errors from git-rev-parse in gitweb blame Rafael Garcia-Suarez
2008-06-03 11:42 ` Lea Wiemann
2008-06-03 11:43 ` Jakub Narebski
2008-06-03 12:03 ` Rafael Garcia-Suarez
2008-06-03 12:45 ` Jakub Narebski
2008-06-03 13:00 ` Rafael Garcia-Suarez
2008-06-03 13:12 ` Jakub Narebski
2008-06-03 13:36 ` Rafael Garcia-Suarez
2008-06-03 14:14 ` Jakub Narebski
2008-06-03 14:40 ` Rafael Garcia-Suarez
2008-06-03 14:56 ` Jakub Narebski
2008-06-03 15:07 ` Rafael Garcia-Suarez
2008-06-03 17:50 ` Jakub Narebski
2008-06-03 21:09 ` Luben Tuikov
2008-06-03 21:03 ` Luben Tuikov
2008-06-03 20:35 ` Luben Tuikov
2008-06-03 21:31 ` Jakub Narebski
2008-06-04 5:58 ` Junio C Hamano
2008-06-04 14:03 ` Jakub Narebski
2008-06-05 6:07 ` Junio C Hamano
2008-06-05 6:09 ` [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output Junio C Hamano
2008-06-06 9:22 ` Jakub Narebski
2008-06-05 6:09 ` [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format Junio C Hamano
2008-06-06 9:27 ` Jakub Narebski
2008-06-06 15:17 ` Junio C Hamano
2008-06-06 15:44 ` Jakub Narebski
2008-06-06 0:26 ` [PATCH] Avoid errors from git-rev-parse in gitweb blame Jakub Narebski
2008-06-04 22:24 ` Luben Tuikov
2008-06-03 14:24 ` Lea Wiemann
2008-06-03 20:24 ` Jakub Narebski
2008-06-03 23:11 ` Lea Wiemann
2008-06-04 0:11 ` Jakub Narebski [this message]
2008-06-04 0:39 ` Lea Wiemann
2008-06-04 12:31 ` Jakub Narebski
2008-06-08 18:19 ` Lea Wiemann
2008-06-08 20:28 ` Jakub Narebski
2008-06-03 20:18 ` Luben Tuikov
2008-06-03 20:29 ` Jakub Narebski
2008-06-03 21:27 ` Luben Tuikov
2008-06-03 21:34 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200806040211.29430.jnareb@gmail.com \
--to=jnareb@gmail.com \
--cc=git@vger.kernel.org \
--cc=lewiemann@gmail.com \
--cc=ltuikov@yahoo.com \
--cc=rgarciasuarez@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).