Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jakub Narebski <jnareb@gmail.com>
To: Lea Wiemann <lewiemann@gmail.com>
Cc: Rafael Garcia-Suarez <rgarciasuarez@gmail.com>,
	git@vger.kernel.org, Luben Tuikov <ltuikov@yahoo.com>
Subject: Re: [PATCH] Avoid errors from git-rev-parse in gitweb blame
Date: Wed, 4 Jun 2008 02:11:28 +0200	[thread overview]
Message-ID: <200806040211.29430.jnareb@gmail.com> (raw)
In-Reply-To: <4845CF9F.10604@gmail.com>

Lea Wiemann wrote:
> Jakub Narebski wrote:
>>
>> I don't think %parent_commits hash is suitable for caching; it is only
>> intermediate step, reducing number of git command calls (and forks) [...]
>> 
>> ATTENTION! This example shows where caching [parsed] data have problems 
>> compared to front-end caching (caching output).
> 
> ATTENTION!  Could we please stop having this discussion?!

Yeah, yeah, I know.  "Talk is cheap, show me the code" (or at least
pseudocode).

> Your argument  
> is completely bogus.  If the parent commit hashes are in cache, it's an 
> almost zero-time cache lookup.

You have cut a bit too much (quoted a bit too little) for me to decide
if I made myself clear wrt. saving %parent_commits hash into cache.

What I wanted to say that in caching intermediate data for 'blame' view
you have to save to cache something like @blocks (or @lines) array.
This array can contain parents of blamed commits, so there is no need
for saving %parent_commits separately: it would be duplication of
information.  This hash is needed to reduce number of calls to
git-rev-parse, and is used to generate parsed info, which info in turn
(I think) can be cached.

> The only difference it might make  
> compared front-end caching is the CPU time it takes to generate the 
> page, and *I want to see benchmarks before I even start thinking about 
> CPU*.  Okay?  Good, thanks.

The only place where I think front-end caching could be better is
'blob' view with syntax highlighting (using some external filter, like
GNU Source Highlight)... which is not implemented yet.

I thought that snapshots (if enabled) would fall in this category, but
this is the case where data cache is almost identical to output cache
(the same happens for [almost] all "raw" / *_plain views).

> Sorry I'm a little indignant, but you seem to be somehow trying to tell 
> me what to implement, and that gets annoying after a while.  I don't 
> mind your input, but at some point the discussion just doesn't go any 
> further.
> 
>> Problems occur when we try to cache page with _streaming_ output, such 
>> as blob view, blame view, diff part of commitdiff etc.
> 
> We can still stream backend-cache-backed data, though it's a little 
> harder.  It's mostly a memory, not a performance issue though -- the 
> only point where I think it actually would be performance-relevant is 
> blame, and blame doesn't stream anyway (see below).

And snapshots.  We certainly want to stream snapshots, as they can be
quite large.

Also blob_plain view might be difficult, if there are extremely large
binary files in the repository (it should not happen often, but it can
happen).

[...]
>>> 2) Major point: You're still forking a lot.  The Right Thing is to
>>> condense everything into a single call
>> 
>> This is not a good solution for 'blame' view, which is generated "on the 
>> fly", by streaming git-blame output via filter.
> 
> No, whether you have your "while <$fd>" loop or not doesn't make a 
> difference.

It perhaps makes no difference performance wise (solution with
"git rev-list --parents --no-walk" has one fork more), but it might
make code unnecessarily more complicated.  In the rev-list solution
you have to browse git-blame output to gather all blamed commits one
want to find parents of; in the case of extending git-blame you can
just process block after block of code.

> Blame first calculates the whole blame and then dumps it  
> out in zero-time, unless you use --incremental.

There is some code in the mailing list archive (and perhaps used by
repo's gitweb, but I might be mistaken), which adds
git_blame_incremental and use AJAX together with "git blame --incremental"
to reduce latency.  It was done by having JavaScript check if browser
is AJAX-capable, and if it was rewriting 'blame' links to
'blame_incremental'.  But if there exist cached blame, I think it would
be as fast (in terms of latency) to generate 'blame' from cache as to
generate 'blame_incremental'.

> So there's no  
> performance difference in getting all blame output and then dumping it 
> out vs. reading and outputting it line-by-line.

Performance wise, perhaps not.  Memory wise, perhaps yes; better not
to use more memory than needed, especially if memcached is to share
machine.

> And regarding memory,  
> if your blame output doesn't fit into your RAM, you have different kinds 
> of issues.

True.

> JFTR, I don't have any opinion about extending the porcelain output of 
> git-blame (apart from the fact that happens to not be useful for gitweb 
> for the reason I outlined in the previous paragraph).

It would be/might be (I haven't examined corner cases yet) important in
the case of file history which both contains evil merges, and it's
simplified history is different than full history.
-- 
Jakub Narebski
Poland

next prev parent reply	other threads:[~2008-06-04  0:12 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-03 10:46 [PATCH] Avoid errors from git-rev-parse in gitweb blame Rafael Garcia-Suarez
2008-06-03 11:42 ` Lea Wiemann
2008-06-03 11:43 ` Jakub Narebski
2008-06-03 12:03   ` Rafael Garcia-Suarez
2008-06-03 12:45     ` Jakub Narebski
2008-06-03 13:00       ` Rafael Garcia-Suarez
2008-06-03 13:12         ` Jakub Narebski
2008-06-03 13:36           ` Rafael Garcia-Suarez
2008-06-03 14:14             ` Jakub Narebski
2008-06-03 14:40               ` Rafael Garcia-Suarez
2008-06-03 14:56                 ` Jakub Narebski
2008-06-03 15:07                   ` Rafael Garcia-Suarez
2008-06-03 17:50                     ` Jakub Narebski
2008-06-03 21:09                   ` Luben Tuikov
2008-06-03 21:03               ` Luben Tuikov
2008-06-03 20:35         ` Luben Tuikov
2008-06-03 21:31           ` Jakub Narebski
2008-06-04  5:58             ` Junio C Hamano
2008-06-04 14:03               ` Jakub Narebski
2008-06-05  6:07                 ` Junio C Hamano
2008-06-05  6:09                   ` [PATCH 1/2] git-blame: refactor code to emit "porcelain format" output Junio C Hamano
2008-06-06  9:22                     ` Jakub Narebski
2008-06-05  6:09                   ` [PATCH 2/2] blame: show "previous" information in --porcelain/--incremental format Junio C Hamano
2008-06-06  9:27                     ` Jakub Narebski
2008-06-06 15:17                       ` Junio C Hamano
2008-06-06 15:44                         ` Jakub Narebski
2008-06-06  0:26                   ` [PATCH] Avoid errors from git-rev-parse in gitweb blame Jakub Narebski
2008-06-04 22:24               ` Luben Tuikov
2008-06-03 14:24       ` Lea Wiemann
2008-06-03 20:24         ` Jakub Narebski
2008-06-03 23:11           ` Lea Wiemann
2008-06-04  0:11             ` Jakub Narebski [this message]
2008-06-04  0:39               ` Lea Wiemann
2008-06-04 12:31                 ` Jakub Narebski
2008-06-08 18:19             ` Lea Wiemann
2008-06-08 20:28               ` Jakub Narebski
2008-06-03 20:18   ` Luben Tuikov
2008-06-03 20:29     ` Jakub Narebski
2008-06-03 21:27       ` Luben Tuikov
2008-06-03 21:34         ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200806040211.29430.jnareb@gmail.com \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=lewiemann@gmail.com \
    --cc=ltuikov@yahoo.com \
    --cc=rgarciasuarez@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.