git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Axel Bonnet <axel.bonnet@ensimag.imag.fr>,
	Cl??ment Poulain <clement.poulain@ensimag.imag.fr>,
	Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Subject: Re: [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid
Date: Sat, 18 Dec 2010 23:55:15 +0300	[thread overview]
Message-ID: <20101218205514.GA21249@landau.phys.spbu.ru> (raw)
In-Reply-To: <20101218161337.GB18643@sigill.intra.peff.net>

On Sat, Dec 18, 2010 at 11:13:37AM -0500, Jeff King wrote:
> On Sat, Dec 18, 2010 at 05:54:12PM +0300, Kirill Smelkov wrote:
> 
> > It turned out, under blame there are requests to fill_textconv() with
> > sha1=0000000000000000000000000000000000000000 and sha1_valid=0.
> > 
> > As the code did not analyzed sha1 validity, we ended up putting 000000
> > into textconv cache which was fooling later blames to discover lots of
> > lines in 'Not Yet Committed' state.
> > [...]
> > -	if (driver->textconv_cache) {
> > +	if (driver->textconv_cache && df->sha1_valid) {
> >  		*outbuf = notes_cache_get(driver->textconv_cache, df->sha1,
> >  					  &size);
> 
> In short:
> 
>   Acked-by: Jeff King <peff@peff.net>
> 
> But it took some thinking to convince myself, so the long answer is
> below if anyone cares.
> 
> I was dubious at first that this could be the right solution. We still
> end up putting the filespec through run_textconv, which didn't seem
> right if it is not valid.
> 
> But reading into it more, there are two levels of invalidity:
> 
>   1. !DIFF_FILE_VALID(df) - we are not a valid file at all. I.e., we are
>      /dev/null.
> 
>   2. !df->sha1_valid - we are pointing to a working tree file whose sha1
>      we don't know
> 
> I think level (2) never happens at all in the regular diff code, which
> is why this case was completely unhandled. But it is OK in that case
> (required, even) to put the contents through run_textconv.
> 
> In theory we could actually calculate the sha1 in case (2) and cache
> under that, but I don't know how much it would buy us in practice. It
> saves us running the textconv filter at the expense of computing the
> sha1. Which is probably a win for most filters, but on the other hand,
> it is the wrong place to compute such a sha1. If it is a working tree
> file, we should ideally update our stat info in the index so that the
> info can be reused.

Jeff,

Thanks for your ACK and for the explanation.

My last patches to git were blame related so semi-intuitively I knew
that invalid sha1 are coming from files in worktree. Your description
makes things much more clear and I'd put it into patch log as well.
What is the best practice for this? For me to re-roll, or for Junio to
merge texts?

Also experimenting shows that, as you say, regular diff does not have this
problem, and also that diff calculates sha1 for files in worktree and so
their textconv results are cached. So clearly, there are some behaviour
differences between diff and blame.


Thanks again,
Kirill

  reply	other threads:[~2010-12-18 20:55 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-18 14:54 [PATCH 1/2] t/t8006: Demonstrate blame is broken when cachetextconv is on Kirill Smelkov
2010-12-18 14:54 ` [PATCH 2/2] fill_textconv(): Don't get/put cache if sha1 is not valid Kirill Smelkov
2010-12-18 16:13   ` Jeff King
2010-12-18 20:55     ` Kirill Smelkov [this message]
2010-12-19  3:23       ` Junio C Hamano
2010-12-19 12:10         ` Kirill Smelkov
2010-12-20  2:41           ` Junio C Hamano
2010-12-20  4:46             ` Jeff King
2010-12-20 19:28               ` Kirill Smelkov
2010-12-20  2:26     ` Junio C Hamano
2010-12-20  4:42       ` Jeff King
2010-12-20  8:42         ` Junio C Hamano
2010-12-20  2:32   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101218205514.GA21249@landau.phys.spbu.ru \
    --to=kirr@landau.phys.spbu.ru \
    --cc=axel.bonnet@ensimag.imag.fr \
    --cc=clement.poulain@ensimag.imag.fr \
    --cc=diane.gasselin@ensimag.imag.fr \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).