git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, git-dev@github.com
Subject: Re: [PATCH] parse_object: try internal cache before reading object db
Date: Thu, 5 Jan 2012 17:18:50 -0500	[thread overview]
Message-ID: <20120105221849.GA5459@sigill.intra.peff.net> (raw)
In-Reply-To: <7vehvdn7at.fsf@alter.siamese.dyndns.org>

On Thu, Jan 05, 2012 at 01:55:22PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The worst potential problem I could come up with is if you somehow had
> > an object whose "parsed" flag was set, but somehow didn't have its other
> > fields set (like type).
> > ...
> > So I think it is safe short of somebody doing some horrible manual
> > munging of a "struct object".
> 
> Yeah, I was worried about codepaths like commit-pretty-printing might be
> mucking with the contents of commit->buffer, perhaps reencoding the text
> and then calling parse_object() to get the unmodified original back, or
> something silly like that. But the lookup_object() call at the beginning
> of the parse_object() already prevents us from doing such a thing, so we
> should be OK, I would think.

Er, without my patch there is no such lookup_object, is there?

What saves you is that the parse_*_buffer functions all do nothing when
the object.parsed flag is set, and the code I added makes sure that
object.parsed is set in the object that lookup_object returns.

So yeah, anytime you tweak the contents of commit->buffer but don't
unset the "parsed" flag, you are asking for trouble.

Here's another possible code path where the behavior is changed:

  1. You set the global save_commit_buffer to 0.

  2. You call parse_commit(commit) on an unparsed commit object, which
     does not save the commit buffer, but does set
     commit->object.parsed.

  3. You call parse_object(commit->object.sha1).

     a. Without my patch, we read the file contents again, do _not_
        re-parse them (because we look up the existing object and notice
        that its "parsed" flag is set), but we _do_ assign the buffer to
        commit->buffer.

     b. With my patch, we see that there is an existing object that is
        already parsed, and return early. commit->buffer remains NULL.

I would argue that this doesn't matter, since "parse_commit" uses the
exact same optimization (it returns early without setting commit->buffer
if the parsed flag is set). So any program turning off
save_commit_buffer has to be ready to deal with a NULL commit->buffer in
the first place. The only exception would be a program that then tries
to fill in the commit->buffer field by manually running parse_object on
an already-parsed, buffer-less commit object. I don't think we do that.

You can verify that commit->buffer is the only place where these issues
can happen by following the logic in parse_object_buffer.

Sorry to belabor the discussion, but this is such a core piece of code,
I want to make sure the optimization isn't hurting anybody (I don't
think it is, and certainly the tests are all happy, but I think talking
through the cases is a good thing).

-Peff

  reply	other threads:[~2012-01-05 22:18 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-05 21:00 [PATCH] parse_object: try internal cache before reading object db Jeff King
2012-01-05 21:35 ` Junio C Hamano
2012-01-05 21:49   ` Jeff King
2012-01-05 21:55     ` Junio C Hamano
2012-01-05 22:18       ` Jeff King [this message]
2012-01-06 19:16   ` Jeff King
2012-01-06 21:27     ` Junio C Hamano
2012-01-06 22:33       ` Jeff King
2012-01-06 22:45         ` Junio C Hamano
2012-01-06 22:46           ` Jeff King
2012-01-06 19:17   ` [PATCH 1/2] upload-pack: avoid parsing objects during ref advertisement Jeff King
2013-01-18 23:12     ` Junio C Hamano
2013-01-24  7:50       ` Jeff King
2013-01-24 17:25         ` Junio C Hamano
2013-01-29  8:10     ` Shawn Pearce
2013-01-29  8:14       ` Jeff King
2012-01-06 19:18   ` [PATCH 2/2] upload-pack: avoid parsing tag destinations Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120105221849.GA5459@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git-dev@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).