From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Derrick Stolee <dstolee@microsoft.com>, git@vger.kernel.org
Subject: Re: [PATCH] revision.c: reduce object database queries
Date: Wed, 28 Feb 2018 01:37:07 -0500
Message-ID: <20180228063707.GA4409@sigill.intra.peff.net>
In-Reply-To: <xmqqefl6nhud.fsf@gitster-ct.c.googlers.com>

On Tue, Feb 27, 2018 at 03:16:58PM -0800, Junio C Hamano wrote:

> >> This code comes originally from 454fbbcde3 (git-rev-list: allow missing
> >> objects when the parent is marked UNINTERESTING, 2005-07-10). But later,
> >> in aeeae1b771 (revision traversal: allow UNINTERESTING objects to be
> >> missing, 2009-01-27), we dealt with calling parse_object() on the
> >> parents more directly.
> >>
> >> So what I wonder is whether this code is simply redundant and can go
> >> away entirely. That would save the has_object_file() call in all cases.
> 
> Hmm, interesting. I forgot all what I did around this area, but you
> are right.

I'll leave it to Stolee whether he wants to dig into removing the
has_object_file() call. I think it would do the right thing, but the
most interesting bit would be how it impacts the timings.

> > There's a similar case for trees. ...
> > though technically the existing code allows _missing_ trees, but
> > not corrupt ones.
> 
> True, but the intention of this "do not care too much about missing
> stuff while marking uninteresting" effort is aligned better with
> ignoring corrupt ones, too, I would think, as "missing" in that
> sentence is in fact about "not available", and stuff that exists in
> corrupt form is still not available anyway.  So I do not think it
> is a bad change to start allowing corrupt ones.

Agreed. Here it is in patch form, though as we both said, it probably
doesn't matter that much in practice. So I'd be OK dropping it out of
a sense of conservatism.
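
To make the lookup counting concrete, here is a toy model of the two
shapes. It is only a sketch: the toy_* names and the enums are made up
for illustration and are not git's API, and it assumes that the
existence check and the parse each cost exactly one object database
access.

```c
/* Toy model only; none of these names exist in git. */
enum toy_state { TOY_OK, TOY_MISSING, TOY_CORRUPT };
enum toy_result { TOY_WALKED, TOY_SKIPPED, TOY_WOULD_DIE };

/* Counts simulated object database accesses. */
static int toy_lookups;

/* Hypothetical existence check: one access, true unless missing. */
static int toy_has_object(enum toy_state s)
{
	toy_lookups++;
	return s != TOY_MISSING;
}

/* Hypothetical parse: one access, fails if missing or corrupt. */
static int toy_parse(enum toy_state s)
{
	toy_lookups++;
	return s == TOY_OK ? 0 : -1;
}

/* Old shape: check for existence first, then parse. */
enum toy_result toy_mark_old(enum toy_state s)
{
	if (!toy_has_object(s))
		return TOY_SKIPPED;	/* missing: return quietly */
	if (toy_parse(s) < 0)
		return TOY_WOULD_DIE;	/* corrupt: stands in for die() */
	return TOY_WALKED;
}

/* New shape: just try to parse, and return quietly on any failure. */
enum toy_result toy_mark_new(enum toy_state s)
{
	if (toy_parse(s) < 0)
		return TOY_SKIPPED;
	return TOY_WALKED;
}
```

In this model a healthy tree costs two accesses in the old shape and
one in the new. The only behavior difference is the corrupt case, which
goes from TOY_WOULD_DIE to TOY_SKIPPED.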

-- >8 --
Subject: [PATCH] mark_tree_contents_uninteresting: drop has_object check

It's generally acceptable for UNINTERESTING objects in a
traversal to be unavailable (e.g., see aeeae1b771). When
marking trees UNINTERESTING, we access the object database
twice: once to check if the object is missing (and return
quietly if it is), and then again to actually parse it.

We can instead just try to parse; if that fails, we can then
return quietly. That halves the effort we spend on locating
the object.

Note that this isn't _exactly_ the same as the original
behavior, as the parse failure could be due to other
problems than a missing object: it could be corrupted, in
which case the original code would have died. But the new
behavior is arguably better, as it covers the object being
unavailable for any reason. We'll also still issue a warning
to stderr in such a case.

Signed-off-by: Jeff King <peff@peff.net>
---
 revision.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index 5ce9b93baa..221d62c52b 100644
--- a/revision.c
+++ b/revision.c
@@ -51,12 +51,9 @@ static void mark_tree_contents_uninteresting(struct tree *tree)
 {
 	struct tree_desc desc;
 	struct name_entry entry;
-	struct object *obj = &tree->object;
 
-	if (!has_object_file(&obj->oid))
+	if (parse_tree_gently(tree, 1) < 0)
 		return;
-	if (parse_tree(tree) < 0)
-		die("bad tree %s", oid_to_hex(&obj->oid));
 
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	while (tree_entry(&desc, &entry)) {
-- 
2.16.2.582.ge2c16ac3c4