From: Jeff King <peff@peff.net>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH] revision.c: reduce object database queries
Date: Sun, 25 Feb 2018 20:30:48 -0500 [thread overview]
Message-ID: <20180226013048.GA8677@sigill.intra.peff.net> (raw)
In-Reply-To: <1519522496-73090-1-git-send-email-dstolee@microsoft.com>
On Sat, Feb 24, 2018 at 08:34:56PM -0500, Derrick Stolee wrote:
> In mark_parents_uninteresting(), we check for the existence of an
> object file to see if we should treat a commit as parsed. The result
> is to set the "parsed" bit on the commit.
>
> Modify the condition to only check has_object_file() if the result
> would change the parsed bit.
>
> When a local branch is different from its upstream ref, "git status"
> will compute ahead/behind counts. This uses paint_down_to_common()
> and hits mark_parents_uninteresting(). On a copy of the Linux repo
> with a local instance of "master" behind the remote branch
> "origin/master" by ~60,000 commits, we find the performance of
> "git status" went from 1.42 seconds to 1.32 seconds, for a relative
> difference of -7.0%.
This looks like an obvious and easy optimization, and I'm OK with this
patch.
But looking at the existing code, it makes me wonder a few things:
> diff --git a/revision.c b/revision.c
> index 5ce9b93..bc7def5 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -113,7 +113,8 @@ void mark_parents_uninteresting(struct commit *commit)
> * it is popped next time around, we won't be trying
> * to parse it and get an error.
> */
> - if (!has_object_file(&commit->object.oid))
> + if (!commit->object.parsed &&
> + !has_object_file(&commit->object.oid))
> commit->object.parsed = 1;
We don't actually need the object contents at all right here. This is
just faking the "parsed" flag for later so that calls to parse_object()
don't barf.
This code comes originally form 454fbbcde3 (git-rev-list: allow missing
objects when the parent is marked UNINTERESTING, 2005-07-10). But later,
in aeeae1b771 (revision traversal: allow UNINTERESTING objects to be
missing, 2009-01-27), we marked dealt with calling parse_object() on the
parents more directly.
So what I wonder is whether this code is simply redundant and can go
away entirely. That would save the has_object_file() call in all cases.
We'd stop setting the fake commit->object.parsed flag, which in theory
means we'd make repeated failed calls to parse_commit_gently() if we
visited the same commit over and over. I wonder if add_parents_to_list()
should skip already-UNINTERESTING parents, which would mean we only try
to access each missing candidate once (OTOH missing ones are probably
rare enough that it doesn't make much difference either way).
Probably the fake commit->object.parsed flag isn't hurting anything in
practice, but it seems pretty nasty to lie about having loaded the
commit in the global store. E.g., imagine a follow-up traversal in the
same process in which that commit _isn't_ UNINTERESTING, and we'd
erroneously treat it as an available commit with no parents.
-Peff
next prev parent reply other threads:[~2018-02-26 1:30 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-25 1:34 [PATCH] revision.c: reduce object database queries Derrick Stolee
2018-02-25 1:41 ` Derrick Stolee
2018-02-26 1:30 ` Jeff King [this message]
2018-02-26 1:38 ` Jeff King
2018-02-27 23:16 ` Junio C Hamano
2018-02-28 6:37 ` Jeff King
2018-02-28 13:34 ` Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180226013048.GA8677@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).