From: Jeff King <peff@peff.net>
To: Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, avarab@gmail.com, sbeller@google.com,
larsxschneider@gmail.com
Subject: Re: [PATCH 0/3] Lazy-load trees when reading commit-graph
Date: Tue, 3 Apr 2018 09:06:15 -0400 [thread overview]
Message-ID: <20180403130615.GA18824@sigill.intra.peff.net> (raw)
In-Reply-To: <20180403120057.173849-1-dstolee@microsoft.com>
On Tue, Apr 03, 2018 at 08:00:54AM -0400, Derrick Stolee wrote:
> There are several commit-graph walks that require loading many commits
> but never walk the trees reachable from those commits. However, the
> current logic in parse_commit() requires the root tree to be loaded.
> This only uses lookup_tree(), but when reading commits from the commit-
> graph file, the hashcpy() to load the root tree hash and the time spent
> checking the object cache take more time than parsing the rest of the
> commit.
>
> In this patch series, all direct references to accessing the 'tree'
> member of struct commit are replaced instead by one of the following
> methods:
>
> struct tree *get_commit_tree(struct commit *)
> struct object_id *get_commit_tree_oid(struct commit *)
This seems like a pretty sane thing to do. There may still be a few
parts of the code, notably fsck, that are reliant on a "struct object"
having been instantiated to determine whether an object is "used". I
tried to clean that up around the time of c2d17b3b6e (fsck: check
HAS_OBJ more consistently, 2017-01-16), but I won't be surprised if
there's still some hidden assumptions.
I peeked at the fsck.c parts of patch 2, and it looks like we may be OK,
since you use get_commit_tree() during the walk.
> This replacement was assisted by a Coccinelle script, but the 'tree'
> member is overloaded in other types, so the script gave false-positives
> that were removed from the diff.
That catches any existing in-tree callers, but not any topics in flight.
We could add the Coccinelle script and re-run it to catch any future
ones. But perhaps simpler still: can we rename the "tree" member to
"maybe_tree" or something, since nobody should be accessing it directly?
That will give us a compile error if an older topic is merged.
> On the Linux repository, performance tests were run for the following
> command:
>
> git log --graph --oneline -1000
>
> Before: 0.83s
> After: 0.65s
> Rel %: -21.6%
Neat.
Not strictly related, but I happened across another thing today that
might be worth investigating here. We allocate "struct commit" in these
nice big allocation blocks. But then for every commit we allocate at
least one "struct commit_list" to store the parent pointer.
I was looking at this from a memory consumption angle (I found a
pathological repository full of single-parent commits, and this wastes
an extra 16 bytes plus malloc overhead for every 64-byte "struct
commit").
But I wonder if it could also have non-negligible overhead in calling
malloc() for your case, since you've optimized out so much of the rest
of the work.
I'm not sure what the exact solution would be, but I imagine something
like variable-sized "struct commit"s with the parent pointers embedded,
with some kind of flag to indicate the number of parents (and probably
some fallback to break out to a linked list for extreme cases of more
than 2 parents). It may end up pretty invasive, though, as there's a
lot of open-coded traversals of that parent list.
Anyway, not anything to do with this patch, but food for thought as you
micro-optimize these traversals.
-Peff
next prev parent reply other threads:[~2018-04-03 13:06 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-03 12:00 [PATCH 0/3] Lazy-load trees when reading commit-graph Derrick Stolee
2018-04-03 12:00 ` [PATCH 1/3] commit: create get_commit_tree() method Derrick Stolee
2018-04-03 12:00 ` [PATCH 2/3] treewide: use get_commit_tree() for tree access Derrick Stolee
2018-04-03 12:00 ` [PATCH 3/3] commit-graph: lazy-load trees Derrick Stolee
2018-04-03 18:00 ` Stefan Beller
2018-04-03 18:22 ` Derrick Stolee
2018-04-03 18:37 ` Stefan Beller
2018-04-03 12:15 ` [PATCH 0/3] Lazy-load trees when reading commit-graph Derrick Stolee
2018-04-03 13:06 ` Jeff King [this message]
2018-04-03 13:14 ` Derrick Stolee
2018-04-03 20:20 ` Jeff King
2018-04-04 12:08 ` Derrick Stolee
2018-04-06 19:09 ` [PATCH v2 0/4] " Derrick Stolee
2018-04-06 19:09 ` [PATCH v2 1/4] treewide: rename tree to maybe_tree Derrick Stolee
2018-04-06 19:09 ` [PATCH v2 2/4] commit: create get_commit_tree() method Derrick Stolee
2018-04-06 19:09 ` [PATCH v2 3/4] treewide: replace maybe_tree with accessor methods Derrick Stolee
2018-04-06 19:09 ` [PATCH v2 4/4] commit-graph: lazy-load trees for commits Derrick Stolee
2018-04-06 19:21 ` [PATCH v2 0/4] Lazy-load trees when reading commit-graph Jeff King
2018-04-06 19:41 ` Derrick Stolee
2018-04-06 19:45 ` Stefan Beller
2018-04-08 23:18 ` Junio C Hamano
2018-04-09 13:15 ` Derrick Stolee
2018-04-09 17:25 ` Stefan Beller
2018-04-07 18:40 ` Jakub Narebski
2018-04-08 1:17 ` Derrick Stolee
2018-04-11 20:41 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180403130615.GA18824@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=larsxschneider@gmail.com \
--cc=sbeller@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).