From: Michael Haggerty <mhagger@alum.mit.edu>
To: Kirill Likhodedov <kirill.likhodedov@jetbrains.com>, git@vger.kernel.org
Cc: Stanislav.Erokhin@jetbrains.com
Subject: Re: Get all tips quickly
Date: Sun, 13 Apr 2014 21:20:25 +0200
Message-ID: <534AE379.7000705@alum.mit.edu>
In-Reply-To: <4A7A3A96-DC10-4748-BBCC-F52F48977022@jetbrains.com>
On 04/13/2014 04:19 PM, Kirill Likhodedov wrote:
> What is fastest possible way to get all “tips” (leafs of the Git log
> graph) in a Git repository with hashes of commits they point to?
>
> We at JetBrains are tuning performance of Git log integration in our
> IntelliJ IDEA and want to get all tips as fast as possible. Currently
> we use 'git log --branches --tags --remotes --no-walk', but the
> problem is that it is slow if there are a lot of references. In our
> repository we have about 35K tags, and therefore the tags is the main
> slowdown. On the other hand, we have just a couple of dozens of tips
> as well as branch references, and `git log --branches --remotes` is
> very fast.
>
> So we are searching a way to get tags pointing to the graph leafs
> faster.
The fastest way to get all references, plus the commits that annotated
tags point at, would probably be `git show-ref -d`. The
funny-looking entries like "refs/tags/v1.7.0^{}" are the annotated tags
peeled to the objects that they ultimately refer to. But this command
doesn't report the types of the objects, and there can be trees and blobs
mixed in.
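One possible way to recover the missing type information is to pipe the
show-ref output through "git cat-file --batch-check" and keep only the
entries that turn out to be commits. This is only a sketch that I have
not measured on a repository with 35K tags; the function name
list_ref_commits is made up for illustration.

```shell
# Hypothetical helper: print "<commit-sha> <refname>" for every reference
# whose (peeled) target is a commit. Trees, blobs and unpeeled tag objects
# are dropped.
list_ref_commits() {
    git show-ref -d |
    # Each input line is "<sha> <refname>"; cat-file resolves the first word
    # and hands the remainder back through %(rest).
    git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
    # Keep commits only, and strip the "^{}" suffix from peeled tag entries.
    awk '$1 == "commit" { sub(/\^[{][}]$/, "", $3); print $2, $3 }'
}
```

Running list_ref_commits inside a repository prints one line per branch
or tag that ultimately resolves to a commit.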
If you also want to figure out the minimum set of references
needed to include all tips (i.e., commits with no descendants),
then the answer is trickier. There is a command that should do what you
want:
    git merge-base --independent <commit>...
but (1) with a lot of references, your arguments wouldn't all fit on the
command line (recursive use of xargs might be needed), (2) I don't know
if "merge-base --independent" is programmed to work efficiently on so
many inputs, and (3) I don't know of a cheap way of getting a list of
all commits referred to by references (i.e., dereferencing annotated
tags but ignoring references/annotated tags that refer to trees or blobs).
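For what it's worth, here is a rough sketch of how the pieces might be
glued together: peel every reference, keep only the commits, and hand
them to "merge-base --independent" via xargs. The function name
independent_ref_tips is invented, and the caveats above still apply: if
xargs has to split the list into several invocations, each chunk is only
reduced on its own, so the combined output would need another pass.

```shell
# Hypothetical helper: reduce the commits reachable from references to the
# ones that are not ancestors of any other.
independent_ref_tips() {
    git show-ref -d |
    cut -d' ' -f1 |
    sort -u |
    # Ask git for the type of each object so non-commits can be dropped.
    git cat-file --batch-check='%(objecttype) %(objectname)' |
    awk '$1 == "commit" { print $2 }' |
    # Assumes the list fits on one command line; otherwise xargs runs
    # merge-base once per chunk and the result is not guaranteed minimal.
    xargs git merge-base --independent
}
```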
Another approach is to start by finding the leaf commits by SHA-1. You
can do this by listing all commits, and listing all commits' parents,
and then finding the objects that appear in the first list but not the
second. This could look like
    comm -23 \
        <(git log --all --pretty=format:'%H' | sort -u) \
        <(git log --all --pretty=format:'%P' | tr ' ' '\n' | sort -u)
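The same set difference can probably be computed with a single history
walk instead of two, by letting "git rev-list --parents" print each
commit together with its parents and doing the bookkeeping in awk. This
is an untimed sketch, and list_leaf_commits is a made-up name.

```shell
# Hypothetical helper: print every commit reachable from any reference
# that never appears as a parent of another such commit.
list_leaf_commits() {
    # Each output line of rev-list is "<commit> <parent>...".
    git rev-list --all --parents |
    awk '
        { seen[$1] = 1; for (i = 2; i <= NF; i++) parent[$i] = 1 }
        END { for (c in seen) if (!(c in parent)) print c }
    '
}
```

The output order is whatever awk's array iteration yields, so pipe it
through sort if a stable order matters.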
If you want reference names corresponding to these SHA-1s, you could use
name-rev to convert the SHA-1s into refnames:
    git rev-parse --symbolic-full-name $(
        comm -23 \
            <(git log --all --pretty=format:'%H' | sort -u) \
            <(git log --all --pretty=format:'%P' | tr ' ' '\n' | sort -u) |
        git name-rev --stdin --name-only
    )
The "rev-parse --symbolic-full-name" is needed because "name-rev" seems
to be able to emit only abbreviated reference names.
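If your Git is new enough to have "for-each-ref --points-at" (an
assumption; check your version's documentation), it might serve as an
alternative way to map leaf SHA-1s to full reference names, and it lists
every reference at a leaf rather than picking one. A sketch, with the
invented name refs_for_commits, that reads SHA-1s on stdin:

```shell
# Hypothetical helper: for each SHA-1 on stdin, print "<sha> <refname>"
# for every reference that points at it (directly or via an annotated tag).
refs_for_commits() {
    while read -r sha; do
        # One git process per SHA-1; fine for a couple of dozen leaves.
        git for-each-ref --format="$sha %(refname)" --points-at="$sha"
    done
}
```

A leaf that is reachable only from a detached HEAD has no reference
pointing at it and would produce no output here.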
In practice, you might want to cache some of the results to avoid having
to do a full history traversal every time.
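One conceivable caching scheme: fingerprint the current state of HEAD
and all references, and redo the traversal only when the fingerprint
changes. The sketch below is just that, a sketch; the function name
cached_leaf_commits and the two cache file names are hypothetical, and
it does not guard against concurrent writers.

```shell
# Hypothetical helper: print leaf commits, recomputing only when a
# reference (or HEAD) has changed since the last run.
cached_leaf_commits() {
    git_dir=$(git rev-parse --git-dir) || return 1
    key_file=$git_dir/leaf-cache.key      # hypothetical file name
    val_file=$git_dir/leaf-cache.value    # hypothetical file name

    # Fingerprint HEAD plus every reference and the object it points at.
    key=$( { git rev-parse --verify -q HEAD
             git for-each-ref --format='%(objectname) %(refname)'
           } | git hash-object --stdin )

    # Cache hit: the references are unchanged, so reuse the stored result.
    if [ -f "$key_file" ] && [ -f "$val_file" ] &&
       [ "$(cat "$key_file")" = "$key" ]; then
        cat "$val_file"
        return 0
    fi

    # Cache miss: walk the history once and store the sorted leaf list.
    git rev-list --all --parents |
    awk '
        { seen[$1] = 1; for (i = 2; i <= NF; i++) parent[$i] = 1 }
        END { for (c in seen) if (!(c in parent)) print c }
    ' | sort >"$val_file.tmp" &&
    mv "$val_file.tmp" "$val_file" &&
    printf '%s\n' "$key" >"$key_file" &&
    cat "$val_file"
}
```

Computing the fingerprint still reads every reference, but that should
be far cheaper than walking all of the history.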
> We also tried to read tags by manually parsing .git files (it is
> faster than invoking git log), but unfortunately annotated tags in
> .git/refs/tags/ are written without the hashes they point to (unlike
> .git/packed-refs).
I strongly recommend against parsing these files yourselves. Your
software would not be robust against any future changes to the file
formats etc.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/