From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Looking up objects that point to other objects
Date: Fri, 26 Aug 2011 16:10:55 -0400 [thread overview]
Message-ID: <20110826201055.GA9223@sigill.intra.peff.net> (raw)
In-Reply-To: <CACBZZX6sydEmuwj_C-KNocjra=6ynud5KFoezPd_Rr3bN4wh2w@mail.gmail.com>
On Fri, Aug 26, 2011 at 09:01:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Here's a couple of tasks that require brute-force with the Git object
> format that I've wanted to do at some point.
>
> * Associate a blob with trees
>
> Given a blob sha1 find trees that reference it.
>
> * Associate trees with commits / other trees.
>
> Given a tree find which commit points to that tree, or a parent
> tree N levels up the stack that a commit points to.
I've more frequently wanted to find the entrance and exit points of a
particular blob in history, and used something like:
git log --all --no-abbrev -c --raw --format='commit %H' |
perl -le '
my @blobs = map { qr/$_/ } @ARGV;
while(<STDIN>) {
if (/^commit (.*)/) {
$commit = $1;
}
else {
foreach my $re (@blobs) {
next unless /$re/;
print $commit;
last;
}
}
}
' $blobs
which is fairly efficient. It's brute-force, but at least it all happens
in O(1) processes. It can find blobs in git.git in about a minute or so
on my machine.
I don't think there's a way to ask for all of the trees in a commit in a
single process. It wouldn't be hard to write in C, of course, but it's
not something the current tools support. However, you can use the above
script to narrow the range of commits that you know contain a blob, and
then individually run ls-tree each one. It's better at least than running
ls-tree on every commit in the repo.
Anything that iterates over commits is going to end up seeing the same
trees again and again. I think you could probably do better by thinking
of it like a directed graph problem. Nodes are sha1s, and edges are any
references of interest:
1. For a commit, make an edge from the commit to its tree.
2. For a tree, make an edge from the tree to each of its entries.
And then the problem is reduced to "find all commit nodes that have a
path to $blob". Which you can do by breadth-first search from the
commits (or backwards from the blob).
-Peff
prev parent reply other threads:[~2011-08-26 20:11 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-26 19:01 Looking up objects that point to other objects Ævar Arnfjörð Bjarmason
2011-08-26 20:10 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110826201055.GA9223@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).