git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Charles Bailey <charles@hashpling.org>
Cc: Junio Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 3/3] Add filter-objects command
Date: Fri, 19 Jun 2015 06:52:11 -0400
Message-ID: <20150619105210.GA29755@peff.net>
In-Reply-To: <20150619103324.GA4093@hashpling.org>

On Fri, Jun 19, 2015 at 11:33:24AM +0100, Charles Bailey wrote:

> > Obviously I've glossed over the "how to get a list of objects" part.
> > If you truly want all objects (not just reachable ones), or if "rev-list
> > --objects" is too slow [...]
> 
> So, yes, performance is definitely an issue and I could have called this
> command "git magically-generate-all-object-for-scripts" but then, as it
> was so easy to provide exactly the filtering that I was looking for in
> the C code, I thought I would do that as well and then "filter-objects"
> ("filter-all-objects"?) seemed like a better name.

Right, my point was only that it works for _your_ particular filter, but
it would be nice to have something more general. And we already have
"cat-file --batch-check". IOW, I think I would prefer the "magical" form
because it's a better scripting building block. As you note,
"filter-objects" without any filters is exactly that. Your 10 extra
lines of C code are not exactly bloat, but I just wonder if other people
will find it all that useful.
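As a rough illustration of the building-block approach, here is a minimal sketch of a filter that could sit downstream of `git cat-file --batch-check`, whose default output is `<sha1> <type> <size>` per object (or `<object> missing` for names it cannot find). The specific filter (blobs of at least some size) and the names `keep_object` and `filter_stream` are hypothetical, chosen only for illustration; this is not the filter from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical filter over one line of `git cat-file --batch-check`
 * default output: "<sha1> <type> <size>". Returns 1 if the object is a
 * blob whose size is at least min_size, 0 otherwise. "<object> missing"
 * lines have only two fields, so they are rejected as well.
 */
int keep_object(const char *line, unsigned long min_size)
{
	char name[65]; /* room for a 64-hex name, just in case */
	char type[16];
	unsigned long size;

	assert(line != NULL);
	if (sscanf(line, "%64s %15s %lu", name, type, &size) != 3)
		return 0;
	return strcmp(type, "blob") == 0 && size >= min_size;
}

/* Copy through only the batch-check lines that pass keep_object(). */
void filter_stream(FILE *in, FILE *out, unsigned long min_size)
{
	char line[256];

	while (fgets(line, sizeof(line), in))
		if (keep_object(line, min_size))
			fputs(line, out);
}
```

A main() would just call filter_stream(stdin, stdout, threshold); where the list of object names fed to cat-file comes from (rev-list, or something faster) is a separate question.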

> It's about an order of magnitude faster on the systems I've checked to
> do a parameterless filter-objects than rev-list --all --objects,
> although I understand they do different things.

Right, it's the object-opening and hash lookups that kill you in
"rev-list", because it's actually walking the graph.

> I am also thinking about another piece that answers the question: "which
> commits introduce any of (or the first of) this list of objects?". This
> can be done by parsing a diff --raw for commits but I think it should
> be possible to do this faster, too.
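For reference, the diff --raw approach could look roughly like this sketch. It assumes the non-combined raw format (`:<old mode> <new mode> <old sha1> <new sha1> <status>` TAB `<path>`) with full object names, e.g. from `git log --raw --no-abbrev`; the function name `raw_line_introduces` is hypothetical. It only answers "does this diff line bring in one of the objects?", so finding the first such commit still means walking history in order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch: does one line of non-combined `diff --raw` output introduce
 * (as its post-image) any object listed in targets[]? Assumes full
 * object names; combined-diff lines for merges ("::...") are rejected.
 */
int raw_line_introduces(const char *line, const char *const *targets,
			size_t ntargets)
{
	char src[65], dst[65];
	size_t i;

	assert(line != NULL && (targets != NULL || ntargets == 0));
	/* Not a raw diff line, or combined merge output. */
	if (line[0] != ':' || line[1] == ':')
		return 0;
	/* Skip both modes, keep the pre- and post-image names. */
	if (sscanf(line + 1, "%*7s %*7s %64s %64s", src, dst) != 2)
		return 0;
	/* Mode-only changes and pure renames keep the same blob. */
	if (strcmp(src, dst) == 0)
		return 0;
	for (i = 0; i < ntargets; i++)
		if (strcmp(dst, targets[i]) == 0)
			return 1;
	return 0;
}
```

Fed from something like `git log --format=%H --raw --no-abbrev`, a script would remember the most recent commit line and report it whenever the function returns 1.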

If you care about "introduce", I think you have to traverse and do the
diffs. If you only care about "contains" (for example, because you want
to know which path the blob is found at), you can find trees which
mention it, then trees which mention that tree, and so on. I think that
ends up slower in practice, though.
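The "contains" direction can be sketched as a fixed-point loop over a toy in-memory object table, where objects are small integers and each commit or tree lists the ids it directly references. The types and names here are hypothetical, and a real implementation would have to read and parse every tree from the object store. The repeated full passes (up to one per level of nesting, plus a final pass that finds nothing new) hint at why this can end up slower in practice.

```c
#include <assert.h>
#include <string.h>

#define MAX_OBJECTS 64
#define MAX_REFS 8

/*
 * Hypothetical toy object: ids of the objects it directly references
 * (commit -> its tree, tree -> its entries). Blobs reference nothing.
 */
struct toy_object {
	int nrefs;
	int refs[MAX_REFS];
};

/*
 * Mark every object that transitively contains `target`: first trees
 * that mention it, then trees that mention those trees, and so on,
 * until a full pass marks nothing new. Returns how many objects were
 * marked, not counting the target itself.
 */
int mark_containers(const struct toy_object *objs, int n, int target,
		    unsigned char *contains)
{
	int changed = 1, count = 0, i, j;

	assert(n <= MAX_OBJECTS && target >= 0 && target < n);
	memset(contains, 0, n);
	contains[target] = 1;
	while (changed) {
		changed = 0;
		for (i = 0; i < n; i++) {
			if (contains[i])
				continue;
			for (j = 0; j < objs[i].nrefs; j++) {
				if (contains[objs[i].refs[j]]) {
					contains[i] = 1;
					changed = 1;
					count++;
					break;
				}
			}
		}
	}
	return count;
}
```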

I have patches that implement a "rev-list --find=$sha1", which sets a
bit on $sha1 and then traverses with --objects until we find it (or
them; you can specify multiple). It's pretty straightforward, but it
does cost as much as "git rev-list --objects" in the worst case. Let me
know if you're interested and I can clean it up and post it.
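A toy model of the --find idea described above: flag the targets, walk from the tips with a "seen" bit so each object is opened at most once, and stop as soon as every target has been found. The object table, the flag names, and `find_objects` are all hypothetical, and the plain depth-first walk is a simplification of how rev-list actually orders commits and trees. If a target is reachable only late in the walk (or not at all), every reachable object gets visited, which is the worst case mentioned above.

```c
#include <assert.h>

#define MAX_OBJECTS 64
#define MAX_REFS 8

/* Hypothetical per-object flag bits. */
#define SEEN   (1u << 0)
#define TARGET (1u << 1)

struct toy_object {
	unsigned flags;
	int nrefs;
	int refs[MAX_REFS]; /* commit -> tree/parents, tree -> entries */
};

/*
 * Depth-first walk from `id`, skipping anything already SEEN. Each
 * newly seen TARGET decrements *remaining, and the walk stops
 * descending once it reaches zero. Returns objects visited.
 */
static int walk(struct toy_object *objs, int id, int *remaining)
{
	int visited, j;

	if (objs[id].flags & SEEN)
		return 0;
	objs[id].flags |= SEEN;
	visited = 1;
	if ((objs[id].flags & TARGET) && --*remaining == 0)
		return visited; /* found everything: stop early */
	for (j = 0; j < objs[id].nrefs && *remaining > 0; j++)
		visited += walk(objs, objs[id].refs[j], remaining);
	return visited;
}

/*
 * Returns the number of targets found; *visited receives how many
 * objects were opened. Targets are assumed to be distinct.
 */
int find_objects(struct toy_object *objs, int n, const int *tips, int ntips,
		 const int *targets, int ntargets, int *visited)
{
	int remaining = ntargets, i;

	assert(n <= MAX_OBJECTS && ntargets >= 0);
	*visited = 0;
	for (i = 0; i < ntargets; i++) {
		assert(targets[i] >= 0 && targets[i] < n);
		objs[targets[i]].flags |= TARGET;
	}
	for (i = 0; i < ntips && remaining > 0; i++)
		*visited += walk(objs, tips[i], &remaining);
	return ntargets - remaining;
}
```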

-Peff


Thread overview: 51+ messages
2015-06-19  9:10 Improvements to parse-options and a new filter-objects command Charles Bailey
2015-06-19  9:10 ` [PATCH 1/3] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-19 18:28   ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 2/3] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-19 11:03   ` Remi Galan Alfonso
2015-06-19 11:06     ` Charles Bailey
2015-06-19 17:58   ` Junio C Hamano
2015-06-19 18:39     ` Junio C Hamano
2015-06-20 15:31       ` Jakub Narębski
2015-06-19 18:47     ` Jakub Narębski
2015-06-20 16:51     ` Charles Bailey
2015-06-20 17:47       ` Junio C Hamano
2015-06-19  9:10 ` [PATCH 3/3] Add filter-objects command Charles Bailey
2015-06-19 10:10   ` Jeff King
2015-06-19 10:33     ` Charles Bailey
2015-06-19 10:52       ` Jeff King [this message]
2015-06-19 18:28         ` Junio C Hamano
2015-06-19 10:52       ` John Keeping
2015-06-19 11:04         ` Charles Bailey
2015-06-21 18:25 ` Improvements to integer option parsing Charles Bailey
2015-06-21 18:25   ` [PATCH 1/2] Correct test-parse-options to handle negative ints Charles Bailey
2015-06-21 18:25   ` [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c Charles Bailey
2015-06-21 18:30     ` Charles Bailey
2015-06-22 22:03       ` Junio C Hamano
2015-06-22 22:08     ` Junio C Hamano
2015-06-22 22:09   ` Improvements to integer option parsing Junio C Hamano
2015-06-22 22:42     ` Charles Bailey
2015-06-21 19:20 ` Fast enumeration of objects Charles Bailey
2015-06-21 19:20   ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22  8:38     ` Jeff King
2015-06-22 10:33       ` Jeff King
2015-06-22 10:40         ` [PATCH 1/7] for_each_packed_object: automatically open pack index Jeff King
2015-06-22 10:40         ` [PATCH 2/7] cat-file: minor style fix in options list Jeff King
2015-06-22 10:41         ` [PATCH 3/7] cat-file: move batch_options definition to top of file Jeff King
2015-06-22 10:45         ` [PATCH 4/7] cat-file: add --buffer option Jeff King
2015-06-22 10:45         ` [PATCH 5/7] cat-file: stop returning value from batch_one_object Jeff King
2015-06-22 10:45         ` [PATCH 6/7] cat-file: split batch_one_object into two stages Jeff King
2015-06-22 10:45         ` [PATCH 7/7] cat-file: add --batch-all-objects option Jeff King
2015-06-26  6:56           ` Eric Sunshine
2015-06-26 15:48             ` Jeff King
2015-06-22 11:06         ` [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects Jeff King
2015-06-22 22:03           ` Charles Bailey
2015-06-22 23:46             ` Jeff King
2015-06-22 21:48         ` [PATCH] Add list-all-objects command Charles Bailey
2015-06-22 21:50         ` Junio C Hamano
2015-06-22 23:50           ` Jeff King
2015-06-22 11:38       ` Charles Bailey
2015-06-22  9:57     ` Duy Nguyen
2015-06-22 10:24       ` Jeff King
2015-06-22  8:35   ` Fast enumeration of objects Jeff King
2015-06-22 19:44     ` Junio C Hamano
