From: Charles Bailey <charles@hashpling.org>
To: Jeff King <peff@peff.net>
Cc: Junio Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 3/3] Add filter-objects command
Date: Fri, 19 Jun 2015 11:33:24 +0100
Message-ID: <20150619103324.GA4093@hashpling.org>
In-Reply-To: <20150619101010.GA15802@peff.net>
On Fri, Jun 19, 2015 at 06:10:10AM -0400, Jeff King wrote:
> On Fri, Jun 19, 2015 at 10:10:59AM +0100, Charles Bailey wrote:
>
> > filter-objects is a command to scan all objects in the object database
> > for the repository and print the ids of those which match the given
> > criteria.
> >
> > The current supported criteria are object type and the minimum size of
> > the object.
> >
> > The guiding use case is to scan repositories quickly for large objects
> > which may cause performance issues for users. The list of objects can
> > then be used to guide some future remediating action.
>
> I've had to perform this exact same task. You can already do the
> "filtering" part pretty easily and efficiently with cat-file and a perl
> script, like:
>
> magically_generate_all_objects |
> git cat-file --batch-check='%(objectsize) %(objectname)' |
> perl -alne 'print $F[1] if $F[0] > 1234'
>
> That's not as friendly as your filter-objects, but it's a lot more
> flexible (since you can ask cat-file for all sorts of information).
>
> Obviously I've glossed over the "how to get a list of objects" part.
> If you truly want all objects (not just reachable ones), or if "rev-list
> --objects" is too slow [...]
So, yes, performance is definitely an issue and I could have called this
command "git magically-generate-all-object-for-scripts" but then, as it
was so easy to provide exactly the filtering that I was looking for in
the C code, I thought I would do that as well and then "filter-objects"
("filter-all-objects"?) seemed like a better name.
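
For illustration only, the two criteria mentioned above (object type and minimum size) could be sketched as a filter over text lines. This is not the C code from the patch: the function name `filter_objects`, its parameters, and the assumed input format (`<type> <size> <id>` per line, as produced by `git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname)'`) are all assumptions made for the example, as is treating the minimum size as inclusive.

```python
def filter_objects(lines, obj_type=None, min_size=0):
    """Yield object ids whose type and size match the given criteria.

    Hypothetical helper: each input line is assumed to look like
    "<type> <size> <id>". Lines of any other shape are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[1].isdigit():
            continue  # blank, malformed, or "<id> missing" lines
        kind, size, oid = parts[0], int(parts[1]), parts[2]
        if obj_type is not None and kind != obj_type:
            continue
        if size >= min_size:  # assumption: minimum size is inclusive
            yield oid


if __name__ == "__main__":
    import sys

    # Example: print ids of blobs of at least 1 MiB read from stdin.
    for oid in filter_objects(sys.stdin, obj_type="blob", min_size=1 << 20):
        print(oid)
```

Fed from the cat-file pipeline quoted above, this prints one object id per line; it still depends on something else to generate the list of all objects.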
It's about an order of magnitude faster on the systems I've checked to
do a parameterless filter-objects than rev-list --all --objects,
although I understand they do different things.

I am also thinking about another piece that answers the question: "which
commits introduce any of (or the first of) this list of objects?". This
can be done by parsing diff --raw output for commits but I think it
should be possible to do this faster, too.
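
A rough sketch of the slow approach, parsing raw diff output per commit, might look like the following. It assumes the input is the text of `git log --raw --no-abbrev` (so commits start with a `commit <id>` line and each raw entry is `:<oldmode> <newmode> <old-id> <new-id> <status>` followed by a tab and the path); the function name `commits_introducing` is hypothetical, and combined-diff lines for merges (starting with `::`) are simply ignored.

```python
def commits_introducing(log_lines, wanted):
    """Map each wanted object id to the commits where it is a post-image.

    Hypothetical helper: log_lines is assumed to be the output of
    "git log --raw --no-abbrev", one line per element.
    """
    wanted = set(wanted)
    found = {}
    commit = None
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith(":") and not line.startswith("::") and commit:
            # Fields before the tab: modes, old id, new id, status.
            meta = line.split("\t", 1)[0].split()
            if len(meta) >= 5 and meta[3] in wanted:
                found.setdefault(meta[3], []).append(commit)
    return found
```

Since git log lists newest commits first by default, the last commit recorded for an id is the oldest one listed. This walks every commit's diff, which is exactly the cost a faster approach would want to avoid.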
Charles.