From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 0/7] cat-file --batch-check performance improvements
Date: Fri, 12 Jul 2013 02:15:34 -0400
Message-ID: <20130712061533.GA11297@sigill.intra.peff.net>

In my earlier series introducing "git cat-file --batch-check=<format>",
found here:

  http://thread.gmane.org/gmane.comp.version-control.git/229761/focus=230041

I spent a little time optimizing revindex generation, and measured the
result by requesting information on a single object from a large
repository. This series takes the next logical step: requesting a large
number of objects from a large repository.

There are two major optimizations here:

  1. Avoiding extra ref lookups due to the warning in 798c35f (get_sha1:
     warn about full or short object names that look like refs,
     2013-05-29).

  2. Avoiding extra work for delta type resolution when the user has not
     asked for %(objecttype).
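
A rough model of why the first optimization helps: with the check from
798c35f in place, a full 40-hex object name is also looked up as a
refname, roughly once per ref-lookup rule, so a batch of N names costs
several times N ref lookups. The C sketch below is an illustration, not
git's code. REF_RULES approximates the DWIM patterns git tries (such as
refs/heads/%s), while ref_exists(), check_ambiguity and resolve_name()
are hypothetical names, and ref_exists() only counts calls instead of
touching the filesystem.

```c
#include <stdio.h>
#include <string.h>

/* Lookup patterns in the spirit of git's ref DWIM rules (illustrative list). */
static const char *const REF_RULES[] = {
	"%.*s",
	"refs/%.*s",
	"refs/tags/%.*s",
	"refs/heads/%.*s",
	"refs/remotes/%.*s",
	"refs/remotes/%.*s/HEAD",
};
#define NR_REF_RULES (sizeof(REF_RULES) / sizeof(REF_RULES[0]))

/* Hypothetical switch; a batch mode would clear it before reading input. */
static int check_ambiguity = 1;

/* Number of simulated ref lookups performed so far. */
static unsigned long ref_probes;

/* Hypothetical stand-in for a loose/packed ref lookup; always misses. */
static int ref_exists(const char *refname)
{
	(void)refname;
	ref_probes++;
	return 0;
}

/* True if name is exactly 40 lowercase hex digits (a full SHA-1). */
static int is_full_hex(const char *name)
{
	size_t i;

	if (strlen(name) != 40)
		return 0;
	for (i = 0; i < 40; i++)
		if (!strchr("0123456789abcdef", name[i]))
			return 0;
	return 1;
}

/* Resolve one name; returns 1 if it was taken as a full object id. */
static int resolve_name(const char *name)
{
	char buf[256];
	size_t i;

	if (!is_full_hex(name))
		return 0; /* short names and refnames are not modeled here */

	if (check_ambiguity) {
		/* The extra work: try every rule in case a ref has this name. */
		for (i = 0; i < NR_REF_RULES; i++) {
			snprintf(buf, sizeof(buf), REF_RULES[i], 40, name);
			if (ref_exists(buf)) {
				fprintf(stderr, "warning: refname '%s' is ambiguous.\n", name);
				break;
			}
		}
	}
	return 1;
}
```

In this toy, clearing check_ambiguity before processing input drops the
simulated lookups per full-hex name from six to zero; the way patch 1/7
actually turns the check off in cat-file may differ.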

I prepared the series on top of jk/in-pack-size-measurement, and
certainly optimization 2 is pointless without it (before that topic,
--batch-check always printed the type).

However, the first optimization affects regular --batch-check, and
fixes a much more serious performance regression. It looks like
798c35f is in master, but hasn't been released yet, so assuming these
topics graduate before the next release, it should be OK. But if not, we
should consider pulling the first patch out and applying it (or
something like it) separately.
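
Optimization 2 relies on cat-file knowing, before it looks at any
object, which fields the format asks for. One way to get that, sketched
here with a hypothetical struct wanted and scan_format() (not git's own
format parser), is to scan the format once for %(...) atoms and only
request the type when %(objecttype) appears.

```c
#include <string.h>

/* Hypothetical record of which fields a --batch-check format needs. */
struct wanted {
	int type;      /* %(objecttype) */
	int size;      /* %(objectsize) */
	int disk_size; /* %(objectsize:disk) */
};

/* True if the len-byte atom name equals name exactly. */
static int atom_is(const char *atom, size_t len, const char *name)
{
	return len == strlen(name) && !strncmp(atom, name, len);
}

/* Scan a format such as "%(objectname) %(objectsize:disk)" once, up front. */
static void scan_format(const char *fmt, struct wanted *w)
{
	memset(w, 0, sizeof(*w));
	while ((fmt = strstr(fmt, "%(")) != NULL) {
		const char *atom = fmt + 2;
		const char *end = strchr(atom, ')');
		size_t len;

		if (!end)
			break; /* unterminated atom; a real parser would report it */
		len = (size_t)(end - atom);
		if (atom_is(atom, len, "objecttype"))
			w->type = 1;
		else if (atom_is(atom, len, "objectsize"))
			w->size = 1;
		else if (atom_is(atom, len, "objectsize:disk"))
			w->disk_size = 1;
		fmt = end + 1;
	}
}
```

With the format used in the timing run, '%(objectsize:disk)', only
disk_size is set, so in this model no type (and hence no delta-chain
walk) would be requested.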

The results for running (in linux.git):

  $ git rev-list --objects --all >objects
  $ git cat-file --batch-check='%(objectsize:disk)' <objects >/dev/null

are:

         before      after
  real   1m17.143s   0m7.205s
  user   0m27.684s   0m6.580s
  sys    0m49.320s   0m0.608s
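
The table is easier to compare as ratios. The hypothetical helpers below,
written only for this illustration, convert time output in the
XmY.YYYs form used above into seconds. By that arithmetic, wall-clock
time improves by 77.143 / 7.205, about 10.7x; user time by about 4.2x;
and system time by about 81x.

```c
#include <stdlib.h>

/* Parse a value such as "1m17.143s" into seconds; returns -1.0 on error. */
static double parse_time(const char *s)
{
	char *end;
	double minutes, seconds;

	minutes = strtod(s, &end);
	if (end == s || *end != 'm')
		return -1.0;
	s = end + 1;
	seconds = strtod(s, &end);
	if (end == s || *end != 's' || end[1] != '\0')
		return -1.0;
	return minutes * 60.0 + seconds;
}

/* Ratio of the "before" time to the "after" time. */
static double speedup(const char *before, const char *after)
{
	return parse_time(before) / parse_time(after);
}
```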

Now, _most_ of that speedup is coming from the first patch, and it's
quite trivial. The rest of the patches involve a lot of refactoring, and
only manage to eke out one more second of performance, so they may not be
worth it on speed alone (though I think the result cleans up the
sha1_object_info_extended interface a bit, which makes it worthwhile).
Individual timings are in the commit messages.

The patches are:

  [1/7]: cat-file: disable object/refname ambiguity check for batch mode

    Optimization 1.

  [2/7]: sha1_object_info_extended: rename "status" to "type"
  [3/7]: sha1_loose_object_info: make type lookup optional
  [4/7]: packed_object_info: hoist delta type resolution to helper
  [5/7]: packed_object_info: make type lookup optional
  [6/7]: sha1_object_info_extended: make type calculation optional

    Optimization 2.

  [7/7]: sha1_object_info_extended: pass object_info to helpers

    Optional cleanup.
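
Patches 3/7 through 7/7 revolve around one idea: callers pass a request
structure whose output pointers may be NULL, and the lookup skips any
work whose result nobody asked for. The sketch below is a toy model, not
git's implementation: struct obj_info, struct toy_entry and
toy_object_info() are hypothetical, and a "pack" is just an array in
which a delta entry names its base by index. Finding a delta's real type
means walking the chain to a non-delta base, which is the step that can
be skipped when typep is NULL.

```c
#include <stddef.h>

enum toy_type { T_NONE, T_COMMIT, T_TREE, T_BLOB, T_DELTA };

/* One entry of a toy pack: deltas point at their base by index. */
struct toy_entry {
	enum toy_type type;
	size_t base;             /* meaningful only when type == T_DELTA */
	unsigned long disk_size;
};

/* Request structure: fields left NULL are not computed. */
struct obj_info {
	enum toy_type *typep;
	unsigned long *disk_sizep;
};

/* Counts delta-chain hops, to show when the walk is skipped. */
static unsigned long chain_hops;

/* Follow delta bases until a non-delta entry; assumes no cycles. */
static enum toy_type resolve_delta_type(const struct toy_entry *pack, size_t i)
{
	while (pack[i].type == T_DELTA) {
		i = pack[i].base;
		chain_hops++;
	}
	return pack[i].type;
}

/* Fill only the requested fields of oi for entry i; -1 if out of range. */
static int toy_object_info(const struct toy_entry *pack, size_t nr,
			   size_t i, struct obj_info *oi)
{
	if (i >= nr)
		return -1;
	if (oi->disk_sizep)
		*oi->disk_sizep = pack[i].disk_size; /* cheap: read from the entry */
	if (oi->typep)
		*oi->typep = resolve_delta_type(pack, i); /* may walk the chain */
	return 0;
}
```

Handing the whole request structure to the helper, rather than a list of
separate out-parameters, is also roughly the direction 7/7 describes.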
-Peff