git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2] reuse cached commit buffer when parsing signatures
Date: Fri, 13 Jun 2014 02:32:11 -0400	[thread overview]
Message-ID: <20140613063211.GA7552@sigill.intra.peff.net> (raw)
In-Reply-To: <20140610213509.GA26979@sigill.intra.peff.net>

On Tue, Jun 10, 2014 at 05:35:09PM -0400, Jeff King wrote:

> Here's a re-roll of the commit-slab series.

And here is a re-roll of the --show-signature speedup on top (i.e., the
point of the series in the first place), adjusted to use
get_commit_buffer.

-- >8 --
Subject: reuse cached commit buffer when parsing signatures

When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).

However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:

  [baseline, no signatures]
  $ time git log >/dev/null
  real    0m4.902s
  user    0m4.784s
  sys     0m0.120s

  [before]
  $ time git log --show-signature >/dev/null
  real    0m14.735s
  user    0m9.964s
  sys     0m0.944s

  [after]
  $ time git log --show-signature >/dev/null
  real    0m9.981s
  user    0m5.260s
  sys     0m0.936s

Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.

An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c   | 27 ++++++++++-----------------
 commit.h   |  2 +-
 log-tree.c |  2 +-
 3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/commit.c b/commit.c
index a036e18..ebd7ad8 100644
--- a/commit.c
+++ b/commit.c
@@ -1138,17 +1138,14 @@ static int do_sign_commit(struct strbuf *buf, const char *keyid)
 	return 0;
 }
 
-int parse_signed_commit(const unsigned char *sha1,
+int parse_signed_commit(const struct commit *commit,
 			struct strbuf *payload, struct strbuf *signature)
 {
+
 	unsigned long size;
-	enum object_type type;
-	char *buffer = read_sha1_file(sha1, &type, &size);
+	const char *buffer = get_commit_buffer(commit, &size);
 	int in_signature, saw_signature = -1;
-	char *line, *tail;
-
-	if (!buffer || type != OBJ_COMMIT)
-		goto cleanup;
+	const char *line, *tail;
 
 	line = buffer;
 	tail = buffer + size;
@@ -1156,7 +1153,7 @@ int parse_signed_commit(const unsigned char *sha1,
 	saw_signature = 0;
 	while (line < tail) {
 		const char *sig = NULL;
-		char *next = memchr(line, '\n', tail - line);
+		const char *next = memchr(line, '\n', tail - line);
 
 		next = next ? next + 1 : tail;
 		if (in_signature && line[0] == ' ')
@@ -1177,8 +1174,7 @@ int parse_signed_commit(const unsigned char *sha1,
 		}
 		line = next;
 	}
- cleanup:
-	free(buffer);
+	unuse_commit_buffer(commit, buffer);
 	return saw_signature;
 }
 
@@ -1269,8 +1265,7 @@ void check_commit_signature(const struct commit* commit, struct signature_check
 
 	sigc->result = 'N';
 
-	if (parse_signed_commit(commit->object.sha1,
-				&payload, &signature) <= 0)
+	if (parse_signed_commit(commit, &payload, &signature) <= 0)
 		goto out;
 	status = verify_signed_buffer(payload.buf, payload.len,
 				      signature.buf, signature.len,
@@ -1315,11 +1310,9 @@ struct commit_extra_header *read_commit_extra_headers(struct commit *commit,
 {
 	struct commit_extra_header *extra = NULL;
 	unsigned long size;
-	enum object_type type;
-	char *buffer = read_sha1_file(commit->object.sha1, &type, &size);
-	if (buffer && type == OBJ_COMMIT)
-		extra = read_commit_extra_header_lines(buffer, size, exclude);
-	free(buffer);
+	const char *buffer = get_commit_buffer(commit, &size);
+	extra = read_commit_extra_header_lines(buffer, size, exclude);
+	unuse_commit_buffer(commit, buffer);
 	return extra;
 }
 
diff --git a/commit.h b/commit.h
index 61559a9..2e1492a 100644
--- a/commit.h
+++ b/commit.h
@@ -325,7 +325,7 @@ struct merge_remote_desc {
  */
 struct commit *get_merge_parent(const char *name);
 
-extern int parse_signed_commit(const unsigned char *sha1,
+extern int parse_signed_commit(const struct commit *commit,
 			       struct strbuf *message, struct strbuf *signature);
 extern void print_commit_list(struct commit_list *list,
 			      const char *format_cur,
diff --git a/log-tree.c b/log-tree.c
index 4447021..10e6844 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -376,7 +376,7 @@ static void show_signature(struct rev_info *opt, struct commit *commit)
 	struct strbuf gpg_output = STRBUF_INIT;
 	int status;
 
-	if (parse_signed_commit(commit->object.sha1, &payload, &signature) <= 0)
+	if (parse_signed_commit(commit, &payload, &signature) <= 0)
 		goto out;
 
 	status = verify_signed_buffer(payload.buf, payload.len,
-- 
2.0.0.566.gfe3e6b2

      parent reply	other threads:[~2014-06-13  6:32 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-09 18:02 [PATCH 0/15] store length of commit->buffer Jeff King
2014-06-09 18:09 ` [PATCH 01/15] alloc: include any-object allocations in alloc_report Jeff King
2014-06-09 18:09 ` [PATCH 02/15] commit: push commit_index update into alloc_commit_node Jeff King
2014-06-10 21:31   ` Junio C Hamano
2014-06-09 18:10 ` [PATCH 03/15] do not create "struct commit" with xcalloc Jeff King
2014-06-10 21:35   ` Junio C Hamano
2014-06-10 21:49     ` Jeff King
2014-06-09 18:10 ` [PATCH 04/15] logmsg_reencode: return const buffer Jeff King
2014-06-10  1:36   ` Eric Sunshine
2014-06-09 18:10 ` [PATCH 05/15] sequencer: use logmsg_reencode in get_message Jeff King
2014-06-10 21:40   ` Junio C Hamano
2014-06-10 21:50     ` Jeff King
2014-06-09 18:11 ` [PATCH 06/15] provide a helper to free commit buffer Jeff King
2014-06-09 18:11 ` [PATCH 07/15] provide a helper to set the " Jeff King
2014-06-09 18:11 ` [PATCH 08/15] provide helpers to access " Jeff King
2014-06-10  8:00   ` Eric Sunshine
2014-06-09 18:12 ` [PATCH 09/15] use get_cached_commit_buffer where appropriate Jeff King
2014-06-09 18:12 ` [PATCH 10/15] use get_commit_buffer to avoid duplicate code Jeff King
2014-06-10  8:01   ` Eric Sunshine
2014-06-10 20:34     ` Jeff King
2014-06-09 18:13 ` [PATCH 11/15] convert logmsg_reencode to get_commit_buffer Jeff King
2014-06-09 18:13 ` [PATCH 12/15] use get_commit_buffer everywhere Jeff King
2014-06-09 22:40   ` Junio C Hamano
2014-06-10  0:02     ` Jeff King
2014-06-10  0:22       ` Jeff King
2014-06-10 16:06       ` Junio C Hamano
2014-06-10 21:27         ` Jeff King
2014-06-09 18:15 ` [PATCH 13/15] commit-slab: provide a static initializer Jeff King
2014-06-09 18:15 ` [PATCH 14/15] commit: convert commit->buffer to a slab Jeff King
2014-06-09 18:15 ` [PATCH 15/15] commit: record buffer length in cache Jeff King
2014-06-10  5:12   ` Christian Couder
2014-06-10  5:27     ` Jeff King
2014-06-10 20:33       ` Jeff King
2014-06-10 21:30         ` Christian Couder
2014-06-10 21:35 ` [PATCH v2 0/17] store length of commit->buffer Jeff King
2014-06-10 21:36   ` [PATCH 01/17] commit_tree: take a pointer/len pair rather than a const strbuf Jeff King
2014-06-10 21:38   ` [PATCH 02/17] replace dangerous uses of strbuf_attach Jeff King
2014-06-10 21:38   ` [PATCH 03/17] alloc: include any-object allocations in alloc_report Jeff King
2014-06-10 21:39   ` [PATCH 04/17] commit: push commit_index update into alloc_commit_node Jeff King
2014-06-10 21:39   ` [PATCH 05/17] do not create "struct commit" with xcalloc Jeff King
2014-06-10 21:39   ` [PATCH 06/17] logmsg_reencode: return const buffer Jeff King
2014-06-10 21:39   ` [PATCH 07/17] sequencer: use logmsg_reencode in get_message Jeff King
2014-06-10 21:40   ` [PATCH 08/17] provide a helper to free commit buffer Jeff King
2014-06-12 18:22     ` Junio C Hamano
2014-06-12 20:08       ` Jeff King
2014-06-12 22:05         ` Jeff King
2014-06-10 21:40   ` [PATCH 09/17] provide a helper to set the " Jeff King
2014-06-10 21:40   ` [PATCH 10/17] provide helpers to access " Jeff King
2014-06-10 21:40   ` [PATCH 11/17] use get_cached_commit_buffer where appropriate Jeff King
2014-06-10 21:41   ` [PATCH 12/17] use get_commit_buffer to avoid duplicate code Jeff King
2014-06-10 21:41   ` [PATCH 13/17] convert logmsg_reencode to get_commit_buffer Jeff King
2014-06-10 21:41   ` [PATCH 14/17] use get_commit_buffer everywhere Jeff King
2014-06-10 21:42   ` [PATCH 15/17] commit-slab: provide a static initializer Jeff King
2014-06-12 18:15     ` Junio C Hamano
2014-06-12 19:51       ` Jeff King
2014-06-10 21:43   ` [PATCH 16/17] commit: convert commit->buffer to a slab Jeff King
2014-06-10 21:44   ` [PATCH 17/17] commit: record buffer length in cache Jeff King
2014-06-10 21:46   ` [PATCH v2 0/17] store length of commit->buffer Jeff King
2014-06-12 17:22     ` Junio C Hamano
2014-06-13  6:32   ` Jeff King [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140613063211.GA7552@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).