git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Michael J Gruber <git@drmicha.warpmail.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/3] fast textconv
Date: Sun, 28 Mar 2010 13:34:20 -0400	[thread overview]
Message-ID: <20100328173420.GA18047@coredump.intra.peff.net> (raw)
In-Reply-To: <20100328165646.GA10293@coredump.intra.peff.net>

On Sun, Mar 28, 2010 at 12:56:46PM -0400, Jeff King wrote:

> Better, but not perfect. My script is below. I get:
> 
>   $ time git show >/dev/null
>   real    0m1.036s
>   user    0m0.412s
>   sys     0m0.672s
> 
> which is still a 2.5x speedup (versus my other fast-textconv solution
> earlier in the thread), but I suspect we can do better. The notes
> mechanism does some up-front work to get very fast lookups, but because
> we invoke git-notes repeatedly, we never get the amortized benefit of
> that up-front work.  Doing it in-core would fix that.

Here is a quick and dirty in-core implementation. The most notable
defect is that all textconvs store under refs/notes/textconv, which is
obviously bogus if you might textconv the same blob in two different
ways.

It was relatively easy to code thanks to a very nice notes interface
from Johan Herland. That same commit is now:

  $ time git show >/dev/null
  real    0m0.350s
  user    0m0.172s
  sys     0m0.176s

which is starting to feel sufficiently snappy.

Patch is on top of 1/3 from my earlier series (it uses some of the
centralizing cleanup).

diff --git a/diff.c b/diff.c
index 3ddc05e..e1ef25e 100644
--- a/diff.c
+++ b/diff.c
@@ -14,6 +14,8 @@
 #include "userdiff.h"
 #include "sigchain.h"
 #include "submodule.h"
+#include "notes.h"
+#include "refs.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
@@ -3881,19 +3883,57 @@ static char *run_textconv(const char *pgm, struct diff_filespec *spec,
 	return strbuf_detach(&buf, outsize);
 }
 
+static struct notes_tree textconv_notes;
+static int textconv_notes_initialized;
+
+static const unsigned char *run_and_cache_textconv(const char *cmd,
+		struct diff_filespec *df)
+{
+	char *data;
+	size_t size;
+	unsigned char blob_sha1[20];
+	unsigned char tree_sha1[20];
+
+	data = run_textconv(cmd, df, &size);
+	if (!data)
+		die("unable to read files to diff");
+	if (write_sha1_file(data, size, "blob", blob_sha1) < 0)
+		die("unable to write textconv cache blob");
+	free(data);
+
+	add_note(&textconv_notes, df->sha1, blob_sha1, combine_notes_overwrite);
+	if (write_notes_tree(&textconv_notes, tree_sha1) < 0)
+		die("unable to write textconv cache notes tree");
+	update_ref("textconv cache update", "refs/notes/textconv",
+			tree_sha1, NULL, 0, DIE_ON_ERR);
+
+	return get_note(&textconv_notes, df->sha1);
+}
+
 static size_t fill_textconv(const char *cmd,
 			    struct diff_filespec *df,
 			    char **outbuf)
 {
-	size_t size;
+	const unsigned char *sha1;
+	unsigned long size;
+	enum object_type type;
 
 	if (!cmd) {
 		*outbuf = df->data;
 		return df->size;
 	}
 
-	*outbuf = run_textconv(cmd, df, &size);
+	if (!textconv_notes_initialized) {
+		init_notes(&textconv_notes, "refs/notes/textconv", 0, 0);
+		textconv_notes_initialized = 1;
+	}
+
+	sha1 = get_note(&textconv_notes, df->sha1);
+	if (!sha1)
+		sha1 = run_and_cache_textconv(cmd, df);
+
+	*outbuf = read_sha1_file(sha1, &type, &size);
 	if (!*outbuf)
-		die("unable to read files to diff");
+		die("unable to read textconv cache object");
 	return size;
 }

  reply	other threads:[~2010-03-28 17:34 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-28 14:53 [PATCH 0/3] fast textconv Jeff King
2010-03-28 14:53 ` [PATCH 1/3] textconv: refactor calls to run_textconv Jeff King
2010-03-28 14:53 ` [PATCH 2/3] textconv: refactor to handle multiple textconv types Jeff King
2010-03-28 14:54 ` [PATCH 3/3] diff: add "fasttextconv" config option Jeff King
2010-03-28 18:23   ` Johannes Sixt
2010-03-30 16:30     ` Jeff King
2010-03-30 17:36       ` [PATCH] diff: fix textconv error zombies Johannes Sixt
2010-03-30 21:46         ` Junio C Hamano
2010-03-30 22:17           ` Johannes Sixt
2010-03-30 22:56             ` Jeff King
2010-03-28 16:09 ` [PATCH 0/3] fast textconv Michael J Gruber
2010-03-28 16:17   ` Jeff King
2010-03-28 16:19     ` Jeff King
2010-03-28 16:56       ` Jeff King
2010-03-28 17:34         ` Jeff King [this message]
2010-03-28 18:13           ` Sverre Rabbelier
2010-03-30 16:04             ` Jeff King
2010-03-30  3:52 ` Junio C Hamano
2010-03-30 17:07   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100328173420.GA18047@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@drmicha.warpmail.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).