From: Jeff King <peff@peff.net>
To: Michael J Gruber <git@drmicha.warpmail.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/3] fast textconv
Date: Sun, 28 Mar 2010 13:34:20 -0400 [thread overview]
Message-ID: <20100328173420.GA18047@coredump.intra.peff.net> (raw)
In-Reply-To: <20100328165646.GA10293@coredump.intra.peff.net>
On Sun, Mar 28, 2010 at 12:56:46PM -0400, Jeff King wrote:
> Better, but not perfect. My script is below. I get:
>
> $ time git show >/dev/null
> real 0m1.036s
> user 0m0.412s
> sys 0m0.672s
>
> which is still a 2.5x speedup (versus my other fast-textconv solution
> earlier in the thread), but I suspect we can do better. The notes
> mechanism does some up-front work to get very fast lookups, but because
> we invoke git-notes repeatedly, we never get the amortized benefit of
> that up-front work. Doing it in-core would fix that.
Here is a quick and dirty in-core implementation. The most notable
defect is that all textconvs store under refs/notes/textconv, which is
obviously bogus if you might textconv the same blob in two different
ways.
It was relatively easy to code thanks to a very nice notes interface
from Johan Herland. That same commit is now:
$ time git show >/dev/null
real 0m0.350s
user 0m0.172s
sys 0m0.176s
which is starting to feel sufficiently snappy.
Patch is on top of 1/3 from my earlier series (it uses some of the
centralizing cleanup).
diff --git a/diff.c b/diff.c
index 3ddc05e..e1ef25e 100644
--- a/diff.c
+++ b/diff.c
@@ -14,6 +14,8 @@
#include "userdiff.h"
#include "sigchain.h"
#include "submodule.h"
+#include "notes.h"
+#include "refs.h"
#ifdef NO_FAST_WORKING_DIRECTORY
#define FAST_WORKING_DIRECTORY 0
@@ -3881,19 +3883,57 @@ static char *run_textconv(const char *pgm, struct diff_filespec *spec,
return strbuf_detach(&buf, outsize);
}
+static struct notes_tree textconv_notes;
+static int textconv_notes_initialized;
+
+static const unsigned char *run_and_cache_textconv(const char *cmd,
+ struct diff_filespec *df)
+{
+ char *data;
+ size_t size;
+ unsigned char blob_sha1[20];
+ unsigned char tree_sha1[20];
+
+ data = run_textconv(cmd, df, &size);
+ if (!data)
+ die("unable to read files to diff");
+ if (write_sha1_file(data, size, "blob", blob_sha1) < 0)
+ die("unable to write textconv cache blob");
+ free(data);
+
+ add_note(&textconv_notes, df->sha1, blob_sha1, combine_notes_overwrite);
+ if (write_notes_tree(&textconv_notes, tree_sha1) < 0)
+ die("unable to write textconv cache notes tree");
+ update_ref("textconv cache update", "refs/notes/textconv",
+ tree_sha1, NULL, 0, DIE_ON_ERR);
+
+ return get_note(&textconv_notes, df->sha1);
+}
+
static size_t fill_textconv(const char *cmd,
struct diff_filespec *df,
char **outbuf)
{
- size_t size;
+ const unsigned char *sha1;
+ unsigned long size;
+ enum object_type type;
if (!cmd) {
*outbuf = df->data;
return df->size;
}
- *outbuf = run_textconv(cmd, df, &size);
+ if (!textconv_notes_initialized) {
+ init_notes(&textconv_notes, "refs/notes/textconv", 0, 0);
+ textconv_notes_initialized = 1;
+ }
+
+ sha1 = get_note(&textconv_notes, df->sha1);
+ if (!sha1)
+ sha1 = run_and_cache_textconv(cmd, df);
+
+ *outbuf = read_sha1_file(sha1, &type, &size);
if (!*outbuf)
- die("unable to read files to diff");
+ die("unable to read textconv cache object");
return size;
}
next prev parent reply other threads:[~2010-03-28 17:34 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-28 14:53 [PATCH 0/3] fast textconv Jeff King
2010-03-28 14:53 ` [PATCH 1/3] textconv: refactor calls to run_textconv Jeff King
2010-03-28 14:53 ` [PATCH 2/3] textconv: refactor to handle multiple textconv types Jeff King
2010-03-28 14:54 ` [PATCH 3/3] diff: add "fasttextconv" config option Jeff King
2010-03-28 18:23 ` Johannes Sixt
2010-03-30 16:30 ` Jeff King
2010-03-30 17:36 ` [PATCH] diff: fix textconv error zombies Johannes Sixt
2010-03-30 21:46 ` Junio C Hamano
2010-03-30 22:17 ` Johannes Sixt
2010-03-30 22:56 ` Jeff King
2010-03-28 16:09 ` [PATCH 0/3] fast textconv Michael J Gruber
2010-03-28 16:17 ` Jeff King
2010-03-28 16:19 ` Jeff King
2010-03-28 16:56 ` Jeff King
2010-03-28 17:34 ` Jeff King [this message]
2010-03-28 18:13 ` Sverre Rabbelier
2010-03-30 16:04 ` Jeff King
2010-03-30 3:52 ` Junio C Hamano
2010-03-30 17:07 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100328173420.GA18047@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@drmicha.warpmail.net \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).