* [PATCH 1/5] Add test-scrap-cache-tree
2011-12-06 17:43 [PATCH 0/5] cache-tree revisited Thomas Rast
@ 2011-12-06 17:43 ` Thomas Rast
2011-12-06 22:51 ` Junio C Hamano
2011-12-06 17:43 ` [PATCH 2/5] Test the current state of the cache-tree optimization Thomas Rast
` (4 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Thomas Rast @ 2011-12-06 17:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Carlos Martín Nieto
A simple utility that invalidates all existing cache-tree data. We
need this for tests. (We don't need a tool to rebuild the cache-tree
data; git read-tree HEAD works for that.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
.gitignore | 1 +
Makefile | 1 +
test-scrap-cache-tree.c | 17 +++++++++++++++++
3 files changed, 19 insertions(+), 0 deletions(-)
create mode 100644 test-scrap-cache-tree.c
diff --git a/.gitignore b/.gitignore
index 8572c8c..122336c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -171,6 +171,7 @@
/test-date
/test-delta
/test-dump-cache-tree
+/test-scrap-cache-tree
/test-genrandom
/test-index-version
/test-line-buffer
diff --git a/Makefile b/Makefile
index cbb5601..6bfa10b 100644
--- a/Makefile
+++ b/Makefile
@@ -436,6 +436,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
TEST_PROGRAMS_NEED_X += test-date
TEST_PROGRAMS_NEED_X += test-delta
TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
TEST_PROGRAMS_NEED_X += test-genrandom
TEST_PROGRAMS_NEED_X += test-index-version
TEST_PROGRAMS_NEED_X += test-line-buffer
diff --git a/test-scrap-cache-tree.c b/test-scrap-cache-tree.c
new file mode 100644
index 0000000..4728013
--- /dev/null
+++ b/test-scrap-cache-tree.c
@@ -0,0 +1,17 @@
+#include "cache.h"
+#include "tree.h"
+#include "cache-tree.h"
+
+static struct lock_file index_lock;
+
+int main(int ac, char **av)
+{
+ int fd = hold_locked_index(&index_lock, 1);
+ if (read_cache() < 0)
+ die("unable to read index file");
+ active_cache_tree = NULL;
+ if (write_cache(fd, active_cache, active_nr)
+ || commit_lock_file(&index_lock))
+ die("unable to write index file");
+ return 0;
+}
--
1.7.8.425.ga639d3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/5] Test the current state of the cache-tree optimization
2011-12-06 17:43 [PATCH 0/5] cache-tree revisited Thomas Rast
2011-12-06 17:43 ` [PATCH 1/5] Add test-scrap-cache-tree Thomas Rast
@ 2011-12-06 17:43 ` Thomas Rast
2011-12-06 17:43 ` [PATCH 3/5] Refactor cache_tree_update idiom from commit Thomas Rast
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Thomas Rast @ 2011-12-06 17:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Carlos Martín Nieto
The cache-tree optimization originally helped speed up write-tree
operation. However, many commands no longer properly maintain -- or
use an opportunity to cheaply generate -- the cache-tree data. In
particular, this affects commit, checkout and reset. The notable
examples that *do* write cache-tree data are read-tree and write-tree.
This sadly means most people no longer benefit from the optimization,
as they would not normally use the plumbing commands.
Document the current state of affairs in a test file, in preparation
for improvements in the area.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
t/t0090-cache-tree.sh | 95 +++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 95 insertions(+), 0 deletions(-)
create mode 100755 t/t0090-cache-tree.sh
diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
new file mode 100755
index 0000000..3d0702a
--- /dev/null
+++ b/t/t0090-cache-tree.sh
@@ -0,0 +1,95 @@
+#!/bin/sh
+
+test_description="Test whether cache-tree is properly updated
+
+Tests whether various commands properly update and/or rewrite the
+cache-tree extension.
+"
+ . ./test-lib.sh
+
+cmp_cache_tree () {
+ test-dump-cache-tree >actual &&
+ sed "s/$_x40/SHA/" <actual >filtered &&
+ test_cmp "$1" filtered
+}
+
+# We don't bother with actually checking the SHA1:
+# test-dump-cache-tree already verifies that all existing data is
+# correct.
+test_shallow_cache_tree () {
+ echo "SHA " \
+ "($(git ls-files|wc -l) entries, 0 subtrees)" >expect &&
+ cmp_cache_tree expect
+}
+
+test_invalid_cache_tree () {
+ echo "invalid (0 subtrees)" >expect &&
+ echo "SHA #(ref) " \
+ "($(git ls-files|wc -l) entries, 0 subtrees)" >>expect &&
+ cmp_cache_tree expect
+}
+
+test_no_cache_tree () {
+ : >expect &&
+ cmp_cache_tree expect
+}
+
+test_expect_failure 'initial commit has cache-tree' '
+ test_commit foo &&
+ test_shallow_cache_tree
+'
+
+test_expect_success 'read-tree HEAD establishes cache-tree' '
+ git read-tree HEAD &&
+ test_shallow_cache_tree
+'
+
+test_expect_success 'git-add invalidates cache-tree' '
+ test_when_finished "git reset --hard; git read-tree HEAD" &&
+ echo "I changed this file" > foo &&
+ git add foo &&
+ test_invalid_cache_tree
+'
+
+test_expect_success 'update-index invalidates cache-tree' '
+ test_when_finished "git reset --hard; git read-tree HEAD" &&
+ echo "I changed this file" > foo &&
+ git update-index --add foo &&
+ test_invalid_cache_tree
+'
+
+test_expect_success 'write-tree establishes cache-tree' '
+ test-scrap-cache-tree &&
+ git write-tree &&
+ test_shallow_cache_tree
+'
+
+test_expect_success 'test-scrap-cache-tree works' '
+ git read-tree HEAD &&
+ test-scrap-cache-tree &&
+ test_no_cache_tree
+'
+
+test_expect_failure 'second commit has cache-tree' '
+ test_commit bar &&
+ test_shallow_cache_tree
+'
+
+test_expect_failure 'reset --hard gives cache-tree' '
+ test-scrap-cache-tree &&
+ git reset --hard &&
+ test_shallow_cache_tree
+'
+
+test_expect_failure 'reset --hard without index gives cache-tree' '
+ rm -f .git/index &&
+ git reset --hard &&
+ test_shallow_cache_tree
+'
+
+test_expect_failure 'checkout gives cache-tree' '
+ git checkout HEAD^ &&
+ test_shallow_cache_tree
+'
+
+test_done
--
1.7.8.425.ga639d3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/5] Refactor cache_tree_update idiom from commit
2011-12-06 17:43 [PATCH 0/5] cache-tree revisited Thomas Rast
2011-12-06 17:43 ` [PATCH 1/5] Add test-scrap-cache-tree Thomas Rast
2011-12-06 17:43 ` [PATCH 2/5] Test the current state of the cache-tree optimization Thomas Rast
@ 2011-12-06 17:43 ` Thomas Rast
2011-12-06 17:43 ` [PATCH 4/5] commit: write cache-tree data when writing index anyway Thomas Rast
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Thomas Rast @ 2011-12-06 17:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Carlos Martín Nieto
We'll need to safely create or update the cache-tree data of the_index
from other places. While at it, give it an argument that lets us
silence the messages produced by unmerged entries (which prevent it
from working).
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
builtin/commit.c | 5 +----
cache-tree.c | 19 +++++++++++++++----
cache-tree.h | 4 +++-
merge-recursive.c | 2 +-
test-dump-cache-tree.c | 2 +-
5 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index e36e9ad..0163086 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -862,10 +862,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
*/
discard_cache();
read_cache_from(index_file);
- if (!active_cache_tree)
- active_cache_tree = cache_tree();
- if (cache_tree_update(active_cache_tree,
- active_cache, active_nr, 0, 0) < 0) {
+ if (update_main_cache_tree(0)) {
error(_("Error building trees"));
return 0;
}
diff --git a/cache-tree.c b/cache-tree.c
index f755590..8de3959 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -150,7 +150,7 @@ void cache_tree_invalidate_path(struct cache_tree *it, const char *path)
}
static int verify_cache(struct cache_entry **cache,
- int entries)
+ int entries, int silent)
{
int i, funny;
@@ -159,6 +159,8 @@ static int verify_cache(struct cache_entry **cache,
for (i = 0; i < entries; i++) {
struct cache_entry *ce = cache[i];
if (ce_stage(ce) || (ce->ce_flags & CE_INTENT_TO_ADD)) {
+ if (silent)
+ return -1;
if (10 < ++funny) {
fprintf(stderr, "...\n");
break;
@@ -370,10 +372,11 @@ int cache_tree_update(struct cache_tree *it,
struct cache_entry **cache,
int entries,
int missing_ok,
- int dryrun)
+ int dryrun,
+ int silent)
{
int i;
- i = verify_cache(cache, entries);
+ i = verify_cache(cache, entries, silent);
if (i)
return i;
i = update_one(it, cache, entries, "", 0, missing_ok, dryrun);
@@ -573,7 +576,7 @@ int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix)
if (cache_tree_update(active_cache_tree,
active_cache, active_nr,
- missing_ok, 0) < 0)
+ missing_ok, 0, 0) < 0)
return WRITE_TREE_UNMERGED_INDEX;
if (0 <= newfd) {
if (!write_cache(newfd, active_cache, active_nr) &&
@@ -668,3 +671,11 @@ int cache_tree_matches_traversal(struct cache_tree *root,
return it->entry_count;
return 0;
}
+
+int update_main_cache_tree (int silent)
+{
+ if (!the_index.cache_tree)
+ the_index.cache_tree = cache_tree();
+ return cache_tree_update(the_index.cache_tree,
+ the_index.cache, the_index.cache_nr, 0, 0, silent);
+}
diff --git a/cache-tree.h b/cache-tree.h
index 3df641f..0ec0b2a 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -29,7 +29,9 @@ struct cache_tree {
struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
int cache_tree_fully_valid(struct cache_tree *);
-int cache_tree_update(struct cache_tree *, struct cache_entry **, int, int, int);
+int cache_tree_update(struct cache_tree *, struct cache_entry **, int, int, int, int);
+
+int update_main_cache_tree(int);
/* bitmasks to write_cache_as_tree flags */
#define WRITE_TREE_MISSING_OK 1
diff --git a/merge-recursive.c b/merge-recursive.c
index 5a2db29..d83cd6c 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -264,7 +264,7 @@ struct tree *write_tree_from_memory(struct merge_options *o)
if (!cache_tree_fully_valid(active_cache_tree) &&
cache_tree_update(active_cache_tree,
- active_cache, active_nr, 0, 0) < 0)
+ active_cache, active_nr, 0, 0, 0) < 0)
die("error building trees");
result = lookup_tree(active_cache_tree->sha1);
diff --git a/test-dump-cache-tree.c b/test-dump-cache-tree.c
index 1f73f1e..e6c2923 100644
--- a/test-dump-cache-tree.c
+++ b/test-dump-cache-tree.c
@@ -59,6 +59,6 @@ int main(int ac, char **av)
struct cache_tree *another = cache_tree();
if (read_cache() < 0)
die("unable to read index file");
- cache_tree_update(another, active_cache, active_nr, 0, 1);
+ cache_tree_update(another, active_cache, active_nr, 0, 1, 0);
return dump_cache_tree(active_cache_tree, another, "");
}
--
1.7.8.425.ga639d3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] commit: write cache-tree data when writing index anyway
2011-12-06 17:43 [PATCH 0/5] cache-tree revisited Thomas Rast
` (2 preceding siblings ...)
2011-12-06 17:43 ` [PATCH 3/5] Refactor cache_tree_update idiom from commit Thomas Rast
@ 2011-12-06 17:43 ` Thomas Rast
2011-12-06 17:43 ` [PATCH 5/5] reset: update cache-tree data when appropriate Thomas Rast
2011-12-08 14:15 ` [PATCH 0/5] cache-tree revisited Thomas Rast
5 siblings, 0 replies; 10+ messages in thread
From: Thomas Rast @ 2011-12-06 17:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Carlos Martín Nieto
In prepare_index(), we refresh the index, and then write it to disk if
this changed the index data. After running hooks we re-read the index
and compute the root tree sha1 with the cache-tree machinery.
This gives us a mostly free opportunity to write up-to-date cache-tree
data: we can compute it in prepare_index() immediately before writing
the index to disk.
If we do this, we were going to write the index anyway, and the later
cache-tree update has no further work to do. If we don't do it, we
don't do any extra work, though we still don't have have cache-tree
data after the commit.
The only case that suffers badly is when the pre-commit hook changes
many trees in the index. I'm writing this off as highly unusual.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
Don't ask me why git thinks the index has been changed only when
already building upon an earlier commit. I don't know. I suspect
it's some raciness issue though.
builtin/commit.c | 2 ++
t/t0090-cache-tree.sh | 2 +-
2 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index 0163086..57d028e 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -394,6 +394,7 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
fd = hold_locked_index(&index_lock, 1);
add_files_to_cache(also ? prefix : NULL, pathspec, 0);
refresh_cache_or_die(refresh_flags);
+ update_main_cache_tree(1);
if (write_cache(fd, active_cache, active_nr) ||
close_lock_file(&index_lock))
die(_("unable to write new_index file"));
@@ -414,6 +415,7 @@ static char *prepare_index(int argc, const char **argv, const char *prefix,
fd = hold_locked_index(&index_lock, 1);
refresh_cache_or_die(refresh_flags);
if (active_cache_changed) {
+ update_main_cache_tree(1);
if (write_cache(fd, active_cache, active_nr) ||
commit_locked_index(&index_lock))
die(_("unable to write new_index file"));
diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
index 3d0702a..a3527a5 100755
--- a/t/t0090-cache-tree.sh
+++ b/t/t0090-cache-tree.sh
@@ -70,7 +70,7 @@ test_expect_success 'test-scrap-cache-tree works' '
test_no_cache_tree
'
-test_expect_failure 'second commit has cache-tree' '
+test_expect_success 'second commit has cache-tree' '
test_commit bar &&
test_shallow_cache_tree
'
--
1.7.8.425.ga639d3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/5] reset: update cache-tree data when appropriate
2011-12-06 17:43 [PATCH 0/5] cache-tree revisited Thomas Rast
` (3 preceding siblings ...)
2011-12-06 17:43 ` [PATCH 4/5] commit: write cache-tree data when writing index anyway Thomas Rast
@ 2011-12-06 17:43 ` Thomas Rast
2011-12-06 23:13 ` Junio C Hamano
2011-12-08 14:15 ` [PATCH 0/5] cache-tree revisited Thomas Rast
5 siblings, 1 reply; 10+ messages in thread
From: Thomas Rast @ 2011-12-06 17:43 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Carlos Martín Nieto
In the case of --mixed and --hard, we throw away the old index and
rebuild everything from the tree argument (or HEAD). So we have an
opportunity here to fill in the cache-tree data, just as read-tree
did.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
builtin/reset.c | 7 +++++++
t/t0090-cache-tree.sh | 4 ++--
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/builtin/reset.c b/builtin/reset.c
index 811e8e2..8c2c1d5 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -43,6 +43,7 @@ static int reset_index_file(const unsigned char *sha1, int reset_type, int quiet
int nr = 1;
int newfd;
struct tree_desc desc[2];
+ struct tree *tree;
struct unpack_trees_options opts;
struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
@@ -84,6 +85,12 @@ static int reset_index_file(const unsigned char *sha1, int reset_type, int quiet
return error(_("Failed to find tree of %s."), sha1_to_hex(sha1));
if (unpack_trees(nr, desc, &opts))
return -1;
+
+ if (reset_type == MIXED || reset_type == HARD) {
+ tree = parse_tree_indirect(sha1);
+ prime_cache_tree(&active_cache_tree, tree);
+ }
+
if (write_cache(newfd, active_cache, active_nr) ||
commit_locked_index(lock))
return error(_("Could not write new index file."));
diff --git a/t/t0090-cache-tree.sh b/t/t0090-cache-tree.sh
index a3527a5..f972562 100755
--- a/t/t0090-cache-tree.sh
+++ b/t/t0090-cache-tree.sh
@@ -75,13 +75,13 @@ test_expect_success 'second commit has cache-tree' '
test_shallow_cache_tree
'
-test_expect_failure 'reset --hard gives cache-tree' '
+test_expect_success 'reset --hard gives cache-tree' '
test-scrap-cache-tree &&
git reset --hard &&
test_shallow_cache_tree
'
-test_expect_failure 'reset --hard without index gives cache-tree' '
+test_expect_success 'reset --hard without index gives cache-tree' '
rm -f .git/index &&
git reset --hard &&
test_shallow_cache_tree
--
1.7.8.425.ga639d3
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 5/5] reset: update cache-tree data when appropriate
2011-12-06 17:43 ` [PATCH 5/5] reset: update cache-tree data when appropriate Thomas Rast
@ 2011-12-06 23:13 ` Junio C Hamano
2011-12-07 7:53 ` Thomas Rast
0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2011-12-06 23:13 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Carlos Martín Nieto
Thomas Rast <trast@student.ethz.ch> writes:
> In the case of --mixed and --hard, we throw away the old index and
> rebuild everything from the tree argument (or HEAD). So we have an
> opportunity here to fill in the cache-tree data, just as read-tree
> did.
>
> Signed-off-by: Thomas Rast <trast@student.ethz.ch>
> ---
> builtin/reset.c | 7 +++++++
> t/t0090-cache-tree.sh | 4 ++--
> 2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 811e8e2..8c2c1d5 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -43,6 +43,7 @@ static int reset_index_file(const unsigned char *sha1, int reset_type, int quiet
> int nr = 1;
> int newfd;
> struct tree_desc desc[2];
> + struct tree *tree;
> struct unpack_trees_options opts;
> struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
>
> @@ -84,6 +85,12 @@ static int reset_index_file(const unsigned char *sha1, int reset_type, int quiet
> return error(_("Failed to find tree of %s."), sha1_to_hex(sha1));
> if (unpack_trees(nr, desc, &opts))
> return -1;
> +
> + if (reset_type == MIXED || reset_type == HARD) {
> + tree = parse_tree_indirect(sha1);
> + prime_cache_tree(&active_cache_tree, tree);
> + }
The basic idea that MIXED or HARD should result in a cache-tree that match
the tree we just read is sound, but how expensive is prime_cache_tree()? I
think it reads the same tree once again. Admittedly, the data needed to
reconstruct the tree is likely to be hot in core, but it may be necessary
to measure before deciding if this is a good change.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 5/5] reset: update cache-tree data when appropriate
2011-12-06 23:13 ` Junio C Hamano
@ 2011-12-07 7:53 ` Thomas Rast
0 siblings, 0 replies; 10+ messages in thread
From: Thomas Rast @ 2011-12-07 7:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Carlos Martín Nieto
Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
> > @@ -84,6 +85,12 @@ static int reset_index_file(const unsigned char *sha1, int reset_type, int quiet
> > return error(_("Failed to find tree of %s."), sha1_to_hex(sha1));
> > if (unpack_trees(nr, desc, &opts))
> > return -1;
> > +
> > + if (reset_type == MIXED || reset_type == HARD) {
> > + tree = parse_tree_indirect(sha1);
> > + prime_cache_tree(&active_cache_tree, tree);
> > + }
>
> The basic idea that MIXED or HARD should result in a cache-tree that match
> the tree we just read is sound, but how expensive is prime_cache_tree()? I
> think it reads the same tree once again. Admittedly, the data needed to
> reconstruct the tree is likely to be hot in core, but it may be necessary
> to measure before deciding if this is a good change.
Oh, I just didn't bother with timings because it's the same strategy
that read-tree follows, so I assumed if it was worth it back then, it
should again be a win.
For linux-2.6 as usual and best-of-10:
git reset before: 0:00.25real 0.12user 0.11system
after: 0:00.28real 0.14user 0.12system
git reset --hard before: 0:00.21real 0.13user 0.06system
after: 0:00.18real 0.10user 0.07system
So we spend ~30ms of extra user time, at a possible gain of 30% in
future git-commit invocations, as mentioned in the cover letter. I
think that's worth it :-)
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/5] cache-tree revisited
2011-12-06 17:43 [PATCH 0/5] cache-tree revisited Thomas Rast
` (4 preceding siblings ...)
2011-12-06 17:43 ` [PATCH 5/5] reset: update cache-tree data when appropriate Thomas Rast
@ 2011-12-08 14:15 ` Thomas Rast
5 siblings, 0 replies; 10+ messages in thread
From: Thomas Rast @ 2011-12-08 14:15 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Carlos Martín Nieto
Thomas Rast wrote:
> Junio C Hamano wrote:
> > Ahh, I forgot all about that exchange.
> >
> > http://thread.gmane.org/gmane.comp.version-control.git/178480/focus=178515
> >
> > The cache-tree mechanism has traditionally been one of the more important
> > optimizations and it would be very nice if we can resurrect the behaviour
> > for "git commit" too.
>
> Oh, I buried that. Let's try something other than the aggressive
> strategy I had there: only compute cache-tree if
>
> * we know we're going to need it soon, and we're about to write out
> the index anyway (as in git-commit)
I had another idea: we could write out *just* a new cache-tree data
set at the end of git-commit.
Doing it the cheap way would mean rehashing the on-disk data without
actually touching it. (That might not be so bad, but then if your
index is small, why is writing it from scratch expensive?)
Doing it efficiently requires making the sha1 restartable, which is
entirely doable withblock-sha1/sha1.h (I haven't looked into
ppc/sha1.h). As far as I can see it's not feasible with openssl's
sha1.
That is, we would add a new index extension (say PSHA: partial SHA)
and structure the index as
signature
header
cache data
PSHA <sha state up until just before PSHA>
TREE ...
[REUC ...]
sha1 footer
Then it's easy to cheaply replace only the extensions, by restarting
the hashing from the PSHA data and re-emitting only the extension
data.
I think all the bits are in place, and it would be easy to do.
However, for it to make sense, we would have to make BLK_SHA1 the
default for the most-used platforms and also not mind extending the
SHA1 API. Do you think that would fly?
I thought about other ways to make the index writing restartable from
the middle, but the only clean approach I came up with would require a
format change to something like
signature
0 header
1 cache data
2 sha1 of 0..1
3 extension data A
4 sha1 of 2..3
5 extension data B
6 sha1 of 4..5
[possibly more]
7 end-of-index marker
8 sha1 of 6..7
etc.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 10+ messages in thread