From: Thomas Rast <trast@student.ethz.ch>
To: <git@vger.kernel.org>
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Carlos Martín Nieto" <cmn@elego.de>
Subject: Re: [PATCH 0/5] cache-tree revisited
Date: Thu, 8 Dec 2011 15:15:29 +0100
Message-ID: <201112081515.29652.trast@student.ethz.ch>
In-Reply-To: <cover.1323191497.git.trast@student.ethz.ch>
Thomas Rast wrote:
> Junio C Hamano wrote:
> > Ahh, I forgot all about that exchange.
> >
> > http://thread.gmane.org/gmane.comp.version-control.git/178480/focus=178515
> >
> > The cache-tree mechanism has traditionally been one of the more important
> > optimizations and it would be very nice if we can resurrect the behaviour
> > for "git commit" too.
>
> Oh, I buried that. Let's try something other than the aggressive
> strategy I had there: only compute cache-tree if
>
> * we know we're going to need it soon, and we're about to write out
> the index anyway (as in git-commit)

I had another idea: we could write out *just* a new cache-tree data
set at the end of git-commit.

Doing it the cheap way would mean re-reading and rehashing the on-disk
cache data without rewriting it. (That might not be so bad, but then
again, if your index is small enough for the rehash to be cheap, why
would writing it from scratch be expensive?)

Doing it efficiently requires making the SHA-1 computation restartable,
which is entirely doable with block-sha1/sha1.h (I haven't looked into
ppc/sha1.h). As far as I can see, it is not feasible with OpenSSL's
SHA-1.

That is, we would add a new index extension (say PSHA: partial SHA)
and structure the index as

  signature
  header
  cache data
  PSHA <sha state up until just before PSHA>
  TREE ...
  [REUC ...]
  sha1 footer

Then it's easy to cheaply replace only the extensions, by restarting
the hashing from the PSHA data and re-emitting only the extension
data.
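
To make the PSHA idea concrete, here is a minimal, self-contained C
sketch of a restartable SHA-1. It does not use git's block-sha1 code;
the `sha1_state` struct and the `sha1_*` and `psha_rewrite_matches()`
functions are hypothetical stand-ins. The point is to show which state
a PSHA payload would have to capture: the five chaining words, the
total length hashed so far, and the bytes of any partial 64-byte block.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical restartable SHA-1 context: the fields a PSHA payload
 * would have to store in order to resume hashing later. */
typedef struct {
	uint32_t h[5];   /* chaining state */
	uint64_t len;    /* total bytes hashed so far */
	uint8_t buf[64]; /* partial block; len % 64 bytes are valid */
} sha1_state;

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Standard SHA-1 compression of one 64-byte block into h. */
static void sha1_block(uint32_t h[5], const uint8_t *p)
{
	uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | p[4 * i + 3];
	for (i = 16; i < 80; i++)
		w[i] = ROL(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	for (i = 0; i < 80; i++) {
		uint32_t f, k, t;
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = ROL(a, 5) + f + e + k + w[i];
		e = d; d = c; c = ROL(b, 30); b = a; a = t;
	}
	h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

static void sha1_init(sha1_state *s)
{
	static const uint32_t iv[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE,
					0x10325476, 0xC3D2E1F0 };
	memcpy(s->h, iv, sizeof(iv));
	s->len = 0;
}

/* Byte-at-a-time for clarity; compress whenever a block fills up. */
static void sha1_update(sha1_state *s, const void *data, size_t n)
{
	const uint8_t *p = data;

	while (n--) {
		s->buf[s->len++ % 64] = *p++;
		if (s->len % 64 == 0)
			sha1_block(s->h, s->buf);
	}
}

static void sha1_final(sha1_state *s, uint8_t out[20])
{
	uint64_t bits = s->len * 8;
	uint8_t pad = 0x80, zero = 0, lenbuf[8];
	int i;

	sha1_update(s, &pad, 1);
	while (s->len % 64 != 56)
		sha1_update(s, &zero, 1);
	for (i = 0; i < 8; i++)
		lenbuf[i] = (uint8_t)(bits >> (56 - 8 * i)); /* big-endian bit count */
	sha1_update(s, lenbuf, 8);
	for (i = 0; i < 20; i++)
		out[i] = (uint8_t)(s->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Hash "cache data", snapshot the state where PSHA would sit, write old
 * extensions, then restart from the snapshot with new ones. Returns 1 if
 * that matches rehashing the whole (made-up) index from scratch. */
static int psha_rewrite_matches(const char *cache, const char *new_ext)
{
	sha1_state s, psha;
	uint8_t restarted[20], scratch[20];

	sha1_init(&s);
	sha1_update(&s, cache, strlen(cache));
	psha = s;                           /* "state just before PSHA" */
	sha1_update(&s, "TREE(stale)", 11); /* the old extension data */

	s = psha;                           /* restart; cache is not reread */
	sha1_update(&s, new_ext, strlen(new_ext));
	sha1_final(&s, restarted);

	sha1_init(&s);                      /* reference: full rehash */
	sha1_update(&s, cache, strlen(cache));
	sha1_update(&s, new_ext, strlen(new_ext));
	sha1_final(&s, scratch);
	return memcmp(restarted, scratch, sizeof(scratch)) == 0;
}
```

The sketch copies the snapshot as an in-memory struct; an on-disk PSHA
extension would additionally need a fixed byte order for these fields.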
I think all the bits are in place, and it would be easy to do.
However, for it to make sense, we would have to make BLK_SHA1 the
default for the most-used platforms and also not mind extending the
SHA1 API. Do you think that would fly?
I thought about other ways to make the index writing restartable from
the middle, but the only clean approach I came up with would require a
format change to something like

  signature
  0 header
  1 cache data
  2 sha1 of 0..1
  3 extension data A
  4 sha1 of 2..3
  5 extension data B
  6 sha1 of 4..5
    [possibly more]
  7 end-of-index marker
  8 sha1 of 6..7
  etc.

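
The chained layout can be sketched the same way. To keep it short, this
hypothetical C sketch uses 64-bit FNV-1a as a stand-in for SHA-1 (it is
not cryptographic and is not what git would use); `chain()`,
`build_checkpoints()` and `replace_section()` are illustrative names.
Each checkpoint hashes the previous checkpoint plus the section that
follows it, so replacing section k only needs the stored checkpoint
k-1 and a rehash of sections k onward.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a, standing in for SHA-1 only to keep the sketch short. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t n)
{
	const uint8_t *p = data;

	while (n--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* checkpoint = H(previous checkpoint || section); the first section
 * (header + cache data) has no predecessor. The previous checkpoint is
 * hashed as its in-memory bytes; an on-disk format would fix the byte order. */
static uint64_t chain(const uint64_t *prev, const char *section)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */

	if (prev)
		h = fnv1a(h, prev, sizeof(*prev));
	return fnv1a(h, section, strlen(section));
}

/* Compute ck[i] for every section, front to back. */
static void build_checkpoints(const char **sec, size_t n, uint64_t *ck)
{
	size_t i;

	for (i = 0; i < n; i++)
		ck[i] = chain(i ? &ck[i - 1] : NULL, sec[i]);
}

/* Replace section k and recompute only ck[k..n-1]; nothing before
 * section k is read except the stored checkpoint ck[k-1]. */
static void replace_section(const char **sec, size_t n, uint64_t *ck,
			    size_t k, const char *data)
{
	size_t i;

	sec[k] = data;
	for (i = k; i < n; i++)
		ck[i] = chain(i ? &ck[i - 1] : NULL, sec[i]);
}
```

Unlike PSHA, this only ever uses finished digests, so it would not
depend on access to the internal state of the SHA-1 implementation; the
cost is the format change.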
--
Thomas Rast
trast@{inf,student}.ethz.ch