From: Andreas Ericsson <ae@op5.se>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <junkio@cox.net>,
Bahadir Balban <bahadir.balban@gmail.com>,
git@vger.kernel.org, Andy Parkins <andyparkins@gmail.com>
Subject: Re: Adding a new file as if it had existed
Date: Wed, 13 Dec 2006 16:52:46 +0100
Message-ID: <458021CE.1000407@op5.se>
In-Reply-To: <Pine.LNX.4.63.0612131611050.3635@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin wrote:
> Hi,
>
> On Wed, 13 Dec 2006, Andreas Ericsson wrote:
>
>> Junio C Hamano wrote:
>>> "Bahadir Balban" <bahadir.balban@gmail.com> writes:
>>>
>>> There is one thing we could further optimize, though.
>>>
>>> Switching branches with 100k blobs in a commit even when there
>>> are a handful paths different between the branches would still
>>> need to populate the index by reading two trees and collapsing
>>> them into a single stage. In theory, we should be able to do a
>>> lot better if the two-tree case of read-tree took advantage of
>>> cache-tree information. If ce_match_stat() says Ok for all
>>> paths in a subdirectory and the cached tree object name for that
>>> subdirectory in the index match what we are reading from the new
>>> tree, we should be able to skip reading that subdirectory (and
>>> its subdirectories) from the new tree object at all.
>>>
>>> Anybody interested to give it a try?
>>>
>> I'm not well-versed enough in git internals to have my hopes high of
>> making something useful of it, but if you give me a pointer of where to
>> start I'd be happy to try, and perhaps learn something in the process.
>
> Okay, I'll have a stab at explaining it.
>
> For huge working directories, you usually have a huge number of trees. The
> idea of cache_tree is to remember not only the stat information of the
> blobs in the index, but to cache the hashes of the trees also (until they
> are invalidated, e.g. by an update-index). This avoids recalculation of
> the hashes when committing.
>
> This cache is accessible by the global variable active_cache_tree. It is
> best accessed by the function cache_tree_find(), which you call like this:
>
> 	struct cache_tree *ct = cache_tree_find(active_cache_tree, path);
>
> where the variable "path" may contain slashes. The SHA1 of the
> corresponding tree is in ct->sha1, and you can check if the hash is still
> valid by asking
>
> 	if (cache_tree_fully_valid(ct))
> 		/* still valid */
>
> AFAIU Junio would like to take the shortcut of doing nothing at all when
> (twoway) reading a tree whose hash is identical to the hash stored in the
> corresponding cache_tree _and_ when the cache is still fully valid.
>
Seems you wrote half the code for me already. :)
Thanks for the excellent explanation. I'll see if I can grok it further
tonight.
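
A minimal sketch of that shortcut, written as self-contained C so it
compiles outside git.git. The toy_* names and the stat_clean flag are
hypothetical stand-ins, not git's API: real code would presumably use
struct cache_tree, cache_tree_find() and cache_tree_fully_valid() as
described above, plus ce_match_stat() on each path, and may well end up
looking different.

```c
#include <assert.h>
#include <string.h>

#define TOY_SHA1_LEN 20

/* Hypothetical stand-in for struct cache_tree: only what the check needs. */
struct toy_cache_tree {
	int entry_count;                  /* assumed: negative means invalidated */
	unsigned char sha1[TOY_SHA1_LEN]; /* cached tree object name */
	int subtree_nr;                   /* number of cached subdirectories */
	struct toy_cache_tree **subtrees; /* cached subdirectories */
};

/* Models the idea of cache_tree_fully_valid(): this node and every
 * subtree below it must still carry a valid cached hash. */
int toy_fully_valid(const struct toy_cache_tree *ct)
{
	int i;

	if (!ct || ct->entry_count < 0)
		return 0;
	for (i = 0; i < ct->subtree_nr; i++)
		if (!toy_fully_valid(ct->subtrees[i]))
			return 0;
	return 1;
}

/* Returns 1 when a two-way read could skip this subdirectory:
 *  - stat_clean: caller says ce_match_stat() was Ok for all paths below it
 *  - the cached hash is valid all the way down
 *  - the cached hash equals the tree we are asked to read */
int toy_can_skip_subtree(const struct toy_cache_tree *ct,
			 const unsigned char *new_tree_sha1,
			 int stat_clean)
{
	assert(new_tree_sha1 != NULL);

	if (!stat_clean)
		return 0;
	if (!toy_fully_valid(ct))
		return 0;
	return memcmp(ct->sha1, new_tree_sha1, TOY_SHA1_LEN) == 0;
}
```

With a subtree looked up per directory, a two-way read could call
something like toy_can_skip_subtree(ct, new_sha1, clean) before
descending, and leave the index entries under that path untouched when
it returns 1.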
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se