git.vger.kernel.org archive mirror
From: Andreas Ericsson <ae@op5.se>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Junio C Hamano <junkio@cox.net>,
	Bahadir Balban <bahadir.balban@gmail.com>,
	git@vger.kernel.org, Andy Parkins <andyparkins@gmail.com>
Subject: Re: Adding a new file as if it had existed
Date: Wed, 13 Dec 2006 16:52:46 +0100
Message-ID: <458021CE.1000407@op5.se>
In-Reply-To: <Pine.LNX.4.63.0612131611050.3635@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin wrote:
> Hi,
> 
> On Wed, 13 Dec 2006, Andreas Ericsson wrote:
> 
>> Junio C Hamano wrote:
>>> "Bahadir Balban" <bahadir.balban@gmail.com> writes:
>>>
>>> There is one thing we could further optimize, though.
>>>
>>> Switching branches with 100k blobs in a commit even when there
>>> are only a handful of paths different between the branches would still
>>> need to populate the index by reading two trees and collapsing
>>> them into a single stage.  In theory, we should be able to do a
>>> lot better if the two-tree case of read-tree took advantage of
>>> cache-tree information.  If ce_match_stat() says Ok for all
>>> paths in a subdirectory and the cached tree object name for that
>>> subdirectory in the index match what we are reading from the new
>>> tree, we should be able to skip reading that subdirectory (and
>>> its subdirectories) from the new tree object at all.
>>>
>>> Anybody interested to give it a try?
>>>
>> I'm not well-versed enough in git internals to have high hopes of 
>> making something useful of it, but if you give me a pointer on where to 
>> start I'd be happy to try, and perhaps learn something in the process.
> 
> Okay, I'll have a stab at explaining it.
> 
> For huge working directories, you usually have a huge number of trees. The 
> idea of cache_tree is to remember not only the stat information of the 
> blobs in the index, but to cache the hashes of the trees also (until they 
> are invalidated, e.g. by an update-index). This avoids recalculation of 
> the hashes when committing.
> 
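To make the caching idea concrete, here is a minimal sketch using a hypothetical, simplified node type. `struct mock_cache_tree` and `mock_invalidate_path()` are illustrative names, not git's actual definitions, and "negative entry_count means stale" is an assumption of this sketch. Each directory node keeps a cached tree hash; changing a file marks the root and every directory on the way down to that file as stale, so only those hashes would need recomputing at commit time while sibling directories keep theirs.

```c
#include <string.h>

/* Hypothetical, simplified model of a cache_tree node; not git's struct. */
struct mock_cache_tree {
	const char *name;               /* path component, "" for the root */
	int entry_count;                /* < 0: cached hash is stale */
	unsigned char sha1[20];         /* cached tree object name */
	struct mock_cache_tree *down[8];
	int subtree_nr;
};

/*
 * Mark the root and every directory on the way to `path` as stale,
 * e.g. after update-index changed that file. Sibling directories keep
 * their cached hashes, so a commit only needs to rehash the stale ones.
 */
static void mock_invalidate_path(struct mock_cache_tree *ct, const char *path)
{
	while (ct) {
		const char *slash = strchr(path, '/');
		struct mock_cache_tree *next = NULL;
		size_t len;
		int i;

		ct->entry_count = -1;
		if (!slash)
			return;         /* the file lives directly in ct */
		len = (size_t)(slash - path);
		for (i = 0; i < ct->subtree_nr; i++) {
			if (strlen(ct->down[i]->name) == len &&
			    !memcmp(ct->down[i]->name, path, len)) {
				next = ct->down[i];
				break;
			}
		}
		ct = next;              /* NULL if no cached subtree exists */
		path = slash + 1;
	}
}
```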
> This cache is accessible by the global variable active_cache_tree. It is 
> best accessed by the function cache_tree_find(), which you call like this:
> 
> 	struct cache_tree *ct = cache_tree_find(active_cache_tree, path);
> 
> where the variable "path" may contain slashes. The SHA1 of the 
> corresponding tree is in ct->sha1, and you can check if the hash is still 
> valid by asking
> 
> 	if (cache_tree_fully_valid(ct)) {
> 		/* still valid */
> 	}
> 
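The lookup and the validity check can be modelled the same way. This is a hypothetical sketch, not the code in git's cache-tree.c: `mock_cache_tree_find()` descends one slash-separated component at a time and returns NULL when no cached tree exists for the path, and `mock_cache_tree_fully_valid()` treats a node as valid only if its own hash and every hash below it are still up to date (in this model, a non-negative entry_count).

```c
#include <string.h>

/* Hypothetical, simplified model of a cache_tree node; not git's struct. */
struct mock_cache_tree {
	const char *name;               /* path component, "" for the root */
	int entry_count;                /* < 0: cached hash is stale */
	unsigned char sha1[20];         /* cached tree object name */
	struct mock_cache_tree *down[8];
	int subtree_nr;
};

/* Descend one slash-separated component at a time; NULL if not cached. */
static struct mock_cache_tree *mock_cache_tree_find(struct mock_cache_tree *ct,
						    const char *path)
{
	while (ct && *path) {
		const char *slash = strchr(path, '/');
		size_t len = slash ? (size_t)(slash - path) : strlen(path);
		struct mock_cache_tree *next = NULL;
		int i;

		for (i = 0; i < ct->subtree_nr; i++) {
			if (strlen(ct->down[i]->name) == len &&
			    !memcmp(ct->down[i]->name, path, len)) {
				next = ct->down[i];
				break;
			}
		}
		ct = next;
		if (!slash)
			break;
		path = slash + 1;
		while (*path == '/')    /* tolerate "a//b" and a trailing "/" */
			path++;
	}
	return ct;
}

/* Valid only if this hash and every hash below it are up to date. */
static int mock_cache_tree_fully_valid(const struct mock_cache_tree *ct)
{
	int i;

	if (!ct || ct->entry_count < 0)
		return 0;
	for (i = 0; i < ct->subtree_nr; i++)
		if (!mock_cache_tree_fully_valid(ct->down[i]))
			return 0;
	return 1;
}
```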
> AFAIU Junio would like to take the shortcut of doing nothing at all when 
> (two-way) reading a tree whose hash is identical to the hash stored in the 
> corresponding cache_tree _and_ when the cache is still fully valid.
> 
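Putting the pieces together, the shortcut might look roughly like the following sketch. Everything here is hypothetical: `struct mock_tree` stands in for a tree object of the branch being checked out, `read_tree_twoway()` only counts the blobs it would have to read rather than populating an index, and the per-path ce_match_stat() check from Junio's message is left out for brevity. A subdirectory is skipped, together with everything below it, when its cached tree is fully valid and names the same tree object as the new commit.

```c
#include <string.h>

/* Hypothetical stand-in for a tree object in the new commit. */
struct mock_tree {
	const char *name;
	unsigned char sha1[20];
	struct mock_tree *sub[8];
	int sub_nr;
	int blob_nr;                    /* files directly in this directory */
};

/* Hypothetical, simplified cache_tree node; not git's struct. */
struct mock_cache_tree {
	const char *name;
	int entry_count;                /* < 0: cached hash is stale */
	unsigned char sha1[20];
	struct mock_cache_tree *down[8];
	int subtree_nr;
};

static int fully_valid(const struct mock_cache_tree *ct)
{
	int i;

	if (!ct || ct->entry_count < 0)
		return 0;
	for (i = 0; i < ct->subtree_nr; i++)
		if (!fully_valid(ct->down[i]))
			return 0;
	return 1;
}

/* Cached subtree with the given name, or NULL if there is none. */
static const struct mock_cache_tree *child(const struct mock_cache_tree *ct,
					   const char *name)
{
	int i;

	if (!ct)
		return NULL;
	for (i = 0; i < ct->subtree_nr; i++)
		if (!strcmp(ct->down[i]->name, name))
			return ct->down[i];
	return NULL;
}

/* Walk the new tree and return how many blobs actually had to be read. */
static int read_tree_twoway(const struct mock_tree *t,
			    const struct mock_cache_tree *ct)
{
	int i, nr;

	/* Unchanged and still valid: skip this directory and all below it. */
	if (fully_valid(ct) && !memcmp(ct->sha1, t->sha1, 20))
		return 0;
	nr = t->blob_nr;
	for (i = 0; i < t->sub_nr; i++)
		nr += read_tree_twoway(t->sub[i], child(ct, t->sub[i]->name));
	return nr;
}
```

With 100k blobs but only a few changed directories, most recursive calls would return at the first check, which is where the savings would come from.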

Seems you wrote half the code for me already. :)

Thanks for the excellent explanation. I'll see if I can grok it further 
tonight.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se


Thread overview: 11+ messages
2006-12-12 10:05 Adding a new file as if it had existed Bahadir Balban
2006-12-12 10:13 ` Junio C Hamano
2006-12-12 11:32   ` Bahadir Balban
2006-12-12 12:07     ` Johannes Schindelin
2006-12-12 12:26     ` Andy Parkins
2006-12-12 13:20       ` Andreas Ericsson
2006-12-12 18:31     ` Junio C Hamano
2006-12-13  9:40       ` Andreas Ericsson
2006-12-13 15:46         ` Johannes Schindelin
2006-12-13 15:52           ` Andreas Ericsson [this message]
2006-12-12 12:36 ` Jakub Narebski