* Git tree object storing policy
@ 2012-02-21  9:22 Ivan Tolstosheyev
  2012-02-21 10:18 ` Thomas Rast
  0 siblings, 1 reply; 2+ messages in thread
From: Ivan Tolstosheyev @ 2012-02-21  9:22 UTC (permalink / raw)
  To: git

Hello,

Currently a tree object is a simple list of <attributes, hash, name> entries
sorted by name (sorted in a tricky way, because the comparison function treats
a directory named "$X" as "$X/"). The problem is that if I add 10k files, one
commit at a time, to the root directory of an empty Git repository, there will
be 10k new tree objects, with sizes ranging from 1 to 10k times
(hash + name + attributes) + eps.
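
To put a rough number on that growth, here is a minimal sketch that sums the uncompressed sizes of all the root trees. It assumes each entry is stored as "<mode> <name>\0<20-byte SHA-1>" with mode 100644, i.e. 28 bytes plus the length of the name. The function name tree_bytes_total is made up for illustration. Object headers, zlib compression, blobs, commits and filesystem block overhead are all ignored, so the result is not directly comparable to what du reports.

```shell
#!/usr/bin/env bash
# Estimate the uncompressed bytes of every root tree written when files
# named 1..N are added one commit at a time.
# Assumption: one entry = 6 (mode) + 1 (space) + name + 1 (NUL) + 20 (SHA-1).
tree_bytes_total() {
    local n=$1 total=0 tree=0 i
    for ((i = 1; i <= n; i++)); do
        tree=$((tree + 28 + ${#i}))   # the tree for commit i gains one entry
        total=$((total + tree))       # each commit stores a complete new tree
    done
    echo "$total"
}

tree_bytes_total 10000
```

The total grows quadratically with the number of files, because every commit writes a complete copy of the tree.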

itroot@localhost ~/tmp> cat git-test.sh 
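#!/usr/bin/env bash
# Add 10,000 empty files to a fresh repository, one commit per file,
# then report the on-disk size of the result.

git init test
cd test || exit 1
for i in $(seq 1 10000)
do
    touch "${i}" && git add "${i}" && git commit -m "Add ${i}"
done
cd ..
du -hs test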
itroot@localhost ~/tmp>

itroot@localhost ~/tmp> ./git-test.sh
...
180M	test
itroot@localhost ~/tmp>

180 MB!!!?? And only 7.4 MB after `git gc`, thanks to delta compression!

OK, you can say that this example is artificial, and that I could add 10k
files in one commit. That's true. But manipulating files in big tree objects
(in big directories) is storage-expensive, and if I need to store a lot of
files in one directory and change them frequently, git just doesn't scale
properly for this use case right now.

What do I propose?
We can add another Git object type, named for example "btree",
that contains other "btree" objects or files. This would be a simple
B-tree structure (tree entries sorted practically by name; BTW,
maybe it's time to fix the sorting =] ) that allows insertion,
removal and search in O(log n) time. That way we no longer have trouble
with big directories. BTW, if all directories are small, a btree
will look just like a tree: a single btree pointing to files.
So one big tree with 10k files turns into (hmm, for example...)
101 btrees: one pointing to 100 btrees, which in turn point to files.
(100 entries per btree is a wild guess =) )
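
The arithmetic behind that example can be sketched as follows. This is only a back-of-the-envelope model of the proposed "btree" object, which does not exist in Git. The function names and the default fanout of 100 are assumptions for illustration, and the model assumes fully packed nodes and that an insertion rewrites one node per level.

```shell
#!/usr/bin/env bash
# Number of hypothetical btree nodes needed for n files with a given fanout.
btree_nodes() {
    local n=$1 fanout=${2:-100} m nodes=0
    ((fanout >= 2)) || return 1       # a fanout below 2 never converges
    m=$n
    while :; do
        m=$(( (m + fanout - 1) / fanout ))   # nodes needed one level up
        nodes=$((nodes + m))
        ((m <= 1)) && break
    done
    echo "$nodes"
}

# Upper bound on entries rewritten by one insertion: one node per level,
# each holding at most `fanout` entries.
btree_rewritten_entries() {
    local n=$1 fanout=${2:-100} levels=1 capacity
    ((fanout >= 2)) || return 1
    capacity=$fanout
    while ((capacity < n)); do
        capacity=$((capacity * fanout))
        levels=$((levels + 1))
    done
    echo $((levels * fanout))
}

echo "nodes: $(btree_nodes 10000 100)"
echo "entries rewritten per insertion: $(btree_rewritten_entries 10000 100)"
```

For 10k files and a fanout of 100 this gives 101 nodes and at most 200 rewritten entries per insertion, against 10000 entries for a single flat tree.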

Suggestions?

 
