From: Ivan Tolstosheyev <ivan.tolstosheyev@gmail.com>
To: git@vger.kernel.org
Subject: Git tree object storing policy
Date: Tue, 21 Feb 2012 09:22:12 +0000 (UTC)
Message-ID: <loom.20120221T094746-680@post.gmane.org>

Hello,

Currently a tree object is a simple list of <attributes, hash, name> entries
sorted by name (sorted with a twist: the comparison function treats a
directory named "$X" as if it were "$X/"). The problem is that if I add
10k files, one commit at a time, to the root directory of an empty git
repository, I get 10k new trees with sizes of
n*(hash+name+attributes)+eps for n from 1 to 10k.
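
The name ordering described above can be sketched as follows. This is only
an illustration in Python, not git's actual C code; the helper name
tree_sort_key and the sample entries are made up for the example.

```python
def tree_sort_key(name: str, is_dir: bool) -> bytes:
    """Byte-wise sort key: a directory named X is compared as if it were X/."""
    raw = name.encode("utf-8")
    return raw + b"/" if is_dir else raw


# Hypothetical entries: (name, is_directory)
entries = [("foo", True), ("foo.c", False), ("foo-bar", False)]

ordered = sorted(entries, key=lambda e: tree_sort_key(e[0], e[1]))
names = [name for name, _ in ordered]

# '-' (0x2d) < '.' (0x2e) < '/' (0x2f), so the directory "foo" sorts last,
# whereas a plain string sort would put "foo" first.
print(names)  # ['foo-bar', 'foo.c', 'foo']
```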

itroot@localhost ~/tmp> cat git-test.sh 
#!/usr/bin/env bash

git init test
cd test || exit 1
# one empty file and one commit per iteration
for i in $(seq 1 10000)
do
    touch "${i}" ; git add "${i}" ; git commit -m "Add ${i}"
done
cd ..
du -hs test
itroot@localhost ~/tmp>

itroot@localhost ~/tmp> ./git-test.sh
...
180M	test
itroot@localhost ~/tmp>

180 MB!!! And only 7.4M after `git gc`, thanks to delta compression!
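
The quadratic growth behind this number can be estimated with a small
sketch. It assumes each tree entry is laid out as "<mode> <name>\0" followed
by a 20-byte SHA-1, that every file has mode 100644, and it ignores the
object header. It counts uncompressed bytes only, so it is not expected to
match what du reports: loose objects are zlib-compressed, and here all files
are empty and share one blob hash, which compresses very well. The function
names are made up for the example.

```python
HASH_BYTES = 20  # raw SHA-1, as assumed above


def entry_size(name: str, mode: str = "100644") -> int:
    """Size of one tree entry: '<mode> <name>\\0' plus the raw hash."""
    return len(mode) + 1 + len(name.encode("utf-8")) + 1 + HASH_BYTES


def total_tree_bytes(n_files: int) -> int:
    """Sum of the root-tree sizes after each of n_files one-file commits."""
    total = 0
    current_tree = 0
    for i in range(1, n_files + 1):
        current_tree += entry_size(str(i))  # tree i holds files 1..i
        total += current_tree               # every commit stores a new tree
    return total


if __name__ == "__main__":
    n = 10_000
    print(f"{total_tree_bytes(n) / 1e9:.2f} GB of uncompressed tree data")
```

Under these assumptions the total comes out at roughly 1.6 GB before
compression, growing as n^2.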

OK, you can say that this example is artificial, and that I could add 10k
files in one commit. That's true. But manipulating files in big tree objects
(i.e. in big directories) is storage-expensive, and if I need to store a
lot of files in one directory and change them frequently, git just
doesn't scale properly for this use case.

What do I propose?
We could add another git object type, named for example "btree",
that contains other "btree" objects or files. This would be a simple
B-tree structure (with entries sorted, practically, by name; BTW,
maybe it's time to fix the sorting =] ) that allows insertion,
removal and search in O(log n) time. And we would no longer have
trouble with big directories. BTW, if all directories are small, a btree
would look just like a tree: a single btree pointing to files.
So one big tree with 10k files becomes (hmm, for example...)
101 btrees: one pointing to 100 btrees, and those point to the files.
(100 entries per btree is a wild guess =) )
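
The 101-btree arithmetic can be checked with a rough sketch. It assumes
every node is filled up to the fan-out limit, which a real B-tree would not
guarantee, and the "btree" object itself is only the proposal above, not an
existing git object type. The function name btree_shape is made up for the
example.

```python
def btree_shape(n_files: int, fanout: int) -> tuple[int, int]:
    """Return (node_count, depth) for n_files packed into full nodes."""
    nodes = 0
    depth = 0
    level = n_files  # number of items the next level up has to point to
    while True:
        level = max(1, (level + fanout - 1) // fanout)  # ceiling division
        nodes += level
        depth += 1
        if level == 1:  # reached the root
            break
    return nodes, depth


if __name__ == "__main__":
    nodes, depth = btree_shape(10_000, 100)
    print(nodes, depth)  # 101 2
    # Adding one file rewrites one node per level, so at most
    # depth * fanout entries, versus about 10k entries for a flat tree.
    print(depth * 100)   # 200
```

With a small directory (say 50 files and a fan-out of 100) the same function
gives a single node, which matches the "tree-like" case mentioned above.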

Suggestions?

 
