From: Tomas Mraz <t8m@centrum.cz>
To: duchier@ps.uni-sb.de
Cc: gnu-arch-dev@lists.seyza.com, talli@museatech.net,
git@vger.kernel.org, torvalds@osdi.org
Subject: Re: [Gnu-arch-users] Re: [ANNOUNCEMENT] /Arch/ embraces `git'
Date: Thu, 21 Apr 2005 00:51:49 +0200
Message-ID: <1114037509.5880.62.camel@perun.redhat.usu>
In-Reply-To: <877jixfjxw.fsf@star.lifl.fr>
On Wed, 2005-04-20 at 19:15 +0200, duchier@ps.uni-sb.de wrote:
...
> As data, I used my /usr/src/linux which uses 301M and contains 20753 files and
> 1389 directories. To compute the key for a directory, I considered that its
> contents were a mapping from names to keys.
I suppose if you used the blob archive for storing many revisions, the
number of stored blobs would be much higher. However, even then we can
estimate that the maximum number of stored blobs will be on the order of
millions.
> When constructing the indexed archive, I actually stored empty files instead of
> blobs because I am only interested in overhead.
>
> Using your suggested indexing method that uses [0:4] as the 1st level key and
[0:3]
> [4:8] as the 2nd level key, I obtain an indexed archive that occupies 159M,
> where the top level contains 18665 1st level keys, the largest first level dir
> contains 5 entries, and all 2nd level dirs contain exactly 1 entry.
Yes, it really doesn't make much sense to have such big keys on the
directories. If we assume that SHA1 is a really good hashing function,
so that every hash value is equally probable, this would allow storing
2^16 * 2^16 * 2^16 blobs with approximately the same directory usage.
> Using Linus suggested 1 level [0:2] indexing, I obtain an indexed archive that
[0:1] I suppose
> occupies 1.8M, where the top level contains 256 1st level keys, and where the
> largest 1st level dir contains 110 entries.
The question is how many entries per directory is the optimal compromise
between space and the speed of access to its files.
If we suppose the maximum number of stored blobs is on the order of
millions, the optimal indexing would probably be 1 level [0:2] indexing
or 2 levels [0:1] [2:3]. However, it would be necessary to do some
benchmarking first before setting this in stone.
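To make the two candidate layouts concrete, here is a minimal Python
sketch. It is only an illustration under assumptions: the function names
are hypothetical, the index ranges are read as inclusive hex-digit
positions (so [0:1] means the first two digits), the leaf file is named
with the full key, and keys are assumed to be uniformly distributed.

```python
import hashlib


def fanout_path(hex_key, levels):
    """Map a hex key to a relative path.

    `levels` is a list of inclusive (start, end) hex-digit ranges,
    e.g. [(0, 1), (2, 3)] for the 2-level [0:1] [2:3] layout.
    Assumption: the leaf file keeps the full key as its name.
    """
    parts = [hex_key[start:end + 1] for start, end in levels]
    return "/".join(parts + [hex_key])


def leaf_directories(levels):
    """Number of possible leaf directories (16 values per hex digit)."""
    digits = sum(end - start + 1 for start, end in levels)
    return 16 ** digits


def expected_entries(n_blobs, levels):
    """Average entries per leaf directory, assuming uniform keys."""
    return n_blobs / leaf_directories(levels)


if __name__ == "__main__":
    key = hashlib.sha1(b"example blob").hexdigest()
    for levels in ([(0, 2)], [(0, 1), (2, 3)]):
        print(levels, fanout_path(key, levels),
              leaf_directories(levels),
              expected_entries(1_000_000, levels))
```

Under these assumptions, one million blobs give roughly 244 entries per
directory for 1 level [0:2] (4096 directories) and roughly 15 per leaf
directory for 2 levels [0:1] [2:3] (65536 leaf directories). This says
nothing about actual access speed, which still needs benchmarking.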
--
Tomas Mraz <t8m@centrum.cz>