All of lore.kernel.org
 help / color / mirror / Atom feed
From: duchier@ps.uni-sb.de
To: Tomas Mraz <t8m@centrum.cz>
Cc: gnu-arch-dev@lists.seyza.com, talli@museatech.net, git@vger.kernel.org
Subject: Re: [Gnu-arch-users] Re: [ANNOUNCEMENT] /Arch/ embraces `git'
Date: Thu, 21 Apr 2005 13:46:45 +0200	[thread overview]
Message-ID: <878y3ce4ii.fsf@star.lifl.fr> (raw)
In-Reply-To: <1114078877.5886.37.camel@perun.redhat.usu> (Tomas Mraz's message of "Thu, 21 Apr 2005 12:21:16 +0200")

Tomas Mraz <t8m@centrum.cz> writes:

>> Btw, if, as you indicate above, you do believe that a 1 level indexing should
>> use [0:2], then it doesn't make much sense to me to also suggest that a 2 level
>> indexing should use [0:1] as primary subkey :-)
>
> Why do you think so? IMHO we should always target a similar number of
> files/subdirectories in a directories of the blob archive. So If I
> always suppose that the archive would contain at most 16 millions of
> files then the possible indexing schemes are either 1 level with key
> length 3 (each directory would contain ~4096 files) or 2 level with key
> length 2 (each directory would contain ~256 files).
> Which one is better could be of course filesystem and hardware
> dependent.

First off, I have been using python slice notation, so when I write [0:2] I mean
a key of length 2 (the second index is not included).  I now realize that when
you wrote the same you meant to include the second index.

I believe that our disagreement comes from the fact that we are asking different
questions.  You consider the question of how to best index a fixed database and
I consider the question of how to best index an ever increasing database.

Now consider why we even want multiple indexing levels: presumably this is
because certain operations become too costly when the size of a directory
becomes too large.  If that's not the case, then we might as well just have one
big flat directory - perhaps that's even a viable option for some
filesystems.[1]

  [1] there is the additional consideration that a hierarchical system
  implements a form of key compression by sharing key prefixes.  I don't know at
  what point such an effect becomes beneficial, if ever.

Now suppose we need at least one level of indexing.  Under an assumption of
uniform distribution of bits in keys, as more objects are added to the database,
the lower levels are going to fill up uniformly.  Therefore at those levels we
are again faced with exactly the same indexing problem and thus should come up
with exactly the same answer.

This is why I believe that the scheme I proposed is best: when a bottom level
directory fills up past a certain size, introduce under it an additional level,
and reindex the keys.  Since the "certain size" is fixed, this is a constant
time operation.

One could also entertain the idea of reindexing not just a bottom level
directory but an entire subtree of the database (this would be closer to your
idea of finding an optimal reindexing of just this part of the database).
However this has the disadvantage that the operation's cost grows exponentially
with the depth of the tree.

Cheers,

--Denys

  reply	other threads:[~2005-04-21 11:55 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-04-20 10:00 [ANNOUNCEMENT] /Arch/ embraces `git' Tom Lord
2005-04-20 10:19 ` Miles Bader
2005-04-20 17:15 ` duchier
2005-04-20 22:40   ` [Gnu-arch-users] Re: [GNU-arch-dev] " Tomas Mraz
2005-04-21  9:09     ` Denys Duchier
2005-04-21 10:21       ` Tomas Mraz
2005-04-21 11:46         ` duchier [this message]
2005-04-20 22:51   ` [Gnu-arch-users] " Tomas Mraz
2005-04-21 19:04     ` Tom Lord
2005-04-21 20:35     ` [Gnu-arch-users] Re: [GNU-arch-dev] " Tom Lord
2005-04-20 23:04   ` Tom Lord
2005-04-21  0:05     ` [Gnu-arch-users] Re: [GNU-arch-dev] " Denys Duchier
2005-04-21 20:39       ` [Gnu-arch-users] " Tom Lord
2005-04-21  7:49     ` Tomas Mraz
2005-04-21 21:51       ` [Gnu-arch-users] Re: [GNU-arch-dev] " Tom Lord
2005-04-21 21:52       ` Tom Lord
2005-04-22 16:13       ` Linus Torvalds
2005-04-22 17:39         ` Edésio Costa e Silva
2005-04-20 21:31 ` Petr Baudis
2005-04-20 21:55   ` C. Scott Ananian
2005-04-20 22:22     ` chunking (Re: [ANNOUNCEMENT] /Arch/ embraces `git') Linus Torvalds
2005-04-20 23:42       ` C. Scott Ananian
2005-04-22 21:02       ` blowing chunks (quick update) C. Scott Ananian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878y3ce4ii.fsf@star.lifl.fr \
    --to=duchier@ps.uni-sb.de \
    --cc=git@vger.kernel.org \
    --cc=gnu-arch-dev@lists.seyza.com \
    --cc=t8m@centrum.cz \
    --cc=talli@museatech.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.