From: Jakub Narebski <jnareb@gmail.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: Avery Pennarun <apenwarr@gmail.com>,
	Joshua Juran <jjuran@gmail.com>,
	Finn Arne Gangstad <finnag@pvv.org>,
	git@vger.kernel.org
Subject: Re: inotify daemon speedup for git [POC/HACK]
Date: Wed, 28 Jul 2010 06:06:22 -0700 (PDT)
Message-ID: <m3tynjkb90.fsf@localhost.localdomain>
In-Reply-To: <20100728000009.GE25268@spearce.org>

"Shawn O. Pearce" <spearce@spearce.org> writes:

> Avery Pennarun <apenwarr@gmail.com> wrote:
> > 
> > While we're here, it's probably worth mentioning that git's index file
> > format (which stores a sequential list of full paths in alphabetical
> > order, instead of an actual hierarchy) does become a bottleneck when
> > you actually have a huge number of files in your repo (like literally
> > a million).  You can't actually binary search through the index!  The
> > current implementation of submodules allows you to dodge that
> > scalability problem since you end up with multiple smaller index
> > files.  Anyway, that's fixable too.
> 
> Yes.
> 
> More than once I've been tempted to rewrite the on-disk (and I guess
> in-memory) format of the index.  And then I remember how painful that
> stuff is in either C git.git or JGit, and I back away slowly.  :-)
> 
> Ideally the index is organized the same way the trees are, but
> you still can't do a really good binary search because of the
> ass-backwards name sorting rule for trees.  But for performance
> reasons you still want to keep the entire index in a single file;
> an index per directory (aka SVN/CVS) is too slow for the common
> case of <30k files.
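Shawn's "name sorting rule for trees" refers to how git orders entries inside a tree object: a subtree's name is compared as if it ended in '/'. Below is a minimal Python sketch. It assumes a simplified in-memory model in which each entry is a hypothetical `(name, is_tree)` pair, not git's real tree encoding. The sketch shows why a lookup that knows only the bare name cannot rely on a single binary search. A blob called `foo` would sort before `foo-bar`, but a directory called `foo` sorts after `foo.c`, because '-' and '.' are both smaller than '/'.

```python
import bisect

# Simplified model (an assumption, not git's on-disk tree format): each
# entry is a (name, is_tree) pair, with the name as raw bytes.


def tree_sort_key(name: bytes, is_tree: bool) -> bytes:
    """Order key inside one tree: a subtree compares as if named 'name/'."""
    return name + b"/" if is_tree else name


def sorted_tree(entries):
    """Return entries in tree order."""
    return sorted(entries, key=lambda e: tree_sort_key(e[0], e[1]))


def naive_lookup(entries, name: bytes):
    """Single binary search on the bare name; can miss a subtree."""
    # Keys are rebuilt on every call for clarity only.
    keys = [tree_sort_key(n, t) for n, t in entries]
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and entries[i][0] == name:
        return entries[i]
    return None


def lookup(entries, name: bytes):
    """Probe twice: once for a blob named 'name', once for a subtree."""
    keys = [tree_sort_key(n, t) for n, t in entries]
    for candidate in (name, name + b"/"):
        i = bisect.bisect_left(keys, candidate)
        if i < len(keys) and keys[i] == candidate:
            return entries[i]
    return None
```

With the entries `foo` (directory), `foo.c` and `foo-bar`, `naive_lookup(entries, b"foo")` returns `None`, while `lookup` finds the directory on its second probe. That extra probe is what makes the tree order awkward to search.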

I guess that modern filesystems solve the problem of very many files
in a single directory somehow (hash tables? ext3/ext4, for example,
can use a hashed B-tree "htree" directory index).  Perhaps the index
file could borrow such a mechanism as an extension.

Index for index?
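Avery's complaint is that index entries are variable-length records stored back to back. Because of that, a reader cannot jump to the middle entry without scanning first. One reading of an "index for index" is a table of record offsets. The minimal Python sketch below assumes a hypothetical, simplified format of NUL-terminated paths in bytewise order. Real index entries also carry stat data, object IDs, flags and padding, and all of these are ignored here. The sketch shows the linear scan that builds the offset table and the binary search that the table then allows. A hash table keyed by path, closer to the filesystem directory hashing mentioned above, would be another option, but it gives up ordered range scans.

```python
from typing import List, Optional

# Hypothetical simplified "index": sorted paths, each terminated by NUL.
# No stat data, object IDs, flags or padding as in git's real format.


def serialize(paths: List[str]) -> bytes:
    """Write paths in bytewise order, back to back, NUL-terminated."""
    encoded = sorted(p.encode("utf-8") for p in paths)
    return b"".join(p + b"\0" for p in encoded)


def build_offset_table(blob: bytes) -> List[int]:
    """One linear pass recording where each record starts.

    A reader must redo this on every load unless the table is persisted,
    e.g. as an optional extension (an assumption, not an existing one).
    """
    offsets, pos = [], 0
    while pos < len(blob):
        offsets.append(pos)
        pos = blob.index(b"\0", pos) + 1
    return offsets


def record_at(blob: bytes, offset: int) -> bytes:
    """Return the path stored at a given record offset."""
    return blob[offset:blob.index(b"\0", offset)]


def find(blob: bytes, offsets: List[int], path: str) -> Optional[int]:
    """Binary search via the offset table; returns the entry number or None."""
    key = path.encode("utf-8")
    lo, hi = 0, len(offsets)
    while lo < hi:
        mid = (lo + hi) // 2
        if record_at(blob, offsets[mid]) < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(offsets) and record_at(blob, offsets[lo]) == key:
        return lo
    return None
```

Persisting such a table costs a little extra file size in exchange for skipping the scan. Without it, every lookup first pays for the full linear pass.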
-- 
Jakub Narebski
Poland
ShadeHawk on #git
