From: Martin Waitz <tali@admingilde.org>
To: David Lang <david.lang@digitalinsight.com>
Cc: Junio C Hamano <junkio@cox.net>,
Josef Weidendorfer <Josef.Weidendorfer@gmx.de>,
Eric Lesh <eclesh@ucla.edu>, Matthieu Moy <Matthieu.Moy@imag.fr>,
git@vger.kernel.org
Subject: Re: Submodule object store
Date: Tue, 27 Mar 2007 17:25:07 +0200 [thread overview]
Message-ID: <20070327152507.GO22773@admingilde.org> (raw)
In-Reply-To: <Pine.LNX.4.63.0703261530030.14387@qynat.qvtvafvgr.pbz>
[-- Attachment #1: Type: text/plain, Size: 2872 bytes --]
hoi :)
Its really funny that when I proposed one big object database everybody
wanted it separated and now that I propose a separate database everybody
wants it as one combined database.
I read this as a sign that people really try to think critically about
the design, which is a good thing and will hopefully lead to a good
and stable submodule implementation.
On Mon, Mar 26, 2007 at 03:40:15PM -0800, David Lang wrote:
> useing the same object store makes this work automaticaly (think of all the
> copies of COPYING that would end up being the same as a trivial example)
Yes, but I guess not much more than COPYING, INSTALL, some trivial
Makefiles and empty files will be shared between subprojects.
Except when you have the same subproject in your tree multiple times,
of course.
Yet this sharing is exactly why I started to do it that way, until Linus
stopped me.
> >If someone comes up with a nice way to handle everything in one big
> >object store I would happily use that! :-)
>
> what exactly are the problems with one big object store?
I think we really have to discuss this separation on several layers:
traversal, pack-files, and object database.
For the traversal the point of separating it into a per-module traversal
is that only one module has to be loaded into RAM at a time.
This effects all operations which do a (potentially) recursive traversal:
push, pull, fsck, prune, repack.
However a separated traversal will no longer be garanteed to only list
an object once, so this has to be handled in some way.
Pack files should have better access patterns if they are per-module.
Most of the time you are only interested in one individual module and
locality is important here.
Separating the entire object database is a way to improve unreachability
analysis, as it now can be done per module.
The other two separations are easier to implement with a separated
object database, but that's not too strong an argument.
So if we can come up with a nice way to do unreachability analysis we
can indeed go on with the shared object database and tackle the
remaining scalability issues as they arise. Those could then be added
later without changing the on-disk format.
> ones that I can think of:
>
> 1. when you are doing a fsck you need to walk all the trees and find out
> the list of objects that you know about.
>
> done as a tree of binary values you can hold a LOT in memory before
> running into swap.
Could you explain the algorithm you are thinking about in more detail?
> if it's enough larger then available ram then an option for fsck to use
> trees on disk is an option.
This could simplify some things.
There could be an on-disk index of all known objects, so that the sha1
sums do not have to loaded into RAM all at once.
--
Martin Waitz
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
next prev parent reply other threads:[~2007-03-27 15:25 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-03-25 12:30 .gitlink for Summer of Code Eric Lesh
2007-03-25 15:20 ` Matthieu Moy
2007-03-25 20:39 ` Shawn O. Pearce
2007-03-25 20:54 ` Johannes Schindelin
2007-03-25 21:03 ` Shawn O. Pearce
2007-03-25 20:55 ` Junio C Hamano
2007-03-25 21:05 ` Shawn O. Pearce
2007-03-27 3:40 ` Petr Baudis
2007-03-26 17:16 ` Eric Lesh
2007-03-26 17:22 ` Matthieu Moy
2007-03-26 17:38 ` Eric Lesh
2007-03-26 18:35 ` Martin Waitz
2007-03-26 19:33 ` Josef Weidendorfer
2007-03-26 19:49 ` Matthieu Moy
2007-03-26 23:14 ` Josef Weidendorfer
2007-03-27 16:59 ` Matthieu Moy
2007-03-26 22:03 ` Martin Waitz
2007-03-26 22:51 ` Junio C Hamano
2007-03-26 23:16 ` Submodule object store Martin Waitz
2007-03-26 23:28 ` Junio C Hamano
2007-03-26 23:36 ` Martin Waitz
2007-03-26 23:20 ` David Lang
2007-03-26 23:55 ` Martin Waitz
2007-03-26 23:40 ` David Lang
2007-03-27 15:25 ` Martin Waitz [this message]
2007-03-27 16:53 ` David Lang
2007-03-27 0:29 ` Junio C Hamano
2007-03-27 14:28 ` Martin Waitz
2007-03-27 11:25 ` Uwe Kleine-König
2007-03-27 11:50 ` Uwe Kleine-König
2007-03-27 15:53 ` Martin Waitz
2007-03-27 16:56 ` Josef Weidendorfer
2007-03-27 16:44 ` Martin Waitz
2007-03-27 17:22 ` Uwe Kleine-König
2007-03-27 18:41 ` Linus Torvalds
2007-03-27 19:42 ` Uwe Kleine-König
2007-03-27 19:53 ` Linus Torvalds
2007-03-27 19:59 ` Linus Torvalds
2007-03-27 15:46 ` Martin Waitz
2007-03-26 23:17 ` .gitlink for Summer of Code Josef Weidendorfer
[not found] ` <Pine.LNX.4.64.0703270952020. 6730@woody.linux-foundation.org>
2007-03-26 23:24 ` Junio C Hamano
2007-03-27 17:04 ` Linus Torvalds
2007-03-27 17:00 ` David Lang
2007-03-27 18:15 ` Linus Torvalds
2007-03-27 17:35 ` Martin Waitz
2007-03-27 18:09 ` Daniel Barkalow
2007-03-27 18:19 ` Linus Torvalds
2007-03-27 20:54 ` Daniel Barkalow
2007-03-27 21:11 ` Linus Torvalds
2007-03-27 20:54 ` David Lang
2007-03-27 23:31 ` Jakub Narebski
2007-03-27 23:20 ` David Lang
2007-03-27 18:36 ` Steven Grimm
2007-03-27 20:02 ` Daniel Barkalow
2007-03-27 21:27 ` Linus Torvalds
2007-03-26 23:00 ` Josef Weidendorfer
2007-03-26 23:27 ` Martin Waitz
2007-03-26 17:31 ` Jakub Narebski
2007-03-26 18:21 ` Matthieu Moy
2007-03-27 0:48 ` Jakub Narebski
2007-03-25 20:46 ` Shawn O. Pearce
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070327152507.GO22773@admingilde.org \
--to=tali@admingilde.org \
--cc=Josef.Weidendorfer@gmx.de \
--cc=Matthieu.Moy@imag.fr \
--cc=david.lang@digitalinsight.com \
--cc=eclesh@ucla.edu \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).