From: Shawn Pearce <spearce@spearce.org>
To: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
Cc: Linus Torvalds <torvalds@osdl.org>,
Martin Waitz <tali@admingilde.org>,
sf <sf-gmane@stephan-feder.de>,
git@vger.kernel.org
Subject: Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
Date: Sat, 2 Dec 2006 21:46:55 -0500 [thread overview]
Message-ID: <20061203024655.GD26668@spearce.org> (raw)
In-Reply-To: <200612030307.26429.Josef.Weidendorfer@gmx.de>
Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> Thinking even one step further:
> Would it make sense to define an encoding format for the content of
> commit and tree objects inside of packs, where the SHA1 is replaced by the
> offset of the object in this pack?
> As exactly the SHA1 is the least compressable thing, this could promise
> quite a benefit.
I actually had the same idea the other day. I discarded it after
thinking about it for a minute. Here's the problem:
Lets say we do this for the tree and parent IDs in a commit, because
these are the most commonly needed part of a commit during revision
traversal. So we want to put the offset to the tree and the offset
to each parent at the front of the commit somehow to make them very
cheap to access.
This means that when we start to write out a commit we need to know
the offset to the tree that commit references. But git-pack-objects
sorts object by type: commit, tree, blob (I forget where tags go,
but they aren't important in this context). So generally *all*
commits appear before the first tree. So when we write out the first
commit we need to know exactly how many bytes every commit will need
(compressed mind you) in this pack so we can determine the position
of the first tree. Now do this for every commit and every tree
that those commits use... yes, its a lot of work to precompute
and store all offsets before you even write out the first byte.
Its even worse with parent commits because ancestors tend to appear
behind the commit (newest->oldest) so that "git log" can benefit
from OS read-ahead. So you also have to keep track of your parent
commmit offsets. Not pretty.
Extending that idea to tree objects (store the offset of the entry)
makes the issue even uglier.
Oh, and packs aren't entirely self-contained. A pack is only self
contained in the sense that no object in the pack deltafies against
an object outside of the pack[1]. However by design an object
(e.g. a commit or a tree) can reference an object which is either
loose or which is in another pack. This is especially important
for every large projects where not every commit/tree/tag/blob will
fit into one 4 giB file.
**1** Except in the case of thin packs, which are used only on the
network and only to save bandwidth.
> AFAIK, we currently only use these offsets for referencing objects in
> delta chains.
Yes, that's a recent feature to reference a delta base.
--
next prev parent reply other threads:[~2006-12-03 2:47 UTC|newest]
Thread overview: 252+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
2006-11-20 22:16 ` Jakub Narebski
2006-11-20 22:28 ` Martin Waitz
2006-11-20 22:43 ` Junio C Hamano
2006-11-20 23:02 ` Jakub Narebski
2006-11-20 23:52 ` Martin Waitz
2006-11-21 1:31 ` Sam Vilain
2006-11-20 23:05 ` Linus Torvalds
2006-11-20 23:25 ` J. Bruce Fields
2006-11-20 23:33 ` Martin Waitz
2006-11-21 18:01 ` J. Bruce Fields
2006-11-21 19:32 ` Martin Waitz
2006-11-20 23:29 ` Martin Waitz
2006-11-21 0:10 ` Junio C Hamano
2006-11-21 0:42 ` Jakub Narebski
2006-11-21 6:21 ` Martin Waitz
2006-11-21 10:04 ` Jakub Narebski
2006-11-21 11:49 ` Martin Waitz
2006-11-21 6:27 ` Martin Waitz
2006-11-21 7:36 ` Junio C Hamano
2006-11-21 7:55 ` Martin Waitz
2006-11-21 22:31 ` Yann Dirson
2006-11-21 22:51 ` Linus Torvalds
2006-11-21 22:59 ` Linus Torvalds
2006-11-21 23:54 ` Yann Dirson
2006-11-22 3:40 ` Shawn Pearce
2006-11-23 23:23 ` Yann Dirson
2006-11-25 6:53 ` Shawn Pearce
2006-11-25 11:12 ` Yann Dirson
2006-11-25 18:57 ` Linus Torvalds
2006-11-25 19:19 ` Steven Grimm
2006-11-25 19:30 ` Linus Torvalds
2006-11-25 23:49 ` Yann Dirson
2006-11-26 1:14 ` Sven Verdoolaege
2006-11-26 1:32 ` Yann Dirson
2006-11-26 3:39 ` Linus Torvalds
2006-11-26 8:05 ` Daniel Barkalow
2006-11-28 9:36 ` Andreas Ericsson
2006-11-28 10:29 ` Andy Parkins
2006-11-28 10:50 ` Jakub Narebski
2006-11-28 13:35 ` Andy Parkins
2006-11-28 15:44 ` Shawn Pearce
2006-11-28 16:29 ` Andy Parkins
2006-11-28 16:36 ` Shawn Pearce
2006-11-28 17:38 ` Jon Loeliger
2006-11-29 16:15 ` Martin Waitz
2006-11-30 11:57 ` sf
[not found] ` <200611301255.41733.andyparkins@gmail.com>
2006-11-30 14:00 ` Stephan Feder
2006-11-30 14:49 ` Andy Parkins
2006-11-30 15:20 ` Sven Verdoolaege
2006-11-30 15:30 ` Andy Parkins
2006-11-30 15:50 ` Andreas Ericsson
2006-11-30 16:08 ` Andy Parkins
2006-11-30 16:33 ` Sven Verdoolaege
2006-12-01 0:01 ` Andy Parkins
2006-12-01 0:11 ` Jakub Narebski
2006-12-01 9:32 ` Sven Verdoolaege
2006-12-01 10:19 ` Andy Parkins
2006-11-30 17:19 ` Martin Waitz
2006-11-30 16:05 ` sf
2006-11-30 16:12 ` sf
2006-12-01 9:19 ` Andy Parkins
2006-12-01 9:57 ` Martin Waitz
2006-12-01 10:29 ` Andy Parkins
2006-12-01 10:42 ` Sven Verdoolaege
2006-12-01 11:02 ` Andy Parkins
2006-12-01 11:10 ` Sven Verdoolaege
2006-12-01 11:45 ` sf
2006-12-01 12:12 ` Andy Parkins
2006-12-01 12:28 ` Martin Waitz
2006-12-01 14:11 ` Andy Parkins
2006-12-01 15:12 ` Martin Waitz
2006-12-01 11:46 ` Martin Waitz
2006-12-01 12:16 ` Andy Parkins
2006-12-01 12:34 ` Martin Waitz
2006-12-01 13:59 ` Andy Parkins
2006-12-01 14:07 ` Martin Waitz
2006-12-01 11:31 ` Martin Waitz
2006-12-01 12:20 ` Andy Parkins
2006-12-01 12:37 ` Martin Waitz
2006-12-02 15:16 ` Jakub Narebski
2006-11-28 19:58 ` Steven Grimm
2006-11-28 21:02 ` Shawn Pearce
2006-11-29 16:03 ` Martin Waitz
2006-11-29 20:00 ` Andy Parkins
2006-11-30 12:16 ` Andreas Ericsson
2006-11-30 12:40 ` Andy Parkins
2006-11-30 17:06 ` Martin Waitz
2006-11-30 18:57 ` Andreas Ericsson
2006-12-01 8:49 ` Andy Parkins
2006-12-01 9:33 ` Andreas Ericsson
2006-12-01 10:38 ` Andy Parkins
2006-12-01 12:03 ` sf
2006-12-01 12:11 ` Martin Waitz
2006-12-01 13:21 ` sf
2006-12-01 13:43 ` Martin Waitz
2006-12-01 14:23 ` Stephan Feder
2006-12-01 15:07 ` Martin Waitz
2006-12-01 16:04 ` Stephan Feder
2006-12-01 16:15 ` Martin Waitz
2006-12-05 9:01 ` Uwe Kleine-Koenig
2006-12-05 10:33 ` Andreas Ericsson
2006-12-05 11:11 ` Jakub Narebski
2006-12-05 15:02 ` Uwe Kleine-Koenig
2006-12-05 15:30 ` Andreas Ericsson
2006-12-05 16:00 ` Sven Verdoolaege
2006-12-01 9:02 ` Andy Parkins
2006-12-01 11:00 ` Martin Waitz
2006-12-01 12:09 ` sf
2006-12-01 12:12 ` Martin Waitz
2006-12-01 13:05 ` sf
2006-12-01 13:35 ` Martin Waitz
2006-12-01 13:43 ` Andreas Ericsson
2006-12-01 13:46 ` Martin Waitz
2006-12-01 14:52 ` Andreas Ericsson
2006-12-01 15:00 ` Martin Waitz
2006-12-01 16:38 ` Andreas Ericsson
2006-12-01 16:49 ` Linus Torvalds
2006-12-01 17:08 ` sf
2006-12-01 18:06 ` Andreas Ericsson
2006-12-01 20:13 ` Linus Torvalds
2006-12-01 20:30 ` Martin Waitz
2006-12-01 23:23 ` Alan Chandler
2006-12-01 22:06 ` Josef Weidendorfer
2006-12-01 22:12 ` Martin Waitz
2006-12-01 22:26 ` Josef Weidendorfer
2006-12-01 22:40 ` Martin Waitz
2006-12-01 23:17 ` Josef Weidendorfer
2006-12-02 20:24 ` Martin Waitz
2006-12-03 0:55 ` Josef Weidendorfer
2006-12-03 6:29 ` Martin Waitz
2006-12-01 22:26 ` Linus Torvalds
2006-12-01 22:41 ` sf
2006-12-01 23:03 ` Josef Weidendorfer
2006-12-01 23:09 ` Linus Torvalds
2006-12-01 23:36 ` Josef Weidendorfer
2006-12-02 0:12 ` Linus Torvalds
2006-12-02 9:22 ` Andy Parkins
[not found] ` <200612021255.59972.Josef.Weidendorfer@gmx.de>
2006-12-03 9:42 ` Andy Parkins
2006-12-02 11:32 ` Josef Weidendorfer
2006-12-02 19:52 ` Linus Torvalds
2006-12-02 20:21 ` Martin Waitz
2006-12-02 20:46 ` Linus Torvalds
2006-12-02 20:58 ` Martin Waitz
2006-12-03 1:11 ` Josef Weidendorfer
2006-12-02 20:18 ` Martin Waitz
2006-12-02 20:44 ` Linus Torvalds
2006-12-02 21:06 ` Martin Waitz
2006-12-02 21:29 ` Linus Torvalds
2006-12-02 21:22 ` Linus Torvalds
2006-12-03 2:07 ` Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT] Josef Weidendorfer
2006-12-03 2:25 ` Linus Torvalds
2006-12-03 2:46 ` Shawn Pearce [this message]
2006-12-03 3:21 ` Josef Weidendorfer
2006-12-03 11:10 ` Jakub Narebski
2006-12-03 11:47 ` Josef Weidendorfer
2006-12-03 20:46 ` [RFC] Submodules in GIT Martin Waitz
2006-12-03 22:16 ` Sven Verdoolaege
2006-12-03 22:32 ` Linus Torvalds
2006-12-03 22:49 ` Jakub Narebski
2006-12-04 11:12 ` Josef Weidendorfer
2006-12-01 23:49 ` sf
2006-12-02 18:57 ` Torgil Svensson
2006-12-02 19:41 ` Linus Torvalds
2006-12-03 9:19 ` Torgil Svensson
2006-12-03 17:54 ` Linus Torvalds
2006-12-04 20:26 ` Torgil Svensson
2006-12-04 20:41 ` Linus Torvalds
2006-12-04 21:36 ` Torgil Svensson
2006-12-05 10:42 ` Andreas Ericsson
2006-12-05 11:09 ` Jakub Narebski
2006-12-05 10:38 ` Andreas Ericsson
2006-12-05 11:01 ` Jakub Narebski
2006-12-03 19:33 ` Andy Parkins
2006-12-05 2:33 ` Daniel Barkalow
2006-12-05 22:07 ` sf
2006-12-09 21:34 ` R. Steve McKown
2006-12-10 11:47 ` Torgil Svensson
2006-12-14 21:27 ` Torgil Svensson
2006-12-14 23:07 ` Josef Weidendorfer
2006-12-15 17:43 ` Torgil Svensson
2006-12-15 21:42 ` Josef Weidendorfer
2006-12-15 23:43 ` Torgil Svensson
2006-12-16 1:13 ` Torgil Svensson
2006-12-16 1:20 ` Torgil Svensson
2006-12-16 1:34 ` Jakub Narebski
2006-12-16 8:40 ` Torgil Svensson
2006-12-16 9:57 ` Jakub Narebski
2006-12-16 10:25 ` Junio C Hamano
2006-12-16 15:05 ` Torgil Svensson
2006-12-16 15:38 ` Torgil Svensson
2006-12-16 16:32 ` Jakub Narebski
2006-12-17 0:21 ` Torgil Svensson
2006-12-16 1:49 ` Linus Torvalds
2006-12-16 2:12 ` Linus Torvalds
2006-12-16 8:50 ` Torgil Svensson
2006-12-02 20:12 ` Martin Waitz
2006-12-01 22:55 ` Josef Weidendorfer
2006-12-01 23:07 ` Martin Waitz
2006-12-01 23:30 ` Linus Torvalds
2006-12-02 0:14 ` Josef Weidendorfer
2006-12-02 0:33 ` Linus Torvalds
2006-12-02 9:27 ` Andy Parkins
2006-12-04 18:56 ` Michael K. Edwards
2006-12-05 1:31 ` Sam Vilain
2006-12-01 22:35 ` sf
2006-12-08 18:29 ` Jon Loeliger
2006-12-08 18:45 ` Sven Verdoolaege
2006-12-12 8:32 ` Andreas Ericsson
2006-12-01 17:14 ` Martin Waitz
2006-12-01 16:57 ` Martin Waitz
2006-12-01 18:08 ` Andreas Ericsson
2006-12-01 18:51 ` Martin Waitz
2006-12-01 13:51 ` Stephan Feder
2006-12-01 14:58 ` Martin Waitz
2006-12-01 15:47 ` Stephan Feder
2006-12-01 16:54 ` Martin Waitz
2006-12-01 17:33 ` Stephan Feder
2006-12-01 18:48 ` Martin Waitz
2006-12-01 23:34 ` sf
2006-12-02 19:46 ` Martin Waitz
2006-12-01 19:17 ` Andy Parkins
2006-12-01 19:38 ` Martin Waitz
2006-12-01 21:04 ` Andy Parkins
2006-12-01 21:37 ` Martin Waitz
2006-12-01 21:54 ` Andy Parkins
2006-12-01 22:08 ` Martin Waitz
2006-12-02 10:04 ` Andy Parkins
2006-12-02 13:50 ` Josef Weidendorfer
2006-12-02 20:43 ` Martin Waitz
2006-12-03 1:02 ` Josef Weidendorfer
2006-12-02 20:40 ` Martin Waitz
2006-12-02 13:14 ` Jakub Narebski
2006-12-02 13:08 ` Jakub Narebski
2006-12-02 12:48 ` Jakub Narebski
2006-11-28 17:28 ` Daniel Barkalow
2006-11-28 18:08 ` Sven Verdoolaege
2006-11-28 18:37 ` Daniel Barkalow
2006-11-28 19:06 ` Sven Verdoolaege
2006-11-28 20:41 ` Daniel Barkalow
2006-11-28 21:10 ` Shawn Pearce
2006-11-28 21:32 ` Daniel Barkalow
2006-11-28 21:53 ` Linus Torvalds
2006-11-20 22:49 ` Jakub Narebski
2006-11-21 7:21 ` Shawn Pearce
2006-11-22 5:29 ` Petr Baudis
2006-12-02 20:16 ` Jakub Narebski
2006-12-03 1:24 ` Robin Rosenberg
2006-12-03 1:31 ` Jakub Narebski
2006-12-03 12:22 ` Robin Rosenberg
2006-12-03 12:31 ` Jakub Narebski
2006-12-03 11:00 ` Jakub Narebski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20061203024655.GD26668@spearce.org \
--to=spearce@spearce.org \
--cc=Josef.Weidendorfer@gmx.de \
--cc=git@vger.kernel.org \
--cc=sf-gmane@stephan-feder.de \
--cc=tali@admingilde.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).