From: Shawn Pearce <spearce@spearce.org>
To: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
Cc: Linus Torvalds <torvalds@osdl.org>, Martin Waitz <tali@admingilde.org>, sf <sf-gmane@stephan-feder.de>, git@vger.kernel.org
Subject: Re: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
Date: Sat, 2 Dec 2006 21:46:55 -0500
Message-ID: <20061203024655.GD26668@spearce.org>
In-Reply-To: <200612030307.26429.Josef.Weidendorfer@gmx.de>

Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> Thinking even one step further:
> Would it make sense to define an encoding format for the content of
> commit and tree objects inside of packs, where the SHA1 is replaced by the
> offset of the object in this pack?
> As exactly the SHA1 is the least compressible thing, this could promise
> quite a benefit.

I actually had the same idea the other day. I discarded it after
thinking about it for a minute. Here's the problem:

Let's say we do this for the tree and parent IDs in a commit, because
these are the most commonly needed parts of a commit during revision
traversal. So we want to put the offset to the tree and the offset
to each parent at the front of the commit somehow, to make them very
cheap to access.

This means that when we start to write out a commit we need to know
the offset to the tree that commit references. But git-pack-objects
sorts objects by type: commit, tree, blob (I forget where tags go,
but they aren't important in this context). So generally *all*
commits appear before the first tree.

So when we write out the first commit we need to know exactly how
many bytes every commit will need (compressed, mind you) in this pack
so we can determine the position of the first tree. Now do this
for every commit and every tree that those commits use... yes,
it's a lot of work to precompute and store all offsets before you
even write out the first byte.
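To make the ordering problem above concrete, here is a minimal Python
sketch. It is not how git-pack-objects works; the record layout
(fixed 8-byte offset slots followed by a zlib-compressed body) and the
names plan_offsets and TYPE_ORDER are assumptions made up for
illustration. The point it shows is that every object has to be
compressed and sized before the first record can be written.

```python
import struct
import zlib

# Type order as described in the message: commits, then trees, then blobs.
TYPE_ORDER = {"commit": 0, "tree": 1, "blob": 2}


def plan_offsets(objects):
    """Lay out a hypothetical pack whose records start with offsets.

    objects maps a name to (type, body_bytes, [referenced names]).
    Returns (offsets, records); each record is one 8-byte big-endian
    offset per reference, followed by the zlib-compressed body.
    """
    order = sorted(objects, key=lambda n: (TYPE_ORDER[objects[n][0]], n))

    # Pass 1: compress everything up front, only to learn the sizes.
    compressed = {n: zlib.compress(objects[n][1]) for n in order}
    offsets, pos = {}, 0
    for n in order:
        offsets[n] = pos
        pos += 8 * len(objects[n][2]) + len(compressed[n])

    # Pass 2: only now can the first commit be written, because its
    # tree offset depends on the size of every commit before the tree.
    records = [
        b"".join(struct.pack(">Q", offsets[r]) for r in objects[n][2])
        + compressed[n]
        for n in order
    ]
    return offsets, records
```

With fixed-width slots two passes are enough. If the offsets were
variable-length, or stored inside the compressed data, each record's
size would itself depend on the offsets, and the layout would have to
be iterated until it stopped changing.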
It's even worse with parent commits because ancestors tend to appear
behind the commit (newest->oldest) so that "git log" can benefit from
OS read-ahead. So you also have to keep track of your parent commit
offsets. Not pretty.

Extending that idea to tree objects (store the offset of the entry)
makes the issue even uglier.

Oh, and packs aren't entirely self-contained. A pack is only
self-contained in the sense that no object in the pack deltifies
against an object outside of the pack[1]. However, by design an
object (e.g. a commit or a tree) can reference an object which is
either loose or which is in another pack. This is especially
important for very large projects where not every
commit/tree/tag/blob will fit into one 4 GiB file.

[1] Except in the case of thin packs, which are used only on the
    network and only to save bandwidth.

> AFAIK, we currently only use these offsets for referencing objects in
> delta chains.

Yes, that's a recent feature to reference a delta base.
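The point about packs not being self-contained can be sketched the same
way. Because a commit or tree may point at an object that is loose or
lives in another pack, an offset-based encoding would still need a
fallback to the full SHA1. The function below is a hypothetical
illustration in Python, not an existing git format; encode_reference
and pack_offsets are invented names.

```python
def encode_reference(sha1_hex, pack_offsets):
    """Encode a reference to another object (hypothetical scheme).

    pack_offsets maps the hex SHA1 of every object in *this* pack to
    its byte offset. Targets outside the pack (loose objects, other
    packs) cannot be expressed as an offset, so they keep the raw
    20-byte SHA1.
    """
    if sha1_hex in pack_offsets:
        return ("offset", pack_offsets[sha1_hex])
    return ("sha1", bytes.fromhex(sha1_hex))
```

Any reader of such a format would have to handle both cases on every
reference, which removes part of the benefit the offset was meant to
bring.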