From: "George Spelvin" <linux@horizon.com>
To: jnareb@gmail.com, linux@horizon.com
Cc: anthonyvdgent@gmail.com, david@lang.hm, git@vger.kernel.org,
hordp@cisco.com, nico@fluxnic.net, spearce@spearce.org,
torvalds@linux-foundation.org
Subject: Re: Git commit generation numbers
Date: 21 Jul 2011 16:27:22 -0400 [thread overview]
Message-ID: <20110721202722.3765.qmail@science.horizon.com> (raw)
In-Reply-To: <m3mxg7sasa.fsf@localhost.localdomain>
> There is also another issue that I have mentioned, namely incomplete
> clones - which currently means shallow clone, without access to full
> history.
As far as history walking is concerned, you can just consider "missing
parent" the same as "no parent" and start the generation numbers at 0.
As long as you recompute
> Nb. grafts are so horrible hack that I would be not against turning
> off generation numbers if they are used.
Yeah, but it's not too miserable to add support (the logic is very similar
to replace objects), and then you would be able to have the history walking
code depend on the presence of generation numbers. (The "load the cache"
function would regenerate it if necessary.)
Only do this if you already have support for "no generation numbers" in
the history walking code for (say) loose objects.
> In the case of replace objects you need both non-replaced and replaced
> DAG generation numbers.
Yes, the cache validity/invalidation criteria are the tricky bit.
Honestly, this is where the code gets ugly, not computing and storing
the generation numbers.
One thought on an expanded generation number cache:
There are many git operations that use ONLY the commit DAG, and do not
actually use any information from the commits other than their hashes
and parent pointers. The ones that come to mind are rev-parse, rev-list,
describe, name-rev, and merge-base.
These could be sped up if, instead of just generation numbers, we kept
a complete cached copy of the commit DAG, so the commit objects didn't
have to be uncompressed and parsed.
This could be provided by an extended form of generation number cache.
In addition to listing the generation number of each commit, it
would list all the ancestors (by file offset rather than hash, for
compactness). Then simple commit walking could load this cache and
avoid unpacking commit objects from packs.
A compact implementation would abuse the flexibility of generation numbers
to make them serve double duty. They would be used as offsets into a
table of parent pointers. By keeping the table topologically sorted,
the offsets would satisfy the requirements for generation numbers, but
would be unique, and there would be additional gaps when a commit had
multiple parents.
The parent pointers would themselves be 31-bit offsets into the table of
SHA-1 hashes, with the msbit meaning "this commit has multiple parents,
also look at the following table entry". (If we use offset 0 to mean
"no parents", it might be more convenient to have the offset point to
the *end* of the run of parents rather than the beginning, so "following"
would be earlier in the file, but that's an implementation detail.)
I'm assuming that 2^31 commits having (in aggregate) 2^32 parents would
be enough for the time being. As a local cache, it can be extended
with a software upgrade. There's no need to ever have support for two
formats in any given release; just notice that the cache format is wrong,
blow it away, and regenerate it.
next prev parent reply other threads:[~2011-07-21 20:27 UTC|newest]
Thread overview: 89+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-17 18:27 Git commit generation numbers George Spelvin
2011-07-17 19:00 ` Long, Martin
2011-07-17 19:30 ` Linus Torvalds
2011-07-17 23:39 ` George Spelvin
2011-07-17 23:58 ` Linus Torvalds
2011-07-18 5:13 ` George Spelvin
2011-07-18 10:28 ` Anthony Van de Gejuchte
2011-07-18 11:48 ` George Spelvin
2011-07-20 20:51 ` Nicolas Pitre
2011-07-20 22:16 ` George Spelvin
2011-07-20 23:26 ` david
2011-07-20 23:36 ` Nicolas Pitre
2011-07-21 0:08 ` Phil Hord
2011-07-21 0:18 ` david
2011-07-21 0:37 ` Shawn Pearce
2011-07-21 0:47 ` Phil Hord
2011-07-21 4:26 ` david
2011-07-21 12:43 ` George Spelvin
2011-07-21 19:19 ` Jakub Narebski
2011-07-21 20:27 ` George Spelvin [this message]
2011-07-21 20:33 ` Shawn Pearce
2011-07-22 12:18 ` Jakub Narebski
2011-07-22 13:09 ` Nicolas Pitre
2011-07-22 18:02 ` david
2011-07-22 18:34 ` Jakub Narebski
2011-07-22 19:06 ` Linus Torvalds
2011-07-22 22:02 ` Jeff King
2011-07-28 15:00 ` Felipe Contreras
2011-09-06 10:02 ` Ramkumar Ramachandra
2011-07-22 19:08 ` david
2011-07-22 19:40 ` Nicolas Pitre
2011-07-22 18:02 ` david
2011-07-21 0:39 ` Phil Hord
2011-07-21 0:58 ` Nicolas Pitre
2011-07-21 1:09 ` Phil Hord
2011-07-21 12:03 ` Drew Northup
2011-07-21 12:55 ` George Spelvin
2011-07-21 15:57 ` Drew Northup
2011-07-21 16:24 ` Phil Hord
2011-07-21 22:40 ` Pēteris Kļaviņš
2011-07-22 9:30 ` Christian Couder
2011-07-21 17:36 ` George Spelvin
-- strict thread matches above, loose matches on Subject: below --
2011-07-14 18:24 Linus Torvalds
2011-07-14 18:37 ` Jeff King
2011-07-14 18:47 ` Linus Torvalds
2011-07-14 18:55 ` Linus Torvalds
2011-07-14 19:12 ` Jeff King
2011-07-14 19:46 ` Ted Ts'o
2011-07-14 19:51 ` Linus Torvalds
2011-07-14 20:07 ` Jeff King
2011-07-14 20:08 ` Ted Ts'o
2011-07-14 19:08 ` Jeff King
2011-07-14 19:23 ` Linus Torvalds
2011-07-14 20:01 ` Jeff King
2011-07-14 20:19 ` Linus Torvalds
2011-07-14 20:31 ` Jeff King
2011-07-15 1:19 ` Linus Torvalds
2011-07-15 2:41 ` Geert Bosch
2011-07-15 7:46 ` Jeff King
2011-07-15 16:10 ` Linus Torvalds
2011-07-15 16:18 ` Shawn Pearce
2011-07-15 16:44 ` Linus Torvalds
2011-07-15 18:42 ` Ted Ts'o
2011-07-15 19:00 ` Linus Torvalds
2011-07-16 9:16 ` Christian Couder
2011-07-18 3:41 ` Jeff King
2011-07-19 4:14 ` Christian Couder
2011-07-19 20:00 ` Jeff King
2011-07-21 6:29 ` Christian Couder
2011-07-15 18:46 ` Tony Luck
2011-07-15 18:58 ` Linus Torvalds
2011-07-15 19:48 ` Jeff King
2011-07-15 20:07 ` Jeff King
2011-07-15 21:17 ` Linus Torvalds
2011-07-15 21:54 ` Jeff King
2011-07-15 23:10 ` Linus Torvalds
2011-07-15 23:16 ` Linus Torvalds
2011-07-15 23:36 ` Linus Torvalds
2011-07-16 0:42 ` Jeff King
2011-07-16 0:40 ` Jeff King
2011-07-15 9:12 ` Jakub Narebski
2011-07-15 9:17 ` Long, Martin
2011-07-15 15:33 ` Long, Martin
2011-07-15 16:15 ` Drew Northup
2011-07-14 18:52 ` Linus Torvalds
2011-07-14 19:08 ` Jakub Narebski
2011-07-14 20:26 ` Junio C Hamano
2011-07-14 20:41 ` Jeff King
2011-07-14 21:30 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110721202722.3765.qmail@science.horizon.com \
--to=linux@horizon.com \
--cc=anthonyvdgent@gmail.com \
--cc=david@lang.hm \
--cc=git@vger.kernel.org \
--cc=hordp@cisco.com \
--cc=jnareb@gmail.com \
--cc=nico@fluxnic.net \
--cc=spearce@spearce.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).