git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, nico@fluxnic.net
Cc: anthonyvdgent@gmail.com, git@vger.kernel.org,
	torvalds@linux-foundation.org
Subject: Re: Git commit generation numbers
Date: 20 Jul 2011 18:16:32 -0400	[thread overview]
Message-ID: <20110720221632.14223.qmail@science.horizon.com> (raw)
In-Reply-To: <alpine.LFD.2.00.1107201538590.21187@xanadu.home>

> The alternative of having to sometimes use the generation number, 
> sometimes use the possibly broken commit date, makes for much more 
> complicated code that has to be maintained forever.  Having a solution 
> that starts working only after a certain point in history doesn't look 
> eleguant to me at all.  It is not like having different pack formats 
> where back and forth conversions can be made for the _entire_ history.

It seemed like a pretty strong argument to me, too.

> And if you don't care about graft/replace then the cached data is 
> immutable just like the in-commit version would, so there is no 
> consistency issues.  If you do care about graft/replace (or who knows 
> what other dag alteration scheme might be created in 5 years from now) 
> then a separate cache will be required _anyway_, regardless of any 
> in-commit gen number.

A possible workaround would be to keep track of the largest generation
number skew introduced by any graft, and add that safety factor into
the history-walking code, but that would be painful if you replace a
single large commit with an equivalent long development history, such
as adding a historical development tree behind a recently-cut-off one.
or development history You can do a workaround at the expense of ine

> Neither did I think about the actual cache format (I don't think that
> adding it to the pack index is a good idea if grafts are to be honored)
> which certainly has bearing on that fundamental question too.

I was thinking of something very close to the V2 pack format.
http://book.git-scm.com/7_the_packfile.html
A magic number, a 256-entry fanout table, a sorted list of 20-byte hashes,
followed by a matching list of 4-byte generation numbers.

Ending with a 20-byte hash of the replaces and grafts state that this
cache is valid for, and a hash of the cache itself.

A bit of code factoring should make it easy to share much of the code.


It would certainly be possible to share the SHA1 table in an existing
pack index and store the generation numbers of the base (no replacement)
case, but you'd have to store null values for all the non-commit objects.

That takes 4 bytes per object, while a separate list of commits
takes 24 bytes per commit.  A separate list is better if commits
are less than 1/6 of all objects.

Looking at git's own object database, we have:
 66125 blobs   (45.50%)
 49292 trees   (33.92%)
 29554 commits (20.33%)
   362 tags    ( 0.25%)
145333 total

So we're actually a bit over the 16.66% optimum. but it's not far enough
to be a real efficiency problem.

  reply	other threads:[~2011-07-20 22:16 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-17 18:27 Git commit generation numbers George Spelvin
2011-07-17 19:00 ` Long, Martin
2011-07-17 19:30 ` Linus Torvalds
2011-07-17 23:39   ` George Spelvin
2011-07-17 23:58     ` Linus Torvalds
2011-07-18  5:13       ` George Spelvin
2011-07-18 10:28         ` Anthony Van de Gejuchte
2011-07-18 11:48           ` George Spelvin
2011-07-20 20:51             ` Nicolas Pitre
2011-07-20 22:16               ` George Spelvin [this message]
2011-07-20 23:26                 ` david
2011-07-20 23:36                   ` Nicolas Pitre
2011-07-21  0:08                     ` Phil Hord
2011-07-21  0:18                       ` david
2011-07-21  0:37                         ` Shawn Pearce
2011-07-21  0:47                           ` Phil Hord
2011-07-21  4:26                           ` david
2011-07-21 12:43                             ` George Spelvin
2011-07-21 19:19                               ` Jakub Narebski
2011-07-21 20:27                                 ` George Spelvin
2011-07-21 20:33                                   ` Shawn Pearce
2011-07-22 12:18                                   ` Jakub Narebski
2011-07-22 13:09                                     ` Nicolas Pitre
2011-07-22 18:02                                       ` david
2011-07-22 18:34                                         ` Jakub Narebski
2011-07-22 19:06                                           ` Linus Torvalds
2011-07-22 22:02                                             ` Jeff King
2011-07-28 15:00                                             ` Felipe Contreras
2011-09-06 10:02                                               ` Ramkumar Ramachandra
2011-07-22 19:08                                           ` david
2011-07-22 19:40                                             ` Nicolas Pitre
2011-07-22 18:02                                     ` david
2011-07-21  0:39                         ` Phil Hord
2011-07-21  0:58                       ` Nicolas Pitre
2011-07-21  1:09                         ` Phil Hord
2011-07-21 12:03                   ` Drew Northup
2011-07-21 12:55                     ` George Spelvin
2011-07-21 15:57                       ` Drew Northup
2011-07-21 16:24                         ` Phil Hord
2011-07-21 22:40                           ` Pēteris Kļaviņš
2011-07-22  9:30                             ` Christian Couder
2011-07-21 17:36                         ` George Spelvin
  -- strict thread matches above, loose matches on Subject: below --
2011-07-14 18:24 Linus Torvalds
2011-07-14 18:37 ` Jeff King
2011-07-14 18:47   ` Linus Torvalds
2011-07-14 18:55     ` Linus Torvalds
2011-07-14 19:12       ` Jeff King
2011-07-14 19:46       ` Ted Ts'o
2011-07-14 19:51         ` Linus Torvalds
2011-07-14 20:07           ` Jeff King
2011-07-14 20:08           ` Ted Ts'o
2011-07-14 19:08     ` Jeff King
2011-07-14 19:23       ` Linus Torvalds
2011-07-14 20:01         ` Jeff King
2011-07-14 20:19           ` Linus Torvalds
2011-07-14 20:31             ` Jeff King
2011-07-15  1:19               ` Linus Torvalds
2011-07-15  2:41                 ` Geert Bosch
2011-07-15  7:46                 ` Jeff King
2011-07-15 16:10                   ` Linus Torvalds
2011-07-15 16:18                     ` Shawn Pearce
2011-07-15 16:44                       ` Linus Torvalds
2011-07-15 18:42                         ` Ted Ts'o
2011-07-15 19:00                           ` Linus Torvalds
2011-07-16  9:16                           ` Christian Couder
2011-07-18  3:41                             ` Jeff King
2011-07-19  4:14                               ` Christian Couder
2011-07-19 20:00                                 ` Jeff King
2011-07-21  6:29                                   ` Christian Couder
2011-07-15 18:46                         ` Tony Luck
2011-07-15 18:58                           ` Linus Torvalds
2011-07-15 19:48                     ` Jeff King
2011-07-15 20:07                       ` Jeff King
2011-07-15 21:17                       ` Linus Torvalds
2011-07-15 21:54                         ` Jeff King
2011-07-15 23:10                         ` Linus Torvalds
2011-07-15 23:16                           ` Linus Torvalds
2011-07-15 23:36                             ` Linus Torvalds
2011-07-16  0:42                               ` Jeff King
2011-07-16  0:40                           ` Jeff King
2011-07-15  9:12                 ` Jakub Narebski
2011-07-15  9:17                   ` Long, Martin
2011-07-15 15:33                     ` Long, Martin
2011-07-15 16:15                       ` Drew Northup
2011-07-14 18:52   ` Linus Torvalds
2011-07-14 19:08     ` Jakub Narebski
2011-07-14 20:26   ` Junio C Hamano
2011-07-14 20:41     ` Jeff King
2011-07-14 21:30       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110720221632.14223.qmail@science.horizon.com \
    --to=linux@horizon.com \
    --cc=anthonyvdgent@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=nico@fluxnic.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).