From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, torvalds@linux-foundation.org
Cc: git@vger.kernel.org
Subject: Re: Git commit generation numbers
Date: 18 Jul 2011 01:13:47 -0400 [thread overview]
Message-ID: <20110718051347.28952.qmail@science.horizon.com> (raw)
In-Reply-To: <CA+55aFwt+RDRK_r=9CXbdzsLuGDqswvGTtJDKi9Q3DQwB_Ha5Q@mail.gmail.com>
> Nobody has *ever* given a reason why the cache would be better than
> just making it explicit.
I thought I listed a few. Let me be clearer.
1) It involves changing the commit format. Since the change is
backward-compatible, it's not too bad, but this is still fundamentally
A Bad Thing, to be avoided if possible.
2) It can't be retrofitted to help historical browsing.
3) You have to support commits without generation numbers forever.
This is a support burden. If you can generate generation numbers for
an entire repository, including pre-existing commits, you can *throw
out* the commit date heuristic code entirely.
4) It can't be made to work with grafts or replace objects.
5) It includes information which is redundant, but hard to verify,
in git objects. Leading to potentially bizarre and version-dependent
behaviour if it's wrong. (Checking that the numbers are consistent
is the same work as regenerating a cache.)
6) It makes git commits slightly larger. (Okay, that's reaching.)
> Why is that so hard for people to understand? The cache is just EXTRA WORK.
That's why it *might* have been a good idea to include the number in
the original design. But now that the design is widely deployed, it's
better to avoid changing the design if not necessary.
With a bit of extra work, it's not necessary.
> To take your TLB example: it's like having a TLB for a page table that
> would be as easy to just create in a way that it's *faster* to look up
> in the actual data structure than it would be to look up in the cache.
You've subtly jumped points. The original point was that it's worth
precomputing and storing the generation numbers. I was trying to
say that this is fundamentally a caching operation.
Now we're talking about *where* to store the cached generation numbers.
Your point, which is a very valid one, is that they are to be stored
on disk, exactly one per commit, can be computed when the commit is
generated, and are accessed at the same time as the commit, so it makes
all kinds of sense to store them *with* the commits. As part of them,
even.
This has the huge benefit that it does away with the need for a *separate*
data structure. (Kinda sorts like the way AMD stores instruction
boundaries in the L1 I-cache, avoiding the need for a separate data
structure.)
I'm arguing that, despite this annoying overhead, there are valid reasons
to want to store it separately. There are some practical ones, but the
basic one is an esthetic/maintainability judgement of "less cruft in
the commit objects is worth more cruft in the code".
Git has done very well partly *because* of the minimality of its basic
persistent object database format. I think we should be very reluctant
to add to that without a demonstrated need that *cannot* be met in
another way.
In this particular case, a TLB is not a transport format. It's okay
to add redundant cruft to make it faster, because it only lasts until
the next reboot. (A more apropos, software-oriented analogy might be
"struct page".)
A git commit object *is* a transport format, one specifically designed
for transporting data a very long way forward in time, so it should be
designed with considerable care, and cruft ruthlessly eradicated.
Whatever you add to it has to be supported by every git implementation,
forever. As does every implementation bug ever produced.
A cache, on the other hand, is purely a local implementation detail.
It can be changed between versions with much less effort.
I agree it's more implementation work. But the upside is a cleaner
struct commit. Which is a very good thing.
next prev parent reply other threads:[~2011-07-18 5:13 UTC|newest]
Thread overview: 89+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-17 18:27 Git commit generation numbers George Spelvin
2011-07-17 19:00 ` Long, Martin
2011-07-17 19:30 ` Linus Torvalds
2011-07-17 23:39 ` George Spelvin
2011-07-17 23:58 ` Linus Torvalds
2011-07-18 5:13 ` George Spelvin [this message]
2011-07-18 10:28 ` Anthony Van de Gejuchte
2011-07-18 11:48 ` George Spelvin
2011-07-20 20:51 ` Nicolas Pitre
2011-07-20 22:16 ` George Spelvin
2011-07-20 23:26 ` david
2011-07-20 23:36 ` Nicolas Pitre
2011-07-21 0:08 ` Phil Hord
2011-07-21 0:18 ` david
2011-07-21 0:37 ` Shawn Pearce
2011-07-21 0:47 ` Phil Hord
2011-07-21 4:26 ` david
2011-07-21 12:43 ` George Spelvin
2011-07-21 19:19 ` Jakub Narebski
2011-07-21 20:27 ` George Spelvin
2011-07-21 20:33 ` Shawn Pearce
2011-07-22 12:18 ` Jakub Narebski
2011-07-22 13:09 ` Nicolas Pitre
2011-07-22 18:02 ` david
2011-07-22 18:34 ` Jakub Narebski
2011-07-22 19:06 ` Linus Torvalds
2011-07-22 22:02 ` Jeff King
2011-07-28 15:00 ` Felipe Contreras
2011-09-06 10:02 ` Ramkumar Ramachandra
2011-07-22 19:08 ` david
2011-07-22 19:40 ` Nicolas Pitre
2011-07-22 18:02 ` david
2011-07-21 0:39 ` Phil Hord
2011-07-21 0:58 ` Nicolas Pitre
2011-07-21 1:09 ` Phil Hord
2011-07-21 12:03 ` Drew Northup
2011-07-21 12:55 ` George Spelvin
2011-07-21 15:57 ` Drew Northup
2011-07-21 16:24 ` Phil Hord
2011-07-21 22:40 ` Pēteris Kļaviņš
2011-07-22 9:30 ` Christian Couder
2011-07-21 17:36 ` George Spelvin
-- strict thread matches above, loose matches on Subject: below --
2011-07-14 18:24 Linus Torvalds
2011-07-14 18:37 ` Jeff King
2011-07-14 18:47 ` Linus Torvalds
2011-07-14 18:55 ` Linus Torvalds
2011-07-14 19:12 ` Jeff King
2011-07-14 19:46 ` Ted Ts'o
2011-07-14 19:51 ` Linus Torvalds
2011-07-14 20:07 ` Jeff King
2011-07-14 20:08 ` Ted Ts'o
2011-07-14 19:08 ` Jeff King
2011-07-14 19:23 ` Linus Torvalds
2011-07-14 20:01 ` Jeff King
2011-07-14 20:19 ` Linus Torvalds
2011-07-14 20:31 ` Jeff King
2011-07-15 1:19 ` Linus Torvalds
2011-07-15 2:41 ` Geert Bosch
2011-07-15 7:46 ` Jeff King
2011-07-15 16:10 ` Linus Torvalds
2011-07-15 16:18 ` Shawn Pearce
2011-07-15 16:44 ` Linus Torvalds
2011-07-15 18:42 ` Ted Ts'o
2011-07-15 19:00 ` Linus Torvalds
2011-07-16 9:16 ` Christian Couder
2011-07-18 3:41 ` Jeff King
2011-07-19 4:14 ` Christian Couder
2011-07-19 20:00 ` Jeff King
2011-07-21 6:29 ` Christian Couder
2011-07-15 18:46 ` Tony Luck
2011-07-15 18:58 ` Linus Torvalds
2011-07-15 19:48 ` Jeff King
2011-07-15 20:07 ` Jeff King
2011-07-15 21:17 ` Linus Torvalds
2011-07-15 21:54 ` Jeff King
2011-07-15 23:10 ` Linus Torvalds
2011-07-15 23:16 ` Linus Torvalds
2011-07-15 23:36 ` Linus Torvalds
2011-07-16 0:42 ` Jeff King
2011-07-16 0:40 ` Jeff King
2011-07-15 9:12 ` Jakub Narebski
2011-07-15 9:17 ` Long, Martin
2011-07-15 15:33 ` Long, Martin
2011-07-15 16:15 ` Drew Northup
2011-07-14 18:52 ` Linus Torvalds
2011-07-14 19:08 ` Jakub Narebski
2011-07-14 20:26 ` Junio C Hamano
2011-07-14 20:41 ` Jeff King
2011-07-14 21:30 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110718051347.28952.qmail@science.horizon.com \
--to=linux@horizon.com \
--cc=git@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).