git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "George Spelvin" <linux@horizon.com>
To: linux@horizon.com, torvalds@linux-foundation.org
Cc: git@vger.kernel.org
Subject: Re: Git commit generation numbers
Date: 17 Jul 2011 19:39:59 -0400	[thread overview]
Message-ID: <20110717233959.3548.qmail@science.horizon.com> (raw)
In-Reply-To: <CA+55aFwqFhzd_cmbFxkCyNXhF99igBqdr8p4J76hLz=m4=ZNWg@mail.gmail.com>

> So the generation number really is very very fundamnetal. It's
> absolutely not some "additional information that can be computed",
> because the whole AND ONLY point of having the number is to not
> compute it.
> 
> We are never interested in the generation number for its own sake. We
> are only interested in it in order to avoid having to look at the rest
> of the DAG.

You're making my point and somehow not seeing it.

What you're describing here is the archetpical cache.

The only reason for having a memory cache is to avoid accessing memory!
The only reason for having a TLB is to avoid walking the page tables!
The only reason for having a page cache is to avoid hitting the disk!
The only reason for having a dcache is to avoid traversing the file
system directories!

And yes, the only reason for having a generation number cache is to avoid
traversing the DAG.  D'oh.  Do you think this is somehow news to anyone?

The fundamental nature of a cache is that it lets you look something up
quickly that you could compute but don't want to.

I'm slapping my forehead like Homer Simpson here.  The fact that computing
the generation number is expensive is why it's worth cacheing.  But the
fact that it *can* be computed is a reason not to clutter the published
commit object format with it.


The generation number is NOT FUNDAMENTAL.  It contains no information
that's not already in the DAG.  The danger of putting it into a commit
is that you'll do it wrong, and thereby screw everything up.

If we have broken code that generates a broken cache, we fix the code
and the bugs magically go away.

If we have broken code that generates a broken commit object, we have
a huge problem.

Just like we don't ship pack indexes around, but recompute them on arrival.
The index is essential for performance, but it's absolutely non-essential
for correctness.


As a general design principle, the exported data structures, like the
commits, should be as simple as possible.  Do not include extraneous
or redundant data, because then you have to deal with the possibility
of inconsistency.  This leads to bugs.  (Frequently buffer overflow bugs.)

Maybe it would have been worth violating that principle during the initial
git design.  I still see a good argument for not doing that even if we
had a time machine.

But now that the commit format is established and widely used, the argument
has far more force.  Changing the commit format provides zero functionality
gain, and the performance gain can be obtained a different way.

Maybe a bit more code, but nothing extraordinary.

To me, the KISS principle says "don't change the commit format!"

Now, you complain about code complexity.  But this is a read-only cache.
The generation number of a commit object never changes.  There's no update
operation.  Like an I-cache, if there's ever any problem, throw it away.

Arguing that "the patch to put it in the commit object is smaller" is
stupidly short-sighted.  Now every version of git from now until forever
has to support both kinds of commit objects.  (And browsing old git
trees will forever be slow.)

You only take on that sort of legacy support burden if you absolutely have to.

> But "just because we could recompute it" is a bad bad reason.

Bull puckey.  You're ugly and stupid and WRONG.

It's an excellent reason.  I'm amazed that you're not seeing it.
The principle is "don't include redundant data in a transport format."
Because it can be recomputed, it's redundant.  Therefore, it shouldn't
be included in the transport format.

It's exactly the same principle as "don't store the indexes in the
database dump" and "don't store filename hashes in file system
archives".

This is a principle, not an iron-clad rule.  It can be violated for
good and sufficient reasons, notably performance.

But in this case, we can get the performance without it.  Without,
in fact, changing the git transport format at all.

And "don't change a widely-used transport format" is ANOTHER important
principle.  Backward-compatible is much better than incompatible, but
far better to avoid changing it at all.

Breaking two such principles without an absolutely iron-clad reason is
ugly and stupid and wrong.

(As you well know, the more general principle is "don't store redundant
data AT ALL unless you need to for performance".  Redundant data is A
Bad Thing.  It can get out of sync.  But if you have to, a private cache
is much better than a exchange format.)


Put another way, it IS stupid, it IS expendable, and therefore it SHOULD go.

  reply	other threads:[~2011-07-17 23:40 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-17 18:27 Git commit generation numbers George Spelvin
2011-07-17 19:00 ` Long, Martin
2011-07-17 19:30 ` Linus Torvalds
2011-07-17 23:39   ` George Spelvin [this message]
2011-07-17 23:58     ` Linus Torvalds
2011-07-18  5:13       ` George Spelvin
2011-07-18 10:28         ` Anthony Van de Gejuchte
2011-07-18 11:48           ` George Spelvin
2011-07-20 20:51             ` Nicolas Pitre
2011-07-20 22:16               ` George Spelvin
2011-07-20 23:26                 ` david
2011-07-20 23:36                   ` Nicolas Pitre
2011-07-21  0:08                     ` Phil Hord
2011-07-21  0:18                       ` david
2011-07-21  0:37                         ` Shawn Pearce
2011-07-21  0:47                           ` Phil Hord
2011-07-21  4:26                           ` david
2011-07-21 12:43                             ` George Spelvin
2011-07-21 19:19                               ` Jakub Narebski
2011-07-21 20:27                                 ` George Spelvin
2011-07-21 20:33                                   ` Shawn Pearce
2011-07-22 12:18                                   ` Jakub Narebski
2011-07-22 13:09                                     ` Nicolas Pitre
2011-07-22 18:02                                       ` david
2011-07-22 18:34                                         ` Jakub Narebski
2011-07-22 19:06                                           ` Linus Torvalds
2011-07-22 22:02                                             ` Jeff King
2011-07-28 15:00                                             ` Felipe Contreras
2011-09-06 10:02                                               ` Ramkumar Ramachandra
2011-07-22 19:08                                           ` david
2011-07-22 19:40                                             ` Nicolas Pitre
2011-07-22 18:02                                     ` david
2011-07-21  0:39                         ` Phil Hord
2011-07-21  0:58                       ` Nicolas Pitre
2011-07-21  1:09                         ` Phil Hord
2011-07-21 12:03                   ` Drew Northup
2011-07-21 12:55                     ` George Spelvin
2011-07-21 15:57                       ` Drew Northup
2011-07-21 16:24                         ` Phil Hord
2011-07-21 22:40                           ` Pēteris Kļaviņš
2011-07-22  9:30                             ` Christian Couder
2011-07-21 17:36                         ` George Spelvin
  -- strict thread matches above, loose matches on Subject: below --
2011-07-14 18:24 Linus Torvalds
2011-07-14 18:37 ` Jeff King
2011-07-14 18:47   ` Linus Torvalds
2011-07-14 18:55     ` Linus Torvalds
2011-07-14 19:12       ` Jeff King
2011-07-14 19:46       ` Ted Ts'o
2011-07-14 19:51         ` Linus Torvalds
2011-07-14 20:07           ` Jeff King
2011-07-14 20:08           ` Ted Ts'o
2011-07-14 19:08     ` Jeff King
2011-07-14 19:23       ` Linus Torvalds
2011-07-14 20:01         ` Jeff King
2011-07-14 20:19           ` Linus Torvalds
2011-07-14 20:31             ` Jeff King
2011-07-15  1:19               ` Linus Torvalds
2011-07-15  2:41                 ` Geert Bosch
2011-07-15  7:46                 ` Jeff King
2011-07-15 16:10                   ` Linus Torvalds
2011-07-15 16:18                     ` Shawn Pearce
2011-07-15 16:44                       ` Linus Torvalds
2011-07-15 18:42                         ` Ted Ts'o
2011-07-15 19:00                           ` Linus Torvalds
2011-07-16  9:16                           ` Christian Couder
2011-07-18  3:41                             ` Jeff King
2011-07-19  4:14                               ` Christian Couder
2011-07-19 20:00                                 ` Jeff King
2011-07-21  6:29                                   ` Christian Couder
2011-07-15 18:46                         ` Tony Luck
2011-07-15 18:58                           ` Linus Torvalds
2011-07-15 19:48                     ` Jeff King
2011-07-15 20:07                       ` Jeff King
2011-07-15 21:17                       ` Linus Torvalds
2011-07-15 21:54                         ` Jeff King
2011-07-15 23:10                         ` Linus Torvalds
2011-07-15 23:16                           ` Linus Torvalds
2011-07-15 23:36                             ` Linus Torvalds
2011-07-16  0:42                               ` Jeff King
2011-07-16  0:40                           ` Jeff King
2011-07-15  9:12                 ` Jakub Narebski
2011-07-15  9:17                   ` Long, Martin
2011-07-15 15:33                     ` Long, Martin
2011-07-15 16:15                       ` Drew Northup
2011-07-14 18:52   ` Linus Torvalds
2011-07-14 19:08     ` Jakub Narebski
2011-07-14 20:26   ` Junio C Hamano
2011-07-14 20:41     ` Jeff King
2011-07-14 21:30       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110717233959.3548.qmail@science.horizon.com \
    --to=linux@horizon.com \
    --cc=git@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).