git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: Git commit generation numbers
Date: Fri, 15 Jul 2011 15:48:07 -0400
Message-ID: <20110715194807.GA356@sigill.intra.peff.net>
In-Reply-To: <CA+55aFzS3KDNvKt-dXvYpuAQwFwD3+GCj8y8bRQCycPvrynT8Q@mail.gmail.com>

On Fri, Jul 15, 2011 at 09:10:48AM -0700, Linus Torvalds wrote:

> I think it's much worse to have the same information in two different
> places where it can cause inconsistencies that are hard to see and may
> not be repeatable. If git ever finds the wrong merge base (because,
> say, the generation numbers are wrong), I want it to be a *repeatable*
> thing. I want to be able to repeat on the git mailing list "hey, guys,
> look at what happens when I try to merge commits ABC and XYZ". If you
> go "yeah, it works for me", then that is bad.

Having the information in two different places is my concern, too. And I
think the fundamental difference between putting it inside or outside
the commit sha1 (where outside encompasses putting it in a cache, in the
pack-index, or whatever), is that I see the commit sha1 as somehow more
"definitive". That is, it is the sole data we pass from repo to repo
during pushes and pulls, and it is the thing that is consistency-checked
by hashes.

So if there is an inconsistency between what the parent pointers
represent, and what the generation number in "outside" storage says,
then the outside storage is wrong, and the parent pointers are the right
answer. It becomes a lot more fuzzy to me if there is an inconsistency
between what the parent pointers represent, and what the generation
number says.

How should that situation be handled? Should fsck check for it and
complain? Should we just ignore it, even though it may cause our
traversal algorithms to be inaccurate? Like clock skew, there's not much
that can be done if the commits are published.
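To make the fsck question concrete, here is a minimal sketch of what such a
check could look like. It is not git code. It assumes a hypothetical
in-memory model where `parents` maps each commit id to its list of parent
ids and `claimed` maps commit ids to the generation number recorded in
their header. It also assumes the convention that a root commit has
generation 1 and every other commit has 1 + the maximum generation of its
parents; the thread does not fix that convention, so treat it as
illustrative.

```python
from typing import Dict, List


def compute_generations(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Derive generation numbers purely from parent pointers.

    Assumed convention: roots get 1, others get 1 + max(parent generations).
    Uses an explicit stack so deep histories do not hit the recursion limit.
    """
    gen: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in gen:
                stack.pop()
                continue
            missing = [p for p in parents[commit] if p not in gen]
            if missing:
                # Parents must be resolved before the child.
                stack.extend(missing)
                continue
            gen[commit] = 1 + max((gen[p] for p in parents[commit]), default=0)
            stack.pop()
    return gen


def find_inconsistent(parents: Dict[str, List[str]],
                      claimed: Dict[str, int]) -> List[str]:
    """Return commits whose recorded generation disagrees with the DAG."""
    actual = compute_generations(parents)
    return sorted(c for c, g in claimed.items() if actual[c] != g)


# Hypothetical history: d merges b and c; its header wrongly claims 2.
history = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
headers = {"a": 1, "b": 2, "c": 2, "d": 2}
print(find_inconsistent(history, headers))
```

Detecting the mismatch is the easy part. The sketch says nothing about what
to do once such a commit has been published, which is the harder question.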

Those are serious questions that I think should be considered if we are
going to put a generation header into the commit object, and I haven't
seen answers for them yet.

> Partly for that reason, I do think that if the generation count was
> embedded in the pack-file, that would not be an "ugly" decision. The
> pack-files have definitely become "core git data structures", and are
> more than just a local filesystem representation of the objects:
> they're obviously also the data transport method, even if the rules
> there are slightly different (no index, thank god, and incomplete
> "thin" packs).
> 
> That said, I don't think a generation count necessarily "fits" in the
> pack-file. They are designed to be incremental, so it's not very
> natural there. But I do think it would be conceptually prettier to
> have the "depth of commit" be part of the "filesystem" data than to
> have it as a separate ad-hoc cache.

Sure, I would be fine with that. When you say "packfile", do you mean
the general concept, as in it could go in the pack index as opposed
to the packfile itself? Or specifically in the packfile? The latter
seems a lot more problematic to me in terms of implementation.

> > Those things rely on the idea that the git DAG is a data model that we
> > present to the user, but that we're allowed to do things behind the
> > scenes to make things faster.
> 
> .. and that is relevant to this discussion exactly *how*?

Because keeping the generation information outside of the DAG keeps the
model we present to the user simple (and not just the user; the
information that we present to other programs), but lets git still use
the information without calculating it from scratch each time. Just like
we present the data as a DAG of loose objects via things like "git
cat-file", even though the underlying storage inside a packfile may be
very different. I just don't see those two ideas as fundamentally
different.
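As a rough illustration of that separation, the sketch below keeps
generation numbers in a side table that is filled lazily from the parent
pointers and can be thrown away at any time. It is not how git stores
anything; `GenerationCache`, its `computed` counter, and the "roots are
generation 1" convention are all assumptions made up for the example.

```python
from typing import Dict, List


class GenerationCache:
    """Side table of generation numbers; the parent pointers stay authoritative."""

    def __init__(self, parents: Dict[str, List[str]]) -> None:
        self._parents = parents          # the DAG, as presented to users
        self._gen: Dict[str, int] = {}   # derived data, safe to discard
        self.computed = 0                # how many entries were calculated

    def generation(self, commit: str) -> int:
        """Return the cached generation, computing only what is missing."""
        stack = [commit]
        while stack:
            c = stack[-1]
            if c in self._gen:
                stack.pop()
                continue
            missing = [p for p in self._parents[c] if p not in self._gen]
            if missing:
                stack.extend(missing)
                continue
            # Assumed convention: roots are 1, others 1 + max of parents.
            self._gen[c] = 1 + max((self._gen[p] for p in self._parents[c]),
                                   default=0)
            self.computed += 1
            stack.pop()
        return self._gen[commit]

    def invalidate(self) -> None:
        """Drop the cache; nothing is lost because it can be rebuilt."""
        self._gen.clear()


# Hypothetical history: d merges b and c.
cache = GenerationCache({"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]})
print(cache.generation("d"), cache.computed)  # first call walks the history
print(cache.generation("d"), cache.computed)  # second call is a lookup
```

If the side table ever disagrees with the parent pointers, the table is
simply wrong and can be rebuilt, which is the property argued for above.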

> It's not. It's totally irrelevant. I certainly would never walk away
> from the DAG model. It's a fundamental git decision, and it's the
> correct one.

Of course not. I never suggested we should.

> And that is what this discussion fundamentally boils down to for me.
> 
> If we should have fixed it in the original specification, we damn well
> should fix it today. It's been "ignorable" because it's just not been
> important enough. But if git now adds a fundamental cache for them,
> then that information is clearly no longer "not important enough".

OK, so let's say we add generation headers to each commit. What happens
next? Are we going to convert algorithms that use timestamps to use
commit generations? How are we going to handle performance issues when
dealing with older parts of history that don't have generations?

Again, those are serious questions that need to be answered. I respect that
you think the lack of a generation header is a design decision that
should be corrected. As I said before, I'm not 100% sure I agree, but
nor do I completely disagree (and I think it largely boils down to a
philosophical distinction, which I think you will agree should take a
backseat to real, practical concerns). But it's not 2005, and we have a
ton of history without generation numbers. So adding them now is only
one piece of the puzzle.

What's your solution for the rest of it?

-Peff

