git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Narebski <jnareb@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jeff King <peff@peff.net>, Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Jakub Narebski <jnareb@gmail.com>
Subject: Re: Git commit generation numbers
Date: Fri, 15 Jul 2011 02:12:43 -0700 (PDT)	[thread overview]
Message-ID: <m3fwm7aox1.fsf@localhost.localdomain> (raw)
In-Reply-To: <CA+55aFyDzr+SfgSzWMr9pQuQUXTw9mcjZ-00NZof74PKZzbGPA@mail.gmail.com>

Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, Jul 14, 2011 at 1:31 PM, Jeff King <peff@peff.net> wrote:
> >
> > However, I'm not 100% convinced leaving generation numbers out was a
> > mistake. The git philosophy seems always to have been to keep the
> > minimal required information in the DAG.
> 
> Yes.
> 
> And until I saw the patches trying to add generation numbers, I didn't
> really try to push adding generation numbers to commits (although it
> actually came up as early as July 2005, so the "let's use generation
> numbers in commits" thing is *really* old).
> 
> In other words, I do agree that we should strive for minimal required
> information.
> 
> But dammit, if you start using generation numbers, then they *are*
> required information. The fact that you then hide them in some
> unarchitected random file doesn't change anything! It just makes it
> ugly and random, for chrissake!
> 
> I really don't understand your logic that says that the cache is
> somehow cleaner. It's a random hack! It's saying "we don't have it in
> the main data structure, so let's add it to some other one instead,
> and now we have a consistency and cache generation problem instead".

You store redundant information, one that is used to speed up
calculations, in a cache.
 
[...]
> > Generation numbers are _completely_ redundant with the actual structure
> > of history represented by the parent pointers.

What is more important the perceived structure of history can change
by three mechanisms:

 * grafts
 * replace objects
 * shallow clone

I can understand that you don't want to worry about grafts - they are
a terrible hack.  We can simply turn off using generation numbers
stored in commit if they are present.

The problem with shallow clones is only at beginning, when some of
commits in shallow repository does not have generation numbers.  You
cannot simply calculate generation number for a new commit in such
case.

But what about REPLACE OBJECTS?  If one for example use "git replace"
on root commit to join contemporary repository with historical
repository... this is not addressed in your emails.


And let's not forget the fact that we need cache for old commits which
don't have yet generation number in a commit.


BTW. you are not fair comparing size of code.  

First, some of Peff code is about _using_ generation numbers, which
will be needed regardless of whether generation numbers are stored in
cache or packfile index, or whether they are embedded in commit
objects.

Second, with generation number commit header you need to write fsck
code, and have to consider size of this yet-to-be-written code.

[...]
> > I liken it somewhat to the "don't store renames" debate.
> 
> That's total and utter bullshit.

I think Peff meant here that if you make mistakes in calculating
rename info or generation number, and have incorrect information
stored in commit object, you are f**ked.
 
> Storing renames is *wrong*. I've explained a million times why it's
> wrong. Doing it is a disaster. I know. I've used systems that did it.
> It's crap. It's fundamentally information that is actively misleading
> and WRONG. It's not even that you can do rename detection at run-time,
> it's that you *HAVE* to do rename detection at run-time, because doing
> it at commit time is simply utterly and fundamentally *wrong*.
> 
> Just look at "git blame -C" to remind yourself why rename information is wrong.

Also doing full code movement and copying detection (that is what "git
blame -C" does) rather than simplistic whole-file rename detection is
pretty much impossible at commit time.

Nb. most SCMs that use path-id based rename tracking require that user
explicitly marks renames using "scm move" or "scm rename" (well,
Mercurial has a tool for rename detection before commit, "hg
addremove").  But asking user to mark code movements is simply
infeasible.
 
> But even more importantly, look at git merges. Look at how git has
> gotten merging right since pretty much day #1, and has absolutely no
> issues with files that got generated two different ways. Look at every
> SCM that tries to do rename detection, and look at how THEY CANNOT DO
> MERGES RIGHT.
> 
> It's that simple. Rename detection is not about avoiding "redundant
> data". It's about doing the right thing.

Well, rename tracking supporters say that heuristic rename detection
can be wrong.


By the way, what happened to "wholesame directory rename detection"
patches?  Without them in the situation where one side renamed
directory, and other created new file in said directory git on merge
creates file in re-created old name of directory...

-- 
Jakub Narebski
Poland

  parent reply	other threads:[~2011-07-15  9:12 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-14 18:24 Git commit generation numbers Linus Torvalds
2011-07-14 18:37 ` Jeff King
2011-07-14 18:47   ` Linus Torvalds
2011-07-14 18:55     ` Linus Torvalds
2011-07-14 19:12       ` Jeff King
2011-07-14 19:46       ` Ted Ts'o
2011-07-14 19:51         ` Linus Torvalds
2011-07-14 20:07           ` Jeff King
2011-07-14 20:08           ` Ted Ts'o
2011-07-14 19:08     ` Jeff King
2011-07-14 19:23       ` Linus Torvalds
2011-07-14 20:01         ` Jeff King
2011-07-14 20:19           ` Linus Torvalds
2011-07-14 20:31             ` Jeff King
2011-07-15  1:19               ` Linus Torvalds
2011-07-15  2:41                 ` Geert Bosch
2011-07-15  7:46                 ` Jeff King
2011-07-15 16:10                   ` Linus Torvalds
2011-07-15 16:18                     ` Shawn Pearce
2011-07-15 16:44                       ` Linus Torvalds
2011-07-15 18:42                         ` Ted Ts'o
2011-07-15 19:00                           ` Linus Torvalds
2011-07-16  9:16                           ` Christian Couder
2011-07-18  3:41                             ` Jeff King
2011-07-19  4:14                               ` Christian Couder
2011-07-19 20:00                                 ` Jeff King
2011-07-21  6:29                                   ` Christian Couder
2011-07-15 18:46                         ` Tony Luck
2011-07-15 18:58                           ` Linus Torvalds
2011-07-15 19:48                     ` Jeff King
2011-07-15 20:07                       ` Jeff King
2011-07-15 21:17                       ` Linus Torvalds
2011-07-15 21:54                         ` Jeff King
2011-07-15 23:10                         ` Linus Torvalds
2011-07-15 23:16                           ` Linus Torvalds
2011-07-15 23:36                             ` Linus Torvalds
2011-07-16  0:42                               ` Jeff King
2011-07-16  0:40                           ` Jeff King
2011-07-15  9:12                 ` Jakub Narebski [this message]
2011-07-15  9:17                   ` Long, Martin
2011-07-15 15:33                     ` Long, Martin
2011-07-15 16:15                       ` Drew Northup
2011-07-14 18:52   ` Linus Torvalds
2011-07-14 19:08     ` Jakub Narebski
2011-07-14 20:26   ` Junio C Hamano
2011-07-14 20:41     ` Jeff King
2011-07-14 21:30       ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2011-07-17 18:27 George Spelvin
2011-07-17 19:00 ` Long, Martin
2011-07-17 19:30 ` Linus Torvalds
2011-07-17 23:39   ` George Spelvin
2011-07-17 23:58     ` Linus Torvalds
2011-07-18  5:13       ` George Spelvin
2011-07-18 10:28         ` Anthony Van de Gejuchte
2011-07-18 11:48           ` George Spelvin
2011-07-20 20:51             ` Nicolas Pitre
2011-07-20 22:16               ` George Spelvin
2011-07-20 23:26                 ` david
2011-07-20 23:36                   ` Nicolas Pitre
2011-07-21  0:08                     ` Phil Hord
2011-07-21  0:18                       ` david
2011-07-21  0:37                         ` Shawn Pearce
2011-07-21  0:47                           ` Phil Hord
2011-07-21  4:26                           ` david
2011-07-21 12:43                             ` George Spelvin
2011-07-21 19:19                               ` Jakub Narebski
2011-07-21 20:27                                 ` George Spelvin
2011-07-21 20:33                                   ` Shawn Pearce
2011-07-22 12:18                                   ` Jakub Narebski
2011-07-22 13:09                                     ` Nicolas Pitre
2011-07-22 18:02                                       ` david
2011-07-22 18:34                                         ` Jakub Narebski
2011-07-22 19:06                                           ` Linus Torvalds
2011-07-22 22:02                                             ` Jeff King
2011-07-28 15:00                                             ` Felipe Contreras
2011-09-06 10:02                                               ` Ramkumar Ramachandra
2011-07-22 19:08                                           ` david
2011-07-22 19:40                                             ` Nicolas Pitre
2011-07-22 18:02                                     ` david
2011-07-21  0:39                         ` Phil Hord
2011-07-21  0:58                       ` Nicolas Pitre
2011-07-21  1:09                         ` Phil Hord
2011-07-21 12:03                   ` Drew Northup
2011-07-21 12:55                     ` George Spelvin
2011-07-21 15:57                       ` Drew Northup
2011-07-21 16:24                         ` Phil Hord
2011-07-21 22:40                           ` Pēteris Kļaviņš
2011-07-22  9:30                             ` Christian Couder
2011-07-21 17:36                         ` George Spelvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m3fwm7aox1.fsf@localhost.localdomain \
    --to=jnareb@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).