git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marc Strapetz <marc.strapetz@syntevo.com>
To: Junio C Hamano <gitster@pobox.com>, Stefan Beller <sbeller@google.com>
Cc: Jeff King <peff@peff.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: topological index field for commit objects
Date: Thu, 30 Jun 2016 00:15:36 +0200	[thread overview]
Message-ID: <2fff9bc4-0c40-dc51-1cf6-68b359cb16bf@syntevo.com> (raw)
In-Reply-To: <xmqqk2h73f2i.fsf@gitster.mtv.corp.google.com>

On 29.06.2016 22:39, Junio C Hamano wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Wed, Jun 29, 2016 at 11:59 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>> On Wed, Jun 29, 2016 at 11:31 AM, Marc Strapetz
>>> <marc.strapetz@syntevo.com> wrote:
>>>> This is no RFE but rather recurring thoughts whenever I'm working with
>>>> commit graphs: a topological index attribute for commit objects would be
>>>> incredible useful. By "topological index" I mean a simple integer for which
>>>> following condition holds true:
>>>
>>> Look for "generation numbers" in the list archive, perhaps?
>>
>> Thanks for the pointer to the interesting discussions.
>>
>> In http://www.spinics.net/lists/git/msg161363.html
>> Linus wrote in a discussion with Jeff:
>>
>>> Right now, we do *have* a "generation number". It's just that it's
>>> very easy to corrupt even by mistake. It's called "committer date". We
>>> could improve on it.
>>
>> Would it make sense to refuse creating commits that have a commit date
>> prior to its parents commit date (except when the user gives a
>> `--dammit-I-know-I-break-a-wildy-used-heuristic`)?
>
> I think that has also been discussed in the past.  I do not think it
> would help very much in practice, as projects already have up to 10
> years (and the ones migrated from CVS, even more) worth of commits
> they cannot rewrite that may record incorrect committer dates.
> You'd need something like "you can trust committer dates that are
> newer that this date" per project to switch between slow path and
> fast path, with an updated fsck that knows how to compute that
> number after you pulled from somebody who used that overriding
> option.
>
> If the use of generation number can somehow be limited narrowly, we
> may be able to incrementally introduce it only for new commits, but
> I haven't thought things through, so let me do so aloud here ;-)
>
> Suppose we use it only for this purpose:
>
>  * When we have two commits, C1 and C2, with generation numbers G1
>    and G2, we can say "C1 cannot possibly be an ancestor of C2" if
>    G1 > G2.  We cannot say anything else based on generation
>    numbers (or lack thereof).
>
> then I think we could just say "A newly created commit must record
> generation number G that is larger than generation numbers of its
> parent commits; ignore parents that lack generation number for the
> purpose of this sentence".

 From algorithm perspective, for already existing repositories you would 
still have to switch from an optimized generation number code to the 
current commit-time based code. That could things make even more complex 
and it's possibly expensive to determine whether a repository has full 
generation number support or not.

On the other hand, for new repositories, you could immediately use 
generation number based algorithms. So it could be "A newly created 
commit must record generation number G that is larger than generation 
numbers of its parent commits if all parents commits have a generation 
number recorded; otherwise do not record a generation number". Something 
like "git filter-branch" might already be sufficient to convert 
repositories.

Git versions released in 2019 may start issuing warnings if HEAD has no 
generation number assigned and Git versions released in 2025 may 
completely refuse to work with such repositories.

In the interim period, a local cache as Jeff is proposing could serve as 
secondary source for generation numbers. This would allow to phase out 
current algorithms immediately.

-Marc




  parent reply	other threads:[~2016-06-29 22:24 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-29 18:31 topological index field for commit objects Marc Strapetz
2016-06-29 18:59 ` Junio C Hamano
2016-06-29 20:20   ` Stefan Beller
2016-06-29 20:39     ` Junio C Hamano
2016-06-29 20:54       ` Stefan Beller
2016-06-29 21:37         ` Stefan Beller
2016-06-29 21:43           ` Jeff King
2016-06-29 20:56       ` Jeff King
2016-06-29 21:49         ` Jakub Narębski
2016-06-29 22:00           ` Jeff King
2016-06-29 22:11             ` Junio C Hamano
2016-06-29 22:30               ` Jeff King
2016-07-05 11:43                 ` Johannes Schindelin
2016-07-05 12:59                   ` Jakub Narębski
2016-06-30 10:30             ` Jakub Narębski
2016-06-30 18:12               ` Linus Torvalds
2016-06-30 23:39                 ` Jakub Narębski
2016-06-30 23:59                 ` Mike Hommey
2016-07-01  3:17                 ` Jeff King
2016-07-01  6:45                   ` Marc Strapetz
2016-07-01  9:48                   ` Jakub Narębski
2016-07-01 16:08                   ` Junio C Hamano
2016-07-01  6:54               ` Jeff King
2016-07-01  9:59                 ` Jakub Narębski
2016-07-20  0:07             ` Jakub Narębski
2016-07-20 13:02               ` Jeff King
2017-02-04 13:43                 ` Jakub Narębski
2017-02-17  9:26                   ` Jeff King
2017-02-17  9:28                     ` Jakub Narębski
2016-06-29 22:15       ` Marc Strapetz [this message]
2016-06-29 21:00   ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2fff9bc4-0c40-dc51-1cf6-68b359cb16bf@syntevo.com \
    --to=marc.strapetz@syntevo.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).