* [PATCH] Add a 'generation' number to commits
@ 2011-07-14 18:34 Linus Torvalds
2011-07-15 19:49 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2011-07-14 18:34 UTC (permalink / raw)
To: Git Mailing List, Junio C Hamano, Jeff King
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 14 Jul 2011 11:09:46 -0700
Subject: [PATCH] Add a 'generation' number to commits
It turns out that it's ok with git-fsck, and it's really not that
complicated.
We unconditionally add the generation number to new commits, but we
don't require it in old ones. Even if you mix old and new versions of git,
once you have the occasional new user, it's all good: there will be
generation numbers every once in a while, which means that computing new
ones will get cheaper (it's expensive to compute the generation number
for a deep tree that doesn't currently have any).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
Comments? This is pretty simplistic, and yes, it's slow. On the kernel, it
now takes a few seconds to generate a new commit when there are no
generation numbers - and that's on a fast machine.
But if I as a maintainer start using this, even if nobody else does, my
merges and my releases will start having generation numbers in the
commits, and once people start using those as the bases for their
development, the "generate the numbers" cost will quickly start going
down. It will always exist for old commits, but those get progressively
less relevant as time goes by, and soon enough all merging will be based
on stuff that has generation numbers somewhere reasonably recent.
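To illustrate why the cost drops once some commits carry a stored number, here is a minimal sketch of the lazy computation on a toy graph. The names `toy_commit` and `toy_generation` are hypothetical and are not part of git; the real logic is `commit_generation()` in the patch below. The recursion stops at the first commit that already has a generation, so only the unnumbered part of the history is walked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy commit for illustration; this is not git's struct commit. */
struct toy_commit {
	long generation;               /* -1 means "no stored generation" */
	size_t nr_parents;
	struct toy_commit **parents;
};

/*
 * Return the generation of c, computing and caching it if it is missing.
 * A commit without parents gets 0; any other commit gets one more than
 * the largest generation among its parents. The recursion stops at any
 * commit that already carries a number.
 */
static long toy_generation(struct toy_commit *c)
{
	long max = 0;
	size_t i;

	assert(c != NULL);
	if (c->generation >= 0)
		return c->generation;
	for (i = 0; i < c->nr_parents; i++) {
		long gen = toy_generation(c->parents[i]);
		if (gen >= max)
			max = gen + 1;
	}
	c->generation = max;
	return max;
}
```

In this sketch a merge of an unnumbered branch with a parent that already stores generation 7 only walks the unnumbered side; the numbered parent is answered immediately.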
And the thing is, we don't actually have to generate the generation
numbers very often. New commits, yes (but if you have a series of new
commits due to something like quilt import usage, it's only the first one
that ends up having that cost). But for the "might this be a merge base",
we could easily decide to never do any dynamic generation, and only say
that "IF we have pre-generated generation numbers, then we'll use them to
say 'this cannot possibly be an ancestor, because it has a bigger
generation number'".
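The "use them only if we already have them" rule can be sketched as a small predicate. This is an illustration under assumptions, not code from the patch: `GEN_UNKNOWN` and `may_be_ancestor` are hypothetical names, and the sketch treats a commit as a possible ancestor of itself, so only a strictly bigger number rules a candidate out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel for "no pre-generated generation number". */
#define GEN_UNKNOWN (-1L)

/*
 * Could a commit with generation candidate_gen be an ancestor of a commit
 * with generation descendant_gen? Every commit's generation is larger than
 * that of each of its ancestors, so a bigger number on the candidate rules
 * it out. If either number is missing, nothing can be ruled out and the
 * caller has to fall back to walking the history.
 */
static bool may_be_ancestor(long candidate_gen, long descendant_gen)
{
	assert(candidate_gen >= GEN_UNKNOWN && descendant_gen >= GEN_UNKNOWN);
	if (candidate_gen == GEN_UNKNOWN || descendant_gen == GEN_UNKNOWN)
		return true;
	return candidate_gen <= descendant_gen;
}
```

A `true` result here only means "not excluded"; it never proves ancestry, which is why the check is cheap to apply and safe to skip.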
So we'd not see the advantages immediately, but the downsides would be
pretty small too. And the upside is that eventually new commits _will_
have those generation numbers that we should have added to git originally.
commit.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
commit.h | 1 +
2 files changed, 51 insertions(+), 0 deletions(-)
diff --git a/commit.c b/commit.c
index ac337c7d7dc1..6a6b9978f252 100644
--- a/commit.c
+++ b/commit.c
@@ -89,6 +89,28 @@ static unsigned long parse_commit_date(const char *buf, const char *tail)
return strtoul(dateptr, NULL, 10);
}
+static long parse_commit_generation(const char *author, const char *tail)
+{
+ const char *p = author;
+ while (p + 13 < tail) {
+ /* Empty line before commit message? */
+ if (*p == '\n')
+ break;
+ if (!memcmp(p, "generation ", 11)) {
+ long value;
+ char *end;
+
+ value = strtoul(p+11, &end, 10);
+ if (!value || *end != '\n')
+ break;
+ return value;
+ }
+ while (p < tail && *p++ != '\n')
+ /* nothing */;
+ }
+ return -1;
+}
+
static struct commit_graft **commit_graft;
static int commit_graft_alloc, commit_graft_nr;
@@ -296,6 +318,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
}
}
item->date = parse_commit_date(bufptr, tail);
+ item->generation = parse_commit_generation(bufptr, tail);
return 0;
}
@@ -824,6 +847,26 @@ struct commit_list *reduce_heads(struct commit_list *heads)
return result;
}
+static long commit_generation(struct commit *commit)
+{
+ struct commit_list *parents;
+ unsigned long max = 0;
+
+ if (parse_commit(commit))
+ return -1;
+ if (commit->generation >= 0)
+ return commit->generation;
+ parents = commit->parents;
+ while (parents) {
+ long gen = commit_generation(parents->item);
+ if (gen >= max)
+ max = gen+1;
+ parents = parents->next;
+ }
+ commit->generation = max;
+ return max;
+}
+
static const char commit_utf8_warn[] =
"Warning: commit message does not conform to UTF-8.\n"
"You may want to amend it after fixing the message, or set the config\n"
@@ -836,6 +879,7 @@ int commit_tree(const char *msg, unsigned char *tree,
int result;
int encoding_is_utf8;
struct strbuf buffer;
+ unsigned long generation = 0;
assert_sha1_type(tree, OBJ_TREE);
@@ -851,9 +895,13 @@ int commit_tree(const char *msg, unsigned char *tree,
* if everything else stays the same.
*/
while (parents) {
+ long parent_gen;
struct commit_list *next = parents->next;
strbuf_addf(&buffer, "parent %s\n",
sha1_to_hex(parents->item->object.sha1));
+ parent_gen = commit_generation(parents->item);
+ if (parent_gen >= generation)
+ generation = parent_gen+1;
free(parents);
parents = next;
}
@@ -865,6 +913,8 @@ int commit_tree(const char *msg, unsigned char *tree,
strbuf_addf(&buffer, "committer %s\n", git_committer_info(IDENT_ERROR_ON_NO_NAME));
if (!encoding_is_utf8)
strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
+ if (generation)
+ strbuf_addf(&buffer, "generation %lu\n", generation);
strbuf_addch(&buffer, '\n');
/* And add the comment */
diff --git a/commit.h b/commit.h
index a2d571b97410..fd36274a2b0a 100644
--- a/commit.h
+++ b/commit.h
@@ -16,6 +16,7 @@ struct commit {
void *util;
unsigned int indegree;
unsigned long date;
+ long generation;
struct commit_list *parents;
struct tree *tree;
char *buffer;
--
1.7.6.1.g7f306
* Re: [PATCH] Add a 'generation' number to commits
2011-07-14 18:34 [PATCH] Add a 'generation' number to commits Linus Torvalds
@ 2011-07-15 19:49 ` Junio C Hamano
2011-07-15 23:58 ` Linus Torvalds
0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2011-07-15 19:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List, Jeff King
> Comments? This is pretty simplistic, and yes, it's slow. On the kernel, it
> now takes a few seconds to generate a new commit when there are no
> generation numbers - and that's on a fast machine.
I agree this is the way to go if we _were_ to use generation number
associated with commit objects in the longer term, and if the SLOP
logic in still_interesting() in revision.c:
 (1) can gracefully fall back to the date based heuristics for older
     commits without the header; and
 (2) can take advantage of the generation numbers in more recent commits.
If we cannot do (1), we could augment this with Peff's generation number
cache. I suspect (1) is doable and in that case we do not have to have
(and we may be better off without) the on-disk cache that could go stale,
but nobody so far has shown that yet, so...
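As a rough sketch of what conditions (1) and (2) ask for, the comparison below prefers generation numbers when both commits have one and otherwise falls back to timestamps. It is not the logic of still_interesting() in revision.c; `toy_rev` and `toy_older_than` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of interest; not git's struct commit. */
struct toy_rev {
	long generation;       /* -1 when the commit has no generation header */
	unsigned long date;    /* committer timestamp, seconds since the epoch */
};

/*
 * Report whether a should be treated as older than b.
 * *reliable is set to 1 when the answer comes from generation numbers,
 * and to 0 when it comes from the timestamp heuristic, in which case a
 * caller would still want some slop to guard against clock skew.
 */
static int toy_older_than(const struct toy_rev *a, const struct toy_rev *b,
			  int *reliable)
{
	assert(a != NULL && b != NULL && reliable != NULL);
	if (a->generation >= 0 && b->generation >= 0) {
		*reliable = 1;
		return a->generation < b->generation;
	}
	*reliable = 0;
	return a->date < b->date;
}
```

The `reliable` flag is the design point: a traversal could drop its slop only for answers that did not depend on dates.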
As I mentioned in a review comment of the actual patch, I however am not
convinced that generation number is a better substitute for the timestamp
in the context of "tag --contains" optimization.
* Re: [PATCH] Add a 'generation' number to commits
2011-07-15 19:49 ` Junio C Hamano
@ 2011-07-15 23:58 ` Linus Torvalds
2011-07-16 0:36 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2011-07-15 23:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List, Jeff King
On Fri, Jul 15, 2011 at 12:49 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> I agree this is the way to go if we _were_ to use generation number
> associated with commit objects in the longer term,
I have to say, if the main issue was "git tag/branch --contains", and
if the time-based slop approach of the patch I sent out is acceptable,
I think that we can continue to ignore generation numbers.
Linus
* Re: [PATCH] Add a 'generation' number to commits
2011-07-15 23:58 ` Linus Torvalds
@ 2011-07-16 0:36 ` Junio C Hamano
0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2011-07-16 0:36 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List, Jeff King
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Fri, Jul 15, 2011 at 12:49 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> I agree this is the way to go if we _were_ to use generation number
>> associated with commit objects in the longer term,
>
> I have to say, if the main issue was "git tag/branch --contains", and
> if the time-based slop approach of the patch I sent out is acceptable,
> I think that we can continue to ignore generation numbers.
I think we are in agreement that "--contains" can be sped up without
generation numbers.
As I mentioned elsewhere, rev-list SLOP and merge-base traversal have
different performance characteristics and requirements from "--contains"
(for one thing, they cannot say "the commit tagged with v2.6.13 is so old
that there is no way this commit made three days ago is contained in it"
to optimize the traversal). And I agree that if we had had a generation
header in commits in May 2005, optimizing these traversals properly would
have been much cleaner, and it may still be worth doing.