[PATCH] [RFD] Add repoid identifier to commit

* [PATCH] [RFD] Add repoid identifier to commit
@ 2005-05-11 21:38 Thomas Gleixner
  2005-05-11 22:00 ` Sean
  2005-05-11 23:14 ` H. Peter Anvin
  0 siblings, 2 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-11 21:38 UTC (permalink / raw)
  To: git

This is an initial attempt to enable history tracking for multiple
repositories in a consistent state. At the moment this can only be done
by heuristic guessing on the parent dates and the committer names.
This fails, for example, with Dave Miller's net-2.6 and sparc-2.6 trees,
as in both cases the committer name is the same. It also fails completely
in cases where the system clock of the committer is wrong and the merge
is a head forward. The old bk repository contains entries from 1999 and
2027, which will also happen with git over time.

To identify a repository, commit-tree tries to read the environment
variable "GIT_REPOSITORY_ID" and falls back to the current working
directory. The environment variable keeps the door open for managed
repository IDs, but the current working directory is certainly quite
helpful information for resolving the origin decision in history
tracking.

Adding a line after the committer should not break any existing tools
AFAICS.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

--- a/commit-tree.c
+++ b/commit-tree.c
@@ -110,6 +110,7 @@ int main(int argc, char **argv)
 	char *gecos, *realgecos, *commitgecos;
 	char *email, *commitemail, realemail[1000];
 	char date[20], realdate[20];
+	char *repoid, repoidbuf[MAXPATHLEN];
 	char *audate;
 	char comment[1000];
 	struct passwd *pw;
@@ -154,6 +155,14 @@ int main(int argc, char **argv)
 	if (audate)
 		parse_date(audate, date, sizeof(date));

+	repoid = getenv("GIT_REPOSITORY_ID");
+	if (!repoid)
+		repoid = getcwd(repoidbuf, MAXPATHLEN);
+	else {
+		if (strlen(repoid) == 0)
+			die("GIT_REPOSITORY_ID is empty. Fix it !");
+	}
+
 	remove_special(gecos); remove_special(realgecos); remove_special(commitgecos);
 	remove_special(email); remove_special(realemail); remove_special(commitemail);

@@ -170,7 +179,8 @@ int main(int argc, char **argv)

 	/* Person/date information */
 	add_buffer(&buffer, &size, "author %s <%s> %s\n", gecos, email, date);
-	add_buffer(&buffer, &size, "committer %s <%s> %s\n\n", commitgecos, commitemail, realdate);
+	add_buffer(&buffer, &size, "committer %s <%s> %s\n", commitgecos, commitemail, realdate);
+	add_buffer(&buffer, &size, "repoid %s\n\n", repoid);

 	/* And add the comment */
 	while (fgets(comment, sizeof(comment), stdin) != NULL)
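
The lookup in the second hunk can be sketched as a standalone function.
This is only an illustration under assumptions: resolve_repoid() is a
hypothetical helper that does not appear in the patch, and unlike the
hunk it reports an empty variable or a failed getcwd() to the caller
rather than calling die() or handing a NULL pointer to the "%s" format.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: resolve the repository id
 * in the same order as the hunk above (environment first, then the
 * current working directory).
 *
 * Returns a pointer to the id (the environment string or buf), or NULL
 * if GIT_REPOSITORY_ID is set but empty, or if getcwd() fails.
 */
static const char *resolve_repoid(char *buf, size_t len)
{
	const char *id = getenv("GIT_REPOSITORY_ID");

	if (id)
		return *id ? id : NULL;	/* set but empty: treat as an error */

	/* getcwd() returns NULL on failure, e.g. when buf is too small */
	return getcwd(buf, len);
}
```

A caller would check the result for NULL before formatting the
"repoid" line, which the patch as posted does not do for the getcwd()
case.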

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 21:38 [PATCH] [RFD] Add repoid identifier to commit Thomas Gleixner
@ 2005-05-11 22:00 ` Sean
  2005-05-11 22:05   ` Thomas Gleixner
  2005-05-11 23:14 ` H. Peter Anvin
  1 sibling, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-11 22:00 UTC (permalink / raw)
  To: tglx; +Cc: git

On Wed, May 11, 2005 5:38 pm, Thomas Gleixner said:
> This is an initial attempt to enable history tracking for multiple
> repositories in a consistent state. At the moment this can only be done
> by heuristic guessing on the parent dates and the committer names.
> This fails for example with Dave Millers net-2.6 and sparc-2.6 trees, as
> in both cases the committer name is the same. It fails also completely
> in cases where the system clock of the committer is wrong and the merge
> is a head forward. The old bk repository contains entries from 1999 and
> 2027, which will happen also with git over the time.
>
> To identify a repository commit-tree tries to read an environment
> variable "GIT_REPOSITORY_ID" and has a fallback to the current working
> directory. The environment variable keeps the door open for managed
> repository id's, but the current working directory is certainly a quite
> helpful information to solve the origin decision for history tracking.
>
> Adding a line after the committer should not break any existing tools
> AFAICS.

To make this useful you're also going to have to change the parent entries
to something like:

parent SHA1 REPOID

At least when the referenced commit has a repoid that doesn't match the
repository from which you obtained the object, ie. fast forward heads. 
This implies that you know the repoid of the repository you pulled the
object from!

Otherwise, you still haven't solved the problem of identifying fast
forward heads as you traverse the history.

Sean
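
To make the suggested "parent SHA1 REPOID" line concrete, here is a
rough sketch of how a reader of commit objects might parse it. The
format is only the one proposed in this message, not anything git
implements; struct parent_ref and parse_parent_line() are hypothetical
names, and the sketch assumes a repoid without embedded whitespace.

```c
#include <stdio.h>
#include <string.h>

struct parent_ref {
	char sha1[41];		/* 40 hex digits plus NUL */
	char repoid[256];	/* empty when the line carries no repoid */
};

/*
 * Parse "parent SHA1" or "parent SHA1 REPOID".
 * Returns 0 on success, -1 if the line is not a well-formed parent line.
 */
static int parse_parent_line(const char *line, struct parent_ref *out)
{
	int n;

	out->repoid[0] = '\0';
	n = sscanf(line, "parent %40s %255s", out->sha1, out->repoid);
	if (n < 1)
		return -1;	/* not a parent line at all */
	if (strspn(out->sha1, "0123456789abcdef") != 40)
		return -1;	/* SHA1 must be exactly 40 lowercase hex digits */
	return 0;
}
```

Because the repoid is optional here, existing parent lines without one
would still parse, which is the backward-compatibility property the
original patch description cares about.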



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 22:00 ` Sean
@ 2005-05-11 22:05   ` Thomas Gleixner
  2005-05-11 22:24     ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-11 22:05 UTC (permalink / raw)
  To: Sean; +Cc: git

On Wed, 2005-05-11 at 18:00 -0400, Sean wrote:
> To make this useful you're also going to have to change the parent entries
> to something like:
> 
> parent SHA1 REPOID
> 
> At least when the referenced commit has a repoid that doesn't match the
> repository from which you obtained the object, ie. fast forward heads. 
> This implies that you know the repoid of the repository you pulled the
> object from!
> 
> Otherwise, you still haven't solved the problem of identifying fast
> forward heads as you traverse the history.

Err, 
each parent is a commit, which is identified by its repoid. Why do you
want to add redundant information ?

tglx



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 22:05   ` Thomas Gleixner
@ 2005-05-11 22:24     ` Sean
  2005-05-11 22:30       ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-11 22:24 UTC (permalink / raw)
  To: tglx; +Cc: git

On Wed, May 11, 2005 6:05 pm, Thomas Gleixner said:

> Err,
> each parent is a commit, which is identified by its repoid. Why do you
> want to add redundant information ?
>

It's not necessarily the repoid you pulled the object from though.  It may
be the repoid of another completely separate repository.

Repo A -  creates object  HEAD = (A)
Repo B -  pulls objects from Repo A  FAST FORWARD HEAD = (A)
Repo C -  pulls from Repo B

Now as you traverse the history in Repo C, the object will show as coming
from Repo A, not Repo B.

Sean
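
The A/B/C scenario can be modelled in a few lines. This is a toy model
under assumptions, not git's data structures: struct commit, struct repo
and fast_forward() are hypothetical, and a fast forward is reduced to
copying the head pointer without creating a commit.

```c
#include <stddef.h>

struct commit {
	const char *repoid;		/* repository that created the commit */
	const struct commit *parent;
};

struct repo {
	const char *id;
	const struct commit *head;
};

/* A fast forward only moves the head; no new commit object is created. */
static void fast_forward(struct repo *dst, const struct repo *src)
{
	dst->head = src->head;
}
```

After B fast-forwards to A and C fast-forwards to B, C's head is still
the commit created in A, so its repoid reads "A". Nothing in the commit
records that it travelled through B, which is the point being argued
here.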

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 22:24     ` Sean
@ 2005-05-11 22:30       ` Thomas Gleixner
  2005-05-11 22:36         ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-11 22:30 UTC (permalink / raw)
  To: Sean; +Cc: git

On Wed, 2005-05-11 at 18:24 -0400, Sean wrote:
> On Wed, May 11, 2005 6:05 pm, Thomas Gleixner said:
> 
> > Err,
> > each parent is a commit, which is identified by its repoid. Why do you
> > want to add redundant information ?
> >
> 
> It's not necessarily the repoid you pulled the object from though.  It may
> be the repoid of another completely separate repository.
> 
> Repo A -  creates object  HEAD = (A)
> Repo B -  pulls objects from Repo A  FAST FORWARD HEAD = (A)
> Repo C -  pulls from Repo B
> 
> Now as you traverse the history in Repo C, the object will show as coming
> from Repo A, not Repo B.

At this point it is completely irrelevant if you pulled from A or B. The
originator of Head A is A forever.

tglx



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 22:30       ` Thomas Gleixner
@ 2005-05-11 22:36         ` Sean
  2005-05-11 22:48           ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-11 22:36 UTC (permalink / raw)
  To: tglx; +Cc: git

On Wed, May 11, 2005 6:30 pm, Thomas Gleixner said:

> At this point it is completely irrelevant if you pulled from A or B. The
> originator of Head A is A forever.

But who cares what repository was used to create the object?   You can't
talk to a repository.   What you want to know is who created the object,
and Author/Committer completely solves that problem.

If on the other hand you're trying to reliably track the chain-of-command
that landed the object in your repository, your patch falls short.

Sean

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 22:36         ` Sean
@ 2005-05-11 22:48           ` Thomas Gleixner
  2005-05-11 23:01             ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-11 22:48 UTC (permalink / raw)
  To: Sean; +Cc: git

On Wed, 2005-05-11 at 18:36 -0400, Sean wrote:
> On Wed, May 11, 2005 6:30 pm, Thomas Gleixner said:
> 
> > At this point it is completely irrelevant if you pulled from A or B. The
> > originator of Head A is A forever.
> 
> But who cares what repository was used to create the object?   You can't
> talk to a repository.   What you want to know is who created the object,
> and Author/Committer completely solves that problem.

Maybe you have missed the point that one Committer can hold more than
one repository. See davem/net-2.6 and davem/sparc-2.6, not to mention
Russell King's and Greg's multiple repositories.
The Author is irrelevant, because one Author sends patches to more than
one maintainer. Author _cannot_ be a source of tracking information. If
you want to do heuristic guesses on Author/Committer pairs, then you
make the situation more complex than it is already.

> If on the otherhand you're trying to reliably track the chain-of-command
> that landed the object in your repository, your patch falls short.

As I said before it is completely irrelevant whether fast forward was
pulled into C directly from A or from B. 

What's the relevant content of getting the same thing from A or B ?

If you want to do this, you break the fast forward mechanism and
reinvent the pull ping-pong which is avoided by the fast forwards.

tglx

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 22:48           ` Thomas Gleixner
@ 2005-05-11 23:01             ` Sean
  2005-05-11 23:33               ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-11 23:01 UTC (permalink / raw)
  To: tglx; +Cc: git

On Wed, May 11, 2005 6:48 pm, Thomas Gleixner said:

Hey Thomas,

> Maybe you have missed the point, where one Committer holds more than one
> repository. See davem/net-2.6 and davem/sparc-2.6. Not to talk of
> Russell King's and Greg's multiple repositories.
> The Author is irrelevant, because one Author sends patches to more than
> one maintainer. Author _cannot_ be a source of tracking information. If
> you want to do heuristic guesses on Author/Committer pairs, then you
> make the situation more complex than it is already.

Why would anyone care how many repositories Russell or Greg use?  Why does
anyone care if Dave used his repo A, B, or C?   Aren't I still just going
to contact him via his author email addy if I have an issue with an object
he has added to the stream?

And if I do care which repo he used, why don't I care about the case I've
outlined where the chain of command information is lost?

> As I said before it is completely irrelevant whether fast forward was
> pulled into C directly from A or from B.
>
> Whats the relevant content of getting the same thing from A or B ?

Exactly!!!  So what is relevant of getting the same thing from Dave's A or
B?  The only point would be to show chain of command, but you don't seem
interested in that.

> If you want to do this, you break the fast forward mechanism and
> reinvent the pull ping-pong which is avoided by the fast forwards.

Yes, I think there are other ways to avoid the ping pong too.

Sean

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 21:38 [PATCH] [RFD] Add repoid identifier to commit Thomas Gleixner
  2005-05-11 22:00 ` Sean
@ 2005-05-11 23:14 ` H. Peter Anvin
  2005-05-11 23:38   ` Thomas Gleixner
  2005-05-13  1:37   ` [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?] Jon Seymour
  1 sibling, 2 replies; 74+ messages in thread
From: H. Peter Anvin @ 2005-05-11 23:14 UTC (permalink / raw)
  To: tglx; +Cc: git

Thomas Gleixner wrote:
> This is an initial attempt to enable history tracking for multiple
> repositories in a consistent state. At the moment this can only be done
> by heuristic guessing on the parent dates and the committer names. 
> This fails for example with Dave Millers net-2.6 and sparc-2.6 trees, as
> in both cases the committer name is the same. It fails also completely
> in cases where the system clock of the committer is wrong and the merge
> is a head forward. The old bk repository contains entries from 1999 and
> 2027, which will happen also with git over the time. 
> 
> To identify a repository commit-tree tries to read an environment
> variable "GIT_REPOSITORY_ID" and has a fallback to the current working
> directory. The environment variable keeps the door open for managed
> repository id's, but the current working directory is certainly a quite
> helpful information to solve the origin decision for history tracking.
> 
> Adding a line after the committer should not break any existing tools
> AFAICS.
> 

I would like to suggest that a few constraints be placed on the repoid.
In particular, I'd like to suggest that a repoid is a UUID, that a file
is used to track it (.git/repoid), and that if it doesn't exist, a new
one is created from /dev/urandom.

	-hpa
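
A minimal sketch of the suggestion above, under stated assumptions:
load_or_create_repoid() is a hypothetical helper, the id is 16 random
bytes printed as 32 hex digits rather than a formatted UUID, and the
check-then-create sequence is not protected against two processes
racing to create the file.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read the id stored in `path`; if the file does
 * not exist, create it from 16 bytes of /dev/urandom.
 * `id` must have room for 33 bytes. Returns 0 on success, -1 on error.
 */
static int load_or_create_repoid(const char *path, char id[33])
{
	unsigned char raw[16];
	size_t i;
	FILE *f = fopen(path, "r");

	if (f) {
		/* The file exists: reuse the stored id. */
		int ok = fscanf(f, "%32s", id) == 1;
		fclose(f);
		return ok ? 0 : -1;
	}

	f = fopen("/dev/urandom", "rb");
	if (!f)
		return -1;
	if (fread(raw, 1, sizeof(raw), f) != sizeof(raw)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	for (i = 0; i < sizeof(raw); i++)
		sprintf(id + 2 * i, "%02x", (unsigned)raw[i]);

	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", id);
	return fclose(f) == 0 ? 0 : -1;
}
```

Called with a path such as ".git/repoid", the first call creates the
file and later calls return the same id, as long as nothing else copies
or replaces the file.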

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:01             ` Sean
@ 2005-05-11 23:33               ` Thomas Gleixner
  2005-05-11 23:44                 ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-11 23:33 UTC (permalink / raw)
  To: Sean; +Cc: git

On Wed, 2005-05-11 at 19:01 -0400, Sean wrote:
> Why would anyone care how many repositories Russell or Greg use?  Why does
> anyone care if Dave used his repo A, B, or C?   Aren't I still just going
> to contact him via his author email addy if I have an issue with an object
> he has added to the stream?

Huh? What the hell do sparc-2.6 and net-2.6 have in common except the
same owner/maintainer ? Should we base the heuristics on directories and
filenames ? Cool.

It is relevant for the maintainers to have information which is
consistent over a repository. So the source of change _is_ relevant.

> Exactly!!!  So what is relevant of getting the same thing from Dave's A or
> B?  

The relevant part is, that it _is_ relevant for Dave to know where the
hell a problem was introduced.

> The only point would be to show chain of command, but you don't seem
> interested in that.

What is the chain of commands good for ? Does the chain of commands
change the history information in a specific repository ? 

No. 

If you buy food, then it is relevant whether you get it from A directly
or via B. The commit and the referenced tree are immutable; they neither
change in consistency nor become inedible.

> > If you want to do this, you break the fast forward mechanism and
> > reinvent the pull ping-pong which is avoided by the fast forwards.
> 
> Yes, I think there are other ways to avoid the ping pong too.

True, but not with a plain rsync approach

tglx

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:14 ` H. Peter Anvin
@ 2005-05-11 23:38   ` Thomas Gleixner
  2005-05-11 23:40     ` H. Peter Anvin
  2005-05-12  0:41     ` Dmitry Torokhov
  2005-05-13  1:37   ` [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?] Jon Seymour
  1 sibling, 2 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-11 23:38 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: git

On Wed, 2005-05-11 at 16:14 -0700, H. Peter Anvin wrote:
> I would like to suggest a few limiters are set on the repoid.  In 
> particular, I'd like to suggest that a repoid is a UUID, that a file is 
> used to track it (.git/repoid), and that if it doesn't exist, a new one 
> is created from /dev/urandom.

Which is completely error prone due to rsync. Some of the repositories on
kernel.org keep identical copies of .git/description already. Why should
they preserve a unique .git/repoid ?

There is one clean way to solve this: managed repository IDs and a lot
of discipline.

I expect neither of those two things to happen, but a complete working
directory path is better than nothing to make educated guesses.
Committer names (maintainers) can be the same across repositories, but
it's unlikely that somebody who manages more than one subsystem uses the
same working directory for them.

tglx

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:38   ` Thomas Gleixner
@ 2005-05-11 23:40     ` H. Peter Anvin
  2005-05-11 23:45       ` Sean
  2005-05-12  0:33       ` Thomas Gleixner
  2005-05-12  0:41     ` Dmitry Torokhov
  1 sibling, 2 replies; 74+ messages in thread
From: H. Peter Anvin @ 2005-05-11 23:40 UTC (permalink / raw)
  To: tglx; +Cc: git

Thomas Gleixner wrote:
> 
> Which is complety error prone due to rsync. Some of the repositories on
> kernel.org keep identical copies of .git/description already. Why should
> they preserve an unique .git/repoid ?
> 
> There is one clean way to solve this. Managed repository id's and a lot
> of discipline.
> 
> I expect neither of those two things to happen, but a complete working
> directory path is better than nothing to make educated guesses.
> Committer names (maintainers) can be the same over repositories, but its
> unlikely that somebody who manages more than one subsystems uses the
> same working directory for them.
> 

I can tell you what would happen in at least my case: you'll see each 
"repository" with about 23 different IDs.

	-hpa

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:33               ` Thomas Gleixner
@ 2005-05-11 23:44                 ` Sean
  2005-05-12  0:30                   ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-11 23:44 UTC (permalink / raw)
  To: tglx; +Cc: git

On Wed, May 11, 2005 7:33 pm, Thomas Gleixner said:

> He? What the hell have the sparc-2.6 and net-2.6 in common except the
> same owner/maintainer ? Should we base the heuristics on directories and
> filenames ? Cool.

What problem are you trying to solve?  Has dave or russell or anybody with
multiple repositories given you reason to think they have a problem
tracking their personal repositories?   I doubt it very much.

>> The only point would be to show chain of command, but you don't seem
>> interested in that.
>
> What is the chain of commands good for ? Does the chain of commands
> change the history information in a specific repository ?

The chain of command might be good to know in the same way that an
accurate signed-off-by chain is good to know.

> No.

Yes.  Not that I care personally very much.

> If you buy food, then it is relevant if you get it from A directly or
> via B. The commit and the referenced tree is immutable and does neither
> change the consistency nor gets uneatable.

Lol..

> True, but not with a plain rsync approach

Agreed.

Sean.



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:40     ` H. Peter Anvin
@ 2005-05-11 23:45       ` Sean
  2005-05-12  0:04         ` H. Peter Anvin
  2005-05-12  0:33       ` Thomas Gleixner
  1 sibling, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-11 23:45 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, git

On Wed, May 11, 2005 7:40 pm, H. Peter Anvin said:

> I can tell you what would happen in at least my case: you'll see each
> "repository" with about 23 different IDs.
>

Amongst other issues and complexity this will introduce.   This is really
a solution in search of a problem anyway.

Sean



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:45       ` Sean
@ 2005-05-12  0:04         ` H. Peter Anvin
  2005-05-12  0:20           ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: H. Peter Anvin @ 2005-05-12  0:04 UTC (permalink / raw)
  To: Sean; +Cc: tglx, git

Sean wrote:
> 
> Amongst other issues and complexity this will introduce.   This is really
> a solution in search of a problem anyway.
> 

You mean repoid?

	-hpa


* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:04         ` H. Peter Anvin
@ 2005-05-12  0:20           ` Sean
  0 siblings, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12  0:20 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, git

On Wed, May 11, 2005 8:04 pm, H. Peter Anvin said:
> Sean wrote:
>>
>> Amongst other issues and complexity this will introduce.   This is
>> really a solution in search of a problem anyway.
>>
> You mean repoid?

Hey Peter,

   Yes, it will create just as many problems as it sets out to solve. 
Actually, I still don't know what problem is being addressed by the
current proposal.

Sean



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:44                 ` Sean
@ 2005-05-12  0:30                   ` Thomas Gleixner
  2005-05-12  0:45                     ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12  0:30 UTC (permalink / raw)
  To: Sean; +Cc: git

On Wed, 2005-05-11 at 19:44 -0400, Sean wrote:
> What problem are you trying to solve?  

The problem to explain the obvious facts to an agnostic

> Has dave or russell or anybody with
> multiple repositories given you reason to think they have a problem
> tracking their personal repositories?   I doubt it very much.

Aarg. Did you ever get in contact with QA departments ?

Assume you have:  bugfix - stable - devel repositories.

You have to track down a problem in bugfix and the source of it.
It does not matter whether the maintainer of "bugfix" pulled it from
devel or from stable. It's his fault anyway. 

But we are not talking about faults and guiltiness. We want to identify
the location and the context _where_ and _why_ this change was created.

The current solution of git makes it impossible to retrieve this
information in a consistent way. 

So you have no quick solution to figure out what happened. Quite
contrary, you have to dissect inconsistent information.

See also the thread about "Stop git-rev-list at sha1 match".

> The chain of command might be good to know in the same way that an
> accurate signed-off-by chain is good to know.

This sentence makes me guess that you actually are working in a QA
department and are therefore trying to maximize the amount of irrelevant
information.

tglx

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:40     ` H. Peter Anvin
  2005-05-11 23:45       ` Sean
@ 2005-05-12  0:33       ` Thomas Gleixner
  2005-05-12  1:46         ` Junio C Hamano
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12  0:33 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: git

On Wed, 2005-05-11 at 16:40 -0700, H. Peter Anvin wrote:
> > I expect neither of those two things to happen, but a complete working
> > directory path is better than nothing to make educated guesses.
> > Committer names (maintainers) can be the same over repositories, but its
> > unlikely that somebody who manages more than one subsystems uses the
> > same working directory for them.
> > 
> 
> I can tell you what would happen in at least my case: you'll see each 
> "repository" with about 23 different IDs.

You won. :)

So what alternatives do we have ?

- commit history per repository
  .git/head-history               rsync and user error prone 
- .git/repoid                     rsync error prone
- GIT_REPOSITORY_ID=xyz           user error prone
- directory name based guessing   hpa error prone

What's your preferred error scenario ?

tglx



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-11 23:38   ` Thomas Gleixner
  2005-05-11 23:40     ` H. Peter Anvin
@ 2005-05-12  0:41     ` Dmitry Torokhov
  2005-05-12  0:44       ` Thomas Gleixner
  1 sibling, 1 reply; 74+ messages in thread
From: Dmitry Torokhov @ 2005-05-12  0:41 UTC (permalink / raw)
  To: git, tglx; +Cc: H. Peter Anvin

On Wednesday 11 May 2005 18:38, Thomas Gleixner wrote:
> On Wed, 2005-05-11 at 16:14 -0700, H. Peter Anvin wrote:
> > I would like to suggest a few limiters are set on the repoid.  In 
> > particular, I'd like to suggest that a repoid is a UUID, that a file is 
> > used to track it (.git/repoid), and that if it doesn't exist, a new one 
> > is created from /dev/urandom.
> 
> Which is complety error prone due to rsync. Some of the repositories on
> kernel.org keep identical copies of .git/description already. Why should
> they preserve an unique .git/repoid ?

I think that a unique repoid should be created automatically every time
you clone. It is OK for it to go away when you discard a tree; it will
just identify a line (set) of changes originating from some place.

-- 
Dmitry

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:41     ` Dmitry Torokhov
@ 2005-05-12  0:44       ` Thomas Gleixner
  2005-05-12  1:09         ` H. Peter Anvin
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12  0:44 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: git, H. Peter Anvin

On Wed, 2005-05-11 at 19:41 -0500, Dmitry Torokhov wrote:
> > 
> > Which is complety error prone due to rsync. Some of the repositories on
> > kernel.org keep identical copies of .git/description already. Why should
> > they preserve an unique .git/repoid ?
> 
> I think that an unique repoid should be created automatically every time
> you clone. It is ok for it to go away when you discard a tree, it will just
> identify a line (set) of changes originating from some place.

Yes, as long as you make sure that rsync does _NOT_ pollute/populate it

tglx



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:30                   ` Thomas Gleixner
@ 2005-05-12  0:45                     ` Sean
  2005-05-12  0:56                       ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12  0:45 UTC (permalink / raw)
  To: tglx; +Cc: git

On Wed, May 11, 2005 8:30 pm, Thomas Gleixner said:
> On Wed, 2005-05-11 at 19:44 -0400, Sean wrote:
>> What problem are you trying to solve?
>
> The problem to explain the obvious facts to an agnostic

No the problem is you're seeing dragons.

> Aarg. Did you ever get in contact with QA departements ?

Can we please not _invent_ problems where there are none?  Can you show a
specific case today where repoid would make one ounce of difference in the
life of anyone?

> Assume you have:  bugfix - stable - devel repositories.

Why does this imaginary QA department use the same committer and author
for all of them?  And why is it you switch from imaginary problems of
dave, greg and russell to imaginary problems of a fictitious QA
department?

> You have to track down a problem in bugfix and the source of it.
> It does not matter whether the maintainer of "bugfix" pulled it from
> devel or from stable. It's his fault anyway.
>
> But we are not talking about faults and guiltiness. We want to identify
> the location and the context _where_ and _why_ this change was created.
>
> The current solution of git makes it impossible to retrieve this
> information in a consistent way.

Wrong.  When a commit is pulled from a repository, all the surrounding
context of every commit that came before it and after it on that branch is
pulled right along with it.

> So you have no quick solution to figure out what happened. Quite
> contrary, you have to dissect inconsistent information.
>
> See also the thread about "Stop git-rev-list at sha1 match".

Sorry, this one is entertaining enough <g>

>> The chain of command might be good to know in the same way that an
>> accurate signed-off-by chain is good to know.
>
> This sentence makes me guess, that you actually are working in a QA
> departement and therefor trying to maximize the amount of irrelevant
> information.

No, you seem to want it both ways.  Sometimes it's important to you to
know where an object came from and how it got there, and sometimes it's
not.  Interesting blind spot.

Sean

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:45                     ` Sean
@ 2005-05-12  0:56                       ` Thomas Gleixner
  2005-05-12  0:58                         ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12  0:56 UTC (permalink / raw)
  To: Sean; +Cc: git

On Wed, 2005-05-11 at 20:45 -0400, Sean wrote:
> Can we please not _invent_ problems where there are none?  Can you show a
> specific case today where repoid would make one ounce of difference in the
> life of anyone?

Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in correct
order, using the available tools. 

Come back to me when you are done.

> No, you seem to want it both ways.  Sometimes it's important to you to
> know where an object came from and how it got there, and sometimes it's
> not.  Interesting blind spot.

He ? 

I was not aware, that omitting irrelevant information is creating a
blind spot. 

Period. End of thread.

tglx



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:56                       ` Thomas Gleixner
@ 2005-05-12  0:58                         ` Sean
  2005-05-12 10:07                           ` David Woodhouse
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12  0:58 UTC (permalink / raw)
  To: tglx; +Cc: git

On Wed, May 11, 2005 8:56 pm, Thomas Gleixner said:

> Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in correct
> order, using the available tools.
>
> Come back to me when you are done.

Ask me any question that matters and I'll answer it with available tools.

> I was not aware, that omitting irrelevant information is creating a
> blind spot.

Sorry, your assessment that it is irrelevant is incorrect and overlooks
that there is information loss.

> Period. End of thread.

Fair enough.

Sean



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:44       ` Thomas Gleixner
@ 2005-05-12  1:09         ` H. Peter Anvin
  2005-05-12  1:13           ` H. Peter Anvin
  0 siblings, 1 reply; 74+ messages in thread
From: H. Peter Anvin @ 2005-05-12  1:09 UTC (permalink / raw)
  To: tglx; +Cc: Dmitry Torokhov, git

Thomas Gleixner wrote:
> On Wed, 2005-05-11 at 19:41 -0500, Dmitry Torokhov wrote:
> 
>>>Which is complety error prone due to rsync. Some of the repositories on
>>>kernel.org keep identical copies of .git/description already. Why should
>>>they preserve an unique .git/repoid ?
>>
>>I think that an unique repoid should be created automatically every time
>>you clone. It is ok for it to go away when you discard a tree, it will just
>>identify a line (set) of changes originating from some place.
> 
> 
> Yes, as long as you make sure that rsync does _NOT_ pollute/populate it
> 

You shouldn't be rsyncing the .git directory, only .git/objects anyway.
Some people seem to have merely copied Linus' entire tree, and that's
what's causing problems.

That one you can't win.

	-hpa


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  1:09         ` H. Peter Anvin
@ 2005-05-12  1:13           ` H. Peter Anvin
  2005-05-12  3:30             ` Joel Becker
  2005-05-12  9:17             ` Thomas Gleixner
  0 siblings, 2 replies; 74+ messages in thread
From: H. Peter Anvin @ 2005-05-12  1:13 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, Dmitry Torokhov, git

H. Peter Anvin wrote:
>>
>> Yes, as long as you make sure that rsync does _NOT_ pollute/populate it
>>
> 
> You shouldn't be rsyncing the .git directory, only .git/objects anyway.
> Some people seem to have merely copied Linus' entire tree, and that's
> what's causing problems.
> 
> That one you can't win.
> 

What I meant by that is that I think .git/repoid is the right thing: if the
file doesn't exist, a new ID file is generated.

If people are copying their repoid file explicitly it's up to them to 
know what they're doing.
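
A minimal sketch of that behaviour in C, not taken from the patch: the
function name load_or_create_repoid, the 32-character hex format and the
use of /dev/urandom are all assumptions made for illustration. An
existing file is reused as-is (so a copied file keeps its ID), and a
missing file causes a fresh ID to be generated and stored.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define REPOID_HEX_LEN 32	/* 16 random bytes as hex; the format is an assumption */

/*
 * Read the ID stored in `path`; if the file does not exist, generate a
 * random ID, store it, and return it. `out` needs REPOID_HEX_LEN + 1 bytes.
 * Returns 0 on success, -1 on error.
 */
int load_or_create_repoid(const char *path, char *out)
{
	unsigned char raw[REPOID_HEX_LEN / 2];
	size_t i;
	FILE *f;

	assert(path && out);

	f = fopen(path, "r");
	if (f) {
		/* The file exists: reuse its ID, even if the file was copied. */
		char *line = fgets(out, REPOID_HEX_LEN + 1, f);

		fclose(f);
		if (!line)
			return -1;
		out[strcspn(out, "\n")] = '\0';
		return strlen(out) == REPOID_HEX_LEN ? 0 : -1;
	}
	if (errno != ENOENT)
		return -1;

	/* No ID yet: draw random bytes and hex-encode them. */
	f = fopen("/dev/urandom", "rb");
	if (!f)
		return -1;
	i = fread(raw, 1, sizeof(raw), f);
	fclose(f);
	if (i != sizeof(raw))
		return -1;
	for (i = 0; i < sizeof(raw); i++)
		sprintf(out + 2 * i, "%02x", raw[i]);

	/* Store the new ID so later commits in this repository reuse it. */
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%s\n", out) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

Calling it twice with the same path, for example ".git/repoid", should
return the same ID both times. The sketch ignores the race between two
processes creating the file at once.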

	-hpa

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:33       ` Thomas Gleixner
@ 2005-05-12  1:46         ` Junio C Hamano
  2005-05-12  7:57           ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2005-05-12  1:46 UTC (permalink / raw)
  To: tglx; +Cc: H. Peter Anvin, git

>>>>> "TG" == Thomas Gleixner <tglx@linutronix.de> writes:

TG> So what alternatives do we have ?

How about doing nothing of this sort, that is, not introducing repo-id at
all?  I do not understand what problem repo-id is solving.

Earlier in your response to Sean <seanlkml@sympaticoca>, you
gave a QA department example.

TG> You have to track down a problem in bugfix and the source of it.
TG> It does not matter whether the maintainer of "bugfix" pulled it from
TG> devel or from stable. It's his fault anyway. 
TG> 
TG> But we are not talking about faults and guiltiness. We want
TG> to identify the location and the context _where_ and _why_
TG> this change was created.

Here is my understanding of the scenario you are describing.
Are these correct?

 - There is a problem in the source.

 - You know which lines of which file are causing the problem.
   But you cannot tell how the file got into that state and why
   by just looking at the problem revision.

 - You have the complete history (commit chain) leading to the
   revision.

 - You want to get some context to help you understand why those
   offending lines are there.

Assuming I am with you so far, I would like to know what kind of
information you are looking for ("some context to help you
understand").  Is a specific commit object (rather, one pair of
commits that is parent-child) that made those lines into the
current shape enough?

My understanding of Sean's argument is that finding such a
commit (or a commit-pair) is a good enough place to start
understanding why that change was introduced and finding who to
ask for help, and it does not matter in which repository the
change was introduced.  I tend to agree with him if that is what
is being discussed.

If the owner has multiple repositories and he needs to know in
which of his repositories the change was introduced, I assume he
would be able to run the same procedure the QA department ran
to find the problem commit on each of his repositories to find
such a commit, and commits around it (its ancestors and
descendants).  So a maintainer having more than one repository
does not seem to be an issue, either.

So I am having a hard time understanding what problem repo-id
solves.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  1:13           ` H. Peter Anvin
@ 2005-05-12  3:30             ` Joel Becker
  2005-05-12  9:17             ` Thomas Gleixner
  1 sibling, 0 replies; 74+ messages in thread
From: Joel Becker @ 2005-05-12  3:30 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, Dmitry Torokhov, git

On Wed, May 11, 2005 at 06:13:45PM -0700, H. Peter Anvin wrote:
> What I meant by that is that I think .git/repoid is the right thing: if the
> file doesn't exist, a new ID file is generated.

	Count me in the "what does repoid help?" camp.  If we create a
new UUID on each clone, imagine this typical usage:

	linux-2.6.git has repoid AAAAAA.
	I clone it locally, local-2.6-clean, repoid BBBBBB
	I clone the local one, local-2.6-working, repoid CCCCCC
	I work in the local one and commit my change.  commit abcd,
		repoid CCCCCC.
	I then rsync, copy, or clone that working repository to some
		place that Linus can pull from.
	I then throw away the copy with repoid CCCCCC, because I'm done
		with that temporary work area.
	lather, rinse, repeat.

	IOW, each of my changes, if I work like this, has a different
repoid.  And when a problem arises, the repoid tells us diddly.  I
thought one of the tenets of bk/git/codeville/whatever development is
that clone is the way to do any temporary area.  You work in a clone or
10, and then clean up for submission.  Which of the 10 clones the
repoid is associated with seems, well, unimportant.
	
Joel

-- 

Life's Little Instruction Book #99

	"Think big thoughts, but relish small pleasures."

Joel Becker
Senior Member of Technical Staff
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  1:46         ` Junio C Hamano
@ 2005-05-12  7:57           ` Thomas Gleixner
  2005-05-12  9:32             ` Sean
  2005-05-12 17:35             ` Junio C Hamano
  0 siblings, 2 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12  7:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: H. Peter Anvin, git

On Wed, 2005-05-11 at 18:46 -0700, Junio C Hamano wrote:
> >>>>> "TG" == Thomas Gleixner <tglx@linutronix.de> writes:
> So I am having a hard time understanding what problem repo-id
> solves.

Rn   o
     | \
Rn-1 o  |
     |  o Mn
     |  o Mn-1
Rn-2 o /
Rn-3 o

rev-tree shows you 

Rn
Rn-1
Mn
Mn-1
Rn-2
Rn-3

Which is wrong. 

After syncing M to Rn you see the same thing in M

Rn
Rn-1
Mn
Mn-1
Rn-2
Rn-3

which is also wrong. 

The correct display looking at R is

Rn
 Mn
 Mn-1
Rn-1
Rn-2
Rn-3

Looking from M it is

Rn
 Rn-1
 Rn-2
Mn
Mn-1
Rn-3
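
One possible reading of how a per-commit repoid could drive such a
display, sketched in C. Nothing here comes from the patch: struct commit,
the single-character repoid and show_history are hypothetical, and the
second commit made in M is taken to be Mn-1, as in the diagram. At a
merge, the parent whose repoid matches the repository being viewed is
treated as the local line; the other parent's commits are listed first,
indented, until they reach history the local line already contains.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct commit {
	const char *name;		/* label from the diagram, e.g. "Rn-1" */
	char repoid;			/* hypothetical ID recorded at commit time */
	const struct commit *parent[2];	/* parent[1] is set only for merges */
};

/* Return 1 if anc is c itself or reachable from c through parent links. */
static int is_ancestor(const struct commit *anc, const struct commit *c)
{
	if (!c)
		return 0;
	return c == anc || is_ancestor(anc, c->parent[0]) ||
	       is_ancestor(anc, c->parent[1]);
}

/* Append one line to out, indented by one space for foreign commits. */
static void emit(char *out, size_t len, int foreign, const char *name)
{
	size_t used = strlen(out);

	if (used < len)
		snprintf(out + used, len - used, "%s%s\n", foreign ? " " : "", name);
}

/* Write the history of head, as seen from repository `view`, into out. */
void show_history(const struct commit *head, char view, char *out, size_t len)
{
	const struct commit *c = head;

	assert(out && len > 0);
	out[0] = '\0';
	while (c) {
		emit(out, len, 0, c->name);
		if (c->parent[1]) {
			/* Merge: pick the parent committed in the viewed repository. */
			int local = c->parent[1]->repoid == view;
			const struct commit *mine = c->parent[local];
			const struct commit *theirs = c->parent[1 - local];

			/* List the merged-in side until it rejoins local history. */
			for (; theirs && !is_ancestor(theirs, mine);
			     theirs = theirs->parent[0])
				emit(out, len, 1, theirs->name);
			c = mine;
		} else {
			c = c->parent[0];
		}
	}
}
```

For the diagram's graph, with the Rn commits tagged 'R' and the Mn
commits tagged 'M', viewing from 'R' lists Rn, then Mn and Mn-1
indented, then Rn-1, Rn-2, Rn-3; viewing from 'M' lists Rn, then Rn-1
and Rn-2 indented, then Mn, Mn-1, Rn-3. The sketch follows only first
parents on the merged-in side, so nested merges are not handled.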

tglx








^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  1:13           ` H. Peter Anvin
  2005-05-12  3:30             ` Joel Becker
@ 2005-05-12  9:17             ` Thomas Gleixner
  1 sibling, 0 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12  9:17 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Dmitry Torokhov, git

On Wed, 2005-05-11 at 18:13 -0700, H. Peter Anvin wrote:
> > You shouldn't be rsyncing the .git directory, only .git/objects anyway.
> > Some people seem to have merely copied Linus' entire tree, and that's
> > what's causing problems.
> > That one you can't win.

:)

> What I meant by that is that I think .git/repoid is the right thing: if the
> file doesn't exist, a new ID file is generated.

Yep, convinced. 
The only thing I'd like to see is something which is human readable and
maybe helpful to deduce the context of this.
Adding a /dev/random number to make it unique is not bad.

So what about
repoid 'pwd' 'random' ?

> If people are copying their repoid file explicitly it's up to them to 
> know what they're doing.

True. It makes sense for maintainers doing updates to their public
repositories to keep the same repoid in their working copy at home/work.

tglx




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  7:57           ` Thomas Gleixner
@ 2005-05-12  9:32             ` Sean
  2005-05-12  9:39               ` Thomas Gleixner
  2005-05-12 17:35             ` Junio C Hamano
  1 sibling, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12  9:32 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

[-- Attachment #1: Type: text/plain, Size: 1962 bytes --]

On Thu, May 12, 2005 3:57 am, Thomas Gleixner said:
> On Wed, 2005-05-11 at 18:46 -0700, Junio C Hamano wrote:
>> >>>>> "TG" == Thomas Gleixner <tglx@linutronix.de> writes:
>> So I am having a hard time understanding what problem repo-id
>> solves.
>
> Rn   o
>      | \
> Rn-1 o  |
>      |  o Mn
>      |  o Mn-1
> Rn-2 o /
> Rn-3 o
>
[snip]

All you forgot was to explain how repo-id helps one iota.   And if you're
up to it, explain how it would help sort out the following, where Xn is a
fast forward head:

Rn   o
     | \
Rn-1 o  |
     |  o Mn
     |  o Mn-1
Rn-2 o /
Xn   o

And what about sorting out branches created by a single developer in a
single repository? doh!   Sounds like a solution that addresses all these
should be worked out instead of repoid.   You really are barking up the
wrong tree here.

Just because rev-tree may get it wrong, doesn't mean every other tool does.
Actually, I just ran your above scenario through git, and here is what
cg-log shows, which seems perfectly acceptable:

commit 19de0d5cd9269f0869fecb0b866efa12ef882a11
parent 490ae38bcbf70fe19bcc0c1a28d1fa301620a2d5
parent 71890fc6b9e3da470623dbbf3dc492b937757a37
    Merge with ../test2/.git
    Rn

commit 490ae38bcbf70fe19bcc0c1a28d1fa301620a2d5
parent b96d60d7b632a188f4550762f8a1a99f8b381c9b
    Rn-1

commit b96d60d7b632a188f4550762f8a1a99f8b381c9b
parent 0dbe9da9b565bb695d464532470734c6f4676951
    Rn-2

commit 71890fc6b9e3da470623dbbf3dc492b937757a37
parent 81dbf4bac14c3caeadfa084d57ad78544e69d6d8
    Mn

commit 81dbf4bac14c3caeadfa084d57ad78544e69d6d8
parent 0dbe9da9b565bb695d464532470734c6f4676951
    Mn-1

commit 0dbe9da9b565bb695d464532470734c6f4676951
parent 221a1474b35d700fd67895cb6206d04fc17b083a
    Rn-3

In fact, please see attached .png image that shows how the nice gitk tool
from Paul Mackerras displays it exactly as you request WITHOUT Repo-id.

* Please * explain what problem you are trying to solve and how repoid
will solve it.

Sean

[-- Attachment #2: what_problem.png --]
[-- Type: image/png, Size: 2188 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  9:32             ` Sean
@ 2005-05-12  9:39               ` Thomas Gleixner
  2005-05-12  9:46                 ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12  9:39 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 05:32 -0400, Sean wrote:
> In fact, please see attached .png image that shows how the nice gitk tool
> from Paul Mackerras displays it exactly as you request WITHOUT Repo-id.

Please do the complete test. Sync test2 with test1 and show me the
picture there. It will be the same as you see in test1, which is wrong

> * Please * explain what problem you are trying to solve and how repoid
> will solve it.

Having the repository id in there you can identify the different order
of test2

tglx



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  9:39               ` Thomas Gleixner
@ 2005-05-12  9:46                 ` Sean
  2005-05-12 11:18                   ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12  9:46 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 5:39 am, Thomas Gleixner said:

> Please do the complete test. Sync test2 with test1 and show me the
> picture there. It will be the same as you see in test1, which is wrong

It will get the fast forward head from test1, and so it _should_ show the
exact same thing!  The repositories are in sync, they should display the
exact same way.  What is the problem?

> Having the repository id in there you can identify the different order
> of test2
>

What different order?   Everything I want as a developer or even as a QA
department is right there in front of me.    What VALUE does some other
order have?   What question will you answer with a different order?   Who
will ask this question?  Why would anyone care?

Sean

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  0:58                         ` Sean
@ 2005-05-12 10:07                           ` David Woodhouse
  2005-05-12 10:18                             ` Sean
  2005-05-12 10:39                             ` Sean
  0 siblings, 2 replies; 74+ messages in thread
From: David Woodhouse @ 2005-05-12 10:07 UTC (permalink / raw)
  To: Sean; +Cc: tglx, git

On Wed, 2005-05-11 at 20:58 -0400, Sean wrote:
> > Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in 
> > correct order, using the available tools.
> >
> > Come back to me when you are done.
> 
> Ask me any question that matters and I'll answer it with available
> tools.

The above question matters, so please answer it if you can. I'll make it
clearer for you though...

By 'correct order' Thomas means the order in which my old BK-export
script used to generate the "changesets since last release" web page;
the order in which the changes actually got merged into Linus'
repository.

If I looked at the page yesterday, and then I look at it again today, I
want all the commits I hadn't seen already to be at the _top_.
Regardless of the date on which they were _originally_ committed to some
private tree elsewhere.

There were a lot of complaints until I worked out how to get that
ordering out of BitKeeper.

-- 
dwmw2

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 10:07                           ` David Woodhouse
@ 2005-05-12 10:18                             ` Sean
  2005-05-12 10:42                               ` Thomas Gleixner
  2005-05-12 10:43                               ` David Woodhouse
  2005-05-12 10:39                             ` Sean
  1 sibling, 2 replies; 74+ messages in thread
From: Sean @ 2005-05-12 10:18 UTC (permalink / raw)
  To: David Woodhouse; +Cc: tglx, git

On Thu, May 12, 2005 6:07 am, David Woodhouse said:
> On Wed, 2005-05-11 at 20:58 -0400, Sean wrote:
>> > Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in
>> > correct order, using the available tools.
>> >
>> > Come back to me when you are done.
>>
>> Ask me any question that matters and I'll answer it with available
>> tools.
>
> The above question matters, so please answer it if you can. I'll make it
> clearer for you though...
>
> By 'correct order' Thomas means the order in which my old BK-export
> script used to generate the "changesets since last release" web page;
> the order in which the changes actually got merged into Linus'
> repository.
>
> If I looked at the page yesterday, and then I look at it again today, I
> want all the commits I hadn't seen already to be at the _top_.
> Regardless of the date on which they were _originally_ committed to some
> private tree elsewhere.
>
> There were a lot of complaints until I worked out how to get that
> ordering out of BitKeeper.
>

Does BK use a repo ID ?  If not, can you not apply the same process to
git?   Seems the fast forward heads might complicate things slightly....

Sean




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 10:07                           ` David Woodhouse
  2005-05-12 10:18                             ` Sean
@ 2005-05-12 10:39                             ` Sean
  1 sibling, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12 10:39 UTC (permalink / raw)
  To: David Woodhouse; +Cc: tglx, git

On Thu, May 12, 2005 6:07 am, David Woodhouse said:
> On Wed, 2005-05-11 at 20:58 -0400, Sean wrote:
>> > Try to find out the history of kernel.org/.../dwmw2/audit-2.6 in
>> > correct order, using the available tools.
>> >
>> > Come back to me when you are done.
>>
>> Ask me any question that matters and I'll answer it with available
>> tools.
>
> The above question matters, so please answer it if you can. I'll make it
> clearer for you though...
>
> By 'correct order' Thomas means the order in which my old BK-export
> script used to generate the "changesets since last release" web page;
> the order in which the changes actually got merged into Linus'
> repository.
>
> If I looked at the page yesterday, and then I look at it again today, I
> want all the commits I hadn't seen already to be at the _top_.
> Regardless of the date on which they were _originally_ committed to some
> private tree elsewhere.
>
> There were a lot of complaints until I worked out how to get that
> ordering out of BitKeeper.
>

Actually, here is one very simple idea, just use the times from the object
files themselves.  Now as you descend the hierarchy you can simply stat
the object file to get the "local commit time".  Just simply stop
descending down each branch when you find a commit with a time stamp that
is outside the range you're interested in.
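
A rough C sketch of the stat() idea, assuming loose objects live at
<objdir>/<first two hex digits>/<remaining 38> and that nothing has
touched the file since it arrived. The function names are made up for
illustration; should_descend is the cutoff test for one commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Build the path of a loose object file.
 * Returns 0 on success, -1 if the name is not 40 characters long or
 * buf is too small.
 */
int object_path(const char *objdir, const char *sha1_hex, char *buf, size_t len)
{
	int n;

	assert(objdir && sha1_hex && buf);
	if (strlen(sha1_hex) != 40)
		return -1;
	n = snprintf(buf, len, "%s/%.2s/%s", objdir, sha1_hex, sha1_hex + 2);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Store the object file's mtime in *when. This is only the "local commit
 * time" if the file was written when the object arrived in this repository.
 */
int local_commit_time(const char *objdir, const char *sha1_hex, time_t *when)
{
	char path[4096];
	struct stat st;

	if (object_path(objdir, sha1_hex, path, sizeof(path)) < 0)
		return -1;
	if (stat(path, &st) < 0)
		return -1;
	*when = st.st_mtime;
	return 0;
}

/* Keep descending only while the commit's file is at least as new as `since`. */
int should_descend(const char *objdir, const char *sha1_hex, time_t since)
{
	time_t when;

	return local_commit_time(objdir, sha1_hex, &when) == 0 && when >= since;
}
```

A history walker would call should_descend(".git/objects", id, since) on
each commit and skip the parents of any commit for which it returns 0.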

Sean



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 10:18                             ` Sean
@ 2005-05-12 10:42                               ` Thomas Gleixner
  2005-05-12 10:43                               ` David Woodhouse
  1 sibling, 0 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 10:42 UTC (permalink / raw)
  To: Sean; +Cc: David Woodhouse, git

On Thu, 2005-05-12 at 06:18 -0400, Sean wrote:
> Does BK use a repo ID ?  

Yes.




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 10:18                             ` Sean
  2005-05-12 10:42                               ` Thomas Gleixner
@ 2005-05-12 10:43                               ` David Woodhouse
  2005-05-12 10:58                                 ` Sean
  1 sibling, 1 reply; 74+ messages in thread
From: David Woodhouse @ 2005-05-12 10:43 UTC (permalink / raw)
  To: Sean; +Cc: tglx, git

On Thu, 2005-05-12 at 06:18 -0400, Sean wrote:
> Does BK use a repo ID ?  If not, can you not apply the same process to
> git?   Seems the fast forward heads might complicate things
> slightly....

BK doesn't fast-forward in quite the same way as git does. But we're not
really supposed to be paying too much attention to how BK works.

Your claim is that you can do this with existing git tools. I await that
demonstration.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 10:43                               ` David Woodhouse
@ 2005-05-12 10:58                                 ` Sean
  0 siblings, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12 10:58 UTC (permalink / raw)
  To: David Woodhouse; +Cc: tglx, git

On Thu, May 12, 2005 6:43 am, David Woodhouse said:
> On Thu, 2005-05-12 at 06:18 -0400, Sean wrote:
>> Does BK use a repo ID ?  If not, can you not apply the same process to
>> git?   Seems the fast forward heads might complicate things
>> slightly....
>
> BK doesn't fast-forward in quite the same way as git does. But we're not
> really supposed to be paying too much attention to how BK works.

lol, I just asked because you brought it up.

> Your claim is that you can do this with existing git tools. I await that
> demonstration.

Well I'm not going to write the code for you, but simply descend the
history ordered by local commit time as given by the file object and
you're done.
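
The ordering step itself could look roughly like this in C, assuming the
local time of each commit has already been collected somehow; struct
seen_commit and order_by_local_time are hypothetical names, not existing
git code.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

struct seen_commit {
	const char *id;		/* commit name */
	time_t local_time;	/* when the object appeared locally (assumed known) */
};

/* qsort comparator: newest local time first. */
static int newest_first(const void *a, const void *b)
{
	const struct seen_commit *x = a, *y = b;

	if (x->local_time != y->local_time)
		return x->local_time < y->local_time ? 1 : -1;
	return 0;
}

/* Reorder list so that the most recently arrived commits come first. */
void order_by_local_time(struct seen_commit *list, size_t n)
{
	assert(list || n == 0);
	if (n)
		qsort(list, n, sizeof(*list), newest_first);
}
```

With this ordering, commits that arrived through a recent merge sort
above older local commits regardless of the dates recorded inside them.
Commits with identical local times come out in unspecified order, since
qsort is not stable.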

Sean.



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  9:46                 ` Sean
@ 2005-05-12 11:18                   ` Thomas Gleixner
  2005-05-12 11:24                     ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 11:18 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 05:46 -0400, Sean wrote:
> On Thu, May 12, 2005 5:39 am, Thomas Gleixner said:
> 
> > Please do the complete test. Sync test2 with test1 and show me the
> > picture there. It will be the same as you see in test1, which is wrong
> 
> It will get the fast forward head from test1, and so it _should_ show the
> exact same thing!  The repositories are in sync, they should display the
> exact same way.  What is the problem?

What you see is a clone and not a sync / merge. 

tglx



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 11:18                   ` Thomas Gleixner
@ 2005-05-12 11:24                     ` Sean
  2005-05-12 11:43                       ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12 11:24 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 7:18 am, Thomas Gleixner said:
> On Thu, 2005-05-12 at 05:46 -0400, Sean wrote:
>> On Thu, May 12, 2005 5:39 am, Thomas Gleixner said:
>>
>> > Please do the complete test. Sync test2 with test1 and show me the
>> > picture there. It will be the same as you see in test1, which is wrong
>>
>> It will get the fast forward head from test1, and so it _should_ show
>> the exact same thing!  The repositories are in sync, they should display
>> the exact same way.  What is the problem?
>
> What you see is a clone and not a sync / merge.
>

Right, that's what a fast forward head is.  It replaces a sync / merge and
the trees become exactly synchronized via a shared head.   I have mixed
feelings about fast forward heads, but they don't hide _too_ much
information.  Is there any _useful_ question you can ask where the answer
is lost for all time because of this?

Sean



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 11:24                     ` Sean
@ 2005-05-12 11:43                       ` Thomas Gleixner
  2005-05-12 11:48                         ` Sean
  2005-05-12 13:29                         ` Jan Harkes
  0 siblings, 2 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 11:43 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 07:24 -0400, Sean wrote:
> Right, that's what a fast forward head is.  It replaces a sync / merge and
> the trees become exactly synchronized via a shared head.   I have mixed
> feelings about fast forward heads, but they don't hide _too_ much
> information.  

The question is how hard it is to do a reconstruction. In the current
state automatic reconstruction is simply not possible. 

> Is there any _useful_ question you can ask where the answer
> is lost for all time because of this?

I want to see the history of _any_ repository in the order of changes
in the specific repository. The fast forward heads without additional
information simply do not allow this. 

I want to see the history of a file in the correct order. The current
solution ends up with useless file version diffs or annotates where
changes are shown in random order and therefore worthless.

tglx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 11:43                       ` Thomas Gleixner
@ 2005-05-12 11:48                         ` Sean
  2005-05-12 12:16                           ` Thomas Gleixner
  2005-05-12 12:29                           ` David Woodhouse
  2005-05-12 13:29                         ` Jan Harkes
  1 sibling, 2 replies; 74+ messages in thread
From: Sean @ 2005-05-12 11:48 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 7:43 am, Thomas Gleixner said:
> On Thu, 2005-05-12 at 07:24 -0400, Sean wrote:
>> Right, that's what a fast forward head is.  It replaces a sync / merge
>> and the trees become exactly synchronized via a shared head.   I have
>> mixed feelings about fast forward heads, but they don't hide _too_ much
>> information.
>
> The question is how hard it is to do a reconstruction. In the current
> state automatic reconstruction is simply not possible.

You keep evading the question.  What are you reconstructing, and why?
What questions can you then answer with your reconstruction that you can't
answer with what we already have today?   You HAVE to explain what the
VALUE of the end result is beyond what we already have today.

>> Is there any _useful_ question you can ask where the answer
>> is lost for all time because of this?
>
> I want to see the history of _any_ repository in the order of changes
> in the specific repository. The fast forward heads without additional
> information simply do not allow this.

Then just download their repository with the -t switch of rsync or its
equivalent and preserve the timestamps on the files as they exist in the remote
repository.

> I want to see the history of a file in the correct order. The current
> solution ends up with useless file version diffs or annotates where
> changes are shown in random order and therefore worthless.
>

What is this correct order you're talking about?   The order is _given_
explicitly in the parent child relationships.  There is no other order of
any value, at least none you've been able to put forth.

Sean



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 11:48                         ` Sean
@ 2005-05-12 12:16                           ` Thomas Gleixner
  2005-05-12 12:16                             ` Sean
  2005-05-12 12:17                             ` Sean
  2005-05-12 12:29                           ` David Woodhouse
  1 sibling, 2 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 12:16 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 07:48 -0400, Sean wrote:

> Then just download their repository with the -t switch of rsync or its
> equivalent and preserve the timestamps on the files as they exist in the remote
> repository.

That's really the brightest idea since the invention of sliced bread.

tglx



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 12:16                           ` Thomas Gleixner
@ 2005-05-12 12:16                             ` Sean
  2005-05-12 12:34                               ` Thomas Gleixner
  2005-05-12 12:17                             ` Sean
  1 sibling, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12 12:16 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 8:16 am, Thomas Gleixner said:
> On Thu, 2005-05-12 at 07:48 -0400, Sean wrote:
>
>> Then just download their repository with the -t switch of rsync or its
>> equivalent and preserve the timestamps on the files as they exist in the
>> remote repository.
>
> That's really the brightest idea since the invention of sliced bread.
>

That's pretty smug for someone who can't even formulate a problem statement.

Sean



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 12:16                           ` Thomas Gleixner
  2005-05-12 12:16                             ` Sean
@ 2005-05-12 12:17                             ` Sean
  1 sibling, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12 12:17 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 8:16 am, Thomas Gleixner said:
> On Thu, 2005-05-12 at 07:48 -0400, Sean wrote:
>
>> Then just download their repository with the -t switch of rsync or its
>> equivalent and preserve the timestamps on the files as they exist in the
>> remote repository.
>
> That's really the brightest idea since the invention of sliced bread.
>

By the way, you have to download the object ANYWAY.

Sean




^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 11:48                         ` Sean
  2005-05-12 12:16                           ` Thomas Gleixner
@ 2005-05-12 12:29                           ` David Woodhouse
  2005-05-12 12:32                             ` Sean
  1 sibling, 1 reply; 74+ messages in thread
From: David Woodhouse @ 2005-05-12 12:29 UTC (permalink / raw)
  To: Sean; +Cc: tglx, Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 07:48 -0400, Sean wrote:
> What is this correct order you're talking about?   The order is _given_
> explicitly in the parent child relationships.  There is no other order of
> any value, at least none you've been able to put forth.

Now you're just being silly. You _replied_ to a message in which it was
stated perfectly coherently. Even you appeared to understand the
explanation at that point.

*plonk*

-- 
dwmw2


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 12:29                           ` David Woodhouse
@ 2005-05-12 12:32                             ` Sean
  0 siblings, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12 12:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: tglx, Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 8:29 am, David Woodhouse said:
> On Thu, 2005-05-12 at 07:48 -0400, Sean wrote:
>> What is this correct order you're talking about?   The order is _given_
>> explicitly in the parent child relationships.  There is no other order
>> of any value, at least none you've been able to put forth.
>
> Now you're just being silly. You _replied_ to a message in which it was
> stated perfectly coherently. Even you appeared to understand the
> explanation at that point.
>

David,

I gave you a solution to your problem, what is the issue that remains?

Sean



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 12:16                             ` Sean
@ 2005-05-12 12:34                               ` Thomas Gleixner
  2005-05-12 12:35                                 ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 12:34 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 08:16 -0400, Sean wrote:
> On Thu, May 12, 2005 8:16 am, Thomas Gleixner said:
> > On Thu, 2005-05-12 at 07:48 -0400, Sean wrote:
> >
> >> Then just download their repository with the -t switch of rsync or its
> >> equivalent and preserve the timestamps on the files as they exist in the
> >> remote repository.
> >
> > That's really the brightest idea since the invention of sliced bread.
> >
> 
> That's pretty smug for someone who can't even formulate a problem statement.

I explained it several times and you refuse to understand it. 

I accept and understand your POV that it does not matter for you. But
that's not really a good reason to refuse others information which can be
added easily and without harming you.

tglx



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 12:34                               ` Thomas Gleixner
@ 2005-05-12 12:35                                 ` Sean
  0 siblings, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12 12:35 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 8:34 am, Thomas Gleixner said:

> I explained it several times and you refuse to understand it.
>
> I accept and understand your POV that it does not matter for you. But
> that's not really a good reason to refuse others information which can be
> added easily and without harming you.

And you have a perfectly workable solution handed to you that doesn't
require any change whatsoever.

Sean



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 11:43                       ` Thomas Gleixner
  2005-05-12 11:48                         ` Sean
@ 2005-05-12 13:29                         ` Jan Harkes
  2005-05-12 15:44                           ` Jon Seymour
  1 sibling, 1 reply; 74+ messages in thread
From: Jan Harkes @ 2005-05-12 13:29 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sean, Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 at 01:43:50PM +0200, Thomas Gleixner wrote:
> > Is there any _useful_ question you can ask where the answer
> > is lost for all time because of this.
> 
> I want to see the history of _any_ repository in the order of  changes
> in the specific repository. The fast forward heads without additional
> information simply do not allow this. 

But you can't add additional information to the fast-forward head. That
would defeat the whole point of the fast-forward.

> I want to see the history of a file in the correct order. The current
> solution ends up with useless file version diffs or annotates where
> changes are shown in random order and therefore worthless.

Not random order, those changes were performed in parallel, so there is
no order between them until they are merged, at which point the parent
linkage defines the order. If you want to add a total ordering to them,
write out a file with 'commit-id parent-id' pairs and run it through
'tsort'.
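
A minimal sketch of that suggestion, assuming Python 3.9+ (for the
standard graphlib module) and a hypothetical six-commit history. The
commit names and the total_order helper are illustrative only and are
not part of git. It emits every commit before its parents, which is
what tsort(1) does for 'commit-id parent-id' input pairs.

```python
from graphlib import TopologicalSorter

# Hypothetical 'commit-id parent-id' pairs: Rn merges Rn-1 and Mn,
# and both lines of development fork from Rn-2.
PAIRS = [
    ("Rn", "Rn-1"), ("Rn", "Mn"),
    ("Rn-1", "Rn-2"),
    ("Mn", "Mn-1"), ("Mn-1", "Rn-2"),
    ("Rn-2", "Rn-3"),
]


def total_order(pairs):
    """Return one total order in which each commit precedes its parents."""
    sorter = TopologicalSorter()
    for commit, parent in pairs:
        # add(node, *predecessors): predecessors are emitted before node,
        # so registering the commit as a predecessor of its parent puts
        # children first, as tsort(1) does for "child parent" input.
        sorter.add(parent, commit)
    return list(sorter.static_order())


if __name__ == "__main__":
    print("\n".join(total_order(PAIRS)))
```

Any order this returns is a valid one; the parallel R and M commits may
come out interleaved, since nothing in the graph orders them relative
to each other.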

Your examples break if you consider additional merges where M syncs up a
couple of times (f.i. at Rn-2) before M is merged back into R.

What you seem to want won't be fixed by adding a repoid, you need to
keep a list of all the commits you have already seen and append any new
ones whenever you look at the history. If you look whenever you pull or
merge the list will be in the total ordering that you seem to expect for
your repository. But that is a porcelain thing.

Jan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 13:29                         ` Jan Harkes
@ 2005-05-12 15:44                           ` Jon Seymour
  2005-05-12 15:48                             ` Jon Seymour
  0 siblings, 1 reply; 74+ messages in thread
From: Jon Seymour @ 2005-05-12 15:44 UTC (permalink / raw)
  To: Thomas Gleixner, Sean, Junio C Hamano, H. Peter Anvin, git

On 5/12/05, Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> On Thu, May 12, 2005 at 01:43:50PM +0200, Thomas Gleixner wrote:
> ....
> Your examples break if you consider additional merges where M syncs up a
> couple of times (f.i. at Rn-2) before M is merged back into R.
> 
> What you seem to want won't be fixed by adding a repoid, you need to
> keep a list of all the commits you have already seen and append any new
> ones whenever you look at the history. If you look whenever you pull or
> merge the list will be in the total ordering that you seem to expect for
> your repository. But that is a porcelain thing.
> 
> Jan

If committers always follow the convention that their previous local
commit is nominated as the first (local) parent in the commit and
commits from foreign repositories are listed after the first parent,
can the chain of "local" parents be an effective proxy for repoid?

Consider first a graph where there are no more than 2 parents in a merge

Ln
|    \
Ln-1  Fn
|     |
Ln-2  Fn-1
|    /
Ln-3

Thomas would like to sort this as:

Ln
Fn
Fn-1
Ln-1
Ln-2
Ln-3

So, use this algorithm:

1. Merge result comes first.
2. For each foreign parent:
    - sort the graph between the foreign parent and the merge base
      according to this algorithm, using the foreign parent as the
      starting point. Append the result to the list.
3. Append the merge base to the list.

Admittedly the order of the foreign parents in an N-way merge is somewhat
arbitrary, but a committer could probably make a choice that "works" in
most cases by specifying the foreign parents in a "sensible" order.

Of course, this relies on a committer always nominating the local
parent first, but that wouldn't be hard to enforce in the porcelain
layer.

jon.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 15:44                           ` Jon Seymour
@ 2005-05-12 15:48                             ` Jon Seymour
  2005-05-12 15:50                               ` Jon Seymour
  0 siblings, 1 reply; 74+ messages in thread
From: Jon Seymour @ 2005-05-12 15:48 UTC (permalink / raw)
  To: Git Mailing List

| small clarification to algorithm, removed editing work area 

On 5/12/05, Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> On Thu, May 12, 2005 at 01:43:50PM +0200, Thomas Gleixner wrote:
> ....
> Your examples break if you consider additional merges where M syncs up a
> couple of times (f.i. at Rn-2) before M is merged back into R.
>
> What you seem to want won't be fixed by adding a repoid, you need to
> keep a list of all the commits you have already seen and append any new
> ones whenever you look at the history. If you look whenever you pull or
> merge the list will be in the total ordering that you seem to expect for
> your repository. But that is a porcelain thing.
>
> Jan

If committers always follow the convention that their previous local
commit is nominated as the first (local) parent in the commit and
commits from foreign repositories are listed after the first parent,
can the chain of "local" parents be an effective proxy for repoid?

Consider first a graph where there are no more than 2 parents in a merge

Ln
|    \
Ln-1  Fn
|     |
Ln-2  Fn-1
|    /
Ln-3

Thomas would like to sort this as:

Ln
Fn
Fn-1
Ln-1
Ln-2
Ln-3

So, use this algorithm:

1. Merge result comes first.
2. For each foreign parent:
    - sort the graph between the foreign parent and the merge base
      (not including merge base) according to this algorithm, using
      the foreign parent as the starting point. Append the result to
      the list.
3. Append the merge base to the list.

Admittedly the order of the foreign parents in an N-way merge is somewhat
arbitrary, but a committer could probably make a choice that "works" in
most cases by specifying the foreign parents in a "sensible" order.

Of course, this relies on a committer always nominating the local
parent first, but that wouldn't be hard to enforce in the porcelain
layer.

jon.
-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 15:48                             ` Jon Seymour
@ 2005-05-12 15:50                               ` Jon Seymour
  2005-05-12 16:20                                 ` Jan Harkes
  0 siblings, 1 reply; 74+ messages in thread
From: Jon Seymour @ 2005-05-12 15:50 UTC (permalink / raw)
  To: Git Mailing List

|| oops - fix to algorithm, sorry guys
| small clarification to algorithm, removed editing work area

On 5/12/05, Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> On Thu, May 12, 2005 at 01:43:50PM +0200, Thomas Gleixner wrote:
> ....
> Your examples break if you consider additional merges where M syncs up a
> couple of times (f.i. at Rn-2) before M is merged back into R.
>
> What you seem to want won't be fixed by adding a repoid, you need to
> keep a list of all the commits you have already seen and append any new
> ones whenever you look at the history. If you look whenever you pull or
> merge the list will be in the total ordering that you seem to expect for
> your repository. But that is a porcelain thing.
>
> Jan

If committers always follow the convention that their previous local
commit is nominated as the first (local) parent in the commit and
commits from foreign repositories are listed after the first parent,
can the chain of "local" parents be an effective proxy for repoid?

Consider first a graph where there are no more than 2 parents in a merge

Ln
|    \
Ln-1  Fn
|     |
Ln-2  Fn-1
|    /
Ln-3

Thomas would like to sort this as:

Ln
Fn
Fn-1
Ln-1
Ln-2
Ln-3

So, use this algorithm:

1. Merge result comes first.
2. For each foreign parent:
    - sort the graph between the foreign parent and the merge base
      (not including merge base) according to this algorithm. Append
      the result to the list.
3. Sort the graph between the local parent and the merge base
   (including merge base) according to this algorithm. Append the
   result to the list.
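
A rough sketch of these three steps in Python, under the stated
convention that the first parent of every commit is the local one.
The names sort_history, ancestors and GRAPH are hypothetical, and the
"merge base" is approximated as everything already reachable from the
local parent. It is recursive and recomputes ancestor sets, so it is
only meant for small illustrative graphs.

```python
def ancestors(parents, start):
    """Return every commit reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def sort_history(parents, head, stop=frozenset()):
    """Order head's history: merge result, foreign sides, then local side.

    parents maps a commit to its parent list; the FIRST parent is assumed
    to be the local one. Commits in stop (merge base and older) are skipped.
    """
    if head in stop:
        return []
    result = [head]                      # step 1: merge result comes first
    plist = parents.get(head, [])
    if not plist:
        return result
    local, foreign = plist[0], plist[1:]
    # Everything reachable from the local parent counts as "merge base
    # or older" from the point of view of the foreign parents.
    seen = ancestors(parents, local) | set(stop)
    for parent in foreign:               # step 2: foreign sides, base excluded
        part = sort_history(parents, parent, frozenset(seen))
        result.extend(part)
        seen.update(part)
    # step 3: the local side, down to and including the merge base
    result.extend(sort_history(parents, local, stop))
    return result


# The two-parent graph from the diagram above (local parent listed first).
GRAPH = {
    "Ln": ["Ln-1", "Fn"],
    "Ln-1": ["Ln-2"],
    "Ln-2": ["Ln-3"],
    "Fn": ["Fn-1"],
    "Fn-1": ["Ln-3"],
    "Ln-3": [],
}

if __name__ == "__main__":
    print(sort_history(GRAPH, "Ln"))
```

For this graph the sketch yields Ln, Fn, Fn-1, Ln-1, Ln-2, Ln-3. The
result depends entirely on the parent order stored in each commit, so
it is only as reliable as the "local parent first" convention.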

Admittedly the order of the foreign parents in an N-way merge is somewhat
arbitrary, but a committer could probably make a choice that "works" in
most cases by specifying the foreign parents in a "sensible" order.

Of course, this relies on a committer always nominating the local
parent first, but that wouldn't be hard to enforce in the porcelain
layer.

jon.
-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 15:50                               ` Jon Seymour
@ 2005-05-12 16:20                                 ` Jan Harkes
  2005-05-12 17:09                                   ` Jon Seymour
  0 siblings, 1 reply; 74+ messages in thread
From: Jan Harkes @ 2005-05-12 16:20 UTC (permalink / raw)
  To: jon; +Cc: Git Mailing List

On Fri, May 13, 2005 at 01:50:50AM +1000, Jon Seymour wrote:
> On 5/12/05, Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> > On Thu, May 12, 2005 at 01:43:50PM +0200, Thomas Gleixner wrote:
> > ....
> > Your examples break if you consider additional merges where M syncs up a
> > couple of times (f.i. at Rn-2) before M is merged back into R.
...
> If committers always follow the convention that their previous local
> commit is nominated as the first (local) parent in the commit and
> commits from foreign repositories are listed after the first parent,
> can the chain of "local" parents be an effective proxy for repoid?
> 
> Consider first a graph where there are no more than 2 parents in a merge
> 
> Ln
> |    \
> Ln-1  Fn
> |     |
> Ln-2  Fn-1
> |    /
> Ln-3

It breaks when Fn was a pull from Ln-1, and Ln was a fast-forward to Fn.
Now the first parent is going to be Fn-1 and the history of the local
repository after the fast forward warps to

    Fn (== Ln)
    Ln-1
    Ln-2
    Fn-1
    Ln-3

And adding repoids doesn't help a bit. However if the local repo kept a
history of what the user has seen previously, it can be linearized
consistently. The history file would contain Ln-3...Ln-1 before the
fast-forward and would add Fn-1,Fn. We would end up with a history that
looks like,

    Fn (== Ln)
    Fn-1
    Ln-1
    Ln-2
    Ln-3

Which I believe is exactly what Thomas wants to see in this case. I
don't see how repoid's can be useful for this. It is a porcelain thing
where you need to track what you have seen before. Anything else doesn't
matter because most permutations of the history are perfectly valid
since the Fn and Ln changes in reality occurred in parallel and as a
result can be arbitrarily interleaved.
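
A minimal sketch of such a history file, assuming a hypothetical
per-repository "seen log" kept as a Python list (a porcelain would
presumably persist it to a file). record_new_commits and PARENTS are
illustrative names, not existing git functionality. Each call appends
the commits that are new to this repository, parents before children,
so reading the log backwards gives the locally observed order.

```python
def record_new_commits(seen_log, parents, head):
    """Append commits reachable from head that the log has not seen yet.

    New commits are appended parents-first (iterative post-order walk),
    so earlier entries are never reordered by later pulls.
    """
    known = set(seen_log)
    stack = [(head, False)]
    while stack:
        commit, expanded = stack.pop()
        if commit in known:
            continue
        if expanded:
            # All parents have been handled; record the commit itself.
            known.add(commit)
            seen_log.append(commit)
        else:
            stack.append((commit, True))
            for parent in parents.get(commit, []):
                stack.append((parent, False))
    return seen_log


# Fast-forward case from this message: Fn merged Ln-1 into F's line,
# so Fn lists Fn-1 as its first parent, and L then fast-forwards to Fn.
PARENTS = {
    "Ln-3": [],
    "Ln-2": ["Ln-3"],
    "Ln-1": ["Ln-2"],
    "Fn-1": ["Ln-3"],
    "Fn": ["Fn-1", "Ln-1"],
}

if __name__ == "__main__":
    log = record_new_commits([], PARENTS, "Ln-1")   # before the fast-forward
    record_new_commits(log, PARENTS, "Fn")          # after the fast-forward
    print(list(reversed(log)))                      # newest first
```

Read newest first, the log gives Fn, Fn-1, Ln-1, Ln-2, Ln-3 in L's
repository, independent of the parent order stored in Fn.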

In fact anyone else who branched at Ln-3 and merges again at Ln doesn't
really care in what order changes in the F and L branches occurred, only
that all modifications are included.

Jan

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 16:20                                 ` Jan Harkes
@ 2005-05-12 17:09                                   ` Jon Seymour
  2005-05-12 17:12                                     ` Jon Seymour
  0 siblings, 1 reply; 74+ messages in thread
From: Jon Seymour @ 2005-05-12 17:09 UTC (permalink / raw)
  To: Git Mailing List

On 5/13/05, Jan Harkes <jaharkes@cs.cmu.edu> wrote:
> >
> > Ln
> > |    \
> > Ln-1  Fn
> > |     |
> > Ln-2  Fn-1
> > |    /
> > Ln-3
> 
> It breaks when Fn was a pull from Ln-1, and Ln was a fast-forward to Fn.
> Now the first parent is going to be Fn-1 and the history of the local
> repository after the fast forward warps to
> 
>     Fn (== Ln)
>     Ln-1
>     Ln-2
>     Fn-1
>     Ln-3
> 

Yep, you are right.

> Which I believe is exactly what Thomas wants to see in this case. I
> don't see how repoid's can be useful for this. It is a porcelain thing
> where you need to track what you have seen before. Anything else doesn't
> matter because most permutations of the history are perfectly valid
> since the Fn and Ln changes in reality occurred in parallel and as a
> result can be arbitrarily interleaved.
> 

I may be wrong, but I don't think Thomas is interested in his own
repository. I think he is interested in the history of commits found
in any public repository. Therefore, he needs an algorithm that
doesn't rely on locally cached information.

In other words: at each point in the commit graph, what did the
committer consider to be "foreign" changes that needed to be merged into
the "local" repository to progress the repository forward? He wants to
derive that order only from the information in the repository itself;
everyone given the same commit graph should reach the same conclusion
as to what the committer saw as local and foreign at the time of the
commit.

My previous algorithm was incorrect, but I suspect it could probably
be fixed with a 2-pass algorithm that marked any nodes in the path
between the merge base and the merge head as local and then ensured
that nodes marked that way are sorted after any nodes reached via
"foreign" paths.
-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 17:09                                   ` Jon Seymour
@ 2005-05-12 17:12                                     ` Jon Seymour
  0 siblings, 0 replies; 74+ messages in thread
From: Jon Seymour @ 2005-05-12 17:12 UTC (permalink / raw)
  To: Git Mailing List

| added "local" to clarify what I meant the first-pass should do

> My previous algorithm was incorrect, but I suspect it could probably
> be fixed with a 2-pass algorithm that marked any nodes in the "local" path
> between the merge base and the merge head as local and then ensured
> that nodes marked that way are sorted after any nodes reached via
> "foreign" paths.
-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12  7:57           ` Thomas Gleixner
  2005-05-12  9:32             ` Sean
@ 2005-05-12 17:35             ` Junio C Hamano
  2005-05-12 18:18               ` Sean
  2005-05-12 20:47               ` Thomas Gleixner
  1 sibling, 2 replies; 74+ messages in thread
From: Junio C Hamano @ 2005-05-12 17:35 UTC (permalink / raw)
  To: tglx; +Cc: H. Peter Anvin, git

>>>>> "TG" == Thomas Gleixner <tglx@linutronix.de> writes:

TG> Rn   o
TG>      | \
TG> Rn-1 o  |
TG>      |  o Mn
TG>      |  o Mn-1
TG> Rn-2 o /
TG> Rn-3 o

TG> The correct display looking at R is

TG> Rn
TG>  Mn
TG>  Mn-1
TG> Rn-1
TG> Rn-2
TG> Rn-3

TG> Looking from M it is

TG> Rn
TG>  Rn-1
TG>  Rn-2
TG> Mn
TG> Mn-1
TG> Rn-3

Thanks for a very clear explanation.  The situation is
intriguing in that both R and M after converging end up with
exactly the same HEAD with the same set of objects but still
would want to see history leading to the HEAD differently.

I wonder what happens to a third person S, who pulls from both R
and M.  What does S see?  

Does the commit order observed by S depend on which one S pulls
from first?  That is, if S pulls from R then at that point Mn-1
and Mn come after Rn-1 in S's history?  And after that what
happens if S pulls from M (which is obviously a no-op except that
it would update .git/refs/heads/M)?  Does the history for S
change?

IIRC, Cogito lets you "track" upstream branches.  When S starts
tracking R, does it see R's history and when S starts tracking M
its history view changes to that of M?

Let's further say R and M are both based on another upstream L,
and R and M have converged at this point.  S has been tracking L
and it merged from R and M.  If S did not have any local
modifications since L, then that is just two fast forward
merges.  What does the history look like to S?  Which comes
first---Mn or Rn-1?

The answer to the above could be "the merge order history is per
tree and not something to be exported or given away to other
trees", in which case it may make sense from S's point of view
that Mn and Rn-1 are compared solely based on their commit
timestamps.  You will get consistent history and switching which
tree is being tracked would not change the history.  Is the goal
here to give the merge order history from R and M to S?

If that is not needed, then you can record in an auxiliary file
that is local to each tree the timestamp of when merge happened
in that tree along with set of foreign commit objects, and teach
rev-tree or rev-list to read from that auxiliary file and use
that timestamp for foreign commit objects instead of commit time
recorded in them when sorting by time is needed.
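
A small sketch of that auxiliary-file idea, with hypothetical
timestamps and names (COMMIT_TIME, R_MERGE_TIME, sort_newest_first).
The auxiliary data is modelled as a dict from foreign commit to the
local time at which R merged it. Falling back to the recorded commit
time to break ties is an assumption of this sketch, not something
specified above.

```python
# Commit times as recorded in the commit objects (hypothetical values).
COMMIT_TIME = {
    "Rn-3": 100, "Rn-2": 110, "Mn-1": 120,
    "Rn-1": 130, "Mn": 140, "Rn": 150,
}

# R's auxiliary data: foreign commits mapped to the local time at which
# R merged them (here, the time of the merge commit Rn).
R_MERGE_TIME = {"Mn-1": 150, "Mn": 150}


def sort_newest_first(commits, commit_time, local_merge_time):
    """Sort commits newest first, using local merge time for foreign ones."""
    def key(commit):
        own = commit_time[commit]
        # Foreign commits sort by the local merge time; ties fall back to
        # the recorded commit time (an assumption of this sketch).
        return (local_merge_time.get(commit, own), own)
    return sorted(commits, key=key, reverse=True)


if __name__ == "__main__":
    print(sort_newest_first(COMMIT_TIME, COMMIT_TIME, R_MERGE_TIME))
    print(sort_newest_first(COMMIT_TIME, COMMIT_TIME, {}))
```

With R's merge times the M commits stay together directly below Rn;
with an empty auxiliary map the same commits interleave with Rn-1 by
plain commit time.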

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 17:35             ` Junio C Hamano
@ 2005-05-12 18:18               ` Sean
  2005-05-12 19:24                 ` Junio C Hamano
  2005-05-12 20:47               ` Thomas Gleixner
  1 sibling, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12 18:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: tglx, H. Peter Anvin, git

On Thu, May 12, 2005 1:35 pm, Junio C Hamano said:

> If that is not needed, then you can record in an auxiliary file
> that is local to each tree the timestamp of when merge happened
> in that tree along with set of foreign commit objects, and teach
> rev-tree or rev-list to read from that auxiliary file and use
> that timestamp for foreign commit objects instead of commit time
> recorded in them when sorting by time is needed.

The time is already recorded.  I.e. the commit object is a separate file
with a modification time which can be used as a "local commit timestamp".
If you want to protect those time stamps by also recording them in a
separate file, that's a bonus I guess but shouldn't really be needed.

You can descend the history tree based on the parent position as described
by Jon Seymour.  That is, Cogito lists the "local" parent first, so you
descend that branch marking off visited nodes, then descend the other
branches reporting unvisited nodes only.  Afterward return and list any
unreported nodes in the first branch.

Of course, the problem with that is a fast forward node, where you can't
just blindly pick the first parent listed because it may belong to another
repository.   So the answer is to do away with fast forward nodes, or give
up on using the ordering of the parents to mean anything.   In which case
you have to pick the parent with the oldest local commit time as the first
node to descend.
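
As a rough illustration of that last rule, the sketch below orders a
commit's parents by the time this repository first recorded them, so
the oldest locally known parent is descended first. The function name
and the timestamp tables are hypothetical; where the local times come
from (a separate file or object file mtimes) is left open here.

```python
def parents_local_first(parent_ids, local_time):
    """Order parents by when THIS repository first recorded each one."""
    return sorted(parent_ids, key=lambda parent: local_time[parent])


# Hypothetical local records for a merge whose parents are stored as
# [Fn-1, Ln-1]: each repository saw its own commit before the other's.
L_TIMES = {"Ln-1": 10, "Fn-1": 20}   # L committed Ln-1, later pulled Fn-1
F_TIMES = {"Fn-1": 5, "Ln-1": 15}    # F committed Fn-1, later pulled Ln-1

if __name__ == "__main__":
    print(parents_local_first(["Fn-1", "Ln-1"], L_TIMES))
    print(parents_local_first(["Fn-1", "Ln-1"], F_TIMES))
```

The same commit object thus yields a different "first" parent in each
repository, without relying on the parent order stored in the commit.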

So it seems, that rather than a repository identifier, we need each
repository to record the time of each local commit.   Either in a separate
file or just using the object file timestamps directly.

Sean

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 18:18               ` Sean
@ 2005-05-12 19:24                 ` Junio C Hamano
  2005-05-12 19:35                   ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Junio C Hamano @ 2005-05-12 19:24 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, tglx, H. Peter Anvin, git

>>>>> "S" == Sean  <seanlkml@sympatico.ca> writes:

S> On Thu, May 12, 2005 1:35 pm, Junio C Hamano said:
>> If that is not needed, then you can record in an auxiliary file
>> that is local to each tree the timestamp of when merge happened
>> in that tree along with set of foreign commit objects, and teach
>> rev-tree or rev-list to read from that auxiliary file and use
>> that timestamp for foreign commit objects instead of commit time
>> recorded in them when sorting by time is needed.

S> The time is already recorded.  Ie. the commit object is a
S> separate file with a modification time which can be used as a
S> "local commit timestamp".  If you want to protect those time
S> stamps by also recording them in a separate file, that's a
S> bonus I guess but shouldn't really be needed.

That would not work if (1) you are using SHA1_FILE_DIRECTORY
mechanism to share object pool for multiple trees, or (2) you
git-*-pull'ed but did not merge for some time.  The file
timestamps are the time of download but we want the time of
merge for this application.  Also, that approach captures only
half the information necessary.  The other half you missed is
"which ones are foreign commits from this tree's point of view",
and as you described that is something you cannot tell just by
looking at the order of parents in commit objects.

S> So it seems, that rather than a repository identifier, we
S> need each repository to record the time of each local commit.
S> Either in a separate file or just using the object file
S> timestamps directly.

I think we are in agreement here, except that object file
timestamps are not something you can use.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 19:24                 ` Junio C Hamano
@ 2005-05-12 19:35                   ` Sean
  0 siblings, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12 19:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Junio C Hamano, tglx, H. Peter Anvin, git

On Thu, May 12, 2005 3:24 pm, Junio C Hamano said:

> That would not work if (1) you are using SHA1_FILE_DIRECTORY
> mechanism to share object pool for multiple trees, or (2) you
> git-*-pull'ed but did not merge for some time.  The file
> timestamps are the time of download but we want the time of

Surely you mean "GIT_OBJECT_DIRECTORY" <g> and you're right, if the local
object is shared amongst several trees you'd have to store the timestamp
separately.   However, as for your second case, the merge process could
set the timestamp on the file so that one really isn't a problem.  I for
one, would like the option to use this method when it's appropriate,
although I agree you'd need a timestamp-database for other situations.

> merge for this application.  Also, that approach captures only
> half the information necessary.  The other half you missed is
> "which ones are foreign commits from this tree's point of view",
> and as you described that is something you cannot tell just by
> looking at the order of parents in commit objects.

Right, but we're not talking about identifying foreign commits anymore! 
The point is just to list multiple parents in the correct "local" order. 
The timestamp information _is_ enough to identify the proper order for
local viewing.   And this has the very nice feature that it works for
branches made in the same repository, where the repoid proposal would
fail.

> S> So it seems, that rather than a repository identifier, we
> S> need each repository to record the time of each local commit.
> S> Either in a separate file or just using the object file
> S> timestamps directly.
>
> I think we are in agreement here, except that object file
> timestamps are not something you can use.

You can use it, just not in every situation.

Sean

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 17:35             ` Junio C Hamano
  2005-05-12 18:18               ` Sean
@ 2005-05-12 20:47               ` Thomas Gleixner
  2005-05-12 21:09                 ` Sean
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 20:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: H. Peter Anvin, git

On Thu, 2005-05-12 at 10:35 -0700, Junio C Hamano wrote:
> Thanks for a very clear explanation.  The situation is
> intriguing in that both R and M after converging end up with
> exactly the same HEAD with the same set of objects but still
> would want to see history leading to the HEAD differently.

Yes, that's what I wanted to achieve in the first place with the
repository id. I think my first attempt is far from perfect and I agree
with hpa on having a .git/repoid file.

> I wonder what happens to a third person S, who pulls from both R
> and M.  What does S see?  
> Does the commit order observed by S depend on which one S pulls
> from first?  That is, if S pulls from R then at that point Mn-1
> and Mn come after Rn-1 in S's history?  And after that what
> happens if S pulls from M (which is obviously a no-op except that
> it would update .git/refs/heads/M)?  Does the history for S
> change?

That's an interesting question. Of course, if you change the head you
see the tree from a different POV, but you can detect this when S pulls
from M after a pull from R. So the tool can ask the user whether he really
wants to change the commit order or not. You might even argue that it
could refuse to do the head change.

> The answer to the above could be "the merge order history is per
> tree and not something to be exported or given away to other
> trees", in which case it may make sense from S's point of view
> that Mn and Rn-1 are compared solely based on their commit
> timestamps.  You will get consistent history and switching which
> tree is being tracked would not change the history.  Is the goal
> here to give the merge order history from R and M to S?

The goal from my side is to preserve the merge order history of R, M and
S in the individual way of commit order per repository, which includes
the merge order R->S, M->S or the other way round. See above.

> If that is not needed, then you can record in an auxiliary file
> that is local to each tree the timestamp of when merge happened
> in that tree along with set of foreign commit objects, and teach
> rev-tree or rev-list to read from that auxiliary file and use
> that timestamp for foreign commit objects instead of commit time
> recorded in them when sorting by time is needed.

As I said before, timestamps can be a horrid source of information. Also,
if you keep a list of commits, merges and head forwards in time order it
is simple to read the repository history, but in case of corruption you
have to reconstruct it manually. There is no way to do so with the
information available.

Repository id's can be lost, but are simple to replace as they are
recorded in the commit blob.

tglx

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 20:47               ` Thomas Gleixner
@ 2005-05-12 21:09                 ` Sean
  2005-05-12 21:21                   ` Thomas Gleixner
  0 siblings, 1 reply; 74+ messages in thread
From: Sean @ 2005-05-12 21:09 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 4:47 pm, Thomas Gleixner said:

> As I said before timestamps can be a horrid source of information. Also
> if you keep a list of commits merges and head forwards in timed order it
> is simple to read the repository history, but in case of corruption you
> have to reconstruct it manually. There is no way to do so with the
> information available.
>
> Repository id's can be lost, but are simple to replace as they are
> recorded in the commit blob.

And the time is recorded on the commit blob too. In case of corruption,
restore the blobs from backup, you get everything back.  Corruption can
wipe out repoids and complete git objects too, you had better have
backups.  Repoids offer no protection from corruption or otherwise lost
blobs.

Sean



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 21:09                 ` Sean
@ 2005-05-12 21:21                   ` Thomas Gleixner
  2005-05-12 21:32                     ` Sean
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 21:21 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 17:09 -0400, Sean wrote:
> On Thu, May 12, 2005 4:47 pm, Thomas Gleixner said:
> 
> > As I said before timestamps can be a horrid source of information. Also
> > if you keep a list of commits merges and head forwards in timed order it
> > is simple to read the repository history, but in case of corruption you
> > have to reconstruct it manually. There is no way to do so with the
> > information available.
> >
> > Repository id's can be lost, but are simple to replace as they are
> > recorded in the commit blob.
> 
> And the time is recorded on the commit blob too. 

How do you enforce correct timestamps?

tglx


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 21:21                   ` Thomas Gleixner
@ 2005-05-12 21:32                     ` Sean
  2005-05-12 21:44                       ` Junio C Hamano
  2005-05-12 22:06                       ` Thomas Gleixner
  0 siblings, 2 replies; 74+ messages in thread
From: Sean @ 2005-05-12 21:32 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 5:21 pm, Thomas Gleixner said:

>> And the time is recorded on the commit blob too.
>
> How do you enforce correct timestamps?

When an object is committed locally it is set to the local time.  You can
only have this feature when you use private commit objects (shared blobs
are okay).  It doesn't matter if the timestamps are correct in the global
sense, just that they're correct for the local server, because they'll
only ever be compared against each other.

By the way, repoid doesn't work when all the branches are done in the same
repository.  You'd need to use something like repoid-branch.

One area where your repoid is superior that I missed in my previous email
is that you can actually recover a corrupt blob from an unrelated
repository that happens to contain it, and you've lost no information. 
Which is what Linus was expounding as one of the benefits of the git
design.

Sean

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 21:32                     ` Sean
@ 2005-05-12 21:44                       ` Junio C Hamano
  2005-05-12 22:06                       ` Thomas Gleixner
  1 sibling, 0 replies; 74+ messages in thread
From: Junio C Hamano @ 2005-05-12 21:44 UTC (permalink / raw)
  To: Sean; +Cc: tglx, Junio C Hamano, H. Peter Anvin, git

>>>>> "S" == Sean  <seanlkml@sympatico.ca> writes:

S> When an object is committed locally it is set to the local time.  You can
S> only have this feature when you use private commit objects (shared blobs
S> are okay).

This brings up an interesting possibility, which is off topic
from this thread.

You _could_ (I am not advocating this, just thinking aloud) have
GIT_OBJECT_DIRECTORY and GIT_COMMIT_OBJECT_DIRECTORY pointing at
two separate object pools, with the value of
GIT_COMMIT_OBJECT_DIRECTORY being on
GIT_ALTERNATE_OBJECT_DIRECTORIES list.  Your commits go to
GIT_COMMIT_OBJECT_DIRECTORY (local to the tree) and everything
else goes to GIT_OBJECT_DIRECTORY (can be shared across trees).

Hmm.... Interesting.  My gut feeling tells me not to go there,
though.
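
A minimal sketch of the split described above, assuming the writer
picks a pool by object type. GIT_COMMIT_OBJECT_DIRECTORY is only the
hypothetical variable floated in this message, and object_dir_for() is
a made-up helper, not an existing git function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Pick the object pool for a new object of the given type.
 * shared_dir stands for the value of GIT_OBJECT_DIRECTORY; commit_dir
 * stands for the hypothetical GIT_COMMIT_OBJECT_DIRECTORY and may be
 * NULL, in which case commits also land in the shared pool.
 */
const char *object_dir_for(const char *type, const char *shared_dir,
			   const char *commit_dir)
{
	assert(type != NULL && shared_dir != NULL);
	if (commit_dir && strcmp(type, "commit") == 0)
		return commit_dir;	/* private, local to the tree */
	return shared_dir;		/* blobs and trees can be shared */
}
```

A caller would presumably pass the two getenv() results in; readers
would still find the commits as long as the commit pool is also listed
in GIT_ALTERNATE_OBJECT_DIRECTORIES, as the message says.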

* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 21:32                     ` Sean
  2005-05-12 21:44                       ` Junio C Hamano
@ 2005-05-12 22:06                       ` Thomas Gleixner
  2005-05-12 22:24                         ` Sean
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-12 22:06 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, 2005-05-12 at 17:32 -0400, Sean wrote:
> > How do you enforce correct timestamps  ?
> 
> When an object is committed locally it is set to the local time.  You can
> only have this feature when you use private commit objects (shared blobs
> are okay).  It doesn't matter if the timestamps are correct in the global
> sense, just that they're correct for the local server, because they'll
> only ever be compared against each other.

That limits the usefulness to a local place, which makes no sense in a
distributed development scenario. 


> By the way, repoid doesn't work when all the branches are done in the same
> repository.  You'd need to use something like repoid-branch.

Right. My basic idea was to collect the information either from an
environment variable or deduce it from the current working directory,
which is unlikely to be the same for different branches. hpa's arguments
against this approach are quite good, but I think something like a
per-branch repository id is not too hard to implement.
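
One way a per-branch id along the lines of "repoid-branch" could be
put together, as a sketch only. The function name and the "-"
separator are assumptions for illustration; nothing in the patch
defines them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<repoid>-<branch>" into out.
 * Returns 0 on success, -1 if either part is empty or the result
 * does not fit into len bytes.
 */
int make_branch_repoid(const char *repoid, const char *branch,
		       char *out, size_t len)
{
	int n;

	assert(repoid != NULL && branch != NULL && out != NULL);
	if (strlen(repoid) == 0 || strlen(branch) == 0)
		return -1;
	n = snprintf(out, len, "%s-%s", repoid, branch);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The repoid argument could come from GIT_REPOSITORY_ID or the getcwd()
fallback in the patch; where the branch name comes from is left open
here.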

tglx



* Re: [PATCH] [RFD] Add repoid identifier to commit
  2005-05-12 22:06                       ` Thomas Gleixner
@ 2005-05-12 22:24                         ` Sean
  0 siblings, 0 replies; 74+ messages in thread
From: Sean @ 2005-05-12 22:24 UTC (permalink / raw)
  To: tglx; +Cc: Junio C Hamano, H. Peter Anvin, git

On Thu, May 12, 2005 6:06 pm, Thomas Gleixner said:

Thomas,

> That limits the usefulness to a local place, which makes no sense in a
> distributed development scenario.

I don't think that is true; the only time you'd use this timestamp is
when comparing against other commits from the same repository.  As you
download the commits you're interested in from a remote repository, you
compare them to each other to get the order.
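
A rough sketch of that comparison, assuming each private commit
object carries the repository it was stored in and the local time it
was stored at. The struct and its field names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

struct local_commit {
	const char *repo;	/* repository the commit object lives in */
	time_t local_time;	/* that repository's clock at commit time */
	const char *name;
};

/* Timestamps only mean something relative to the same repository. */
int comparable(const struct local_commit *a, const struct local_commit *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a->repo, b->repo) == 0;
}

/* qsort comparator: oldest first by local time. */
int by_local_time(const void *pa, const void *pb)
{
	const struct local_commit *a = pa, *b = pb;

	return (a->local_time > b->local_time) -
	       (a->local_time < b->local_time);
}

/*
 * Order commits downloaded from one repository.
 * Returns -1 without sorting if the set mixes repositories.
 */
int order_commits(struct local_commit *c, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (!comparable(&c[0], &c[i]))
			return -1;
	qsort(c, n, sizeof(*c), by_local_time);
	return 0;
}
```

The clock only has to be consistent within one repository for this to
work; it says nothing about ordering across repositories.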

Sean

* Re: [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?]
  2005-05-11 23:14 ` H. Peter Anvin
  2005-05-11 23:38   ` Thomas Gleixner
@ 2005-05-13  1:37   ` Jon Seymour
  2005-05-13  8:36     ` Thomas Gleixner
  2005-05-13 22:25     ` Petr Baudis
  1 sibling, 2 replies; 74+ messages in thread
From: Jon Seymour @ 2005-05-13  1:37 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: tglx, git

> 
> I would like to suggest a few limiters are set on the repoid.  In
> particular, I'd like to suggest that a repoid is a UUID, that a file is
> used to track it (.git/repoid), and that if it doesn't exist, a new one
> is created from /dev/urandom.
> 

I think I understand what Thomas is trying to achieve, but I think
there is a naming problem here. The marker really isn't a repoid - it
is a workspace id.

Two workspaces can share the same physical repository, yet have
different "repoid"s. So the thing being identified isn't the
repository - it's the workspace in which the commit was performed.

Thomas' objective, I think, is the following: 

    from the point of view of a given workspace, determine the merge
    order of the global repository (and there really is only _one_
    repository for this purpose) from the point of view of that
    workspace. The interesting workspaces are workspaces that
    contributed commits to the global history.

Thomas is correct to point out that committer id is not a substitute
for a workspace identifier since a given committer may work in
multiple workspaces concurrently.

I can also see why an identifier in the commit is necessary to
reconstruct the history.

Consider the following history:

Rn
|    \
Rn-1  Mn
|    /
Rn-2
|    \
Rn-3  Mn-1
|    /
Rn-4

Assume that changes Mn and Mn-1 are made in the same workspace, M. Then,
from the point of view of workspace M, the history is:

Rn
Rn-1
Mn
Rn-2
Rn-3
Mn-1
Rn-4

From the point of view of a given change epoch, M always wants to see
"local changes occur first". To know what changes were local to M you
need to mark the changes that workspace M made with an identifier
saying that M did this in this workspace, hence the need for the
marker that Thomas is proposing.
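
A sketch of how such a marker could drive the ordering, under a
strong simplifying assumption: every merge joins two single-commit
sides that share one parent, exactly as in the graph above. The
struct, the workspace field and linearize() are hypothetical; a real
tool would need proper merge-base handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct commit {
	const char *name;
	const char *workspace;		/* hypothetical per-commit marker */
	const struct commit *parent[2];	/* parent[1] is NULL unless a merge */
};

/*
 * List the history newest first as seen from workspace ws.
 * At each merge the side committed in ws is listed last, that is,
 * it is treated as having happened first within that epoch.
 * Returns the number of entries written to out (at most max).
 */
size_t linearize(const struct commit *head, const char *ws,
		 const struct commit **out, size_t max)
{
	const struct commit *c = head;
	size_t n = 0;

	assert(ws != NULL && out != NULL);
	while (c && n < max) {
		const struct commit *first, *second;

		out[n++] = c;
		if (!c->parent[1]) {		/* ordinary commit */
			c = c->parent[0];
			continue;
		}
		first = c->parent[0];
		second = c->parent[1];
		if (strcmp(first->workspace, ws) == 0) {
			first = c->parent[1];	/* foreign side first */
			second = c->parent[0];	/* local side last */
		}
		if (n + 2 > max)
			break;
		out[n++] = first;
		out[n++] = second;
		c = second->parent[0];		/* shared base, by assumption */
	}
	return n;
}
```

Fed the graph above with ws set to M's marker, this produces the
seven-entry list shown for workspace M; without the marker there is
nothing to tell the two parents of a merge apart.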

Assuming that there is value in being able to reconstruct the merge
order from the perspective of workspaces that have contributed to the
global history, it would seem that Thomas's suggestion of marking each
commit with an identifier is reasonable. However, I think the name of
the identifier should change - what's being tracked is a workspace,
not a repository.

jon.

* Re: [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?]
  2005-05-13  1:37   ` [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?] Jon Seymour
@ 2005-05-13  8:36     ` Thomas Gleixner
  2005-05-13 22:25     ` Petr Baudis
  1 sibling, 0 replies; 74+ messages in thread
From: Thomas Gleixner @ 2005-05-13  8:36 UTC (permalink / raw)
  To: jon; +Cc: H. Peter Anvin, git

On Fri, 2005-05-13 at 11:37 +1000, Jon Seymour wrote:

> I think I understand what Thomas is trying to achieve, but I think
> there is a naming problem here. The marker really isn't a repoid - it
> is a workspace id.

I did not think about the naming convention here. I was just looking at
the repositories of Dave Miller - net-2.6 and sparc-2.6 - which are not
separable by any automated mechanism due to the fact that Dave uses the
same committer name for both, which is reasonable.

You are right, those are workspaces which happen to have a separate
public repository.

> From the point of view of a given change epoch, M always wants to see
> "local changes occur first". To know what changes were local to M you
> need to mark the changes that workspace M made with an identifier
> saying that M did this in this workspace, hence the need for the
> marker that Thomas is proposing.

My main concern here is to be able to see a change in the context in
which it was made.

In distributed development a change made in workspace A is correct in
the context of A and a change made in workspace B is correct in the
context of B. Merging these possibly unrelated changes can produce a
problem. Add a random number of changes to increase the complexity.

From my experience it is helpful to be able to see the separate
changes in the context where they were made, to understand why each
change was made.

If your history is cluttered by the head forward cloning you have more
work to deduce the information you want to have instead of having it
available on demand by a tool.

> Assuming that there is value in being able to reconstruct the merge
> order from the perspective of workspaces that have contributed to the
> global history it would seem that Thomas's suggestion of marking each
> commit with an identifier is reasonable, however, I think the name of
> the identifier should change - what's being tracked is a workspace,
> not a repository.

Ack.

The question is how to automate those workspace identifiers in a
sensible way. A shared object repository makes it necessary to keep the
identifier in the workspace itself. A first idea might be
a .git_workspace_id file in the toplevel directory of the workspace,
which can automatically be ignored by all git tools. Maybe an ignore rule
for all .git* files is also reasonable, to make future extensions simpler.
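
A sketch of reading such a file, assuming it holds the identifier on
its first line. Both helper names are made up for illustration, and
how the file gets created in the first place is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Cut the id at the first line ending and reject an empty result.
 * Returns 0 on success, -1 if nothing is left.
 */
int clean_workspace_id(char *id)
{
	assert(id != NULL);
	id[strcspn(id, "\r\n")] = '\0';
	return id[0] ? 0 : -1;
}

/*
 * Read the workspace id from the file at path, for example a
 * ".git_workspace_id" in the toplevel directory of the workspace.
 * Returns 0 on success, -1 if the file is missing or empty.
 */
int read_workspace_id(const char *path, char *buf, size_t len)
{
	FILE *fp;
	int ok;

	assert(path != NULL && buf != NULL && len > 1);
	fp = fopen(path, "r");
	if (!fp)
		return -1;
	ok = fgets(buf, (int)len, fp) != NULL &&
	     clean_workspace_id(buf) == 0;
	fclose(fp);
	return ok ? 0 : -1;
}
```

On a -1 return commit-tree could fall back to the environment
variable or the working directory, as the original patch does.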

tglx

* Re: [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?]
  2005-05-13  1:37   ` [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?] Jon Seymour
  2005-05-13  8:36     ` Thomas Gleixner
@ 2005-05-13 22:25     ` Petr Baudis
  2005-05-13 22:26       ` H. Peter Anvin
  2005-05-13 23:49       ` Jon Seymour
  1 sibling, 2 replies; 74+ messages in thread
From: Petr Baudis @ 2005-05-13 22:25 UTC (permalink / raw)
  To: Jon Seymour; +Cc: H. Peter Anvin, tglx, git

Dear diary, on Fri, May 13, 2005 at 03:37:47AM CEST, I got a letter
where Jon Seymour <jon.seymour@gmail.com> told me that...
> > 
> > I would like to suggest a few limiters are set on the repoid.  In
> > particular, I'd like to suggest that a repoid is a UUID, that a file is
> > used to track it (.git/repoid), and that if it doesn't exist, a new one
> > is created from /dev/urandom.
> > 
> 
> I think I understand what Thomas is trying to achieve, but I think
> there is a naming problem here. The marker really isn't a repoid - it
> is a workspace id.

Why not just call the thing "branch"? It's as well eligible for that
term as anything. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

* Re: [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?]
  2005-05-13 22:25     ` Petr Baudis
@ 2005-05-13 22:26       ` H. Peter Anvin
  2005-05-13 23:39         ` Petr Baudis
  2005-05-13 23:49       ` Jon Seymour
  1 sibling, 1 reply; 74+ messages in thread
From: H. Peter Anvin @ 2005-05-13 22:26 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jon Seymour, tglx, git

Petr Baudis wrote:
> 
> Why not just call the thing "branch"? It's as well eligible for that
> term as anything. :-)
> 

Because cogito already calls too many things "branches"?

	-hpa

* Re: [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?]
  2005-05-13 22:26       ` H. Peter Anvin
@ 2005-05-13 23:39         ` Petr Baudis
  0 siblings, 0 replies; 74+ messages in thread
From: Petr Baudis @ 2005-05-13 23:39 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Jon Seymour, tglx, git

Dear diary, on Sat, May 14, 2005 at 12:26:53AM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> told me that...
> Petr Baudis wrote:
> >
> >Why not just call the thing "branch"? It's as well eligible for that
> >term as anything. :-)
> >
> 
> Because cogito already calls too many things "branches"?

Well, I know of one thing cogito calls "branch", and incidentally the
meaning is effectively the same as this "branch" would have.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

* Re: [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?]
  2005-05-13 22:25     ` Petr Baudis
  2005-05-13 22:26       ` H. Peter Anvin
@ 2005-05-13 23:49       ` Jon Seymour
  2005-05-14  5:02         ` Jon Seymour
  1 sibling, 1 reply; 74+ messages in thread
From: Jon Seymour @ 2005-05-13 23:49 UTC (permalink / raw)
  To: Petr Baudis; +Cc: H. Peter Anvin, tglx, git

On 5/14/05, Petr Baudis <pasky@ucw.cz> wrote:
> Dear diary, on Fri, May 13, 2005 at 03:37:47AM CEST, I got a letter
> where Jon Seymour <jon.seymour@gmail.com> told me that...
> > >
> > > I would like to suggest a few limiters are set on the repoid.  In
> > > particular, I'd like to suggest that a repoid is a UUID, that a file is
> > > used to track it (.git/repoid), and that if it doesn't exist, a new one
> > > is created from /dev/urandom.
> > >
> >
> > I think I understand what Thomas is trying to achieve, but I think
> > there is a naming problem here. The marker really isn't a repoid - it
> > is a workspace id.
> 
> Why not just call the thing "branch"? It's as well eligible for that
> term as anything. :-)
> 

I very nearly agreed with you except for one thing. In a traditional
SCM, a branch is something that can be contributed to by multiple
people, but I don't think we want that meaning here.

Consider this graph where there is a main trunk R, a branch B, and
two programmers Ba and Bb working on branch B.  Thomas wants to be
able to form views of the merge order of 3 workspaces (W(R), W(Ba) and
W(Bb)), even though Ba and Bb are working on the same "branch" in SCM
terms.

Rn -------------
|    \          \
Rn-1  Ba,n      Bb,n
|    /          |
Rn-2            |
|    \          |
Rn-3  Ba,n-1    |
|    /         /
Rn-4 ----------

Also, most times in a traditional SCM, branches diverge, but the
behaviour we are interested in here is the repeated converging and
diverging of workspaces on the same "branch" [ I know, branches can
be used in that way in traditional SCMs, but that is a degenerate case
of their intended use - maintaining paths of long term divergence ].

So, while I think there may well be some value in a separate
"branchid" attribute in a commit, I think the two are describing
(slightly) different things.

jon
-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

* Re: [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?]
  2005-05-13 23:49       ` Jon Seymour
@ 2005-05-14  5:02         ` Jon Seymour
  0 siblings, 0 replies; 74+ messages in thread
From: Jon Seymour @ 2005-05-14  5:02 UTC (permalink / raw)
  To: Petr Baudis; +Cc: H. Peter Anvin, tglx, git

Of course, this graph wasn't really the best example of a traditional
SCM branch or the point I was trying to make.
> 
> Rn -------------
> |    \          \
> Rn-1  Ba,n      Bb,n
> |    /          |
> Rn-2            |
> |    \          |
> Rn-3  Ba,n-1    |
> |    /         /
> Rn-4 ----------

It's more like:

Tn     B,W(a),n
|      |         \
|      B,W(a),n-1  B,W(b),n-1
|      |           |
Tn-1   |           |
|      B,W(a),1----|
|     /
Tn-2

Branch B diverges from the main trunk T, parallel (but frequently
convergent) development in branch B happens in two workspaces W(a) and
W(b).

So a branchid would track all commits that contribute to a branch,
while a workspace id would track which commits happened in which
workspace, to enable the reconstruction of the merge order of the
global history with respect to the workspaces that perform commits.
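
To make the distinction concrete, a small sketch in which a commit
carries both attributes. The struct and the two counting helpers are
hypothetical and only show that the two ids select different sets of
commits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct tagged_commit {
	const char *branch;	/* hypothetical branchid */
	const char *workspace;	/* hypothetical workspace id */
};

/* Number of commits that contribute to the given branch. */
size_t count_on_branch(const struct tagged_commit *c, size_t n,
		       const char *branch)
{
	size_t i, hits = 0;

	assert(c != NULL && branch != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(c[i].branch, branch) == 0)
			hits++;
	return hits;
}

/* Number of commits that were made in the given workspace. */
size_t count_in_workspace(const struct tagged_commit *c, size_t n,
			  const char *workspace)
{
	size_t i, hits = 0;

	assert(c != NULL && workspace != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(c[i].workspace, workspace) == 0)
			hits++;
	return hits;
}
```

For the graph above, the four B commits all match branch B, while
only three of them match workspace W(a).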

jon.
-- 
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/

end of thread, other threads:[~2005-05-14  5:02 UTC | newest]

Thread overview: 74+ messages
2005-05-11 21:38 [PATCH] [RFD] Add repoid identifier to commit Thomas Gleixner
2005-05-11 22:00 ` Sean
2005-05-11 22:05   ` Thomas Gleixner
2005-05-11 22:24     ` Sean
2005-05-11 22:30       ` Thomas Gleixner
2005-05-11 22:36         ` Sean
2005-05-11 22:48           ` Thomas Gleixner
2005-05-11 23:01             ` Sean
2005-05-11 23:33               ` Thomas Gleixner
2005-05-11 23:44                 ` Sean
2005-05-12  0:30                   ` Thomas Gleixner
2005-05-12  0:45                     ` Sean
2005-05-12  0:56                       ` Thomas Gleixner
2005-05-12  0:58                         ` Sean
2005-05-12 10:07                           ` David Woodhouse
2005-05-12 10:18                             ` Sean
2005-05-12 10:42                               ` Thomas Gleixner
2005-05-12 10:43                               ` David Woodhouse
2005-05-12 10:58                                 ` Sean
2005-05-12 10:39                             ` Sean
2005-05-11 23:14 ` H. Peter Anvin
2005-05-11 23:38   ` Thomas Gleixner
2005-05-11 23:40     ` H. Peter Anvin
2005-05-11 23:45       ` Sean
2005-05-12  0:04         ` H. Peter Anvin
2005-05-12  0:20           ` Sean
2005-05-12  0:33       ` Thomas Gleixner
2005-05-12  1:46         ` Junio C Hamano
2005-05-12  7:57           ` Thomas Gleixner
2005-05-12  9:32             ` Sean
2005-05-12  9:39               ` Thomas Gleixner
2005-05-12  9:46                 ` Sean
2005-05-12 11:18                   ` Thomas Gleixner
2005-05-12 11:24                     ` Sean
2005-05-12 11:43                       ` Thomas Gleixner
2005-05-12 11:48                         ` Sean
2005-05-12 12:16                           ` Thomas Gleixner
2005-05-12 12:16                             ` Sean
2005-05-12 12:34                               ` Thomas Gleixner
2005-05-12 12:35                                 ` Sean
2005-05-12 12:17                             ` Sean
2005-05-12 12:29                           ` David Woodhouse
2005-05-12 12:32                             ` Sean
2005-05-12 13:29                         ` Jan Harkes
2005-05-12 15:44                           ` Jon Seymour
2005-05-12 15:48                             ` Jon Seymour
2005-05-12 15:50                               ` Jon Seymour
2005-05-12 16:20                                 ` Jan Harkes
2005-05-12 17:09                                   ` Jon Seymour
2005-05-12 17:12                                     ` Jon Seymour
2005-05-12 17:35             ` Junio C Hamano
2005-05-12 18:18               ` Sean
2005-05-12 19:24                 ` Junio C Hamano
2005-05-12 19:35                   ` Sean
2005-05-12 20:47               ` Thomas Gleixner
2005-05-12 21:09                 ` Sean
2005-05-12 21:21                   ` Thomas Gleixner
2005-05-12 21:32                     ` Sean
2005-05-12 21:44                       ` Junio C Hamano
2005-05-12 22:06                       ` Thomas Gleixner
2005-05-12 22:24                         ` Sean
2005-05-12  0:41     ` Dmitry Torokhov
2005-05-12  0:44       ` Thomas Gleixner
2005-05-12  1:09         ` H. Peter Anvin
2005-05-12  1:13           ` H. Peter Anvin
2005-05-12  3:30             ` Joel Becker
2005-05-12  9:17             ` Thomas Gleixner
2005-05-13  1:37   ` [PATCH] [RFD] Add repoid identifier to commit [its a workspace id, isn't it?] Jon Seymour
2005-05-13  8:36     ` Thomas Gleixner
2005-05-13 22:25     ` Petr Baudis
2005-05-13 22:26       ` H. Peter Anvin
2005-05-13 23:39         ` Petr Baudis
2005-05-13 23:49       ` Jon Seymour
2005-05-14  5:02         ` Jon Seymour
