History messup

Git development

* History messup
@ 2005-05-09 16:59 Thomas Gleixner
  2005-05-09 17:06 ` Thomas Glanzmann
  2005-05-09 17:27 ` David Woodhouse
  0 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2005-05-09 16:59 UTC
  To: git; +Cc: David Woodhouse, Linus Torvalds

Hi,

I wrote a git repository tracker, which can track and coordinate
multiple git repositories. Before it goes public, I want to clarify a
problem I encountered.

The commit bfd4bda097f8758d28e632ff2035e25577f6b060 
by David Woodhouse (Thu May 5 12:59:37 2005 +100)  
Merge with
master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.git 

breaks the history.

David merged from Linus' repository. Linus later synced with David.
Linus did not create a new commit for this update and just pointed his
"HEAD" at David's "HEAD", which means he forked David's repository at
this point.

Because of that, the parent->parent history is no longer unique. This
makes it impossible to draw file revision graphs across the various
repositories in the correct order.

Is this an isolated problem, or is omitting a commit in cases like that
usual practice? In the latter case, proper history tracking is almost
impossible.

tglx


* Re: History messup
  2005-05-09 16:59 History messup Thomas Gleixner
@ 2005-05-09 17:06 ` Thomas Glanzmann
  2005-05-09 18:05   ` Thomas Gleixner
  2005-05-09 17:27 ` David Woodhouse
  1 sibling, 1 reply; 11+ messages in thread
From: Thomas Glanzmann @ 2005-05-09 17:06 UTC
  To: Thomas Gleixner; +Cc: git, David Woodhouse, Linus Torvalds

Hello,
if merging with a repository just means bringing the head forward (i.e.
there has been no local development since the fork), there is no separate
commit object, just an update of the HEAD. Linus explained this behaviour
and the reasoning behind this decision on the ML:

If two repositories alternately pull from each other, neither ever has
the exact state the other has. They would play ping-pong, and this is a
bad thing. AFAIK that was the reason no COMMIT object is introduced on
'head forward' merges.
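The ping-pong argument can be made concrete with a toy model. The sketch below is a hypothetical illustration, not git's implementation: a commit graph is a plain dict, and the helper names `new_commit`, `is_ancestor`, `pull` and `simulate` are invented for it. Two repositories with divergent work pull from each other in turn, once with the "just move HEAD" rule and once with a policy that always records a merge commit.

```python
# Toy model: a commit graph is {commit_id: tuple_of_parent_ids}.
# All names here are hypothetical; this is not how git is implemented.

def new_commit(graph, parents):
    """Add a commit with the given parents and return its id."""
    cid = f"c{len(graph)}"
    graph[cid] = tuple(parents)
    return cid

def is_ancestor(graph, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` (or is it)."""
    stack, seen = [descendant], set()
    while stack:
        c = stack.pop()
        if c == ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return False

def pull(graph, local, remote, fast_forward=True):
    """Return the new local head after pulling `remote` into `local`."""
    if is_ancestor(graph, remote, local):
        return local                            # already up to date
    if fast_forward and is_ancestor(graph, local, remote):
        return remote                           # just move HEAD, no new commit
    return new_commit(graph, [local, remote])   # record a merge commit

def simulate(fast_forward, rounds=4):
    """Repos A and B, each with one local commit, pull from each other in turn."""
    graph = {}
    base = new_commit(graph, [])
    a = new_commit(graph, [base])
    b = new_commit(graph, [base])
    for _ in range(rounds):
        a = pull(graph, a, b, fast_forward)     # A pulls from B
        b = pull(graph, b, a, fast_forward)     # B pulls from A
    return a == b, len(graph)

print(simulate(fast_forward=True))    # (True, 4): heads converge
print(simulate(fast_forward=False))   # (False, 11): a new merge on every pull
```

With fast-forwarding, one real merge is created and both heads then stay identical. With "always merge", every pull adds a commit the other side does not have yet, so the two heads never stabilise.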

	Thomas


* Re: History messup
  2005-05-09 17:06 ` Thomas Glanzmann
@ 2005-05-09 18:05   ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2005-05-09 18:05 UTC
  To: Thomas Glanzmann; +Cc: git, David Woodhouse, Linus Torvalds

On Mon, 2005-05-09 at 19:06 +0200, Thomas Glanzmann wrote:
> If two repositories alternately pull from each other, neither ever has
> the exact state the other has. They would play ping-pong, and this is a
> bad thing. AFAIK that was the reason no COMMIT object is introduced on
> 'head forward' merges.

That makes total sense, but for history tracking it is a horror
scenario, unless you have per-repository head history lists or a
unique repository identifier in the commit blob itself.

tglx




* Re: History messup
  2005-05-09 16:59 History messup Thomas Gleixner
  2005-05-09 17:06 ` Thomas Glanzmann
@ 2005-05-09 17:27 ` David Woodhouse
  2005-05-09 17:48   ` Thomas Gleixner
  1 sibling, 1 reply; 11+ messages in thread
From: David Woodhouse @ 2005-05-09 17:27 UTC
  To: tglx; +Cc: git, Linus Torvalds

On Mon, 2005-05-09 at 16:59 +0000, Thomas Gleixner wrote:
> David merged from Linus' repository. Linus later synced with David.
> Linus did not create a new commit for this update and just pointed his
> "HEAD" at David's "HEAD", which means he forked David's repository at
> this point.
> 
> Because of that, the parent->parent history is no longer unique. This
> makes it impossible to draw file revision graphs across the various
> repositories in the correct order.
> 
> Is this an isolated problem, or is omitting a commit in cases like that
> usual practice? In the latter case, proper history tracking is almost
> impossible.

It's normal practice, and it _has_ to be the case. Otherwise the trees
would never stabilise -- every time Linus pulled from my tree he would
create a merge-commit which I don't yet have, and vice versa.

Unless a commit also carries a unique repo-id identifying the repository
in which it originally occurred, you'll only ever be able to track
history in the way people want by means of heuristics.

-- 
dwmw2



* Re: History messup
  2005-05-09 17:27 ` David Woodhouse
@ 2005-05-09 17:48   ` Thomas Gleixner
  2005-05-09 19:01     ` H. Peter Anvin
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2005-05-09 17:48 UTC
  To: David Woodhouse; +Cc: git, Linus Torvalds

On Mon, 2005-05-09 at 18:27 +0100, David Woodhouse wrote:
> On Mon, 2005-05-09 at 16:59 +0000, Thomas Gleixner wrote:

> It's normal practice, and it _has_ to be the case. Otherwise the trees
> would never stabilise -- every time Linus pulled from my tree he would
> create a merge-commit which I don't yet have, and vice versa.

Sure.

> Unless a commit also carries a unique repo-id identifying the repository
> in which it originally occurred, you'll only ever be able to track
> history in the way people want by means of heuristics.

That would be really great. A line after "parents" like

repoid "username/reponame" 

would be sufficient.

tglx




* Re: History messup
  2005-05-09 17:48   ` Thomas Gleixner
@ 2005-05-09 19:01     ` H. Peter Anvin
  2005-05-09 19:06       ` David Woodhouse
  0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2005-05-09 19:01 UTC
  To: tglx; +Cc: David Woodhouse, git, Linus Torvalds

Thomas Gleixner wrote:
> 
> That would be really great. A line after "parents" like
> 
> repoid "username/reponame" 
> 
> would be sufficient.
> 

Seems like a UUID or a SHA-1 identifier would be better.

However, one can definitely argue that even the meaning of "a 
repository" isn't well-defined in the context of git.

	-hpa


* Re: History messup
  2005-05-09 19:01     ` H. Peter Anvin
@ 2005-05-09 19:06       ` David Woodhouse
  2005-05-09 19:34         ` H. Peter Anvin
  2005-05-11 20:31         ` Petr Baudis
  0 siblings, 2 replies; 11+ messages in thread
From: David Woodhouse @ 2005-05-09 19:06 UTC
  To: H. Peter Anvin; +Cc: tglx, git, Linus Torvalds

On Mon, 2005-05-09 at 12:01 -0700, H. Peter Anvin wrote:
> Seems like a UUID or a SHA-1 identifier would be better.
> 
> However, one can definitely argue that even the meaning of "a 
> repository" isn't well-defined in the context of git.

Of course it isn't. But neither is the meaning of "a committer" or 
"an author" or even "a date".

Including some kind of repo-specific identifier with each commit would
help us to make sense of the history, just as those other fields do.

-- 
dwmw2




* Re: History messup
  2005-05-09 19:06       ` David Woodhouse
@ 2005-05-09 19:34         ` H. Peter Anvin
  2005-05-09 22:08           ` Sean
  2005-05-11 20:31         ` Petr Baudis
  1 sibling, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2005-05-09 19:34 UTC
  To: David Woodhouse; +Cc: tglx, git, Linus Torvalds

David Woodhouse wrote:
> 
> Of course it isn't. But neither is the meaning of "a committer" or 
> "an author" or even "a date".
> 
> Including some kind of repo-specific identifier with each commit would
> help us to make sense of the history, just as those other fields do.
> 

I'm particularly thinking of when you copy and clone directories, you 
have to define your semantics more specifically.  When do you want to 
*copy* this ID, and when do you want to make sure a new one is created?

One possible answer to that is to have .git/repoid and have it 
auto-created (from /dev/urandom) if it doesn't exist, but I also observe 
that at least two people (davem and pavel) have managed to clone "Linus' 
kernel tree" as their description on http://www.kernel.org/git/ ...
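The auto-creation half of that proposal is simple to sketch. The helper below is hypothetical: the function name and the 16-byte (128-bit) length are assumptions made for illustration, and only the file name .git/repoid and the use of the system's random source come from the message above. `os.urandom` reads from the operating system's CSPRNG, which on Linux is the same pool behind /dev/urandom.

```python
import os

def get_or_create_repoid(git_dir):
    """Return the id stored in <git_dir>/repoid, creating it on first use.

    Hypothetical helper; the 16-byte id length is an assumption.
    """
    path = os.path.join(git_dir, "repoid")
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        repoid = os.urandom(16).hex()   # 128 random bits as 32 hex digits
        with open(path, "w") as f:
            f.write(repoid + "\n")
        return repoid
```

The sketch also shows why the semantics matter: anything that copies the whole .git directory, such as a plain recursive copy, carries the repoid file along, so the "clone" ends up with the same identifier. That is the same failure mode as the copied kernel.org descriptions. Whether a clone should copy the ID or generate a fresh one is exactly the open question.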

	-hpa


* Re: History messup
  2005-05-09 19:34         ` H. Peter Anvin
@ 2005-05-09 22:08           ` Sean
  2005-05-11 17:09             ` Thomas Gleixner
  0 siblings, 1 reply; 11+ messages in thread
From: Sean @ 2005-05-09 22:08 UTC
  To: H. Peter Anvin; +Cc: David Woodhouse, tglx, git, Linus Torvalds

On Mon, May 9, 2005 3:34 pm, H. Peter Anvin said:
> I'm particularly thinking of when you copy and clone directories, you
> have to define your semantics more specifically.  When do you want to
> *copy* this ID, and when do you want to make sure a new one is created?

What question will a repoid let you answer?

Isn't the real problem where to store "repoid"?  Obviously it would be
stored in each commit created in a repo, but that's not really the
problem.   The problem is identifying which repo each branch belongs to as
you traverse the history.  This lets you ask questions like, "was this
commit created on this branch, or is it just a copy".

But just as the committer information in each object doesn't help you
identify the branch, nor will having a repoid inside each commit.   The
problem is labelling the branch, not the commits found in it.  And there
is no place to store the repoid for each branch.

Using the repoid of the first commit found on a branch is pretty close,
but fails because that commit might be a fast-forwarded HEAD rather than
a genuine local commit of the source repository. So as you embark down a
branch there is no way to know its repoid, and you have nothing to
compare against the repoid inside each commit you find along its path.

Seems the only solution is a full search of the history, unless there is
some clever way to label branches or detect fast forward heads.
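A small hypothetical example shows why the head commit's repoid is a poor label for a branch. Everything here is invented for illustration: commit names, the `repoid` values, and the `first_parent_chain` helper. Git commits have no repoid field; the model just assumes each commit carries the header proposed earlier in the thread. Merge parents are listed local-first, as git records them.

```python
# Hypothetical data: each commit carries the proposed "repoid" header.
commits = {
    # commit: (parents, repoid)
    "base": ((), "linus"),
    "l1": (("base",), "linus"),       # Linus' own work
    "d1": (("base",), "dwmw2"),       # David's own work
    "d2": (("d1", "l1"), "dwmw2"),    # David merges from Linus
}
heads = {"linus": "l1", "dwmw2": "d2"}

# Linus pulls from David. l1 is already an ancestor of d2, so this is a
# fast-forward: Linus' HEAD simply moves to d2 and no commit is created.
heads["linus"] = heads["dwmw2"]

def first_parent_chain(commits, head):
    """Walk first parents from `head`, returning (commit, repoid) pairs."""
    chain, c = [], head
    while c is not None:
        parents, repoid = commits[c]
        chain.append((c, repoid))
        c = parents[0] if parents else None
    return chain

# The head's repoid says "dwmw2", and Linus' own commit l1 is not even
# on the first-parent line of Linus' branch.
print(commits[heads["linus"]][1])
# dwmw2
print(first_parent_chain(commits, heads["linus"]))
# [('d2', 'dwmw2'), ('d1', 'dwmw2'), ('base', 'linus')]
```

Nothing in the commits themselves records that "linus" is the branch being walked, which is Sean's point: the label belongs to the branch, and after a fast-forward there is nowhere to keep it.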

> One possible answer to that is to have .git/repoid and have it
> auto-created (from /dev/urandom) if it doesn't exist, but I also observe
> that at least two people (davem and pavel) have managed to clone "Linus'
> kernel tree" as their description on http://www.kernel.org/git/ ...

:o)

sean


* Re: History messup
  2005-05-09 22:08           ` Sean
@ 2005-05-11 17:09             ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2005-05-11 17:09 UTC
  To: Sean; +Cc: H. Peter Anvin, David Woodhouse, git, Linus Torvalds

On Mon, 2005-05-09 at 18:08 -0400, Sean wrote:
> On Mon, May 9, 2005 3:34 pm, H. Peter Anvin said:
> Seems the only solution is a full search of the history, unless there is
> some clever way to label branches or detect fast forward heads.

You can apply some heuristic guessing to detect fast forward heads, but
at the very end you will end up with manual selection puzzles.

On one hand we take care to track the source of a change in the kernel
code by adding Signed-off-by and Acked-by lines, but on the other hand
we don't care about history correctness. If you look at file revisions
reconstructed from the tree history, you simply see patches applied in
the wrong order.

Maybe nobody cares, but for maintaining customer trees with bugfix,
stable and experimental branches, it's necessary to keep track of this
information consistently, especially if you have to deal with the QA
department.

I know that there is no simple and foolproof solution for this problem,
but some band-aid to make reconstructing the history simpler would not
be too bad.

tglx


* Re: History messup
  2005-05-09 19:06       ` David Woodhouse
  2005-05-09 19:34         ` H. Peter Anvin
@ 2005-05-11 20:31         ` Petr Baudis
  1 sibling, 0 replies; 11+ messages in thread
From: Petr Baudis @ 2005-05-11 20:31 UTC
  To: David Woodhouse; +Cc: H. Peter Anvin, tglx, git, Linus Torvalds

Dear diary, on Mon, May 09, 2005 at 09:06:38PM CEST, I got a letter
where David Woodhouse <dwmw2@infradead.org> told me that...
> On Mon, 2005-05-09 at 12:01 -0700, H. Peter Anvin wrote:
> > Seems like a UUID or a SHA-1 identifier would be better.
> > 
> > However, one can definitely argue that even the meaning of "a 
> > repository" isn't well-defined in the context of git.
> 
> Of course it isn't. But neither is the meaning of "a committer" or 
> "an author" or even "a date".
> 
> Including some kind of repo-specific identifier with each commit would
> help us to make sense of the history, just as those other fields do.

FWIW, I recently added .git/branch-name to Cogito, since I needed some
identifier to differentiate between the branches (repositories; it's
all blurred in the Cogito view) when sending commits to CIA.

It is strictly per-branch (never to be shared by multiple repositories),
optional, informative, and more of a temporary solution that I had to
cook up in a minute.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Thread overview: 11+ messages
2005-05-09 16:59 History messup Thomas Gleixner
2005-05-09 17:06 ` Thomas Glanzmann
2005-05-09 18:05   ` Thomas Gleixner
2005-05-09 17:27 ` David Woodhouse
2005-05-09 17:48   ` Thomas Gleixner
2005-05-09 19:01     ` H. Peter Anvin
2005-05-09 19:06       ` David Woodhouse
2005-05-09 19:34         ` H. Peter Anvin
2005-05-09 22:08           ` Sean
2005-05-11 17:09             ` Thomas Gleixner
2005-05-11 20:31         ` Petr Baudis
