git.vger.kernel.org archive mirror
* kernel.org now has gitweb installed
@ 2005-04-28  1:38 H. Peter Anvin
  2005-04-28  4:17 ` Daniel Jacobowitz
  2005-04-28  7:35 ` David Woodhouse
  0 siblings, 2 replies; 23+ messages in thread
From: H. Peter Anvin @ 2005-04-28  1:38 UTC (permalink / raw)
  To: Git Mailing List

http://www.kernel.org/git/

	-hpa

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel.org now has gitweb installed
  2005-04-28  1:38 kernel.org now has gitweb installed H. Peter Anvin
@ 2005-04-28  4:17 ` Daniel Jacobowitz
  2005-04-28  7:35 ` David Woodhouse
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Jacobowitz @ 2005-04-28  4:17 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List

On Wed, Apr 27, 2005 at 06:38:01PM -0700, H. Peter Anvin wrote:
> http://www.kernel.org/git/

Thanks!  Now all I crave is a version which can browse the file tree
and file history; but I think we're almost ready for that...

-- 
Daniel Jacobowitz
CodeSourcery, LLC


* Re: kernel.org now has gitweb installed
  2005-04-28  1:38 kernel.org now has gitweb installed H. Peter Anvin
  2005-04-28  4:17 ` Daniel Jacobowitz
@ 2005-04-28  7:35 ` David Woodhouse
  2005-04-28  8:10   ` Petr Baudis
  1 sibling, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2005-04-28  7:35 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List

On Wed, 2005-04-27 at 18:38 -0700, H. Peter Anvin wrote:
> http://www.kernel.org/git/

Looks like the ordering is wrong. A chronological sort means that
commits which were made three weeks ago, but which Linus only pulled
yesterday, do not show up at the top of the tree.

-- 
dwmw2




* Re: kernel.org now has gitweb installed
  2005-04-28  7:35 ` David Woodhouse
@ 2005-04-28  8:10   ` Petr Baudis
  2005-04-28  8:29     ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: Petr Baudis @ 2005-04-28  8:10 UTC (permalink / raw)
  To: David Woodhouse; +Cc: H. Peter Anvin, Git Mailing List

Dear diary, on Thu, Apr 28, 2005 at 09:35:23AM CEST, I got a letter
where David Woodhouse <dwmw2@infradead.org> told me that...
> On Wed, 2005-04-27 at 18:38 -0700, H. Peter Anvin wrote:
> > http://www.kernel.org/git/
> 
> Looks like the ordering is wrong. A chronological sort means that
> commits which were made three weeks ago, but which Linus only pulled
> yesterday, do not show up at the top of the tree.

  Linus                     ASM (Anonymous Subsystem Maintainer)

    |------------------------.
   A|                        |B
    |                        |
    |                        \-------------\
    |                        :             |
    \------------------------\             |E
   C|                        |D            |
    |                        /-------------/
    |                        |F
    /------------------------/

How would you show that? F E D C B A? F D C A E B?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: kernel.org now has gitweb installed
  2005-04-28  8:10   ` Petr Baudis
@ 2005-04-28  8:29     ` David Woodhouse
  2005-04-28  9:23       ` David Woodhouse
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2005-04-28  8:29 UTC (permalink / raw)
  To: Petr Baudis; +Cc: H. Peter Anvin, Git Mailing List

On Thu, 2005-04-28 at 10:10 +0200, Petr Baudis wrote:
>   Linus                     ASM (Anonymous Subsystem Maintainer)
> 
>     |------------------------.
>    A|                        |B
>     |                        |
>     |                        \-------------\
>     |                        :             |
>     \------------------------\             |E
>    C|                        |D            |
>     |                        /-------------/
>     |                        |F
>     /------------------------/
> 
> How would you show that? F E D C B A? F D C A E B?

Let us assume that C and A were already in Linus' tree (and on our web
page) yesterday. Thus, they should be last. The newly-pulled stuff
should be first -- FEDBCA.

I'd say "depth-first, remote parent first" but that would actually
show 'A' (as a parent of D) long before it shows C. Walking of remote
parents should stop as soon as we hit a commit which was accessible
through a more local parent, rather than as soon as we hit a commit
which we've already printed. Maybe it should be something like depth-
first, local parent first, but _reversed_?

The latter is what the mailing list feeder does, but that has the
advantage of being able to use 'rev-tree $today ^$yesterday' so we
_know_ we're excluding the ones people have already seen. Hence I
haven't really paid that much attention to getting the order strictly
correct.

(Yes, I know that strictly speaking, git has no concept of 'remote' or
'local' parents. But the ordering of the two parents in a Cogito merge
or pull hasn't changed, has it?)
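The two walks David contrasts can be tried on Petr's diagram. The Python sketch below is an illustration, not the feeder's actual code: the dictionary encoding of the diagram, the extra merge commit `M` (standing for Linus's tree after pulling F on top of C), and the local-first ordering of each parent list are all assumptions. The first function reproduces the problem described above (A comes out before C); the second takes the reverse post-order of a local-parent-first walk, which is one reading of "depth-first, local parent first, but _reversed_".

```python
# One reading of Petr's diagram (hypothetical encoding).
# Parent lists are assumed to be ordered local-first, remote-second.
# "M" is a hypothetical merge: Linus pulling F on top of C.
PARENTS = {
    "M": ["C", "F"],
    "F": ["D", "E"],
    "E": ["B"],
    "D": ["B", "A"],
    "C": ["A"],
    "B": [],
    "A": [],
}


def preorder_remote_first(parents, head):
    """Depth-first, remote parent first, printing each commit on first visit."""
    out, seen, stack = [], set(), [head]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        out.append(commit)
        # Local parent is pushed first, so the remote (last) parent is popped next.
        stack.extend(parents[commit])
    return out


def reversed_postorder_local_first(parents, head):
    """Depth-first, local parent first; emit commits in reverse post-order."""
    post, seen = [], {head}
    stack = [(head, iter(parents[head]))]
    while stack:
        commit, pending = stack[-1]
        for p in pending:
            if p not in seen:
                seen.add(p)
                stack.append((p, iter(parents[p])))
                break
        else:
            # All parents finished: this commit is done.
            post.append(commit)
            stack.pop()
    return post[::-1]
```

With `M` at the top, the first walk gives M F E B D A C, while the second gives M F E D B C A, i.e. the FEDBCA order David asks for. Both depend on the local-first parent convention, which the rest of the thread disputes.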

-- 
dwmw2




* Re: kernel.org now has gitweb installed
  2005-04-28  8:29     ` David Woodhouse
@ 2005-04-28  9:23       ` David Woodhouse
  2005-04-28 18:55         ` Linus Torvalds
  0 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2005-04-28  9:23 UTC (permalink / raw)
  To: Petr Baudis; +Cc: H. Peter Anvin, Git Mailing List

On Thu, 2005-04-28 at 09:29 +0100, David Woodhouse wrote:
> Let us assume that C and A were already in Linus' tree (and on our web
> page) yesterday. Thus, they should be last. The newly-pulled stuff
> should be first -- FEDBCA.
> 
> I'd say "depth-first, remote parent first" but that would actually
> show 'A' (as a parent of D) long before it shows C. Walking of remote
> parents should stop as soon as we hit a commit which was accessible
> through a more local parent, rather than as soon as we hit a commit
> which we've already printed.

Walk the tree once. For each commit, count the number of _children_.
That's not hard -- each new commit you find below HEAD has one child to
start with, then you increment that figure by one each time you find
another path to the same commit.

When printing, you walk the tree depth-first, remote-parent-first. If
you hit a commit with multiple children, decrement its count by one. If
the count is still non-zero, ignore that commit (and its parents) and
continue. If the count _is_ zero, then this is the "most local" path to
the commit in question, so print it and continue to process its
parents...

(Actually I'd probably do it by adding real pointers to the children
instead of using a counter. Operations like convert-cache would be far
better off working that way round, and 'cg comments' is going to need to
do something very similar to convert-cache.)
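The two passes can be sketched in Python as follows. This is a hypothetical illustration using the same reading of Petr's diagram (an assumed head merge `M`, parent lists ordered local-first), and it uses a counter rather than the child pointers David says he would prefer. A commit is emitted only when the last path to it is walked, so every commit appears after all of its children.

```python
# Hypothetical encoding of Petr's diagram; parents listed local-first.
PARENTS = {
    "M": ["C", "F"],
    "F": ["D", "E"],
    "E": ["B"],
    "D": ["B", "A"],
    "C": ["A"],
    "B": [],
    "A": [],
}


def count_children(parents, head):
    """Pass 1: for every commit reachable from head, count its children."""
    counts = {head: 0}
    stack = [head]
    while stack:
        commit = stack.pop()
        for p in parents[commit]:
            if p in counts:
                counts[p] += 1  # another path to an already-found commit
            else:
                counts[p] = 1   # newly found: one child so far
                stack.append(p)
    return counts


def child_count_order(parents, head):
    """Pass 2: depth-first, remote parent first; a commit is printed only
    when its count drops to zero, i.e. on the last path that reaches it."""
    remaining = count_children(parents, head)
    remaining[head] = 1  # so the initial arrival at head brings it to zero
    out, stack = [], [head]
    while stack:
        commit = stack.pop()
        remaining[commit] -= 1
        if remaining[commit] > 0:
            continue  # other children still pending: skip it and its parents
        out.append(commit)
        stack.extend(parents[commit])  # remote (last) parent ends up on top
    return out
```

On this graph the output is M F E D B C A, matching the order David wants; the counter is what stops A from being printed on the way down from D.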

-- 
dwmw2




* Re: kernel.org now has gitweb installed
  2005-04-28  9:23       ` David Woodhouse
@ 2005-04-28 18:55         ` Linus Torvalds
  2005-04-28 21:20           ` David Woodhouse
  2005-04-28 21:21           ` Junio C Hamano
  0 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-04-28 18:55 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Petr Baudis, H. Peter Anvin, Git Mailing List



On Thu, 28 Apr 2005, David Woodhouse wrote:
> 
> Walk the tree once. For each commit, count the number of _children_.
> That's not hard -- each new commit you find below HEAD has one child to
> start with, then you increment that figure by one each time you find
> another path to the same commit.
> 
> When printing, you walk the tree depth-first, remote-parent-first.

No, that really sucks. 

Realize that "remote" and "local" parents don't really exist. They have no 
meaning. I've considered sorting the parents by the sha1 name, but I've 
left that for now.

Anyway, the reason remote and local don't matter is that if somebody else
merges with me, and I just pull the result without having any changes in 
my tree, we just "fast-forward" to that other side, because otherwise you 
can never "converge" on anything (people merging each other's trees would 
always create a new commit, for no good reason).

What does that mean? It means that my local tree now became the _remote_ 
parent, even though it was always local to my tree.

So if you look at remote vs local, you're _guaranteed_ to mess up. It has 
no meaning.

So what you can do is:
 - if there is one parent, just always walk straight down
 - if it's a merge, add the parents _in_date_order_ to the list of things 
   to do, and then pop the most recent one.

Really. You say that dates don't matter, but they _do_ actually matter a
lot more than "remote/local" does. At least they have meaning.
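Linus's rule amounts to a priority queue keyed on commit date. The Python sketch below applies the "pop the most recent" step to every commit rather than only to merges, a simplification of "just always walk straight down"; the graph encoding (Petr's diagram with an assumed head merge `M`) and the timestamps are hypothetical.

```python
import heapq

# Hypothetical encoding of Petr's diagram.
PARENTS = {
    "M": ["C", "F"],
    "F": ["D", "E"],
    "E": ["B"],
    "D": ["B", "A"],
    "C": ["A"],
    "B": [],
    "A": [],
}
# Hypothetical commit timestamps (larger means more recent).
DATES = {"A": 1, "B": 2, "E": 3, "D": 4, "C": 5, "F": 6, "M": 7}


def date_order(parents, dates, head):
    """Keep a to-do list ordered by date and always pop the most recent."""
    out = []
    seen = {head}
    todo = [(-dates[head], head)]  # heapq is a min-heap, so negate the date
    while todo:
        _, commit = heapq.heappop(todo)
        out.append(commit)
        for p in parents[commit]:
            if p not in seen:  # queue each commit once, however many children
                seen.add(p)
                heapq.heappush(todo, (-dates[p], p))
    return out
```

With B and E dated before C, the result is M F C D E B A: C, already in Linus's tree, is interleaved with the newly pulled commits, which is the effect David objected to at the start of the thread.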

		Linus


* Re: kernel.org now has gitweb installed
  2005-04-28 18:55         ` Linus Torvalds
@ 2005-04-28 21:20           ` David Woodhouse
  2005-04-28 21:40             ` Linus Torvalds
                               ` (2 more replies)
  2005-04-28 21:21           ` Junio C Hamano
  1 sibling, 3 replies; 23+ messages in thread
From: David Woodhouse @ 2005-04-28 21:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, H. Peter Anvin, Git Mailing List

On Thu, 2005-04-28 at 11:55 -0700, Linus Torvalds wrote:
> Anyway, the reason remote and local don't matter is that if somebody else
> merges with me, and I just pull the result without having any changes in 
> my tree, we just "fast-forward" to that other side, because otherwise you 
> can never "converge" on anything (people merging each other's trees would 
> always create a new commit, for no good reason).
> 
> What does that mean? It means that my local tree now became the _remote_ 
> parent, even though it was always local to my tree.

Hmm, that's true; albeit unfortunate. 

Still, using the date isn't any better. It'll give results which are
about as random as just sorting by the sha1 of each parent.

Yes, the ordering of the parents in a merge is probably meaningless in
the general case, but so is the date.

The best we could probably do, from a theoretical standpoint, is to look
at the paths via each parent to a common ancestor, and look at how many
of the commits on each path were done by the same committer. Even that
isn't ideal, and it's probably fairly expensive -- but it's pointless to
pretend we can infer anything from _either_ the dates or the ordering of
the parents in a merge.

-- 
dwmw2



* Re: kernel.org now has gitweb installed
  2005-04-28 18:55         ` Linus Torvalds
  2005-04-28 21:20           ` David Woodhouse
@ 2005-04-28 21:21           ` Junio C Hamano
  2005-04-28 21:23             ` David Woodhouse
                               ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Junio C Hamano @ 2005-04-28 21:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Woodhouse, Petr Baudis, H. Peter Anvin, Git Mailing List

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> So what you can do is:
LT>  - if there is one parent, just always walk straight down
LT>  - if it's a merge, add the parents _in_date_order_ to the list of things 
LT>    to do, and then pop the most recent one.
LT> Really. You say that dates don't matter, but they _do_ actually matter a
LT> lot more than "remote/local" does. At least they have meaning.

On a related topic, I have two questions on commit objects.

1. Currently, commit-tree does not seem to verify that all its
   parent SHA1's actually name valid commit objects.  Is this
   intentional?

I cannot see a good practical reason to commit a new version
that claims to be a descendant of some SHA1 you know exists in
somebody else's tree, without actually having that object also
in your SHA1_FILE_DIRECTORY.  Otherwise how did you merge with
it in the first place?  For that reason, I expect the answer to
this question to be "no it was just being lazy.  Go ahead if you
really care."

2. Assuming that we do want to enforce that parent fields of a
   commit object name valid commit objects, is it OK to also
   require that the commit timestamp of a child object is not in
   the future relative to any and all of its parent commit
   objects (I'm talking about the timestamp of committer field
   not author field, although your e-mail patch acceptance
   procedure seems to be giving it the same timestamp right
   now)?

I have been wondering if imposing these two requirements has some
negative effects, but I do not offhand see any.  And these
requirements may make implementation of git log viewer simpler
when the user specifies "I want to view commits between these
ones---give me a linearized list of commits."  When following
the ancestor chain from the current top, we can immediately stop
upon seeing a commit made before the timestamp of the named
bottom one.



* Re: kernel.org now has gitweb installed
  2005-04-28 21:21           ` Junio C Hamano
@ 2005-04-28 21:23             ` David Woodhouse
  2005-04-28 21:44               ` Junio C Hamano
  2005-04-28 22:59               ` Gerhard Schrenk
  2005-04-28 21:38             ` David Woodhouse
  2005-04-28 21:44             ` Linus Torvalds
  2 siblings, 2 replies; 23+ messages in thread
From: David Woodhouse @ 2005-04-28 21:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Petr Baudis, H. Peter Anvin, Git Mailing List

On Thu, 2005-04-28 at 14:21 -0700, Junio C Hamano wrote:
> 2. Assuming that we do want to enforce that parent fields of a
>    commit object name valid commit objects, is it OK to also
>    require that the commit timestamp of a child object is not in
>    the future relative to any and all of its parent commit
>    objects

No. Time is utterly meaningless -- it's perfectly normal for clocks to
be out of sync. We really don't want to fall into the trap of assigning
any meaning to the timestamp.

-- 
dwmw2



* Re: kernel.org now has gitweb installed
  2005-04-28 21:21           ` Junio C Hamano
  2005-04-28 21:23             ` David Woodhouse
@ 2005-04-28 21:38             ` David Woodhouse
  2005-04-28 21:49               ` Junio C Hamano
  2005-04-28 21:44             ` Linus Torvalds
  2 siblings, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2005-04-28 21:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Petr Baudis, H. Peter Anvin, Git Mailing List

On Thu, 2005-04-28 at 14:21 -0700, Junio C Hamano wrote:
> "I want to view commits between these ones---give me a linearized list
> of commits."  When following the ancestor chain from the current top,
> we can immediately stop upon seeing a commit made before the timestamp
> of the named bottom one.

This absolutely must not be timestamp based. If I ask for a list of
commits between 2.6.12-rc3 and 2.6.12-rc4 I _really_ want to see those
commits which happened before 2.6.12-rc3 but in a remote tree which was
only later pulled. That's what 'rev-tree AAAAAA ^BBBBBB' already gives
you.
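David's objection can be checked on a toy history. In the hypothetical graph below, `old` was committed before `rc3` in a remote tree but merged only afterwards. `between` mimics the set semantics of 'rev-tree AAAAAA ^BBBBBB' (reachable from the top, minus everything reachable from the bottom), not its output format; `between_by_timestamp` is the early-stop shortcut Junio proposed. All names and dates are assumptions.

```python
# Hypothetical history: "old" predates rc3 but was merged after it.
PARENTS = {
    "rc4": ["merge"],
    "merge": ["rc3", "old"],
    "rc3": ["base"],
    "old": ["base"],
    "base": [],
}
DATES = {"base": 1, "old": 2, "rc3": 3, "merge": 4, "rc4": 5}


def reachable(parents, start):
    """Every commit reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def between(parents, top, bottom):
    """Set semantics of 'rev-tree TOP ^BOTTOM'."""
    return reachable(parents, top) - reachable(parents, bottom)


def between_by_timestamp(parents, dates, top, bottom):
    """The rejected shortcut: stop walking at anything not newer than bottom."""
    cutoff = dates[bottom]
    seen, stack = set(), [top]
    while stack:
        commit = stack.pop()
        if commit in seen or dates[commit] <= cutoff:
            continue  # stop following this line of ancestry
        seen.add(commit)
        stack.extend(parents[commit])
    return seen
```

The reachability version returns rc4, merge and old; the timestamp cutoff silently drops `old`.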

-- 
dwmw2



* Re: kernel.org now has gitweb installed
  2005-04-28 21:20           ` David Woodhouse
@ 2005-04-28 21:40             ` Linus Torvalds
  2005-04-28 21:47               ` David Woodhouse
  2005-04-28 21:50             ` H. Peter Anvin
  2005-04-28 21:52             ` H. Peter Anvin
  2 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2005-04-28 21:40 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Petr Baudis, H. Peter Anvin, Git Mailing List



On Thu, 28 Apr 2005, David Woodhouse wrote:
> 
> Still, using the date isn't any better. It'll give results which are
> about as random as just sorting by the sha1 of each parent.

Well, it does use real information, and it is repeatable. And I don't see 
why you say that the date is meaningless, when it clearly isn't. The date 
absolutely does have meaning. 

Not having a global clock doesn't mean that clocks go away. It just means 
that they don't generate a total sort. They still generate a _partial_ 
sort, though, and it's a very valid partial sort.

The fact is, this is how the world works in real life too. Relativity 
doesn't make time "pointless". You still have "before" and "after" for 
almost all relevant events. The fact that not _all_ events can be sorted 
by "before" and "after", and different observers can disagree about some 
of the ordering does not mean that causality has gone away and that time 
is meaningless.

The same is true in a distributed system. Time still exists, and is still 
meaningful even outside the direct "causality" links implied by the 
parents. People probably discussed things, and there are methods of 
communication other than just direct parent links, and while you're not 
_guaranteed_ that "before" and "after" always makes sense, they definitely 
still exist 99% of the time.

> The best we could probably do, from a theoretical standpoint, is to look
> at the paths via each parent to a common ancestor, and look at how many
> of the commits on each path were done by the same committer.

That's quite expensive. 

> Even that isn't ideal, and it's probably fairly expensive -- but it's
> pointless to pretend we can infer anything from _either_ the dates or
> the ordering of the parents in a merge.

Wrong. The date _does_ have meaning. It shows which of the parents was 
more recent, which indirectly is a hint about which side had more activity 
going on. 

In other words, it _is_ meaningful. Maybe it's a _statistical_ meaning 
("that side is probably the active one, because it has the last commit"), 
but it's a meaning.

		Linus


* Re: kernel.org now has gitweb installed
  2005-04-28 21:21           ` Junio C Hamano
  2005-04-28 21:23             ` David Woodhouse
  2005-04-28 21:38             ` David Woodhouse
@ 2005-04-28 21:44             ` Linus Torvalds
  2 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-04-28 21:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: David Woodhouse, Petr Baudis, H. Peter Anvin, Git Mailing List



On Thu, 28 Apr 2005, Junio C Hamano wrote:
> 
> 1. Currently, commit-tree does not seem to verify that all its
>    parent SHA1's actually name valid commit objects.  Is this
>    intentional?

No. Me lazy. I think we should check as many _cheap_ things as possible,
and checking whether a parent at least superficially looks like a real
commit object is certainly cheap.
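As an illustration of how cheap such a check can be: a loose object is stored zlib-deflated as a `<type> <size>` header, a NUL byte, and the payload, and a commit payload starts with a `tree` line. The Python function below is a hypothetical sketch, not commit-tree's code; it takes the raw bytes of a loose object file and does not verify the SHA1.

```python
import zlib


def looks_like_commit(loose_object: bytes) -> bool:
    """Superficial check that a loose object file holds a commit."""
    try:
        raw = zlib.decompress(loose_object)
    except zlib.error:
        return False  # not even valid zlib data
    header, nul, payload = raw.partition(b"\0")
    if not nul:
        return False  # no NUL terminating the header
    kind, _, size = header.partition(b" ")
    return (
        kind == b"commit"
        and size.isdigit()
        and int(size) == len(payload)
        and payload.startswith(b"tree ")
    )
```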

> 2. Assuming that we do want to enforce that parent fields of a
>    commit object name valid commit objects, is it OK to also
>    require that the commit timestamp of a child object is not in
>    the future relative to any and all of its parent commit
>    objects (I'm talking about the timestamp of committer field
>    not author field, although your e-mail patch acceptance
>    procedure seems to be giving it the same timestamp right
>    now)?

No, this is not ok. Clock skew is real, and somebody may have a 
misconfigured machine. Being careful about integrity is good, but trying 
to enforce time flow in a distributed environment is just being anal.

Maybe a warning.
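A warning rather than a refusal could look like the following hypothetical Python sketch, where committer timestamps are plain integers (for example seconds since the epoch) and parents are keyed by name:

```python
import sys


def skewed_parents(child_time, parent_times):
    """Parents whose committer timestamp is later than the child's."""
    return sorted(p for p, t in parent_times.items() if t > child_time)


def warn_on_skew(child_time, parent_times):
    """Warn on stderr but never reject: clock skew is real."""
    skewed = skewed_parents(child_time, parent_times)
    for p in skewed:
        print(f"warning: parent {p} is dated after this commit (clock skew?)",
              file=sys.stderr)
    return skewed
```

The commit is written either way; the caller only learns which parents look skewed.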

		Linus


* Re: kernel.org now has gitweb installed
  2005-04-28 21:23             ` David Woodhouse
@ 2005-04-28 21:44               ` Junio C Hamano
  2005-04-28 22:04                 ` Linus Torvalds
  2005-04-28 22:59               ` Gerhard Schrenk
  1 sibling, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2005-04-28 21:44 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, Petr Baudis, H. Peter Anvin, Git Mailing List

>>>>> "DW" == David Woodhouse <dwmw2@infradead.org> writes:

DW> On Thu, 2005-04-28 at 14:21 -0700, Junio C Hamano wrote:
>> 2. Assuming that we do want to enforce that parent fields of a
>> commit object name valid commit objects, is it OK to also
>> require that the commit timestamp of a child object is not in
>> the future relative to any and all of its parent commit
>> objects

DW> No. Time is utterly meaningless -- it's perfectly normal for clocks to
DW> be out of sync. We really don't want to fall into the trap of assigning
DW> any meaning to the timestamp.

If that is really the case, shouldn't we do one of the
following:

 (1) Timestamp is meaningless.  Stop recording it in the commit
     objects.

 (2) Keep recording meaningless timestamp in the commit objects,
     because otherwise it would break backward compatibility.
     However, stop looking at timestamp in commit.c; especially
     pop_most_recent_commit() is meaningless, hence so is what
     rev-list does.

 (3) Require the proper ordering in the timestamp as I
     suggested.  Users should take note and make corrective
     action if their clocks are _way_ out of sync.

I do not think we want to do either (1) or (2).



* Re: kernel.org now has gitweb installed
  2005-04-28 21:40             ` Linus Torvalds
@ 2005-04-28 21:47               ` David Woodhouse
  0 siblings, 0 replies; 23+ messages in thread
From: David Woodhouse @ 2005-04-28 21:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, H. Peter Anvin, Git Mailing List

On Thu, 2005-04-28 at 14:40 -0700, Linus Torvalds wrote:
> Wrong. The date _does_ have meaning. It shows which of the parents was 
> more recent, which indirectly is a hint about which side had more activity 
> going on. 
> 
> In other words, it _is_ meaningful. Maybe it's a _statistical_ meaning 
> ("that side is probably the active one, because it has the last commit"), 
> but it's a meaning.

It's not entirely clear what 'active' is supposed to be useful for in
this instance. You could just as well count the commits between the
merge and the common ancestor, if you want to see which side was most
_active_ -- but that isn't helpful for deciding the order in which
'cg-log' should show commits.

What you really want there is 'local' vs. 'remote', because people want
to see the order in which changesets arrived in the _local_ repository
-- if the last thing you did was pull from me, people want all my
changesets to be at the top; regardless of who last committed to their
tree before the merge -- i.e. regardless of whether I did a last-minute
commit before you pulled, or whether you'd done another commit to your
tree immediately before pulling.

As you rightly point out, the local/remote information isn't really
available in an easy form -- certainly not from the ordering of the
parents in a merge commit. But let's not fool ourselves that we can
piece it together from the date either.

OK, the date _is_ meaningful in a way, but only in the same way that the
author's name and IRC address information is meaningful. Of course we
didn't include it for _nothing_, but it's outside the scope of git
itself; it isn't part of the useful information which git should care
about.

-- 
dwmw2



* Re: kernel.org now has gitweb installed
  2005-04-28 21:38             ` David Woodhouse
@ 2005-04-28 21:49               ` Junio C Hamano
  0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2005-04-28 21:49 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, Petr Baudis, H. Peter Anvin, Git Mailing List

>>>>> "DW" == David Woodhouse <dwmw2@infradead.org> writes:

DW> On Thu, 2005-04-28 at 14:21 -0700, Junio C Hamano wrote:
>> "I want to view commits between these ones---give me a linearized list
>> of commits."  When following the ancestor chain from the current top,
>> we can immediately stop upon seeing a commit made before the timestamp
>> of the named bottom one.

DW> This absolutely must not be timestamp based. If I ask for a list of
DW> commits between 2.6.12-rc3 and 2.6.12-rc4 I _really_ want to see those
DW> commits which happened before 2.6.12-rc3 but in a remote tree which was
DW> only later pulled. That's what 'rev-tree AAAAAA ^BBBBBB' already gives
DW> you.

How true.  I stand corrected.



* Re: kernel.org now has gitweb installed
  2005-04-28 21:20           ` David Woodhouse
  2005-04-28 21:40             ` Linus Torvalds
@ 2005-04-28 21:50             ` H. Peter Anvin
  2005-04-28 21:52             ` H. Peter Anvin
  2 siblings, 0 replies; 23+ messages in thread
From: H. Peter Anvin @ 2005-04-28 21:50 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

David Woodhouse wrote:
> 
> Hmm, that's true; albeit unfortunate. 
> 
> Still, using the date isn't any better. It'll give results which are
> about as random as just sorting by the sha1 of each parent.
> 
> Yes, the ordering of the parents in a merge is probably meaningless in
> the general case, but so is the date.
> 
> The best we could probably do, from a theoretical standpoint, is to look
> at the paths via each parent to a common ancestor, and look at how many
> of the commits on each path were done by the same committer. Even that
> isn't ideal, and it's probably fairly expensive -- but it's pointless to
> pretend we can infer anything from _either_ the dates or the ordering of
> the parents in a merge.
> 

Perhaps the right thing to do is to draw a graph instead?

	-hpa


* Re: kernel.org now has gitweb installed
  2005-04-28 21:20           ` David Woodhouse
  2005-04-28 21:40             ` Linus Torvalds
  2005-04-28 21:50             ` H. Peter Anvin
@ 2005-04-28 21:52             ` H. Peter Anvin
  2005-04-28 22:12               ` Linus Torvalds
  2005-04-28 22:12               ` David Woodhouse
  2 siblings, 2 replies; 23+ messages in thread
From: H. Peter Anvin @ 2005-04-28 21:52 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

David Woodhouse wrote:
> 
> Hmm, that's true; albeit unfortunate. 
> 
> Still, using the date isn't any better. It'll give results which are
> about as random as just sorting by the sha1 of each parent.
> 
> Yes, the ordering of the parents in a merge is probably meaningless in
> the general case, but so is the date.
> 
> The best we could probably do, from a theoretical standpoint, is to look
> at the paths via each parent to a common ancestor, and look at how many
> of the commits on each path were done by the same committer. Even that
> isn't ideal, and it's probably fairly expensive -- but it's pointless to
> pretend we can infer anything from _either_ the dates or the ordering of
> the parents in a merge.
> 

I thought about this for a few seconds (I really should do that more 
often...) and realized what it is you want: you want a primary search 
criterion which is "when did event X become visible to me", where "me" 
in this case is the web tool.  That is not repository information, but 
it is perfectly possible for the webtool to be aware of what it has 
previously seen and when.

And yes, this ordering is clearly different for each observer.
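HPA's observer-dependent ordering can be sketched in Python as follows. The class and method names are hypothetical, the refresh times are supplied by the caller (in practice, whenever the mirror runs), and the graph is the same assumed encoding of Petr's diagram. Because each refresh walks everything newly reachable, a commit already recorded can be skipped together with its ancestors, the same pruning as 'rev-tree HEAD ^LASTSEEN'.

```python
# Hypothetical encoding of Petr's diagram, with an assumed head merge "M".
PARENTS = {
    "M": ["C", "F"],
    "F": ["D", "E"],
    "E": ["B"],
    "D": ["B", "A"],
    "C": ["A"],
    "B": [],
    "A": [],
}


class Observer:
    """Remembers when each commit first became visible to this observer."""

    def __init__(self):
        self.first_seen = {}

    def refresh(self, parents, head, now):
        stack = [head]
        while stack:
            commit = stack.pop()
            if commit in self.first_seen:
                continue  # seen before, so its ancestors were recorded then too
            self.first_seen[commit] = now
            stack.extend(parents[commit])

    def newest_first(self):
        return sorted(self.first_seen, key=self.first_seen.__getitem__,
                      reverse=True)
```

After seeing C on day 1 and M on day 2 (for example `web.refresh(PARENTS, "C", now=1)` then `web.refresh(PARENTS, "M", now=2)`), the five commits that arrived on day 2 sort ahead of C and A, whatever their commit dates; an observer refreshing at other times would get a different order, as HPA notes.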

	-hpa


* Re: kernel.org now has gitweb installed
  2005-04-28 21:44               ` Junio C Hamano
@ 2005-04-28 22:04                 ` Linus Torvalds
  0 siblings, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-04-28 22:04 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: David Woodhouse, Petr Baudis, H. Peter Anvin, Git Mailing List



On Thu, 28 Apr 2005, Junio C Hamano wrote:
> 
> If that is really the case, shouldn't we do one of the
> following:

No. It's not the case that time-stamps are meaningless. 

The thing about distributed stuff is that time gets "fuzzy". It doesn't go 
away. It's still very valid to say "this was done yesterday".

But what gets fuzzy is "before" and "after". For two reasons:

 - time isn't synchronized, and clocks can be off. Usually by just a 
   little bit, but sometimes you'll find just plain badly maintained 
   machines, and time can be a year or two off.

   Ergo: time is a _hint_. It's usually a pretty good hint, but it's a 
   hint.

 - the "parent" relationship is the only "hard" before/after thing that 
   git knows about, but it ignores a lot of real-world interaction, so
   thinking that it is the _only_ before/after measure is ignoring all the
   other communication in a system.

   So parenthood guarantees that something happened "before", but _not_ 
   being directly related doesn't mean that they were totally independent. 
   There's no fixed "speed of light" that defines some absolute "cone of 
   reachability".

So time is relevant, but it's more of a hint than anything absolute. 
Anything that -depends- on time is a bug waiting to happen, but something 
that uses time to visualize things makes sense.

The big advantage with time is that it's cheap. If you want to do a full 
reachability analysis, you have to look at the whole revision tree. That's 
quite possible RIGHT NOW, but it simply will not be practical in a year, 
when we have 15,000 commits.

So "time" ends up being an approximation for "doing it right".

As an example: it's quite expensive to ask "was this commit part of 
2.6.12-rc3?" because that involves knowing the whole set of commits 
involved in 2.6.12-rc3. Which in turn involves walking the whole revision 
tree starting at 2.6.12-rc3 downwards. 

That's exactly what "rev-tree" does, though. "rev-tree" will do the whole
reachability thing, and as a result you can see whether something was in
2.6.12-rc3 or not. But just for fun - time how long it takes for
"rev-tree" to output its first entry, and how long it takes for "rev-list"
to print its first line. 

Hint: do it with a cold-cache "sparse" tree. "rev-list" will start
outputting data immediately, and work it out as it goes along. "rev-tree"  
will think for some time, and then blast the data out.

In other words: rev-list is what you want for something like "git log", 
because you care about _latency_ of the result.

And that's why it uses time. It's an approximation, but it is an 
approximation that has meaning in real life.
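The latency difference can be illustrated with a small Python sketch. `CountingGraph` is a hypothetical stand-in that counts parent lookups as a proxy for reading objects from a cold cache; `batch_by_date` resembles rev-tree in computing the whole reachable set before printing, while `stream_by_date` resembles rev-list in yielding the most recent pending commit as soon as it is known.

```python
import heapq


class CountingGraph:
    """Commit graph that counts parent lookups (a proxy for object reads)."""

    def __init__(self, parents, dates):
        self.parents, self.dates, self.lookups = parents, dates, 0

    def parents_of(self, commit):
        self.lookups += 1
        return self.parents[commit]


def batch_by_date(graph, head):
    """rev-tree style: find the whole reachable set, then output it."""
    seen, stack = {head}, [head]
    while stack:
        for p in graph.parents_of(stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return sorted(seen, key=graph.dates.__getitem__, reverse=True)


def stream_by_date(graph, head):
    """rev-list style: yield the most recent pending commit immediately."""
    seen, todo = {head}, [(-graph.dates[head], head)]
    while todo:
        _, commit = heapq.heappop(todo)
        yield commit
        for p in graph.parents_of(commit):
            if p not in seen:
                seen.add(p)
                heapq.heappush(todo, (-graph.dates[p], p))
```

On a linear history of 10,000 hypothetical commits, the first streamed commit is available after zero lookups, whereas the batch version performs all 10,000 before returning anything.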

		Linus


* Re: kernel.org now has gitweb installed
  2005-04-28 21:52             ` H. Peter Anvin
@ 2005-04-28 22:12               ` Linus Torvalds
  2005-04-28 22:12               ` David Woodhouse
  1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-04-28 22:12 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: David Woodhouse, Petr Baudis, Git Mailing List



On Thu, 28 Apr 2005, H. Peter Anvin wrote:
>
> I thought about this for a few seconds (I really should do that more 
> often...) and realized what it is you want: you want a primary search 
> criterion which is "when did event X become visible to me", where "me" 
> in this case is the web tool.  That is not repository information, but 
> it is perfectly possible for the webtool to be aware of what it has 
> previously seen and when.

This is exactly what rev-tree does, and how things like the commit emails 
happen.

The problem is that since it's observer-dependent, it's not generally very 
useful for something like a web interface. You really don't want to keep 
track of what everybody has seen ;)

What you _can_ try to keep track of is what some "special observer" has
seen. That's really quite complicated too, but if you do a web interface,
the "special observer" is yourself. Then at every time you mirror the
thing, you need to remember what your "last view" was, and you base your
"new view" on the fact that you know what you saw last time, so you know
which things are new to _you_.
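One way to picture this per-observer bookkeeping is the hypothetical `Observer` class below. It is a sketch, not anything git provides. Each frontend records the local time at which it first saw each commit and orders by that. Two instances that refresh at different moments end up with different orderings for the same repository, which is the drawback described in the next paragraph.

```python
from typing import Dict, Iterable, List

class Observer:
    """Hypothetical 'special observer' (say, one web frontend) that remembers
    the local time at which it first saw each commit."""

    def __init__(self) -> None:
        self.first_seen: Dict[str, float] = {}

    def record(self, visible: Iterable[str], now: float) -> List[str]:
        """Note every commit in `visible` not seen before; return them sorted."""
        new = sorted(c for c in set(visible) if c not in self.first_seen)
        for c in new:
            self.first_seen[c] = now
        return new

    def newest_first(self) -> List[str]:
        """Order by when *this* observer first saw each commit (ties by id)."""
        return sorted(self.first_seen, key=lambda c: (-self.first_seen[c], c))
```

An observer that refreshed at times 10 and 20 lists `z` first, while one that only refreshed at 15 sees all three commits arrive together, so the two views disagree.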

But it really means that each web interface ends up showing quite
_different_ information, and the particular information you show ends up
being dependent on when you started looking at the tree (and how often you
re-generate new views).

This really is why "time" is interesting. Because it's simple, and 
observers can agree about it (not because the time was the same, but 
because each observer just agrees that time is "whatever was reported as 
the local time at the point the action happened").

		Linus

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel.org now has gitweb installed
  2005-04-28 21:52             ` H. Peter Anvin
  2005-04-28 22:12               ` Linus Torvalds
@ 2005-04-28 22:12               ` David Woodhouse
  2005-04-29  2:46                 ` Jan Harkes
  1 sibling, 1 reply; 23+ messages in thread
From: David Woodhouse @ 2005-04-28 22:12 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

On Thu, 2005-04-28 at 14:52 -0700, H. Peter Anvin wrote:
> I thought about this for a few seconds (I really should do that more 
> often...) and realized what it is you want: you want a primary search 
> criterion which is "when did event X become visible to me", where "me" 
> in this case is the web tool.  That is not repository information, but 
> it is perfectly possible for the webtool to be aware of what it has 
> previously seen and when.
> 
> And yes, this ordering is clearly different for each observer.

The mailing list does it with tags -- it remembers the 'last seen
commit' and then effectively does 'rev-tree HEAD ^LASTSEEN', except that
I make a primitive attempt to get the ordering a little better than what
I get from rev-tree. But since the mailing list runs are hourly, I
really can get away with a _primitive_ attempt. That's why I hadn't
noticed the local/remote ordering problem that Linus pointed out.
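The 'rev-tree HEAD ^LASTSEEN' selection amounts to a set difference of two ancestor sets. The Python sketch below assumes a hypothetical `parents` dict (commit id to parent ids). It is not rev-tree's actual code, but it shows why the cost includes walking LASTSEEN's entire history.

```python
from typing import Dict, List, Set

# Hypothetical commit graph: commit id -> parent ids.
Parents = Dict[str, List[str]]

def ancestors(parents: Parents, start: str) -> Set[str]:
    """Every commit reachable from `start`, including `start` itself."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def new_since(parents: Parents, head: str, last_seen: str) -> Set[str]:
    """Roughly what 'rev-tree HEAD ^LASTSEEN' selects: reachable from head
    but not from last_seen. Both histories are walked back to the root."""
    return ancestors(parents, head) - ancestors(parents, last_seen)
```

After mailing the result, the caller would record `head` as the new LASTSEEN (here, in a tag). The returned set carries no ordering of its own, which is the problem this thread is about.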

It's not clear how you'd attempt to track local history for the general
case though -- the whole concept of a 'local' branch being special is
anathema to git. You'd have to hack it into some auxiliary storage, as I
do with tags -- but to get a fully correct ordering it'd have to track
at least every locally-performed merge, and you really don't want to be
doing that kind of thing.

You might perhaps attempt to find a path through the graph which takes
in as many commits as possible where committer == `logname`@`hostname`
-- but as Linus and I already said, that's expensive.

I'm not entirely sure what the answer is; but it isn't parent ordering
and it isn't dates.

Using dates might be a nice quick approximation, but that really isn't
good enough. 

I wonder if we could try to enforce some meaning for dates though....
Currently, 'rev-tree AAAA ^BBBB' has to build the _entire_ tree for BBBB
back to the beginning, so it knows where to stop when following AAAA. 

However, if we _do_ take Junio's suggestion of enforcing monotonicity,
then we'll always know that the parents of a given commit will have a
timestamp which is older than its own timestamp. 

So given the task "list commits between 2.6.12-rc3 and 2.6.12-rc4" we
could look at the timestamp of rc3, and immediately follow the rc4
parents until we start seeing commits which are older than rc3. Then
each time we hit a commit in the parents of rc4 which is older than rc3
is, we continue doing a breadth-first search from rc3 until all the
parents we're looking at are older than the parent of rc4 which we're
currently considering. Etc. 

That means that the common case of "in A but not in B" can at least be
handled relatively efficiently without having to wait while it tracks
the history all the way back to the beginning. I still don't like it
much though...
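Assuming the monotonicity rule (every parent strictly older than its child), the search described above can be written as a single walk ordered by timestamp that starts from both ends at once. The Python sketch below is one reading of the idea, not code from the thread. The `graph` layout and the function name `in_a_not_b` are hypothetical. Commits reached from B are flagged, and each flag is passed on to the commit's parents. The walk stops as soon as every queued commit is flagged, so B's history is followed back only as far as the oldest commit that A still needs.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical commit graph: id -> (timestamp, parent ids). Assumes strict
# monotonicity: every parent's timestamp is lower than its child's.
Graph = Dict[str, Tuple[int, List[str]]]

def in_a_not_b(graph: Graph, a: str, b: str) -> List[str]:
    """Commits reachable from a but not from b, newest first."""
    # True means "known to be reachable from b" (uninteresting).
    reached_from_b: Dict[str, bool] = {a: False, b: True}
    heap = [(-graph[c][0], c) for c in reached_from_b]
    heapq.heapify(heap)  # min-heap on negated time: newest commit pops first
    result: List[str] = []
    # Once every queued commit is reachable from b, everything older is too.
    while heap and not all(reached_from_b[c] for _, c in heap):
        _, commit = heapq.heappop(heap)
        flag = reached_from_b[commit]
        if not flag:
            result.append(commit)
        for parent in graph[commit][1]:
            if parent not in reached_from_b:
                reached_from_b[parent] = flag
                heapq.heappush(heap, (-graph[parent][0], parent))
            elif flag:
                # Parents are older, so this one is still queued, not emitted.
                reached_from_b[parent] = True
    return result
```

In a small example with `rc4` merging an old side branch, the walk returns the new commits and stops without expanding the root. The `all(...)` check rescans the queue on every step, which is acceptable for a sketch. If a clock violates the monotonicity assumption, a commit can be emitted before the walk learns that it is reachable from B, so the output can be wrong.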

-- 
dwmw2


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel.org now has gitweb installed
  2005-04-28 21:23             ` David Woodhouse
  2005-04-28 21:44               ` Junio C Hamano
@ 2005-04-28 22:59               ` Gerhard Schrenk
  1 sibling, 0 replies; 23+ messages in thread
From: Gerhard Schrenk @ 2005-04-28 22:59 UTC (permalink / raw)
  To: Git Mailing List

* David Woodhouse <dwmw2@infradead.org> [2005-04-28 23:23]:
 
> No. Time is utterly meaningless -- 

This is fundamentally wrong. Space-time and causality have a *very*
important meaning. If you don't use this information (directly or
indirectly) in your data model or history graph, you do something very
stupid. You simply won't optimize for the common case because you won't
scale with the fundamental physical laws of information exchange and
synchronisation; you just kind of break space-time symmetry. Ever
compared Feynman diagrams to merge diagrams?

> it's perfectly normal for clocks to be out of sync.

Yes, even special relativity just boils down to "there is no absolute
simultaneity". So what?

I'll predict if you break causality your kernel will suddenly
destabilize and explode like a nuclear bomb ;-)

Gerhard

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kernel.org now has gitweb installed
  2005-04-28 22:12               ` David Woodhouse
@ 2005-04-29  2:46                 ` Jan Harkes
  0 siblings, 0 replies; 23+ messages in thread
From: Jan Harkes @ 2005-04-29  2:46 UTC (permalink / raw)
  To: Git Mailing List

On Thu, Apr 28, 2005 at 11:12:52PM +0100, David Woodhouse wrote:
> You might perhaps attempt to find a path through the graph which takes
> in as many commits as possible where committer == `logname`@`hostname`
> -- but as Linus and I already said, that's expensive.
> 
> I'm not entirely sure what the answer is; but it isn't parent ordering
> and it isn't dates.

Perhaps a Lamport clock?

Jan
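Jan's suggestion could be read as giving each commit a Lamport-style logical clock: one more than the largest value among its parents, with root commits at 1. The sketch below assumes a hypothetical `parents` dict (commit id to parent ids, with every commit present as a key) and computes such numbers without recursion. Unlike wall-clock dates, these numbers satisfy the monotonicity rule discussed earlier by construction, whatever the committers' clocks said. They still say nothing about when a commit became visible to a particular observer.

```python
from typing import Dict, List

# Hypothetical commit graph: commit id -> parent ids (every commit is a key).
Parents = Dict[str, List[str]]

def lamport_numbers(parents: Parents) -> Dict[str, int]:
    """Number each commit 1 + max(parent numbers); roots get 1."""
    number: Dict[str, int] = {}
    for start in parents:
        stack = [start]
        while stack:
            c = stack[-1]
            if c in number:  # already numbered via another path
                stack.pop()
                continue
            pending = [p for p in parents[c] if p not in number]
            if pending:
                stack.extend(pending)  # number the parents first
            else:
                number[c] = 1 + max((number[p] for p in parents[c]), default=0)
                stack.pop()
    return number
```

Because a child's number always exceeds its parents', a walk ordered by these numbers could use the same early cutoff as a timestamp walk without trusting anyone's clock.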

^ permalink raw reply	[flat|nested] 23+ messages in thread


Thread overview: 23+ messages
2005-04-28  1:38 kernel.org now has gitweb installed H. Peter Anvin
2005-04-28  4:17 ` Daniel Jacobowitz
2005-04-28  7:35 ` David Woodhouse
2005-04-28  8:10   ` Petr Baudis
2005-04-28  8:29     ` David Woodhouse
2005-04-28  9:23       ` David Woodhouse
2005-04-28 18:55         ` Linus Torvalds
2005-04-28 21:20           ` David Woodhouse
2005-04-28 21:40             ` Linus Torvalds
2005-04-28 21:47               ` David Woodhouse
2005-04-28 21:50             ` H. Peter Anvin
2005-04-28 21:52             ` H. Peter Anvin
2005-04-28 22:12               ` Linus Torvalds
2005-04-28 22:12               ` David Woodhouse
2005-04-29  2:46                 ` Jan Harkes
2005-04-28 21:21           ` Junio C Hamano
2005-04-28 21:23             ` David Woodhouse
2005-04-28 21:44               ` Junio C Hamano
2005-04-28 22:04                 ` Linus Torvalds
2005-04-28 22:59               ` Gerhard Schrenk
2005-04-28 21:38             ` David Woodhouse
2005-04-28 21:49               ` Junio C Hamano
2005-04-28 21:44             ` Linus Torvalds
