Git development


* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Linus Torvalds @ 2006-10-20 23:59 UTC (permalink / raw)
  To: Jeff Licquia; +Cc: Jan Hudec, bazaar-ng, git, Jakub Narebski
In-Reply-To: <1161382416.9241.19.camel@localhost.localdomain>

On Fri, 20 Oct 2006, Jeff Licquia wrote:
> 
> After this conflict is resolved, merging from b causes conflicts, while
> merging from c appears to work fine.  This continues until b merges from
> a (and resolves a conflict in a similar manner to a), at which time
> merging/pulling works as you'd expect between the branches.  Whenever b
> is marked as conflicting before it merges from a, bzr preserves b's
> changes by moving b's modified file.

This sounds somewhat like what I think BK did. I'm not sure if BK actually 
marked it as a conflict or whether BK just warned about "changes to 
deleted file" or something similar, but it didn't entirely _silently_ 
throw them away.

But I hope this shows some of the basic problems.

The much more _serious_ problem of "file identity" tracking is actually 
that you can't track partial file movement or file copies sanely. The 
thing is, tracking things at file boundaries simply is fundamentally a 
broken notion, simply because _code_ doesn't get done at file boundaries.

Both of these are things that git can actually do. Admittedly it does not do 
that in any _released_ version, so you'd have to work with the development 
branch, and it's a fairly early thing, but currently it can actually 
notice that our "revision.c" file largely came from the "rev-list.c" file 
that still exists!

And btw, that's not just some random feature that happened to get 
implemented last week. Yes, it actually _did_ get implemented last week, 
but this was something I outlined when I started git in April of last 
year, and tried to explain to people WHY TRACKING FILE ID'S ARE WRONG!

You can find me explaining these things to people in April-2005, which 
should tell you something: the initial revision of "git" was on Thursday, 
April 7. So the lack of file identity tracking has been controversial from 
the very beginning, but I was right then, and I'm right now.

Because the _fact_ is, that as long as you track stuff on a file basis, 
you're _never_ going to be able to do the things that git already does, 
and that are very natural.

Here's the real-world example of something that git CAN DO TODAY:

 - we used to have a file called "rev-list.c", which did a lot of the 
   commit history revision traversal, and is the source of the git command 
   "git rev-list".

 - I (and others) extended it a lot, and turned it into a more generic 
   library interface, so that other commands could traverse the commit 
   graph on their own, rather than forking and executing "git-rev-list" 
   and piping the output between them.

 - as a result, the old "rev-list.c" still exists (except it was renamed 
   to "builtin-rev-list.c" since it's now a builtin command to the main 
   "git" binary). 

 - HOWEVER, a lot of the actual code got split into the library file, 
   called "revision.c", which contains the real smarts of the program.

See? There was a file rename involved (rev-list.c => builtin-rev-list.c), 
but that actually happened after a lot of the really _interesting_ code 
had been excised from that file, and put into the new internal library 
file (revision.c).

Now, as a result, in many ways the rename is _much_ less interesting than 
the question about the history of the code in "revision.c" (because that's 
really some very core code). And that was never a rename at all. That was 
just a file create, where a lot of the contents happened to come from a 
file that continued to exist.

Wouldn't you want "annotate" to be able to follow this kind of data 
movement? Notice how there is no "file" that moved at all. Only code that 
moved between files.

I tell you: as long as you work with "file ID's", you'll always be 
inferior. You'll never be able to see that some code was copied 
_partially_ from one file into another. You'll never be able to see an 
important function moving between file boundaries.

Unless you work with "git", that is. Because git isn't so _stupid_ as to 
think that file boundaries matter. Git knows better. The only thing that 
matters is the actual _data_, and file boundaries are just one way of 
delimiting that data.

Just try it out. Get the "next" branch of the git repository (that's the 
"stable development" branch in git.git, i.e. it's going to be in the next 
release and is expected to work, unlike some of the more "experimental 
development" that is in the "pu" branch - pu = proposed updates), compile 
it, and run

	git pickaxe -C revision.c | less -S

and marvel. Marvel at my shining intelligence (and the small matter of 
programming, which was all done by Junio, but I'm taking all the credit 
_anyway_, because *dammit* I talked about this last year when people 
didn't understand! And besides, I always take all the credit regardless, 
so what are you whining about? Get off my back!).

More seriously, Junio really did a kick-ass job. I really had nothing at 
all to do with it, and deserve no real credit. But I _did_ foresee it, and 
yes, it really is about the fact that git tracks _contents_.

As somebody smarter than me has said (*): "I'm always right, but this time 
I'm even more right than usual".

			Linus

(*) Just kidding. It was me. Of course.

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Aaron Bentley @ 2006-10-20 23:33 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Jakub Narebski, bazaar-ng, git
In-Reply-To: <20061020224030.GL20017@pasky.or.cz>

[-- Attachment #1: Type: text/plain, Size: 2835 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Petr Baudis wrote:
> Dear diary, on Fri, Oct 20, 2006 at 05:34:39PM CEST, I got a letter
> where Aaron Bentley <aaron.bentley@utoronto.ca> said that...
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Jakub Narebski wrote:
>>> Aaron Bentley wrote:
>>>> In Bazaar bundles, the text of the diff is an integral part of the data.
>>>> It is used to generate the text of all the files in the revision.
>>>
>>> I thought that the diff was combined diff of changes.
>> It is.  It's a description of how to produce revision X given revision
>> Y, where Y is the last-merged mainline revision.
> 
> Aha, so by default a bundle can carry just a _single_ revision?

No, bundles contain 1 or more revisions.  They contain all the ancestors
of X that are not ancestors of Y.

Only the diff from Y to X is shown, but the diffs for all other
revisions are present in the MIME-encoded section.

Consider these four revisions in a straight-line ancestry: a, b, c, d.
'a' is a common ancestor.  b, c and d are the revisions that are missing
from the target repository.

A default bundle will contain

metadata for d
diff from a -> d in plaintext
metadata for c
diff from b -> c in MIME encoding
metadata for b
diff from a -> b in MIME encoding

To install b, the diff for a->b is applied to a.  To install c, the diff
for b->c is applied to b.  To install d, the diff for a -> d is applied
to a.

Doing a diff from a -> d instead of from c -> d introduces some
redundancy, of course.  But we do that because we want an overview diff.
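
The layout described above can be modelled in a few lines. This is a hypothetical C sketch, not bzr code: revisions are single characters in a straight-line history, and each entry records which revision its stored diff must be applied to.

```c
#include <assert.h>
#include <stddef.h>

/* One entry in a (hypothetical) model of a default bundle. */
struct bundle_entry {
	char revision;    /* revision this entry installs */
	char diff_base;   /* revision the stored diff applies to */
	int mime_encoded; /* 0 = plaintext overview diff, 1 = MIME section */
};

/* history[0] is the common ancestor the target already has;
 * history[1..n-1] are the missing revisions, oldest first.
 * out must have room for n - 1 entries; they are written newest
 * first, matching the order in the bundle. Returns the count. */
static size_t plan_default_bundle(const char *history, size_t n,
                                  struct bundle_entry *out)
{
	size_t count = 0;

	assert(history != NULL && out != NULL && n >= 2);
	for (size_t i = n - 1; i >= 1; i--) {
		out[count].revision = history[i];
		if (i == n - 1) {
			/* the tip: overview diff against the common ancestor */
			out[count].diff_base = history[0];
			out[count].mime_encoded = 0;
		} else {
			/* everything else: diff against the immediate parent */
			out[count].diff_base = history[i - 1];
			out[count].mime_encoded = 1;
		}
		count++;
	}
	return count;
}
```

For the history a, b, c, d this yields d from a (plaintext), c from b and b from a (both MIME), i.e. the tip is the only revision whose diff skips over intermediate revisions.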

> That doesn't sound right either, because then it wouldn't make sense to
> talk about "combined" or "simple" diffs. So I guess sending a bundle
> really is taking n revisions at your side, bundling them to a single
> diff and when the other side takes it, it will result in a single
> revision?

No, it copies the revisions verbatim, and we are careful to avoid data loss.

> Hmm, but that doesn't sound right either, that's certainly no revolting
> functionality and seems to be in contradiction with previous bundles
> description. But if it doesn't squash the changes, I don't see how the
> combined diff can be integral part of the data. Sorry, I don't get it.

It's because there's no other diff in the bundle that produces 'd'.

>> I've attached an example of what a combined patch-by-patch bundle looks
>> like.
> 
> But that's the one there's no UI to select? Or where is the combined
> diff?

That is the one that doesn't have UI to select it.  I've attached a
normal bundle for comparison.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFOVzR0F+nu1YWqI0RAkACAJ4z2SJZgelZLfhoFKhEZbmvRIXMjACfag+h
6j+5vvIeHt7xMZOvp6CUcPk=
=33G4
-----END PGP SIGNATURE-----

[-- Attachment #2: hello-world-default.patch --]
[-- Type: text/x-patch, Size: 1884 bytes --]

# Bazaar revision bundle v0.8
#
# message:
#   Added 'world'
# committer: Aaron Bentley <abentley@panoramicfeedback.com>
# date: Fri 2006-10-20 11:30:21.903000116 -0400

=== added directory  // file-id:TREE_ROOT
=== added file world // file-id:world-20061020152929-12bknd8mm9mx48as-1
--- /dev/null
+++ world
@@ -0,0 +1,1 @@
+Hello, world

# revision id: abentley@panoramicfeedback.com-20061020153021-b5fcea14e9cd2b34
# sha1: 6d553e72158aaa76c258d98c15cd24922d171cd9
# inventory sha1: 64af82c4d81d9d6ad4f33fc734d32c2a1eaa0df5
# parent ids:
#   abentley@panoramicfeedback.com-20061020152951-10cff5ff5a51e9a2
# base id: null:
# properties:
#   branch-nick: bar

# message:
#   Capitalized
# committer: Aaron Bentley <abentley@panoramicfeedback.com>
# date: Fri 2006-10-20 11:29:51.953999996 -0400

=== modified file world // encoding:base64
LS0tIHdvcmxkCisrKyB3b3JsZApAQCAtMSwxICsxLDEgQEAKLWhlbGxvCitIZWxsbwoK

=== modified directory  // last-changed:abentley@panoramicfeedback.com-20061020152951-10cff5ff5a51e9a2
# revision id: abentley@panoramicfeedback.com-20061020152951-10cff5ff5a51e9a2
# sha1: f7b79934bc3b0a944e35168b5df6b106c5b29ebf
# inventory sha1: 1400d56451752300cc31c9c94ff7ee2188e8ef8c
# parent ids:
#   abentley@panoramicfeedback.com-20061020152935-64bde004f622131f
# properties:
#   branch-nick: bar

# message:
#   initial commit
# committer: Aaron Bentley <abentley@panoramicfeedback.com>
# date: Fri 2006-10-20 11:29:35.536999941 -0400

=== added directory  // file-id:TREE_ROOT
=== added file world // file-id:world-20061020152929-12bknd8mm9mx48as-1 // encoding:base64
LS0tIC9kZXYvbnVsbAorKysgd29ybGQKQEAgLTAsMCArMSwxIEBACitoZWxsbwoK

# revision id: abentley@panoramicfeedback.com-20061020152935-64bde004f622131f
# sha1: 0728f761b891b257f0a71e2e360799eec080cd21
# inventory sha1: e52e030ea40f6bf5da78f4e8eb8efcd072b0930a
# properties:
#   branch-nick: bar
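
The MIME-encoded sections in the bundle above are plain base64. A small hand-rolled decoder (a sketch, not anything taken from bzr) shows that each one is simply a unified diff: the "Capitalized" entry decodes to a diff changing `hello` to `Hello`, and the "initial commit" entry decodes to a diff creating `world` with the line `hello`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 6-bit value of a standard base64 character, or -1 if it is not one. */
static int b64_value(char c)
{
	if (c >= 'A' && c <= 'Z') return c - 'A';
	if (c >= 'a' && c <= 'z') return c - 'a' + 26;
	if (c >= '0' && c <= '9') return c - '0' + 52;
	if (c == '+') return 62;
	if (c == '/') return 63;
	return -1;
}

/* Decode NUL-terminated base64 text into out, which needs room for
 * strlen(in) * 3 / 4 + 1 bytes. Stops at '=' padding or any other
 * non-alphabet character. NUL-terminates out and returns the number
 * of decoded bytes (not counting the terminator). */
static size_t b64_decode(const char *in, unsigned char *out)
{
	unsigned long acc = 0;
	int bits = 0;
	size_t n = 0;

	assert(in != NULL && out != NULL);
	for (; *in; in++) {
		int v = b64_value(*in);
		if (v < 0)
			break;
		/* keep only the low 16 bits; at most 13 are ever pending */
		acc = ((acc << 6) | (unsigned long)v) & 0xFFFFu;
		bits += 6;
		if (bits >= 8) {
			bits -= 8;
			out[n++] = (unsigned char)((acc >> bits) & 0xFFu);
		}
	}
	out[n] = '\0';
	return n;
}
```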

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Jeff Licquia @ 2006-10-20 23:39 UTC (permalink / raw)
  To: Robert Collins; +Cc: bazaar-ng, git
In-Reply-To: <1161386129.13697.63.camel@localhost.localdomain>

On Sat, 2006-10-21 at 09:15 +1000, Robert Collins wrote:
> I meant to add, that I think inference is a great tool to use as an
> adjunct to whatever explicit data one can capture.

If you ask me, that's the most interesting idea in this whole thread.

* Re: [ANNOUNCE] GIT 1.4.3
From: Junio C Hamano @ 2006-10-20 23:35 UTC (permalink / raw)
  To: git; +Cc: linux-kernel
In-Reply-To: <7vejt5xjt9.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

>  - git-diff paginates its output to the tty by default.  If this
>    irritates you, using LESS=RF might help.

I am considering the following to address irritation some people
(including me, actually) are experiencing with this change when
viewing a small (or no) diff.  Any objections?

diff --git a/pager.c b/pager.c
index dcb398d..8bd33a1 100644
--- a/pager.c
+++ b/pager.c
@@ -50,7 +50,7 @@ void setup_pager(void)
 	close(fd[0]);
 	close(fd[1]);
 
-	setenv("LESS", "-RS", 0);
+	setenv("LESS", "FRS", 0);
 	run_pager(pager);
 	die("unable to execute pager '%s'", pager);
 	exit(255);
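
The third argument to setenv() is what keeps this change polite: with overwrite set to 0, a LESS value the user has already exported wins, and "FRS" (less's -F quit-if-one-screen, -R raw control characters, -S chop long lines) is only a default. A minimal C sketch of that behaviour, using a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same call as the patched line: provide git's preferred LESS flags
 * only if LESS is not already set (overwrite == 0), then return the
 * value the pager would actually see. */
static const char *effective_less_flags(void)
{
	int rc = setenv("LESS", "FRS", 0);
	assert(rc == 0);
	return getenv("LESS");
}
```

A user who exports LESS=-X keeps -X; only users with no LESS at all get the quit-if-one-screen behaviour that makes small (or empty) diffs stop paginating.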

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Petr Baudis @ 2006-10-20 23:28 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git, bazaar-ng
In-Reply-To: <ehblrq$lh3$1@sea.gmane.org>

Dear diary, on Sat, Oct 21, 2006 at 01:24:51AM CEST, I got a letter
where Jakub Narebski <jnareb@gmail.com> said that...
> Robert Collins wrote:
> 
> > However, I'm still convinced that tracking the user intention of renames
> > leads to a slicker system than renames via inference.
> 
> Well, there was (abandoned for now) idea of rr2-cache, the cache of how
> renames were resolved during merge conflict resolving.

Is that really relevant? It seems rather like something along the lines of
rerere, which is handy, but only if you are the one who is actually supposed
to have a clue about how it should be resolved; the caches aren't replicated
on clones.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Jakub Narebski @ 2006-10-20 23:24 UTC (permalink / raw)
  To: git; +Cc: bazaar-ng
In-Reply-To: <1161385512.13697.61.camel@localhost.localdomain>

Robert Collins wrote:

> However, I'm still convinced that tracking the user intention of renames
> leads to a slicker system than renames via inference.

Well, there was an idea (abandoned for now) of an rr2-cache, a cache of how
renames were resolved during merge conflict resolution.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: VCS comparison table
From: Junio C Hamano @ 2006-10-20 23:19 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <200610201350.12273.jnareb@gmail.com>

Jakub Narebski <jnareb@gmail.com> writes:

>> The lack of parents ordering in Git is directly connected with
>> fast-forwarding.
>
> There are exactly _two_ places where Git treats first parent specially 
> (correct me if I'm wrong).

I am not bold enough to say _exactly_ N places, but you missed
at least one more important one.  Merge simplification favors
the earlier parents over later ones.

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Robert Collins @ 2006-10-20 23:15 UTC (permalink / raw)
  To: Jeff Licquia; +Cc: bazaar-ng, git
In-Reply-To: <1161385512.13697.61.camel@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

On Sat, 2006-10-21 at 09:05 +1000, Robert Collins wrote:
> On Fri, 2006-10-20 at 18:13 -0400, Jeff Licquia wrote:
> > 
> > All in all, not ideal, but it seems bzr handles this better than bk.
> > Certainly, bzr doesn't silently drop anyone's changes, at least.  I
> > suspect that bzr could improve its handling of this use case, but not,
> > I'm sure, to Linus's specifications; some of the fun and games does
> > seem to come from the use of file IDs. 
...
> However, I'm still convinced that tracking the user intention of renames
> leads to a slicker system than renames via inference. My off the cuff
> list of corner cases is:

I meant to add, that I think inference is a great tool to use as an
adjunct to whatever explicit data one can capture.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Robert Collins @ 2006-10-20 23:05 UTC (permalink / raw)
  To: Jeff Licquia; +Cc: bazaar-ng, git
In-Reply-To: <1161382416.9241.19.camel@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 2583 bytes --]

On Fri, 2006-10-20 at 18:13 -0400, Jeff Licquia wrote:
> 
> All in all, not ideal, but it seems bzr handles this better than bk.
> Certainly, bzr doesn't silently drop anyone's changes, at least.  I
> suspect that bzr could improve its handling of this use case, but not,
> I'm sure, to Linus's specifications; some of the fun and games does
> seem to come from the use of file IDs. 

We have a few features we're focusing on right now, but coming shortly
after them we hope to address parallel imports [which this is a case of]
better than we do now. I have a number of ideas, and I'm sure other devs
do too, about the right way to solve this. Fundamentally, I think using
1-1 mapped path ids [which can be considered a memo of the origin commit
id + path] is not a sufficiently rich representation of what happens to
paths. There is a dual that you can convert to, which is identity via
ancestry traversal: each path has N <= M parent paths in each of M parent
revisions. Our current path ids can only represent the case where, when
you traverse to the start of history, this graph has a single tail (that
is, a single file must start at one and only one place). The graph,
however, is not intrinsically limited in this way: files can split and
join, and we should be able to represent this more fully.
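
The "single tail" restriction can be made concrete with a small sketch. The struct and function below are hypothetical and are not bzr's data model: each node is one path in one revision, linked to the paths it came from, and counting the parentless nodes reachable from a path tells you whether a single-origin file id could describe its history.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 4

/* One path in one revision. Several parents model a join; a node that
 * is the parent of several children models a split. */
struct path_node {
	const char *path;
	size_t nparents;
	struct path_node *parents[MAX_PARENTS];
	int visited;  /* traversal scratch flag; clear before counting */
};

/* Count distinct tails (nodes without parents) reachable from n.
 * A 1-1 file-id model can only describe histories where this is 1. */
static size_t count_tails(struct path_node *n)
{
	size_t tails = 0;

	assert(n != NULL);
	if (n->visited)
		return 0;  /* already counted via another route */
	n->visited = 1;
	if (n->nparents == 0)
		return 1;
	for (size_t i = 0; i < n->nparents; i++)
		tails += count_tails(n->parents[i]);
	return tails;
}
```

A plain rename chain has one tail; a file assembled from two earlier files has two, which is the case a single origin id cannot express.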

I'll happily acknowledge that we don't need fileids per se: tracking
renames can be done without a memo of the origin.

However, I'm still convinced that tracking the user intention of renames
leads to a slicker system than renames via inference. My off the cuff
list of corner cases is:

 - change file, rename: rename the changed file/change the renamed file.
 - change file, remove: conflict on removal/text change
 - add path to dir, rename the dir: move the current contents of the
   directory/add the new path to the renamed directory.
 - move paths out of a directory, rename the directory: leave the paths
   moved out where they were moved to/move the paths from wherever their
   new location is.
 - introduce path A + rename old A to B, change path A: change path
   B/rename A to B and introduce the new A.

All these cases work roughly along the form of 'have two branches, do
one action in one, one in the other: merge other to one/merge one to
other'. I haven't yet seen an inference system get all these right.

There are other, more complex cases, but I think they all boil down to
one of those primitives to all intents and purposes.

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Jeff King @ 2006-10-20 22:59 UTC (permalink / raw)
  To: Jan Hudec; +Cc: bazaar-ng, git, Jakub Narebski
In-Reply-To: <20061020181210.GA29843@artax.karlin.mff.cuni.cz>

On Fri, Oct 20, 2006 at 08:12:10PM +0200, Jan Hudec wrote:

> At this point, I expect the tree to look like this:
> A$ ls -R
> .:
> data/
> data:
> hello.txt
> A$ cat data/hello.txt
> Hello World!

Git does what you expect here.

> A$ VCT mv data greetings
> A$ VCT commit -m "Renamed the data directory to greetings"
> B$ echo "Goodbye World!" > data/goodbye.txt
> B$ VCT add data/goodbye.txt
> B$ VCT commit -m "Added goodbye message."
> A$ VCT merge B
> 
> And now I expect to have tree looking like this:
> 
> A$ ls -R
> .:
> greetings/
> greetings:
> hello.txt
> goodbye.txt

Git does not do what you expect here. It notes that files moved, but it
does not have a concept of directories moving.  Git could, even without
file-ids or special patch types, figure out what happened by noting that
every file in data/ was renamed to its analogue in greetings/, and infer
that previously nonexistent files in data/ should also be moved to
greetings/.
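
The inference described above fits in a few dozen lines of C. This is a hypothetical helper, not something git implemented at the time: it only checks one candidate directory rename (data to greetings) rather than discovering candidates, and it is conservative, declining whenever any file from the old directory stayed behind or went elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One rename recorded on the side that moved the directory. */
struct path_rename {
	const char *from;
	const char *to;
};

/* What follows "dir/" in path, or NULL if path is not inside dir. */
static const char *under_dir(const char *path, const char *dir)
{
	size_t n = strlen(dir);
	if (strncmp(path, dir, n) != 0 || path[n] != '/')
		return NULL;
	return path + n + 1;
}

/* Where 'from' was renamed to, or NULL if it was not renamed. */
static const char *renamed_to(const struct path_rename *r, size_t nr,
                              const char *from)
{
	for (size_t i = 0; i < nr; i++)
		if (strcmp(r[i].from, from) == 0)
			return r[i].to;
	return NULL;
}

/* base: every file in the merge base. If every base file under
 * old_dir/ was renamed to new_dir/ with the same remainder, redirect
 * added_path (created on the other side) into new_dir, write it to
 * out and return 1. Otherwise return 0 and leave the path alone. */
static int infer_dir_rename(const char **base, size_t nbase,
                            const struct path_rename *renames, size_t nr,
                            const char *old_dir, const char *new_dir,
                            const char *added_path, char *out, size_t outsz)
{
	size_t moved = 0;
	const char *rest;
	int w;

	assert(out != NULL && outsz > 0);
	rest = under_dir(added_path, old_dir);
	if (!rest)
		return 0;
	for (size_t i = 0; i < nbase; i++) {
		const char *old_rest = under_dir(base[i], old_dir);
		const char *to, *new_rest;
		if (!old_rest)
			continue;  /* outside old_dir: irrelevant */
		to = renamed_to(renames, nr, base[i]);
		new_rest = to ? under_dir(to, new_dir) : NULL;
		if (!new_rest || strcmp(new_rest, old_rest) != 0)
			return 0;  /* a file stayed behind or went elsewhere */
		moved++;
	}
	if (moved == 0)
		return 0;
	w = snprintf(out, outsz, "%s/%s", new_dir, rest);
	return w > 0 && (size_t)w < outsz;
}
```

Even this small version shows where the "clever and wrong" risk enters: the rule has to decide what counts as "every file", and a single file left behind flips the answer.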

However, I'm not sure that I personally would prefer that behavior. In
some cases you might actually WANT data/goodbye.txt, and in some other
cases a conflict might be more appropriate. In any case, I would rather
the SCM do the simple and predictable thing (which I consider to be
creating data/goodbye.txt) rather than be clever and wrong (even if it's
only wrong a small percentage of the time).

In short, git doesn't do what you expect, but I'm not convinced that
it's a bug or lack of feature, and not simply a difference in desired
behavior.

-Peff

* Re: VCS comparison table
From: Petr Baudis @ 2006-10-20 22:58 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: James Henstridge, bazaar-ng, Linus Torvalds, Andreas Ericsson,
	Carl Worth, git
In-Reply-To: <200610210050.32254.jnareb@gmail.com>

Dear diary, on Sat, Oct 21, 2006 at 12:50:31AM CEST, I got a letter
where Jakub Narebski <jnareb@gmail.com> said that...
> P.S. what Git lacks at least now is a way to generate diff between
> two different local repositories, but you can always setup alternates
> file and fetch the other repository into some tag.

It's not exactly convenient, but you can do

	xpasky@machine[0:0]~/git$ GIT_ALTERNATE_OBJECT_DIRECTORIES=../cogito/.git/objects cg-diff -r `GIT_DIR=../cogito/.git cg-object-id -c HEAD`..HEAD

I don't personally think it's worth a special UI, but there're no
boundaries for initiative... :-)

* Re: VCS comparison table
From: Jakub Narebski @ 2006-10-20 22:50 UTC (permalink / raw)
  To: James Henstridge
  Cc: bazaar-ng, Linus Torvalds, Carl Worth, Andreas Ericsson, git
In-Reply-To: <a7e835d40610200759h49859a20k8a409fe34f68630a@mail.gmail.com>

On 20-10-2006, James Henstridge wrote:
> On 20/10/06, Jakub Narebski <jnareb@gmail.com> wrote:
>> James Henstridge wrote:

>>> With the above layout, I would just type:
>>>     bzr branch http://server/repo/branch1
>>
>> With Cogito (you can think of it either as alternate Git UI, or as SCM
>> built on top of Git) you would use
>>
>>    $ cg clone http://server/repo#branch
>>
>> for example
>>
>>    $ cg clone git://git.kernel.org/pub/scm/git/git.git#next
>>
>> to clone _single_ branch (in bzr terminology, "heavy checkout" of branch).
> 
> My understanding of git is that this would be equivalent to the "bzr
> branch" command.  A checkout (heavy or lightweight) has the property
> that commits are made to the original branch.

Not exactly (my mistake in explaining it). "cg clone git://host/repo#branch"
clones only the part of the history DAG of commits that is reachable from
the given branch. Still, it is a full repository: you can add branches to
it later with cg-branch-add and fetch changes with cg-fetch.
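
The URL#branch notation used in the cg clone examples above can be illustrated with a tiny parser. This is a sketch only (Cogito itself is a set of shell scripts, not this code): it assumes the branch name follows the last '#', and reports no branch when there is none.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "URL#branch" into its halves. Copies the URL part into url_out
 * (size url_sz) and points *branch_out at the branch name, or sets it
 * to NULL if loc has no '#'. Returns 0 on success, -1 if too long. */
static int split_clone_location(const char *loc, char *url_out,
                                size_t url_sz, const char **branch_out)
{
	const char *hash;
	size_t len;

	assert(loc != NULL && url_out != NULL && branch_out != NULL);
	hash = strrchr(loc, '#');
	len = hash ? (size_t)(hash - loc) : strlen(loc);
	if (len + 1 > url_sz)
		return -1;
	memcpy(url_out, loc, len);
	url_out[len] = '\0';
	*branch_out = hash ? hash + 1 : NULL;
	return 0;
}
```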

>> But you can also clone _whole_ repository, _all_ published branches with
>>
>>    $ cg clone git://git.kernel.org/pub/scm/git/git.git
> 
> I suppose that'd be useful if you want a copy of all the branches at
> once.  There is no builtin command in Bazaar to do that at present.

That is _very_ useful, and it is the default for Git. For example, with
the git.git repository I'm interested both in the 'master' branch (main
line of development) and in the 'next' branch (development branch). If I
send some patches based on 'master' and they get accepted into 'next'
instead (to cook for a while, for example), then to do further work in
this direction I have to base my new work on the 'next' branch.

It looks like Bazaar-NG "branches" are the equivalent of a
one-branch clone in Git.

And if there is no command to clone a whole repository, how
do you set up a public repository?

See below.

[...] 
> Two points:
> (1) if we are publishing branches, we wouldn't include working trees
> -- they are not needed to pull or merge from such a branch.

Same with Git. Public repositories are usually "bare" clones, i.e.
without a working directory. We can clone/fetch from a "clothed"
repository without problem; we just have to point to its .git directory.

> (2) if we did have working trees, they'd be rooted at /repo/branch1
> and /repo/branch2 -- not at /repo (since /repo is not a branch).

That explains it.

> In case (2) there is a potential for conflicts if you nest branches,
> but people don't generally trigger this problem with the way they use
> Bazaar.

There is no problem in Git with having a git repository nested within
a working area: of course you had better ignore the .git directory; you
can ignore the files in this embedded repository or not.

[...]
>> How checked out working area looks like in Bazaar-NG?
> 
> The layout of a standalone branch would be:
>   .bzr/repository/ -- storage of trees and metadata
>   .bzr/branch/ -- branch metadata (e.g. pointer to the head revision)
>   .bzr/checkout/ -- working tree book-keeping files
>   source code

For a git repository (a git clone, as it is the equivalent of a bzr branch)
you have the following layout:
  .git/objects/ -- repository objects database
  .git/refs/ -- heads (branches) and tags
  .git/index -- staging area for commit (adding files, merge resolving)
  .git/HEAD -- which branch is current branch
  source code
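
As an illustration of what .git/HEAD holds, here is a small C sketch that extracts the current branch name. It assumes HEAD is stored as a textual symbolic ref of the form `ref: refs/heads/<branch>` rather than as a symlink; the function name is hypothetical, and anything other than that form is simply rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given the contents of .git/HEAD, copy the current branch name
 * (e.g. "next" for "ref: refs/heads/next\n") into out and return 0.
 * Return -1 for anything that is not a symref to refs/heads/, or if
 * the name does not fit in out. */
static int current_branch(const char *head, char *out, size_t outsz)
{
	static const char prefix[] = "ref: refs/heads/";
	size_t plen = sizeof(prefix) - 1;
	size_t len;

	assert(head != NULL && out != NULL);
	if (strncmp(head, prefix, plen) != 0)
		return -1;
	head += plen;
	len = strcspn(head, "\n");  /* name ends at the newline */
	if (len == 0 || len + 1 > outsz)
		return -1;
	memcpy(out, head, len);
	out[len] = '\0';
	return 0;
}
```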

> If we use a shared repository, the contained branches would lack the
> .bzr/repository/ directory.  The parent directory would instead have a
> .bzr/repository/, but usually wouldn't have .bzr/branch/ (unless there
> is a branch rooted at the base of the repository).

The equivalent of a shared repository would be making .git/objects/
a symlink to some directory which serves as a common area to store
the object database.

You can also use the alternates file: .git/objects/info/alternates can
hold a list of absolute pathnames (one per line) where objects can be
found instead. If I understand correctly, new objects get committed to
the current repository's object database, so to get the equivalent of a
symlinked .git/objects directory, every repository that should share the
object database would need an alternates file listing all the other
repositories, i.e. every one except itself.
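
Following that description, the alternates contents for each repository can be generated mechanically. This hypothetical C sketch only builds the file text (one objects directory per line, every repository except the one being written); whether such a ring of mutual alternates is wise in practice, for example when pruning, is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the text of .git/objects/info/alternates for repository number
 * 'self' out of n repositories that should share objects: every other
 * repository's objects directory, one per line. Returns the number of
 * bytes written (NUL-terminated), or -1 if out is too small. */
static int build_alternates(const char **objdirs, size_t n, size_t self,
                            char *out, size_t outsz)
{
	size_t used = 0;

	assert(objdirs != NULL && self < n && out != NULL && outsz > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w;
		if (i == self)
			continue;  /* a repository never lists itself */
		w = snprintf(out + used, outsz - used, "%s\n", objdirs[i]);
		if (w < 0 || (size_t)w >= outsz - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```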

Or you can use the GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable.

A repository using any kind of alternates mechanism is not suitable
for publishing over "dumb" (non-git-aware) transports.

> if we are publishing a branch to a web server, we'd skip the working
> tree, so the source code and .bzr/checkout/ directory would be
> missing.

For a "bare" clone only the source files would be missing. Well, perhaps
also '.git/index', but I'm not sure.

> In the case of a checkout, the .bzr/branch/ directory has a special
> format and acts as a pointer to the original branch.  If the checkout
> is lightweight, the .bzr/repository/ directory would be missing, and
> bzr would need to contact the original branch for the data.

There is no equivalent of a bzr "checkout" (and could you please use
another name for that, like "lazy branch"?) in Git. There was some talk
about how to do "lazy clone"/"remote alternates" in Git, but no consensus
was reached about how to do this efficiently for both "dumb"
(http, https, ftp, rsync) transports and git-aware (local, git, ssh+git)
transports. From what I've read, Bazaar-NG doesn't try the "efficient"
part...

[...]
>> Yes, but using Git that way has serious disadvantages. For example
>> there is only one current branch pointer and only one index (dircache)
>> per git repository.
> 
> Okay.  So using Bazaar terminology, this seems to be an issue of the
> working tree being associated with the repository rather than the
> branch?

From the point of view of Git users, there is (in Bazaar-NG) an issue
of working tree being associated with the individual branch rather than
repository.

In git to work on some project you clone its repository; in bzr to
work on some project you get one of its branches.

IMVHO if "Cheap Branching Anywhere" were changed to "Lightweight Branches",
then Bazaar-NG would have to put "Partial" in there. Unless you set up
your branches to share data, branches are not cheap (in the sense of
disk space). That's probably the cause of the _need_ for "checkouts".
Bazaar-NG doesn't encourage using temporary branches with a lifespan
of no more than a day. Can you even switch between branches using only
one working area, and can you do it fast?

It looks somewhat as if bzr started without permanent branches, and
they were added later (sharing repository data). But I might be mistaken.

P.S. what Git lacks at least now is a way to generate diff between
two different local repositories, but you can always setup alternates
file and fetch the other repository into some tag.
-- 
Jakub Narebski
Poland

* [PATCH 2/2] git-pickaxe: introduce heuristics to avoid "trivial" chunks
From: Junio C Hamano @ 2006-10-20 22:41 UTC (permalink / raw)
  To: git
In-Reply-To: <7v1wp2oi6s.fsf@assigned-by-dhcp.cox.net>

This adds scoring logic to blame_entry to prevent blames on very
trivial chunks (e.g. lots of empty lines, indent followed by a
closing brace) from being passed down to unrelated lines in the
parent.

The current heuristics are quite simple and may need to be
tweaked later, but we need to start from somewhere.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 builtin-pickaxe.c |   36 ++++++++++++++++++++++++++++++++----
 1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/builtin-pickaxe.c b/builtin-pickaxe.c
index 3c73d82..49673a5 100644
--- a/builtin-pickaxe.c
+++ b/builtin-pickaxe.c
@@ -40,6 +40,15 @@ #define PICKAXE_BLAME_MOVE		01
 #define PICKAXE_BLAME_COPY		02
 #define PICKAXE_BLAME_COPY_HARDER	04
 
+/*
+ * blame for a blame_entry with score lower than these thresholds
+ * is not passed to the parent using move/copy logic.
+ */
+static unsigned blame_move_score;
+static unsigned blame_copy_score;
+#define BLAME_DEFAULT_MOVE_SCORE	20
+#define BLAME_DEFAULT_COPY_SCORE	40
+
 /* bits #0..7 in revision.h, #8..11 used for merge_bases() in commit.c */
 #define METAINFO_SHOWN		(1u<<12)
 #define MORE_THAN_ONE_PATH	(1u<<13)
@@ -645,7 +654,8 @@ static int find_move_in_parent(struct sc
 		if (ent->suspect != target || ent->guilty)
 			continue;
 		find_copy_in_blob(sb, ent, parent, split, &file_p);
-		if (split[1].suspect)
+		if (split[1].suspect &&
+		    blame_move_score < ent_score(sb, &split[1]))
 			split_blame(sb, split, ent);
 	}
 	free(blob_p);
@@ -716,7 +726,8 @@ static int find_copy_in_parent(struct sc
 			find_copy_in_blob(sb, ent, norigin, this, &file_p);
 			copy_split_if_better(sb, split, this);
 		}
-		if (split[1].suspect)
+		if (split[1].suspect &&
+		    blame_copy_score < ent_score(sb, &split[1]))
 			split_blame(sb, split, ent);
 	}
 	diff_flush(&diff_opts);
@@ -1177,6 +1188,15 @@ static int has_path_in_work_tree(const c
 	return !lstat(path, &st);
 }
 
+static unsigned parse_score(const char *arg)
+{
+	char *end;
+	unsigned long score = strtoul(arg, &end, 10);
+	if (*end)
+		return 0;
+	return score;
+}
+
 int cmd_pickaxe(int argc, const char **argv, const char *prefix)
 {
 	struct rev_info revs;
@@ -1206,12 +1226,15 @@ int cmd_pickaxe(int argc, const char **a
 			output_option |= OUTPUT_LONG_OBJECT_NAME;
 		else if (!strcmp("-S", arg) && ++i < argc)
 			revs_file = argv[i];
-		else if (!strcmp("-M", arg))
+		else if (!strncmp("-M", arg, 2)) {
 			opt |= PICKAXE_BLAME_MOVE;
-		else if (!strcmp("-C", arg)) {
+			blame_move_score = parse_score(arg+2);
+		}
+		else if (!strncmp("-C", arg, 2)) {
 			if (opt & PICKAXE_BLAME_COPY)
 				opt |= PICKAXE_BLAME_COPY_HARDER;
 			opt |= PICKAXE_BLAME_COPY | PICKAXE_BLAME_MOVE;
+			blame_copy_score = parse_score(arg+2);
 		}
 		else if (!strcmp("-L", arg) && ++i < argc) {
 			char *term;
@@ -1249,6 +1272,11 @@ int cmd_pickaxe(int argc, const char **a
 			argv[unk++] = arg;
 	}
 
+	if (!blame_move_score)
+		blame_move_score = BLAME_DEFAULT_MOVE_SCORE;
+	if (!blame_copy_score)
+		blame_copy_score = BLAME_DEFAULT_COPY_SCORE;
+
 	/* We have collected options unknown to us in argv[1..unk]
 	 * which are to be passed to revision machinery if we are
 	 * going to do the "bottom" procesing.
-- 
1.4.3.ge193
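
The option handling in this patch can be exercised on its own. The sketch below copies parse_score() from the patch and wraps it in a hypothetical move_threshold() helper that applies the same "0 means use the default" fallback; passes_move_threshold() restates the strict comparison used before split_blame().

```c
#include <assert.h>
#include <stdlib.h>

#define BLAME_DEFAULT_MOVE_SCORE 20

/* Same logic as parse_score() in the patch: the digits after the
 * option letter give the threshold; anything non-numeric yields 0. */
static unsigned parse_score(const char *arg)
{
	char *end;
	unsigned long score = strtoul(arg, &end, 10);
	if (*end)
		return 0;
	return (unsigned)score;
}

/* Hypothetical wrapper: threshold that "-M<digits>" ends up with. */
static unsigned move_threshold(const char *opt)
{
	unsigned score;

	assert(opt[0] == '-' && opt[1] == 'M');
	score = parse_score(opt + 2);
	return score ? score : BLAME_DEFAULT_MOVE_SCORE;
}

/* Blame moves to the parent only when the chunk scores strictly
 * above the threshold (blame_move_score < ent_score(...)). */
static int passes_move_threshold(unsigned threshold, unsigned ent_score)
{
	return threshold < ent_score;
}
```

One consequence of using 0 as the "not given" sentinel: `-M0` cannot lower the threshold to zero, it silently falls back to the default, just like a malformed `-Mfoo`.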

* [PATCH 1/2] git-pickaxe: introduce heuristics to "best match" scoring
From: Junio C Hamano @ 2006-10-20 22:41 UTC (permalink / raw)
  To: git
In-Reply-To: <7v1wp2oi6s.fsf@assigned-by-dhcp.cox.net>

Instead of comparing number of lines matched, look at the
matched characters and count alnums, so that we do not pass
blame on not-so-interesting lines, such as empty lines and lines
that are indentation with closing brace.

Add an option --score-debug to show the score of each
blame_entry while we cook this further on the "next" branch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 * This comes on top of "next".  The next one makes output from
   "pickaxe -C commit" actually make sense.

 builtin-pickaxe.c |   71 +++++++++++++++++++++++++++++++++++-----------------
 1 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/builtin-pickaxe.c b/builtin-pickaxe.c
index 74c7c9a..3c73d82 100644
--- a/builtin-pickaxe.c
+++ b/builtin-pickaxe.c
@@ -34,8 +34,7 @@ static int longest_file;
 static int longest_author;
 static int max_orig_digits;
 static int max_digits;
-
-#define DEBUG 0
+static int max_score_digits;
 
 #define PICKAXE_BLAME_MOVE		01
 #define PICKAXE_BLAME_COPY		02
@@ -78,6 +77,11 @@ struct blame_entry {
 	 * suspect's file; internally all line numbers are 0 based.
 	 */
 	int s_lno;
+
+	/* how significant this entry is -- cached to avoid
+	 * scanning the lines over and over
+	 */
+	unsigned score;
 };
 
 struct scoreboard {
@@ -215,9 +219,6 @@ static void process_u_diff(void *state_,
 	struct chunk *chunk;
 	int off1, off2, len1, len2, num;
 
-	if (DEBUG)
-		fprintf(stderr, "%.*s", (int) len, line);
-
 	num = state->ret->num;
 	if (len < 4 || line[0] != '@' || line[1] != '@') {
 		if (state->hunk_in_pre_context && line[0] == ' ')
@@ -295,10 +296,6 @@ static struct patch *get_patch(struct or
 	char *blob_p, *blob_o;
 	struct patch *patch;
 
-	if (DEBUG) fprintf(stderr, "get patch %.8s %.8s\n",
-			   sha1_to_hex(parent->commit->object.sha1),
-			   sha1_to_hex(origin->commit->object.sha1));
-
 	blob_p = read_sha1_file(parent->blob_sha1, type,
 				(unsigned long *) &file_p.size);
 	blob_o = read_sha1_file(origin->blob_sha1, type,
@@ -352,6 +349,7 @@ static void dup_entry(struct blame_entry
 	memcpy(dst, src, sizeof(*src));
 	dst->prev = p;
 	dst->next = n;
+	dst->score = 0;
 }
 
 static const char *nth_line(struct scoreboard *sb, int lno)
@@ -448,7 +446,7 @@ static void split_blame(struct scoreboar
 		add_blame_entry(sb, new_entry);
 	}
 
-	if (DEBUG) {
+	if (1) { /* sanity */
 		struct blame_entry *ent;
 		int lno = 0, corrupt = 0;
 
@@ -530,12 +528,6 @@ static int pass_blame_to_parent(struct s
 	for (i = 0; i < patch->num; i++) {
 		struct chunk *chunk = &patch->chunks[i];
 
-		if (DEBUG)
-			fprintf(stderr,
-				"plno = %d, tlno = %d, "
-				"same as parent up to %d, resync %d and %d\n",
-				plno, tlno,
-				chunk->same, chunk->p_next, chunk->t_next);
 		blame_chunk(sb, tlno, plno, chunk->same, target, parent);
 		plno = chunk->p_next;
 		tlno = chunk->t_next;
@@ -547,14 +539,37 @@ static int pass_blame_to_parent(struct s
 	return 0;
 }
 
-static void copy_split_if_better(struct blame_entry best_so_far[3],
+static unsigned ent_score(struct scoreboard *sb, struct blame_entry *e)
+{
+	unsigned score;
+	const char *cp, *ep;
+
+	if (e->score)
+		return e->score;
+
+	score = 0;
+	cp = nth_line(sb, e->lno);
+	ep = nth_line(sb, e->lno + e->num_lines);
+	while (cp < ep) {
+		unsigned ch = *((unsigned char *)cp);
+		if (isalnum(ch))
+			score++;
+		cp++;
+	}
+	e->score = score;
+	return score;
+}
+
+static void copy_split_if_better(struct scoreboard *sb,
+				 struct blame_entry best_so_far[3],
 				 struct blame_entry this[3])
 {
 	if (!this[1].suspect)
 		return;
-	if (best_so_far[1].suspect &&
-	    (this[1].num_lines < best_so_far[1].num_lines))
-		return;
+	if (best_so_far[1].suspect) {
+		if (ent_score(sb, &this[1]) < ent_score(sb, &best_so_far[1]))
+			return;
+	}
 	memcpy(best_so_far, this, sizeof(struct blame_entry [3]));
 }
 
@@ -596,7 +611,7 @@ static void find_copy_in_blob(struct sco
 				      tlno + ent->s_lno, plno,
 				      chunk->same + ent->s_lno,
 				      parent);
-			copy_split_if_better(split, this);
+			copy_split_if_better(sb, split, this);
 		}
 		plno = chunk->p_next;
 		tlno = chunk->t_next;
@@ -699,7 +714,7 @@ static int find_copy_in_parent(struct sc
 				continue;
 			}
 			find_copy_in_blob(sb, ent, norigin, this, &file_p);
-			copy_split_if_better(split, this);
+			copy_split_if_better(sb, split, this);
 		}
 		if (split[1].suspect)
 			split_blame(sb, split, ent);
@@ -944,6 +959,7 @@ #define OUTPUT_RAW_TIMESTAMP	004
 #define OUTPUT_PORCELAIN	010
 #define OUTPUT_SHOW_NAME	020
 #define OUTPUT_SHOW_NUMBER	040
+#define OUTPUT_SHOW_SCORE      0100
 
 static void emit_porcelain(struct scoreboard *sb, struct blame_entry *ent)
 {
@@ -1016,6 +1032,8 @@ static void emit_other(struct scoreboard
 					   show_raw_time),
 			       ent->lno + 1 + cnt);
 		else {
+			if (opt & OUTPUT_SHOW_SCORE)
+				printf(" %*d", max_score_digits, ent->score);
 			if (opt & OUTPUT_SHOW_NAME)
 				printf(" %-*.*s", longest_file, longest_file,
 				       suspect->path);
@@ -1060,8 +1078,9 @@ static void output(struct scoreboard *sb
 	for (ent = sb->ent; ent; ent = ent->next) {
 		if (option & OUTPUT_PORCELAIN)
 			emit_porcelain(sb, ent);
-		else
+		else {
 			emit_other(sb, ent, option);
+		}
 	}
 }
 
@@ -1118,6 +1137,7 @@ static void find_alignment(struct scoreb
 {
 	int longest_src_lines = 0;
 	int longest_dst_lines = 0;
+	unsigned largest_score = 0;
 	struct blame_entry *e;
 
 	for (e = sb->ent; e; e = e->next) {
@@ -1143,9 +1163,12 @@ static void find_alignment(struct scoreb
 		num = e->lno + e->num_lines;
 		if (longest_dst_lines < num)
 			longest_dst_lines = num;
+		if (largest_score < ent_score(sb, e))
+			largest_score = ent_score(sb, e);
 	}
 	max_orig_digits = lineno_width(longest_src_lines);
 	max_digits = lineno_width(longest_dst_lines);
+	max_score_digits = lineno_width(largest_score);
 }
 
 static int has_path_in_work_tree(const char *path)
@@ -1206,6 +1229,8 @@ int cmd_pickaxe(int argc, const char **a
 				tmp = top; top = bottom; bottom = tmp;
 			}
 		}
+		else if (!strcmp("--score-debug", arg))
+			output_option |= OUTPUT_SHOW_SCORE;
 		else if (!strcmp("-f", arg) ||
 			 !strcmp("--show-name", arg))
 			output_option |= OUTPUT_SHOW_NAME;
-- 
1.4.3.ge193

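The scoring change can be tried out in isolation. This is a sketch, not the patch's code. alnum_score() and replaces_best() are hypothetical names, and the scoreboard, nth_line() and the cached e->score field are left out.

```c
#include <ctype.h>

/* Score a span of text the way the patch's ent_score() does: count the
 * alphanumeric bytes and ignore everything else, so blank lines and
 * lines holding only indentation and braces contribute nothing. */
static unsigned alnum_score(const char *begin, const char *end)
{
	unsigned score = 0;
	const char *cp;

	for (cp = begin; cp < end; cp++)
		if (isalnum((unsigned char)*cp))
			score++;
	return score;
}

/* Tie-breaking as in copy_split_if_better(), assuming a best candidate
 * already exists: the new candidate is rejected only when its score is
 * strictly lower than the best so far. */
static int replaces_best(unsigned candidate, unsigned best_so_far)
{
	return candidate >= best_so_far;
}
```

Under the old line-count rule, a three-line span like "}\n\n}\n" would beat a one-line "return err;\n". Under alnum scoring the braces score 0 against 9, so blame is no longer passed on the strength of braces and blank lines.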

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Petr Baudis @ 2006-10-20 22:40 UTC (permalink / raw)
  To: Aaron Bentley; +Cc: Jakub Narebski, bazaar-ng, git
In-Reply-To: <4538EC8F.7020502@utoronto.ca>

Dear diary, on Fri, Oct 20, 2006 at 05:34:39PM CEST, I got a letter
where Aaron Bentley <aaron.bentley@utoronto.ca> said that...
> Jakub Narebski wrote:
> > Aaron Bentley wrote:
> >>In Bazaar bundles, the text of the diff is an integral part of the data.
> >> It is used to generate the text of all the files in the revision.
> > 
> > 
> > I thought that the diff was combined diff of changes.
> 
> It is.  It's a description of how to produce revision X given revision
> Y, where Y is the last-merged mainline revision.

Aha, so by default a bundle can carry just a _single_ revision?

That doesn't sound right either, because then it wouldn't make sense to
talk about "combined" or "simple" diffs. So I guess sending a bundle
really is taking n revisions on your side, bundling them into a single
diff, and when the other side takes it, it will result in a single
revision? That is basically what our merge --squash does.

Hmm, but that doesn't sound right either: that's certainly no revolutionary
functionality, and it seems to contradict the previous description of
bundles. But if it doesn't squash the changes, I don't see how the
combined diff can be an integral part of the data. Sorry, I don't get it.

> The bundle format can also support sending a single bundle that
> displays the series of patches, though there's currently no UI to select
> this.
..snip..
> > I was under an impression that user sees only mega-patch of all the
> > revisions in bundle together, and rest is for machine consumption only.
> 
> All of it is for machine consumption.  The MIME-encoded sections are a
> series of patches.  They're usually MIME-encoded to avoid confusion with
> the overview patch, but this is optional.
> 
> I've attached an example of what a combined patch-by-patch bundle looks
> like.

But that's the one there's no UI to select? Or where is the combined
diff?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)

* StGIT and rerere
From: Robin Rosenberg @ 2006-10-20 22:39 UTC (permalink / raw)
  To: git

Hi,

It seems stgit does not use git-rerere. Why not? Is there any reason other than
that it hasn't been done yet?

I abuse stgit heavily by frequently reordering patches, which for some patches
results in recurring conflicts. git-rerere seems to be the solution.

What are the "rules" for when to invoke rerere? It seems to be mostly
automatic in git, but since only the porcelainish commands use it, StGIT
doesn't.

So here is what I *think* needs to be done. Seems simple enough.

stg push, stg pick, stg import, stg goto, stg fold, stg float
	do what push does and invoke git-rerere at the end, whether or not the
	push ends with conflicts

stg pop
	nothing, or do I need to remove rr-cache/MERGE_RR, like git-reset does?

stg status --reset, stg push --undo
	remove rr-cache/MERGE_RR ?

stg refresh
	do what stgit does normally and then invoke git-rerere

stg resolved:
	do what stgit does normally and then invoke git-rerere

stg clean, stg delete:
	remove rr-cache/MERGE_RR ?

Anything else, or any comments on this list?

-- robin

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Jeff Licquia @ 2006-10-20 22:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jan Hudec, bazaar-ng, git, Jakub Narebski
In-Reply-To: <Pine.LNX.4.64.0610201133260.3962@g5.osdl.org>

On Fri, 2006-10-20 at 11:48 -0700, Linus Torvalds wrote:
> Here's a real-life schenario that we hit several times with BK over the 
> years:
> 
>  - take a real repository, and a patch that gets discussed that adds a new 
>    file.
>  - take two different people applying that patch to their trees (or, do 
>    the equivalent thing, which is to just create the same filename
>    independently, because the solution is obvious - and the same - to 
>    both developers).
>  - now, have somebody merge both of those two peoples trees (eg me)
>  - have the two people continue to use their trees, modifying it, and 
>    getting merged.
> 
> Trust me, this isn't even _unlikely_. It happens. And it's a serious 
> problem for a file-ID case. Why? Because you have two different file ID's 
> for the same pathname. 

I tried this to see what bzr would do.  Here's the critical point where
the first merges are done ("a" is mainline, "b" and "c" are external
branches being merged into "a").

---
jeff@lsblap:~/tmp/linus-file-id/a$ bzr pull ../b
All changes applied successfully.
1 revision(s) pulled.
jeff@lsblap:~/tmp/linus-file-id/a$ bzr pull ../c
bzr: ERROR: These branches have diverged.  Use the merge command to reconcile them.
jeff@lsblap:~/tmp/linus-file-id/a$ bzr merge ../c
Conflict adding file file2.  Moved existing file to file2.moved.
1 conflicts encountered.
jeff@lsblap:~/tmp/linus-file-id/a$ bzr status
added:
  file2
renamed:
  file2 => file2.moved
conflicts:
  Conflict adding file file2.  Moved existing file to file2.moved.
pending merges:
  Jeff Licquia 2006-10-20 commit c of file2
---

file2 and file2.moved have identical contents at this point.  I fixed it
by deleting file2.moved, running "bzr resolve file2", and committing.

After this conflict is resolved, merging from b causes conflicts, while
merging from c appears to work fine.  This continues until b merges from
a (and resolves a conflict in a similar manner to a), at which time
merging/pulling works as you'd expect between the branches.  Whenever b
is marked as conflicting before it merges from a, bzr preserves b's
changes by moving b's modified file.

All in all, not ideal, but it seems bzr handles this better than bk.
Certainly, bzr doesn't silently drop anyone's changes, at least.  I
suspect that bzr could improve its handling of this use case, but not,
I'm sure, to Linus's specifications; some of the fun and games do seem
to come from the use of file IDs.

* Re: VCS comparison table
From: Carl Worth @ 2006-10-20 21:48 UTC (permalink / raw)
  To: Aaron Bentley
  Cc: Linus Torvalds, Jakub Narebski, Andreas Ericsson, bazaar-ng, git
In-Reply-To: <45382120.9060702@utoronto.ca>

On Thu, 19 Oct 2006 21:06:40 -0400, Aaron Bentley wrote:
> I understand your argument now.

Well, I'm glad to know we each feel like we are communicating at
times, here.

>                                  It's nothing to do with numbers per se,
> and all about per-branch namespaces.  Correct?

The entire discussion is about how to name things in a distributed
system. The premise that Linus has put forth in a very compelling way
is that attempting to use sequential numbers for names in a
distributed system will break down. The breakdown could be that the
names are not stable, or that the system is used in a centralized way
to avoid the instability of the names.

Now, that causality might not accurately describe the way bzr has
developed. It may be that the centralization bias was determined by
other reasons, and that given those, using sequential numbers for
names makes perfect sense.

But it really is fundamental and unavoidable that sequential numbers
don't work as names in a distributed version control system.

> I meant that the active branch and a mirror of the abandoned branch
> could be stored in the same repository, for ease of access.

Granted, everything can be stored in one repository. But that still
doesn't change what I was trying to say with my example. One of the
repositories would "win" (the names it published during the fork would
still be valid). And the other repository would "lose" (the names it
published would be not valid anymore). Right?

Now, maybe there's some "simple" mapping from old names to new names
for the losing repository (something like adding a "losers/" prefix to
the beginning of the names, or a "15." prefix, or whatever). The point
is that the old names are
invalidated. And there's no way to guarantee this kind of change won't
happen in the future, (no matter how old a project is).

I constructed that example to show that the naming has a social impact
in forcing a distinction between winners and losers in the merge, (or
mainline and side branch, or whatever you want to name the
distinction). The two re-joining projects could be really amiable,
create a new virgin mainline and treat both histories as side
branches. In this version, everyone loses as all the old names are
invalidated.

> Bazaar encourages you to stick lots and lots of branches in your
> repository.  They don't even have to be related.  For example, my repo
> contains branches of bzr, bzrtools, Meld, and BazaarInspect.

Git allows this just fine, and lots of branches belonging to a single
project is definitely the common usage. It is not common (nor
encouraged) for unrelated projects to share a repository, since a git
clone will fetch every branch in the repository. It is common for a
single base URL to provide a common basis for a hierarchy of git
repositories (see, for example, http://repo.or.cz/), and that may
provide similar benefits.

I'm noticing another terminology conflict here. The notion of "branch"
in bzr is obviously very different than in git. For example the bzr
man page has a sentence beginning with "if there is already a branch
at the location but it has no working tree". I'm still not sure
exactly what a bzr branch is, but it's clearly something different
from a git branch, (which is absolutely nothing more than a name
referencing a particular commit object). [Note: after playing with it
a bit more down below, a bzr "branch" appears to be something like a
git "repository" that can only hold a single branch.]

> I can see where you're coming from, but to me, the trade-off seems
> worthwhile.  Because historical data gets less and less valuable the
> older it gets.  By the time the URL for a branch goes dark, there's
> unlikely to be any reason to refer to one of its revisions at all.

I strongly disagree on this point. One, I don't think that the "time
for a branch to go dark" is necessarily long, (or if it is, then
that's another barrier that's setup against distributed
development---people have to have a long-term repository before they
can usefully start publishing a branch). Second, I'm not comfortable
with any limit on usefulness of history. Would you willingly throw
away commits, mailing list posts, or closed bug reports older than any
given age for any projects that you care about?

> When you create a new branch from scratch, the number starts at zero.
> If you copy a branch, you copy its number, too.
>
> Every time you commit, the number is incremented.  If you pull, your
> numbers are adjusted to be identical to those of the branch you pulled from.
>
> Is that really complicated?

OK. So now I had to actually try things out. I went ahead and
installed bzr and was able to init and commit from the man page. I had
to go to IRC to figure out how to create and change branches, (the
documentation for "bzr branch" just said FROM_LOCATION and TO_LOCATION
and I couldn't figure out what to pass for those).

Here's the setup I came up with for a tweaked version of the a[bc]m
diamond example I showed with git earlier, (I just added a second
commit to each branch before merging):

	mkdir bzrtest; cd bzrtest
	mkdir master; cd master; bzr init
	touch a; bzr add a; bzr commit -m "Initial commit of a"
	cd ..
	bzr branch master b; cd b
	touch b; bzr add b; bzr commit -m "Commit b on b branch"
	echo "change" > b; bzr commit -m "Change b on b branch"
	cd ..
	bzr branch master c; cd c
	touch c; bzr add c; bzr commit -m "Commit c on c branch"
	echo "change" > c; bzr commit -m "Change c on c branch"
	cd ../master
	bzr merge ../b; bzr commit -m "Merge in b"
	bzr merge ../c; bzr commit -m "Merge in c"

First, I've been told that this is a lot less efficient than possible
since I have what in bzr terms is three unshared "branches" here,
(what git would really call three separate "repositories").

Second, I think that using the filesystem for separating branches is a
really bad idea. One, it intrudes on my branch namespace (note that
in many commands above I have to use things like "../b" where I'd like
to just name my branch "b"). Two, it prevents bzr from having any
notion of "all branches" in places where git takes advantage of it,
(such as git-clone and "gitk --all"). Three, it certainly encourages
the storage problem I ran into above, (and I'd be interested to see a
"corrected" version of the commands above to fix the storage
inefficiencies).

But anyway, those are all new topics; what we were trying to talk
about is revision numbers. After the above commands I can run bzr log
in my three branches, master, b, and c and I get the following
revision number sequences:

master: 1 2 3
b: 1 2 3
c: 1 2 3

And from this state if I ask questions with bzr missing and look at
just the revision numbers, then the answers are useless. I get answers
like:

	.../b:$ bzr missing ../c
	You have 2 extra revision(s):
	revno: 3
	  Change b on b branch
	revno: 2
	  Commit b on b branch

	You are missing 2 revision(s):
	revno: 3
	  Change c on c branch
	revno: 2
	  Commit c on c branch

	.../b:$ bzr missing ../master
	You are missing 2 revision(s):
	revno: 3
	  Merge in c
	revno: 2
	  Merge in b

So there we have the revision numbers 2 and 3 each being used to name
three different revisions. That's a lot of aliasing already.
Then, if the b and c branches each treat master as their mainline and
each pull, then both branches get their numbers all shuffled.
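The aliasing can be reproduced with a toy model in C. The sketch makes these assumptions:
- a branch is a linear list;
- a revno is a 1-based position in that list;
- FNV-1a stands in for git's SHA-1, purely to keep the example short.

struct commit, commit_id() and build_branch() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model: a "revno" is a 1-based position in a branch, while the id
 * is derived from content. FNV-1a is a stand-in hash for this sketch
 * only; git itself uses SHA-1. */
struct commit {
	const char *msg;
	uint64_t id;
};

static uint64_t fnv1a(uint64_t h, const char *s)
{
	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 1099511628211ULL;		/* 64-bit FNV prime */
	}
	return h;
}

/* The id covers the parent id and the message, so the same commit gets
 * the same id in every repository that contains it. */
static uint64_t commit_id(uint64_t parent_id, const char *msg)
{
	char parent_hex[17];

	snprintf(parent_hex, sizeof(parent_hex), "%016llx",
		 (unsigned long long)parent_id);
	return fnv1a(fnv1a(14695981039346656037ULL, parent_hex), msg);
}

/* Build a linear branch from messages; out[k - 1] is "revno k". */
static void build_branch(struct commit *out, const char *const *msgs, int n)
{
	uint64_t parent = 0;
	int i;

	for (i = 0; i < n; i++) {
		out[i].msg = msgs[i];
		out[i].id = commit_id(parent, msgs[i]);
		parent = out[i].id;
	}
}
```

Suppose b and c are built from the commit messages in the example above. Revno 1 then gets the same id in both branches, while revno 2 and revno 3 name different commits with different ids. A content-derived id means one commit wherever it is pasted. A position only means something relative to one branch.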

Oh, drat. I just realized that I'm running 0.11 here, which doesn't
have the dotted-decimal numbers. (I'm trying to get bzr.dev too, but
it appears to be stuck about 40% of the way through "Fetch phase
1/4" [Note: it did eventually finish; see the PS below].) In this
version, the commits brought in as part of a merge don't get any
"simple" number at all, and instead "bzr log" shows a merge ID.

I hadn't realized that the dotted decimal notation was so new that the
community hadn't had a lot of experience with it yet. But, your
description doesn't actually presume that notation. What you asked
was:

	> When you create a new branch from scratch, the number starts at zero.
	> If you copy a branch, you copy its number, too.
	>
	> Every time you commit, the number is incremented.  If you pull, your
	> numbers are adjusted to be identical to those of the branch you pulled from.
	>
	> Is that really complicated?

To answer: that description doesn't say at all what happens to the
"simple" numbers of commits that are merged. In the version I have,
they disappear and get replaced with "ugly" numbers. In 0.12
something else happens instead (that's the part I don't understand
yet).

And my argument isn't just "confusing"; it's "confusing or
useless". I understand that pull destroys numbers, and how, but that
makes the numbers I had generated earlier useless. I still don't
understand how people can avoid numbers changing (since pull seems to
be the only way to sync up without infinite new merge commits being
added back and forth).

So, yes, it really is complicated or my brain is just too small.

> > The naming in git really is beautiful and beautifully simple.
>
> Well, you've got to admit that those names are at least superficially ugly.

Sure. But I'll gladly take a simple system with superficial warts over
a complex system with superficial beauty.

> What's nice is being able see the revno 753 and knowing that "diff -r
> 752..753" will show the changes it introduced.  Checking the revno on a
> branch mirror and knowing how out-of-date it is.

With git I get to see a revision number of b62710d4 and know that
"diff b62710d4^ b62710d4" will show its changes, though much more
likely just "show b62710d4". I really cannot fathom a place where
arithmetic on revision numbers does something useful that git revision
specifications don't do just as easily. Anybody have an example for
me?

-Carl

PS. The "bzr branch" of bzr.dev did eventually finish. I can see the
dotted-decimal numbers in my example now (1.1.1 and 1.1.2 for the
commits that came from branch b; 1.2.1 and 1.2.2 for the commits that
came from branch c). At 5 characters apiece these are well on their
way to getting just as "ugly" as git names (once it's all
cut-and-paste, the difference in ugliness is negligible).

And now, I see it's not just pull that does number rewriting. If I use
the following command (after the chunk of commands above):

	cd ..; bzr branch -r 1.2.2 master 1.2.2

It appears to just create newly linearized revision numbers from whole
cloth for the new branch (1, 2, and 3 corresponding to mainline 1,
1.2.1, and 1.2.2). That's totally surprising, very confusing, and
would invalidate any use I wanted to make of published revision
numbers for the mainline branch while I was working on this branch.

See? This stuff really doesn't work.

Motivating scenario for the above: Imagine 1.2.3 committed garbage, so I
want to fix it by branching from 1.2.2 rather than the mainline
"2". Then after I branch, I learn something about "1.2.1" that I want
to investigate more closely. I try to inspect that in my branch, but
ouch! I don't have that revision.

Is there even a way to say "show me the change introduced by what is
named '1.2.1' in the source branch in this scenario" ?

Note: In #bzr I just learned that there is a way for me to do this
_if_ I also happen to have a pull of the original branch somewhere on
my machine. Something like:

	bzr diff -r1.2.0:../master -r1.2.1:../master

I don't know if there's a way to get diff's .. notation to work with
that, (I can't manage to). But these simple numbers are getting less
simple all the time.

With git, if I find a revision number somewhere, I can cut-and-paste
it and get the right thing:

	git show b62710d4f8602203d848daf2d444865b611fff09

But with bzr if I find "1.2.1" somewhere I'm likely to type:

	bzr diff -r1.2.0..1.2.1

If I'm lucky, then that fails with:

	bzr: ERROR: Requested revision: '1.2.0' does not exist in branch:

and I go back to the source, find out what branch it was referring to,
remember where that is on my machine (../master, say), and manually
type that to my command line to get:

	bzr diff -r1.2.0:../master -r1.2.1:../master

If I'm unlucky then the first diff comes up with some unrelated commit
and I get to be confused before I go through that same process.

Now do you see? It really, really does not work. This stuff is about
as un-simple as could be, and these things will happen.

* [PATCH] Fix typo in show-index.c
From: Lars Hjemli @ 2006-10-20 21:24 UTC (permalink / raw)
  To: git

Signed-off-by: Lars Hjemli <hjemli@gmail.com>
---
 show-index.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/show-index.c b/show-index.c
index c21d660..a30a2de 100644
--- a/show-index.c
+++ b/show-index.c
@@ -8,7 +8,7 @@ int main(int argc, char **argv)
 	static unsigned int top_index[256];
 
 	if (fread(top_index, sizeof(top_index), 1, stdin) != 1)
-		die("unable to read idex");
+		die("unable to read index");
 	nr = 0;
 	for (i = 0; i < 256; i++) {
 		unsigned n = ntohl(top_index[i]);
-- 
1.4.3.rc2.g4035b

* Re: [PATCH] Use diff3 instead of merge in merge-recursive.
From: Uwe Zeisberger @ 2006-10-20 21:11 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0610181135120.14200@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin wrote:
> Hi Uwe,
> 
> On Wed, 18 Oct 2006, Uwe Zeisberger wrote:
> 
> > If no error occurs, merge (from rcs 5.7) is nothing but:
> > 
> > 	diff3 -E -am -L label1 -L label2 -L label3 file1 file2 file3 > tmpfile
> > 	cat tmpfile > file1
> 
> Interesting.

> I wonder if we could streamline the code such that index_fd 
> is called directly on the output of diff3? Of course, the result has to be 
> removed when the call to diff3 fails.

I thought about that, too.  But my primary intention was to get rid of
'merge', because the Solaris boxes I use from time to time lack merge,
but have (GNU) diff3[1].  I already had a mental note to look into that.

If Linus is right that there are systems that have merge but lack diff3,
then maybe a combined approach is best?  That is, try diff3, and if
that is missing, try merge.  (Or the other way round, if you prefer.)

OK, I looked a bit deeper into rcs, and it seems to handle the BSD diff3
case.  So Linus might be right.

BTW, merge -p sends the merged result to stdout instead of overwriting
the first file given.  That is

	merge -p -L label1 -L label2 -L label3 file1 file2 file3

and (GNU)

	diff3 -E -am -L label1 -L label2 -L label3 file1 file2 file3

are exactly equivalent.
So if that option of merge is old enough, these are the candidates for
the "combined approach" (see above).
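Here is a sketch of that combined approach in C, assuming a POSIX system. It builds the argument list for GNU diff3 -E -am, or for merge -p as the fallback, and runs the tool with its output going to stdout. Exit status 127 from the child is treated as "not installed". build_merge_argv(), run_tool() and three_way_merge() are hypothetical helpers, not code from merge-recursive.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Build the argv for a three-way merge that writes its result to stdout:
 *   diff3 -E -am -L l1 -L l2 -L l3 f1 f2 f3   (use_diff3 != 0)
 *   merge -p -L l1 -L l2 -L l3 f1 f2 f3       (use_diff3 == 0)
 * argv must have room for 13 entries (12 arguments plus NULL). */
static void build_merge_argv(int use_diff3, const char *labels[3],
			     const char *files[3], const char **argv)
{
	int n = 0, i;

	if (use_diff3) {
		argv[n++] = "diff3";
		argv[n++] = "-E";
		argv[n++] = "-am";
	} else {
		argv[n++] = "merge";
		argv[n++] = "-p";
	}
	for (i = 0; i < 3; i++) {
		argv[n++] = "-L";
		argv[n++] = labels[i];
	}
	for (i = 0; i < 3; i++)
		argv[n++] = files[i];
	argv[n] = NULL;
}

/* Run argv[0] with stdout inherited. Returns its exit status, or -1 if
 * it could not be started (the child exits with 127 when execvp()
 * fails, the same convention shells use for "command not found"). */
static int run_tool(const char **argv)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execvp(argv[0], (char *const *)argv);
		_exit(127);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == 127)
		return -1;
	return WEXITSTATUS(status);
}

/* Try GNU diff3 first and fall back to RCS merge -p. Returns 0 for a
 * clean merge, 1 for conflicts, 2 for trouble, or -1 if neither tool
 * could be run. */
static int three_way_merge(const char *labels[3], const char *files[3])
{
	const char *argv[13];
	int ret;

	build_merge_argv(1, labels, files, argv);
	ret = run_tool(argv);
	if (ret >= 0)
		return ret;
	build_merge_argv(0, labels, files, argv);
	return run_tool(argv);
}
```

Two caveats apply. A tool that itself exits with 127 would be mistaken for a missing one. Also, as footnote [1] below points out, a non-GNU diff3 may be installed yet unsuitable for merging, and this check cannot detect that.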

> > I didn't make any timing tests or further tests for correctness, but I
> > hope Johannes still has the framework from the time when he converted
> > the Python script to C?  
> > 
> > @Johannes: If so, could you test this patch?
> 
> I have to dig a little where I have it, but I think I can give it a try in 
> a few hours (imagine these lyrics to the melody of the day job blues).

It seems to be a long blues, because you didn't send any results. :-(

Best regards
Uwe

[1] They also have a version of diff3 (I guess from BSD) that is not
suited to be used for merging, at least rcs' merge cannot use it.

-- 
Uwe Zeisberger

If a lawyer and an IRS agent were both drowning, and you could only save
one of them, would you go to lunch or read the paper?

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Linus Torvalds @ 2006-10-20 20:57 UTC (permalink / raw)
  To: Aaron Bentley; +Cc: bazaar-ng, Jan Hudec, Git Mailing List, Jakub Narebski
In-Reply-To: <4539318D.9040004@utoronto.ca>

On Fri, 20 Oct 2006, Aaron Bentley wrote:
> 
> Agreed.  We start by comparing BASE and OTHER, so all those comparisons
> are in-memory operations that don't hit disk.  Only for files where BASE
> and OTHER differ do we even examine the THIS version.

Git just slurps in all three trees. I actually think that the current 
merge-recursive.c does it the stupid way (ie it expands all trees 
recursively, regardless of whether it's needed or not), but I should 
really check with Dscho, since I had nothing to do with that code.

I wrote a tree-level merger that avoided doing the recursive tree reading 
when the tree-SHA1's matched entirely, and re-doing the latest merge using 
that took all of 0.037s, because it didn't recursively expand any of the 
uninteresting trees.

But the default recursive merge was ported from the python script that 
did it a full tree at a time, so it's comparatively "slow". But it's fast 
enough (witness the under-1s time ;) that I think the motivation to be 
smarter about reading the trees was basically just not there, so my
"git-merge-tree" thing is languishing as a proof-of-concept.

So right now, git merging itself doesn't even take advantage of the "you 
can compare two whole directories in one go". We do that all over the 
place in other situations, though (it's a big reason for why doing a 
"diff" between different revisions is so fast - you can cut the problem 
space up and ignore the known-identical parts much faster).
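The idea of skipping known-identical parts can be sketched with a toy tree. Every node carries a hash of everything beneath it, much as a git tree object name does. struct node and diff_trees() are hypothetical. To stay short, the sketch assumes both trees have the same shape, with no added or deleted entries.

```c
#include <stddef.h>

/* Toy tree node: hash summarises everything beneath the node. kids is
 * a NULL-terminated array for a directory, or NULL for a file. */
struct node {
	const char *name;
	unsigned long hash;	/* stand-in for a tree or blob SHA-1 */
	struct node **kids;
};

/* Return the number of changed files between a and b, descending only
 * into subtrees whose hashes differ; *visited counts node pairs examined.
 * Assumes both trees have the same shape and entry order. */
static int diff_trees(const struct node *a, const struct node *b,
		      int *visited)
{
	int changed = 0;
	size_t i;

	(*visited)++;
	if (a->hash == b->hash)
		return 0;	/* identical subtree: never look inside */
	if (!a->kids || !b->kids)
		return 1;	/* a file whose contents changed */
	for (i = 0; a->kids[i] && b->kids[i]; i++)
		changed += diff_trees(a->kids[i], b->kids[i], visited);
	return changed;
}
```

Take a root with an unchanged drivers/ subtree and an fs/ subtree in which one of two files changed. diff_trees() reports one change after examining five node pairs. The two files under drivers/ are never looked at, because the subtree hashes already match.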

That tree-based data structure turned out to be wonderful. Originally (as 
in "first weeks of actual git work" in April 2005) git had a flat "file 
manifest" kind of thing, and that really sucked.  So the data structures 
are important, and I think we got those right fairly early on.

> We can do a do-nothing kernel merge in < 20 seconds, and that's
> comparing every single file in the tree.  In Python.  I was aiming for
> less than 10 seconds, but didn't quite hit it.

Well, so I know I can do that particular actual merge in 0.037 seconds 
(that's not counting the history traversal to actually find the common 
parent, which is another 0.01s or more ;), so we should be able to 
comfortably do the simple merges in less than a tenth of a second. But at 
some point, apparently nobody just cares.

Of course, this kind of thing depends a lot on developer behaviour. We had 
some performance bugs that we didn't notice simply because the kernel 
didn't show any of those patterns, but people using it for other things 
had slower merges. Sometimes you don't see the problem, just because you 
end up looking at the wrong pattern for performance.

> > So recursive basically generates the matrix of similarity for the 
> > new/deleted files, and tries to match them up, and there you have your 
> > renames - without ever looking at the history of how you ended up where 
> > you are.
> 
> So in the simple case, you compare unmatched THIS, OTHER and BASE files
> to find the renames?

Right. Some cases are easy: if one of the branches only added files (which 
is relatively common), that obviously cannot be a rename. So you don't 
even have to compare all possible combinations - you know you don't have
renames from one branch to the other ;)
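The matching step described above can be sketched as a greedy pass over a precomputed similarity matrix between deleted and added files. This is a simplified illustration, not the code in merge-recursive.c. How similarity is computed is left out. pair_renames(), MAX_FILES and the 0..100 score scale are assumptions made for the sketch.

```c
#define MAX_FILES 16

/* Greedy, best-first rename pairing over a precomputed similarity matrix.
 * sim[d][a] is a 0..100 similarity between deleted file d and added file
 * a (how it is computed is outside this sketch). Pairs scoring at least
 * threshold are matched, highest first; match[d] receives the added
 * file's index, or -1. Returns the number of renames found. */
static int pair_renames(int n_del, int n_add, int sim[][MAX_FILES],
			int threshold, int match[])
{
	int used[MAX_FILES] = {0};
	int d, a, found = 0;

	for (d = 0; d < n_del; d++)
		match[d] = -1;
	if (n_del == 0 || n_add == 0)
		return 0;	/* one side only added (or only deleted): nothing to pair */

	for (;;) {
		int best = threshold - 1, bd = -1, ba = -1;

		for (d = 0; d < n_del; d++) {
			if (match[d] >= 0)
				continue;
			for (a = 0; a < n_add; a++) {
				if (!used[a] && sim[d][a] > best) {
					best = sim[d][a];
					bd = d;
					ba = a;
				}
			}
		}
		if (bd < 0)
			break;	/* no remaining pair reaches the threshold */
		match[bd] = ba;
		used[ba] = 1;
		found++;
	}
	return found;
}
```

The early return covers the easy case mentioned above. If one side only added files, there is nothing on the deleted side to pair, so no comparisons are made at all.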

But I'm not even the authoritative person to explain all the details of the 
current recursive merge, and I might have missed something. Dscho? 
Fredrik? Anything you want to add?

			Linus

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: David Lang @ 2006-10-20 20:55 UTC (permalink / raw)
  To: Petr Baudis
  Cc: Linus Torvalds, Shawn Pearce, Aaron Bentley, Jakub Narebski,
	bazaar-ng, git
In-Reply-To: <20061020205330.GK20017@pasky.or.cz>

On Fri, 20 Oct 2006, Petr Baudis wrote:

>>> I've talked to some people who really didn't mind (or even liked) Git's
>>> heuristics when it came to _inspecting_ movement of content, but were
>>> really nervous about merge following such heuristics.
>>
>> remember, git only stores the results. so when you are merging it doesn't
>> even look for renames.
>
> Of course it does look for renames; when you use the recursive strategy,
> it will try to merge across renames.

sorry, missed that.

David Lang

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Petr Baudis @ 2006-10-20 20:53 UTC
  To: David Lang; +Cc: bazaar-ng, Linus Torvalds, Shawn Pearce, git, Jakub Narebski
In-Reply-To: <Pine.LNX.4.63.0610201345440.5248@qynat.qvtvafvgr.pbz>

Dear diary, on Fri, Oct 20, 2006 at 10:49:53PM CEST, I got a letter
where David Lang <dlang@digitalinsight.com> said that...
> On Fri, 20 Oct 2006, Petr Baudis wrote:
> 
> >
> >Dear diary, on Fri, Oct 20, 2006 at 07:48:58PM CEST, I got a letter
> >where Linus Torvalds <torvalds@osdl.org> said that...
> >>So yeah, I've seen a few strange cases myself, but they've actually been
> >>interesting. Like seeing how much of a file was just a copyright license,
> >>and then a file being considered a "copy" just because it didn't actually
> >>introduce any real new code.
> >
> >Well it's certainly "interesting" and fun to see, but is it equally fun
> >to handle mismerges caused by a broken detection?
> >
> >I've talked to some people who really didn't mind (or even liked) Git's
> >heuristics when it came to _inspecting_ movement of content, but were
> >really nervous about merge following such heuristics.
> 
> remember, git only stores the results. so when you are merging it doesn't 
> even look for renames.

Of course it does look for renames; when you use the recursive strategy,
it will try to merge across renames.
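To make this concrete, here is a toy Python sketch with assumptions throughout: trees are dicts of path to content, `renames` maps a base path to the path our side renamed it to (assumed to come from a similarity pass), and files are merged whole rather than line by line, unlike git's recursive strategy, which also merges file contents. `merge_trees` is a hypothetical name. Re-keying the base and their tree through the rename is what lets their edit to the old path land on the new path instead of surfacing as a modify/delete conflict.

```python
def merge_trees(base, ours, theirs, renames=None):
    """Toy three-way merge of whole files, following renames made on our side.

    base, ours, theirs: dicts mapping path -> file content.
    renames: dict mapping a base path -> the path our side renamed it to.
    Returns (merged_tree, sorted list of conflicting paths).
    """
    renames = renames or {}

    def rekey(tree):
        # View the tree through our renames so one logical file has one path.
        # (Toy: ignores collisions with files already at the new path.)
        return {renames.get(path, path): text for path, text in tree.items()}

    base2, theirs2 = rekey(base), rekey(theirs)
    merged, conflicts = {}, []
    for path in sorted(set(base2) | set(ours) | set(theirs2)):
        b, o, t = base2.get(path), ours.get(path), theirs2.get(path)
        if o == t:
            result = o          # both sides agree (possibly both absent)
        elif o == b:
            result = t          # only their side touched this file
        elif t == b:
            result = o          # only our side touched this file
        else:
            conflicts.append(path)
            result = o          # keep our version and report the conflict
        if result is not None:  # None means the file is absent
            merged[path] = result
    return merged, conflicts
```

With base `{"a.c": "v1"}`, ours `{"b.c": "v1"}` (a pure rename) and theirs `{"a.c": "v2"}` (an edit), passing `renames={"a.c": "b.c"}` yields `{"b.c": "v2"}` with no conflicts; without the rename map, `a.c` is reported as a conflict (their edit against our apparent delete) and their change is lost from `b.c`.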

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Shawn Pearce @ 2006-10-20 20:53 UTC
  To: Linus Torvalds; +Cc: bazaar-ng, git, Jakub Narebski
In-Reply-To: <Pine.LNX.4.64.0610201045550.3962@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> wrote:
> On Fri, 20 Oct 2006, Shawn Pearce wrote:
> > 
> > I renamed hundreds of small files in one shot and also did a few
> > hundred adds and deletes of other small XML files.  Git generated
> > a lot of those unrelated adds/deletes as rename/modifies, as their
> > content was very similar.  Some people involved in the project
> > freaked as the files actually had nothing in common with one
> > another... except for a lot of XML elements (as they shared the
> > same DTD).
> 
> Heh. We can probably tweak the heuristics (one of the _great_ things about 
> content detection is that you can fix it after the fact, unlike the 
> alternative).
> 
> That said, I've personally actually found the content-based similarity 
> analysis to often be quite informative, even when (and perhaps 
> _especially_ when) it ended up showing something that the actual author of 
> the thing didn't intend.
> 
> So yeah, I've seen a few strange cases myself, but they've actually been 
> interesting. Like seeing how much of a file was just a copyright license, 
> and then a file being considered a "copy" just because it didn't actually 
> introduce any real new code.

Aside from that one strange case I just mentioned, I've always seen
the strategy work very well.  It's never done something I didn't
expect, and I've never seen copies or renames that I didn't expect to see,
knowing what the author of the change did.

So even though I had a little bit of trouble with that rename
situation above, I'm _very_ happy with the way Git handles renames.

And the truth is that case above really was quite correct: XML is
very verbose.  When 70% of the file is just required XML to frame
the other 30% of the file's payload, it's not surprising that files
are considered to be similar when they only differ by a little bit
of payload.
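The 70/30 arithmetic can be checked with a small Python sketch. The DTD framing, the file contents, and the line-based `similarity` score are all made up for illustration (git's real score is computed over hashed chunks of content, not whole lines); the comparison uses 50% because that is git's default rename threshold for `-M`. Two 10-line files that share 7 framing lines and differ in all 3 payload lines score 7/10 = 0.7, comfortably above the threshold.

```python
from collections import Counter

# Hypothetical framing that every file written against one DTD carries.
FRAME_OPEN = [
    '<?xml version="1.0"?>',
    '<!DOCTYPE record SYSTEM "record.dtd">',
    "<record>",
    "  <meta/>",
    "  <body>",
]
FRAME_CLOSE = ["  </body>", "</record>"]

RENAME_THRESHOLD = 0.5  # git's default similarity threshold for -M is 50%


def make_file(payload):
    """Wrap payload lines in the shared framing (7 frame lines in total)."""
    return "\n".join(FRAME_OPEN + payload + FRAME_CLOSE) + "\n"


def similarity(a, b):
    """Shared lines divided by the longer file's line count (simplified)."""
    la, lb = a.splitlines(), b.splitlines()
    longest = max(len(la), len(lb))
    if longest == 0:
        return 1.0
    return sum((Counter(la) & Counter(lb)).values()) / longest


a = make_file(["    <name>alpha</name>", "    <size>1</size>", "    <tag>red</tag>"])
b = make_file(["    <name>omega</name>", "    <size>9</size>", "    <tag>blue</tag>"])

score = similarity(a, b)
print(score, score >= RENAME_THRESHOLD)  # 0.7 True
```

No payload line is shared, yet the pair still clears the threshold on framing alone, which is exactly the unrelated-XML-files effect described above.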

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: David Lang @ 2006-10-20 20:49 UTC
  To: Petr Baudis
  Cc: Linus Torvalds, Shawn Pearce, Aaron Bentley, Jakub Narebski,
	bazaar-ng, git
In-Reply-To: <20061020202318.GJ20017@pasky.or.cz>

On Fri, 20 Oct 2006, Petr Baudis wrote:

> 
> Dear diary, on Fri, Oct 20, 2006 at 07:48:58PM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...
>> So yeah, I've seen a few strange cases myself, but they've actually been
>> interesting. Like seeing how much of a file was just a copyright license,
>> and then a file being considered a "copy" just because it didn't actually
>> introduce any real new code.
>
> Well it's certainly "interesting" and fun to see, but is it equally fun
> to handle mismerges caused by a broken detection?
>
> I've talked to some people who really didn't mind (or even liked) Git's
> heuristics when it came to _inspecting_ movement of content, but were
> really nervous about merge following such heuristics.

remember, git only stores the results. so when you are merging it doesn't even 
look for renames.

the only time you get renames is after-the-fact when you ask git for a report 
about what changed. then (if you enable rename detection) it will tell you what 
files have changed, and what files look like they may have been renames 
(possibly with changes). but if you don't ask git to look for renames it won't 
bother and you can just ignore the concept entirely.

or if you only want complete renames (as opposed to rename + change) then use 
the option to tell it that you don't want to consider it a rename unless it's 
100% the same (or 99%, or whatever satisfies you)
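The two points here (renames are inferred only when you ask, and the threshold is tunable) can be sketched in Python. In the git CLI the knob is `-M`, which takes an optional similarity threshold such as `-M90%`, and `-M100%` restricts detection to exact renames. The `report` function, its output format, and the line-based `similarity` score are hypothetical stand-ins, and pairing each deleted file with its best-scoring added file is a simplification of git's actual matching.

```python
from collections import Counter


def similarity(a, b):
    """Shared lines divided by the longer file's line count (simplified)."""
    la, lb = a.splitlines(), b.splitlines()
    longest = max(len(la), len(lb))
    if longest == 0:
        return 1.0
    return sum((Counter(la) & Counter(lb)).values()) / longest


def report(old_tree, new_tree, detect_renames=False, threshold=0.5):
    """Describe how new_tree differs from old_tree (dicts of path -> text).

    No rename information is stored anywhere; renames are inferred here
    from the two snapshots, and only when detect_renames is True.
    """
    deleted = sorted(p for p in old_tree if p not in new_tree)
    added = sorted(p for p in new_tree if p not in old_tree)
    out = [f"M {p}" for p in sorted(old_tree)
           if p in new_tree and old_tree[p] != new_tree[p]]
    gone, taken = set(), set()
    if detect_renames:
        for old in deleted:
            # Score this deleted file against every still-unclaimed added file.
            scored = [(similarity(old_tree[old], new_tree[new]), new)
                      for new in added if new not in taken]
            if not scored:
                break  # every added file is already paired
            score, new = max(scored)
            if score >= threshold:
                out.append(f"R{round(score * 100)}% {old} -> {new}")
                gone.add(old)
                taken.add(new)
    out += [f"D {p}" for p in deleted if p not in gone]
    out += [f"A {p}" for p in added if p not in taken]
    return out
```

On two hypothetical snapshots where `a.txt` moved unchanged to `c.txt` and `b.txt` (2 lines) became `d.txt` with one line appended, the default call reports only deletes and adds; `detect_renames=True` reports `R100% a.txt -> c.txt` and `R67% b.txt -> d.txt`; and `threshold=1.0` keeps only the exact rename, the "100% the same" case.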

David Lang
