Git development


* [PATCH 1/2] Pass through unresolved deltas when writing a pack
From: Jan Harkes @ 2006-10-20  0:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0610190936440.3962@g5.osdl.org>

The resulting pack should be correct if we have the base somewhere else in
the received pack. If we didn't have the base, the received pack would be
faulty and couldn't be unpacked as loose objects either.

The internal pack index information is not updated correctly anymore.

Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>

---
 builtin-unpack-objects.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index f139308..b95c93c 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -246,7 +246,10 @@ static void unpack_delta_entry(unsigned 
 	}
 
 	if (!has_sha1_file(base_sha1)) {
-		add_delta_to_list(base_sha1, delta_data, delta_size);
+		if (pack_file)
+			write_pack_delta(base_sha1, delta_data, delta_size);
+		else
+			add_delta_to_list(base_sha1, delta_data, delta_size);
 		return;
 	}
 	base = read_sha1_file(base_sha1, type, &base_size);
-- 
1.4.2.1

^ permalink raw reply related

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Jan Harkes @ 2006-10-20  0:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0610190936440.3962@g5.osdl.org>

On Thu, Oct 19, 2006 at 09:48:29AM -0700, Linus Torvalds wrote:
> On Thu, 19 Oct 2006, Jan Harkes wrote:
> > 
> > If we find a delta against a base that is not found in our repository we
> > can keep it as a delta, the base should show up later on in the
> > thin-pack. Whenever we find a delta against a base that we haven't seen
> > in the received part of the thin pack, but is available from the
> > repository we should expand it because there is a chance we may not see
> > this base in the remainder of the thin-pack.
> 
> Yes, indeed. We can also have another heuristic: if we find a delta, and 
> we haven't seen the object it deltas against, we can still keep it as a 
> delta IF WE ALSO DON'T ALREADY HAVE THE BASE OBJECT. Because then we know 
> that the base object has to be there later in the pack (or we have a 
> dangling delta, which we'll just consider an error).
> 
> So yeah, maybe my patch-series is something we can still save.

It looks like you were really close. When we cannot resolve a delta, we
just write it to the packfile and we don't queue it. If it can be
resolved we write it as a full object.
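
The rule above can be sketched as follows. This is a minimal illustration in Python rather than the C of the patch; the function name, the dict layout and the `repo_objects` set are assumptions made for the example, not names from builtin-unpack-objects.c.

```python
def rewrite_entry(entry, repo_objects):
    """Decide how one received pack entry is written to the output pack.

    entry is a dict with "kind" ("full" or "delta") and, for deltas, "base".
    repo_objects is the set of object names already in the repository.
    Both shapes are hypothetical, chosen only for this sketch.
    """
    if entry["kind"] == "full":
        return "full"
    if entry["base"] in repo_objects:
        # The base is available locally, so the delta can be resolved
        # and stored as a full object.
        return "full"
    # The base is unknown here: pass the delta through unchanged and
    # expect the base to show up elsewhere in the same pack.
    return "delta"
```

A delta whose base is missing from both the repository and the pack stays dangling; this sketch does not detect that case.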

The only thing that cannot be reliably tracked is the pack index
information. The offsets are trivial, but we cannot calculate the SHA1
for a delta without applying it to its base. If the base comes later,
the existing code could do it, but if it has already been written to the
pack we can't easily track back.
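
The SHA1 depends on the base because an object's name is computed over its full, undeltified content, prefixed by a short "type size" header. A minimal sketch in Python, where the helper name `object_name` is an assumption for illustration:

```python
import hashlib

def object_name(obj_type, data):
    """Return the hex SHA-1 name git gives an object with this content.

    data must be the complete object content as bytes; a delta has to be
    applied to its base before this can be computed.
    """
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# The empty blob has a well-known name:
# object_name("blob", b"") -> "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```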

And why add all the extra complexity? Running git-index-pack after
git-unpack-objects --repack not only generates the correct index without
a problem, it also serves as an extra consistency check and we keep this
code isolated from any possible future changes to the index file format.

I'll try to follow this up with 2 patches. One is an almost trivial
change to your code that makes it write out a pack with all full objects
and resolvable deltas converted to full objects; any unresolved deltas
are expected to be relative to some other object in the same pack.

The rewritten pack is indexed correctly even when I run git-update-index
in a repository that does not contain any of the objects in the thin-pack.
Of course it also works when the objects are available, but the resulting
full pack is considerably bigger since we can find a suitable base for
every delta.

> However, the thing that makes me suspect that it is _not_ saveable, is 
> this:
...
> The answer is: no. It's not trivial. Or rather, it _is_ trivial, but you 
> have to _remember_ all of the actual data for A, B, C and D all the way to 
> the end, because only if you have that data in memory can you actually 
> _recreate_ B, C and D even enough to get their SHA1's (which you need, 
> just in order to know that the pack is complete, much less to be able to 
> create a non-delta version in case it hadn't been).

That is only needed if you want to build the index at the same time;
otherwise we don't need to know the SHA1 values for unresolved deltas.

> Anyway, I just pushed the "rewrite-pack" branch to my git repo on 
> kernel.org, so once it mirrors out, if you really want to try to fix up 
> the mess I left behind, there it is:

I think I still left quite a bit of the mess unfixed.

Jan

^ permalink raw reply

* Re: [PATCH] git-svnimport.perl: copying directory from original SVN place
From: Sasha Khapyorsky @ 2006-10-20  0:22 UTC (permalink / raw)
  To: Junio C Hamano, Matthias Urlichs; +Cc: git
In-Reply-To: <20061008213118.GA1367@sashak.voltaire.com>

Hi guys,

On 23:31 Sun 08 Oct     , Sasha Khapyorsky wrote:
> 
> When copying whole directory, if source directory is not in already
> imported tree, try to get it from original SVN location. This happens
> when source directory is not matched by provided 'trunk' and/or
> 'tags/branches' templates or when it is not part of specified SVN
> sub-project.

Any news on this?

Sasha

> 
> Signed-off-by: Sasha Khapyorsky <sashak@voltaire.com>
> ---
>  git-svnimport.perl |   93 ++++++++++++++++++++++++++++++----------------------
>  1 files changed, 54 insertions(+), 39 deletions(-)
> 
> diff --git a/git-svnimport.perl b/git-svnimport.perl
> index 988514e..4ae0eec 100755
> --- a/git-svnimport.perl
> +++ b/git-svnimport.perl
> @@ -193,6 +193,13 @@ sub ignore {
>  	}
>  }
>  
> +sub dir_list {
> +	my($self,$path,$rev) = @_;
> +	my ($dirents,undef,$properties)
> +	    = $self->{'svn'}->get_dir($path,$rev,undef);
> +	return $dirents;
> +}
> +
>  package main;
>  use URI;
>  
> @@ -342,35 +349,16 @@ if ($opt_A) {
>  
>  open BRANCHES,">>", "$git_dir/svn2git";
>  
> -sub node_kind($$$) {
> -	my ($branch, $path, $revision) = @_;
> +sub node_kind($$) {
> +	my ($svnpath, $revision) = @_;
>  	my $pool=SVN::Pool->new;
> -	my $kind = $svn->{'svn'}->check_path(revert_split_path($branch,$path),$revision,$pool);
> +	my $kind = $svn->{'svn'}->check_path($svnpath,$revision,$pool);
>  	$pool->clear;
>  	return $kind;
>  }
>  
> -sub revert_split_path($$) {
> -	my($branch,$path) = @_;
> -
> -	my $svnpath;
> -	$path = "" if $path eq "/"; # this should not happen, but ...
> -	if($branch eq "/") {
> -		$svnpath = "$trunk_name/$path";
> -	} elsif($branch =~ m#^/#) {
> -		$svnpath = "$tag_name$branch/$path";
> -	} else {
> -		$svnpath = "$branch_name/$branch/$path";
> -	}
> -
> -	$svnpath =~ s#/+$##;
> -	return $svnpath;
> -}
> -
>  sub get_file($$$) {
> -	my($rev,$branch,$path) = @_;
> -
> -	my $svnpath = revert_split_path($branch,$path);
> +	my($svnpath,$rev,$path) = @_;
>  
>  	# now get it
>  	my ($name,$mode);
> @@ -413,10 +401,9 @@ sub get_file($$$) {
>  }
>  
>  sub get_ignore($$$$$) {
> -	my($new,$old,$rev,$branch,$path) = @_;
> +	my($new,$old,$rev,$path,$svnpath) = @_;
>  
>  	return unless $opt_I;
> -	my $svnpath = revert_split_path($branch,$path);
>  	my $name = $svn->ignore("$svnpath",$rev);
>  	if ($path eq '/') {
>  		$path = $opt_I;
> @@ -435,7 +422,7 @@ sub get_ignore($$$$$) {
>  		close $F;
>  		unlink $name;
>  		push(@$new,['0644',$sha,$path]);
> -	} else {
> +	} elsif (defined $old) {
>  		push(@$old,$path);
>  	}
>  }
> @@ -480,6 +467,27 @@ sub branch_rev($$) {
>  	return $therev;
>  }
>  
> +sub expand_svndir($$$);
> +
> +sub expand_svndir($$$)
> +{
> +	my ($svnpath, $rev, $path) = @_;
> +	my @list;
> +	get_ignore(\@list, undef, $rev, $path, $svnpath);
> +	my $dirents = $svn->dir_list($svnpath, $rev);
> +	foreach my $p(keys %$dirents) {
> +		my $kind = node_kind($svnpath.'/'.$p, $rev);
> +		if ($kind eq $SVN::Node::file) {
> +			my $f = get_file($svnpath.'/'.$p, $rev, $path.'/'.$p);
> +			push(@list, $f) if $f;
> +		} elsif ($kind eq $SVN::Node::dir) {
> +			push(@list,
> +			     expand_svndir($svnpath.'/'.$p, $rev, $path.'/'.$p));
> +		}
> +	}
> +	return @list;
> +}
> +
>  sub copy_path($$$$$$$$) {
>  	# Somebody copied a whole subdirectory.
>  	# We need to find the index entries from the old version which the
> @@ -488,8 +496,11 @@ sub copy_path($$$$$$$$) {
>  	my($newrev,$newbranch,$path,$oldpath,$rev,$node_kind,$new,$parents) = @_;
>  
>  	my($srcbranch,$srcpath) = split_path($rev,$oldpath);
> -	unless(defined $srcbranch) {
> -		print "Path not found when copying from $oldpath @ $rev\n";
> +	unless(defined $srcbranch && defined $srcpath) {
> +		print "Path not found when copying from $oldpath @ $rev.\n".
> +			"Will try to copy from original SVN location...\n"
> +			if $opt_v;
> +		push (@$new, expand_svndir($oldpath, $rev, $path));
>  		return;
>  	}
>  	my $therev = branch_rev($srcbranch, $rev);
> @@ -503,7 +514,7 @@ sub copy_path($$$$$$$$) {
>  	}
>  	print "$newrev:$newbranch:$path: copying from $srcbranch:$srcpath @ $rev\n" if $opt_v;
>  	if ($node_kind eq $SVN::Node::dir) {
> -			$srcpath =~ s#/*$#/#;
> +		$srcpath =~ s#/*$#/#;
>  	}
>  	
>  	my $pid = open my $f,'-|';
> @@ -582,10 +593,12 @@ sub commit {
>  		if(defined $oldpath) {
>  			my $p;
>  			($parent,$p) = split_path($revision,$oldpath);
> -			if($parent eq "/") {
> -				$parent = $opt_o;
> -			} else {
> -				$parent =~ s#^/##; # if it's a tag
> +			if(defined $parent) {
> +				if($parent eq "/") {
> +					$parent = $opt_o;
> +				} else {
> +					$parent =~ s#^/##; # if it's a tag
> +				}
>  			}
>  		} else {
>  			$parent = undef;
> @@ -651,9 +664,10 @@ #	}
>  				push(@old,$path); # remove any old stuff
>  			}
>  			if(($action->[0] eq "A") || ($action->[0] eq "R")) {
> -				my $node_kind = node_kind($branch,$path,$revision);
> +				my $node_kind = node_kind($action->[3], $revision);
>  				if ($node_kind eq $SVN::Node::file) {
> -					my $f = get_file($revision,$branch,$path);
> +					my $f = get_file($action->[3],
> +							 $revision, $path);
>  					if ($f) {
>  						push(@new,$f) if $f;
>  					} else {
> @@ -668,19 +682,20 @@ #	}
>  							  \@new, \@parents);
>  					} else {
>  						get_ignore(\@new, \@old, $revision,
> -							   $branch, $path);
> +							   $path, $action->[3]);
>  					}
>  				}
>  			} elsif ($action->[0] eq "D") {
>  				push(@old,$path);
>  			} elsif ($action->[0] eq "M") {
> -				my $node_kind = node_kind($branch,$path,$revision);
> +				my $node_kind = node_kind($action->[3], $revision);
>  				if ($node_kind eq $SVN::Node::file) {
> -					my $f = get_file($revision,$branch,$path);
> +					my $f = get_file($action->[3],
> +							 $revision, $path);
>  					push(@new,$f) if $f;
>  				} elsif ($node_kind eq $SVN::Node::dir) {
>  					get_ignore(\@new, \@old, $revision,
> -						   $branch,$path);
> +						   $path, $action->[3]);
>  				}
>  			} else {
>  				die "$revision: unknown action '".$action->[0]."' for $path\n";
> -- 
> 1.4.2.3
> 

^ permalink raw reply

* Re: [PATCH] Don't use $author_name undefined when $from contains no /\s</.
From: Paul Eggert @ 2006-10-19 23:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, bug-gnu-utils, Jim Meyering
In-Reply-To: <7vr6x4q9b6.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> If "trailing space" highlighting picks up the first column blank
> in "diff -u" output, that highlighting feature is *broken*.

If the buffer contains arbitrary text, some of which is diff -u output
and some of which is not, then it isn't possible in general for the
highlighting mode to distinguish between the diff -u part and the
other part.  This sort of thing is fairly common among people who
email patches and code around, or who generate files containing a
combination of patches and other things.

If the change bothers you a lot, you might want to follow up to
<http://www.opengroup.org/austin/mailarchives/ag-review/msg02139.html>,
which proposes the change in question to the POSIX folks.  This change
is atop the earlier change I proposed to specify "diff -u" format in
the first place; see
<http://www.opengroup.org/austin/mailarchives/ag-review/msg02077.html>.
You can follow up by writing to austin-group-l@opengroup.org and
citing XCU ERN 103.  You can find a copy of XCU ERN 103 at
<http://www.opengroup.org/austin/aardvark/latest/xcubug2.txt>;
look for "Number 103".

Since git uses diff -u format, it would make sense for git to work with
the upcoming POSIX spec for diff -u, either by adjusting the spec or
by adjusting git.

^ permalink raw reply

* Re: VCS comparison table
From: Carl Worth @ 2006-10-19 23:42 UTC (permalink / raw)
  To: Aaron Bentley
  Cc: Linus Torvalds, Jakub Narebski, Andreas Ericsson, bazaar-ng, git
In-Reply-To: <453803E6.2060309@utoronto.ca>

[-- Attachment #1: Type: text/plain, Size: 5266 bytes --]

On Thu, 19 Oct 2006 19:01:58 -0400, Aaron Bentley wrote:
> I don't think this is true.  The abandoned mainline does not need to be
> destroyed.  It can be kept at the same location that it always was, with
> the numbers that it always had. So the number + URL combo stays
> meaningful.

Sure that's possible, but it gets rather unwieldy the more
repositories you have involved. I've been arguing that bzr really does
encourage centralized, not distributed development, and you were having
trouble seeing how I came to that conclusion. Do you see how "maintain
an independent URL namespace for every distributed branch" doesn't
encourage much distributed development?

>             Additionally, the new mainline can keep a mirror of the
> abandoned mainline in its repository, because there are virtually no
> additional storage requirements to doing so.

And this part I don't understand. I can understand the mainline
storing the revisions, but I don't understand how it could make them
accessible by the published revision numbers of the "abandoned"
line. And that's the problem.

> > But for these communications, revision numbers will not provide
> > historically stable values that can be used.
>
> They certainly can.
>
> The coder says "I've put up a branch at http://example.com/bzr/feature.
>  In revision 5, I started work on feature A.  I finished work in
> revision 6.  But then I had to fix a related bug in revision 7."

"I've put this branch up" isn't historically stable...

> As long as that coder is active

...which is what you just said there yourself.

On the other hand, git names really do live forever, regardless of
where the code is hosted or how it moves around. When I'm talking
about historical stability, I'm talking about being able to publish
numbers that live forever.

It sounds like bzr has numbers like this inside it, (but not nearly as
simple as the ones that git has), but that users aren't in the
practice of communicating with them. Instead, users communicate with
the unstable numbers. And that's a shame from an historical
perspective.

> This is true, but his code is likely to all land in the mainline at
> once.  Since his own revnos are more fine-grained, he's not likely to
> want to use the mainline revnos.

What I'd like to be able to do, is advertise a temporary repository,
and while using it, publish names for revisions that will still be
valid when the code gets pushed out to the mainline. That is
supporting distributed development, and everything I'm hearing says
that the bzr revision numbers don't support that.

> I felt that you were mischaracterizing my _statement_ that "it's
> exceedingly uncommon for [revnos] to change" as an _argument_ "it's
> exceedingly uncommon for [revnos] to change".  The reality is that we
> keep saying revnos don't change because git users keep saying "but what
> if the revnos change?".

OK.

The original claim that sparked the discussion was that bzr has a
"simple namespace" while git does not. We've been talking for quite a
while here, and I still don't fully understand how these numbers are
generated or what I can expect to happen to the numbers associated
with a given revision as that revision moves from one repository to
another. It's really not a simple scheme.

Meanwhile, I have been arguing that the "simple" revision numbers that
bzr advertises have restrictions on their utility, (they can only be
used with reference to a specific repository, or with reference to
another that treats it as canonical). I _think_ I understand the
numbers well enough to say that still.

Compare that with the git names. The scheme really is easy to
understand, (either the new user already understands cryptographic
hashes, or else it's as easy as "a long string of digits that git
assigns as the name"). The names have universal utility in time and
space, (for definitions of the universe larger than I will ever be
able to observe anyway). And the natural inclination to abbreviate a
name when repeating it, (note the recent post with bzr UUIDs
exhibiting the same inclination), doesn't make the names any less
useful since the abbreviation alone will almost always work.
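
Why an abbreviation usually suffices can be shown with a small sketch: a prefix is usable as long as exactly one known name starts with it. The function name and the sample names below are made up for illustration and are not git's own lookup code.

```python
def resolve_abbrev(prefix, names):
    """Return the single name starting with prefix, or None.

    None covers both "no such name" and "ambiguous prefix".
    """
    matches = [n for n in names if n.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    return None

# Hypothetical, shortened names for the example only.
known = ["f139308a", "f1aa0000", "b95c93c0"]
assert resolve_abbrev("f13", known) == "f139308a"
assert resolve_abbrev("f1", known) is None  # ambiguous: two candidates
```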

The naming in git really is beautiful and beautifully simple.

It's not monotonically increasing from one revision to the next, but
I've never found that to be an issue. Of course, we do still use our
own "simple" names for versioning the releases and snapshots of
software we manage with git, and that's where being able to easily
determine "newer" or "older" by simple numerical examination is
important. I've honestly never encountered a situation where I was
handed two git sha1 sums and wished that I could do the same thing.

> If you're interested, it's called "Bugs Everywhere" and it's available here:
> http://panoramicfeedback.com/opensource/
>
> New VCS backends are welcome :-D

Thanks, I hope to take a look at that at some point.

> So actually, not all branches are treated equally by Git users.  Public
> branches are treated as append-only, but private branches are treated as
> mutable.  (It's the same with bzr users, of course.)

Well, some users treat all branches as append only and shun rebase.

[snip of remaining agreement of similarity between the tools]

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: VCS comparison table
From: Ryan Anderson @ 2006-10-19 23:28 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Linus Torvalds, Matthew D. Fuller, Andreas Ericsson, Carl Worth,
	bazaar-ng, git, Jakub Narebski
In-Reply-To: <vpqlknc3zmn.fsf@ecrins.imag.fr>

On 10/19/06, Matthieu Moy <Matthieu.Moy@imag.fr> wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
>
> > Btw, I do believe that bzr seems to be acting a lot like BK, at least when
> > it comes to versioning. I suspect that is not entirely random either, and
> > I suspect it's been a conscious effort to some degree.
> >
> > Which is fine, in the sense that there are certainly much worse things to
> > try to copy.
>
> Out of curiosity, how would you compare git and Bitkeeper, on a purely
> technical basis? (not asking for a detailed comparison, but an "X is
> globally/much/terribly/not better than Y" kind of statement ;-) )

Having used both in a past job setting (simultaneously even),
BitKeeper was a huge win over CVS, but after a while, some of its
tools were just very frustrating compared with the equivalent Git
interfaces. I had actually written a terribly slow BK -> Git
converter just so I could incrementally import our BK tree, then use
Git's history-viewing because it was so much more pleasant to work
with.

For small projects (~5 people), they weren't hugely different, but Git
just felt more comfortable after a while.  (It was actually possible
to do a commit from the command line in a single command, without
getting annoyed by the interface, for a trivial example.)

^ permalink raw reply

* Re: [PATCH 2/2] Remove dead code after direct graph drawing
From: Josef Weidendorfer @ 2006-10-19 23:11 UTC (permalink / raw)
  To: Marco Costalba; +Cc: git
In-Reply-To: <200610191613.19142.Josef.Weidendorfer@gmx.de>

On Thursday 19 October 2006 16:13, Josef Weidendorfer wrote:
> Signed-off-by: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>

Hmmm...

Is the git mailing list the right place for qgit patches?
Probably, I should have prefixed them with "qgit:" ...

Josef

^ permalink raw reply

* Re: VCS comparison table
From: Aaron Bentley @ 2006-10-19 23:01 UTC (permalink / raw)
  To: Carl Worth
  Cc: Linus Torvalds, Jakub Narebski, Andreas Ericsson, bazaar-ng, git
In-Reply-To: <878xjc2qeb.wl%cworth@cworth.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Carl Worth wrote:
> On Thu, 19 Oct 2006 10:58:48 -0400, Aaron Bentley wrote:

> Let's imagine there's a complete fork in the bzr codebase tomorrow. We
> need not suppose any acrimony, just an amiable split as two subsets of
> the team start taking the code in different directions.

...

> Finally the two teams ... want to
> merge their code together.
> 
> After the merge, there can be only one mainline, so one team or the
> other will have to concede to give up the numbers they had generated
> and published during the fork.

I don't think this is true.  The abandoned mainline does not need to be
destroyed.  It can be kept at the same location that it always was, with
the numbers that it always had.  So the number + URL combo stays
meaningful.  Additionally, the new mainline can keep a mirror of the
abandoned mainline in its repository, because there are virtually no
additional storage requirements to doing so.

> An individual takes the bzr codebase and starts working on it. It's
> experimental stuff, so it's not pushed back into the central
> repository yet. But our coder isn't a total recluse, so his friends
> help him with the code he's working on. They communicate about their
> work, (perhaps on the main bzr mailing list), and make statements such
> as "feature F is working perfectly as of version V".
> 
> But for these communications, revision numbers will not provide
> historically stable values that can be used.

They certainly can.

The coder says "I've put up a branch at http://example.com/bzr/feature.
 In revision 5, I started work on feature A.  I finished work in
revision 6.  But then I had to fix a related bug in revision 7."

As long as that coder is active, they'll keep their repository at the
same location.  And because branches are cheap (even cheaper than
delta-compressed revisions), there's no reason to delete old branches.
It's better to keep them around for reference purposes.

> It's impossible for our
> coder to predict the numbers that will be assigned to his code when
> they get merged back into the mainline---since some other unknown
> programmer may have branched at exactly the same point and is trying
> to make the same determination.

This is true, but his code is likely to all land in the mainline at
once.  Since his own revnos are more fine-grained, he's not likely to
want to use the mainline revnos.

> Now, the programmers could get stable numbers by keeping the branch in
> the main tree, or by at least pushing out the branching point to
> "reserve" a number in the main tree.

I don't know what you mean by pushing out the branching point.

>> That doesn't follow.  Just because something is arguably true doesn't
>> make it bad.  And in this case, I'm not arguing that it's true, I'm
>> saying that it's true, because that is what my experience tells me is true.
> 
> [I'm sorry, but I didn't grasp this sentence. I think I lost the
> antecedent of "it" somewhere.]

I felt that you were mischaracterizing my _statement_ that "it's
exceedingly uncommon for [revnos] to change" as an _argument_ "it's
exceedingly uncommon for [revnos] to change".  The reality is that we
keep saying revnos don't change because git users keep saying "but what
if the revnos change?".


>>          And I personally have been developing a bugtracker that is
>> distributed in the same way bzr is; it stores bug data in the source
>> tree of a project, so that bug activities follow branches around.
> 
> That kind of thing sounds very useful. As I've been talking about
> "numbers" here in bug trackers and mailing lists, it should be obvious
> that I consider the information stored in such systems an important
> part of the history of a code project. So it would be nice if all of
> that history were stored in an equally reliable system in some way.

If you're interested, it's called "Bugs Everywhere" and it's available here:
http://panoramicfeedback.com/opensource/

New VCS backends are welcome :-D

>> In the first place, it seems fairly common in the Git community to
>> rebase.  This process throws away old revisions and creates new
>> revisions that are morally equivalent[1].
> 
> Yes, rebasing does "destroy history" in one sense, (in actual fact, it
> creates new commits and leaves the old ones around, which may or may
> not have references to them anymore). But it's definitely not common
> for git users to use rebase in a situation where it would change any
> published number.

So actually, not all branches are treated equally by Git users.  Public
branches are treated as append-only, but private branches are treated as
mutable.  (It's the same with bzr users, of course.)

> And git helps with this as well. If I ever forget that I've already
> pushed a change and then I rebase, then the next time I try to push,
> git will complain that I'm attempting to throw away history on the
> remote end, and will refuse to cooperate, (unless I force it).

Same here.

> There's a similar safety mechanism on the pull side. If I did force a
> history-rewriting push, then users who tried to pull it would also
> have to force git's hand before it would rewrite their history.

Same here.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFOAPm0F+nu1YWqI0RAhkdAJ9InxuEjbToGQU2AOJmfZw124Lb2wCeMmDC
9w08eZbmL19FfVQmtpPcYkQ=
=AmGo
-----END PGP SIGNATURE-----

^ permalink raw reply

* Re: [PATCH] Don't use $author_name undefined when $from contains no /\s</.
From: Junio C Hamano @ 2006-10-19 21:34 UTC (permalink / raw)
  To: Paul Eggert; +Cc: git
In-Reply-To: <87fydkj8q1.fsf@penguin.cs.ucla.edu>

Paul Eggert <eggert@CS.UCLA.EDU> writes:

> Junio C Hamano <junkio@cox.net> writes:
>
>> I see no good reason, other than saving a single byte from the
>> output stream perhaps.
>
> That wasn't the motivation.  Rather, it was to support the
> style where people use editors that highlight trailing
> blanks, since trailing blanks can cause trouble in some
> contexts (e.g., they can change the semantics of C programs
> and Makefiles).  When examining unified diffs, any added or
> removed trailing blanks will be easy to spot with such an
> editor, but only if "diff -u" doesn't output any trailing
> blanks of its own.

If "trailing space" highlighting picks up the first column blank
in "diff -u" output, that highlighting feature is *broken*.

"git diff --color" does the whitespace breakage highlighting,
but it knows that the first column *is* not payload and does not
highlight it.
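
The distinction can be sketched in a few lines of Python. This is an illustration of the idea only, not the code behind "git diff --color"; the function name is an assumption, and it expects lines from a hunk body, not the "---"/"+++" file headers.

```python
def payload_has_trailing_blank(diff_line):
    """Report trailing whitespace in the payload of a unified-diff line.

    The first column (' ', '+' or '-') is a marker, not payload, so it
    is dropped before looking at the end of the line.
    """
    payload = diff_line[1:]
    return payload.endswith((" ", "\t"))

# An empty context line is a lone marker blank, not a whitespace error.
assert payload_has_trailing_blank(" ") is False
# A line holding a single TAB is output as "SP TAB" and is flagged.
assert payload_has_trailing_blank(" \t") is True
```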

> You can read more about this at the thread that inspired
> the diffutils change, rooted here:
>
> http://lists.gnu.org/archive/html/bug-gnu-utils/2006-09/msg00005.html

I've read it.  It was not convincing and was not even an amusing
read.

^ permalink raw reply

* Re: [PATCH] Don't use $author_name undefined when $from contains no /\s</.
From: Paul Eggert @ 2006-10-19 21:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jim Meyering, git, bug-diffutils
In-Reply-To: <7vk62wruum.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> I see no good reason, other than saving a single byte from the
> output stream perhaps.

That wasn't the motivation.  Rather, it was to support the
style where people use editors that highlight trailing
blanks, since trailing blanks can cause trouble in some
contexts (e.g., they can change the semantics of C programs
and Makefiles).  When examining unified diffs, any added or
removed trailing blanks will be easy to spot with such an
editor, but only if "diff -u" doesn't output any trailing
blanks of its own.

You can read more about this at the thread that inspired
the diffutils change, rooted here:

http://lists.gnu.org/archive/html/bug-gnu-utils/2006-09/msg00005.html

> Does that mean if you have a line that has only one TAB (perhaps
> caused by broken autoindent in the editor), that is "input data"
> and is output as "SP TAB LF"?

Yes, that's correct.  In the highlighting-editor scenario,
such a line would be highlighted, but the people who want to
see trailing white space highlighted will indeed want the
highlighting here, so it's fine.

This change was not motivated by broken MUAs.  Broken MUAs
are a problem that GNU 'patch' has already had to deal with,
for many years.  The change was motivated by a desire to
make significant trailing white space easier to find, when
people are examining text that contains some diffs and some
other stuff.

^ permalink raw reply

* Re: VCS comparison table
From: Linus Torvalds @ 2006-10-19 20:47 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Matthew D. Fuller, Andreas Ericsson, Carl Worth, bazaar-ng, git,
	Jakub Narebski
In-Reply-To: <vpqlknc3zmn.fsf@ecrins.imag.fr>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 8997 bytes --]

On Thu, 19 Oct 2006, Matthieu Moy wrote:
> 
> Out of curiosity, how would you compare git and Bitkeeper, on a purely
> technical basis? (not asking for a detailed comparison, but an "X is
> globally/much/terribly/not better than Y" kind of statement ;-) )

I think git is better for kernel work these days, but a large portion of 
that is that a lot of the features have literally been tweaked for us (for 
very obvious reasons).

For example, the whole "rebase" thing (or explicitly making cherry-picking 
easy) is something that a number of kernel people do, and even if I have 
to admit to not liking the practice very much (it kind of hides the "true" 
development history), it does have huge advantages, and it makes history a 
lot easier to read.

Similarly, I often used the single-file graphical history viewing in BK 
("revtool"), but being able to follow the history of multiple files as one 
"entity" really is something that once you get used to, it's really really 
hard going back, and "gitk" does generate a much more readable graph.

And I think the git way of doing branches is just simply superior. Git 
always did branches in the sense that the way merges happened you _always_ 
had several heads, but actually making them available and switching 
between them was something that wasn't my idea, and that I even was a bit 
apprehensive about. I was wrong. Git branches are branches done right. I 
just don't see how you _could_ do them better.

That said, a lot of the features I like and _I_ consider really important 
are possibly not that important to others. For example, maybe nobody else 
really cares about viewing the history of a particular subsystem, the way 
I do. For a lot of people, single-file is probably ok. 

For example, while git now does "annotate" (or "blame"), it's not 
lightning fast, and I simply don't care. Doing a

	git blame kernel/sched.c

takes about three seconds for me, and that's on a pretty good machine (and 
on the kernel tree, which for me is always in the cache ;). Quite frankly, 
if I cared deeply about that kind of annotation, I'd probably be upset 
about it. There are basically _no_ other git operations that take that 
long. I can get the _full_ log of the last 18 months of the kernel much 
faster than that.

And the slowness of annotate comes directly from the design of git, and 
from the fact that it's not how I tend to look at changes. Rather than 
doing "git blame kernel/sched.c", I'm _much_ more likely to just do

	git log -p kernel/sched.c

and see the changes as individual patches instead (and perhaps search for 
some pattern that I'm looking for by just literally using a regex in the 
pager).

Also, the fact that you need to repack the archive every once in a while 
doesn't disturb me. I probably end up repacking the kernel almost daily, 
which is _waay_ excessive, but it's just become a habit of mine. I've seen 
people who really don't like it, and I've also seen people who apparently 
never even realized that they should do an occasional "git repack -a -d", 
and then they have hundreds of thousands of loose objects and wonder why 
the performance is so bad ;)

BK never had these issues. BK always kept things "packed", which made a 
lot of operations much slower ("bk undo" was painfully slow). BK could 
annotate quickly, since it was really a file-based history, in a way that 
git fundamentally isn't, and can never be (and I don't _want_ it to be, 
but it means that "annotate" is slow).

And BK had some great tools. The merge tool was superior ("bk resolve"? I 
forget). The patch-application tool was great.

But both of those tools are things that git doesn't have, for _another_ 
reason: the way git works, you don't really need them. For example, the 
patch application tool was great, but the biggest reason it was needed in 
the first place was tracking renames explicitly.

In that kind of environment, you have serious problems with patches, and 
you actually _need_ a tool to let the user explain when something is a 
rename and when it isn't. With git not tracking renames, the patch 
application tool simply isn't needed.

The same goes to some degree to "bk resolve". Because git has the index, 
and you can _leave_ things unresolved in the index, you don't need a 
graphical tool to resolve things - git knows very fundamentally about 
incomplete merges _and_ about multiple branches (which you need in order 
to keep track of both the branch you merge from and the branch you merge 
into), and it's fine to resolve any conflicts in the normal working tree.

So for at least _my_ usage, git does everything very well, but that's 
because if it didn't fit me, I fixed it until it did. 

And "git bisect" really does rock. I still cannot believe that apparently 
nobody did it before us. It's such a useful thing, and it works so well in 
unambiguous cases (and not all cases are that unambiguous, but an 
appreciably large subset is).
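The idea behind bisection can be sketched in a few lines. This is only a minimal illustration, assuming a linear history and a hypothetical `is_bad` test callback; it is not how "git bisect" is implemented, and real histories with merges need more care.

```python
from typing import Callable, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in a linear, oldest-to-newest history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad (the unambiguous case).
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # the breakage is at mid or earlier
        else:
            good = mid     # the breakage is after mid
    return commits[bad]


# Hypothetical 16-commit history where everything from c11 onward is broken.
history = [f"c{i}" for i in range(16)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)
    return int(commit[1:]) >= 11


culprit = bisect_first_bad(history, is_bad)
print(culprit, "found after", len(tested), "tests")
```

The number of tests grows with the logarithm of the history length, which is why the approach stays practical even on a tree with tens of thousands of commits.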

So that said, git does work very well for us, but I do want to end on a 
note on things that BitKeeper did and nobody else has:

 - Larry was first. The undeniable fact is, that before BK (and for 
   several years _after_ BK), the open-source alternatives were just CRAP.

   You can say anything you like about his personality, but dammit, 
   compared to Larry, most people I know are idiots. People don't give BK 
   the credit it deserves. When Tridge "reverse-engineered" it, people 
   were making jokes about how trivial some of the protocols were. That 
   misses the point ENTIRELY. The point is, compared to BK, everything 
   else absolutely _sucked_, and BK really was a watershed program.

   Never EVER underestimate how important BK was. Quite frankly, I think 
   most open-source SCM's _still_ suck. I'm constantly amazed that anybody 
   would touch SVN with a ten-foot pole. Talk about crap. And SVN is at 
   least usable, unlike a lot of other projects.

 - When I did git, one of the things that actually _helped_ me was that I 
   was consciously trying to not do a BK clone. I wanted to do the same 
   things that BK did, but I very much did _not_ want to do them the _way_ 
   BK did them. I respect Larry too much, and I didn't want there to be 
   any question about git being just a "clone".

   So a lot of the git design ended up very much trying to avoid old 
   designs on purpose, and I think that really helped. The fact that I 
   didn't have a background in SCM's, and that I thought all the weaves 
   etc were confusing, meant that I instead went for a radically different 
   way of doing things.

   And I'm 100% convinced that "radically different" was the right thing 
   to do. That was what allowed git to really soar. A lot of the good 
   things in git come exactly from the fact that git does _not_ do things 
   like most traditional SCM's do. But BK should still get a lot of 
   credit, because it was what taught me (and a lot of other people) what 
   being "distributed" really meant.

 - On a more personal note: people say that BK showed the "failure" of 
   using a commercial closed-source program. I would disagree. Not only 
   did the kernel get a whole lot of useful work out of BK, we learnt how 
   distributed systems _should_ work, and quite frankly, I'd do it all 
   over again in a heartbeat.

   If there was a "failure" in the BK saga, it was in how horrendously 
   _bad_ all open-source SCM's were, even with BK showing how it should 
   have been done for several years. THAT is the failure. The fact that 
   there were hundreds of people who whined about BK, and nobody really 
   did anything productive. 

Now, I'm obviously biased, but I really do believe that git is the best 
open-source SCM there is, by a _mile_. I don't know how many people 
realize this, but we literally haven't changed our data formats in over a 
year. I was looking at my old git import of the BKCVS tree today, because 
I wanted to look up the "BKrev" format for the email earlier in this tree, 
and I realized that the pack-file was from July of last year. That's 
within a few _weeks_ of the pack-file being introduced at all, and guess 
what? It all still worked. No "on-the-fly format conversion", no 
_nothing_. It just worked.

That should tell people something. It's pretty much the fastest SCM out 
there (and yeah, that's on almost any operation you can name), it still 
has the smallest disk footprint I've ever heard of, and it hasn't had the 
"format of the week" disease that every other project seems to go through.

And it's used in production settings on some of the biggest projects out 
there. SVN has more users, but let's face it, SVN really isn't even in the 
running. Technology-wise, the thing is just not worth bothering with, but 
it's a good crutch for people who are used to CVS and never want to use 
anything else.

Am I happy with git? I'm happy as a clam. It turned out even better than I 
ever thought it would. And BK was what taught me what to aim for.

			Linus

^ permalink raw reply

* Re: VCS comparison table
From: Nathaniel Smith @ 2006-10-19 19:01 UTC (permalink / raw)
  To: git
In-Reply-To: <45345CBE.8020209@utoronto.ca>

Aaron Bentley <aaron.bentley <at> utoronto.ca> writes:
> Bazaar also supports multiple unrelated branches in a repository, as
> does CVS, SVN (depending how you squint), Arch, and probably Monotone.

It's quite common in Monotone.  You could probably do it in Mercurial as well,
though I don't know that anyone does.  SVK definitely does it (since each user
has a single repo that's shared by all the projects they work on).

Trivia-ly yours,
-- Nathaniel

^ permalink raw reply

* Re: VCS comparison table
From: Junio C Hamano @ 2006-10-19 19:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0610191110290.3962@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> The other big difference is being able to do merges in seconds. The 
> biggest cost of doing a big merge these days seems to literally be 
> generating the diffstat of the changes at the end (which is purely a UI 
> issue, but one that I find so important that I'll happily take the extra 
> few seconds for that, even if it sometimes effectively doubles the 
> overhead).

An interesting effect of this is when people have a column for
merge performance in a SCM comparison table, they would include
time to run the diffstat as part of the time spent for merging
when they fill in the number for git, but not for any other SCM.

I know you won't misunderstand me but for the sake of others, I
should add this: I am not saying diffstat should be optional.

^ permalink raw reply

* Re: [PATCH] Don't use $author_name undefined when $from contains no /\s</.
From: Junio C Hamano @ 2006-10-19 19:03 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git, bug-diffutils, Paul Eggert
In-Reply-To: <878xjckw7x.fsf@rho.meyering.net>

Jim Meyering <jim@meyering.net> writes:

> There were two differences, both involving removed trailing blanks.
> The first was a part of the diff: a line consisting of a single space
> denoting an empty line in the context.  I understood that those types
> of lines may safely be truncated (removing the trailing blank),
> and in fact, GNU diff -u (cvs) now does this by default:
>
> 2006-09-05  Paul Eggert  <eggert@cs.ucla.edu>
>
>         * NEWS: diff -u no longer outputs trailing white space unless the
>         input data has it.  Suggested by Jim Meyering.
>         * doc/diff.texi (Detailed Unified): Document this.
>         * src/context.c (pr_unidiff_hunk): Implement this.

Gaah.  Paul, why did you have to break this?  I see no good
reason, other than saving a single byte from the output stream
perhaps.

Leading ' ' at the context line is _not_ trailing white space;
it is metadata, just like a leading '+' or '-' is.

We could certainly update git-apply to understand it and we
probably would need to do so to cope with patches generated with
this *broken* GNU diff behaviour.

I see why some people think it _might_ be a good change.
A broken MUA tends to have trouble with lines that have only
whitespace, so if a patch application program (patch or
git-apply) wants to deal with such a broken MUA, accepting a
totally empty line as if it were a line that has a single
whitespace at the beginning would save us from grief in some
cases.
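As a rough sketch of what accepting such lines would mean, the following classifies one line of a unified-diff hunk body. The function name and the `accept_empty_context` switch are made up for illustration; this is not git-apply's code, and it ignores details such as the "\ No newline at end of file" marker.

```python
def classify_hunk_line(line: str, accept_empty_context: bool = False):
    """Classify one hunk-body line (trailing newline already removed).

    Returns (kind, payload), where kind is 'context', 'add' or 'remove'.
    """
    if line == "":
        if accept_empty_context:
            # Lenient mode: assume this was the context line " " whose
            # leading space (the metadata column) got lost on the way.
            return ("context", "")
        # Strict mode: a hunk line must start with a marker character.
        raise ValueError("corrupt patch: empty line in hunk")
    marker, payload = line[0], line[1:]
    kinds = {" ": "context", "+": "add", "-": "remove"}
    if marker not in kinds:
        raise ValueError(f"corrupt patch: unexpected marker {marker!r}")
    return (kinds[marker], payload)


print(classify_hunk_line(" int x;"))
print(classify_hunk_line("", accept_empty_context=True))
```

In strict mode the empty line is rejected outright, which is the behaviour that makes the damage visible in the first place.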

However, I am not sure what "unless input data has it" means.
Does that mean if you have a line that has only one TAB (perhaps
caused by broken autoindent in the editor), that is "input data"
and is output as "SP TAB LF"?  If that is the case, then I do
not think dropping the leading SP only for an empty line makes
any sense.  A broken MUA would happily munge a line "SP TAB LF"
just as it would eat a line "SP LF".  Worse, such a MUA would
munge "+ TAB LF" into "+ LF", making the result of patch
application something the original patch author did not
intend to have.

If anything, this new behaviour makes the situation *actively*
worse.

By deciding to keep "SP TAB LF", you are saying that you _care_
about that trailing TAB in the patch and whitespace breakage
affects your payload in a bad way in your particular
application.  If that is the case, you would want to detect any
whitespace breakage a MUA might have caused before applying that
patch, and a broken context line that ought to be "SP LF" but
somehow comes out from MUA as "LF" would have served us as a
coalmine canary to help us detect the breakage.  Paul's change
to GNU diff is to kill that canary and I do not see any benefit
for doing so.
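The canary argument can be made concrete with a small model. This is a sketch under stated assumptions: `mua_strip` stands in for a mail program that drops trailing spaces and tabs, and the hunk is invented for the example.

```python
def mua_strip(lines):
    """Model a broken MUA that removes trailing spaces and tabs from each line."""
    return [line.rstrip(" \t") for line in lines]


def canary_tripped(hunk_lines):
    """True if any hunk line is completely empty.

    Traditional diff -u output never contains such a line, because even an
    empty context line carries its leading space.
    """
    return any(line == "" for line in hunk_lines)


# Hunk body in the traditional format: one empty context line (" ") and
# one added line whose payload is a single TAB ("+\t").
hunk = [" int x;", " ", "+\t", " return x;"]
damaged = mua_strip(hunk)

print(damaged)
print(canary_tripped(hunk), canary_tripped(damaged))
```

After stripping, the added line "+\t" has quietly become "+", which still looks like a valid hunk line; the only visible sign of trouble is the context line that collapsed to an empty string. If the diff producer emits empty context lines itself, that sign is gone.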

Why?

Please revert the patch, pretty please?

^ permalink raw reply

* Re: VCS comparison table
From: Matthieu Moy @ 2006-10-19 18:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: bazaar-ng, Matthew D. Fuller, Carl Worth, Andreas Ericsson, git,
	Jakub Narebski
In-Reply-To: <Pine.LNX.4.64.0610191110290.3962@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> Btw, I do believe that bzr seems to be acting a lot like BK, at least when 
> it comes to versioning. I suspect that is not entirely random either, and 
> I suspect it's been a conscious effort to some degree.
>
> Which is fine, in the sense that there are certainly much worse things to 
> try to copy.

Out of curiosity, how would you compare git and Bitkeeper, on a purely
technical basis? (not asking for a detailed comparison, but an "X is
globally/much/terribly/not better than Y" kind of statement ;-) )

^ permalink raw reply

* Re: VCS comparison table
From: Linus Torvalds @ 2006-10-19 18:30 UTC (permalink / raw)
  To: Matthew D. Fuller
  Cc: Carl Worth, Andreas Ericsson, bazaar-ng, git, Jakub Narebski
In-Reply-To: <Pine.LNX.4.64.0610190948540.3962@g5.osdl.org>

On Thu, 19 Oct 2006, Linus Torvalds wrote:
> 
> Ahh. They may be that even in BK. I know BK had various 16-bit CRC 
> checksums, but they were probably on the actual _file_ contents, not in 
> the key itself.

Btw, I do believe that bzr seems to be acting a lot like BK, at least when 
it comes to versioning. I suspect that is not entirely random either, and 
I suspect it's been a conscious effort to some degree.

Which is fine, in the sense that there are certainly much worse things to 
try to copy.

That said, at least BK was up-front about the versions changing, and 
didn't try to do anything to hinder it. It still confused some people, and 
it wasn't a great naming system, but it did work.

In the big picture, the version naming between BK and git hasn't been an 
issue for anybody in practice, I suspect.

So if you want to look at features that actually matter more, try out 
something like

	gitk drivers/scsi include/scsi

on the kernel archive (I assume that somebody has tried importing the 
kernel git tree into bzr - quite frankly, if bzr cannot handle that size 
tree without problems, you have much bigger issues!).

In other words, being able to look at history of more than a single file 
has been a _huge_ bonus. 

The other big difference is being able to do merges in seconds. The 
biggest cost of doing a big merge these days seems to literally be 
generating the diffstat of the changes at the end (which is purely a UI 
issue, but one that I find so important that I'll happily take the extra 
few seconds for that, even if it sometimes effectively doubles the 
overhead).

Looking at the dates of the merges yesterday, they're literally half a 
minute apart, and that's not me _scripting_ them - that's me actually 
looking up the emails, typing in the "git pull " and pasting the source 
repository, and git fetching the data over the network and merging it, and 
checking out the result (and me verifying that the resulting diffstat 
matches what the email says). Doing four of those in a row in less than 
two minutes is actually a really big deal.

At some point, "performance" is more than just a question of how fast 
things are, it becomes a big part of usability.

			Linus

^ permalink raw reply

* Re: [PATCH] Don't use $author_name undefined when $from contains no /\s</.
From: Jim Meyering @ 2006-10-19 18:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, bug-diffutils, Paul Eggert
In-Reply-To: <7vbqo8uvkn.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Jim Meyering <jim@meyering.net> writes:
>
>> I noticed a case not handled in a recent patch.
>
> Thanks. Will apply.
>
> Curiously your patch was whitespace damaged.

I wondered what you meant, so compared what I sent
with the output of the command I ran:

  git-format-patch --stdout --signoff HEAD~1

There were two differences, both involving removed trailing blanks.
The first was a part of the diff: a line consisting of a single space
denoting an empty line in the context.  I understood that those types
of lines may safely be truncated (removing the trailing blank),
and in fact, GNU diff -u (cvs) now does this by default:

2006-09-05  Paul Eggert  <eggert@cs.ucla.edu>

        * NEWS: diff -u no longer outputs trailing white space unless the
        input data has it.  Suggested by Jim Meyering.
        * doc/diff.texi (Detailed Unified): Document this.
        * src/context.c (pr_unidiff_hunk): Implement this.

The only other difference was the removal of the trailing blank following
the "--" signature introducer.

I see that git-apply does not handle this new format:

  $ git-apply patch
  fatal: corrupt patch at line 47

That diagnostic comes from builtin-apply.c:

		if (len <= 0)
			die("corrupt patch at line %d", linenr);

It would be nice if git would accept such unified diff output,
since no other program we know of rejects them.  Paul Eggert has
even submitted revised wording to make POSIX allow this style
of output.

For reference, the GNU diff thread started here:
  http://lists.gnu.org/archive/html/bug-gnu-utils/2006-09/msg00005.html

^ permalink raw reply

* Re: VCS comparison table
From: J. Bruce Fields @ 2006-10-19 17:14 UTC (permalink / raw)
  To: Carl Worth
  Cc: Aaron Bentley, Linus Torvalds, Jakub Narebski, Andreas Ericsson,
	bazaar-ng, git
In-Reply-To: <8764eg2qaa.wl%cworth@cworth.org>

On Thu, Oct 19, 2006 at 10:01:33AM -0700, Carl Worth wrote:
> On Thu, 19 Oct 2006 10:58:48 -0400, Aaron Bentley wrote:
> > On the other hand, I think your revision identifiers are not as
> > permanent as you think.
> >
> > In the first place, it seems fairly common in the Git community to
> > rebase.  This process throws away old revisions and creates new
> > revisions that are morally equivalent[1].
> 
> Yes, rebasing does "destroy history" in one sense, (in actual fact, it
> creates new commits and leaves the old ones around, which may or may
> not have references to them anymore).

Note that the id's are still permanent in this case; they will never
(modulo some assumptions about the crypto) be reused.  So a given id
points at one and only one object, for all time; it's just that we may
forget what that one object is....

> > In the second place, one must consider the "nuclear launch codes"
> > scenario.
> 
> Sure. And git does provide tools that can do this.

So in this case you can certainly lose the launch codes.  But you have
forever granted everyone a way to determine whether a given guess at the
launch codes is correct.  (Again, assuming some stuff about SHA1).
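To illustrate why a published id lets anyone check a guess, here is a minimal sketch of how git names a blob: the SHA-1 of a "blob <size>" header, a NUL byte, and the content. The secret string and the helper names are invented for the example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical secret that was committed once and later removed.
published_id = git_blob_id(b"launch code: 0000\n")


def guess_is_correct(guess: bytes) -> bool:
    """Anyone holding the id can test a guess without having the object."""
    return git_blob_id(guess) == published_id


print(guess_is_correct(b"launch code: 1234\n"))
print(guess_is_correct(b"launch code: 0000\n"))
```

The id never reveals the content directly, but it does act as a permanent yes/no oracle for candidate contents.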

--b.

^ permalink raw reply

* Re: VCS comparison table
From: Matthew D. Fuller @ 2006-10-19 17:06 UTC (permalink / raw)
  To: Petr Baudis
  Cc: bazaar-ng, Karl Hasselström, Linus Torvalds, Carl Worth,
	Andreas Ericsson, git, Jakub Narebski
In-Reply-To: <20061019160103.GZ75501@over-yonder.net>

On Thu, Oct 19, 2006 at 11:01:03AM -0500 I heard the voice of
Matthew D. Fuller, and lo! it spake thus:
>
> Now, the question of "is that merge commit E really necessary, when
> you could just attach D to the end of the graph and create something
> like [...] is perhaps a useful question (and one that there's
> obviously disagreement on).  And it may be a fruitful one to
> discuss, if we're not way off in the weeds already.
>
> But, it's also not QUITE the same question as "Is the left-vs-other
> path distinction meaningful and to be preserved?"

Let me elaborate a little on this.

bzr COULD create

>   a-.
>   |\ \
>   b c |
>   |/ /
>   D-'

instead of

>   a-.
>   |\ \
>   b c |
>   |\|/
>   | D
>   |/ 
>   E

for the previously discussed merge, basically duplicating
'fast-forward' behavior.  It doesn't currently, but it could just as
well without disturbing the attributes it gains from assigning meaning
to the left-most parent.  The choice to create E is the result of an
independent decision from the choice to treat the left path as
special.

What the leftmost discussion impacts is the case of 

    a-.
    |\ \
    | b c
    |/ /
    D-'

vs

    a-.-.
     \ \ \
      b c |
     / / /
    D-'-'

Now, the branches are distinct to bzr, but they're not different.  If
you try to merge one from the other, merge will quite rightly tell you
there's nothing to do, since you both have all the same revs.  git
doesn't recognize the distinction at all, of course.  The difference
is mostly cosmetic.  But, it's a cosmetic difference that bzr devs
(and users, I venture) find _useful_, which is why it's fought for.
And everything else seems to follow from that.
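The "distinct but not different" point can be sketched with two parent tables that differ only in the order of D's parents. This is one reading of the two diagrams above (leftmost parent a versus leftmost parent b), written as plain dictionaries; it is not bzr's or git's data model.

```python
def ancestors(parents, tip):
    """Set of all revisions reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents.get(rev, []))
    return seen


def leftmost_chain(parents, tip):
    """Follow only the first (leftmost) parent from tip back to the root."""
    chain = [tip]
    while parents.get(chain[-1]):
        chain.append(parents[chain[-1]][0])
    return chain


# Same revisions, same edges; only the order of D's parents differs.
left_is_a = {"D": ["a", "b", "c"], "b": ["a"], "c": ["a"], "a": []}
left_is_b = {"D": ["b", "c", "a"], "b": ["a"], "c": ["a"], "a": []}

same_revs = ancestors(left_is_a, "D") == ancestors(left_is_b, "D")
print(same_revs)
print(leftmost_chain(left_is_a, "D"), leftmost_chain(left_is_b, "D"))
```

A merge between the two has nothing to do because the revision sets are equal, while a log that follows the leftmost parent shows a different "mainline" in each case.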

If you don't think the distinction is meaningful or useful, you can
ignore it, and the tool should work just fine.  The main place the
distinction would show up is in the cosmetics of how "log" looks (and
probably similarly in any tool that graphically describes ancestry),
and a custom log output formatter could probably be very easily
written to obviate even that.

^ permalink raw reply

* Re: VCS comparison table
From: Carl Worth @ 2006-10-19 17:01 UTC (permalink / raw)
  To: Aaron Bentley
  Cc: Linus Torvalds, Jakub Narebski, Andreas Ericsson, bazaar-ng, git
In-Reply-To: <453792A8.1010700@utoronto.ca>

[-- Attachment #1: Type: text/plain, Size: 7913 bytes --]

On Thu, 19 Oct 2006 10:58:48 -0400, Aaron Bentley wrote:
> >> In bzr development, it's very rare for anyone's revision numbers to change.
> >
> > Which just says to me that the bzr developers really are sticking to a
> > centralized model.
>
> I don't see why you're reaching that conclusion.  I'd like to understand
> that better, because Linus seems to be concluding the same thing, and it
> doesn't make sense to me.

First, I want to point out that I think we're having a delightfully
enlightening conversation here, and I'm glad for that.

Let me provide a couple of hypothetical situations to try to
demonstrate my thinking here. The first is far-fetched but perhaps
easier to understand the implications. But the second is the real,
everyday situation that is much more important.

Far-fetched
-----------
Let's imagine there's a complete fork in the bzr codebase tomorrow. We
need not suppose any acrimony, just an amiable split as two subsets of
the team start taking the code in different directions.

Now, at the time of the fork, all published revision numbers apply
equally well to either team's codebase, (obviously, since they are
identical). But as the projects diverge they each start publishing
revision numbers with respect to their own repositories in their own
bug trackers, etc. Obviously, each project has its own "mainline" so
these new revision numbers are only unique within each project and not
between the two.

Time passes...

Finally the two teams (who had remained good friends after the
breakup) find a unifying theory that will let them work on a single
tool that will meet the needs of both user bases. So they want to
merge their code together.

After the merge, there can be only one mainline, so one team or the
other will have to concede to give up the numbers they had generated
and published during the fork. That is, the numbers will not be usable
within the new, merged repository.

Everyday
--------
Now, the above scenario is just silly. It's not likely to ever happen,
so it's really not worth considering as a motivating case.

But, what does (and should) happen every day is exactly the same. So
here's a realistic situation that is worth considering:

An individual takes the bzr codebase and starts working on it. It's
experimental stuff, so it's not pushed back into the central
repository yet. But our coder isn't a total recluse, so his friends
help him with the code he's working on. They communicate about their
work, (perhaps on the main bzr mailing list), and make statements such
as "feature F is working perfectly as of version V".

But for these communications, revision numbers will not provide
historically stable values that can be used. It's impossible for our
coder to predict the numbers that will be assigned to his code when
they get merged back into the mainline---since some other unknown
programmer may have branched at exactly the same point and is trying
to make the same determination. Neither programmer can know which code
will land first, so neither can know what numbers will get assigned,
right?

Now, the programmers could get stable numbers by keeping the branch in
the main tree, or by at least pushing out the branching point to
"reserve" a number in the main tree.

So, the only way to get stable numbers is to rely on this central
tree.
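The argument can be sketched with a deliberately simplified model, assuming revision numbers are just positions on the mainline, assigned as commits land. That is an assumption made for illustration and not a description of bzr's actual numbering; the commit names are hypothetical.

```python
import hashlib


def land(mainline, branch_commits):
    """Append commits to the mainline; return {commit: revno} for them.

    Revnos are 1-based positions on the mainline (simplified model).
    """
    numbers = {}
    for commit in branch_commits:
        mainline.append(commit)
        numbers[commit] = len(mainline)
    return numbers


def content_id(commit: str) -> str:
    """Stand-in for a content-derived id: depends only on the commit itself."""
    return hashlib.sha1(commit.encode()).hexdigest()[:8]


base = ["r1", "r2", "r3"]
alice = ["alice-1", "alice-2"]   # both developers branched at r3
bob = ["bob-1"]

# Scenario 1: Alice's work lands first.
mainline_1 = list(base)
alice_first = land(mainline_1, alice)
land(mainline_1, bob)

# Scenario 2: Bob's work lands first.
mainline_2 = list(base)
land(mainline_2, bob)
alice_second = land(mainline_2, alice)

print(alice_first["alice-1"], alice_second["alice-1"])
print(content_id("alice-1"))
```

In this model the number for "alice-1" depends on who lands first, so it cannot be quoted reliably before the merge, while the content-derived id is the same in both scenarios.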

Does that make sense?

> That doesn't follow.  Just because something is arguably true doesn't
> make it bad.  And in this case, I'm not arguing that it's true, I'm
> saying that it's true, because that is what my experience tells me is true.

[I'm sorry, but I didn't grasp this sentence. I think I lost the
antecedent of "it" somewhere.]

> > In cairo, for example, we've made a habit of including a revision
> > identifier in our bug tracking system for every commit that resolves a
> > bug.
>
> We do it the other way around: we put a bug number in the commit
> message.

Oh, we do that too. That number is important, (for "what the heck is
this commit trying to do, and why", since (sadly) much of the why ends
up getting stuck off in external bug tracking tools). But the reverse
direction is also important, ("Hey, this bug got fixed in the
development version, but I want to backport it to my distribution
package. Where can I find it?").

>          And I personally have been developing a bugtracker that is
> distributed in the same way bzr is; it stores bug data in the source
> tree of a project, so that bug activities follow branches around.

That kind of thing sounds very useful. As I've been talking about
"numbers" here in bug trackers and mailing lists, it should be obvious
that I consider the information stored in such systems an important
part of the history of a code project. So it would be nice if all of
that history were stored in an equally reliable system in some way.

> On the other hand, I think your revision identifiers are not as
> permanent as you think.
>
> In the first place, it seems fairly common in the Git community to
> rebase.  This process throws away old revisions and creates new
> revisions that are morally equivalent[1].

Yes, rebasing does "destroy history" in one sense, (in actual fact, it
creates new commits and leaves the old ones around, which may or may
not have references to them anymore). But it's definitely not common
for git users to use rebase in a situation where it would change any
published number.

For example, I regularly use git-rebase, (and similar "git-commit
 --amend"), as I'm putting together a new branch that exists only
in a repository on my laptop with nobody having external visibility to
it.

So, if I see a typo in a commit and I've never pushed it anywhere,
I'll just "git commit --amend" to fix it. But if I see that typo only
after I push out the change, then I just make a new commit to fix it,
(and suck up the fact that my mistake will be a permanent part of the
history).

And git helps with this as well. If I ever forget that I've already
pushed a change and then I rebase, then the next time I try to push,
git will complain that I'm attempting to throw away history on the
remote end, and will refuse to cooperate, (unless I force it).

There's a similar safety mechanism on the pull side. If I did force a
history-rewriting push, then users who tried to pull it would also
have to force git's hand before it would rewrite their history.
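The safety check described here amounts to asking whether the tip the remote already has is an ancestor of the tip being pushed. A minimal sketch, assuming history is given as a hypothetical child-to-parents table; it is not git's implementation.

```python
def is_ancestor(parents, ancestor, tip):
    """True if `ancestor` is reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev == ancestor:
            return True
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents.get(rev, []))
    return False


def push_allowed(parents, remote_tip, new_tip, force=False):
    """Allow the update only if it keeps the remote's history, unless forced."""
    return force or is_ancestor(parents, remote_tip, new_tip)


# A <- B <- C was pushed. D is a new commit on top of C; C2 is C amended,
# so it shares parent B but does not contain C.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "C2": ["B"]}

print(push_allowed(parents, "C", "D"))
print(push_allowed(parents, "C", "C2"))
print(push_allowed(parents, "C", "C2", force=True))
```

Adding a follow-up commit passes the check, while pushing the amended commit would drop C from the remote branch and is refused unless forced.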

[By the way, it is sometimes useful to make chaotic, regularly-rebased
branches visible to others, so they can watch what's going on. (Junio
does this with his "proposed updates (pu)" branch in his repository
for git itself, for example). It's just that such branches should
never be used to start new development if they expect to pull from the
branch again later, nor should the revision numbers of such a branch
ever be considered permanent, nor published anywhere.]

> In the second place, one must consider the "nuclear launch codes"
> scenario.

Sure. And git does provide tools that can do this. Of course, the
"normal" tools strictly add new commits and move branches (which are
no more than references to commits) around. But moving branches can
leave commits unreferenced. And a "prune" command does exist, (which
isn't needed in "normal" use), which will delete unreferenced objects.
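What pruning does can be sketched as a reachability sweep: keep everything reachable from some reference, drop the rest. This is a simplified model with hypothetical commit names, and it leaves out anything else a real repository might treat as a reference.

```python
def reachable(parents, refs):
    """All commits reachable from any reference (branch name -> tip)."""
    seen, stack = set(), list(refs.values())
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents.get(rev, []))
    return seen


def prune(parents, refs):
    """Return the object table with unreferenced commits removed."""
    keep = reachable(parents, refs)
    return {rev: ps for rev, ps in parents.items() if rev in keep}


# After a rebase, branch "topic" points at new2; old1 and old2 still exist
# in the object store but nothing refers to them any more.
parents = {
    "A": [], "B": ["A"],
    "old1": ["B"], "old2": ["old1"],
    "new1": ["B"], "new2": ["new1"],
}
refs = {"topic": "new2"}

after = prune(parents, refs)
print(sorted(after))
```

Keeping a second reference on the old tip, for example `{"topic": "new2", "topic-old": "old2"}`, would make the sweep retain the old commits as well.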

-Carl

> [1] This is a process that I find discomforting, because I consider the
> original revisions to be real, historical data, and I don't like the
> idea of throwing it away.

As I mentioned above. They aren't thrown away. I often use rebase when
re-building an ugly series of patches into a nice clean set of
patches. And in that situation, I might rebase from the old to the
new, but still with a reference to the old branch until I'm done with
the entire process. And it's perfectly possible, and legitimate that
such a reference has been published and the old branch will live
"forever" even if I rebased it. So rebase isn't necessarily
destructive.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: VCS comparison table
From: Carl Worth @ 2006-10-19 16:59 UTC (permalink / raw)
  To: Aaron Bentley
  Cc: Linus Torvalds, Andreas Ericsson, bazaar-ng, git, Jakub Narebski
In-Reply-To: <453792A8.1010700@utoronto.ca>

[-- Attachment #1: Type: text/plain, Size: 7913 bytes --]

On Thu, 19 Oct 2006 10:58:48 -0400, Aaron Bentley wrote:
> >> In bzr development, it's very rare for anyone's revision numbers to change.
> >
> > Which just says to me that the bzr developers really are sticking to a
> > centralized model.
>
> I don't see why you're reaching that conclusion.  I'd like to understand
> that better, because Linus seems to be concluding the same thing, and it
> doesn't make sense to me.

First, I want to point out that I think we're having a delightfully
enlightening conversation here, and I'm glad for that.

Let me provide a couple of hypothetical situations to try to
demonstrate my thinking here. The first is far-fetched but perhaps
easier to understand the implications. But the second is the real,
everyday situation that is much more important.

Far-fetched
-----------
Let's imagine there's a complete fork in the bzr codebase tomorrow. We
need not suppose any acrimony, just an amiable split as two subsets of
the team start taking the code in different directions.

Now, at the time of the fork, all published revision numbers apply
equally well to either team's codebase, (obviously, since they are
identical). But as the projects diverge they each start publishing
revision numbers with respect to their own repositories in their own
bug trackers, etc. Obviously, each project has its own "mainline" so
these new revision numbers are only unique within each project and not
between the two.

Time passes...

Finally the two teams (who had remained good friends after the
breakup) find a unifying theory that will let them work on a single
tool that will meet the needs of both user bases. So they want to
merge their code together.

After the merge, there can be only one mainline, so one team or the
other will have to concede to give up the numbers they had generated
and published during the fork. That is, the numbers will not be usable
within the new, merged repository.

Everyday
--------
Now, the above scenario is just silly. It's not likely to ever happen,
so it's really not worth considering as a motivating case.

But, what does (and should) happen every day is exactly the same. So
here's a realistic situation that is worth considering:

An individual takes the bzr codebase and starts working on it. It's
experimental stuff, so it's not pushed back into the central
repository yet. But our coder isn't a total recluse, so his friends
help him with the code he's working on. They communicate about their
work, (perhaps on the main bzr mailing list), and make statements such
as "feature F is working perfectly as of version V".

But for these communications, revision numbers will not provide
historically stable values that can be used. It's impossible for our
coder to predict the numbers that will be assigned to his code when
they get merged back into the mainline---since some other unknown
programmer may have branched at exactly the same point and is trying
to make the same determination. Neither programmer can know which code
will land first, so neither can know what numbers will get assigned,
right?

Now, the programmers could get stable numbers by keeping the branch in
the main tree, or by at least pushing out the branching point to
"reserve" a number in the main tree.

So, the only way to get stable numbers is to rely on this central
tree.

Does that make sense?

> That doesn't follow.  Just because something is arguably true doesn't
> make it bad.  And in this case, I'm not arguing that it's true, I'm
> saying that it's true, because that is what my experience tells me is true.

[I'm sorry, but I didn't grasp this sentence. I think I lost the
antecedent of "it" somewhere.]

> > In cairo, for example, we've made a habit of including a revision
> > identifier in our bug tracking system for every commit that resolves a
> > bug.
>
> We do it the other way around: we put a bug number in the commit
> message.

Oh, we do that too. That number is important, (for "what the heck is
this commit trying to do, and why", since (sadly) much of the why ends
up getting stuck off in external bug tracking tools). But the reverse
direction is also important, ("Hey, this bug got fixed in the
development version, but I want to backport it to my distribution
package. Where can I find it?").

>          And I personally have been developing a bugtracker that is
> distributed in the same way bzr is; it stores bug data in the source
> tree of a project, so that bug activities follow branches around.

That kind of thing sounds very useful. As I've been talking about
"numbers" here in bug trackers and mailing lists, it should be obvious
that I consider the information stored in such systems an important
part of the history of a code project. So it would be nice if all of
that history were stored in an equally reliable system in some way.

> On the other hand, I think your revision identifiers are not as
> permanent as you think.
>
> In the first place, it seems fairly common in the Git community to
> rebase.  This process throws away old revisions and creates new
> revisions that are morally equivalent[1].

Yes, rebasing does "destroy history" in one sense, (in actual fact, it
creates new commits and leaves the old ones around, which may or may
not have references to them anymore). But it's definitely not common
for git users to use rebase in a situation where it would change any
published number.

For example, I regularly use git-rebase, (and similar "git-commit
 --amend"), as I'm putting together a new branch that exists only
in a repository on my laptop with nobody having external visibility to
it.

So, if I see a typo in a commit and I've never pushed it anywhere,
I'll just "git commit --amend" to fix it. But if I see that typo only
after I push out the change, then I just make a new commit to fix it,
(and suck up the fact that my mistake will be a permanent part of the
history).

And git helps with this as well. If I ever forget that I've already
pushed a change and then I rebase, then the next time I try to push,
git will complain that I'm attempting to throw away history on the
remote end, and will refuse to cooperate, (unless I force it).

There's a similar safety mechanism on the pull side. If I did force a
history-rewriting push, then users who tried to pull it would also
have to force git's hand before it would rewrite their history.
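
The push-side check described above can be sketched roughly as follows. This is a toy model, not git's implementation: history is a dictionary mapping each commit to its parents, and accept_push is a hypothetical name for the rule "the remote's current tip must remain in the history of what is pushed, unless the push is forced".

```python
def is_ancestor(parents, old, new):
    """True if commit `old` is reachable from commit `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False

def accept_push(parents, remote_tip, pushed_tip, force=False):
    """Accept only pushes that keep the remote's current tip in history, unless forced."""
    return force or is_ancestor(parents, remote_tip, pushed_tip)

# a <- b <- c has been pushed; c2 is a rewritten replacement for c (e.g. after an amend).
parents = {"a": [], "b": ["a"], "c": ["b"], "c2": ["b"]}
normal = accept_push(parents, "b", "c")                # ordinary push on top of b
rewrite = accept_push(parents, "c", "c2")              # would discard c on the remote
forced = accept_push(parents, "c", "c2", force=True)   # explicit override
```

In this model the ordinary push is accepted, the rewrite is refused, and only the forced variant gets through.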

[By the way, it is sometimes useful to make chaotic, regularly-rebased
branches visible to others, so they can watch what's going on. (Junio
does this with his "proposed updates (pu)" branch in his repository
for git itself, for example). It's just that such branches should
never be used to start new development if they expect to pull from the
branch again later, nor should the revision numbers of such a branch
ever be considered permanent, nor published anywhere.]

> In the second place, one must consider the "nuclear launch codes"
> scenario.

Sure. And git does provide tools that can do this. Of course, the
"normal" tools strictly add new commits and move branches (which are
no more than references to commits) around. But moving branches can
leave commits unreferenced. And a "prune" command does exist, (which
isn't needed in "normal" use), which will delete unreferenced objects.
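
As a rough illustration of what "unreferenced" means here, the sketch below treats history as a dictionary of commits and their parents, and branches as a dictionary of names pointing at commits. It is a simplification that only considers commits; the function names reachable and prunable are made up for the example.

```python
def reachable(parents, tips):
    """Collect every commit reachable from the given tips via parent links."""
    seen, stack = set(), list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen

def prunable(parents, refs):
    """Commits that no branch reference can reach; only these are candidates for deletion."""
    return set(parents) - reachable(parents, refs.values())

# c was rebased into c2; both commits still exist in the object store.
parents = {"a": [], "b": ["a"], "c": ["b"], "c2": ["b"]}
after_rebase = prunable(parents, {"master": "c2"})
with_old_ref = prunable(parents, {"master": "c2", "old": "c"})
```

With only "master" pointing at c2, the old commit c is the sole prunable commit; as soon as any reference still points at c, nothing is prunable.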

-Carl

> [1] This is a process that I find discomforting, because I consider the
> original revisions to be real, historical data, and I don't like the
> idea of throwing it away.

As I mentioned above, they aren't thrown away. I often use rebase when
re-building an ugly series of patches into a nice clean set of
patches. And in that situation, I might rebase from the old to the
new, but still with a reference to the old branch until I'm done with
the entire process. And it's perfectly possible, and legitimate that
such a reference has been published and the old branch will live
"forever" even if I rebased it. So rebase isn't necessarily
destructive.


^ permalink raw reply

* Re: VCS comparison table
From: Linus Torvalds @ 2006-10-19 16:49 UTC (permalink / raw)
  To: Matthew D. Fuller
  Cc: Andreas Ericsson, Carl Worth, bazaar-ng, git, Jakub Narebski
In-Reply-To: <20061019161319.GA75501@over-yonder.net>

On Thu, 19 Oct 2006, Matthew D. Fuller wrote:

> On Thu, Oct 19, 2006 at 08:25:26AM -0700 I heard the voice of
> Linus Torvalds, and lo! it spake thus:
> > 
> > The biggest difference seems to be that in bzr, the final checksum
> > is 64-bit,
> 
> Actually, as best I know, it's not a checksum, just random bits (a
> quick glance at the code seems to agree with me).

Ahh. They may be that even in BK. I know BK had various 16-bit CRC 
checksums, but they were probably on the actual _file_ contents, not in 
the key itself.

		Linus

^ permalink raw reply

* Re: [ANNOUNCE] Example Cogito Addon - cogito-bundle
From: Linus Torvalds @ 2006-10-19 16:48 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Junio C Hamano, git
In-Reply-To: <20061019160750.GS17794@delft.aura.cs.cmu.edu>

On Thu, 19 Oct 2006, Jan Harkes wrote:
> 
> If we find a delta against a base that is not found in our repository we
> can keep it as a delta, the base should show up later on in the
> thin-pack. Whenever we find a delta against a base that we haven't seen
> in the received part of the thin pack, but is available from the
> repository we should expand it because there is a chance we may not see
> this base in the remainder of the thin-pack.

Yes, indeed. We can also have another heuristic: if we find a delta, and 
we haven't seen the object it deltas against, we can still keep it as a 
delta IF WE ALSO DON'T ALREADY HAVE THE BASE OBJECT. Because then we know 
that the base object has to be there later in the pack (or we have a 
dangling delta, which we'll just consider an error).
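
A minimal sketch of that decision rule, under stated assumptions: object ids are plain strings, seen_in_pack is the set of objects received so far, and in_repository is the set of objects already stored locally. The function names and return labels are hypothetical, and keeping a delta whose base has already arrived in the pack is implied by the discussion rather than spelled out in it.

```python
def classify_delta(base_id, seen_in_pack, in_repository):
    """Decide what to do with a delta entry while streaming through a thin pack."""
    if base_id in seen_in_pack:
        return "keep"          # base already arrived in this pack
    if base_id in in_repository:
        return "expand"        # base may never show up in the pack, so expand now
    return "keep-pending"      # base must arrive later, or the delta is dangling

def dangling(pending_bases, seen_in_pack):
    """At end of stream, any pending base that never arrived makes the pack an error."""
    return sorted(set(pending_bases) - set(seen_in_pack))

in_repository = {"X"}
seen_in_pack = set()
first = classify_delta("X", seen_in_pack, in_repository)    # base only in the repository
second = classify_delta("Y", seen_in_pack, in_repository)   # base not available anywhere yet
seen_in_pack.add("Y")
third = classify_delta("Y", seen_in_pack, in_repository)    # base has now been received
missing = dangling(["Y", "Z"], seen_in_pack)
```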

So yeah, maybe my patch-series is something we can still save.

However, the thing that makes me suspect that it is _not_ saveable, is 
this:

 - let's assume we have a nice thin pack, with object A B C D (in that 
   order), which is actually a good pack in itself (ie it _might_ be thin, 
   but it's actually self-sufficient)

 - let A be a full object, and B be packed as a delta off A, C as a delta 
   off B, and D as a delta off C.

 - Try to repack it as a streaming thing (the end result _should_ 
   obviously be exactly the same as the input, since it turns out to be 
   self-sufficient)

Looks trivial, no?

The answer is: no. It's not trivial. Or rather, it _is_ trivial, but you 
have to _remember_ all of the actual data for A, B, C and D all the way to 
the end, because only if you have that data in memory can you actually 
_recreate_ B, C and D even enough to get their SHA1's (which you need, 
just in order to know that the pack is complete, much less to be able to 
create a non-delta version in case it hadn't been).

So we can definitely do the one-pass creation, but it requires that we 
keep track of everything we've expanded so far in memory (because we won't 
have the data available any other way - we don't have them as objects in 
our object database, and we don't have a good new pack yet).

But if you do that, then yes, it's salvageable.
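
To illustrate why everything has to stay in memory, here is a toy one-pass reader. It is a sketch under loud assumptions: the "delta" is just a suffix appended to the base, which is not git's delta format, and every object is hashed as a blob. The only part taken from real git is the object id computation (SHA-1 over a "blob <size>\0" header plus the data).

```python
import hashlib

def object_id(data: bytes) -> str:
    """Git-style blob id: SHA-1 over a 'blob <size>\\0' header followed by the data."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()

def stream_pack(entries):
    """One pass over entries: ("full", data) or ("delta", base_id, suffix) in toy format."""
    expanded = {}  # id -> full data; nothing can be dropped before the stream ends
    for entry in entries:
        if entry[0] == "full":
            data = entry[1]
        else:
            _, base_id, suffix = entry
            # Recreating the object needs the base's full data, not just its id.
            data = expanded[base_id] + suffix
        expanded[object_id(data)] = data
    return expanded

# A is full, B is a delta off A, C off B, D off C.
a = b"hello\n"
b = a + b"world\n"
c = b + b"again\n"
entries = [
    ("full", a),
    ("delta", object_id(a), b"world\n"),
    ("delta", object_id(b), b"again\n"),
    ("delta", object_id(c), b"!\n"),
]
result = stream_pack(entries)
```

Even in this toy, the id of D cannot be computed without first reconstructing C, which needs B, which needs A, so the expanded dictionary grows until the stream ends.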

> About that patch series, is there a simple way to import the series into
> a local repository? git-am doesn't like it, even after splitting it into
> separate files on the linebreaks. I guess git-mailinfo could be taught
> to recognise the git-log headers. Or have I missed some useful git apply
> trick.

No, you've not missed anything. I didn't really expect anybody to want to 
seriously play with it, so I didn't bother to do things properly. 

Especially since I hadn't even written very good commit messages.

Anyway, I just pushed the "rewrite-pack" branch to my git repo on 
kernel.org, so once it mirrors out, if you really want to try to fix up 
the mess I left behind, there it is:

	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/git.git rewrite-pack

Maybe it's recoverable. 

		Linus

^ permalink raw reply

* Re: VCS comparison table
From: Matthieu Moy @ 2006-10-19 16:38 UTC (permalink / raw)
  To: Carl Worth; +Cc: bazaar-ng, Petr Baudis, git
In-Reply-To: <87ac3s2syi.wl%cworth@cworth.org>

Carl Worth <cworth@cworth.org> writes:

> Yes. We're identifying the core underlying technical difference behind
> the recent discussion. Namely bzr treats one parent as special, (the
> parent that was the branch tip previously). And this special treatment
> eliminates the ability to fast-forward, 

No.

bzr could trivially do fast-forward too. It's an explicit design
decision to have two separate commands.

> adds merge commits that wouldn't exist with fast forwarding,

They don't exist either with "pull".

The difference between bzr and git is smaller than you think on this
point I believe.

> There's a bit more to it than that though. The git command named
> "pull" will perform a fast-forward if possible, but will create a
> merge commit if necessary. For example:

The bzr command "pull" will do a fast-forward if possible, but will
refuse to continue and ask you to create the merge commit with other
commands if necessary.

> 	a       a                      a
> 	| pulls | and fast-forwards to |
> 	b       b                      b
> 	        |                      |
> 	        c                      c

Same as bzr.

> whereas:
>
>         a       a                       a
>         | pulls | and creates a merge  / \
>         b       c                     b   c
>                                        \ /
>                                         m

Here, bzr will refuse to pull. It will say "branches have diverged"
and tell you to use merge.

Then, you'll do

$ bzr merge

# optionally "bzr status"

$ bzr commit -m "merged such or such thing"

So, "git pull" seems roughly equivalent to something like

$ bzr pull || (bzr merge; bzr commit -m merge)
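
The two behaviours can be modelled side by side in a small sketch. This is a toy model of the semantics described in this thread, not either tool's code: history is a dictionary of commits and parents, the flag merge_on_divergence is a made-up switch between the "git pull" style and the "bzr pull" style, and the "already up to date" branch is an added assumption.

```python
class Diverged(Exception):
    """Raised when neither tip is an ancestor of the other and merging was not requested."""

def is_ancestor(parents, old, new):
    """True if commit `old` is reachable from commit `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return False

def pull(parents, local, remote, merge_on_divergence):
    """Return the new local tip after pulling `remote` into `local`."""
    if is_ancestor(parents, remote, local):
        return local                       # nothing new to fetch (assumed behaviour)
    if is_ancestor(parents, local, remote):
        return remote                      # fast-forward
    if not merge_on_divergence:
        raise Diverged("branches have diverged")
    parents["m"] = [local, remote]         # record a merge commit with two parents
    return "m"

linear = {"a": [], "b": ["a"], "c": ["b"]}
fast_forward = pull(linear, "b", "c", merge_on_divergence=False)

forked = {"a": [], "b": ["a"], "c": ["a"]}
merged = pull(forked, "b", "c", merge_on_divergence=True)
```

In the linear case both styles end up at c; in the forked case the merging style creates m with parents b and c, while the refusing style raises Diverged and leaves the merge to a separate step.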

> So I'm curious. What does bzr pull do in the case of divergence like
> this? (And this is the "numbers will be changed" case, by the way).

Not yet. The "numbers will be changed" is if b pulls, right after.

Then, one other difference is in the UI. bzr shows you commits in a
kind of hierarchical manner, like (fictive example, that's not the real
exact format).

$ bzr log
committer: upstream@maintainer.com
message:
  merged the work on a feature
  ------
  committer: contributor@site.com
  message:
    prepared for feature X
  ------
  committer: contributor@site.com
  message:
    implemented feature X
  ------
  committer: contributor@site.com
  message:
    added testcase for feature X
------
committer: upstream@maintainer.com
message:
  something else

No big difference in the model either, but it probably reveals a
different vision of what "history" means.

-- 
Matthieu

^ permalink raw reply

* Re: [PATCH] Don't use $author_name undefined when $from contains no /\s</.
From: Junio C Hamano @ 2006-10-19 16:19 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git
In-Reply-To: <87vemgn1s2.fsf@rho.meyering.net>

Jim Meyering <jim@meyering.net> writes:

> I noticed a case not handled in a recent patch.

Thanks. Will apply.

Curiously your patch was whitespace damaged.

^ permalink raw reply
