Git development


* Re: Revised PPC assembly implementation
From: linux @ 2005-04-26  2:35 UTC (permalink / raw)
  To: davem, paulus; +Cc: git, linux
In-Reply-To: <20050425161746.7d943e62.davem@davemloft.net>

(Sorry about that last e-mail.  gnome-terminal crashed and sent the file
before I edited it.  Here's what I meant to send.)

> Do a block with the integer ALUs in parallel with a block done using
> Altivec :-)  There should be enough spare insn slots so that the loads
> are absorbed properly.

Unfortunately, the blocks are connected by a data dependency.
It's basically a large-key block cipher, chained by:

iv[] = fixed_initial_value.
iv[] += encrypt(iv, text[0..63])
iv[] += encrypt(iv, text[64..127])
iv[] += encrypt(iv, text[128..191])
iv[] += encrypt(iv, text[192..255])
etc.

There is no coarse-grain parallelism to exploit, unless you want
to be hashing two separate files at once.  Which would do too much
damage to the structure of the source to be worth considering.
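
To make the chaining concrete, here is a minimal Python sketch of SHA-1 arranged so that the outer loop mirrors the iv[] += encrypt(iv, text[...]) form above. The names compress and sha1 are made up for illustration, and this is plain reference code, not the PPC implementation under discussion.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def compress(iv, block):
    """The 'encrypt(iv, text)' step: 80 rounds keyed by one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = iv
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = (rotl(a, 5) + f + e + k + w[t]) & MASK, a, rotl(b, 30), c, d
    return (a, b, c, d, e)


def sha1(data):
    """Hash data block by block; each block needs the iv left by the previous one."""
    iv = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    padded = (data + b"\x80" + b"\x00" * ((55 - len(data)) % 64)
              + struct.pack(">Q", 8 * len(data)))
    for off in range(0, len(padded), 64):
        out = compress(iv, padded[off:off + 64])
        # iv[] += encrypt(iv, text[off..off+63])
        iv = tuple((x + y) & MASK for x, y in zip(iv, out))
    return struct.pack(">5I", *iv).hex()


print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())
```

Because every call to compress takes the iv produced by the previous block, two blocks of the same file cannot be started at the same time; only two independent files could be.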

> Unlike UltraSPARC's VIS, with altivec you can reasonably do shifts and
> rotates, which is the only reason I'm suggesting this.

I don't quite think it's worth it, though.  It's not data-parallel
enough.

We could theoretically use it to form the w[] vector, but that's only
4 instructions in registers which are very flexibly schedulable and
nicely fill in the cracks between other instructions.

Oh, here's STEPD1+UPDATEW scheduled optimally for the G4.  %r5 holds the
constant K.  Note that t < s <= t+16.  W(s) and W((s)-16) are actually
the same register.

add   RE(t),RE(t),W(t);	xor    %r0,RD(t),RB(t);	xor    W(s),W((s)-16),W((s)-3);
add   RE(t),RE(t),%r5;	xor    %r0,%r0,RC(t);	xor    W(s),W(s),W((s)-8);
add   RE(t),RE(t),%r0;	rotlwi %r0,RA(t),5;	xor    W(s),W(s),W((s)-14);
add   RE(t),RE(t),%r0;	rotlwi RB(t),RB(t),30;	rotlwi W(s),W(s),1;

However, whether that can be done in 6 cycles on a G5 is a bit unclear.
It can't be 6 consecutive cycles, but with some motion of code
across the edges, perhaps...

0: add   RE(t),RE(t),W(t);		xor    %r0,RD(t),RB(t);
1: xor    W(s),W((s)-16),W((s)-3);	(add)
2: add   RE(t),RE(t),%r5;		xor    %r0,%r0,RC(t);
3: xor    W(s),W(s),W((s)-8);		(rotlwi)
4: add   RE(t),RE(t),%r0;		rotlwi %r0,RA(t),5;
5: xor    W(s),W(s),W((s)-14);		rotlwi RB(t),RB(t),30;
6:
7: add   RE(t),RE(t),%r0;
8:
9: rotlwi W(s),W(s),1;

The problem there is forcing that ordering, rather than issuing the final
add in cycle 6 and pushing everything else ahead of it.

STEPD0+UPDATEW and STEPD1+UPDATEW are 13 and 14 instructions,
respectively, and don't fit into a 3-issue machine as neatly.
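
For readers who don't speak PPC, the twelve-instruction STEPD1+UPDATEW sequence above can be read as the following Python sketch. It is an illustration under assumptions: the function name is made up, the w[] schedule is modelled as a 16-entry circular list (my reading of "W(s) and W((s)-16) are actually the same register"), and the register renaming done by the RA(t)..RE(t) macros is left to the caller.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (rotlwi)."""
    return ((x << n) | (x >> (32 - n))) & MASK


def stepd1_updatew(a, b, c, d, e, w, t, s, k):
    """One b^c^d round for index t fused with one schedule update for index s.

    Requires t < s <= t+16. Returns the registers without renaming them:
    e now holds the value that plays the role of A in the next round.
    """
    e = (e + w[t % 16]) & MASK        # add    RE,RE,W(t)
    r0 = d ^ b ^ c                    # xor    r0,RD,RB ; xor r0,r0,RC
    e = (e + k) & MASK                # add    RE,RE,%r5
    e = (e + r0) & MASK               # add    RE,RE,r0
    r0 = rotl(a, 5)                   # rotlwi r0,RA,5
    e = (e + r0) & MASK               # add    RE,RE,r0
    b = rotl(b, 30)                   # rotlwi RB,RB,30
    # Three xors and a rotate: W(s) = rotl(W(s-16)^W(s-3)^W(s-8)^W(s-14), 1)
    w[s % 16] = rotl(w[s % 16] ^ w[(s - 3) % 16]
                     ^ w[(s - 8) % 16] ^ w[(s - 14) % 16], 1)
    return a, b, c, d, e


w = list(range(16))
print(stepd1_updatew(1, 2, 3, 4, 5, w, t=0, s=16, k=0), w[0])
```

The four adds into RE form one dependency chain and the four W(s) operations form another, which is why the 4-cycle G4 schedule interleaves them with the independent xor/rotlwi work.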


* Re: Mercurial 0.3 vs git benchmarks
From: Mike Taht @ 2005-04-26  2:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Matt Mackall, linux-kernel, git
In-Reply-To: <Pine.LNX.4.58.0504251859550.18901@ppc970.osdl.org>

Linus Torvalds wrote:
 > On Mon, 25 Apr 2005, Matt Mackall wrote:
 >
 >>Here are the results of checking in the first 12 releases of Linux 2.6
 >>into empty repositories for Mercurial v0.3 (hg) and git-pasky-0.7.
 >>This is on my 512M Pentium M laptop. Times are in seconds.

One difference is probably - mercurial appears to be using zlib's 
*default* compression of 6....

using zlib compression of 9 really impacts git...

as per http://www.gelato.unsw.edu.au/archives/git/0504/1988.html

 >On a 700MHz p3, UDMA33, freebsd 5.3, ffs (soft updates) I get:

 >compressor | levels (size, time to compress, time to uncompress)
 >-----------+-------------------------------------------------------------------
 >gzip       | 9 (28M, 1:19, 30), 6 (28M, 31.7, 30), 3 (30M, 26.1,28.7)
 >           | 1 (31M, 23.6, 29.8)
 >bzip2      | 9 (27M, 2:14, 37.4) 6 (27M, 2:11, 38.8) 3 (27M, 2:10,38.3)
 >lzop       | 9 (32M, 2:15, 35.4) 7 (32M, 57.9, 40.3) 3 (39M, 36.0,44.4)

as per setting GIT_COMPRESSION 3 rather than Z_BEST_COMPRESSION

http://www.gelato.unsw.edu.au/archives/git/0504/1478.html
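
A quick way to check the level-versus-time trade-off on your own data is sketched below with Python's zlib module. The function name compare_levels and the sample input are made up for illustration; the figures it produces depend entirely on the machine and the input and are not comparable to the tables quoted here.

```python
import time
import zlib


def compare_levels(data, levels=(9, 6, 3, 1)):
    """Return {level: (compressed size, seconds to compress)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        if zlib.decompress(packed) != data:
            raise ValueError("round trip failed at level %d" % level)
        results[level] = (len(packed), elapsed)
    return results


# Hypothetical stand-in for a source file; substitute real tree contents.
sample = b"static int foo(void) { return 0; }\n" * 2000
for level, (size, seconds) in compare_levels(sample).items():
    print(f"level {level}: {size} bytes in {seconds:.4f}s")
```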


>>
>>                 user         system       real        du -sh
>>ver    files   hg    git    hg    git    hg    git    hg   git
>>
>>2.6.0  15007 19.949 35.526 3.171 2.264 25.138 87.994 145M   89M
>>2.6.1    998  5.906  4.018 0.573 0.464 10.267  5.937 146M   99M
>>2.6.2   2370  9.696 13.051 0.752 0.652 12.970 15.167 150M  117M
>>2.6.3   1906 10.528 11.509 0.816 0.639 18.406 14.318 152M  135M
>>2.6.4   3185 11.140  7.380 0.997 0.731 15.265 12.412 156M  158M
>>2.6.5   2261 10.961  6.939 0.843 0.640 20.564  8.522 158M  177M
>>2.6.6   2642 11.803 10.043 0.870 0.678 22.360 11.515 162M  197M
>>2.6.7   3772 18.411 15.243 1.189 0.915 32.397 21.498 165M  227M
>>2.6.8   4604 20.922 16.054 1.406 1.041 39.622 25.056 172M  262M
>>2.6.9   4712 19.306 12.145 1.421 1.102 35.663 24.958 179M  297M
>>2.6.10  5384 23.022 18.154 1.393 1.182 40.947 32.085 186M  338M
>>2.6.11  5662 27.211 19.138 1.791 1.253 42.605 31.902 193M  379M


-- 

Mike Taht


   "Imagination is more important than knowledge.
	-- Albert Einstein"


* Re: Revised PPC assembly implementation
From: linux @ 2005-04-26  2:14 UTC (permalink / raw)
  To: davem, paulus; +Cc: git, linux
In-Reply-To: <20050425161746.7d943e62.davem@davemloft.net>

>From davem@davemloft.net Mon Apr 25 23:26:06 2005
Date: Mon, 25 Apr 2005 16:17:46 -0700
From: "David S. Miller" <davem@davemloft.net>
To: Paul Mackerras <paulus@samba.org>
Cc: linux@horizon.com, git@vger.kernel.org
Subject: Re: Revised PPC assembly implementation
In-Reply-To: <17005.30365.995256.963911@cargo.ozlabs.ibm.com>
References: <17004.47876.414.756912@cargo.ozlabs.ibm.com>
	<20050425173430.11031.qmail@science.horizon.com>
	<17005.30365.995256.963911@cargo.ozlabs.ibm.com>
X-Mailer: Sylpheed version 1.0.4 (GTK+ 1.2.10; sparc-unknown-linux-gnu)
X-Face: "_;p5u5aPsO,_Vsx"^v-pEq09'CU4&Dc1$fQExov$62l60cgCc%FnIwD=.UF^a>?5'9Kn[;433QFVV9M..2eN.@4ZWPGbdi<=?[:T>y?SD(R*-3It"Vj:)"dP
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 26 Apr 2005 09:00:45 +1000
Paul Mackerras <paulus@samba.org> wrote:

> The main loop seems to be taking about 560 cycles (assuming that
> essentially all the time spent in my little test program is spent in
> the main loop).  It contains about 1000 integer instructions, which
> will take at least 500 cycles, as we have 2 ALUs.  So we are already
> within about 10% of the theoretical optimum.

Time to bust out the altivec perhaps :)

Do a block with the integer ALUs in parallel with a block done using
Altivec :-)  There should be enough spare insn slots so that the loads
are absorbed properly.

Unlike UltraSPARC's VIS, with altivec you can reasonably do shifts and
rotates, which is the only reason I'm suggesting this.




* Re: [PATCH 0/2] diff-tree/diff-cache helper
From: Nicolas Pitre @ 2005-04-26  2:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504251832480.18901@ppc970.osdl.org>

On Mon, 25 Apr 2005, Linus Torvalds wrote:

> This also makes me think that we should just make "show-diff" show the
> same format, at which point show-diff actually matches all the other
> tools, and it is likely to make show-diff more useful to boot.
> 
> The thing I personally use "show-diff" for these days is actually just to
> check whether I have anything dirty in my tree, and then it would actually
> be preferable to just get the filename printout (in the same old
> "diff-cache" format) rather than the full diff.

That makes a lot of sense.  And I think that path filtering in diff-tree 
should be factored out and supported in all of the diff-* commands as 
well (not necessarily in diff-tree-helper).


Nicolas


* Re: Mercurial 0.3 vs git benchmarks
From: Linus Torvalds @ 2005-04-26  2:08 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel, git
In-Reply-To: <20050426004111.GI21897@waste.org>

On Mon, 25 Apr 2005, Matt Mackall wrote:
>
> Here are the results of checking in the first 12 releases of Linux 2.6
> into empty repositories for Mercurial v0.3 (hg) and git-pasky-0.7.
> This is on my 512M Pentium M laptop. Times are in seconds.
> 
>                  user         system       real        du -sh
> ver    files   hg    git    hg    git    hg    git    hg   git
> 
> 2.6.0  15007 19.949 35.526 3.171 2.264 25.138 87.994 145M   89M
> 2.6.1    998  5.906  4.018 0.573 0.464 10.267  5.937 146M   99M
> 2.6.2   2370  9.696 13.051 0.752 0.652 12.970 15.167 150M  117M
> 2.6.3   1906 10.528 11.509 0.816 0.639 18.406 14.318 152M  135M
> 2.6.4   3185 11.140  7.380 0.997 0.731 15.265 12.412 156M  158M
> 2.6.5   2261 10.961  6.939 0.843 0.640 20.564  8.522 158M  177M
> 2.6.6   2642 11.803 10.043 0.870 0.678 22.360 11.515 162M  197M
> 2.6.7   3772 18.411 15.243 1.189 0.915 32.397 21.498 165M  227M
> 2.6.8   4604 20.922 16.054 1.406 1.041 39.622 25.056 172M  262M
> 2.6.9   4712 19.306 12.145 1.421 1.102 35.663 24.958 179M  297M
> 2.6.10  5384 23.022 18.154 1.393 1.182 40.947 32.085 186M  338M
> 2.6.11  5662 27.211 19.138 1.791 1.253 42.605 31.902 193M  379M

That time in checking things in is worrisome.

"git" is basically linear in the size of the patch, which is what I want,
since most patches I work with are a couple of files at most. The patches
you are checking in are huge - I never actually work with a change that is
as big as a whole release. I work with changes that are five files or
something.

"hg" seems to basically slow down the more patches you have applied. It's 
hard to tell from the limited test set, but look at "user" time. It seems 
to increase from 6 seconds to 27 seconds.

To make an interesting benchmark, try applying the first 200 patches in 
the current git kernel archive. Can you do them three per second? THAT is 
the thing you should optimize for, not checking in huge changes.

If you're checking in a change to 1000+ files, you're doing something
wrong.

> Full-tree working dir diff (2.6.0 base with 2.6.1 in working dir):
> hg:  real 4.920s  user 4.629s  sys 0.260s
> git: real 3.531s  user 1.869s  sys 0.862s
> (this needed an update-cache --refresh on top of git commit, which
> took another: real 2m52.764s  user 2.833s  sys 1.008s)

You're doing something wrong with git here. Why would you need to update 
your cache?

			Linus


* Re: Mercurial 0.3 vs git benchmarks
From: Daniel Phillips @ 2005-04-26  1:49 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel, git, Linus Torvalds
In-Reply-To: <20050426004111.GI21897@waste.org>

On Monday 25 April 2005 20:41, Matt Mackall wrote:
> Despite the above, it compares pretty well to git in speed and is
> quite a bit better in terms of storage space. By reducing the zlib
> compression level, it could probably win across the board.

Hi Matt,

Congratulations on an impressive demo!  How about actually checking the 
compression vs wall clock theory?  And I probably don't have to mention 
psyco...

Regards,

Daniel


* Re: git add / update-cache --add fails.
From: Junio C Hamano @ 2005-04-26  1:38 UTC (permalink / raw)
  To: rhys; +Cc: git
In-Reply-To: <200504252252.05957.rhys@rhyshardwick.co.uk>

Just a wild guess.  Are you trying to run the command from a
subdirectory, not from the top directory (that is, the one that
has subdirectory .git/ in it)?


* Re: [PATCH 0/2] diff-tree/diff-cache helper
From: Linus Torvalds @ 2005-04-26  1:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7v1x8zsamn.fsf_-_@assigned-by-dhcp.cox.net>

On Sun, 24 Apr 2005, Junio C Hamano wrote:
>
> I use a set of small scripts [*1*] directly on top of the core
> git, which needed to make patches out of diff-tree and
> diff-cache output.  Its output is compatible with what show-diff
> produces.

Good, applied.

This also makes me think that we should just make "show-diff" show the
same format, at which point show-diff actually matches all the other
tools, and it is likely to make show-diff more useful to boot.

The thing I personally use "show-diff" for these days is actually just to
check whether I have anything dirty in my tree, and then it would actually
be preferable to just get the filename printout (in the same old
"diff-cache" format) rather than the full diff.

Maybe rename the "show-diff" command to be "cache-diff", and if somebody
wants the old "show-diff" thing, just have a script that does

	#!/bin/sh
	cache-diff | diff-tree-helper

and nothing more.

Talking about renaming, at some point we really should prepend "git-" to 
all the git commands. I didn't want to do the extra typing when I started 
out and was unsure about the name, but hey, by now we really should.

Junio, what do you think?

		Linus


* Re: Revised PPC assembly implementation
From: Paul Mackerras @ 2005-04-26  1:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux, git
In-Reply-To: <20050425161746.7d943e62.davem@davemloft.net>

David S. Miller writes:

> Time to bust out the altivec perhaps :)

I looked at this but I couldn't see a way to use altivec effectively
for SHA1.

The problem is that we have a chain of dependencies with the A
variable (which is 32-bit) where each A value depends on the previous
A value and on one of the 80 W values.  The W values are derived from
the 16 words (32-bit) of the input data block.

It might be possible to use altivec for generating the W values
(although there is the problem that W[k] depends on W[k-3], making it
hard to do a 4-way parallelization), but I don't see any way of
parallelizing the calculation of the A values, which is the critical
path.  Using altivec for generating the W values but the integer ALUs
for the A calculations would mean we had to go via memory, too, since
there isn't any way to transfer stuff directly between altivec
registers and GPRs.
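
The W[k-3] problem can be seen in a few lines of Python. This is only a sketch of the standard SHA-1 message schedule; the helper names are made up for illustration.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def schedule_inputs(k):
    """Indices of the earlier W values that W[k] is computed from."""
    return [k - 3, k - 8, k - 14, k - 16]


def expand_schedule(block_words):
    """Expand the 16 input words into the 80 W values."""
    w = list(block_words)
    for k in range(16, 80):
        w.append(rotl(w[k - 3] ^ w[k - 8] ^ w[k - 14] ^ w[k - 16], 1))
    return w


def independent_run(start):
    """How many consecutive W values from 'start' need only W[j] with j < start."""
    k = start
    while max(schedule_inputs(k)) < start:
        k += 1
    return k - start


print(independent_run(16))
```

Starting from any point, only three consecutive W values can be computed from what is already known; the fourth needs the first of the three, so a straightforward 4-wide vector step does not fit.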

We can't do four blocks from the same sequence in parallel either.  We
could do four blocks from four separate streams in parallel, but that
seems hard to organize...

Regards,
Paul.


* Re: git.git object database at kernel.org?
From: H. Peter Anvin @ 2005-04-26  1:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <Pine.LNX.4.58.0504251756190.18901@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> On Mon, 25 Apr 2005, H. Peter Anvin wrote:
> 
>>Oh well.  If you have the offset, the algorithm is fully arithmetic and 
>>doesn't rely on the zoneinfo system, so it can be trivially implemented. 
> 
> You have a different definition of "trivial" than I do. I have not a 
> frigging clue how to handle leap seconds etc ;)
> 

Leap seconds don't exist in the POSIX time_t universe, so they always obey:

	... + 3600*hour + 60*min + sec

... which means that during a positive leap second, time_t remains 
unchanged for 2 seconds, and for a negative leap second time_t jumps. 
Thus, the difference between two time_t doesn't always match the exact 
number of seconds between those two points in time.
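
A small Python sketch of that rule, using the leap second inserted at the end of 1998-12-31 UTC as the example. The function name posix_time is made up for illustration; calendar.timegm is used only to count whole days since the epoch.

```python
import calendar


def posix_time(year, month, day, hour, minute, sec):
    """Seconds since the epoch under the POSIX rule: every day is 86400 seconds."""
    days = calendar.timegm((year, month, day, 0, 0, 0)) // 86400
    return days * 86400 + 3600 * hour + 60 * minute + sec


leap = posix_time(1998, 12, 31, 23, 59, 60)    # the inserted leap second
after = posix_time(1999, 1, 1, 0, 0, 0)        # the second that follows it
before = posix_time(1998, 12, 31, 23, 59, 59)

print(leap == after)                           # same time_t for two real seconds
print(posix_time(1999, 1, 1, 0, 0, 1) - before)
```

The last line prints a difference of 2, although three real seconds elapsed between those two instants.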

> 
>>   And again, curl_gettime() does handle the whole string to time_t 
>>conversion of the common formats.
> 
> I don't doubt you, I just would prefer to not rely on boutique libraries 
> too much. 
> 
> Yeah, we already use it for http-pull, so I guess it's moot, but at least 
> that felt less like a core command..
> 

If we're already using libcurl, we might as well.  Otherwise, I'd just 
rip out curl_gettime from the libcurl sources.

	-hpa


* Re: git.git object database at kernel.org?
From: Linus Torvalds @ 2005-04-26  0:58 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <426D8FDF.5050608@zytor.com>

On Mon, 25 Apr 2005, H. Peter Anvin wrote:
> 
> Oh well.  If you have the offset, the algorithm is fully arithmetic and 
> doesn't rely on the zoneinfo system, so it can be trivially implemented. 

You have a different definition of "trivial" than I do. I have not a 
frigging clue how to handle leap seconds etc ;)

>    And again, curl_gettime() does handle the whole string to time_t 
> conversion of the common formats.

I don't doubt you, I just would prefer to not rely on boutique libraries 
too much. 

Yeah, we already use it for http-pull, so I guess it's moot, but at least 
that felt less like a core command..

		Linus


* Re: A darcs that can pull from git
From: Linus Torvalds @ 2005-04-26  0:55 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Git Mailing List, darcs-devel
In-Reply-To: <7i4qdusxdw.fsf@lanthane.pps.jussieu.fr>


[ Side note: I tend to read the mailing lists much less often, and more 
  likely to skip stuff, so if you have a question that is literally for me 
  personally, it's probably best to Cc my private address rather than 
  depending on me reading every single mailing list email ]

On Mon, 25 Apr 2005, Juliusz Chroboczek wrote:
> 
> Linus, could you please suggest a suitable license statement to
> include in whichever files of yours we choose to include in Darcs?  Is
> David's suggestion (stock GPL boilerplate with ``or any later
> version'' removed) okay with you?

Stock GNU boilerplate without the "or any later version" works fine. 

As does a simple one-liner "Licensed under GPLv2", for that matter. It's 
not like there can be any real confusion.

		Linus


* Re: git.git object database at kernel.org?
From: H. Peter Anvin @ 2005-04-26  0:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <Pine.LNX.4.58.0504251729080.18901@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> On Mon, 25 Apr 2005, H. Peter Anvin wrote:
> 
>>No, mktime() always uses the local time zone.  It's the inverse of 
>>localtime().
> 
> Note that this still doesn't make any sense.
> 
> A true inverse of "localtime()" should still take the GMT offset from
> "struct tm", and it would work fine, assuming that localtime() set that
> offset correctly.
> 
> So _I_ think it's incredibly stupid that mktime() looks at the local 
> timezone. 
> 
> Oh, well.  Not a big issue except for the date conversion, and since there 
> hopefully aren't any old repo's left, we can leave it behind us.
> 

It *is* incredibly stupid, but dates back to the fact that the GMT 
offset field in struct tm is a reasonably recent invention, and a lot of 
old code depended on just stuffing fields in struct tm and calling 
mktime(); they would leave the offset field uninitialized, as opposed to 
setting it to some well-defined "I don't know" value.

What totally blows is that, if mktime() can't be fixed, we haven't 
added mktime_have_offset() or something like that.

Oh well.  If you have the offset, the algorithm is fully arithmetic and 
doesn't rely on the zoneinfo system, so it can be trivially implemented. 
   And again, curl_gettime() does handle the whole string to time_t 
conversion of the common formats.
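
As a rough illustration of the "fully arithmetic" conversion, here is a Python sketch that treats the broken-down fields as UTC and then subtracts the known offset, so the local timezone and zoneinfo are never consulted. The name mktime_with_offset and its argument layout are assumptions made for this example; it is not an existing libc or libcurl function.

```python
import calendar
from datetime import datetime, timedelta, timezone


def mktime_with_offset(fields, offset_minutes):
    """Convert wall-clock fields plus a GMT offset to seconds since the epoch.

    fields: (year, month, day, hour, minute, second) as written in the date.
    offset_minutes: minutes east of GMT, for example -420 for "-0700".
    """
    return calendar.timegm(fields) - 60 * offset_minutes


# Cross-check against the datetime module for an arbitrary example date.
stamp = mktime_with_offset((2005, 4, 25, 17, 48, 0), -420)
check = datetime(2005, 4, 25, 17, 48, 0,
                 tzinfo=timezone(timedelta(minutes=-420)))
print(stamp == int(check.timestamp()))
```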

	-hpa


* Re: git add / update-cache --add fails.
From: Linus Torvalds @ 2005-04-26  0:47 UTC (permalink / raw)
  To: Rhys Hardwick; +Cc: git
In-Reply-To: <200504252252.05957.rhys@rhyshardwick.co.uk>

On Mon, 25 Apr 2005, Rhys Hardwick wrote:
> 
> Just to clarify, the latest version of git to be merged with pasky is:
> 
> 4e03aae5feb2e3fd2f543796ca3d3e8aa86c02dc
> 
> I have tried rebooting

[ somebody has been using windows for too long ;]

Just do an "strace update-cache --add xxxxx", that often gives a clue. 
Also, "ltrace" is a wonderful tool at times.

update-cache will be unhappy if the file is unreadable, for example. But 
it will also be unhappy if it cannot create the sha1 hashed object file, 
which can happen if the permissions on the object directories are screwed 
up or similar.

I think these things should generally show up as a sore thumb in an 
strace.
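
Before reaching for strace, the two failure modes mentioned above can be checked by hand. The Python sketch below is a hypothetical helper, not part of git; the default ".git/objects" location is an assumption about the repository layout, and a real failure may have other causes that only strace will show.

```python
import os


def diagnose_add(path, object_dir=".git/objects"):
    """Return a list of likely reasons why adding 'path' to the cache would fail."""
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path}: not a regular file")
    elif not os.access(path, os.R_OK):
        problems.append(f"{path}: not readable")
    if not os.path.isdir(object_dir):
        problems.append(f"{object_dir}: missing (run from the top directory?)")
    elif not os.access(object_dir, os.W_OK | os.X_OK):
        problems.append(f"{object_dir}: cannot create object files here")
    return problems


for line in diagnose_add("xxxxx"):
    print(line)
```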

		Linus


* Mercurial 0.3 vs git benchmarks
From: Matt Mackall @ 2005-04-26  0:41 UTC (permalink / raw)
  To: linux-kernel, git; +Cc: Linus Torvalds

This is to announce an updated version of Mercurial. Mercurial is a
scalable, fast, distributed SCM that works in a model similar to BK
and Monotone. It has functional clone/branch and pull/merge support
and a working first pass implementation of network pull. It's also
extremely small and hackable: it's about 1000 lines of code.

 http://selenic.com/mercurial/

Here are the results of checking in the first 12 releases of Linux 2.6
into empty repositories for Mercurial v0.3 (hg) and git-pasky-0.7.
This is on my 512M Pentium M laptop. Times are in seconds.

                 user         system       real        du -sh
ver    files   hg    git    hg    git    hg    git    hg   git

2.6.0  15007 19.949 35.526 3.171 2.264 25.138 87.994 145M   89M
2.6.1    998  5.906  4.018 0.573 0.464 10.267  5.937 146M   99M
2.6.2   2370  9.696 13.051 0.752 0.652 12.970 15.167 150M  117M
2.6.3   1906 10.528 11.509 0.816 0.639 18.406 14.318 152M  135M
2.6.4   3185 11.140  7.380 0.997 0.731 15.265 12.412 156M  158M
2.6.5   2261 10.961  6.939 0.843 0.640 20.564  8.522 158M  177M
2.6.6   2642 11.803 10.043 0.870 0.678 22.360 11.515 162M  197M
2.6.7   3772 18.411 15.243 1.189 0.915 32.397 21.498 165M  227M
2.6.8   4604 20.922 16.054 1.406 1.041 39.622 25.056 172M  262M
2.6.9   4712 19.306 12.145 1.421 1.102 35.663 24.958 179M  297M
2.6.10  5384 23.022 18.154 1.393 1.182 40.947 32.085 186M  338M
2.6.11  5662 27.211 19.138 1.791 1.253 42.605 31.902 193M  379M

tar of .hg/   108175360
tar of .git/  209385920

Full-tree change status (no changes):
hg:  real 0.799s  user 0.607s  sys 0.167s
git: real 0.124s  user 0.051s  sys 0.051s

Check-out time (2.6.0):
hg:  real 34.084s  user 4.069s  sys 2.024s
git: real 30.487s  user 2.393s  sys 1.007s

Full-tree working dir diff (2.6.0 base with 2.6.1 in working dir):
hg:  real 4.920s  user 4.629s  sys 0.260s
git: real 3.531s  user 1.869s  sys 0.862s
(this needed an update-cache --refresh on top of git commit, which
took another: real 2m52.764s  user 2.833s  sys 1.008s)

Merge from 2.6.0 to 2.6.1:
hg:  real 15.507s  user 6.175s  sys 0.442s
git: haven't quite figured this one out yet

Some notes:

- hg has a separate index file for each file checked in, which is why
  the initial check-in is larger
- this also means it touches twice as many files, typically
- neither hg nor git quite fit in cache on my 512M laptop (nor does a
  kernel compile), but the extra indexing makes hg's wall times a bit longer
- hg does a form of delta compression, so each checkin requires
  retrieving a previous version, checking its hash, doing a diff,
  compressing it, and checking in the result
- hg is written in pure Python

Despite the above, it compares pretty well to git in speed and is
quite a bit better in terms of storage space. By reducing the zlib
compression level, it could probably win across the board.

The size numbers will get dramatically more unbalanced with more
history - a conversion of the history in BK to git is expected to take
over 3G, while Mercurial may actually take less space due to storing
compressed binary forward-only deltas.
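
The check-in steps listed in the notes above (retrieve the previous version, check its hash, diff, compress, store) can be sketched in Python as follows. This is a toy model only: the class and function names are made up, the delta is a list of byte-range replacements serialized with pickle, and none of it reflects Mercurial's actual on-disk format.

```python
import difflib
import hashlib
import pickle
import zlib


def make_delta(old, new):
    """Describe 'new' as a list of (start, end, replacement) edits against 'old'."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer text from the older text and a delta."""
    out, pos = [], 0
    for i1, i2, replacement in delta:
        out += [old[pos:i1], replacement]
        pos = i2
    out.append(old[pos:])
    return b"".join(out)


class DeltaLog:
    """Forward-only chain: each entry is (sha1 of full text, compressed delta)."""

    def __init__(self):
        self.entries = []

    def retrieve(self, rev):
        text = b""
        for digest, packed in self.entries[:rev + 1]:
            text = apply_delta(text, pickle.loads(zlib.decompress(packed)))
            if hashlib.sha1(text).hexdigest() != digest:
                raise ValueError("hash mismatch while rebuilding revision")
        return text

    def checkin(self, new_text):
        old = self.retrieve(len(self.entries) - 1)   # previous version, hash-checked
        packed = zlib.compress(pickle.dumps(make_delta(old, new_text)))
        self.entries.append((hashlib.sha1(new_text).hexdigest(), packed))
        return len(self.entries) - 1


log = DeltaLog()
log.checkin(b"hello world\n")
log.checkin(b"hello brave world\n")
print(log.retrieve(1))
```

In this toy version the second entry holds only the inserted bytes plus bookkeeping, which is the effect the size comparison above relies on.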

While disk may be cheap, network bandwidth is not. Given that the
common case usage of git will be to do network pulls, it will find
most of its speed wasted on waiting for the network. Mercurial will
almost certainly win here for typical developer usage as it can do
efficient delta communication (though it currently doesn't attempt any
pipelining so suffers a bit in round trips).

More discussion about Mercurial's design can be found here:

 http://selenic.com/mercurial/notes.txt

-- 
Mathematics is the supreme nostalgia of our time.


* Re: mod-times (was: keyword expansion)
From: Linus Torvalds @ 2005-04-26  0:34 UTC (permalink / raw)
  To: tony.luck; +Cc: Thomas Glanzmann, git
In-Reply-To: <200504251756.j3PHuSh01362@unix-os.sc.intel.com>

On Mon, 25 Apr 2005 tony.luck@intel.com wrote:
> 
> One way to do this would be to rip on some of the core fundamentals of GIT
> and store the time that an object was created inside the object. E.g.
> 
>    blob size secs-since-1970 ...

You really don't want that.

The thing is, somebody doing a "touch" on a file should _not_ cause that 
tree to be committed as a new tree.

Trying to save mtime in a git archive is about a million times worse than 
saving the whole "mode" information, and there we already ended up cutting 
it down to just one bit, exactly because it was horrible not to.

So this is not about git formats, this is about it just not being 
practical to do. Git simply isn't a good thing to store mtime in.

			Linus


* Re: git.git object database at kernel.org?
From: Linus Torvalds @ 2005-04-26  0:32 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List, Junio C Hamano
In-Reply-To: <426D3B01.8060408@zytor.com>

On Mon, 25 Apr 2005, H. Peter Anvin wrote:
> 
> No, mktime() always uses the local time zone.  It's the inverse of 
> localtime().

Note that this still doesn't make any sense.

A true inverse of "localtime()" should still take the GMT offset from
"struct tm", and it would work fine, assuming that localtime() set that
offset correctly.

So _I_ think it's incredibly stupid that mktime() looks at the local 
timezone. 

Oh, well.  Not a big issue except for the date conversion, and since there 
hopefully aren't any old repo's left, we can leave it behind us.

		Linus


* Re: Hash collision count
From: Petr Baudis @ 2005-04-26  0:00 UTC (permalink / raw)
  To: Tom Lord; +Cc: git
In-Reply-To: <200504252350.QAA02241@emf.net>

Dear diary, on Tue, Apr 26, 2005 at 01:50:31AM CEST, I got a letter
where Tom Lord <lord@emf.net> told me that...
> 
>   From: Petr Baudis <pasky@ucw.cz>
> 
>   Pasky:
> 
>   > No, a collision is pretty common thing, actually. It's the main power of
>   > git, actually - when you do read-tree, modify it and do write-tree
>   > (typically when doing commit), everything you didn't modify (99% of
>   > stuff, most likely) is basically a collision - but it's ok since it
>   > just stays the same.
> 
> That is not the way people ordinarily use the word "collision".
> It's pretty much the opposite of the normal way, actually.

You need to quote me in the context of Jeff Garzik's

> > Third, a data check only occurs in the highly unlikely case that a hash
> > already exists -- a collision.  Rather than "trillions of times", more
> > like "one in a trillion chance."

I just wanted to point out that the data check would have to occur
every time you didn't modify an object.
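
To illustrate with a toy content-addressed store in Python: the ObjectStore class below is hypothetical and names each object by the SHA-1 of its bytes, which is a simplification of what git does. It counts how often a write finds the hash already present.

```python
import hashlib


class ObjectStore:
    """Toy store that names each object by the SHA-1 of its contents."""

    def __init__(self):
        self.objects = {}
        self.already_present = 0

    def write(self, data):
        name = hashlib.sha1(data).hexdigest()
        if name in self.objects:
            self.already_present += 1
            # This is where a byte-for-byte data check would have to run.
            if self.objects[name] != data:
                raise ValueError("same hash, different contents")
        else:
            self.objects[name] = data
        return name


store = ObjectStore()
tree = {f"file{i}.c": f"/* file {i} */\n".encode() for i in range(100)}
for data in tree.values():          # first commit: everything is new
    store.write(data)
tree["file7.c"] += b"int x;\n"      # modify a single file
for data in tree.values():          # second commit: rewrite the whole tree
    store.write(data)
print(store.already_present, len(store.objects))
```

After the second pass, 99 of the 100 writes hit an existing name, so a check on "hash already exists" runs for nearly every object rather than once in a trillion.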

Kind regards,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Hash collision count
From: Tom Lord @ 2005-04-25 23:50 UTC (permalink / raw)
  To: pasky; +Cc: git
In-Reply-To: <20050423234637.GS13222@pasky.ji.cz>


  From: Petr Baudis <pasky@ucw.cz>

  Pasky:

  > No, a collision is pretty common thing, actually. It's the main power of
  > git, actually - when you do read-tree, modify it and do write-tree
  > (typically when doing commit), everything you didn't modify (99% of
  > stuff, most likely) is basically a collision - but it's ok since it
  > just stays the same.

That is not the way people ordinarily use the word "collision".
It's pretty much the opposite of the normal way, actually.

-t


* Re: git pull issues...
From: Morten Welinder @ 2005-04-25 23:43 UTC (permalink / raw)
  To: Dan Holmsand; +Cc: git
In-Reply-To: <d4jn91$n4f$1@sea.gmane.org>

The two approaches are very much alike -- there are only so many ways to
skin a cat, after all.

I actually first considered the -K approach, but then realized that only very
new rsyncs support that.  My version works with rsync 2.6.2 (and maybe
earlier).  It could be made to go further back by not using -from0 which isn't
really needed here.

But I like your check-for-HEAD detail.

Morten


* Re: Revised PPC assembly implementation
From: David S. Miller @ 2005-04-25 23:17 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linux, git
In-Reply-To: <17005.30365.995256.963911@cargo.ozlabs.ibm.com>

On Tue, 26 Apr 2005 09:00:45 +1000
Paul Mackerras <paulus@samba.org> wrote:

> The main loop seems to be taking about 560 cycles (assuming that
> essentially all the time spent in my little test program is spent in
> the main loop).  It contains about 1000 integer instructions, which
> will take at least 500 cycles, as we have 2 ALUs.  So we are already
> within about 10% of the theoretical optimum.

Time to bust out the altivec perhaps :)

Do a block with the integer ALUs in parallel with a block done using
Altivec :-)  There should be enough spare insn slots so that the loads
are absorbed properly.

Unlike UltraSPARC's VIS, with altivec you can reasonably do shifts and
rotates, which is the only reason I'm suggesting this.



* [rft] repository for the r8169 driver
From: Francois Romieu @ 2005-04-25 23:03 UTC (permalink / raw)
  To: git; +Cc: jgarzik

A repo for the r8169 stuff is set up at :

rsync://www.fr.zoreil.com/linux-2.6.git/

It is (supposedly) derived from a recent 2.6.12-git repo.
I'd appreciate if someone could report whether it's usable or not.

--
Ueimor


* Re: Revised PPC assembly implementation
From: Paul Mackerras @ 2005-04-25 23:00 UTC (permalink / raw)
  To: linux; +Cc: git
In-Reply-To: <20050425173430.11031.qmail@science.horizon.com>

linux@horizon.com writes:

> Huh?  I'm saving 19 registers, r13..r31, and not saving 13, namely
> r0..r12.

Oops. :)  Somehow I thought you were saving r13..r32 or something. :)

> Damn.  So that's actually *worse* than my earlier version which achieved
> an (also piddling) 2% speedup?

I wouldn't say it is worse, I would say it is the same.  I didn't do
as many runs of the previous version.  The spread of times looked
about the same with both of your versions.

> Damn, I wish I had that IBM pipeline profiling tool.  If it could
> just tell me which cycles didn't have both ALUs busy, I could solve it
> in relatively little time.

I'm going to look at trying to get it going.

> The place that could really use scheduling help is the G4, which has three
> integer ALUs, but can only *think* about executing the bottom three entries
> in the reorder queue.  So if one of those instructions isn't ready, it
> stalls in the queue and idles the ALU with it.

Yes, the performance on the G4 is also important.  Not everyone has a
G5. ;)

> Maybe I can improve the scheduling some more...

The main loop seems to be taking about 560 cycles (assuming that
essentially all the time spent in my little test program is spent in
the main loop).  It contains about 1000 integer instructions, which
will take at least 500 cycles, as we have 2 ALUs.  So we are already
within about 10% of the theoretical optimum.
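
Spelled out as arithmetic, in a few lines of Python. The function names are made up, and the bound counts only integer-ALU issue slots; it ignores dependencies, loads and branches.

```python
def alu_bound(instructions, alus):
    """Fewest cycles needed to issue 'instructions' on 'alus' units (ceiling)."""
    return -(-instructions // alus)


def slack(measured, bound):
    """Fraction of the measured cycles that lies above the bound."""
    return (measured - bound) / measured


bound = alu_bound(1000, 2)
print(bound, round(slack(560, bound), 3))
```

With roughly 1000 instructions and 2 ALUs the bound is 500 cycles, and about 60 of the 560 measured cycles, a little over a tenth, are above it.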

So I think we are already at the point of diminishing returns as far
as the overall performance of git is concerned.  But if you want to
try to get that last 10%, go for it... :)

Paul.


* Re: git "tag" objects implemented - and a re-done commit
From: Linus Torvalds @ 2005-04-25 22:41 UTC (permalink / raw)
  To: Petr Baudis; +Cc: H. Peter Anvin, Git Mailing List
In-Reply-To: <20050425221810.GM13467@pasky.ji.cz>

On Tue, 26 Apr 2005, Petr Baudis wrote:
> 
> Could we please at least maintain the newline between the "header" and data,
> like in the commit objects?

Yes, I did that in the "git-tag-script" I actually committed, although git 
doesn't currently really care (ie fsck won't complain).

		Linus


* Re: git "tag" objects implemented - and a re-done commit
From: Linus Torvalds @ 2005-04-25 22:39 UTC (permalink / raw)
  To: Andreas Gal; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504251505260.18901@ppc970.osdl.org>



On Mon, 25 Apr 2005, Linus Torvalds wrote:
> 
> So I'll probably just push out my tags with my archives, and then people
> can verify them if they want to.

Ok, for the intrepid users, you can now test to see if you can pick them 
out. fsck should make them totally obvious, and here's my public key in 
case you also want to verify the things.

Of course, since I normally don't use pgp signing etc, it's entirely 
possible that I've done something stupid, and I'm now sending you my 
secret key and my full porn-collection.

		Linus

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.2.4 (GNU/Linux)

mQGiBEJqZ4sRBADKttqQOCAxRzz5qGmo5QnSR5GTkSlPTm4lCuaVUon0qQPNrasr
cSBAOJ1MlXjhbRPrN3pAhI+taLgrWQ231zUNHxCTmWJZV3Yzxr8xJQGlfHlVOxXB
LI42tAfCjHOF7z8pPj6AGhtE2+fzq1U3mOlA/fUG4uYDOwIoPK+qgbM6SwCgulqs
DGlQKFFtFgW8HVnDftFmyZMD+wc0E9jRa9HJ3b1U3vY1jrxpoVw5QeeIZdSRnRFy
sknOHca5mlJvTidu1cs7xCuvpufw1VIVvgf4tPwXcTDEKthYEhoty+DFOqZ9R7pg
EMhjYbq+Q8yLT3OWQtUKV4B10FRYIWidnJ8y2CjLduTmB+cyj976oxEY/llLBbQM
yuDrBADDLw/3KZL5D75icA0l/uebQ6/73j8jcRoVu0gTqAdQBYL6Zv7Y0G7xHUCo
Eqgo+p2LXAeU9IoeA5/h8SNVDw4fYoqo6VQTkr+ydegHkjwlbrhOL/gxzlY1Pde1
TBi6+QCUssk0FCPMALt7M+OgFpSKx7pP2xSsDsMvvNNAmLl0JrQ0TGludXMgVG9y
dmFsZHMgKHRhZyBzaWduaW5nIGtleSkgPHRvcnZhbGRzQG9zZGwub3JnPoheBBMR
AgAeBQJCameLAhsDBgsJCAcDAgMVAgMDFgIBAh4BAheAAAoJEBd2LEZ24hy7I84A
nROHRYes4RU8btdleR0TgwJG7jMvAKCF2CingjxaC4sTL7BkFfNacTkBYLkBDQRC
ameMEAQAlJiw0IBltu5ihEXE4mFYiWHuVAoeufVJ9fONv67y6fu3efJ10PJ7AQdG
Ufez+8yxkrahyIVC77NuQLDrRfvgmrJ8sbP8xb6QEbY1bnwLeuciTolGjL+kYi17
J74iG2cQDyimnLWJm5lNqeUOz3nTW429SyLCRhXpR1lUjijiVi8AAwcD/1f4VEql
u9HHTA4S+1aoOQV5guZCr6JbYdWkAZeeFRpFSXfCae6uO8DhpD7o/8kiK3O8qP1O
yjQF0bG26iLCm8MdJCO0WQ2xsVlwrrvnNPpgRgbirOgoxHM4ESq/YV+MqXo41Hm0
ilHRM7OIbmm7uvFSlUJmUasuJRsrhibilbvNiEkEGBECAAkFAkJqZ4wCGwwACgkQ
F3YsRnbiHLsolQCfRVImDkgijhPGmwyI7T19bWltXwsAniMi9gakkN+9DT8E5kli
e8uTEk8f
=PRrZ
-----END PGP PUBLIC KEY BLOCK-----


