Git development

* ***DONTUSE*** Re: [PATCH] Use a hashtable for objects instead of a sorted list
From: Junio C Hamano @ 2006-02-12 13:46 UTC (permalink / raw)
  To: git
In-Reply-To: <7vlkwgdbk6.fsf_-_@assigned-by-dhcp.cox.net>

I've pushed things out to "master" and "next" branch.

Quite a lot of things.

One thing that I expected to be there is not.  It is the
hashtable patch.  It is in "pu".

I once had it in my private "next", but dropped it before
pushing things out.

The problem does not seem to trigger with casual use, but I
found that with a clone from my primary repository with '-l -s'
(that is, a clone that uses alternates mechanism to borrow from
my primary repository), fsck-objects built with the patch seems
to report bogus things "missing".  I have not traced it fully;
instead I ended up spending most of the night (I noticed it at
around 01:30 and now it is 05:30 so that's about four hours)
recovering some of my refs and double checking if my primary
repository is not corrupt X-<.  At least, the primary repository
looks sane now.

With luck, I would muster enough energy to figure it out, but I
need some sleep first.

The problem seems to be very elusive.  I took a snapshot of the
two repositories involved, so that I can use them as an isolated
test case (the one is my primary repository and the other one is
the "-l -s" clone). The problem is repeatable, but the SHA1 of
the file the broken fsck-objects reports to be missing is
different from the one I observed in the first experiment with
the real repositories.  It appears it has something to do with
the directory listing order of fsck-objects, which in turn means
the reproduction of the problem is related to memory allocation
patterns, so maybe valgrind would help.  On the other hand, even
if I published a tarball of these two repositories somewhere,
other people (or myself) who extract the tarball would probably
not see the same SHA1 reported as missing X-<.

Anyway, I've pushed them out before crashing, after I double
checked that versions built from my "master" and "next" do not
seem to show the problem, while the one built from "pu" (where
the first patch after merging "next" is the said patch) exhibits
the problem.

^ permalink raw reply

* Configuration file musings
From: Mark Wooding @ 2006-02-12 13:45 UTC (permalink / raw)
  To: git

Having thought about things a bit, I've reached the conclusion that the
configuration file $GIT_DIR/config is trying to hold (at least) three
entirely different kinds of configuration.

  * User configuration: basically, how I like GIT to work for me.  I
    think that the way it represents my name in commit messages is user
    configuration, as would be the behaviour of `git-commit PATH'.
    Environment variables almost work for this, but they're a nuisance
    to change.  This stuff ought to be somewhere in my home directory,
    probably; though it would be useful to override temporarily, or on a
    per-repository basis.

  * Project configuration: how GIT should be supporting a particular
    project.  The merge.summary flag is like this, I think: whether to
    have summaries in merge messages is a policy decision to be taken
    for a whole project, rather than something to be left to the whims
    of individual developers.  Such settings probably ought to be
    propagated through git-clone, git-fetch and so on.

  * True repository configuration: how this particular repository ought
    to behave.  I can't think of many examples off the top of my head,
    but core.repositoryformatversion and core.filemode are the sorts of
    things I'm thinking of.

I'm not entirely sure where I'm going with this at the moment, and I
don't like some of the complexity which seems inherent in doing anything
about it, but I thought I'd stick my oar in anyway.
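
One way to picture the three kinds of configuration is as layers that are consulted in turn. The sketch below is only an illustration under assumptions: the scope names, `struct setting` and `layered_get` are hypothetical, and the precedence order (repository over project over user) is a guess, since the message above only says that user settings should be overridable per repository.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scopes, listed from lowest to highest precedence.
 * The ordering is an assumption, not something the thread settles. */
enum scope { SCOPE_USER, SCOPE_PROJECT, SCOPE_REPO, SCOPE_COUNT };

struct setting {
	const char *key;
	const char *value;
};

/* Look up key in one scope's table; a NULL key terminates the table. */
static const char *scope_get(const struct setting *s, const char *key)
{
	for (; s && s->key; s++)
		if (!strcmp(s->key, key))
			return s->value;
	return NULL;
}

/* The most specific scope that defines key wins. */
const char *layered_get(const struct setting *scopes[SCOPE_COUNT],
			const char *key)
{
	int i;

	for (i = SCOPE_COUNT - 1; i >= 0; i--) {
		const char *v = scope_get(scopes[i], key);
		if (v)
			return v;
	}
	return NULL;
}
```

With a user table that sets merge.summary to "false" and a project table that sets it to "true", `layered_get` would return the project's value, while a key defined only in the user table would still be found there.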

-- [mdw]

^ permalink raw reply

* Re: [PATCH] Use a hashtable for objects instead of a sorted list
From: Junio C Hamano @ 2006-02-12 12:11 UTC (permalink / raw)
  To: Alexandre Julliard; +Cc: git, Johannes Schindelin, Linus Torvalds
In-Reply-To: <87oe1dez7k.fsf@wine.dyndns.org>

Alexandre Julliard <julliard@winehq.org> writes:

> Junio C Hamano <junkio@cox.net> writes:
>
>> Alexandre, if you have a chance, could you try Johannes' patch
>> on your workload to see if it works OK for you?
>
> It works great for me, CPU time is down to 15 sec instead of 20 sec
> with my patch.

Thanks.  Now we have three independent numbers to back up that
Johannes is the winner....

Grrrrrrr.  Please, DO NOT USE THIS ONE YET.

At least, not with your production repository.

I am trying to nail it down but it appears at least fsck-objects
using this version gives bogus results.  I am first trying to
see if my primary working repository is sane.

Oh, and thanks again for your initial patch, which was what
started this drastic improvement.

^ permalink raw reply

* ***DONTUSE*** Re: [PATCH] Use a hashtable for objects instead of a sorted list
From: Junio C Hamano @ 2006-02-12 12:08 UTC (permalink / raw)
  To: Florian Weimer; +Cc: git
In-Reply-To: <87accwlt8k.fsf@mid.deneb.enyo.de>

Florian Weimer <fw@deneb.enyo.de> writes:

> (GCC should do the rest.)
>...
> AFAICS, obj_allocs is a power of two.

Yes, I already have something like these in my tree (the latter
did not help much as far as I could tell, though).

****HOWEVER****

Do not use this (not just my patch but the whole hashtable
version) in your production repository yet.

I've got a mysterious corruption and bogus output from
fsck-objects, and have been tracking it (see the timestamp of
this message).

1.2.0 will most likely be *delayed*.  I have to first make
sure my private repository is sane.  Grrrrrrrr.

^ permalink raw reply

* [PATCH] Add howto about separating topics.
From: kent @ 2006-02-12 12:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmzgxn1dz.fsf@assigned-by-dhcp.cox.net>

This howto consists of a footnote from an email by JC to the git
mailing list (<7vfyms0x4p.fsf@assigned-by-dhcp.cox.net>).

Signed-off-by: Kent Engstrom <kent@lysator.liu.se>

---

 Documentation/howto/separating-topic-branches.txt |   91 +++++++++++++++++++++
 1 files changed, 91 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/howto/separating-topic-branches.txt

39f152ae224f45a3d977aa8966a477dbc1df676d
diff --git a/Documentation/howto/separating-topic-branches.txt b/Documentation/howto/separating-topic-branches.txt
new file mode 100644
index 0000000..090e2c9
--- /dev/null
+++ b/Documentation/howto/separating-topic-branches.txt
@@ -0,0 +1,91 @@
+From: Junio C Hamano <junkio@cox.net>
+Subject: Separating topic branches
+Abstract: In this article, JC describes how to separate topic branches.
+
+This text was originally a footnote to a discussion about the
+behaviour of the git diff commands.
+
+Often I find myself doing that [running diff against something other
+than HEAD] while rewriting messy development history.  For example, I
+start doing some work without knowing exactly where it leads, and end
+up with a history like this:
+
+            "master"
+        o---o
+             \                    "topic" 
+              o---o---o---o---o---o
+
+At this point, "topic" contains something I know I want, but it
+contains two concepts that turned out to be completely independent.
+And often, one topic component is larger than the other.  It may
+contain more than two topics.
+
+In order to rewrite this mess to be more manageable, I would first do
+"diff master..topic", to extract the changes into a single patch, start
+picking pieces from it to get logically self-contained units, and
+start building on top of "master":
+
+        $ git diff master..topic >P.diff
+        $ git checkout -b topicA master
+        ... pick and apply pieces from P.diff to build
+        ... commits on topicA branch.
+                      
+              o---o---o
+             /        "topicA"
+        o---o"master"
+             \                    "topic" 
+              o---o---o---o---o---o
+
+Before doing each commit on "topicA" HEAD, I run "diff HEAD"
+before running update-index on the affected paths, or "diff --cached HEAD"
+after.  Also I would run "diff --cached master" to make sure
+that the changes are only the ones related to "topicA".  Usually
+I do this for smaller topics first.
+
+After that, I'd do the remainder of the original "topic", but
+for that, I do not start from the patchfile I extracted by
+comparing "master" and "topic" I used initially.  Still on
+"topicA", I extract "diff topic", and use it to rebuild the
+other topic:
+
+        $ git diff -R topic >P.diff ;# --cached also would work fine
+        $ git checkout -b topicB master
+        ... pick and apply pieces from P.diff to build
+        ... commits on topicB branch.
+
+                                "topicB"
+               o---o---o---o---o
+              /
+             /o---o---o
+            |/        "topicA"
+        o---o"master"
+             \                    "topic" 
+              o---o---o---o---o---o
+
+After I am done, I'd try a pretend-merge between "topicA" and
+"topicB" in order to make sure I have not missed anything:
+
+        $ git pull . topicA ;# merge it into current "topicB"
+        $ git diff topic
+                                "topicB"
+               o---o---o---o---o---* (pretend merge)
+              /                   /
+             /o---o---o----------'
+            |/        "topicA"
+        o---o"master"
+             \                    "topic" 
+              o---o---o---o---o---o
+
+The last diff had better not show anything other than cleanups
+of cruft.  Then I can finally clean things up:
+
+        $ git branch -D topic
+        $ git reset --hard HEAD^ ;# nuke pretend merge
+
+                                "topicB"
+               o---o---o---o---o
+              / 
+             /o---o---o
+            |/        "topicA"
+        o---o"master"
+
-- 
1.1.6.g29e5

^ permalink raw reply related

* Re: [PATCH] Use a hashtable for objects instead of a sorted list
From: Florian Weimer @ 2006-02-12 11:19 UTC (permalink / raw)
  To: git
In-Reply-To: <7virrli9am.fsf@assigned-by-dhcp.cox.net>

* Junio C. Hamano:

>  static int hashtable_index(const unsigned char *sha1)
>  {
> -	unsigned int i = *(unsigned int *)sha1;
> -	return (int)(i % obj_allocs);
> +	int cnt;

> +	unsigned int ix = *sha1++;
> +
> +	for (cnt = 1; cnt < sizeof(unsigned int); cnt++) {
> +		ix <<= 8;
> +		ix |= *sha1++;
> +	}

memcpy(&ix, sha1, sizeof(ix));

(GCC should do the rest.)

> +	return (int)(ix % obj_allocs);
>  }

return (int)(ix & (obj_allocs - 1));

AFAICS, obj_allocs is a power of two.
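
A minimal sketch of the two suggestions above, assuming a hypothetical `table_size` parameter in place of the patch's `obj_allocs` and a hypothetical function name. The memcpy() form reads the leading bytes in host byte order, so on a little-endian machine it generally yields a different value than the shift-and-or loop; either is usable as a hash. The mask is equivalent to `%` only while the table size is a power of two, which the assertion checks.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not the code from the patch: derive a slot
 * number from the leading bytes of a 20-byte SHA-1.  table_size
 * stands in for obj_allocs and must be a nonzero power of two.
 */
unsigned int hash_index(const unsigned char *sha1, unsigned int table_size)
{
	unsigned int ix;

	/* The mask below only matches '%' when table_size is 2^k. */
	assert(table_size != 0 && (table_size & (table_size - 1)) == 0);

	/* memcpy avoids an unaligned load through a cast pointer. */
	memcpy(&ix, sha1, sizeof(ix));

	return ix & (table_size - 1);
}
```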

^ permalink raw reply

* Re: Make "git clone" less of a deathly quiet experience
From: Andreas Ericsson @ 2006-02-12 11:02 UTC (permalink / raw)
  To: Keith Packard
  Cc: Linus Torvalds, Junio C Hamano, Git Mailing List, Petr Baudis
In-Reply-To: <1139717510.4183.34.camel@evo.keithp.com>

Keith Packard wrote:
> On Sun, 2006-02-12 at 04:43 +0100, Andreas Ericsson wrote:
> 
> 
>>A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.
> 
> 
> Precisely. If the protocol could deliver existing packs instead of
> unpacking and repacking them, then git would be as fast as rsync and I
> wouldn't have to worry about supporting two protocols.
> 

Caching features have been discussed, but that means the daemon needs to 
have write-access to some directory within the repository. It would also 
work poorly for projects that see very rapid development unless the 
cached pack-files can be amended. A sort of "create packs on demand". 
It shouldn't be too difficult, really.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply

* Re: [PATCH] Use a hashtable for objects instead of a sorted list
From: Alexandre Julliard @ 2006-02-12  8:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git
In-Reply-To: <7virrli9am.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> I am also interested to find out how much the rehashing you do
> when you update obj_allocs to a larger value is costing.
>
> Alexandre, if you have a chance, could you try Johannes' patch
> on your workload to see if it works OK for you?

It works great for me, CPU time is down to 15 sec instead of 20 sec
with my patch.

-- 
Alexandre Julliard
julliard@winehq.org

^ permalink raw reply

* Re: [PATCH] binary-tree-based objects.
From: Linus Torvalds @ 2006-02-12  7:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602112117560.3691@g5.osdl.org>



On Sat, 11 Feb 2006, Linus Torvalds wrote:
> 
> Before:
> 	real    0m41.322s	user    0m40.612s	sys     0m0.492s
> After:
> 	real    0m22.542s	user    0m22.080s	sys     0m0.448s
Johannes:
	real    0m13.814s	user    0m13.492s	sys     0m0.296s

> And just so you wouldn't think that all my machines are slow..
> 
> Before:
> 	real    0m28.645s	user    0m28.366s	sys     0m0.280s
> After:
> 	real    0m16.566s	user    0m16.373s	sys     0m0.196s
Johannes:
	real    0m10.239s	user    0m10.029s	sys     0m0.208s

So the hashing thing is indeed the clear winner.

Make it so. 

		Linus

^ permalink raw reply

* Re: [PATCH] binary-tree-based objects.
From: Linus Torvalds @ 2006-02-12  6:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Alexandre Julliard, Johannes Schindelin
In-Reply-To: <7vaccxdsaf.fsf@assigned-by-dhcp.cox.net>



On Sat, 11 Feb 2006, Junio C Hamano wrote:
> 
> It turns out that Johannes (with my patch to fix possible
> unsigned int alignment issue and the initial call to
> find_object()) is the clear winner.

Having looked at it, I will have to agree. Johannes' approach looks 
pretty clean, and has the same memory overhead mine has (two pointers per 
object in the hash - one used, one empty), but has a lot fewer memcmp() 
calls and much less pointer chasing.

So I'll growl softly but concur. Johannes' code isn't even very complex.
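
For readers who have not seen the patch, the "circular hash" can be pictured roughly as below. This is a sketch under assumptions, not Johannes' code: `struct obj`, `table`, `TABLE_SIZE`, `lookup` and `insert` are hypothetical names, the table has a fixed power-of-two size, and growing it (the rehashing mentioned elsewhere in the thread) is left out. A collision is resolved by stepping to the next slot and wrapping around at the end, and the table is kept at most half full, which matches the "one used, one empty" overhead described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 16u	/* hypothetical; must be a power of two */

struct obj {
	unsigned char sha1[20];
};

static struct obj *table[TABLE_SIZE];	/* NULL marks an empty slot */
static unsigned int nr_used;

/* Return the slot holding sha1, or the empty slot where it belongs. */
static unsigned int find_slot(const unsigned char *sha1)
{
	unsigned int ix;

	memcpy(&ix, sha1, sizeof(ix));
	ix &= TABLE_SIZE - 1;
	while (table[ix] && memcmp(sha1, table[ix]->sha1, 20))
		ix = (ix + 1) & (TABLE_SIZE - 1);	/* wrap around */
	return ix;
}

struct obj *lookup(const unsigned char *sha1)
{
	return table[find_slot(sha1)];
}

void insert(struct obj *o)
{
	unsigned int ix;

	/* Keep the table at most half full so probing always terminates. */
	assert(nr_used < TABLE_SIZE / 2);
	ix = find_slot(o->sha1);
	assert(table[ix] == NULL);	/* refuse duplicates */
	table[ix] = o;
	nr_used++;
}
```

Two objects whose SHA-1s share the same leading bytes land in adjacent slots, and a lookup for an absent object stops at the first empty slot it reaches.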

		Linus

^ permalink raw reply

* Re: [PATCH] binary-tree-based objects.
From: Junio C Hamano @ 2006-02-12  6:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Alexandre Julliard, Johannes Schindelin
In-Reply-To: <7v1wy9f7q4.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@osdl.org> writes:
>
>> On Sat, 11 Feb 2006, Linus Torvalds wrote:
>>> 
>>> If somebody shows that the other approaches are faster, then I guess I'll 
>>> just have to sulk in a corner and grown quietly at people.
>>
>> growl. growL. With an 'L'!
>
> I do not get it.
> ...

I first suspected you just meant the typo (s/grown/growl/) but
it probably is that you really meant GROWL (and sulk).

It turns out that Johannes' version (with my patch to fix a possible
unsigned int alignment issue and the initial call to
find_object()) is the clear winner.

        base   - tip of "master"
        lt-obj - the binary tree without balancing
        aj-obj - Alexandre's 256-way buckets
        js-obj - Johannes' circular hash

Although I have _not_ double checked their correctness, I
did not see a major flaw in any of them.

base/git-rev-list --objects v2.6.14..linus
real	2m32.088s	user	2m2.830s	sys	0m0.890s
real	2m6.614s	user	2m1.860s	sys	0m0.660s
real	2m13.776s	user	2m2.450s	sys	0m0.590s
real	2m6.062s	user	2m2.420s	sys	0m0.690s
real	2m15.567s	user	2m3.170s	sys	0m0.900s

lt-obj/git-rev-list --objects v2.6.14..linus
real	0m42.889s	user	0m40.170s	sys	0m0.570s
real	0m44.247s	user	0m40.320s	sys	0m0.530s
real	0m40.891s	user	0m40.110s	sys	0m0.500s
real	0m41.874s	user	0m40.090s	sys	0m0.530s
real	0m41.596s	user	0m40.050s	sys	0m0.600s

aj-obj/git-rev-list --objects v2.6.14..linus
real	0m36.842s	user	0m36.200s	sys	0m0.490s
real	0m37.178s	user	0m36.740s	sys	0m0.390s
real	0m37.222s	user	0m36.540s	sys	0m0.610s
real	0m36.924s	user	0m36.410s	sys	0m0.360s
real	0m37.341s	user	0m36.150s	sys	0m0.620s

js-obj/git-rev-list --objects v2.6.14..linus
real	0m24.689s	user	0m24.120s	sys	0m0.390s
real	0m24.753s	user	0m24.020s	sys	0m0.360s
real	0m27.650s	user	0m24.470s	sys	0m0.440s
real	0m33.480s	user	0m24.030s	sys	0m0.460s
real	0m25.329s	user	0m24.490s	sys	0m0.390s


base/git-name-rev --all
real	0m4.193s	user	0m4.060s	sys	0m0.130s
real	0m4.179s	user	0m4.100s	sys	0m0.080s
real	0m4.210s	user	0m4.040s	sys	0m0.150s
real	0m4.162s	user	0m4.100s	sys	0m0.060s
real	0m4.697s	user	0m4.100s	sys	0m0.120s

lt-obj/git-name-rev --all
real	0m2.199s	user	0m2.120s	sys	0m0.080s
real	0m2.186s	user	0m2.110s	sys	0m0.080s
real	0m2.187s	user	0m2.150s	sys	0m0.040s
real	0m2.817s	user	0m2.150s	sys	0m0.070s
real	0m2.323s	user	0m2.170s	sys	0m0.050s

aj-obj/git-name-rev --all
real	0m2.136s	user	0m2.050s	sys	0m0.080s
real	0m2.164s	user	0m2.080s	sys	0m0.060s
real	0m2.143s	user	0m2.070s	sys	0m0.070s
real	0m2.141s	user	0m2.080s	sys	0m0.060s
real	0m2.154s	user	0m2.070s	sys	0m0.090s

js-obj/git-name-rev --all
real	0m2.047s	user	0m2.010s	sys	0m0.040s
real	0m2.040s	user	0m1.970s	sys	0m0.070s
real	0m2.025s	user	0m1.970s	sys	0m0.060s
real	0m2.170s	user	0m2.020s	sys	0m0.030s
real	0m2.046s	user	0m2.010s	sys	0m0.030s
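
To put the rev-list numbers above in relative terms, the sketch below averages the "user" times (converted to seconds) and divides the baseline by each variant. The function and array names are hypothetical; the figures are copied from the runs listed above. Worked by hand, the means come to about 122.5s, 40.1s, 36.4s and 24.2s, so the three variants are roughly 3.1x, 3.4x and 5.1x faster than the baseline.

```c
#include <stddef.h>

/* Arithmetic mean of n timings, in seconds. */
double mean_seconds(const double *t, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += t[i];
	return n ? sum / n : 0.0;
}

/* "user" times from the git-rev-list runs, converted to seconds. */
const double base_user[] = { 122.83, 121.86, 122.45, 122.42, 123.17 };
const double lt_user[]   = {  40.17,  40.32,  40.11,  40.09,  40.05 };
const double aj_user[]   = {  36.20,  36.74,  36.54,  36.41,  36.15 };
const double js_user[]   = {  24.12,  24.02,  24.47,  24.03,  24.49 };

/* How many times faster a variant is than the baseline. */
double speedup(const double *base, const double *variant, size_t n)
{
	return mean_seconds(base, n) / mean_seconds(variant, n);
}
```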

^ permalink raw reply

* Re: [PATCH] binary-tree-based objects.
From: Junio C Hamano @ 2006-02-12  5:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602112122400.3691@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Sat, 11 Feb 2006, Linus Torvalds wrote:
>> 
>> If somebody shows that the other approaches are faster, then I guess I'll 
>> just have to sulk in a corner and grown quietly at people.
>
> growl. growL. With an 'L'!

I do not get it.

But my impression was that the circular hash with trivial fixes was
the fastest.  I am benching them now.

^ permalink raw reply

* Re: [PATCH] binary-tree-based objects.
From: Linus Torvalds @ 2006-02-12  5:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602112117560.3691@g5.osdl.org>



On Sat, 11 Feb 2006, Linus Torvalds wrote:
> 
> If somebody shows that the other approaches are faster, then I guess I'll 
> just have to sulk in a corner and grown quietly at people.

growl. growL. With an 'L'!

		Linus

^ permalink raw reply

* Re: [PATCH] binary-tree-based objects.
From: Linus Torvalds @ 2006-02-12  5:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0602112045340.3691@g5.osdl.org>



On Sat, 11 Feb 2006, Linus Torvalds wrote:
> 
> Before:
> 	real    0m41.322s	user    0m40.612s	sys     0m0.492s
> 	real    0m40.797s	user    0m40.140s	sys     0m0.468s
> 	real    0m40.433s	user    0m40.016s	sys     0m0.412s
> 
> After:
> 	real    0m22.542s	user    0m22.080s	sys     0m0.448s
> 	real    0m22.660s	user    0m22.336s	sys     0m0.312s
> 	real    0m22.671s	user    0m22.236s	sys     0m0.292s

And just so you wouldn't think that all my machines are slow..

Before:
	real    0m28.645s	user    0m28.366s	sys     0m0.280s
	real    0m28.700s	user    0m28.486s	sys     0m0.212s

After:
	real    0m16.566s	user    0m16.373s	sys     0m0.196s
	real    0m16.512s	user    0m16.277s	sys     0m0.236s

so there (that's all with current kernel HEAD, mostly packed).

Now, I haven't compared it to the other suggested fixes (hashing, and the 
256-way bucket-sorting), but I obviously prefer the tree approach because 
it's my idea (and my ideas are _always_ superior) and because it's so dang 
simple.

If somebody shows that the other approaches are faster, then I guess I'll 
just have to sulk in a corner and grown quietly at people.

		Linus

^ permalink raw reply

* Re: [PATCH] binary-tree-based objects.
From: Linus Torvalds @ 2006-02-12  5:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vhd75fc6y.fsf_-_@assigned-by-dhcp.cox.net>



On Sat, 11 Feb 2006, Junio C Hamano wrote:
> 
>  * I haven't benched this seriously yet.  One datapoint:
> 
> 	time git-rev-list --objects v2.6.15..linus | wc -l
> 
>    improves from 53sec to 22sec, with the same output.

Another datapoint: doing 

	time git-rev-list --objects HEAD > /dev/null 

three times in a row (to verify that the numbers are stable - they very 
clearly are).

Before:
	real    0m41.322s	user    0m40.612s	sys     0m0.492s
	real    0m40.797s	user    0m40.140s	sys     0m0.468s
	real    0m40.433s	user    0m40.016s	sys     0m0.412s

After:
	real    0m22.542s	user    0m22.080s	sys     0m0.448s
	real    0m22.660s	user    0m22.336s	sys     0m0.312s
	real    0m22.671s	user    0m22.236s	sys     0m0.292s

and doing some trivial oprofile runs shows that the object lookup is no 
longer dominant (my libc's don't have symbol information, so I don't get 
good profile data, but it shows that libc and libz are the biggest issues, 
with memcmp and malloc/free apparently being much bigger issues than the 
object lookup).

			Linus

^ permalink raw reply

* Re: Make "git clone" less of a deathly quiet experience
From: Keith Packard @ 2006-02-12  4:11 UTC (permalink / raw)
  To: Andreas Ericsson
  Cc: keithp, Linus Torvalds, Junio C Hamano, Git Mailing List,
	Petr Baudis
In-Reply-To: <43EEAEF3.7040202@op5.se>

On Sun, 2006-02-12 at 04:43 +0100, Andreas Ericsson wrote:

> A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.

Precisely. If the protocol could deliver existing packs instead of
unpacking and repacking them, then git would be as fast as rsync and I
wouldn't have to worry about supporting two protocols.

-- 
keith.packard@intel.com

^ permalink raw reply

* [PATCH] binary-tree-based objects.
From: Junio C Hamano @ 2006-02-12  4:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7vslqpi9mg.fsf@assigned-by-dhcp.cox.net>

This implements Linus' idea to keep objects in a binary tree,
instead of using the linear array as we currently do.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 * I haven't benched this seriously yet.  One datapoint:

	time git-rev-list --objects v2.6.15..linus | wc -l

   improves from 53sec to 22sec, with the same output.

 fsck-objects.c |   36 +++++++++++++++++----------------
 name-rev.c     |   17 +++++++++++-----
 object.c       |   61 +++++++++++++++++++++-----------------------------------
 object.h       |    3 ++-
 4 files changed, 55 insertions(+), 62 deletions(-)

3c160f4d94cf16db5dc9c603e98ebacbe9ac4ca7
diff --git a/fsck-objects.c b/fsck-objects.c
index 9950be2..28a7c1b 100644
--- a/fsck-objects.c
+++ b/fsck-objects.c
@@ -56,23 +56,21 @@ static int objwarning(struct object *obj
 }
 
 
-static void check_connectivity(void)
+static void check_connectivity(struct object *obj)
 {
-	int i;
-
 	/* Look up all the requirements, warn about missing objects.. */
-	for (i = 0; i < nr_objs; i++) {
-		struct object *obj = objs[i];
-
-		if (!obj->parsed) {
-			if (!standalone && has_sha1_file(obj->sha1))
-				; /* it is in pack */
-			else
-				printf("missing %s %s\n",
-				       obj->type, sha1_to_hex(obj->sha1));
-			continue;
-		}
+ again:
+	if (!obj)
+		return;
 
+	if (!obj->parsed) {
+		if (!standalone && has_sha1_file(obj->sha1))
+			; /* it is in pack */
+		else
+			printf("missing %s %s\n",
+			       obj->type, sha1_to_hex(obj->sha1));
+	}
+	else {
 		if (obj->refs) {
 			const struct object_refs *refs = obj->refs;
 			unsigned j;
@@ -91,14 +89,16 @@ static void check_connectivity(void)
 		if (show_unreachable && !(obj->flags & REACHABLE)) {
 			printf("unreachable %s %s\n",
 			       obj->type, sha1_to_hex(obj->sha1));
-			continue;
 		}
-
-		if (!obj->used) {
+		else if (!obj->used) {
 			printf("dangling %s %s\n", obj->type, 
 			       sha1_to_hex(obj->sha1));
 		}
 	}
+	if (obj->left && obj->right)
+		check_connectivity(obj->left);
+	obj = obj->right ? obj->right : obj->left;
+	goto again;
 }
 
 /*
@@ -556,6 +556,6 @@ int main(int argc, char **argv)
 		}
 	}
 
-	check_connectivity();
+	check_connectivity(objs_root);
 	return 0;
 }
diff --git a/name-rev.c b/name-rev.c
index bbadb91..a4fecfb 100644
--- a/name-rev.c
+++ b/name-rev.c
@@ -120,6 +120,17 @@ static const char* get_rev_name(struct o
 	return buffer;
 }
 	
+void show_all_names(struct object *obj)
+{
+	while (obj) {
+		printf("%s %s\n", sha1_to_hex(obj->sha1), get_rev_name(obj));
+		if (obj->left && obj->right)
+			show_all_names(obj->left);
+		obj = obj->right ? obj->right : obj->left;
+	}
+}
+
+
 int main(int argc, char **argv)
 {
 	struct object_list *revs = NULL;
@@ -230,11 +241,7 @@ int main(int argc, char **argv)
 				fwrite(p_start, p - p_start, 1, stdout);
 		}
 	} else if (all) {
-		int i;
-
-		for (i = 0; i < nr_objs; i++)
-			printf("%s %s\n", sha1_to_hex(objs[i]->sha1),
-					get_rev_name(objs[i]));
+		show_all_names(objs_root);
 	} else
 		for ( ; revs; revs = revs->next)
 			printf("%s %s\n", revs->name, get_rev_name(revs->item));
diff --git a/object.c b/object.c
index 1577f74..a1b0729 100644
--- a/object.c
+++ b/object.c
@@ -5,65 +5,50 @@
 #include "commit.h"
 #include "tag.h"
 
-struct object **objs;
+struct object *objs_root;
 int nr_objs;
-static int obj_allocs;
 
 int track_object_refs = 1;
 
-static int find_object(const unsigned char *sha1)
+static struct object **lookup_object_position(const unsigned char *sha1)
 {
-	int first = 0, last = nr_objs;
+	struct object **p = &objs_root;
 
-        while (first < last) {
-                int next = (first + last) / 2;
-                struct object *obj = objs[next];
-                int cmp;
-
-                cmp = memcmp(sha1, obj->sha1, 20);
-                if (!cmp)
-                        return next;
-                if (cmp < 0) {
-                        last = next;
-                        continue;
-                }
-                first = next+1;
-        }
-        return -first-1;
+	for (;;) {
+		struct object *object = *p;
+		int sign;
+
+		if (!object)
+			break;
+		sign = memcmp(sha1, object->sha1, 20);
+		if (!sign)
+			break;
+		p = &object->left;
+		if (sign < 0)
+			continue;
+		p = &object->right;
+	}
+	return p;
 }
 
 struct object *lookup_object(const unsigned char *sha1)
 {
-	int pos = find_object(sha1);
-	if (pos >= 0)
-		return objs[pos];
-	return NULL;
+	return *lookup_object_position(sha1);
 }
 
 void created_object(const unsigned char *sha1, struct object *obj)
 {
-	int pos = find_object(sha1);
+	struct object **op = lookup_object_position(sha1);
 
 	obj->parsed = 0;
 	memcpy(obj->sha1, sha1, 20);
 	obj->type = NULL;
 	obj->refs = NULL;
 	obj->used = 0;
-
-	if (pos >= 0)
+	obj->left = obj->right = NULL;
+	if (*op)
 		die("Inserting %s twice\n", sha1_to_hex(sha1));
-	pos = -pos-1;
-
-	if (obj_allocs == nr_objs) {
-		obj_allocs = alloc_nr(obj_allocs);
-		objs = xrealloc(objs, obj_allocs * sizeof(struct object *));
-	}
-
-	/* Insert it into the right place */
-	memmove(objs + pos + 1, objs + pos, (nr_objs - pos) * 
-		sizeof(struct object *));
-
-	objs[pos] = obj;
+	*op = obj;
 	nr_objs++;
 }
 
diff --git a/object.h b/object.h
index 0e76182..32b276d 100644
--- a/object.h
+++ b/object.h
@@ -19,12 +19,13 @@ struct object {
 	unsigned char sha1[20];
 	const char *type;
 	struct object_refs *refs;
+	struct object *left, *right;
 	void *util;
 };
 
 extern int track_object_refs;
 extern int nr_objs;
-extern struct object **objs;
+extern struct object *objs_root;
 
 /** Internal only **/
 struct object *lookup_object(const unsigned char *sha1);
-- 
1.1.6.g69c5

^ permalink raw reply related

* Re: Two crazy proposals for changing git's diff commands
From: Junio C Hamano @ 2006-02-12  3:48 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: git
In-Reply-To: <20060212031527.GA31228@fieldses.org>

"J. Bruce Fields" <bfields@fieldses.org> writes:

> On Wed, Feb 08, 2006 at 05:21:12PM -0800, Junio C Hamano wrote:
>> Of course, learning various flags to give "git diff" is part of
>> understanding the index
>
> Well, there's understanding the index, and then there's memorizing the
> flags...
> ...
> But maybe that's just me.  (And maybe the namespace in question is
> already too crowded to allow for INDEX and WORK.)

I do not think it is just you.  The real problem, honestly
speaking, is that "git diff" wrapper cheats and avoids doing its
own set of flags.

The low-level is just a mechanism the UI is built upon, and as a
mechanism it has a set of options and semantics that are
consistent with its world model (the index-centric way of
thinking), except perhaps that --cached might now be better
spelled --index.

Because "git diff" wrapper cheats, it ends up exposing the
low-level flags and arguments to the end user, and to use that
effectively, obviously you need to understand the world model
the low-level is built upon.

It was OK (it could be argued that it was even better than sugar
coating to make it *inconsistent* with the underlying world
model) so far, as long as people who use it are aware of the
index centric world model, but that "consistency with the
underlying world model" makes it harder to approach and causes
confusion.

That is why these days I often mention "welding training
wheels".  Doing half-baked sugarcoating of the UI layer would
break the mental model of people who understand the world model
the low-level is built upon and try to make effective use of the
low-level through the UI.

^ permalink raw reply

* Re: Make "git clone" less of a deathly quiet experience
From: Andreas Ericsson @ 2006-02-12  3:43 UTC (permalink / raw)
  To: Keith Packard
  Cc: Linus Torvalds, Junio C Hamano, Git Mailing List, Petr Baudis
In-Reply-To: <1139685031.4183.31.camel@evo.keithp.com>

Keith Packard wrote:
> On Sat, 2006-02-11 at 09:45 -0800, Linus Torvalds wrote:
> 
> 
>>More importantly, it really wouldn't have helped that much in this 
>>situation. At least for me, the network is 90% of the problem, the 
>>pack-file generation is at most 10%. So cached packfiles really only 
>>matter for server-side problems (high CPU load, or lack of memory, or 
>>heavy disk activity).
> 
> 
> I'd like to see git use less CPU than CVS does on my distribution host;
> some mechanism for re-using either existing or cached packs would help a
> whole lot with that. The alternative is to see people switch to rsync
> instead, which seems like a far worse idea.   
> 

A weird oddity; Cloning is faster over rsync, day-to-day pulling is not.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply

* Re: Two crazy proposals for changing git's diff commands
From: J. Bruce Fields @ 2006-02-12  3:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Carl Worth, git
In-Reply-To: <7vfymtl43b.fsf@assigned-by-dhcp.cox.net>

On Wed, Feb 08, 2006 at 05:21:12PM -0800, Junio C Hamano wrote:
> Of course, learning various flags to give "git diff" is part of
> understanding the index

Well, there's understanding the index, and then there's memorizing the
flags.  I would've thought it'd be a lot easier to remember something
like

git diff HEAD INDEX
git diff INDEX WORK
git diff HEAD WORK

than, respectively,

git diff --cached
git diff
git diff HEAD

But maybe that's just me.  (And maybe the namespace in question is
already too crowded to allow for INDEX and WORK.)

--b.

^ permalink raw reply

* [PATCH] Add support for explicit type specifiers when calling git-repo-config
From: Petr Baudis @ 2006-02-12  3:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: A Large Angry SCM, git
In-Reply-To: <7vwtg2pkt2.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Sat, Feb 11, 2006 at 05:43:21AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
>  (3) neither of these commands knows the list of all possible
>      configuration items, nor their types, so core.filemode
>      can be spelled as "1" or "true" to mean the same thing to
>      our C code, but repo-config faithfully returns how the
>      value is literally spelled in the configuration file.  The
>      following two mean the same thing to the C layer, so the
>      calling script needs to further interpret the output from
>      git-repo-config:
> 
> 	$ git repo-config core.filemode ;# [core] filemode=1
> 	1
> 	$ git repo-config core.filemode ;# [core] filemode=true
> 	true
> 
>  (4) worse, boolean 'true' can be specified by just having the
>      configuration item in the file, but repo-config dumps core
>      on that:
> 
> 	$ git repo-config core.filemode ;# [core] filemode
>         segmentation fault

This patch provides a partial solution - if you query only for variables
of the same type (or just a single variable), this adds type-checking
and transformation to the given type.

It is basically what Cogito would like to see - a centralized variable
database in GIT would not help us, but we would like to have custom but
still typed variables in the config file.
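
To illustrate the kind of check and transformation meant here, a minimal
sketch follows. It is not the code from the patch: the sketch_* names are
hypothetical, and the parsing rules (NULL value means true, empty string
means false, "true"/"false" compared case-insensitively, anything else
parsed as a number) are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical type selector, mirroring the idea of --int / --bool. */
enum sketch_type { SKETCH_RAW, SKETCH_INT, SKETCH_BOOL };

/*
 * Hypothetical helper: returns 1 or 0 for a boolean value, -1 if it
 * cannot be interpreted.  A NULL value models a key that is listed in
 * the config file without any "= value" part.
 */
static int sketch_parse_bool(const char *value)
{
	char *end;
	long n;

	if (!value)
		return 1;
	if (!*value)
		return 0;
	if (!strcasecmp(value, "true"))
		return 1;
	if (!strcasecmp(value, "false"))
		return 0;
	n = strtol(value, &end, 0);
	if (*end)
		return -1;
	return n != 0;
}

/*
 * Write the canonical spelling of value into out.
 * Returns 0 on success, -1 if value does not fit the requested type.
 */
static int sketch_canonicalize(enum sketch_type type, const char *value,
			       char *out, size_t outlen)
{
	char *end;
	long n;
	int b;

	switch (type) {
	case SKETCH_BOOL:
		b = sketch_parse_bool(value);
		if (b < 0)
			return -1;
		snprintf(out, outlen, "%s", b ? "true" : "false");
		return 0;
	case SKETCH_INT:
		if (!value || !*value)
			return -1;
		n = strtol(value, &end, 0);
		if (*end)
			return -1;
		snprintf(out, outlen, "%ld", n);
		return 0;
	default:
		/* Raw mode: pass the spelling through, NULL becomes "". */
		snprintf(out, outlen, "%s", value ? value : "");
		return 0;
	}
}
```

With this sketch, "1", "true" and a bare key all come out as "true" in
bool mode, which is the property a calling script cares about.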

---

[PATCH] Add support for explicit type specifiers when calling git-repo-config

Currently, git-repo-config will just return the raw value of the option
as specified in the config file; this makes things difficult for scripts
calling it, especially if the value is supposed to be boolean.

This patch makes it possible to ask git-repo-config to check if the option
is of the given type (int or bool) and write out the value in its
canonical form. If you do not pass --int or --bool, the behaviour stays
unchanged and the raw value is emitted.

This also incidentally fixes the segfault when an option with no value is
encountered.

Signed-off-by: Petr Baudis <pasky@suse.cz>

---
commit 8dcc626cd144b2c6eae2a299242bbbe905cb0059
tree 0d4dcc3a44eb318ef52c3d64dda11768745f7583
parent 29e55cd5ad9e17d2ff8a1a37b7ee45d18d1e59d6
author Petr Baudis <pasky@suse.cz> Sun, 12 Feb 2006 04:09:01 +0100
committer Petr Baudis <xpasky@machine.or.cz> Sun, 12 Feb 2006 04:09:01 +0100

 Documentation/git-repo-config.txt |   18 ++++++--
 repo-config.c                     |   80 +++++++++++++++++++++++--------------
 2 files changed, 62 insertions(+), 36 deletions(-)

diff --git a/Documentation/git-repo-config.txt b/Documentation/git-repo-config.txt
index 3069464..33fcde4 100644
--- a/Documentation/git-repo-config.txt
+++ b/Documentation/git-repo-config.txt
@@ -8,12 +8,12 @@ git-repo-config - Get and set options in
 
 SYNOPSIS
 --------
-'git-repo-config' name [value [value_regex]]
-'git-repo-config' --replace-all name [value [value_regex]]
-'git-repo-config' --get name [value_regex]
-'git-repo-config' --get-all name [value_regex]
-'git-repo-config' --unset name [value_regex]
-'git-repo-config' --unset-all name [value_regex]
+'git-repo-config' [type] name [value [value_regex]]
+'git-repo-config' [type] --replace-all name [value [value_regex]]
+'git-repo-config' [type] --get name [value_regex]
+'git-repo-config' [type] --get-all name [value_regex]
+'git-repo-config' [type] --unset name [value_regex]
+'git-repo-config' [type] --unset-all name [value_regex]
 
 DESCRIPTION
 -----------
@@ -26,6 +26,12 @@ should provide a POSIX regex for the val
 *not* matching the regex, just prepend a single exclamation mark in front
 (see EXAMPLES).
 
+The type specifier can be either '--int' or '--bool', which will make
+'git-repo-config' ensure that the variable(s) are of the given type and
+convert the value to the canonical form (simple decimal number for int,
+a "true" or "false" string for bool). If no type specifier is passed,
+no checks or transformations are performed on the value.
+
 This command will fail if
 
 . .git/config is invalid,
diff --git a/repo-config.c b/repo-config.c
index c31e441..ccdee3c 100644
--- a/repo-config.c
+++ b/repo-config.c
@@ -2,7 +2,7 @@
 #include <regex.h>
 
 static const char git_config_set_usage[] =
-"git-repo-config [--get | --get-all | --replace-all | --unset | --unset-all] name [value [value_regex]]";
+"git-repo-config [ --bool | --int ] [--get | --get-all | --replace-all | --unset | --unset-all] name [value [value_regex]]";
 
 static char* key = NULL;
 static char* value = NULL;
@@ -10,6 +10,7 @@ static regex_t* regexp = NULL;
 static int do_all = 0;
 static int do_not_match = 0;
 static int seen = 0;
+static enum { T_RAW, T_INT, T_BOOL } type = T_RAW;
 
 static int show_config(const char* key_, const char* value_)
 {
@@ -25,7 +26,17 @@ static int show_config(const char* key_,
 			fprintf(stderr, "More than one value: %s\n", value);
 			free(value);
 		}
-		value = strdup(value_);
+
+		if (type == T_INT) {
+			value = malloc(256);
+			sprintf(value, "%d", git_config_int(key_, value_));
+		} else if (type == T_BOOL) {
+			value = malloc(256);
+			sprintf(value, "%s", git_config_bool(key_, value_)
+					     ? "true" : "false");
+		} else {
+			value = strdup(value_ ? : "");
+		}
 		seen++;
 	}
 	return 0;
@@ -72,43 +83,52 @@ static int get_value(const char* key_, c
 
 int main(int argc, const char **argv)
 {
+	int i;
 	setup_git_directory();
-	switch (argc) {
+	for (i = 1; i < argc; i++) {
+		if (!strcmp(argv[i], "--int"))
+			type = T_INT;
+		else if (!strcmp(argv[i], "--bool"))
+			type = T_BOOL;
+		else
+			break;
+	}
+	switch (argc-i) {
+	case 1:
+		return get_value(argv[i], NULL);
 	case 2:
-		return get_value(argv[1], NULL);
-	case 3:
-		if (!strcmp(argv[1], "--unset"))
-			return git_config_set(argv[2], NULL);
-		else if (!strcmp(argv[1], "--unset-all"))
-			return git_config_set_multivar(argv[2], NULL, NULL, 1);
-		else if (!strcmp(argv[1], "--get"))
-			return get_value(argv[2], NULL);
-		else if (!strcmp(argv[1], "--get-all")) {
+		if (!strcmp(argv[i], "--unset"))
+			return git_config_set(argv[i+1], NULL);
+		else if (!strcmp(argv[i], "--unset-all"))
+			return git_config_set_multivar(argv[i+1], NULL, NULL, 1);
+		else if (!strcmp(argv[i], "--get"))
+			return get_value(argv[i+1], NULL);
+		else if (!strcmp(argv[i], "--get-all")) {
 			do_all = 1;
-			return get_value(argv[2], NULL);
+			return get_value(argv[i+1], NULL);
 		} else
 
-			return git_config_set(argv[1], argv[2]);
-	case 4:
-		if (!strcmp(argv[1], "--unset"))
-			return git_config_set_multivar(argv[2], NULL, argv[3], 0);
-		else if (!strcmp(argv[1], "--unset-all"))
-			return git_config_set_multivar(argv[2], NULL, argv[3], 1);
-		else if (!strcmp(argv[1], "--get"))
-			return get_value(argv[2], argv[3]);
-		else if (!strcmp(argv[1], "--get-all")) {
+			return git_config_set(argv[i], argv[i+1]);
+	case 3:
+		if (!strcmp(argv[i], "--unset"))
+			return git_config_set_multivar(argv[i+1], NULL, argv[i+2], 0);
+		else if (!strcmp(argv[i], "--unset-all"))
+			return git_config_set_multivar(argv[i+1], NULL, argv[i+2], 1);
+		else if (!strcmp(argv[i], "--get"))
+			return get_value(argv[i+1], argv[i+2]);
+		else if (!strcmp(argv[i], "--get-all")) {
 			do_all = 1;
-			return get_value(argv[2], argv[3]);
-		} else if (!strcmp(argv[1], "--replace-all"))
+			return get_value(argv[i+1], argv[i+2]);
+		} else if (!strcmp(argv[i], "--replace-all"))
 
-			return git_config_set_multivar(argv[2], argv[3], NULL, 1);
+			return git_config_set_multivar(argv[i+1], argv[i+2], NULL, 1);
 		else
 
-			return git_config_set_multivar(argv[1], argv[2], argv[3], 0);
-	case 5:
-		if (!strcmp(argv[1], "--replace-all"))
-			return git_config_set_multivar(argv[2], argv[3], argv[4], 1);
-	case 1:
+			return git_config_set_multivar(argv[i], argv[i+1], argv[i+2], 0);
+	case 4:
+		if (!strcmp(argv[i], "--replace-all"))
+			return git_config_set_multivar(argv[i+1], argv[i+2], argv[i+3], 1);
+	case 0:
 	default:
 		usage(git_config_set_usage);
 	}


-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

^ permalink raw reply related

* Re: [PATCH] Teach repo-config the -l and --get-regexp options
From: Junio C Hamano @ 2006-02-12  3:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0602111306450.25997@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> 	Happier?

Not really.

It still dumps core with:

	[core]
        	boolvarsaretrueiftheirnamesarelisted

The patch does not address any of the more important issues I
listed with git-var and git-repo-config in that message.

^ permalink raw reply

* Re: [PATCH] fetch-clone progress: finishing touches.
From: Linus Torvalds @ 2006-02-12  3:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslqpjq2q.fsf@assigned-by-dhcp.cox.net>



On Sat, 11 Feb 2006, Junio C Hamano wrote:
> 
>    BTW, don't you mean 512 down there???
> 
>         -	msecs += (int)(tv.tv_usec - prev_tv.tv_usec) >> 10;
>         +	msecs += usec_to_binarymsec(tv.tv_usec - prev_tv.tv_usec);
>         +
>                 if (msecs > 500) {
>                         prev_tv = tv;

Well, it's just a random number, but if you like 512 better than 500, go 
wild ;)
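
For reference, the arithmetic behind the quoted hunk: shifting a
microsecond count right by 10 divides it by 1024, so the result is in
units of 1024 usec ("binary milliseconds") rather than true milliseconds.
A minimal sketch, assuming a hypothetical helper named
sketch_elapsed_binarymsec (this is not the usec_to_binarymsec function
from the patch, whose definition is not shown in this message):

```c
#include <sys/time.h>

/*
 * Hypothetical helper: elapsed time between two timestamps in units of
 * 1024 microseconds.  Plain division is used instead of ">> 10" so that
 * a negative tv_usec difference does not depend on how the compiler
 * shifts negative values.
 */
static long sketch_elapsed_binarymsec(const struct timeval *prev,
				      const struct timeval *now)
{
	long long usec;

	usec = (long long)(now->tv_sec - prev->tv_sec) * 1000000LL;
	usec += (long long)now->tv_usec - (long long)prev->tv_usec;
	return (long)(usec / 1024);
}
```

In these units 500 corresponds to 512000 usec and 512 to 524288 usec, so
either threshold is roughly half a second.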

		Linus

^ permalink raw reply

* Re: [PATCH] Use a hashtable for objects instead of a sorted list
From: Junio C Hamano @ 2006-02-12  2:46 UTC (permalink / raw)
  To: Johannes Schindelin, Alexandre Julliard; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0602120254260.10235@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> In a simple test, this brings down the CPU time from 47 sec to 22 sec.

I was planning to take Alexandre's patch, but the approach your
patch takes feels more correct -- it scales with the number of
objects you need to handle, instead of having fixed 256
hashbuckets.

BTW, your version dumped core in hashtable_index immediately
after I started "git-rev-list --objects HEAD".  How did you get
_any_ CPU time?

I am not sure it is OK to expect, as your patch does, that object name
pointers are always (unsigned int *) aligned.  We may want
to have something like the attached patch on top of yours.

I am also interested to find out how much the rehashing you do
when you update obj_allocs to a larger value is costing.

Alexandre, if you have a chance, could you try Johannes' patch
on your workload to see if it works OK for you?
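
To make the alignment point concrete, here is a minimal standalone
sketch of the byte-wise approach.  The sketch_* names and the nr_slots
parameter are hypothetical stand-ins (the real code uses a global table
size), and the zero-slot guard is an assumption about what an empty
table should do, not a statement about the cause of the crash:

```c
#include <stddef.h>

/*
 * Assemble the first sizeof(unsigned int) bytes of an object name into
 * an integer one byte at a time.  No unsigned int is ever loaded through
 * the pointer, so the result does not depend on how sha1 is aligned,
 * nor on the byte order of the machine.
 */
static unsigned int sketch_hash_prefix(const unsigned char *sha1)
{
	unsigned int ix = 0;
	size_t i;

	for (i = 0; i < sizeof(unsigned int); i++)
		ix = (ix << 8) | sha1[i];
	return ix;
}

/*
 * Map an object name to a slot.  Returns -1 for an empty table instead
 * of taking a remainder modulo zero.
 */
static int sketch_hashtable_index(const unsigned char *sha1,
				  unsigned int nr_slots)
{
	if (!nr_slots)
		return -1;
	return (int)(sketch_hash_prefix(sha1) % nr_slots);
}
```

Calling this on a pointer at an odd offset inside a buffer is fine,
which is exactly the case a cast to (unsigned int *) cannot promise.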

-- >8 --
[PATCH] do not assume object name pointers are uint aligned.

Also fix an obvious bug that caused it to dump core at my first
attempt.  There might be others but I did not actively look for
them.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/object.c b/object.c
index 3259862..59e5e36 100644
--- a/object.c
+++ b/object.c
@@ -13,17 +13,24 @@ int track_object_refs = 1;
 
 static int hashtable_index(const unsigned char *sha1)
 {
-	unsigned int i = *(unsigned int *)sha1;
-	return (int)(i % obj_allocs);
+	int cnt;
+	unsigned int ix = *sha1++;
+
+	for (cnt = 1; cnt < sizeof(unsigned int); cnt++) {
+		ix <<= 8;
+		ix |= *sha1++;
+	}
+	return (int)(ix % obj_allocs);
 }
 
 static int find_object(const unsigned char *sha1)
 {
-	int i = hashtable_index(sha1);
+	int i;
 
 	if (!objs)
 		return -1;
 
+	i = hashtable_index(sha1);
 	while (objs[i]) {
 		if (memcmp(sha1, objs[i]->sha1, 20) == 0)
 			return i;

^ permalink raw reply related

