Git development
 help / color / mirror / Atom feed
* Re: Horrible re-packing?
From: Nicolas Pitre @ 2006-06-05 21:22 UTC (permalink / raw)
  To: Olivier Galibert; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List
In-Reply-To: <20060605211436.GA58708@dspnet.fr.eu.org>

On Mon, 5 Jun 2006, Olivier Galibert wrote:

> On Mon, Jun 05, 2006 at 12:03:31PM -0700, Linus Torvalds wrote:
> > Comments?
> 
> Why don't you just sort the full path+filename with a strcmp variant
> that starts by the end of the string for comparison?  May at least be
> simpler to understand.

Much more expensive for both memory usage and CPU cycles.


Nicolas

^ permalink raw reply

* Re: Horrible re-packing?
From: Linus Torvalds @ 2006-06-05 21:27 UTC (permalink / raw)
  To: Olivier Galibert; +Cc: Junio C Hamano, Nicolas Pitre, Git Mailing List
In-Reply-To: <20060605211436.GA58708@dspnet.fr.eu.org>



On Mon, 5 Jun 2006, Olivier Galibert wrote:
> 
> Why don't you just sort the full path+filename with a strcmp variant
> that starts by the end of the string for comparison?  May at least be
> simpler to understand.

That's actually what I was going to do, but we don't save the whole name, 
just the sorting number.

(This is actually an area where saving space is important - we can easily 
be working with hundreds of thousands or millions of objects, and we don't 
want to keep the name of each of them around).

So the suggested hash sort is designed exactly to end up approximating 
that ascii sort-from-end-of-string.

		Linus

^ permalink raw reply

* Re: Horrible re-packing?
From: Linus Torvalds @ 2006-06-05 21:40 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606051637490.24152@localhost.localdomain>



On Mon, 5 Jun 2006, Nicolas Pitre wrote:
> 
> In other words, the pack shrunk to less than half the size of the 
> previous one !

Ok, that's a bit more extreme than expected.

It's obviously great news, and says that the approach of sorting by 
"reversed name" is a great heuristic, but at the same time it makes me 
worry a bit that this thing that is supposed to be a heuristic ends up 
being _so_ important from a pack size standpoint. I was happier when it 
was more about saving a couple of percent.

Now, your repo may be a strange case, and it just happens to fit the 
suggested hash, but on the other hand it's nice to see three totally 
different repositories that all improve, albeit with wildly different 
numbers.

I'm wondering if we could have some "incremental optimizer" thing that 
would take a potentially badly packed archive, and just start looking for 
better delta chain possibilities? That way we would still try to get a 
good initial pack with some heuristic, but we could have people run the 
incremental improver every once in a while looking for good deltas that it 
missed due to the project not fitting the heuristics..

The fact that we normally do incremental repacking (and "-f" is unusual) 
is obviously one thing that makes us less susceptible to bad patterns (and 
is also what allows us to run the incremental optimizer - any good delta 
choice will automatically percolate into subsequent versions, including 
packs that have been cloned).

So the packing strategy itself seems to be very stable (and partly _due_ 
to the "optimization" to re-use earlier pack choices), but we currently 
lack the thing that fixes up any initial bad assumptions in case they 
happen.

			Linus

^ permalink raw reply

* Re: [PATCH] Fix git_setup_directory_gently when GIT_DIR is set
From: Johannes Schindelin @ 2006-06-05 22:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vodx74ca9.fsf@assigned-by-dhcp.cox.net>

Hi,

On Mon, 5 Jun 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > When calling git_setup_directory_gently, and GIT_DIR was set, it just
> > ignored the variable nongit_ok.
> 
> Hmph.  Is this really a breakage?  That is, gently() is meant
> for a case where you do not know if you even find a git
> repository and tell it not to complain because you are prepared
> for the case where you are not in a git repository.

Yes, it is a breakage: in git-clone, line 212, we explicitely set GIT_DIR 
(to the not-yet-existing repository path), and call git-init-db. Now, with 
the alias thing we need to get the config if it exists, so we _got_ to 
call gently(). Boom.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH] Fix git_setup_directory_gently when GIT_DIR is set
From: Junio C Hamano @ 2006-06-05 23:10 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0606060053440.25344@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Mon, 5 Jun 2006, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> 
>> > When calling git_setup_directory_gently, and GIT_DIR was set, it just
>> > ignored the variable nongit_ok.
>> 
>> Hmph.  Is this really a breakage?  That is, gently() is meant
>> for a case where you do not know if you even find a git
>> repository and tell it not to complain because you are prepared
>> for the case where you are not in a git repository.
>
> Yes, it is a breakage: in git-clone, line 212, we explicitely set GIT_DIR 
> (to the not-yet-existing repository path), and call git-init-db. Now, with 
> the alias thing we need to get the config if it exists, so we _got_ to 
> call gently(). Boom.

Hmph.  Would it be a bug in clone that does not create GIT_DIR
then?

^ permalink raw reply

* Re: Horrible re-packing?
From: Nicolas Pitre @ 2006-06-05 23:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606051432270.5498@g5.osdl.org>

On Mon, 5 Jun 2006, Linus Torvalds wrote:

> 
> 
> On Mon, 5 Jun 2006, Nicolas Pitre wrote:
> > 
> > In other words, the pack shrunk to less than half the size of the 
> > previous one !
> 
> Ok, that's a bit more extreme than expected.
> 
> It's obviously great news, and says that the approach of sorting by 
> "reversed name" is a great heuristic, but at the same time it makes me 
> worry a bit that this thing that is supposed to be a heuristic ends up 
> being _so_ important from a pack size standpoint. I was happier when it 
> was more about saving a couple of percent.

Well... this is the repository that exhibited a repack regression a 
while ago, going from something like ~40MB to ~160MB when Junio 
initially added the directory in the name hash.  No other popular 
repositories had that problem.

Which is why I said this repo is particularly sensitive to heuristic 
changes.  So I wouldn't worry too much about your proposed patch making 
it too great in this case.  It certainly didn't cause any (significant) 
regression overall which is what matters.

We already have surprizing results when combining two heuristics 
together although if used separately they do worse.  So trying to have 
fallback/incremental heuristics is going to make things simply too 
complicated for when it breaks.  Better experiment with new ideas and 
adopt them when they do a better job universally.

... which your proposed hashing change does.


Nicolas

^ permalink raw reply

* Re: Fix typo in tutorial-2.txt
From: Johannes Schindelin @ 2006-06-05 23:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: J. Bruce Fields, Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606051245470.5498@g5.osdl.org>

Hi,

On Mon, 5 Jun 2006, Linus Torvalds wrote:

> I didn't actually _test_ the tutorial, but if the old command worked, 
> something is really wrong!

Maybe some new gitter wants to get dirty hands? We used to have a test 
which just replayed the commands in the tutorial, to make sure that new 
users would not hit a regression or a syntax change...

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH] Fix git_setup_directory_gently when GIT_DIR is set
From: Johannes Schindelin @ 2006-06-05 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vk67v2o85.fsf@assigned-by-dhcp.cox.net>

Hi,

On Mon, 5 Jun 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Hi,
> >
> > On Mon, 5 Jun 2006, Junio C Hamano wrote:
> >
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >> 
> >> > When calling git_setup_directory_gently, and GIT_DIR was set, it just
> >> > ignored the variable nongit_ok.
> >> 
> >> Hmph.  Is this really a breakage?  That is, gently() is meant
> >> for a case where you do not know if you even find a git
> >> repository and tell it not to complain because you are prepared
> >> for the case where you are not in a git repository.
> >
> > Yes, it is a breakage: in git-clone, line 212, we explicitely set GIT_DIR 
> > (to the not-yet-existing repository path), and call git-init-db. Now, with 
> > the alias thing we need to get the config if it exists, so we _got_ to 
> > call gently(). Boom.
> 
> Hmph.  Would it be a bug in clone that does not create GIT_DIR
> then?

I don't think so. The whole point in calling git-init-db is to create 
that. GIT_DIR is set so that the otherwise nice work-in-a-subdirectory 
does not kick in. Imagine for example:

	git-clone ./. victim

(taken straight out of t5400). If GIT_DIR was not set, git-init-db (which 
reads repositoryformat from the config if that exists, right?) would find 
.git/ in git/t/trash, and _not_ create git/t/trash/victim/.git/.

Ciao,
Dscho

^ permalink raw reply

* Re: Fix typo in tutorial-2.txt
From: J. Bruce Fields @ 2006-06-05 23:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606051245470.5498@g5.osdl.org>

On Mon, Jun 05, 2006 at 12:47:49PM -0700, Linus Torvalds wrote:
> 
> This should be obvious enough.
> 
> I didn't actually _test_ the tutorial, but if the old command worked, 
> something is really wrong!

Aie, sorry, thanks.--b.

^ permalink raw reply

* Re: [PATCH] Fix git_setup_directory_gently when GIT_DIR is set
From: Junio C Hamano @ 2006-06-05 23:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0606060117180.25685@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Hmph.  Would it be a bug in clone that does not create GIT_DIR
>> then?
>
> I don't think so. The whole point in calling git-init-db is to create 
> that. GIT_DIR is set so that the otherwise nice work-in-a-subdirectory 
> does not kick in. Imagine for example:
>
> 	git-clone ./. victim
>
> (taken straight out of t5400). If GIT_DIR was not set, git-init-db (which 
> reads repositoryformat from the config if that exists, right?) would find 
> .git/ in git/t/trash, and _not_ create git/t/trash/victim/.git/.

I know clone currently relies on init-db to create the directory if
it does not exist (I wrote the code after all).

I am questioning if that was a wise thing to do.  In the case of
clone, we _know_ where we want the directory to be, so creating
the directory upfront before calling init-db feels like the
right thing to do.  In all the case other than this "clone calls
init-db" I can think of, if we have GIT_DIR set and it is set to
a non-existent location, it would be a bug in the code/script
and I think it is saner to error out in such a case.

^ permalink raw reply

* Re: Fix typo in tutorial-2.txt
From: J. Bruce Fields @ 2006-06-05 23:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.63.0606060111570.25344@wbgn013.biozentrum.uni-wuerzburg.de>

On Tue, Jun 06, 2006 at 01:15:32AM +0200, Johannes Schindelin wrote:
> On Mon, 5 Jun 2006, Linus Torvalds wrote:
> 
> > I didn't actually _test_ the tutorial, but if the old command worked, 
> > something is really wrong!
> 
> Maybe some new gitter wants to get dirty hands? We used to have a test 
> which just replayed the commands in the tutorial, to make sure that new 
> users would not hit a regression or a syntax change...

Might be a nice idea.

But don't commit SHA1's, e.g., depend on times, authors, etc., that we
can't count on being the same?  So you'd have to be clever to deal with
stuff like

	git cat-file commit a344bd52

Also, note that I didn't even attempt to keep a consistent scenario
throughout the first tutorial.

--b.

^ permalink raw reply

* Re: Horrible re-packing?
From: Junio C Hamano @ 2006-06-05 23:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0606051256280.5498@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Mon, 5 Jun 2006, Junio C Hamano wrote:
>> 
>> IIRC, sometimes this function is called with path and name split
>> and sometimes with full path in name
>
> Yeah, I was pretty confused by the whole hashing thing. Are you sure that 
> complexity is needed, it seems a bit overkill.

Two issues in the code confuses any reader of that function.

 - The code wants to hash Makefile from different revisions
   together, and Makefile and t/Makefile close to each other.
   The current code did it by treating '/' specially, used
   basename hash as the upper bits of the resulting hash and
   dirname hash as the lower bits.  It's my tendency to treat
   slashes specially too much, which is one of your favorite
   things to pick me on.

   This is not needed by your change anymore -- by only using
   the tail of the filename, and making sure tail part weighs
   more in the resulting group number, the new code gives the
   desired grouping characteristics in a much simpler way.

 - The output from rev-list --objects is fed to the function as
   its name parameter while path is set to NULL.  When we allow
   a thin pack to be generated, rev-list --objects output also
   contains "-<commit-object-name>" lines.  We read trees for
   these commits that are not going to be sent but can be used
   as base objects, and pass the pathname discovered from the
   tree using path and name pair (path is set to the linked list
   of struct name_path that describes the dirname, and name is
   set to the basename).  This was done to reduce the need for
   allocating and copying the pathname in preparation for
   calling name_hash() function.

   If you use only the "name" variable in your group number
   computation, and suppose we are doing send-pack to send
   updates between rev A..B, contrib/git-svn/Makefile from rev B
   will use git-svn/Makefile (tail 16 characters) to compute the
   number, but the blob from rev A (which we are not going to
   send but would want to use as a potential delta base) will
   have contrib/git-svn part in "path" (the element points at
   string "git-svn", and its uplink points at another element
   that points at "contrib" with an uplink that says it is at
   the root level), and Makefile in "name".  They will be hashed
   slightly differently.

^ permalink raw reply

* Re: Fix typo in tutorial-2.txt
From: Linus Torvalds @ 2006-06-05 23:58 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Johannes Schindelin, Junio C Hamano, Git Mailing List
In-Reply-To: <20060605234441.GB14993@fieldses.org>



On Mon, 5 Jun 2006, J. Bruce Fields wrote:
> 
> But don't commit SHA1's, e.g., depend on times, authors, etc., that we
> can't count on being the same?

You can set all these by hand with the

	GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL,DATE}

variables. In fact, the test script library already does that for authors 
(although not for dates, I think) to get consistent answers.

			Linus

^ permalink raw reply

* Re: Horrible re-packing?
From: Junio C Hamano @ 2006-06-06  0:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7v8xob2m6j.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

>  - The output from rev-list --objects is fed to the function as
>    its name parameter while path is set to NULL.  When we allow
>    a thin pack to be generated, rev-list --objects output also
>    contains "-<commit-object-name>" lines.  We read trees for
>    these commits that are not going to be sent but can be used
>    as base objects, and pass the pathname discovered from the
>    tree using path and name pair (path is set to the linked list
>    of struct name_path that describes the dirname, and name is
>    set to the basename).  This was done to reduce the need for
>    allocating and copying the pathname in preparation for
>    calling name_hash() function.

but it can trivially fixed by doing something like this.

---

diff --git a/pack-objects.c b/pack-objects.c
index 3590cd5..e0328f8 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -463,17 +463,10 @@ static void rehash_objects(void)
 	}
 }
 
-struct name_path {
-	struct name_path *up;
-	const char *elem;
-	int len;
-};
-
 #define DIRBITS 12
 
-static unsigned name_hash(struct name_path *path, const char *name)
+static unsigned name_hash(const char *name)
 {
-	struct name_path *p = path;
 	const char *n = name + strlen(name);
 	unsigned hash = 0, name_hash = 0, name_done = 0;
 
@@ -492,14 +485,6 @@ static unsigned name_hash(struct name_pa
 		name_hash = hash;
 		hash = 0;
 	}
-	for (p = path; p; p = p->up) {
-		hash = hash * 11 + '/';
-		n = p->elem + p->len;
-		while (p->elem <= --n) {
-			unsigned char c = *n;
-			hash = hash * 11 + c;
-		}
-	}
 	/*
 	 * Make sure "Makefile" and "t/Makefile" are hashed separately
 	 * but close enough.
@@ -686,9 +671,9 @@ static int name_cmp_len(const char *name
 }
 
 static void add_pbase_object(struct tree_desc *tree,
-			     struct name_path *up,
 			     const char *name,
-			     int cmplen)
+			     int cmplen,
+			     const char *fullname)
 {
 	struct name_entry entry;
 
@@ -702,13 +687,12 @@ static void add_pbase_object(struct tree
 		    sha1_object_info(entry.sha1, type, &size))
 			continue;
 		if (name[cmplen] != '/') {
-			unsigned hash = name_hash(up, name);
+			unsigned hash = name_hash(fullname);
 			add_object_entry(entry.sha1, hash, 1);
 			return;
 		}
 		if (!strcmp(type, tree_type)) {
 			struct tree_desc sub;
-			struct name_path me;
 			struct pbase_tree_cache *tree;
 			const char *down = name+cmplen+1;
 			int downlen = name_cmp_len(down);
@@ -719,10 +703,7 @@ static void add_pbase_object(struct tree
 			sub.buf = tree->tree_data;
 			sub.size = tree->tree_size;
 
-			me.up = up;
-			me.elem = entry.path;
-			me.len = entry.pathlen;
-			add_pbase_object(&sub, &me, down, downlen);
+			add_pbase_object(&sub, down, downlen, fullname);
 			pbase_tree_put(tree);
 		}
 	}
@@ -778,14 +759,14 @@ static void add_preferred_base_object(ch
 
 	for (it = pbase_tree; it; it = it->next) {
 		if (cmplen == 0) {
-			hash = name_hash(NULL, "");
+			hash = name_hash("");
 			add_object_entry(it->pcache.sha1, hash, 1);
 		}
 		else {
 			struct tree_desc tree;
 			tree.buf = it->pcache.tree_data;
 			tree.size = it->pcache.tree_size;
-			add_pbase_object(&tree, NULL, name, cmplen);
+			add_pbase_object(&tree, name, cmplen, name);
 		}
 	}
 }
@@ -1328,7 +1309,7 @@ int main(int argc, char **argv)
 		}
 		if (get_sha1_hex(line, sha1))
 			die("expected sha1, got garbage:\n %s", line);
-		hash = name_hash(NULL, line+41);
+		hash = name_hash(line+41);
 		add_preferred_base_object(line+41, hash);
 		add_object_entry(sha1, hash, 0);
 	}

^ permalink raw reply related

* Re: Horrible re-packing?
From: Chris Wedgwood @ 2006-06-06  0:18 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Olivier Galibert, Linus Torvalds, Junio C Hamano,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606051721180.24152@localhost.localdomain>

On Mon, Jun 05, 2006 at 05:22:02PM -0400, Nicolas Pitre wrote:

> Much more expensive for both memory usage and CPU cycles.

On a modern CPU I'm not sure you would notice unless the dataset was
insanely large.

^ permalink raw reply

* Re: Horrible re-packing?
From: Linus Torvalds @ 2006-06-06  0:35 UTC (permalink / raw)
  To: Chris Wedgwood
  Cc: Nicolas Pitre, Olivier Galibert, Junio C Hamano, Git Mailing List
In-Reply-To: <20060606001802.GB5401@tuatara.stupidest.org>



On Mon, 5 Jun 2006, Chris Wedgwood wrote:
>
> On Mon, Jun 05, 2006 at 05:22:02PM -0400, Nicolas Pitre wrote:
> 
> > Much more expensive for both memory usage and CPU cycles.
> 
> On a modern CPU I'm not sure you would notice unless the dataset was
> insanely large.

The dataset really _is_ pretty large.

For the kernel, we're talking about a quarter million strings, and it's 
not shrinking. Other (CVS imported from long histories) are in the several 
million object range.

The real problem, btw, is not the CPU cost of sorting them (hey, O(nlogn) 
works ok), but the memory use. We have to keep those quarter million names 
in memory too. At a "pointer + average of ~30 bytes of full pathname 
allocation + malloc overhead", the strings would basically take about 40 
bytes of memory each, and cause constant cache-misses.  In contrast, the 
"hash" value is a 32-bit unsigned, with no pointer overhead.

It's not the biggest part to keep track of (we need to obviously keep 
track of the 20-byte SHA1 and the pointers between objects), but if we had 
the full string info, it would be quite noticeable overhead, on the order 
of several tens of megabytes. 

Now, "tens of megabytes" is not a make-or-break issue any more (you 
definitely want lots of memory if you want to develop with large trees), 
but in a very real sense, the memory footprint of an app is often very 
closely correlated with its performance these days. 

Trying to keep things within the L2 cache can help a lot, and even if we 
expect 2MB and 4MB L2's to get more and more common, it means that we 
should _strive_ to keep datasets down. As it is, we have no _chance_ of 
staying in the L2 cache on the kernel, but for smaller projects we 
hopefully can still do so ;)

			Linus

^ permalink raw reply

* Re: [PATCH] Fix git_setup_directory_gently when GIT_DIR is set
From: Junio C Hamano @ 2006-06-06  1:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <7vfyij2mo8.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>>> Hmph.  Would it be a bug in clone that does not create GIT_DIR
>>> then?
>>
>> I don't think so. The whole point in calling git-init-db is to create 
>> that. GIT_DIR is set so that the otherwise nice work-in-a-subdirectory 
>> does not kick in. Imagine for example:
>>
>> 	git-clone ./. victim
>>
>> (taken straight out of t5400). If GIT_DIR was not set, git-init-db (which 
>> reads repositoryformat from the config if that exists, right?) would find 
>> .git/ in git/t/trash, and _not_ create git/t/trash/victim/.git/.
>
> I know clone currently relies on init-db to create the directory if
> it does not exist (I wrote the code after all).

Ah, I think I see the real problem is.  Alias handling is done
too early, and for commands like init-db that does _not_ even
want to look at an existing repository it tries to use GIT_DIR.

So how about this patch instead on top of yours?

-- >8 --
git alias: try alias last.

This disables alias "foo" from being used for git-foo, and when
we do use alias we check the built-in and then existing command
names first and then alias as the fallback.  This avoids the
problem of common commands used in scripts getting clobbered by
user specific aliases.

---
diff --git a/git.c b/git.c
index 8854472..6db8f2b 100644
--- a/git.c
+++ b/git.c
@@ -202,6 +202,7 @@ int main(int argc, const char **argv, ch
 	char *slash = strrchr(cmd, '/');
 	char git_command[PATH_MAX + 1];
 	const char *exec_path = NULL;
+	int done_alias = 0;
 
 	/*
 	 * Take the basename of argv[0] as the command
@@ -229,7 +230,6 @@ int main(int argc, const char **argv, ch
 	if (!strncmp(cmd, "git-", 4)) {
 		cmd += 4;
 		argv[0] = cmd;
-		handle_alias(&argc, &argv);
 		handle_internal_command(argc, argv, envp);
 		die("cannot handle %s internally", cmd);
 	}
@@ -287,13 +287,21 @@ int main(int argc, const char **argv, ch
 	exec_path = git_exec_path();
 	prepend_to_path(exec_path, strlen(exec_path));
 
-	handle_alias(&argc, &argv);
+	while (1) {
+		/* See if it's an internal command */
+		handle_internal_command(argc, argv, envp);
 
-	/* See if it's an internal command */
-	handle_internal_command(argc, argv, envp);
+		/* .. then try the external ones */
+		execv_git_cmd(argv);
 
-	/* .. then try the external ones */
-	execv_git_cmd(argv);
+		/* It could be an alias -- this works around the insanity
+		 * of overriding "git log" with "git show" by having
+		 * alias.log = show
+		 */
+		if (done_alias || !handle_alias(&argc, &argv))
+			break;
+		done_alias = 1;
+	}
 
 	if (errno == ENOENT)
 		cmd_usage(0, exec_path, "'%s' is not a git-command", cmd);

^ permalink raw reply related

* Re: [PATCH] Fix git_setup_directory_gently when GIT_DIR is set
From: Johannes Schindelin @ 2006-06-06  5:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmzcr14ao.fsf@assigned-by-dhcp.cox.net>

Hi,

On Mon, 5 Jun 2006, Junio C Hamano wrote:

> Junio C Hamano <junkio@cox.net> writes:
> 
> > I know clone currently relies on init-db to create the directory if
> > it does not exist (I wrote the code after all).
> 
> Ah, I think I see the real problem is.  Alias handling is done
> too early, and for commands like init-db that does _not_ even
> want to look at an existing repository it tries to use GIT_DIR.
> 
> So how about this patch instead on top of yours?

Yes, I like it. I was so focused on reading the config early to be done 
with it, and avoiding reading the config twice (which it can do now, with 
your patch, but only if one is so insane to use nested aliases).

Your method of avoiding shadowing proper git commands is elegant, if maybe 
incomplete: you have no way of warning the user that the alias is not 
used. But then, I do not think that matters.

Ciao,
Dscho

^ permalink raw reply

* Re: What's in git.git (part #2)
From: Shawn Pearce @ 2006-06-06  5:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v3beodpqs.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> > I find it useful to track what I've sent to you just in case I
> > screw up some ref somewhere.  I like knowing that if I perform a
> > bad update-ref call (which I'm prone to do sometimes) that I can
> > recover quickly as the log exists.
> 
> I find it interesting to be able to say:
> 
> 	$ git log next@{yesterday}..next
> 
> I often find myself getting curious to see:
> 
> 	$ git reflog next
>         Wed May 31 14:23:58 2006 -0700
>                 62b693a... Merge branch 'master' into next
>         ...

Hmm, looks like nobody has actually implemented that - at least not
in 'next'.  :-)

Is that a serious feature request?  If so I'll do it.  BTW: we're
also still lacking reflog during receive-pack.  Much of the update()
function in receive-pack.c can be replaced with the new locking
functions, except that if reflog is enabled on the target ref then
GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL need to be set for the
update to succeed.

I've been busy at work but I really have been wanting to put
some time into this GIT Eclipse plugin that I've been neglecting.
Some folks at work are starting to become more interested in it.
I have gotten the really core functionality done; all that is
left is the hard stuff (directory synchronization, push, fetch,
non-delta pack generation for push[1], tree diff).


*1* Generating a simple pack with only deflate compression on the
objects should be trivial.  Since this is 100% pure Java code nobody
should be expecting great performance out of it from day 1 anyway.
Sure its not an optimal transport but if you were that worried about
the transfer byte costs to push then you probably would just prefer
to use the native tools code tools anyway.

-- 
Shawn.

^ permalink raw reply

* Re: Importing Mozilla CVS into git
From: Martin Langhoff @ 2006-06-06  5:55 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910606022128h23ff94fbg3fcb4fa191254b5a@mail.gmail.com>

On 6/3/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 6/1/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> > With the attached patch you can parse the entire Mozilla tree. The
> > tree has over 100,000 files in it and about 300 branches.
>
> I was a little low with these counts, more like 110,000 files and some
> parts of the tree have 1,000 branches. Total tree size is 3GB.

I don't think it really has that many branches. If I am to believe
cvsps (which took 3GB to walk the history), it has some branches with
recursive loops in their ancestry (MANG_MATH_BRANCH and
SpiderMonkey140_BRANCH have eachother as ancestors!?), 197969 commits
and 796 branches.

This repository has been mangled quite badly. Don't know what you guys
did with it, but it sure isn't pretty. I'm working on getting
git-cvsimport to get through a complete import.

cheers,



martin

^ permalink raw reply

* New release?
From: Junio C Hamano @ 2006-06-06  6:02 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds
In-Reply-To: <Pine.LNX.4.64.0606052002530.5498@g5.osdl.org>

The "next" queue has been shrinking, and nothing is going to
"maint" branch, which can mean only one thing ;-).

The last issue of "What's in" talked about 1.4.0-rc1 but somehow
vger threw it on the bitbucket.  The message started like this:

        Subject: What's in git.git
        To: git@vger.kernel.org
	Date: Wed, 31 May 2006 23:40:26 -0700

        It's been a while since the last feature release, and I
        think with the recent "many things built-in" (including the
        busybox style integration) we are nearing a good time to do the
        next feature release 1.4.0.

        Before doing a 1.4.0-rc1, I would like to see the following
        topics in the "next" branch graduate to "master":

         - re-add missing flags to format-patch.  I have resurrected
           "--signoff"; if people care about something else we dropped
           when we went built-in, please raise hand and submit patches.

         - tree-parser updates from Linus seems to be fine in the sense
           I haven't seen breakage from it; I'll push it out to "master"
           before the end of the week.  I'd like to do another round of
           update to introduce a unified tree/index/directory walker, so
           settling this down is sort of urgent.

         - http-fetch fixes from Nick, which looked obviously correct.
           I would appreciate test reports from people who saw breakages
           on this one.

         - reflog from Shawn.  Do people find this useful?  I've enabled
           reflog on "next" branch in my development repository to see
           how useful it would be for myself a few days ago, and also in
           a linux-2.6 repository I use for testing (I do not hack on
           kernel myself).  

All of the above have been cleared with the recent fixes to
miscellaneous minor breakages around the tree-parser updates, so
I think what we have on the "master" branch tonight is worthy of
1.4.0-rc1 status.

        Other topics in "next" includes:

         - read/write-tree --prefix.  This is remnant of now-vetoed
           subproject support using "bind commit".  I kept both of them
           because they could be useful independent of "bind commit",
           but I do not know how much.  I think read-tree --prefix might
           probably be more useful than write-tree --prefix, since the
           latter can be writing out the whole tree and run rev-parse
           $tree:/path/name to extract that part, but the former does
           not have an easy equivalent (you could pipe ls-tree output to
           sed and pipe that to update-cache --index-info, but that is
           crumsy). 

           I'd like to do "gitlink" based subproject support but most
           likely that needs to come after tree/index/directory walker.

I am inclined to drop "read/write-tree --prefix" even from
"next"; we can resurrect it later if it proves to be useful in
order to do something else, such as "gitlink".

         - fetch-pack client-side hack.  When your repository has more
           roots than the repository you are fetching from, the common
           commit discovery exchange between fetch-pack and upload-pack
           ends up traversing down the ancestry chain of the history the
           other end do not have.  The hack in the "next" branch is to
           give up the common commit discovery early on the client side,
           which Ralf Baechle who originally reported the problem says
           to fix the problem (<20060526154239.GA20839@linux-mips.org>);
           but the proper fix involves a bit smarter upload-pack.

           I've posted a hacky upload-pack patch:

                <7vfyiwi4xl.fsf@assigned-by-dhcp.cox.net>

           but I think it should really needs to be cleaned up properly.

I think this can happen after 1.4.0.

        Things that we might want to have in 1.4.0 but not even in "next"
        yet include:

         - p4 importer (Sean Estabrooks) -- are people interested?

I am in favour of doing this before 1.4.0 (not in contrib/ but
as an importer next to other git-*import).

         - letting fetch-pack ask for an arbitrary commit object the
           user obtained out of band (Eric W Biederman) -- waiting for
           updated patch.  We would need a corresponding one-liner patch
           to upload-pack when we do this.

This can wait.

         - using ~/.gitrc to give a fall-back default when
           $GIT_DIR/config does not have values.

I suspect this would be more involved than Pasky's initial
patch; but it can wait.

         - command aliases and possibly default arguments via the
           configuration file.

This we have in "next" tonight and we should be able to have it
in "master" by the end of the week.  I took what Johannes did,
but I think that is pretty much the same as what Pasky did
initially, but forgot to mention that in the commit log.

I am expected to be mostly offline next week (among other
things, http://osdl.jp/seminar0613/ if you can read Japanese),
so I am planning to do 1.4.0-rc1 by my next git day (Wednesday
7th), and perhaps the real 1.4.0 on Saturday 10th if things go
smoothly.

Now v1.4.0-rc1 seems to have mirrored out.  Please have fun.

^ permalink raw reply

* Re: What's in git.git (part #2)
From: Junio C Hamano @ 2006-06-06  6:16 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git
In-Reply-To: <20060606053905.GA9797@spearce.org>

Shawn Pearce <spearce@spearce.org> writes:

>> I find it interesting to be able to say:
>> 
>> 	$ git log next@{yesterday}..next
>> 
>> I often find myself getting curious to see:
>> 
>> 	$ git reflog next
>>         Wed May 31 14:23:58 2006 -0700
>>                 62b693a... Merge branch 'master' into next
>>         ...
>
> Hmm, looks like nobody has actually implemented that - at least not
> in 'next'.  :-)
>
> Is that a serious feature request?

I've written it but it was so trivial I threw it away after
writing the e-mail you are responding to with it.

As I said, I _think_ I was interested in seeing it primarily
because reflog was a new curiosity to me.  It is more like
wanting to know how the new tool works more than using the new
tool effectively to improve my productivity.  In a "serious"
environment, a tool is just something you would use to get the
real job done, not to toy around to see how _it_ works, so I
suspect the above would not be so useful in practice, as I wrote
in the message.

^ permalink raw reply

* Re: [PATCH] Cleanup git-send-email.perl:extract_valid_email
From: Junio C Hamano @ 2006-06-06  6:59 UTC (permalink / raw)
  To: Horst von Brand, Ryan Anderson, Eric Wong; +Cc: git
In-Reply-To: <200606040010.k540ANa4015204@laptop11.inf.utfsm.cl>

Horst von Brand <vonbrand@inf.utfsm.cl> writes:

>> >  	# check for a local address:
>> > -	return $address if ($address =~ /^([\w\-]+)$/);
>> > +	return $address if ($address =~ /^([\w\-.]+)$/);
>> 
>> I keep forgetting this, '+' is a valid (and useful) setup, too.
>
> Oops...
>> 
>> Actually, I'm retracting my earlier ack on this.  This is way too
>> restrictive.  I'd rather allow an occasional invalid email address than
>> to reject valid ones.  I generally trust git users to know what they're
>> doing when entering email addresses[1].
>> 
>> *, $, ^, +, = are all valid characters in the username portion (not sure
>> about local accounts, though), and I'm sure there are more that I don't
>> know about.
>
> As a general principle, I prefer to check what is legal instead of trying
> to filter out what isn't.

If we start doing addr-spec in RFC2822 (page 17) ourselves, we
should rather be using Email::Valid.  A permissive sanity check
to catch obvious mistakes would be more appropriate here than
being RFC police.

I think something like the attached, on top of your patch, would
be appropriate for upcoming 1.4.0.

-- >8 --
send-email: be more lenient and just catch obvious mistakes.

This cleans up the pattern matching subroutine by introducing
two variables to hold regexp to approximately match local-part
and domain in the e-mail address.  It is meant to catch obvious
mistakes with a cheap check.

The patch also moves "scalar" to force Email::Valid->address()
to work in !wantarray environment to extract_valid_address;
earlier it was in the caller of the subroutine, which was way
too error prone.

---
diff --git a/git-send-email.perl b/git-send-email.perl
index a7a7797..700d0c3 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -312,16 +312,18 @@ our ($message_id, $cc, %mail, $subject, 
 
 sub extract_valid_address {
 	my $address = shift;
+	my $local_part_regexp = '[^<>"\s@]+';
+	my $domain_regexp = '[^.<>"\s@]+\.[^<>"\s@]+';
 
 	# check for a local address:
-	return $address if ($address =~ /^([\w\-.]+)$/);
+	return $address if ($address =~ /^($local_part_regexp)$/);
 
 	if ($have_email_valid) {
-		return Email::Valid->address($address);
+		return scalar Email::Valid->address($address);
 	} else {
 		# less robust/correct than the monster regexp in Email::Valid,
 		# but still does a 99% job, and one less dependency
-		$address =~ /([\w\-.]+@[\w\-.]+)/;
+		$address =~ /($local_part_regexp\@$domain_regexp)/;
 		return $1;
 	}
 }
@@ -384,7 +386,7 @@ X-Mailer: git-send-email $gitversion
 		defined $pid or die $!;
 		if (!$pid) {
 			exec($smtp_server,'-i',
-			     map { scalar extract_valid_address($_) }
+			     map { extract_valid_address($_) }
 			     @recipients) or die $!;
 		}
 		print $sm "$header\n$message";

^ permalink raw reply related

* peek_remote return forever zero
From: Paolo Teti @ 2006-06-06  8:18 UTC (permalink / raw)
  To: git

A question..

Why in peek-remote.c the peek_remote func.
return an int (and forever zero) ?

peek_remote return an int for future use
(and just for now zero) or we can replace it with a void?

^ permalink raw reply

* Re: What's in git.git (part #2)
From: Johannes Schindelin @ 2006-06-06  8:19 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git
In-Reply-To: <20060606053905.GA9797@spearce.org>

Hi,

On Tue, 6 Jun 2006, Shawn Pearce wrote:

> *1* Generating a simple pack with only deflate compression on the
> objects should be trivial.  Since this is 100% pure Java code nobody
> should be expecting great performance out of it from day 1 anyway.

If you use jzlib (http://www.jcraft.com/jzlib/) it should not be slow. 
IMHO the I/O will be the bottleneck, but I would be happy to be 
contradicted by the facts.

Ciao,
Dscho

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox