Git development
* Cogito bug on Debian
From: Martin Langhoff @ 2006-04-20 23:17 UTC (permalink / raw)
  To: Git Mailing List, Petr Baudis

This was spotted circulating on Catalyst's IRC channel. Apparently,
the bug "causes non-serious data loss".

     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=330031

cheers,


martin

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Junio C Hamano @ 2006-04-20 23:02 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604201630320.2215@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

>> I originally thought, with one single notable exception of
>> Makefile, having the identically named file in many different
>> directories is not common nor sane,
>
> I'd tend to disagree with that but...

I disagree with that myself now.  The kernel tree has many
files with the same basename (e.g. arch/*/kernel/irq.c).

It is a different issue if they are good delta base candidates
with each other, though.

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Linus Torvalds @ 2006-04-20 22:59 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, Nicolas Pitre, git
In-Reply-To: <20060420220240.GB32748@spearce.org>



On Thu, 20 Apr 2006, Shawn Pearce wrote:

> Junio C Hamano <junkio@cox.net> wrote:
> > 
> > This _might_ improve things:
> > 
> > diff --git a/pack-objects.c b/pack-objects.c
> > index 09f4f2c..0c6abe9 100644
> > --- a/pack-objects.c
> > +++ b/pack-objects.c
> > @@ -1037,7 +1039,7 @@ static int try_delta(struct unpacked *cu
> >  	sizediff = oldsize > size ? oldsize - size : size - oldsize;
> >  
> >  	if (size < 50)
> > -		return -1;
> > +		return 0;
> >  	if (old_entry->depth >= max_depth)
> >  		return 0;
> >  
> > @@ -1052,7 +1054,7 @@ static int try_delta(struct unpacked *cu
> >  	if (cur_entry->delta)
> >  		max_size = cur_entry->delta_size-1;
> >  	if (sizediff >= max_size)
> > -		return -1;
> > +		return 0;
> >  	delta_buf = diff_delta(old->data, oldsize,
> >  			       cur->data, size, &delta_size, max_size);
> >  	if (!delta_buf)
> 
> Holy cow, it did:
> 
>   Total 46391, written 46391 (delta 8391), reused 37774 (delta 0)
>    46M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
> 
> That's the smallest packing I've seen yet.  And it doesn't have a
> negative effect on repacking GIT either.

I think I know what's going on, and why your bisection claimed it was the 
re-hashing change that was the problem, even though it really wasn't.

That

	if (sizediff >= max_size)
		return -1;

check is actually fairly _old_. It's from June 2005, ie it's from pretty 
much the first two days of that packing thing existing in the first place.

(The initial repacking was done June 25th, with a lot of tweaking over the 
next few days. That sizediff thing was part of the fairly early tweaking).

The thing is, that check made sense back then. Why? Because we sorted 
things in decreasing size order back then (I think this was before we even 
did any name-based heuristic sorting at all), so that when we tried the 
delta algorithm, and the size diff was bigger than the last delta size, we 
pretty much _knew_ the new delta would be bigger still, and there was no 
point in continuing with try_delta.

HOWEVER. We have since changed the sorting to sort according to name 
before it sorts according to size, so that old heuristic that depended on 
the size being monotonically increasing simply doesn't make any sense any 
more.

So I think that second "return -1" really _should_ be changed to a 
"return 0", and not just because it helps your particular case. It's 
literally a bug these days, because the assumptions that caused it to 
return -1 simply aren't true any more.

(It wasn't _strictly_ true even originally: even if the sizediff is huge, 
the _delta_ may not be huge, since we can delete data with a small delta. 
So it's quite likely that we should compare the "old is bigger than new" 
and "new is bigger than old" separately and have different heuristics for 
them. Again, that was simply not much of an issue back when we sorted just 
by size).

So even the "return 0" might not be completely right. We might actually 
want to look at how big the delta is, and return only once that fails.
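
To make the ordering argument concrete, here is a minimal sketch (not the
pack-objects code itself). It assumes a 1000-byte object being packed, so
the limit is 1000 / 2 - 20 = 480, and made-up candidate sizes. The names
size_gap() and usable_candidates() are hypothetical; stop_early models
"return -1" and skipping models "return 0".

```c
#include <assert.h>
#include <stddef.h>

/* Size gap between the object being packed and one candidate base. */
static unsigned long size_gap(unsigned long cur, unsigned long cand)
{
	return cur > cand ? cur - cand : cand - cur;
}

/*
 * Count candidates whose size gap is under the limit.  With stop_early
 * set, the scan ends at the first candidate that fails (the "return -1"
 * behaviour); otherwise a failing candidate is merely skipped (the
 * "return 0" behaviour).  This is a model: no delta is ever computed.
 */
static size_t usable_candidates(unsigned long cur, const unsigned long *cand,
				size_t n, unsigned long max_size,
				int stop_early)
{
	size_t i, usable = 0;

	assert(cand != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (size_gap(cur, cand[i]) < max_size) {
			usable++;
			continue;
		}
		if (stop_early)
			break;
	}
	return usable;
}
```

With candidates in decreasing size order (800, 20, 10), stopping early and
skipping both find the one usable base. With a name-grouped order such as
(10, 800, 20), stopping early finds none, while skipping still finds the
800-byte candidate.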

			Linus

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Junio C Hamano @ 2006-04-20 22:35 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Nicolas Pitre, git
In-Reply-To: <20060420220240.GB32748@spearce.org>

Shawn Pearce <spearce@spearce.org> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>
>> The list is sorted by type then hash then size (type_size_sort),
>> so if you have t/Makefile that are big medium small too-small
>> and then doc/Makefile that are big medium, once you see the
>> too-small t/Makefile it would not consider the big doc/Makefile
>> as a candidate X-<.
>> 
>> This _might_ improve things:
>>... 
>
> Holy cow, it did:
>
>   Total 46391, written 46391 (delta 8391), reused 37774 (delta 0)
>    46M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack
>
> That's the smallest packing I've seen yet.  And it doesn't have a
> negative effect on repacking GIT either.

Thanks for trying.  Mind trying one more?

I suspect the test patch makes pack-objects a lot more
expensive.

The code before the test patch said "if the size is very small
or size difference is too great, do not consider this, and do
not consider any more objects in the delta window, because we
know they are either even smaller of the same path, they have
different names, or they are of different type".  The test patch
you tried was a quick and dirty hack that said "under the
too-small condition, skip this one, but keep trying the rest of
the delta window".

Here is a cleaned up patch.  What it does is "under the
too-small condition, see if the object has the same basename,
and if so keep going, but otherwise skip the rest as before".

If you have objects like this and are trying to pack the first
object (this list is sorted in the order pack-objects tries):

        (size)  (path)
          1000  t/0-11-AdjLite.deg
            10  t/0-11-AdjLite.deg
           800  s/0-11-AdjLite.deg
            20  t/0-12-AdjLite.deg

the current code stops after checking t/0-11-AdjLite.deg.  The
test patch tries all of them.  This patch skips that file, but
tries "s/0-11-AdjLite.deg", and then stops at the next one.
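
The three behaviours can be compared with a small model. This is only a
sketch under simplifying assumptions: it never calls diff_delta(), it
ignores max_depth and any delta already found for the object, and it
compares basenames with strcmp() instead of looking at hash bits. The
names struct obj, enum policy, model_try_delta() and scan_window() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct obj {
	unsigned long size;
	const char *path;
};

enum policy { CURRENT_CODE, TEST_PATCH, CLEANED_UP_PATCH };

static const char *base_name(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Model of try_delta(): returns 0 to keep scanning, -1 to stop.
 * *tried is set when a real delta attempt would be made.
 */
static int model_try_delta(const struct obj *cur, const struct obj *old,
			   enum policy policy, int *tried)
{
	unsigned long size = cur->size, oldsize = old->size;
	unsigned long sizediff = oldsize > size ? oldsize - size : size - oldsize;

	*tried = 0;
	if (size >= 50 && sizediff < size / 2 - 20) {
		*tried = 1;	/* diff_delta() would run here */
		return 0;
	}
	if (policy == CURRENT_CODE)
		return -1;	/* give up on the rest of the window */
	if (policy == TEST_PATCH)
		return 0;	/* skip this one, keep trying the rest */
	/* cleaned-up patch: keep going only while the basename matches */
	return strcmp(base_name(cur->path), base_name(old->path)) ? -1 : 0;
}

/*
 * Scan list[1..n-1] as delta bases for list[0].  Returns how many
 * candidates were looked at; bit i of *tried_mask is set when list[i]
 * reached the delta attempt.
 */
static size_t scan_window(const struct obj *list, size_t n,
			  enum policy policy, unsigned *tried_mask)
{
	size_t i, looked_at = 0;

	assert(list != NULL && n > 0);
	*tried_mask = 0;
	for (i = 1; i < n; i++) {
		int tried;
		int ret = model_try_delta(&list[0], &list[i], policy, &tried);

		looked_at++;
		if (tried)
			*tried_mask |= 1u << i;
		if (ret < 0)
			break;
	}
	return looked_at;
}
```

Fed the four objects listed above, the model looks at one candidate and
attempts no delta under CURRENT_CODE. Under TEST_PATCH and
CLEANED_UP_PATCH it looks at three and attempts only the 800-byte
s/0-11-AdjLite.deg, but CLEANED_UP_PATCH ends its scan at
t/0-12-AdjLite.deg and would not look at anything listed after it.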

-- >8 --

diff --git a/pack-objects.c b/pack-objects.c
index 09f4f2c..2173709 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1036,8 +1036,6 @@ static int try_delta(struct unpacked *cu
 	oldsize = old_entry->size;
 	sizediff = oldsize > size ? oldsize - size : size - oldsize;
 
-	if (size < 50)
-		return -1;
 	if (old_entry->depth >= max_depth)
 		return 0;
 
@@ -1048,20 +1046,27 @@ static int try_delta(struct unpacked *cu
 	 * more space-efficient (deletes don't have to say _what_ they
 	 * delete).
 	 */
-	max_size = size / 2 - 20;
-	if (cur_entry->delta)
-		max_size = cur_entry->delta_size-1;
-	if (sizediff >= max_size)
-		return -1;
-	delta_buf = diff_delta(old->data, oldsize,
-			       cur->data, size, &delta_size, max_size);
-	if (!delta_buf)
+	if (50 <= size) {
+		max_size = size / 2 - 20;
+		if (cur_entry->delta)
+			max_size = cur_entry->delta_size-1;
+		if (sizediff < max_size) {
+			delta_buf = diff_delta(old->data, oldsize,
+					       cur->data, size,
+					       &delta_size, max_size);
+			if (!delta_buf)
+				return 0;
+			cur_entry->delta = old_entry;
+			cur_entry->delta_size = delta_size;
+			cur_entry->depth = old_entry->depth + 1;
+			free(delta_buf);
+			return 0;
+		}
+	}
+	/* Keep going as long as the basename matches */
+	if (((cur_entry->hash ^ old_entry->hash) >>DIRBITS) == 0)
 		return 0;
-	cur_entry->delta = old_entry;
-	cur_entry->delta_size = delta_size;
-	cur_entry->depth = old_entry->depth + 1;
-	free(delta_buf);
-	return 0;
+	return -1;
 }
 
 static void progress_interval(int signum)

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Shawn Pearce @ 2006-04-20 22:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git
In-Reply-To: <7vfyk8vscl.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Nicolas Pitre <nico@cam.org> writes:
> 
> >> But I suspect we have a built-in "we sort bigger to smaller, and
> >> we cut off when we switch bins" somewhere in find_delta() loop,
> >> which I do not recall touching when I did that change, so that
> >> may be interfering and preventing 0-11-AdjLite.deg from all over
> >> the place to delta against each other.
> >
> > I just cannot find something that would do that in the code.  When 
> > --no-reuse-delta is specified, the only things that will break the loop
> > in find_delta() is when try_delta() returns -1, and that happens only 
> > when changing object type or when the size difference is too big, but 
> > nothing looks at the name hash.
> 
> The list is sorted by type then hash then size (type_size_sort),
> so if you have t/Makefile that are big medium small too-small
> and then doc/Makefile that are big medium, once you see the
> too-small t/Makefile it would not consider the big doc/Makefile
> as a candidate X-<.
> 
> This _might_ improve things:
> 
> diff --git a/pack-objects.c b/pack-objects.c
> index 09f4f2c..0c6abe9 100644
> --- a/pack-objects.c
> +++ b/pack-objects.c
> @@ -1037,7 +1039,7 @@ static int try_delta(struct unpacked *cu
>  	sizediff = oldsize > size ? oldsize - size : size - oldsize;
>  
>  	if (size < 50)
> -		return -1;
> +		return 0;
>  	if (old_entry->depth >= max_depth)
>  		return 0;
>  
> @@ -1052,7 +1054,7 @@ static int try_delta(struct unpacked *cu
>  	if (cur_entry->delta)
>  		max_size = cur_entry->delta_size-1;
>  	if (sizediff >= max_size)
> -		return -1;
> +		return 0;
>  	delta_buf = diff_delta(old->data, oldsize,
>  			       cur->data, size, &delta_size, max_size);
>  	if (!delta_buf)

Holy cow, it did:

  Total 46391, written 46391 (delta 8391), reused 37774 (delta 0)
   46M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack

That's the smallest packing I've seen yet.  And it doesn't have a
negative effect on repacking GIT either.

-- 
Shawn.

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Jakub Narebski @ 2006-04-20 21:56 UTC (permalink / raw)
  To: git
In-Reply-To: <7vodywvsrq.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> v1.2.3 hash was base-name only.  doc/Makefile and t/Makefile
> were thrown in the same bin and sorted by size.  When the
> history you are packing is deep, and doc/Makefile and t/Makefile
> are not related at all, this made effective size of delta window
> 1/N where N is the number of such duplicates.
> 
> The one you found above uses a hash that is fully full-path.
> The two are in completely different bins, and bins are totally
> random.  This was not a good strategy.
> 
> v1.3.0 hash is base-name hash concatenated with leading-path
> hash.  t/Makefile and doc/Makefile go in separate bins, but the
> bins are close to each other; this avoids the problem in v1.2.3
> when you have deep history, but at the same time if you do not
> have many many versions of t/Makefile to overflow the delta
> window, it gives t/Makefile a chance to delta with doc/Makefile.
[...]
> You could try this patch to resurrect the hash used in v1.2.3,
> and you may get better packing for your particular repository;
> but I am not sure if it gives better results in the general
> case.  I am running the test myself now while waiting for my
> day-job database to load X-<.

Perhaps the packing code could check which version gives smaller pack, or at
least be instructed that one might want different packing heuristic for
specific repository? Surely 2x difference in size is worth considering (and
complication)...

-- 
Jakub Narebski
Warsaw, Poland

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Shawn Pearce @ 2006-04-20 21:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Nicolas Pitre, Linus Torvalds
In-Reply-To: <7vodywvsrq.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
> 
> > I just spent some time bisecting this issue and it looks like the
> > following change by Junio may be the culprit:
> >
> >   commit 1d6b38cc76c348e2477506ca9759fc241e3d0d46
[snip]
> Unfortunately, that is not the same hash we use in v1.3.0, so we
> need to look elsewhere for interactions.

Pity.  Then either bisect goofed or there was a goof in meatspace
while using bisect.  I honestly expected bisect to point at the
problem commit.  I tried reverting 1d6b38cc but it didn't apply
cleanly and I didn't feel like working through all of the conflicts
at the time.
 
[snip]
> The earlier observation by Linus on reverting eeef7135 is
> consistent with it; that commit was the one that introduced
> v1.3.0 hash.

Yet reverting that didn't help either.
 
[snip]
> You could try this patch to resurrect the hash used in v1.2.3,
> and you may get better packing for your particular repository;
> but I am not sure if it gives better results in the general
> case.  I am running the test myself now while waiting for my
> day-job database to load X-<.
[snip]

Nope.  When applied to 'next' it didn't help very much:

  Total 46391, written 46391 (delta 6466), reused 38662 (delta 0)
  118M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack


Just to note: the 1.3.0 packer is saving 1M in the GIT repository
over the 1.2.3 packer.  So for a real project it does seem to have
some benefit.  And if you benchmarked the 1.3.0 packer against
the Linux kernel and found it to be better than the 1.2.3 packer
that's even better.

I think this repository of mine may just be a degenerate case which
GIT doesn't pack very well.  GIT can't be all things to all people!

-- 
Shawn.

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Junio C Hamano @ 2006-04-20 21:40 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604201630320.2215@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

>> But I suspect we have a built-in "we sort bigger to smaller, and
>> we cut off when we switch bins" somewhere in find_delta() loop,
>> which I do not recall touching when I did that change, so that
>> may be interfering and preventing 0-11-AdjLite.deg from all over
>> the place to delta against each other.
>
> I just cannot find something that would do that in the code.  When 
> --no-reuse-delta is specified, the only things that will break the loop
> in find_delta() is when try_delta() returns -1, and that happens only 
> when changing object type or when the size difference is too big, but 
> nothing looks at the name hash.

The list is sorted by type then hash then size (type_size_sort),
so if you have t/Makefile that are big medium small too-small
and then doc/Makefile that are big medium, once you see the
too-small t/Makefile it would not consider the big doc/Makefile
as a candidate X-<.
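
A rough sketch of that ordering, for illustration only. The real
type_size_sort is not shown in this thread, so struct entry, its fields
and both function names below are hypothetical, and "bigger first" is an
assumption taken from the "big medium small too-small" wording above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for the packer's per-object record. */
struct entry {
	int type;		/* object type (blob, tree, ...) */
	unsigned hash;		/* name hash of the path */
	unsigned long size;	/* object size in bytes */
};

/* Order by type, then name hash, then size with bigger objects first. */
static int sketch_type_hash_size_cmp(const void *pa, const void *pb)
{
	const struct entry *a = pa, *b = pb;

	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->hash != b->hash)
		return a->hash < b->hash ? -1 : 1;
	if (a->size != b->size)
		return a->size > b->size ? -1 : 1;
	return 0;
}

static void sketch_sort(struct entry *list, size_t n)
{
	assert(list != NULL || n == 0);
	qsort(list, n, sizeof(*list), sketch_type_hash_size_cmp);
}
```

In this ordering the smallest object of one hash bin sits directly before
the biggest object of the next bin, which is where a cut-off on size
hurts.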

This _might_ improve things:

diff --git a/pack-objects.c b/pack-objects.c
index 09f4f2c..0c6abe9 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1037,7 +1039,7 @@ static int try_delta(struct unpacked *cu
 	sizediff = oldsize > size ? oldsize - size : size - oldsize;
 
 	if (size < 50)
-		return -1;
+		return 0;
 	if (old_entry->depth >= max_depth)
 		return 0;
 
@@ -1052,7 +1054,7 @@ static int try_delta(struct unpacked *cu
 	if (cur_entry->delta)
 		max_size = cur_entry->delta_size-1;
 	if (sizediff >= max_size)
-		return -1;
+		return 0;
 	delta_buf = diff_delta(old->data, oldsize,
 			       cur->data, size, &delta_size, max_size);
 	if (!delta_buf)

* Re: cg-clone produces "___" file and no working tree
From: Zack Brown @ 2006-04-20 21:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vodywxago.fsf@assigned-by-dhcp.cox.net>

Hi,

> honestly I would not recommend "running
> without installing" unless you know what you are doing ;-).

OK, you're right, the problem was that I was not doing a proper install. I
followed the directions and it worked. Thanks!

Be well,
Zack

On Thu, Apr 20, 2006 at 01:23:35PM -0700, Junio C Hamano wrote:
> Junio C Hamano <junkio@cox.net> writes:
> 
> > Zack Brown <zbrown@tumblerings.org> writes:
> >
> >> Not true. I went into the git source directory, and ran "make". Nothing more.
> >
> > Ah, I misunderstood.  You are trying to run it _without_
> > installing it.
> >
> > Well, then probably you do not have templates installed
> > anywhere, especially not where git-init-db expects them to be
> > found.
> 
> (sorry for the short message sent unfinished by mistake).
> 
> Running things without installing is somewhat tricky, but test
> framework needs to do that, so there are some things you would
> need to do.
> 
>  - "git init-db" takes --template argument; in the source area
>    before installing, they are built in templates/blt/.
> 
>  - "git" and programs that need to invoke other git programs
>    (e.g. git-send-pack) expects things to be found in gitexecdir
>    you set when you build.  If you are not installing, you need
>    to override that with GIT_EXEC_PATH environment variable.
> 
> There might be other things, but you should be able to find them
> from what t/Makefile and t/test-lib.sh do.
> 
> Having said that, honestly I would not recommend "running
> without installing" unless you know what you are doing ;-).
> 

-- 

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Junio C Hamano @ 2006-04-20 21:31 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: git, Nicolas Pitre, Linus Torvalds
In-Reply-To: <20060420173131.GF31738@spearce.org>

Shawn Pearce <spearce@spearce.org> writes:

> I just spent some time bisecting this issue and it looks like the
> following change by Junio may be the culprit:
>
>   commit 1d6b38cc76c348e2477506ca9759fc241e3d0d46
>   Author: Junio C Hamano <junkio@cox.net>
>   Date:   Wed Feb 22 22:10:24 2006 -0800
>   
>       pack-objects: use full pathname to help hashing with "thin" pack.
>       
>       This uses the same hashing algorithm to the "preferred base
>       tree" objects and the incoming pathnames, to group the same
>       files from different revs together, while spreading files with
>       the same basename in different directories.
>       
>       Signed-off-by: Junio C Hamano <junkio@cox.net>
>   

Unfortunately, that is not the same hash we use in v1.3.0, so we
need to look elsewhere for interactions.

v1.2.3 hash was base-name only.  doc/Makefile and t/Makefile
were thrown in the same bin and sorted by size.  When the
history you are packing is deep, and doc/Makefile and t/Makefile
are not related at all, this made effective size of delta window
1/N where N is the number of such duplicates.

The one you found above uses a hash that is fully full-path.
The two are in completely different bins, and bins are totally
random.  This was not a good strategy.

v1.3.0 hash is base-name hash concatenated with leading-path
hash.  t/Makefile and doc/Makefile go in separate bins, but the
bins are close to each other; this avoids the problem in v1.2.3
when you have deep history, but at the same time if you do not
have many many versions of t/Makefile to overflow the delta
window, it gives t/Makefile a chance to delta with doc/Makefile.
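
As a rough illustration of the difference between the basename-only hash
and the combined one, here is a sketch (not the code in pack-objects.c).
The multiplier 11 and the back-to-front scan follow the description and
patch context in this message; the 12-bit width of the directory part and
every sketch_* / SKETCH_* name are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_DIRBITS 12	/* assumed width of the directory part */

/* Hash of the basename only, scanned from the last character backwards. */
static uint32_t sketch_basename_hash(const char *path)
{
	const char *n;
	uint32_t hash = 0;

	assert(path != NULL);
	n = path + strlen(path);
	while (path < n && n[-1] != '/')
		hash = hash * 11 + (unsigned char)*--n;
	return hash;
}

/* Basename hash in the high bits, leading-path hash in the low bits. */
static uint32_t sketch_full_hash(const char *path)
{
	const char *n = path + strlen(path);
	uint32_t name = sketch_basename_hash(path);
	uint32_t dir = 0;

	/* skip the basename, then hash the leading path including its '/' */
	while (path < n && n[-1] != '/')
		n--;
	while (path < n)
		dir = dir * 11 + (unsigned char)*--n;
	return (name << SKETCH_DIRBITS) |
	       (dir & ((1u << SKETCH_DIRBITS) - 1));
}
```

With sketch_basename_hash(), t/Makefile and doc/Makefile land in the same
bin. With sketch_full_hash() they get different values that agree in
every bit above SKETCH_DIRBITS, so sorting by hash keeps them next to
each other without mixing them.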

The earlier observation by Linus on reverting eeef7135 is
consistent with it; that commit was the one that introduced
v1.3.0 hash.

You could try this patch to resurrect the hash used in v1.2.3,
and you may get better packing for your particular repository;
but I am not sure if it gives better results in the general
case.  I am running the test myself now while waiting for my
day-job database to load X-<.

NOTE NOTE NOTE.  The hash in v1.2.3 was done with the basename
(relying on rev-list --objects to only show the basename) and
hashed from front to back.  The current one uses the same hash
scrambling function, but it hashes from back to front, and it
knows rev-list --objects gives it a full path.

What this patch does is to stop the hashing after we are done
with the basename part.  So it still gives a different hash value
to the same path than the v1.2.3 version did, but the distribution
should be equivalent.

NOTE 2.  Feeding output from the current "rev-list --objects" to
v1.2.3 pack-objects is the same as "hash full path and spread
things out" intermediate version, which is the worst performer.

-- >8 --
git diff
diff --git a/pack-objects.c b/pack-objects.c
index 09f4f2c..e58e169 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -492,6 +492,8 @@ static unsigned name_hash(struct name_pa
 		name_hash = hash;
 		hash = 0;
 	}
+	return name_hash;
+
 	for (p = path; p; p = p->up) {
 		hash = hash * 11 + '/';
 		n = p->elem + p->len;

* [PATCH] fix pack-object buffer size
From: Nicolas Pitre @ 2006-04-20 21:25 UTC (permalink / raw)
  To: git

The input line has 40 _chars_ of sha1 and not 20 _bytes_. It should also 
account for the space before the pathname, and the terminating \n and \0.

Signed-off-by: Nicolas Pitre <nico@cam.org>
---

I doubt anyone has ever used a repository with paths long enough to hit 
the limit, but better make it right nevertheless.

diff --git a/pack-objects.c b/pack-objects.c
index 09f4f2c..3c2767b 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1231,7 +1231,7 @@ static void setup_progress_signal(void)
 int main(int argc, char **argv)
 {
 	SHA_CTX ctx;
-	char line[PATH_MAX + 20];
+	char line[40 + 1 + PATH_MAX + 2];
 	int window = 10, depth = 10, pack_to_stdout = 0;
 	struct object_entry **list;
 	int num_preferred_base = 0;
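
The arithmetic behind the new size can be spelled out as a small sketch.
It is not the parsing code in pack-objects.c: the macro names and
sketch_split_object_line() are hypothetical, the PATH_MAX fallback is an
assumption for systems that do not define it, and the sketch only accepts
lines that carry a path.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096	/* fallback for systems that do not define it */
#endif

#define HEX_SHA1_LEN 40
/* 40 hex chars + one space + path + "\n" + terminating NUL */
#define OBJECT_LINE_MAX (HEX_SHA1_LEN + 1 + PATH_MAX + 2)

/*
 * Split "<40 hex chars> <path>\n" in place.  Returns the path, or NULL
 * if the line is too short or lacks the separating space.  The hex name
 * stays at the start of the buffer, NUL-terminated.
 */
static char *sketch_split_object_line(char *line)
{
	size_t len;

	assert(line != NULL);
	len = strlen(line);
	if (len && line[len - 1] == '\n')
		line[--len] = '\0';
	if (len < HEX_SHA1_LEN + 1 || line[HEX_SHA1_LEN] != ' ')
		return NULL;
	line[HEX_SHA1_LEN] = '\0';
	return line + HEX_SHA1_LEN + 1;
}
```

A buffer declared as char line[OBJECT_LINE_MAX] then has room for the
longest line the format allows, which PATH_MAX + 20 did not.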

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Nicolas Pitre @ 2006-04-20 21:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v8xq0yteb.fsf@assigned-by-dhcp.cox.net>

On Thu, 20 Apr 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > On Thu, 20 Apr 2006, Shawn Pearce wrote:
> >
> >> The more that I think about it the more it seems possible that the
> >> pathname hashing is what may be causing the problem.  Not only did
> >> bisect point to 1d6b38cc76c348e2477506ca9759fc241e3d0d46 but the
> >> directory which contains the bulk of the space has many files with
> >> the same name located in different directories:
> > [...]
> >
> > But the bad commit according to your bisection talks about "thin" packs 
> > which are not involved in your case.  So something looks fishy with that 
> > commit which should not have touched path hashing in the non-thin pack 
> > case...  I think...
> 
> I think this explains it.  The new code hashes full-path, but
> places bins for the paths with the same basename next to each
> other, so before Makefile and doc/Makefile and t/Makefile were
> all in the same bin, but now they are in three different bins
> next to each other.

That is fine.  In fact I did try with a tweaked name_hash() that 
completely ignored all directory components and the resulting pack was 
even bigger, much bigger, when repacking Shawn's repo.

> I originally thought, with one single notable exception of
> Makefile, having the identically named file in many different
> directories is not common nor sane,

I'd tend to disagree with that but...

> and the new code favors to
> delta with the exact same path for deeper history over wasting
> delta window for making delta with objects with the same name in
> different places in more recent history.  I think I benched this
> with kernel repository (git.git was too small for that).

This is obviously fine.  And if a file in a given directory has few 
revisions then the delta window will consider objects for a file with 
the same name in other directories as well, which is also sensible.  So 
if files of the same name are located in different directories they 
should delta well against each other if they're similar enough.  This 
should cover Shawn's repo layout.

> But I suspect we have a built-in "we sort bigger to smaller, and
> we cut off when we switch bins" somewhere in find_delta() loop,
> which I do not recall touching when I did that change, so that
> may be interfering and preventing 0-11-AdjLite.deg from all over
> the place to delta against each other.

I just cannot find something that would do that in the code.  When 
--no-reuse-delta is specified, the only things that will break the loop
in find_delta() is when try_delta() returns -1, and that happens only 
when changing object type or when the size difference is too big, but 
nothing looks at the name hash.

It is also hard to correlate it with commit 1d6b38cc which is the one 
that introduced the regression.


Nicolas

* Re: cg-clone produces "___" file and no working tree
From: Petr Baudis @ 2006-04-20 20:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vk69kxabp.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Thu, Apr 20, 2006 at 10:26:34PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Petr Baudis <pasky@suse.cz> writes:
> 
> > Duh, but shouldn't git-init-db create .git/info at any rate, even when
> > no templates are installed?
> 
> I do not think so.  We tend to lazily create necessary
> directories under .git/ these days, and absolute minimum git
> should not need an empty .git/info directory.
> 
> If there is something that creates files in .git/info without
> making sure that leading path exists, we should fix it (maybe
> update-server-info forgets it?  I haven't checked).

Aww. cg-clone assumed that .git/info is a canonical part of a git repository
now and that git-init-db will always create it, but now it seems to be the
case only for .git/objects/info.

I've fixed cg-clone. Thanks for the info.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: cg-clone produces "___" file and no working tree
From: Junio C Hamano @ 2006-04-20 20:26 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20060420201915.GF27689@pasky.or.cz>

Petr Baudis <pasky@suse.cz> writes:

> Duh, but shouldn't git-init-db create .git/info at any rate, even when
> no templates are installed?

I do not think so.  We tend to lazily create necessary
directories under .git/ these days, and absolute minimum git
should not need an empty .git/info directory.

If there is something that creates files in .git/info without
making sure that leading path exists, we should fix it (maybe
update-server-info forgets it?  I haven't checked).

* Re: cg-clone produces "___" file and no working tree
From: Junio C Hamano @ 2006-04-20 20:23 UTC (permalink / raw)
  To: Zack Brown; +Cc: git
In-Reply-To: <7vslo8xaql.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> Zack Brown <zbrown@tumblerings.org> writes:
>
>> Not true. I went into the git source directory, and ran "make". Nothing more.
>
> Ah, I misunderstood.  You are trying to run it _without_
> installing it.
>
> Well, then probably you do not have templates installed
> anywhere, especially not where git-init-db expects them to be
> found.

(sorry for the short message sent unfinished by mistake).

Running things without installing is somewhat tricky, but test
framework needs to do that, so there are some things you would
need to do.

 - "git init-db" takes --template argument; in the source area
   before installing, they are built in templates/blt/.

 - "git" and programs that need to invoke other git programs
   (e.g. git-send-pack) expects things to be found in gitexecdir
   you set when you build.  If you are not installing, you need
   to override that with GIT_EXEC_PATH environment variable.

There might be other things, but you should be able to find them
from what t/Makefile and t/test-lib.sh do.

Having said that, honestly I would not recommend "running
without installing" unless you know what you are doing ;-).

* Re: cg-clone produces "___" file and no working tree
From: Petr Baudis @ 2006-04-20 20:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Zack Brown, git
In-Reply-To: <7vslo8xaql.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Thu, Apr 20, 2006 at 10:17:38PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Zack Brown <zbrown@tumblerings.org> writes:
> 
> > Not true. I went into the git source directory, and ran "make". Nothing more.
> 
> Ah, I misunderstood.  You are trying to run it _without_
> installing it.
> 
> Well, then probably you do not have templates installed
> anywhere, especially not where git-init-db expects them to be
> found.

Duh, but shouldn't git-init-db create .git/info at any rate, even when
no templates are installed?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: cg-clone produces "___" file and no working tree
From: Junio C Hamano @ 2006-04-20 20:17 UTC (permalink / raw)
  To: Zack Brown; +Cc: git
In-Reply-To: <20060420200849.GA3653@tumblerings.org>

Zack Brown <zbrown@tumblerings.org> writes:

> Not true. I went into the git source directory, and ran "make". Nothing more.

Ah, I misunderstood.  You are trying to run it _without_
installing it.

Well, then probably you do not have templates installed
anywhere, especially not where git-init-db expects them to be
found.

* Re: cg-clone produces "___" file and no working tree
From: Zack Brown @ 2006-04-20 20:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vejzsywrq.fsf@assigned-by-dhcp.cox.net>

On Thu, Apr 20, 2006 at 10:36:25AM -0700, Junio C Hamano wrote:
> Zack Brown <zbrown@tumblerings.org> writes:
> 
> > I just downloaded the latest versions of git and cogito from kernel.org:
> > cogito-0.17.2 and git-1.3.0; put their directories in my path, and ran "make" on
> > both of them. There's no other version in my path.
> 
> Earlier, you were having this symptom:
> 
> >> What do these command say?
> >> 
> >> 	$ git --exec-path
> >> 	$ ls -l "`git --exec-path`/git-clone"
> >
> > 22:07:05 [zbrown] ~$ git --exec-path
> > /home/zbrown/bin
> > 07:10:34 [zbrown] ~$ ls -l "`git --exec-path`/git-clone"
> > ls: /home/zbrown/bin/git-clone: No such file or directory
> >
> > Does that mean it's looking in /home/zbrown/bin for the git binaries?
> 
> If that is the case, you did not just (quote) "and ran "make"".
> 
> You must have run "make frotz=xyzzy target", but you did not mention
> what frotz, xyzzy and target were.

Not true. I went into the git source directory, and ran "make". Nothing more.

I've been doing that for a long time, whenever I sync with the repository. I
didn't know the installation instructions had changed.

> It probably would help if you did this:
> 
> 	make clean
> 	make bindir=$HOME/git/git gitexecdir=$HOME/git/git/
> 	make bindir=$HOME/git/git gitexecdir=$HOME/git/git/ install

OK, I did this. The first 2 commands worked fine. The third complained of
duplicate files, and exited with an error. Maybe because the source tree is also
$HOME/git/git

I then did a 'cd ..; mkdir tmp; cd tmp; git-init-db' as before, but there
is still no ".git/info" entry created.

Be well,
Zack

> 
> As I said in a previous message, the first paragraph in INSTALL
> file explains this.
> 

-- 
Zack Brown

* Re: n-heads and patch dependency chains
From: Junio C Hamano @ 2006-04-20 18:55 UTC (permalink / raw)
  To: git
In-Reply-To: <1145556505.5314.149.camel@cashmere.sps.mot.com>

Jon Loeliger <jdl@freescale.com> writes:

> On Tue, 2006-04-04 at 06:47, Andreas Ericsson wrote:
>
>> No, I mean that this would commit both to the testing branch (being the 
>> result of several merged topic-branches) and to the topic-branch merged 
>> in. Commit as in regular commit, with a commit-message and a patch. The 
>> resulting repository would be the exact same as if the change was 
>> committed only to the topic-branch and then cherry-picked on to the 
>> testing-branch.

To be consistent, I think the result should be "as if the change
was committed only to the topic-branch and then the topic-branch
was *merged* into the testing-branch", since you start your
testing branch as "being the result of several merged topic-branches".

I do that (manually) all the time, with:

	$ git checkout next
	$ hack hack hack

	$ git checkout -m one/topic
	$ git commit -o this-path that-path
	$ git checkout next
	$ git pull . one/topic

Giving a short-hand for the last four-command sequence would
certainly be nice.
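
One way the last four commands might be wrapped, as a sketch only.
commit_to_topic is a hypothetical name and not an existing git command,
the integration branch is hard-coded to "next", and the GIT variable is an
addition so the sequence can be dry-run with GIT=echo.

```shell
#!/bin/sh
# GIT may be overridden (for example GIT=echo) to print the commands
# instead of running them. This dry-run hook is an illustrative addition.
: "${GIT:=git}"

# commit_to_topic TOPIC PATH...
# Hypothetical helper: carry the uncommitted changes from "next" over to
# TOPIC, commit only the named paths there, then go back to "next" and
# merge TOPIC in. Stops at the first command that fails.
commit_to_topic() {
    if [ $# -lt 2 ]; then
        echo "usage: commit_to_topic TOPIC PATH..." >&2
        return 2
    fi
    topic=$1
    shift
    $GIT checkout -m "$topic" &&
    $GIT commit -o "$@" &&
    $GIT checkout next &&
    $GIT pull . "$topic"
}
```

For the example above this would be called as
"commit_to_topic one/topic this-path that-path". Chaining with && leaves
you on whichever branch the failing step stopped at, so a conflict from
"checkout -m" can still be resolved by hand.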

> I am your number one fan!  If I finish reading these 600+
> messages, will I find out you have already implemented it,
> it's committed, and you just need me to test it now? :-)

Likewise... ;-)

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Junio C Hamano @ 2006-04-20 18:49 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604201414490.2215@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> On Thu, 20 Apr 2006, Shawn Pearce wrote:
>
>> The more that I think about it the more it seems possible that the
>> pathname hashing is what may be causing the problem.  Not only did
>> bisect point to 1d6b38cc76c348e2477506ca9759fc241e3d0d46 but the
>> directory which contains the bulk of the space has many files with
>> the same name located in different directories:
> [...]
>
> But the bad commit according to your bisection talks about "thin" packs 
> which are not involved in your case.  So something looks fishy with that 
> commit which should not have touched path hashing in the non-thin pack 
> case...  I think...

I think this explains it.  The new code hashes full-path, but
places bins for the paths with the same basename next to each
other, so before Makefile and doc/Makefile and t/Makefile were
all in the same bin, but now they are in three different bins
next to each other.

I originally thought, with one single notable exception of
Makefile, having the identically named file in many different
directories is not common nor sane, and the new code prefers to
delta against the exact same path from deeper history over wasting
the delta window on deltas against objects with the same name in
different places in more recent history.  I think I benched this
with the kernel repository (git.git was too small for that).

But I suspect we have a built-in "we sort bigger to smaller, and
we cut off when we switch bins" somewhere in the find_delta() loop,
which I do not recall touching when I did that change, so that
may be interfering and preventing the 0-11-AdjLite.deg files from
all over the place from being deltified against each other.

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Nicolas Pitre @ 2006-04-20 18:24 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <20060420175554.GH31738@spearce.org>

On Thu, 20 Apr 2006, Shawn Pearce wrote:

> The more that I think about it the more it seems possible that the
> pathname hashing is what may be causing the problem.  Not only did
> bisect point to 1d6b38cc76c348e2477506ca9759fc241e3d0d46 but the
> directory which contains the bulk of the space has many files with
> the same name located in different directories:
[...]

But the bad commit according to your bisection talks about "thin" packs 
which are not involved in your case.  So something looks fishy with that 
commit which should not have touched path hashing in the non-thin pack 
case...  I think...


Nicolas

^ permalink raw reply

* Re: n-heads and patch dependency chains
From: Jon Loeliger @ 2006-04-20 18:08 UTC (permalink / raw)
  To: Andreas Ericsson; +Cc: Jakub Narebski, Git List
In-Reply-To: <44325CDB.2000101@op5.se>

On Tue, 2006-04-04 at 06:47, Andreas Ericsson wrote:

> No, I mean that this would commit both to the testing branch (being the 
> result of several merged topic-branches) and to the topic-branch merged 
> in. Commit as in regular commit, with a commit-message and a patch. The 
> resulting repository would be the exact same as if the change was 
> committed only to the topic-branch and then cherry-picked on to the 
> testing-branch.

I am your number one fan!  If I finish reading these 600+
messages, will I find out you have already implemented it,
it's committed, and you just need me to test it now? :-)

jdl

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Shawn Pearce @ 2006-04-20 17:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List, Nicolas Pitre
In-Reply-To: <Pine.LNX.4.64.0604200954440.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> wrote:
> Ok, so that wasn't it, and the new sort order is superior.
> 
> That means that it probably _is_ the delta changes themselves (probably 
> commit c13c6bf7 "diff-delta: bound hash list length to avoid O(m*n) 
> behavior". You can try
> 
> 	git revert c13c6bf7

No effect.
 
> to see if that's it. Although Nico already showed interest, and if you 
> make the archive available to him, he's sure to figure it out.

I sent the URL privately to Nico as I did not want the repository
to be publicly available before next Tuesday.

> You can try "--depth=50" (slogan: more "hot delta on delta action"), but 
> it's looking less and less like a delta selection issue, and more and more 
> like the deltas themselves are deproved.

No effect at either 50 or 100.
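
For anyone wanting to repeat the depth comparison, a rough sketch follows.
pack_bytes and try_depth are made-up helper names, the pack directory is
assumed to be .git/objects/pack, and the repack flags are the ones current
git documents (-a -d -f --depth), which may not match every older release.
Best run on a scratch clone, since -d removes the old packs.

```shell
#!/bin/sh
# pack_bytes DIR: print the total size in bytes of all *.pack files in
# DIR (prints 0 if there are none). Hypothetical helper.
pack_bytes() {
    total=0
    for p in "$1"/*.pack; do
        [ -f "$p" ] || continue
        size=$(wc -c < "$p")
        total=$((total + size))
    done
    echo "$total"
}

# try_depth DEPTH: repack everything from scratch at the given delta
# depth and report the resulting pack size. Run inside a repository.
try_depth() {
    git repack -a -d -f --depth="$1" >/dev/null &&
    echo "depth=$1: $(pack_bytes .git/objects/pack) bytes"
}
```

Running "try_depth 10; try_depth 50; try_depth 100" under each git version
gives directly comparable numbers.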

The more that I think about it the more it seems possible that the
pathname hashing is what may be causing the problem.  Not only did
bisect point to 1d6b38cc76c348e2477506ca9759fc241e3d0d46 but the
directory which contains the bulk of the space has many files with
the same name located in different directories:

	results/MT/Math/10000/0-11-AdjLite.deg
	results/MT/Math/10000/0-12-AdjLite.deg
	...
	results/MT/Math/30000/2-11-AdjLite.deg
	results/MT/Math/30000/2-12-AdjLite.deg
	...
	results/Rand48/Math/10000/2-11-AdjLite.deg
	results/Rand48/Math/10000/2-12-AdjLite.deg
	...
	results/Rand48/Math/30000/2-11-AdjLite.deg
	results/Rand48/Math/30000/2-12-AdjLite.deg
	...

For example the name '0-11-AdjLite.deg' occurs in 63 directories and
none of those occurrences are likely to delta against one another
very well.  Also most of these files only have 1 or 2 revisions,
so there is very little per-file history.
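
A count like the "63 directories" figure can be obtained along these
lines. This is a sketch: basename_counts is a made-up helper, and it
assumes file names without embedded whitespace or newlines.

```shell
#!/bin/sh
# basename_counts: read one path per line on stdin and print
# "<count> <basename>" for each distinct basename, most frequent first.
# Hypothetical helper; names containing whitespace are not handled.
basename_counts() {
    sed 's|.*/||' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

For example, "git ls-files | basename_counts | head" lists the basenames
that recur in the most directories of the current tree.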

-- 
Shawn.

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Nicolas Pitre @ 2006-04-20 17:54 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <20060420173131.GF31738@spearce.org>

On Thu, 20 Apr 2006, Shawn Pearce wrote:

> I just spent some time bisecting this issue and it looks like the
> following change by Junio may be the culprit:
> 
>   commit 1d6b38cc76c348e2477506ca9759fc241e3d0d46
>   Author: Junio C Hamano <junkio@cox.net>
>   Date:   Wed Feb 22 22:10:24 2006 -0800
>   
>       pack-objects: use full pathname to help hashing with "thin" pack.
>       
>       This uses the same hashing algorithm to the "preferred base
>       tree" objects and the incoming pathnames, to group the same
>       files from different revs together, while spreading files with
>       the same basename in different directories.
>       
>       Signed-off-by: Junio C Hamano <junkio@cox.net>
>   
>   :100644 100644 af3bdf5d358b8a47ed23bcb7e9721e956eb59d60 3a16b7e4ce25ec05c64817dfd92dd9d517ab9dd3 M      pack-objects.c

Hmmm... This one is for Junio to fix I'd say.  Not sure what it does.


Nicolas

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Nicolas Pitre @ 2006-04-20 17:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Shawn Pearce, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0604200954440.3701@g5.osdl.org>

On Thu, 20 Apr 2006, Linus Torvalds wrote:

> That means that it probably _is_ the delta changes themselves (probably 
> commit c13c6bf7 "diff-delta: bound hash list length to avoid O(m*n) 
> behavior". You can try
> 
> 	git revert c13c6bf7
> 
> to see if that's it. Although Nico already showed interest, and if you 
> make the archive available to him, he's sure to figure it out.

It is not that.  With that code disabled there is still a 2x pack size.

Substituting diff-delta.c from the version in 1.2.3 doesn't solve the 
issue either.


Nicolas

^ permalink raw reply

