From: Nicolas Pitre <nico@cam.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: 1.3.0 creating bigger packs than 1.2.3
Date: Thu, 20 Apr 2006 17:02:25 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.0604201630320.2215@localhost.localdomain>
In-Reply-To: <7v8xq0yteb.fsf@assigned-by-dhcp.cox.net>
On Thu, 20 Apr 2006, Junio C Hamano wrote:
> Nicolas Pitre <nico@cam.org> writes:
>
> > On Thu, 20 Apr 2006, Shawn Pearce wrote:
> >
> >> The more that I think about it the more it seems possible that the
> >> pathname hashing is what may be causing the problem. Not only did
> >> bisect point to 1d6b38cc76c348e2477506ca9759fc241e3d0d46 but the
> >> directory which contains the bulk of the space has many files with
> >> the same name located in different directories:
> > [...]
> >
> > But the bad commit according to your bisection talks about "thin" packs
> > which are not involved in your case. So something looks fishy with that
> > commit which should not have touched path hashing in the non-thin pack
> > case... I think...
>
> I think this explains it. The new code hashes full-path, but
> places bins for the paths with the same basename next to each
> other, so before Makefile and doc/Makefile and t/Makefile were
> all in the same bin, but now they are in three different bins
> next to each other.
That is fine. In fact I did try with a tweaked name_hash() that
completely ignored all directory components and the resulting pack was
even bigger, much bigger, when repacking Shawn's repo.
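
To illustrate what "ignoring all directory components" means here, the
following is a minimal sketch in Python, not the actual name_hash() from
git. The function name basename_key() and the multiplier 11 are
assumptions made only for this example. It shows that a hash computed
from the basename alone puts Makefile, doc/Makefile and t/Makefile in a
single bin:

```python
import posixpath


def basename_key(path: str) -> int:
    """Hypothetical hash that ignores every directory component of path."""
    h = 0
    for ch in posixpath.basename(path):
        # Simple polynomial hash, truncated to 32 bits (illustrative only).
        h = (h * 11 + ord(ch)) & 0xFFFFFFFF
    return h


paths = ["Makefile", "doc/Makefile", "t/Makefile", "doc/README"]

# Group paths by their basename-only hash value.
bins = {}
for p in paths:
    bins.setdefault(basename_key(p), []).append(p)

for key, members in bins.items():
    print(key, members)
```

With such a hash every same-named file competes for the same delta
window regardless of its directory, which is the variant that produced
the bigger pack in my test.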
> I originally thought, with one single notable exception of
> Makefile, having the identically named file in many different
> directories is not common nor sane,
I'd tend to disagree with that but...
> and the new code favors to
> delta with the exact same path for deeper history over wasting
> delta window for making delta with objects with the same name in
> different places in more recent history. I think I benched this
> with kernel repository (git.git was too small for that).
This is obviously fine. And if a file in a given directory has few
revisions then the delta window will consider objects for a file with
the same name in other directories as well, which is also sensible. So
if files of the same name are located in different directories they
should delta well against each other if they're similar enough. This
should cover Shawn's repo layout.
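
The window behaviour described above can be modelled roughly as follows.
This is a simplified Python sketch under stated assumptions, not git's
code: sort_key() is a hypothetical stand-in for a hash that keeps bins
with the same basename next to each other, and the window size of 3 is
arbitrary.

```python
import posixpath


def sort_key(path: str):
    # Stand-in for a hash ordering: same basename first, then directory,
    # so bins for identically named files end up adjacent.
    return (posixpath.basename(path), posixpath.dirname(path))


def window_before(objects, index, window=3):
    """Return the candidates that precede objects[index] in the window."""
    return objects[max(0, index - window):index]


# Two revisions of Makefile, one each of doc/Makefile and t/Makefile,
# and four revisions of README (each list entry stands for one object).
objects = sorted(
    ["Makefile"] * 2 + ["doc/Makefile"] + ["t/Makefile"] + ["README"] * 4,
    key=sort_key,
)

few_revisions = window_before(objects, objects.index("t/Makefile"))
many_revisions = window_before(objects, len(objects) - 1)
print(few_revisions)
print(many_revisions)
```

In this model the window for t/Makefile, which has a single revision,
is filled with Makefile objects from other directories, while the window
for the last README revision contains only README objects.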
> But I suspect we have a built-in "we sort bigger to smaller, and
> we cut off when we switch bins" somewhere in find_delta() loop,
> which I do not recall touching when I did that change, so that
> may be interfering and preventing 0-11-AdjLite.deg from all over
> the place to delta against each other.
I just cannot find anything in the code that would do that. When
--no-reuse-delta is specified, the only thing that will break the loop
in find_delta() is try_delta() returning -1, and that happens only
when the object type changes or when the size difference is too big.
Nothing there looks at the name hash.
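
The control flow I am describing can be sketched like this. It is a
simplified Python model, not the C code from pack-objects: the Entry
class, the candidates_tried() helper and the MAX_SIZE_RATIO threshold
are assumptions made for the example. The point is only that the loop
stops on a type change or an excessive size difference, and that the
name hash plays no part in that decision.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    type: str
    size: int
    name_hash: int  # carried along, but never consulted below


MAX_SIZE_RATIO = 8  # hypothetical threshold, chosen for illustration


def try_delta(cur: Entry, old: Entry) -> int:
    """Return -1 when the window scan should stop, 0 otherwise."""
    if cur.type != old.type:
        return -1
    if abs(cur.size - old.size) > cur.size // MAX_SIZE_RATIO:
        return -1
    return 0


def candidates_tried(cur: Entry, window: list) -> list:
    """Collect the window entries examined before the loop breaks."""
    tried = []
    for old in window:
        if try_delta(cur, old) < 0:
            break
        tried.append(old)
    return tried


cur = Entry("blob", 1000, name_hash=1)
window = [
    Entry("blob", 990, name_hash=2),  # different hash, still tried
    Entry("blob", 980, name_hash=3),  # different hash, still tried
    Entry("tree", 970, name_hash=1),  # type change: loop breaks here
    Entry("blob", 960, name_hash=1),  # never reached
]
tried = candidates_tried(cur, window)
print([e.size for e in tried])
```

In this model, entries with a different name hash are still tried, and
an entry with the same hash after the break is never reached.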
It is also hard to correlate this with commit 1d6b38cc, which is the one
that introduced the regression.
Nicolas