git.vger.kernel.org archive mirror
* Historical kernel repository size
@ 2006-09-14 14:22 Petr Baudis
  2006-09-14 14:38 ` tglx
  2006-09-14 15:31 ` Linus Torvalds
  0 siblings, 2 replies; 16+ messages in thread
From: Petr Baudis @ 2006-09-14 14:22 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: git

  Hi,

  just to test the packing improvements we had achieved over the last
year, I have repacked the historical kernel repository and achieved a
significant improvement:

xpasky@machine[0:0]~/hi/history$ git-repack -a -f
Generating pack...
Done counting 566638 objects.
Deltifying 566638 objects.
 100% (566638/566638) done
Writing 566638 objects.
 100% (566638/566638) done
Total 566638, written 566638 (delta 456212), reused 98435 (delta 0)
Pack pack-4d27038611fe7755938efd4a2745d5d5d35de1c1 created.
xpasky@machine[0:0]~/hi/history$ l .git/objects/pack/
total 476264
-rw-r--r-- 1 xpasky users  13600376 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
-rw-r--r-- 1 xpasky users 197168186 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
-rw-r--r-- 1 xpasky users  13600376 Sep 14 12:18 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.idx
-rw-r--r-- 1 xpasky users 262818936 Sep 14 12:29 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.pack

  Since it's a nice place for people to check about how efficient we are
with compressing the repository, perhaps it would be a good idea to
repack the historical repository on kernel.org?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Snow falling on Perl. White noise covering line noise.
Hides all the bugs too. -- J. Putnam

* Re: Historical kernel repository size
  2006-09-14 14:22 Historical kernel repository size Petr Baudis
@ 2006-09-14 14:38 ` tglx
  2006-09-14 15:38   ` Andy Whitcroft
  2006-09-14 15:31 ` Linus Torvalds
  1 sibling, 1 reply; 16+ messages in thread
From: tglx @ 2006-09-14 14:38 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr,

>   just to test the packing improvements we had achieved over the last
> year, I have repacked the historical kernel repository and achieved a
> significant improvement:
> ....
>   Since it's a nice place for people to check about how efficient we are
> with compressing the repository, perhaps it would be a good idea to
> repack the historical repository on kernel.org?

I'll do it once I'm back home.

    tglx

* Re: Historical kernel repository size
  2006-09-14 14:22 Historical kernel repository size Petr Baudis
  2006-09-14 14:38 ` tglx
@ 2006-09-14 15:31 ` Linus Torvalds
  2006-09-14 21:23   ` Nicolas Pitre
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2006-09-14 15:31 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Thomas Gleixner, Git Mailing List



On Thu, 14 Sep 2006, Petr Baudis wrote:
> 
>   just to test the packing improvements we had achieved over the last
> year, I have repacked the historical kernel repository and achieved a
> significant improvement:

Umm.. Only apparently because the old pack was really really bad. It also 
has the wrong name, probably because it's using the original naming that 
had the SHA1 computed on the unsorted input. That was changed a long time 
ago.

Yours isn't wonderful either.

> -rw-r--r-- 1 xpasky users  13600376 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> -rw-r--r-- 1 xpasky users 197168186 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
> -rw-r--r-- 1 xpasky users  13600376 Sep 14 12:18 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.idx
> -rw-r--r-- 1 xpasky users 262818936 Sep 14 12:29 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.pack

Mine are:

-rw-r--r-- 1 torvalds torvalds  13600376 Apr 19 10:06 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
-rw-r--r-- 1 torvalds torvalds 185374386 Apr 19 10:06 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack

and as you can see from the date, they aren't exactly very recent, but 
they shave an additional 6% off the size.
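
The percentages can be checked directly from the byte counts in the
listings above. A minimal Python sketch (the helper name percent_saved is
made up for illustration and is not part of any Git tooling):

```python
def percent_saved(before: int, after: int) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


# Pack sizes in bytes, copied from the directory listings in this thread.
OLD_PACK = 262818936    # original history pack (pack-cc35...)
PETR_PACK = 197168186   # after "git-repack -a -f"
LINUS_PACK = 185374386  # the pack dated Apr 19

print(f"old  -> Petr:  {percent_saved(OLD_PACK, PETR_PACK):.1f}%")
print(f"Petr -> Linus: {percent_saved(PETR_PACK, LINUS_PACK):.1f}%")
```

The first step saves about a quarter of the original size; the second
works out to roughly 6%, in line with the figure quoted above.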

I agree that the _original_ history pack by Thomas seems to be bad, and
that's from Aug 9 2005, so it was likely made with some really really old
packing rules.

For better packing, I think I used a larger depth, ie try something like

	git repack -a -f --depth=50

to get more improvement. For a historical archive that you don't much use, 
doing the deeper depth is definitely worth it.

		Linus

* Re: Historical kernel repository size
  2006-09-14 14:38 ` tglx
@ 2006-09-14 15:38   ` Andy Whitcroft
  2006-09-14 15:52     ` Shawn Pearce
  2006-09-14 15:52     ` Petr Baudis
  0 siblings, 2 replies; 16+ messages in thread
From: Andy Whitcroft @ 2006-09-14 15:38 UTC (permalink / raw)
  To: tglx; +Cc: Petr Baudis, git

tglx@linutronix.de wrote:
> Petr,
> 
>>   just to test the packing improvements we had achieved over the last
>> year, I have repacked the historical kernel repository and achieved a
>> significant improvement:
>> ....
>>   Since it's a nice place for people to check about how efficient we are
>> with compressing the repository, perhaps it would be a good idea to
>> repack the historical repository on kernel.org?
> 
> I'll do it once I'm back home.

Is there any reason this isn't a live history, i.e. that we don't
constantly pull Linus' master branch into this history to make it a real
complete history?

Perhaps that isn't possible ... hmmm.  I guess it might only work if
Linus' repo was actually a grafted version of this history?

/me watches his head explode.

-apw

* Re: Historical kernel repository size
  2006-09-14 15:38   ` Andy Whitcroft
@ 2006-09-14 15:52     ` Shawn Pearce
  2006-09-14 15:52     ` Petr Baudis
  1 sibling, 0 replies; 16+ messages in thread
From: Shawn Pearce @ 2006-09-14 15:52 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: tglx, Petr Baudis, git

Andy Whitcroft <apw@shadowen.org> wrote:
> tglx@linutronix.de wrote:
> > Petr,
> > 
> >>   just to test the packing improvements we had achieved over the last
> >> year, I have repacked the historical kernel repository and achieved a
> >> significant improvement:
> >> ....
> >>   Since it's a nice place for people to check about how efficient we are
> >> with compressing the repository, perhaps it would be a good idea to
> >> repack the historical repository on kernel.org?
> > 
> > I'll do it once I'm back home.
> 
> Is there any reason this isn't a live history, i.e. that we don't
> constantly pull Linus' master branch into this history to make it a real
> complete history?
> 
> Perhaps that isn't possible ... hmmm.  I guess it might only work if
> Linus' repo was actually a grafted version of this history?

Right - the only way to join the two is to graft them together.

Since grafts are a purely local matter anyone can pull both into
the same repository and insert the correct grafts to get a complete
history.  You would just want to publish on the kernel.org website
the correct grafts file, so users don't have to figure it out on
their own.
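
For illustration: each line of a grafts file (.git/info/grafts) names a
commit ID followed by the parent IDs that commit should appear to have.
A minimal Python sketch that formats such a line. The function name and
the all-"a" / all-"b" IDs are placeholders, not real kernel commits:

```python
import re
from typing import Sequence

_SHA1_RE = re.compile(r"[0-9a-f]{40}")


def graft_line(commit: str, parents: Sequence[str]) -> str:
    """Format one grafts entry: '<commit> <parent> [<parent> ...]'."""
    for object_id in [commit, *parents]:
        if not _SHA1_RE.fullmatch(object_id):
            raise ValueError(f"not a full SHA-1 object name: {object_id!r}")
    return " ".join([commit, *parents])


# Placeholder IDs only: a real entry would name the first commit of the
# current kernel tree and the last commit of the historical tree.
first_commit_of_current_tree = "a" * 40
last_commit_of_history_tree = "b" * 40
print(graft_line(first_commit_of_current_tree, [last_commit_of_history_tree]))
```

The printed line would go into .git/info/grafts of a repository that has
fetched both histories.  Newer Git versions also provide
"git replace --graft" for the same purpose.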

Since I'm not a kernel developer I haven't even looked to see if
such a grafts file has been published.  :-)

-- 
Shawn.

* Re: Historical kernel repository size
  2006-09-14 15:38   ` Andy Whitcroft
  2006-09-14 15:52     ` Shawn Pearce
@ 2006-09-14 15:52     ` Petr Baudis
  1 sibling, 0 replies; 16+ messages in thread
From: Petr Baudis @ 2006-09-14 15:52 UTC (permalink / raw)
  To: Andy Whitcroft; +Cc: tglx, git

Dear diary, on Thu, Sep 14, 2006 at 05:38:29PM CEST, I got a letter
where Andy Whitcroft <apw@shadowen.org> said that...
> Is there any reason this isn't a live history, i.e. that we don't
> constantly pull Linus' master branch into this history to make it a real
> complete history?

Because in the early days of Git, things were evolving fast and it
would have been infeasible to drag this old history around in case of
format changes and stuff. Also, at that time the history was still very big
and it would have been impractical to require all the kernel developers to
grab all the BitKeeper history (it still kind of is).

> Perhaps that isn't possible ... hmmm.  I guess it might only work if
> Linus' repo was actually a grafted version of this history?
> 
> /me watches his head explode.

	http://lkml.org/lkml/2006/6/17/110

may be useful.

It wasn't accepted. Oh well, I may try to resubmit it soon. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Snow falling on Perl. White noise covering line noise.
Hides all the bugs too. -- J. Putnam

* Re: Historical kernel repository size
  2006-09-14 15:31 ` Linus Torvalds
@ 2006-09-14 21:23   ` Nicolas Pitre
  2006-09-14 21:32     ` Thomas Gleixner
  2006-09-14 21:37     ` Thomas Gleixner
  0 siblings, 2 replies; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-14 21:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Thomas Gleixner, Git Mailing List

On Thu, 14 Sep 2006, Linus Torvalds wrote:

> For better packing, I think I used a larger depth, ie try something like
> 
> 	git repack -a -f --depth=50
> 
> to get more improvement. For a historical archive that you don't much use, 
> doing the deeper depth is definitely worth it.

Using a larger window helps too.  It of course has a direct impact on
the processing time needed for a full repack, but it has no runtime cost
when the pack is used.  So I'd suggest adding --window=50 to the above.

[ I made those suggestions in person to Thomas at OLS to which 
  he replied he'd do it when he'd get back home.   ;-) ]


Nicolas

* Re: Historical kernel repository size
  2006-09-14 21:23   ` Nicolas Pitre
@ 2006-09-14 21:32     ` Thomas Gleixner
  2006-09-14 21:37     ` Thomas Gleixner
  1 sibling, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 21:32 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

On Thu, 2006-09-14 at 17:23 -0400, Nicolas Pitre wrote:
> On Thu, 14 Sep 2006, Linus Torvalds wrote:
> 
> > For better packing, I think I used a larger depth, ie try something like
> > 
> > 	git repack -a -f --depth=50
> > 
> > to get more improvement. For a historical archive that you don't much use, 
> > doing the deeper depth is definitely worth it.
> 
> Using a larger window helps too.  It of course has a direct impact on
> the processing time needed for a full repack, but it has no runtime cost
> when the pack is used.  So I'd suggest adding --window=50 to the above.
> 
> [ I made those suggestions in person to Thomas at OLS to which 
>   he replied he'd do it when he'd get back home.   ;-) ]

Thanks for the reminder. I actually logged into kernel.org already :)

	tglx

* Re: Historical kernel repository size
  2006-09-14 21:23   ` Nicolas Pitre
  2006-09-14 21:32     ` Thomas Gleixner
@ 2006-09-14 21:37     ` Thomas Gleixner
  2006-09-14 21:42       ` Nicolas Pitre
  1 sibling, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 21:37 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

On Thu, 2006-09-14 at 17:23 -0400, Nicolas Pitre wrote:
> On Thu, 14 Sep 2006, Linus Torvalds wrote:
> 
> > For better packing, I think I used a larger depth, ie try something like
> > 
> > 	git repack -a -f --depth=50
> > 
> when the pack is used.  So I'd suggest adding --window=50 to the above.

Great advice!

git repack neither accepts --depth nor --window

	tglx

* Re: Historical kernel repository size
  2006-09-14 21:37     ` Thomas Gleixner
@ 2006-09-14 21:42       ` Nicolas Pitre
  2006-09-14 21:54         ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-14 21:42 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

On Thu, 14 Sep 2006, Thomas Gleixner wrote:

> On Thu, 2006-09-14 at 17:23 -0400, Nicolas Pitre wrote:
> > On Thu, 14 Sep 2006, Linus Torvalds wrote:
> > 
> > > For better packing, I think I used a larger depth, ie try something like
> > > 
> > > 	git repack -a -f --depth=50
> > > 
> > when the pack is used.  So I'd suggest adding --window=50 to the above.
> 
> Great advice!
> 
> git repack neither accepts --depth nor --window

Is the GIT version on kernel.org _that_ old?

What a shame...


Nicolas

* Re: Historical kernel repository size
  2006-09-14 21:42       ` Nicolas Pitre
@ 2006-09-14 21:54         ` Thomas Gleixner
  2006-09-14 22:24           ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 21:54 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

On Thu, 2006-09-14 at 17:42 -0400, Nicolas Pitre wrote:
> > git repack neither accepts --depth nor --window
> 
> Is the GIT version on kernel.org _that_ old?
> 
> What a shame...

[tglx@hera history.git]$ git --version
git version 1.4.2.1

	tglx

* Re: Historical kernel repository size
  2006-09-14 21:54         ` Thomas Gleixner
@ 2006-09-14 22:24           ` Thomas Gleixner
  2006-09-14 23:15             ` Junio C Hamano
  2006-09-15  1:19             ` Nicolas Pitre
  0 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 22:24 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

On Thu, 2006-09-14 at 23:54 +0200, Thomas Gleixner wrote:
> On Thu, 2006-09-14 at 17:42 -0400, Nicolas Pitre wrote:
> > > git repack neither accepts --depth nor --window
> > 
> > Is the GIT version on kernel.org _that_ old?
> > 
> > What a shame...
> 
> [tglx@hera history.git]$ git --version
> git version 1.4.2.1

I know I'm stupid

"git-repack --window=50 --depth=50 -a -f" works
"git-repack -a -f --window=50 --depth=50" does not

Intuitive user interfaces are my favorite pitfalls.

-rw-rw-r-- 1 tglx ftpadmin  13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
-rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack

	tglx

* Re: Historical kernel repository size
  2006-09-14 22:24           ` Thomas Gleixner
@ 2006-09-14 23:15             ` Junio C Hamano
  2006-09-15  1:19             ` Nicolas Pitre
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2006-09-14 23:15 UTC (permalink / raw)
  To: tglx; +Cc: git

Thomas Gleixner <tglx@linutronix.de> writes:

>> [tglx@hera history.git]$ git --version
>> git version 1.4.2.1
>
> I know I'm stupid
>
> "git-repack --window=50 --depth=50 -a -f" works
> "git-repack -a -f --window=50 --depth=50" does not
>
> Intuitive user interfaces are my favorite pitfalls.

Whaaaat?

I've run them under "sh -x" and both result in a pipeline of:

	git-rev-list --objects --all |
	git-pack-objects --non-empty --no-reuse-delta --window=50 --depth=50 \
	.git/.tmp-<somepid>-pack

Now you are making me really worried.

* Re: Historical kernel repository size
  2006-09-14 22:24           ` Thomas Gleixner
  2006-09-14 23:15             ` Junio C Hamano
@ 2006-09-15  1:19             ` Nicolas Pitre
  2006-09-15  9:03               ` Olivier Galibert
  1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-15  1:19 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List

On Fri, 15 Sep 2006, Thomas Gleixner wrote:

> On Thu, 2006-09-14 at 23:54 +0200, Thomas Gleixner wrote:
> > On Thu, 2006-09-14 at 17:42 -0400, Nicolas Pitre wrote:
> > > > git repack neither accepts --depth nor --window
> > > 
> > > Is the GIT version on kernel.org _that_ old?
> > > 
> > > What a shame...
> > 
> > [tglx@hera history.git]$ git --version
> > git version 1.4.2.1

OK that's recent enough indeed.

> I know I'm stupid
> 
> "git-repack --window=50 --depth=50 -a -f" works
> "git-repack -a -f --window=50 --depth=50" does not
> 
> Intuitive user interfaces are my favorite pitfalls.

Erm... Both incantations work fine here.

> -rw-rw-r-- 1 tglx ftpadmin  13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> -rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack

And I get the same result as well.


Nicolas

* Re: Historical kernel repository size
  2006-09-15  1:19             ` Nicolas Pitre
@ 2006-09-15  9:03               ` Olivier Galibert
  2006-09-15 16:45                 ` Nicolas Pitre
  0 siblings, 1 reply; 16+ messages in thread
From: Olivier Galibert @ 2006-09-15  9:03 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Thomas Gleixner, Linus Torvalds, Petr Baudis, Git Mailing List

On Thu, Sep 14, 2006 at 09:19:04PM -0400, Nicolas Pitre wrote:
> Erm... Both incantations work fine here.
> 
> > -rw-rw-r-- 1 tglx ftpadmin  13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> > -rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
> 
> And I get the same result as well.

For the curious, a 100/100 parameter gives a size of 154261771.
Diminishing returns, here I come.

  OG.

* Re: Historical kernel repository size
  2006-09-15  9:03               ` Olivier Galibert
@ 2006-09-15 16:45                 ` Nicolas Pitre
  0 siblings, 0 replies; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-15 16:45 UTC (permalink / raw)
  To: Olivier Galibert
  Cc: Thomas Gleixner, Linus Torvalds, Petr Baudis, Git Mailing List

On Fri, 15 Sep 2006, Olivier Galibert wrote:

> On Thu, Sep 14, 2006 at 09:19:04PM -0400, Nicolas Pitre wrote:
> > Erm... Both incantations work fine here.
> > 
> > > -rw-rw-r-- 1 tglx ftpadmin  13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> > > -rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
> > 
> > And I get the same result as well.
> 
> For the curious, a 100/100 parameter gives a size of 154261771.

Right.   And then the runtime cost of extracting objects out of such a 
pack increases due to the deeper delta chain.

The average runtime cost is probably linear with the delta depth, 
something like f(x) = a*x + k.

But the size reduction follows f(x) = a/x + k.

That is to say, an infinite delta depth does not provide an infinite
reduction in pack size. Anything larger than 50 is probably not worth the
small additional gain.
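
As a rough illustration of the a/x + k size model, the two sizes reported
in this thread (158679705 bytes at 50/50 and 154261771 bytes at 100/100)
are enough to solve for a and k. This is only a sketch under stated
assumptions: two points determine the two parameters exactly, so it shows
the shape of the curve rather than validating it, and both runs changed
window and depth together. The function names are made up for
illustration.

```python
from fractions import Fraction


def fit_inverse_model(x1, y1, x2, y2):
    """Solve size(x) = a/x + k exactly through two (x, size) points."""
    a = Fraction(y1 - y2) / (Fraction(1, x1) - Fraction(1, x2))
    k = Fraction(y1) - a / x1
    return a, k


def predicted_size(a, k, x):
    """Evaluate the fitted model at delta depth x."""
    return a / x + k


# (depth, pack size in bytes) as reported in this thread.
a, k = fit_inverse_model(50, 158679705, 100, 154261771)

print("floor k (bytes):", int(k))
for depth in (50, 100, 200, 400):
    print(depth, int(predicted_size(a, k, depth)))
```

Under this fit, each doubling of the depth halves the remaining gap to the
floor k, so going from 100 to 200 would save only half as many bytes as
going from 50 to 100 did.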


Nicolas
