* Historical kernel repository size
@ 2006-09-14 14:22 Petr Baudis
2006-09-14 14:38 ` tglx
2006-09-14 15:31 ` Linus Torvalds
0 siblings, 2 replies; 16+ messages in thread
From: Petr Baudis @ 2006-09-14 14:22 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: git
Hi,
just to test the packing improvements we had achieved over the last
year, I have repacked the historical kernel repository and achieved a
significant improvement:
xpasky@machine[0:0]~/hi/history$ git-repack -a -f
Generating pack...
Done counting 566638 objects.
Deltifying 566638 objects.
100% (566638/566638) done
Writing 566638 objects.
100% (566638/566638) done
Total 566638, written 566638 (delta 456212), reused 98435 (delta 0)
Pack pack-4d27038611fe7755938efd4a2745d5d5d35de1c1 created.
xpasky@machine[0:0]~/hi/history$ l .git/objects/pack/
total 476264
-rw-r--r-- 1 xpasky users 13600376 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
-rw-r--r-- 1 xpasky users 197168186 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
-rw-r--r-- 1 xpasky users 13600376 Sep 14 12:18 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.idx
-rw-r--r-- 1 xpasky users 262818936 Sep 14 12:29 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.pack
Since it's a nice place for people to check about how efficient we are
with compressing the repository, perhaps it would be a good idea to
repack the historical repository on kernel.org?
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Snow falling on Perl. White noise covering line noise.
Hides all the bugs too. -- J. Putnam
* Re: Historical kernel repository size
2006-09-14 14:22 Historical kernel repository size Petr Baudis
@ 2006-09-14 14:38 ` tglx
2006-09-14 15:38 ` Andy Whitcroft
2006-09-14 15:31 ` Linus Torvalds
1 sibling, 1 reply; 16+ messages in thread
From: tglx @ 2006-09-14 14:38 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
Petr,
> just to test the packing improvements we had achieved over the last
> year, I have repacked the historical kernel repository and achieved a
> significant improvement:
> ....
> Since it's a nice place for people to check about how efficient we are
> with compressing the repository, perhaps it would be a good idea to
> repack the historical repository on kernel.org?
I'll do it once I'm back home.
tglx
* Re: Historical kernel repository size
2006-09-14 14:22 Historical kernel repository size Petr Baudis
2006-09-14 14:38 ` tglx
@ 2006-09-14 15:31 ` Linus Torvalds
2006-09-14 21:23 ` Nicolas Pitre
1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2006-09-14 15:31 UTC (permalink / raw)
To: Petr Baudis; +Cc: Thomas Gleixner, Git Mailing List
On Thu, 14 Sep 2006, Petr Baudis wrote:
>
> just to test the packing improvements we had achieved over the last
> year, I have repacked the historical kernel repository and achieved a
> significant improvement:
Umm.. Only apparently because the old pack was really really bad. It also
has the wrong name, probably because it's using the original naming that
had the SHA1 computed on the unsorted input. That was changed a long time
ago.
Yours isn't wonderful either.
> -rw-r--r-- 1 xpasky users 13600376 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> -rw-r--r-- 1 xpasky users 197168186 Sep 14 16:18 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
> -rw-r--r-- 1 xpasky users 13600376 Sep 14 12:18 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.idx
> -rw-r--r-- 1 xpasky users 262818936 Sep 14 12:29 pack-cc3517351ecce3ef7ba010559992bdfc10b7acd4.pack
Mine are:
-rw-r--r-- 1 torvalds torvalds 13600376 Apr 19 10:06 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
-rw-r--r-- 1 torvalds torvalds 185374386 Apr 19 10:06 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
and as you can see from the date, they aren't exactly very recent, but
they shave an additional 6% off the size.
I agree that the _original_ history pack by Thomas seems to be bad, and
that's from Aug 9 2005, so it's likely with some really really old packing
rules.
For better packing, I think I used a larger depth, ie try something like
git repack -a -f --depth=50
to get more improvement. For a historical archive that you don't much use,
doing the deeper depth is definitely worth it.
Linus
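A quick check of the percentages behind the sizes quoted above. This is a
minimal shell sketch: the helper name pct_saved is made up for illustration,
and the byte counts are copied from the listings in this message.

```shell
#!/bin/sh
# pct_saved OLD NEW: percentage by which NEW is smaller than OLD.
# (Hypothetical helper, not part of git.)
pct_saved() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) * 100 / old }'
}

# Petr's fresh repack vs. the pack from April.
pct_saved 197168186 185374386   # prints 6.0
# Thomas's original pack vs. Petr's fresh repack.
pct_saved 262818936 197168186   # prints 25.0
```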
* Re: Historical kernel repository size
2006-09-14 14:38 ` tglx
@ 2006-09-14 15:38 ` Andy Whitcroft
2006-09-14 15:52 ` Shawn Pearce
2006-09-14 15:52 ` Petr Baudis
0 siblings, 2 replies; 16+ messages in thread
From: Andy Whitcroft @ 2006-09-14 15:38 UTC (permalink / raw)
To: tglx; +Cc: Petr Baudis, git
tglx@linutronix.de wrote:
> Petr,
>
>> just to test the packing improvements we had achieved over the last
>> year, I have repacked the historical kernel repository and achieved a
>> significant improvement:
>> ....
>> Since it's a nice place for people to check about how efficient we are
>> with compressing the repository, perhaps it would be a good idea to
>> repack the historical repository on kernel.org?
>
> I'll do it once I'm back home.
Is there any reason this isn't a live history, i.e. that we don't
constantly pull linus' master branch into this history to make it a real
complete history?
Perhaps that isn't possible ... hmmm. I guess it might only work if
linus' repo was actually a grafted version of this history?
/me watches his head explode.
-apw
* Re: Historical kernel repository size
2006-09-14 15:38 ` Andy Whitcroft
@ 2006-09-14 15:52 ` Shawn Pearce
2006-09-14 15:52 ` Petr Baudis
1 sibling, 0 replies; 16+ messages in thread
From: Shawn Pearce @ 2006-09-14 15:52 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: tglx, Petr Baudis, git
Andy Whitcroft <apw@shadowen.org> wrote:
> tglx@linutronix.de wrote:
> > Petr,
> >
> >> just to test the packing improvements we had achieved over the last
> >> year, I have repacked the historical kernel repository and achieved a
> >> significant improvement:
> >> ....
> >> Since it's a nice place for people to check about how efficient we are
> >> with compressing the repository, perhaps it would be a good idea to
> >> repack the historical repository on kernel.org?
> >
> > I'll do it once I'm back home.
>
> Is there any reason this isn't a live history, i.e. that we don't
> constantly pull linus' master branch into this history to make it a real
> complete history?
>
> Perhaps that isn't possible ... hmmm. I guess it might only work if
> linus' repo was actually a grafted version of this history?
Right - the only way to join the two is to graft them together.
Since grafts are a purely local matter anyone can pull both into
the same repository and insert the correct grafts to get a complete
history. You would just want to publish on the kernel.org website
the correct grafts file, so users don't have to figure it out on
their own.
Since I'm not a kernel developer I haven't even looked to see if
such a grafts file has been published. :-)
--
Shawn.
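For reference, a grafts file lives at .git/info/grafts and holds one line per
grafted commit: the commit ID followed by its replacement parent ID(s),
separated by spaces. Below is a minimal sketch of producing such a line. The
helper name graft_line and both hashes are placeholders, not real commits
from either repository.

```shell
#!/bin/sh
# graft_line COMMIT PARENT...: print one grafts-file entry.
# Format: "<commit> <parent> [<parent> ...]" on a single line.
graft_line() {
    echo "$*"
}

# Placeholder IDs only: substitute the root commit of the current tree
# and the tip commit of the historical repository.
first_new_commit=1111111111111111111111111111111111111111
last_old_commit=2222222222222222222222222222222222222222

# The resulting line would be appended to .git/info/grafts in a
# repository that already contains the objects of both histories.
graft_line "$first_new_commit" "$last_old_commit"
```

Later Git releases provide "git replace --graft" for the same purpose, so on
a current installation that is probably the better starting point.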
* Re: Historical kernel repository size
2006-09-14 15:38 ` Andy Whitcroft
2006-09-14 15:52 ` Shawn Pearce
@ 2006-09-14 15:52 ` Petr Baudis
1 sibling, 0 replies; 16+ messages in thread
From: Petr Baudis @ 2006-09-14 15:52 UTC (permalink / raw)
To: Andy Whitcroft; +Cc: tglx, git
Dear diary, on Thu, Sep 14, 2006 at 05:38:29PM CEST, I got a letter
where Andy Whitcroft <apw@shadowen.org> said that...
> Is there any reason this isn't a live history, i.e. that we don't
> constantly pull linus' master branch into this history to make it a real
> complete history?
Because in the early days of Git, things were evolving fast and it
would have been infeasible to drag this old history along through
format changes and the like. Also, at that time the history was still very
big and it would have been impractical to require all the kernel developers
to grab all the BitKeeper history (it still kind of is).
> Perhaps that isn't possible ... hmmm. I guess it might only work if
> linus' repo was actually a grafted version of this history?
>
> /me watches his head explode.
http://lkml.org/lkml/2006/6/17/110
may be useful.
It wasn't accepted. Oh well, I may try to resubmit it soon. :-)
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Snow falling on Perl. White noise covering line noise.
Hides all the bugs too. -- J. Putnam
* Re: Historical kernel repository size
2006-09-14 15:31 ` Linus Torvalds
@ 2006-09-14 21:23 ` Nicolas Pitre
2006-09-14 21:32 ` Thomas Gleixner
2006-09-14 21:37 ` Thomas Gleixner
0 siblings, 2 replies; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-14 21:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Petr Baudis, Thomas Gleixner, Git Mailing List
On Thu, 14 Sep 2006, Linus Torvalds wrote:
> For better packing, I think I used a larger depth, ie try something like
>
> git repack -a -f --depth=50
>
> to get more improvement. For a historical archive that you don't much use,
> doing the deeper depth is definitely worth it.
Using a larger window helps too. It of course has a direct impact on
the processing to perform a full repack, but it has no runtime costs
when the pack is used. So I'd suggest adding --window=50 to the above.
[ I made those suggestions in person to Thomas at OLS to which
he replied he'd do it when he'd get back home. ;-) ]
Nicolas
* Re: Historical kernel repository size
2006-09-14 21:23 ` Nicolas Pitre
@ 2006-09-14 21:32 ` Thomas Gleixner
2006-09-14 21:37 ` Thomas Gleixner
1 sibling, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 21:32 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
On Thu, 2006-09-14 at 17:23 -0400, Nicolas Pitre wrote:
> On Thu, 14 Sep 2006, Linus Torvalds wrote:
>
> > For better packing, I think I used a larger depth, ie try something like
> >
> > git repack -a -f --depth=50
> >
> > to get more improvement. For a historical archive that you don't much use,
> > doing the deeper depth is definitely worth it.
>
> Using a larger window helps too. It of course has a direct impact on
> the processing to perform a full repack, but it has no runtime costs
> when the pack is used. So I'd suggest adding --window=50 to the above.
>
> [ I made those suggestions in person to Thomas at OLS to which
> he replied he'd do it when he'd get back home. ;-) ]
Thanks for the reminder. I actually logged into kernel.org already :)
tglx
* Re: Historical kernel repository size
2006-09-14 21:23 ` Nicolas Pitre
2006-09-14 21:32 ` Thomas Gleixner
@ 2006-09-14 21:37 ` Thomas Gleixner
2006-09-14 21:42 ` Nicolas Pitre
1 sibling, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 21:37 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
On Thu, 2006-09-14 at 17:23 -0400, Nicolas Pitre wrote:
> On Thu, 14 Sep 2006, Linus Torvalds wrote:
>
> > For better packing, I think I used a larger depth, ie try something like
> >
> > git repack -a -f --depth=50
> >
> when the pack is used. So I'd suggest adding --window=50 to the above.
Great advice!
git repack neither accepts --depth nor --window
tglx
* Re: Historical kernel repository size
2006-09-14 21:37 ` Thomas Gleixner
@ 2006-09-14 21:42 ` Nicolas Pitre
2006-09-14 21:54 ` Thomas Gleixner
0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-14 21:42 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
On Thu, 14 Sep 2006, Thomas Gleixner wrote:
> On Thu, 2006-09-14 at 17:23 -0400, Nicolas Pitre wrote:
> > On Thu, 14 Sep 2006, Linus Torvalds wrote:
> >
> > > For better packing, I think I used a larger depth, ie try something like
> > >
> > > git repack -a -f --depth=50
> > >
> > when the pack is used. So I'd suggest adding --window=50 to the above.
>
> Great advice!
>
> git repack neither accepts --depth nor --window
Is the GIT version on kernel.org _that_ old?
What a shame...
Nicolas
* Re: Historical kernel repository size
2006-09-14 21:42 ` Nicolas Pitre
@ 2006-09-14 21:54 ` Thomas Gleixner
2006-09-14 22:24 ` Thomas Gleixner
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 21:54 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
On Thu, 2006-09-14 at 17:42 -0400, Nicolas Pitre wrote:
> > git repack neither accepts --depth nor --window
>
> Is the GIT version on kernel.org _that_ old?
>
> What a shame...
[tglx@hera history.git]$ git --version
git version 1.4.2.1
tglx
* Re: Historical kernel repository size
2006-09-14 21:54 ` Thomas Gleixner
@ 2006-09-14 22:24 ` Thomas Gleixner
2006-09-14 23:15 ` Junio C Hamano
2006-09-15 1:19 ` Nicolas Pitre
0 siblings, 2 replies; 16+ messages in thread
From: Thomas Gleixner @ 2006-09-14 22:24 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
On Thu, 2006-09-14 at 23:54 +0200, Thomas Gleixner wrote:
> On Thu, 2006-09-14 at 17:42 -0400, Nicolas Pitre wrote:
> > > git repack neither accepts --depth nor --window
> >
> > Is the GIT version on kernel.org _that_ old?
> >
> > What a shame...
>
> [tglx@hera history.git]$ git --version
> git version 1.4.2.1
I know I'm stupid
"git-repack --window=50 --depth=50 -a -f" works
"git-repack -a -f --window=50 --depth=50" does not
Intuitive user interfaces are my favorite pitfalls.
-rw-rw-r-- 1 tglx ftpadmin 13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
-rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
tglx
* Re: Historical kernel repository size
2006-09-14 22:24 ` Thomas Gleixner
@ 2006-09-14 23:15 ` Junio C Hamano
2006-09-15 1:19 ` Nicolas Pitre
1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2006-09-14 23:15 UTC (permalink / raw)
To: tglx; +Cc: git
Thomas Gleixner <tglx@linutronix.de> writes:
>> [tglx@hera history.git]$ git --version
>> git version 1.4.2.1
>
> I know I'm stupid
>
> "git-repack --window=50 --depth=50 -a -f" works
> "git-repack -a -f --window=50 --depth=50" does not
>
> Intuitive user interfaces are my favorite pitfalls.
Whaaaat?
I've run them under "sh -x" and both result in a pipeline of:
git-rev-list --objects --all |
git-pack-objects --non-empty --no-reuse-delta --window=50 --depth=50 \
.git/.tmp-<somepid>-pack
Now you are making me really worried.
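The order independence described here is what one would expect from a plain
option loop. The following is a simplified sketch, not the actual git-repack
source: the function name build_pack_args is hypothetical, and only the
options used in this thread are handled.

```shell
#!/bin/sh
# build_pack_args OPTION...: translate repack-style options into the
# arguments that would be passed on to git-pack-objects.
build_pack_args() {
    no_reuse_delta= extra=
    for opt in "$@"
    do
        case "$opt" in
        -a) ;;   # selects what rev-list enumerates; nothing for pack-objects
        -f) no_reuse_delta=--no-reuse-delta ;;
        --window=*|--depth=*) extra="$extra $opt" ;;
        *) echo "unknown option: $opt" >&2; return 1 ;;
        esac
    done
    echo "--non-empty $no_reuse_delta$extra"
}

# Both orderings from the thread yield the same argument list.
build_pack_args --window=50 --depth=50 -a -f
build_pack_args -a -f --window=50 --depth=50
```

Because each option is handled on its own as the loop walks the argument
list, where -a and -f appear relative to --window and --depth makes no
difference to the result.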
* Re: Historical kernel repository size
2006-09-14 22:24 ` Thomas Gleixner
2006-09-14 23:15 ` Junio C Hamano
@ 2006-09-15 1:19 ` Nicolas Pitre
2006-09-15 9:03 ` Olivier Galibert
1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-15 1:19 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
On Fri, 15 Sep 2006, Thomas Gleixner wrote:
> On Thu, 2006-09-14 at 23:54 +0200, Thomas Gleixner wrote:
> > On Thu, 2006-09-14 at 17:42 -0400, Nicolas Pitre wrote:
> > > > git repack neither accepts --depth nor --window
> > >
> > > Is the GIT version on kernel.org _that_ old?
> > >
> > > What a shame...
> >
> > [tglx@hera history.git]$ git --version
> > git version 1.4.2.1
OK that's recent enough indeed.
> I know I'm stupid
>
> "git-repack --window=50 --depth=50 -a -f" works
> "git-repack -a -f --window=50 --depth=50" does not
>
> Intuitive user interfaces are my favorite pitfalls.
Erm... Both incantations work fine here.
> -rw-rw-r-- 1 tglx ftpadmin 13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> -rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
And I get the same result as well.
Nicolas
* Re: Historical kernel repository size
2006-09-15 1:19 ` Nicolas Pitre
@ 2006-09-15 9:03 ` Olivier Galibert
2006-09-15 16:45 ` Nicolas Pitre
0 siblings, 1 reply; 16+ messages in thread
From: Olivier Galibert @ 2006-09-15 9:03 UTC (permalink / raw)
To: Nicolas Pitre
Cc: Thomas Gleixner, Linus Torvalds, Petr Baudis, Git Mailing List
On Thu, Sep 14, 2006 at 09:19:04PM -0400, Nicolas Pitre wrote:
> Erm... Both incantations work fine here.
>
> > -rw-rw-r-- 1 tglx ftpadmin 13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> > -rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
>
> And I get the same result as well.
For the curious, a 100/100 parameter gives a size of 154261771.
Diminishing returns, here I come.
OG.
* Re: Historical kernel repository size
2006-09-15 9:03 ` Olivier Galibert
@ 2006-09-15 16:45 ` Nicolas Pitre
0 siblings, 0 replies; 16+ messages in thread
From: Nicolas Pitre @ 2006-09-15 16:45 UTC (permalink / raw)
To: Olivier Galibert
Cc: Thomas Gleixner, Linus Torvalds, Petr Baudis, Git Mailing List
On Fri, 15 Sep 2006, Olivier Galibert wrote:
> On Thu, Sep 14, 2006 at 09:19:04PM -0400, Nicolas Pitre wrote:
> > Erm... Both incantations work fine here.
> >
> > > -rw-rw-r-- 1 tglx ftpadmin 13600376 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.idx
> > > -rw-rw-r-- 1 tglx ftpadmin 158679705 Sep 14 22:16 pack-4d27038611fe7755938efd4a2745d5d5d35de1c1.pack
> >
> > And I get the same result as well.
>
> For the curious, a 100/100 parameter gives a size of 154261771.
Right. And then the runtime cost of extracting objects out of such a
pack increases due to the deeper delta chain.
The average runtime cost is probably linear with the delta depth,
something like f(x) = a*x + k.
But the size reduction follows f(x) = a/x + k.
That is to say, an unbounded delta depth does not give an unbounded
reduction in pack size. Anything larger than 50 is probably not worth the
small additional reduction.
Nicolas
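To make the diminishing-returns argument concrete, the size model
f(x) = a/x + k can be fitted to the two data points reported in this thread.
This is only a sketch: it assumes the 50/50 and 100/100 runs differ in size
purely because of depth (the window changed as well), and the helper name
fit_size_model is invented for illustration.

```shell
#!/bin/sh
# fit_size_model X1 S1 X2 S2: solve S = a/X + k through two
# (depth, size) points and print "a k".
fit_size_model() {
    awk -v x1="$1" -v s1="$2" -v x2="$3" -v s2="$4" 'BEGIN {
        # Subtracting the two equations eliminates k:
        #   s1 - s2 = a * (x2 - x1) / (x1 * x2)
        a = (s1 - s2) * x1 * x2 / (x2 - x1)
        k = s1 - a / x1
        printf "%.0f %.0f\n", a, k
    }'
}

# Pack sizes in bytes reported above for 50/50 and 100/100.
fit_size_model 50 158679705 100 154261771   # prints 441793400 149843837
```

Under that assumption k, roughly 150 MB, is a floor that no depth can get
below. Going from 50 to 100 saved about 4.4 MB, and everything beyond 100
could save at most another 4.4 MB, while the extraction cost keeps growing.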