public inbox for linux-kernel@vger.kernel.org
* Smaller compressed kernel source tarballs?
@ 2006-09-21 20:32 Dax Kelson
       [not found] ` <20060921204250.GN13641@csclub.uwaterloo.ca>
  2006-09-21 20:42 ` Lennart Sorensen
  0 siblings, 2 replies; 43+ messages in thread
From: Dax Kelson @ 2006-09-21 20:32 UTC (permalink / raw)
  To: Linux kernel; +Cc: Linus Torvalds

Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
thought it would be nice if it could be made smaller.

The 7zip program/algorithm is free software (LGPL) and can be obtained
from http://www.7-zip.org/ and it is distributed with several
distributions (it is in Fedora Core 6 extras for example).

Here are the numbers:

ls -al
-rw-r--r--  1 root root 240138240 Sep 21 13:55 linux-2.6.18.tar
-rw-r--r--  1 root root  34180796 Sep 21 13:42 linux-2.6.18.tar.7z
-rw-r--r--  1 root root  41863580 Sep 21 13:45 linux-2.6.18.tar.bz2
-rw-r--r--  1 root root  52467357 Sep 21 13:13 linux-2.6.18.tar.gz

ls -alh
-rw-r--r--  1 root root 230M Sep 21 13:55 linux-2.6.18.tar
-rw-r--r--  1 root root  33M Sep 21 13:42 linux-2.6.18.tar.7z
-rw-r--r--  1 root root  40M Sep 21 13:45 linux-2.6.18.tar.bz2
-rw-r--r--  1 root root  51M Sep 21 13:13 linux-2.6.18.tar.gz

Smaller the better, especially with the international audience.
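
To put the listing above into relative terms, here is a small sketch in Python (not part of the original measurements; the byte counts are copied from the `ls -al` output, everything else is illustrative). By this arithmetic the .7z file comes out roughly 18% smaller than the .bz2 and roughly 35% smaller than the .gz.

```python
# Sizes in bytes, copied from the `ls -al` listing above.
sizes = {
    "tar": 240138240,
    "7z": 34180796,
    "bz2": 41863580,
    "gz": 52467357,
}


def saving(smaller: str, larger: str) -> float:
    """Fraction by which the `smaller` file undercuts the `larger` one."""
    return 1 - sizes[smaller] / sizes[larger]


# Compressed size as a share of the uncompressed tarball.
for fmt in ("gz", "bz2", "7z"):
    ratio = sizes[fmt] / sizes["tar"]
    print(f"{fmt:>3}: {ratio:.1%} of the uncompressed tar")

# Relative savings of 7z over the two formats already offered.
print(f"7z is {saving('7z', 'bz2'):.1%} smaller than bz2")
print(f"7z is {saving('7z', 'gz'):.1%} smaller than gz")
```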

Dax Kelson


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-09-21 20:32 Smaller compressed kernel source tarballs? Dax Kelson
       [not found] ` <20060921204250.GN13641@csclub.uwaterloo.ca>
@ 2006-09-21 20:42 ` Lennart Sorensen
  2006-09-21 21:40   ` Dax Kelson
                     ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Lennart Sorensen @ 2006-09-21 20:42 UTC (permalink / raw)
  To: Dax Kelson; +Cc: Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 02:32:57PM -0600, Dax Kelson wrote:
> Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
> thought it would be nice if it could be made smaller.
> 
> The 7zip program/algorithm is free software (LGPL) and can be obtained
> from http://www.7-zip.org/ and it is distributed with several
> distributions (it is in Fedora Core 6 extras for example).
> 
> Here are the numbers:
> 
> ls -al
> -rw-r--r--  1 root root 240138240 Sep 21 13:55 linux-2.6.18.tar
> -rw-r--r--  1 root root  34180796 Sep 21 13:42 linux-2.6.18.tar.7z
> -rw-r--r--  1 root root  41863580 Sep 21 13:45 linux-2.6.18.tar.bz2
> -rw-r--r--  1 root root  52467357 Sep 21 13:13 linux-2.6.18.tar.gz
> 
> ls -alh
> -rw-r--r--  1 root root 230M Sep 21 13:55 linux-2.6.18.tar
> -rw-r--r--  1 root root  33M Sep 21 13:42 linux-2.6.18.tar.7z
> -rw-r--r--  1 root root  40M Sep 21 13:45 linux-2.6.18.tar.bz2
> -rw-r--r--  1 root root  51M Sep 21 13:13 linux-2.6.18.tar.gz
> 
> Smaller the better, especially with the international audience.

But after you download it once, you can just get the diff next time.
How is the decompression time on 7zip versus bzip2 and gzip?

--
Len Sorensen


* Re: Smaller compressed kernel source tarballs?
       [not found]   ` <20060921171747.9ae2b42e.seanlkml@sympatico.ca>
@ 2006-09-21 21:17     ` Sean
  2006-09-21 21:41     ` Dax Kelson
  1 sibling, 0 replies; 43+ messages in thread
From: Sean @ 2006-09-21 21:17 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Dax Kelson, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006 16:42:50 -0400
Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:

> On Thu, Sep 21, 2006 at 02:32:57PM -0600, Dax Kelson wrote:
> > Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
> > thought it would be nice if it could be made smaller.
[...]
> But after you download it once, you can just get the diff next time.
> How is the decompression time on 7zip versus bzip2 and gzip?

Not to mention that by using Git it will take care of all that for you.
Downloading only the updates with no need for you to manually apply diffs
etc..

Sean


* Re: Smaller compressed kernel source tarballs?
  2006-09-21 20:42 ` Lennart Sorensen
@ 2006-09-21 21:40   ` Dax Kelson
  2006-09-22 14:00     ` Lennart Sorensen
       [not found]   ` <20060921171747.9ae2b42e.seanlkml@sympatico.ca>
  2006-09-21 21:43   ` H. Peter Anvin
  2 siblings, 1 reply; 43+ messages in thread
From: Dax Kelson @ 2006-09-21 21:40 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Linux kernel, Linus Torvalds

On Thu, 2006-09-21 at 16:42 -0400, Lennart Sorensen wrote:
> But after you download it once, you can just get the diff next time.
> How is the decompression time on 7zip versus bzip2 and gzip?

Decompression times on 2.6.18 are as follows:

gzip:   0m3.509s
7zip:   0m10.012s
bzip2:  0m22.703s
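
For anyone wanting to repeat a comparison of this kind, the following is a minimal sketch using Python's standard `gzip`, `bz2` and `lzma` modules. It is an assumption-laden stand-in, not the method used for the figures above: the payload is synthetic repetitive text rather than the kernel tree, the `lzma` module is not the 7zip program, and absolute timings will differ by machine.

```python
import bz2
import gzip
import lzma
import time


def time_decompress(decompress, blob: bytes, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in payload: repetitive source-like text, NOT the real kernel tarball.
payload = b"static int example(void) { return 0; }\n" * 50_000

codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    blob = compress(payload)
    # Sanity check: the round trip must reproduce the input exactly.
    assert decompress(blob) == payload
    results[name] = (len(blob), time_decompress(decompress, blob))
    print(f"{name:>5}: {len(blob):>8} bytes, {results[name][1]:.4f}s")
```

To measure a real tarball, read the compressed file into `blob` instead of compressing the synthetic payload.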

Dax Kelson



* Re: Smaller compressed kernel source tarballs?
       [not found]   ` <20060921171747.9ae2b42e.seanlkml@sympatico.ca>
  2006-09-21 21:17     ` Sean
@ 2006-09-21 21:41     ` Dax Kelson
  2006-09-21 21:50       ` Bob Copeland
       [not found]       ` <20060921175717.272c58ee.seanlkml@sympatico.ca>
  1 sibling, 2 replies; 43+ messages in thread
From: Dax Kelson @ 2006-09-21 21:41 UTC (permalink / raw)
  To: Sean; +Cc: Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 2006-09-21 at 17:17 -0400, Sean wrote:
> Not to mention that by using Git it will take care of all that for you.
> Downloading only the updates with no need for you to manually apply diffs
> etc..
> 
> Sean

Git users and tarball users are different audiences.

Dax Kelson



* Re: Smaller compressed kernel source tarballs?
  2006-09-21 20:42 ` Lennart Sorensen
  2006-09-21 21:40   ` Dax Kelson
       [not found]   ` <20060921171747.9ae2b42e.seanlkml@sympatico.ca>
@ 2006-09-21 21:43   ` H. Peter Anvin
  2006-09-22 14:00     ` Lennart Sorensen
  2 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2006-09-21 21:43 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Dax Kelson, Linux kernel, Linus Torvalds

Lennart Sorensen wrote:
> On Thu, Sep 21, 2006 at 02:32:57PM -0600, Dax Kelson wrote:
>> Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
>> thought it would be nice if it could be made smaller.
>>
>> The 7zip program/algorithm is free software (LGPL) and can be obtained
>> from http://www.7-zip.org/ and it is distributed with several
>> distributions (it is in Fedora Core 6 extras for example).
>>
> 
> But after you download it once, you can just get the diff next time.
> How is the decompression time on 7zip versus bzip2 and gzip?
> 

7zip (LZMA) decompresses quickly, and the decompressor text is actually 
smaller than the equivalent for gzip.  Quite nice.

What is not nice is the code for the compressor, which is a total mess. 
  I have been holding out on implementing LZMA on kernel.org, because 
just as zip (deflate) didn't become common in the Unix world until an 
encapsulation format that handles things expected in the Unix world, 
e.g. streaming, was created (gzip), I don't think LZMA is going to be 
widely used until there is an "lzip" which does the same thing.  I 
actually started the work of adding LZMA support to gzip, but then 
realized it would be better if a new encapsulation format with proper 
64-bit support everywhere was created.

	-hpa



* Re: Smaller compressed kernel source tarballs?
  2006-09-21 21:41     ` Dax Kelson
@ 2006-09-21 21:50       ` Bob Copeland
       [not found]       ` <20060921175717.272c58ee.seanlkml@sympatico.ca>
  1 sibling, 0 replies; 43+ messages in thread
From: Bob Copeland @ 2006-09-21 21:50 UTC (permalink / raw)
  To: Dax Kelson; +Cc: Sean, Lennart Sorensen, Linux kernel, Linus Torvalds

On 9/21/06, Dax Kelson <dax@gurulabs.com> wrote:
> Git users and tarball users are different audiences.

Try ketchup then.  http://www.selenic.com/ketchup/wiki/


* Re: Smaller compressed kernel source tarballs?
       [not found]       ` <20060921175717.272c58ee.seanlkml@sympatico.ca>
@ 2006-09-21 21:57         ` Sean
  2006-09-21 22:00         ` David Lang
       [not found]         ` <Pine.LNX.4.63.0609211455570.17238@qynat.qvtvafvgr.pbz>
  2 siblings, 0 replies; 43+ messages in thread
From: Sean @ 2006-09-21 21:57 UTC (permalink / raw)
  To: Dax Kelson; +Cc: Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006 15:41:15 -0600
Dax Kelson <dax@gurulabs.com> wrote:

> 
> Git users and tarball users are different audiences.
> 

Don't see why that needs to be the case.  Git can even produce the
tarballs once you've synced up with kernel.org (see git-tar-tree).
People interested in conserving bandwidth should really consider
the use of Git.

Sean


* Re: Smaller compressed kernel source tarballs?
       [not found]       ` <20060921175717.272c58ee.seanlkml@sympatico.ca>
  2006-09-21 21:57         ` Sean
@ 2006-09-21 22:00         ` David Lang
  2006-09-21 22:24           ` Dave Jones
       [not found]         ` <Pine.LNX.4.63.0609211455570.17238@qynat.qvtvafvgr.pbz>
  2 siblings, 1 reply; 43+ messages in thread
From: David Lang @ 2006-09-21 22:00 UTC (permalink / raw)
  To: Sean; +Cc: Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006, Sean wrote:

> On Thu, 21 Sep 2006 15:41:15 -0600
> Dax Kelson <dax@gurulabs.com> wrote:
>
>>
>> Git users and tarball users are different audiences.
>>
>
> Don't see why that needs to be the case.  Git can even produce the
> tarballs once you've synced up with kernel.org (see git-tar-tree).
> People interested in conserving bandwidth should really consider
> the use of Git.

yes,
   however git users are people who plan on following every kernel version for a 
while, tarball users are people who grab a copy of the kernel once in a while 
(probably not every version). for the tarball users they would have to grab 
multiple patches to get from the last thing that they have to whatever is 
current. and frankly they may not (and probably should not) trust the last thing 
that they have, as in many cases it's a distro patched kernel that may not be 
compatible with the vanilla kernel.

people who start downloading every revision should start using git or patches, 
but not everyone needs it.

also people could be behind a firewall that prevents git from working properly, 
for them tarballs and patches are the right way of doing things.

David Lang


* Re: Smaller compressed kernel source tarballs?
  2006-09-21 22:24           ` Dave Jones
@ 2006-09-21 22:16             ` David Lang
  2006-09-21 22:40               ` Dave Jones
  0 siblings, 1 reply; 43+ messages in thread
From: David Lang @ 2006-09-21 22:16 UTC (permalink / raw)
  To: Dave Jones
  Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006, Dave Jones wrote:

> On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:
>
> > for the tarball users they would have to grab
> > multiple patches to get from the last thing that they have to whatever is
> > current.
>
> ketchup solves that problem. One command brings any tree up to current.

so are you saying that ketchup should be used for _all_ access to the vanilla 
tree that isn't done via git?

if not then tarballs still have a place.

and how does ketchup deal with patched trees to start with?

> > also people could be behind a firewall that prevents git from working properly,
> > for them tarballs and patches are the right way of doing things.
>
> If they can't git through a firewall, they won't be able to wget a tarball through
> it either.

to work properly git should talk its own protocol, http/ftp can be allowed (and 
authenticated) through firewalls that don't allow the git protocol.

David Lang

> 	Dave
>


* Re: Smaller compressed kernel source tarballs?
       [not found]             ` <20060921182554.23044ca3.seanlkml@sympatico.ca>
@ 2006-09-21 22:20               ` David Lang
  0 siblings, 0 replies; 43+ messages in thread
From: David Lang @ 2006-09-21 22:20 UTC (permalink / raw)
  To: Sean; +Cc: Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006, Sean wrote:

>> also people could be behind a firewall that prevents git from working properly,
>> for them tarballs and patches are the right way of doing things.
>
> I use git from behind a firewall everyday without a problem.  If you've seen
> such a problem yourself, a bug report would hopefully lead to a solution.

it's not a bug, it's simply the fact that git (properly) uses its own port for 
its own protocol, and not all firewalls allow access to that port. in some 
cases even where a person would have the ability to get the firewall changed 
they may not want to for other (political) reasons.

even if git tunneled over HTTP there would be firewalls that would require 
authentication that git wouldn't be able to do and would therefore block the 
access.

David Lang


* Re: Smaller compressed kernel source tarballs?
  2006-09-21 22:00         ` David Lang
@ 2006-09-21 22:24           ` Dave Jones
  2006-09-21 22:16             ` David Lang
  0 siblings, 1 reply; 43+ messages in thread
From: Dave Jones @ 2006-09-21 22:24 UTC (permalink / raw)
  To: David Lang
  Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:

 > for the tarball users they would have to grab 
 > multiple patches to get from the last thing that they have to whatever is 
 > current.

ketchup solves that problem. One command brings any tree up to current.

 > also people could be behind a firewall that prevents git from working properly, 
 > for them tarballs and patches are the right way of doing things.

If they can't git through a firewall, they won't be able to wget a tarball through
it either.

	Dave


* Re: Smaller compressed kernel source tarballs?
       [not found]             ` <20060921182554.23044ca3.seanlkml@sympatico.ca>
@ 2006-09-21 22:25           ` Sean
       [not found]             ` <20060921182554.23044ca3.seanlkml@sympatico.ca>
  0 siblings, 1 reply; 43+ messages in thread
From: Sean @ 2006-09-21 22:25 UTC (permalink / raw)
  To: David Lang; +Cc: Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006 15:00:48 -0700 (PDT)
David Lang <dlang@digitalinsight.com> wrote:

> yes,
>    however git users are people who plan on following every kernel version for a 
> while, tarball users are people who grab a copy of the kernel once in a while 
> (probably not every version). for the tarball users they would have to grab 
> multiple patches to get from the last thing that they have to whatever is 
> current. and frankly they may not (and probably should not) trust the last thing 
> that they have, as in many cases it's a distro patched kernel that may not be 
> compatible with the vanilla kernel.
>
> people who start downloading every revision should start using git or patches, 
> but not everyone needs it.

Agreed, but for those people there isn't going to be much need (if any) to
worry about if the tar ball is in .gzip or .bzip2 or whatever then either.  And
that was the case that inspired the suggestion.
 
> also people could be behind a firewall that prevents git from working properly, 
> for them tarballs and patches are the right way of doing things.

I use git from behind a firewall everyday without a problem.  If you've seen
such a problem yourself, a bug report would hopefully lead to a solution.

Thanks,
Sean


* Re: Smaller compressed kernel source tarballs?
  2006-09-21 22:40               ` Dave Jones
@ 2006-09-21 22:34                 ` David Lang
       [not found]                   ` <20060921193823.ec49d446.seanlkml@sympatico.ca>
  0 siblings, 1 reply; 43+ messages in thread
From: David Lang @ 2006-09-21 22:34 UTC (permalink / raw)
  To: Dave Jones
  Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006, Dave Jones wrote:

> 
> On Thu, Sep 21, 2006 at 03:16:57PM -0700, David Lang wrote:
> > On Thu, 21 Sep 2006, Dave Jones wrote:
> >
> > > On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:
> > >
> > > > for the tarball users they would have to grab
> > > > multiple patches to get from the last thing that they have to whatever is
> > > > current.
> > >
> > > ketchup solves that problem. One command brings any tree up to current.
> >
> > so are you saying that ketchup should be used for _all_ access to the vanilla
> > tree that isn't done via git?
> > if not then tarballs still have a place.
>
> I think you have a misunderstanding over what ketchup is/does.
> It cannot usurp tarballs by its very nature. It retrieves tarballs (if necessary)
> and whatever patches are necessary to get to the tree you want.
> http://www.selenic.com/ketchup/

in that case the compression of the tarballs is still worth dealing with

> > and how does ketchup deal with patched trees to start with?
>
> By unpatching if necessary.

assuming that it knows where to get the patches from, I was referring to things 
like the debian or redhat tree with their patches.

> > > > also people could be behind a firewall that prevents git from working properly,
> > > > for them tarballs and patches are the right way of doing things.
> > >
> > > If they can't git through a firewall, they won't be able to wget a tarball through
> > > it either.
> >
> > to work properly git should talk its own protocol, http/ftp can be allowed (and
> > authenticated) through firewalls that don't allow the git protocol.
>
> 'properly' is the wrong word here. optimally, yes, but the firewall argument
> alone isn't sufficient to claim git can't be used to clone a tree.
> A tree cloned over http: vs one over git: has exactly the same information in
> it. All the history, all the changes. Everything.

in most cases, but there are cases where the dumb transports can make mistakes 
(there have been several threads on the git list covering these), git is good 
enough to notice most of them, but there is still room for problems. Also, 
installing and configuring git should not be a prerequisite to getting the 
kernel.

the point being git and ketchup do not eliminate the need to transfer tarballs, 
and therefore do not eliminate the attractiveness of a compression that saves a 
significant amount of bandwidth.

I was responding to the (apparent) argument that with git and ketchup people 
should not ever be downloading tarballs, so something that cuts the size of a 
tarball in half doesn't make any difference.

David Lang


* Re: Smaller compressed kernel source tarballs?
  2006-09-21 22:16             ` David Lang
@ 2006-09-21 22:40               ` Dave Jones
  2006-09-21 22:34                 ` David Lang
  0 siblings, 1 reply; 43+ messages in thread
From: Dave Jones @ 2006-09-21 22:40 UTC (permalink / raw)
  To: David Lang
  Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 03:16:57PM -0700, David Lang wrote:
 > On Thu, 21 Sep 2006, Dave Jones wrote:
 > 
 > > On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:
 > >
 > > > for the tarball users they would have to grab
 > > > multiple patches to get from the last thing that they have to whatever is
 > > > current.
 > >
 > > ketchup solves that problem. One command brings any tree up to current.
 > 
 > so are you saying that ketchup should be used for _all_ access to the vanilla 
 > tree that isn't done via git?
 > if not then tarballs still have a place.

I think you have a misunderstanding over what ketchup is/does.
It cannot usurp tarballs by its very nature. It retrieves tarballs (if necessary)
and whatever patches are necessary to get to the tree you want.
http://www.selenic.com/ketchup/
 
 > and how does ketchup deal with patched trees to start with?

By unpatching if necessary.

 > > > also people could be behind a firewall that prevents git from working properly,
 > > > for them tarballs and patches are the right way of doing things.
 > >
 > > If they can't git through a firewall, they won't be able to wget a tarball through
 > > it either.
 > 
 > to work properly git should talk its own protocol, http/ftp can be allowed (and 
 > authenticated) through firewalls that don't allow the git protocol.

'properly' is the wrong word here. optimally, yes, but the firewall argument
alone isn't sufficient to claim git can't be used to clone a tree.
A tree cloned over http: vs one over git: has exactly the same information in
it. All the history, all the changes. Everything.

	Dave


* Re: Smaller compressed kernel source tarballs?
       [not found]                   ` <20060921193823.ec49d446.seanlkml@sympatico.ca>
@ 2006-09-21 23:38                     ` Sean
  0 siblings, 0 replies; 43+ messages in thread
From: Sean @ 2006-09-21 23:38 UTC (permalink / raw)
  To: David Lang
  Cc: Dave Jones, Dax Kelson, Lennart Sorensen, Linux kernel,
	Linus Torvalds

On Thu, 21 Sep 2006 15:34:53 -0700 (PDT)
David Lang <dlang@digitalinsight.com> wrote:

> I was responding to the (apparent) argument that with git and ketchup people 
> should not ever be downloading tarballs, so something that cuts the size of a 
> tarball in half doesn't make any difference.

Sure there are some cases where tarballs are more appropriate, but with git
and maybe some of the other tools it should really be the minority situation.
I wonder how many people just use tarballs out of inertia.  All said though
saving a few bytes of bandwidth by making the tarballs smaller can't hurt.

Sean


* Re: Smaller compressed kernel source tarballs?
  2006-09-21 21:43   ` H. Peter Anvin
@ 2006-09-22 14:00     ` Lennart Sorensen
  2006-09-22 16:13       ` H. Peter Anvin
  2006-09-22 16:13       ` Jan Engelhardt
  0 siblings, 2 replies; 43+ messages in thread
From: Lennart Sorensen @ 2006-09-22 14:00 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Dax Kelson, Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 02:43:46PM -0700, H. Peter Anvin wrote:
> 7zip (LZMA) decompresses quickly, and the decompressor text is actually 
> smaller than the equivalent for gzip.  Quite nice.
> 
> What is not nice is the code for the compressor, which is a total mess. 
>  I have been holding out on implementing LZMA on kernel.org, because 
> just as zip (deflate) didn't become common in the Unix world until an 
> encapsulation format that handles things expected in the Unix world, 
> e.g. streaming, was created (gzip), I don't think LZMA is going to be 
> widely used until there is an "lzip" which does the same thing.  I 
> actually started the work of adding LZMA support to gzip, but then 
> realized it would be better if a new encapsulation format with proper 
> 64-bit support everywhere was created.

It doesn't handle streaming?

So you can't do: tar c dirname | 7zip dirname.tar.7z ?

--
Len Sorensen
RuggedCom


* Re: Smaller compressed kernel source tarballs?
  2006-09-21 21:40   ` Dax Kelson
@ 2006-09-22 14:00     ` Lennart Sorensen
  0 siblings, 0 replies; 43+ messages in thread
From: Lennart Sorensen @ 2006-09-22 14:00 UTC (permalink / raw)
  To: Dax Kelson; +Cc: Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 03:40:09PM -0600, Dax Kelson wrote:
> Decompression times on 2.6.18 are as follows:
> 
> gzip:   0m3.509s
> 7zip:   0m10.012s
> bzip2:  0m22.703s

Hmm, not bad.

--
Len Sorensen


* Re: Smaller compressed kernel source tarballs?
  2006-09-22 14:00     ` Lennart Sorensen
@ 2006-09-22 16:13       ` H. Peter Anvin
  2006-09-22 16:13       ` Jan Engelhardt
  1 sibling, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2006-09-22 16:13 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: Dax Kelson, Linux kernel, Linus Torvalds

Lennart Sorensen wrote:
> On Thu, Sep 21, 2006 at 02:43:46PM -0700, H. Peter Anvin wrote:
>> 7zip (LZMA) decompresses quickly, and the decompressor text is actually 
>> smaller than the equivalent for gzip.  Quite nice.
>>
>> What is not nice is the code for the compressor, which is a total mess. 
>>  I have been holding out on implementing LZMA on kernel.org, because 
>> just as zip (deflate) didn't become common in the Unix world until an 
>> encapsulation format that handles things expected in the Unix world, 
>> e.g. streaming, was created (gzip), I don't think LZMA is going to be 
>> widely used until there is an "lzip" which does the same thing.  I 
>> actually started the work of adding LZMA support to gzip, but then 
>> realized it would be better if a new encapsulation format with proper 
>> 64-bit support everywhere was created.
> 
> It doesn't handle streaming?
> 
> So you can't do: tar c dirname | 7zip dirname.tar.7z ?
> 

Nope, and in particular you can't do:

tar cf - dirname | 7zip | ssh ...

This is because 7zip is an archiving format in its own right, much like 
zip.  What we want is something that is to 7zip what gzip is to zip.
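
As an illustration of the gzip-style behaviour described here, the sketch below feeds data through a streaming compressor and writes the result to a sink that cannot seek. It is only a sketch under stated assumptions: it uses Python's standard `lzma` module and its .xz container, neither of which is mentioned in this thread, and `UnseekablePipe` and `stream_compress` are hypothetical names made up for the example.

```python
import io
import lzma


class UnseekablePipe(io.RawIOBase):
    """Hypothetical stand-in for a pipe or socket: write-only, no seeking."""

    def __init__(self):
        super().__init__()
        self.received = bytearray()

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, data):
        self.received += data
        return len(data)


def stream_compress(chunks, sink):
    """Compress an iterable of byte chunks, writing output as it is produced."""
    # Assumption: the .xz container, a stream format in the spirit of gzip.
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
    for chunk in chunks:
        sink.write(compressor.compress(chunk))
    # flush() emits the remaining data and the stream footer; no seek is needed.
    sink.write(compressor.flush())


pipe = UnseekablePipe()
chunks = [b"linux-2.6.18 " * 1000, b"Makefile " * 1000]
stream_compress(chunks, pipe)
print(len(pipe.received), "compressed bytes written without seeking")
```

The point of the design is that the compressor never needs to go back and patch an earlier header, which is what makes it usable in the middle of a pipeline.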

	-hpa


* Re: Smaller compressed kernel source tarballs?
  2006-09-22 14:00     ` Lennart Sorensen
  2006-09-22 16:13       ` H. Peter Anvin
@ 2006-09-22 16:13       ` Jan Engelhardt
  2006-09-22 16:33         ` H. Peter Anvin
  1 sibling, 1 reply; 43+ messages in thread
From: Jan Engelhardt @ 2006-09-22 16:13 UTC (permalink / raw)
  To: Lennart Sorensen; +Cc: H. Peter Anvin, Dax Kelson, Linux kernel, Linus Torvalds

>> widely used until there is an "lzip" which does the same thing.  I 
>> actually started the work of adding LZMA support to gzip, but then 
>> realized it would be better if a new encapsulation format with proper 
>> 64-bit support everywhere was created.
>
>It doesn't handle streaming?
>
>So you can't do: tar c dirname | 7zip dirname.tar.7z ?

man 7z [slightly changed for reasonability]:

  -si
      Read data from StdIn (eg: tar -c directory | 7z a -si directory.tar.7z)



Jan Engelhardt
-- 


* Re: Smaller compressed kernel source tarballs?
  2006-09-22 16:13       ` Jan Engelhardt
@ 2006-09-22 16:33         ` H. Peter Anvin
  2006-09-22 17:41           ` Johannes Stezenbach
  0 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2006-09-22 16:33 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

Jan Engelhardt wrote:
>>> widely used until there is an "lzip" which does the same thing.  I 
>>> actually started the work of adding LZMA support to gzip, but then 
>>> realized it would be better if a new encapsulation format with proper 
>>> 64-bit support everywhere was created.
>> It doesn't handle streaming?
>>
>> So you can't do: tar c dirname | 7zip dirname.tar.7z ?
> 
> man 7z [slightly changed for reasonability]:
> 
>   -si
>       Read data from StdIn (eg: tar -c directory | 7z a -si directory.tar.7z)
> 

Yes, but you can't make it write to an unseekable stdout.

	-hpa


* Re: Smaller compressed kernel source tarballs?
  2006-09-22 16:33         ` H. Peter Anvin
@ 2006-09-22 17:41           ` Johannes Stezenbach
  2006-09-22 18:09             ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Johannes Stezenbach @ 2006-09-22 17:41 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel,
	Linus Torvalds

On Fri, Sep 22, 2006 at 09:33:01AM -0700, H. Peter Anvin wrote:
> Jan Engelhardt wrote:
> >>>widely used until there is an "lzip" which does the same thing.  I 
> >>>actually started the work of adding LZMA support to gzip, but then 
> >>>realized it would be better if a new encapsulation format with proper 
> >>>64-bit support everywhere was created.
> >>It doesn't handle streaming?
> >>
> >>So you can't do: tar c dirname | 7zip dirname.tar.7z ?
> >
> >man 7z [slightly changed for reasonability]:
> >
> >  -si
> >      Read data from StdIn (eg: tar -c directory | 7z a -si 
> >      directory.tar.7z)
> >
> 
> Yes, but you can't make it write to an unseekable stdout.

It seems the "lzma" program from LZMA Utils can:

http://tukaani.org/lzma/
  "Very similar command line interface than what gzip and bzip2 have."

(Debian sid has this in the "lzma" package.)


Johannes


* Re: Smaller compressed kernel source tarballs?
  2006-09-22 17:41           ` Johannes Stezenbach
@ 2006-09-22 18:09             ` H. Peter Anvin
  2006-09-22 18:19               ` Michael Tokarev
  0 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2006-09-22 18:09 UTC (permalink / raw)
  To: Johannes Stezenbach
  Cc: Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel,
	Linus Torvalds

Johannes Stezenbach wrote:
> 
> It seems the "lzma" program from LZMA Utils can:
> 
> http://tukaani.org/lzma/
>   "Very similar command line interface than what gzip and bzip2 have."
> 
> (Debian sid has this in the "lzma" package.)
> 

Yes, it can.  If that's the way things go then I don't mind it, however, 
my biggest problem with lzma utils is that the command line parsing is 
done in a shell script wrapper.

Maybe I'll start using it anyway...

	-hpa



* Re: Smaller compressed kernel source tarballs?
  2006-09-22 18:09             ` H. Peter Anvin
@ 2006-09-22 18:19               ` Michael Tokarev
  2006-09-22 18:26                 ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Michael Tokarev @ 2006-09-22 18:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Johannes Stezenbach, Jan Engelhardt, Lennart Sorensen, Dax Kelson,
	Linux kernel, Linus Torvalds

H. Peter Anvin wrote:
> Johannes Stezenbach wrote:
>>
>> It seems the "lzma" program from LZMA Utils can:
>>
>> http://tukaani.org/lzma/
>>   "Very similar command line interface than what gzip and bzip2 have."
>>
>> (Debian sid has this in the "lzma" package.)
>>
> 
> Yes, it can.  If that's the way things go then I don't mind it, however,
> my biggest problem with lzma utils is that the command line parsing is
> done in a shell script wrapper.

Well, I don't see any shell code here, in /usr/bin/lzma as installed from
debian version 4.43-2.

But note that this lzma utility does not have any 'magic number' and does
no crc checks.  On the site it's said lzma(sdk) is under rewrite to support
new format with magic number and crc checks...

After reading this thread I wanted to teach GNU tar to automatically recognize
.tar.lzma archives - and failed, exactly because of the lack of magic number
at the start of a file...
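
To show why the missing magic number matters for this kind of auto-detection, here is a rough sketch in Python. The gzip and bzip2 signatures are the well-known ones; the xz signature and the `lzma` module's `FORMAT_ALONE` (the legacy headerless .lzma layout) are assumptions brought in for the example and are not discussed in this thread. `sniff` is a hypothetical helper, not anything GNU tar actually contains.

```python
import bz2
import gzip
import lzma

# Leading bytes that identify each container format.
MAGIC = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
}


def sniff(header: bytes):
    """Return the format name if `header` starts with a known magic, else None."""
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return None


sample = b"example data " * 100
print(sniff(gzip.compress(sample)))
print(sniff(bz2.compress(sample)))
print(sniff(lzma.compress(sample)))
# The legacy .lzma layout begins with coder properties, not a signature,
# so a sniffer like this has nothing reliable to match against.
print(sniff(lzma.compress(sample, format=lzma.FORMAT_ALONE)))
```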

/mjt


* Re: Smaller compressed kernel source tarballs?
  2006-09-22 18:19               ` Michael Tokarev
@ 2006-09-22 18:26                 ` H. Peter Anvin
  2006-09-25 11:51                   ` Paulo Marques
  0 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2006-09-22 18:26 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Johannes Stezenbach, Jan Engelhardt, Lennart Sorensen, Dax Kelson,
	Linux kernel, Linus Torvalds

Michael Tokarev wrote:
> 
> Well, I don't see any shell code here, in /usr/bin/lzma as installed from
> Debian version 4.43-2.
> 
> But note that this lzma utility does not have any 'magic number' and does
> no crc checks.

Ah, right, that's a total killer.

> On the site it's said lzma(sdk) is under rewrite to support
> new format with magic number and crc checks...

That is an absolute must, IMO.  I would use the gzip format as a base.

	-hpa


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-09-22 18:26                 ` H. Peter Anvin
@ 2006-09-25 11:51                   ` Paulo Marques
  2006-09-25 15:47                     ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Paulo Marques @ 2006-09-25 11:51 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Tokarev, Johannes Stezenbach, Jan Engelhardt,
	Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

H. Peter Anvin wrote:
> Michael Tokarev wrote:
>>[...]
>> On the site it's said lzma(sdk) is under rewrite to support
>> new format with magic number and crc checks...
> 
> That is an absolute must, IMO.  I would use the gzip format as a base.

If you're suggesting a gzip-like format (but with different magic, 
etc.), that's ok.

However, it has been suggested on similar threads to use the CM field of 
the gzip format to introduce different compression methods.

While this is the purpose of this field, I find this to be a very bad 
idea. The worst part of it is that, after "lzma gzip" files start to 
proliferate, you never know if you can decompress a .gz with your 
version of gunzip, which is something that you currently take for granted.

If more formats start being supported inside gzip, this only gets worse...

-- 
Paulo Marques - www.grupopie.com

"The face of a child can say it all, especially the
mouth part of the face."

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-09-25 11:51                   ` Paulo Marques
@ 2006-09-25 15:47                     ` H. Peter Anvin
  0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2006-09-25 15:47 UTC (permalink / raw)
  To: Paulo Marques
  Cc: Michael Tokarev, Johannes Stezenbach, Jan Engelhardt,
	Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

Paulo Marques wrote:
> H. Peter Anvin wrote:
>> Michael Tokarev wrote:
>>> [...]
>>> On the site it's said lzma(sdk) is under rewrite to support
>>> new format with magic number and crc checks...
>>
>> That is an absolute must, IMO.  I would use the gzip format as a base.
> 
> If you're suggesting a gzip-like format (but with different magic, 
> etc.), that's ok.
> 
> However, it has been suggested on similar threads to use the CM field of 
> the gzip format to introduce different compression methods.
> 
> While this is the purpose of this field, I find this to be a very bad 
> idea. The worst part of it is that, after "lzma gzip" files start to 
> proliferate, you never know if you can decompress a .gz with your 
> version of gunzip, which is something that you currently take for granted.
> 
> If more formats start being supported inside gzip, this only gets worse...
> 

Doesn't mean that one should name the files .gz.

A more significant reason to not do this is that I think there are a lot 
of programs out there which only check the magic number and not the 
compression method.
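
As an illustration of that concern, here is a minimal Python sketch of a
header check that looks at both the magic number and the CM byte. The layout
(two magic bytes followed by CM, with 8 meaning deflate) follows the gzip
format specification; the function name `check_gzip_header` and the returned
strings are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"
CM_DEFLATE = 8  # the compression method that gunzip implements


def check_gzip_header(data: bytes) -> str:
    """Classify a buffer by its gzip magic number and CM (compression method) byte."""
    if len(data) < 3 or data[:2] != GZIP_MAGIC:
        return "not gzip"
    method = data[2]
    if method != CM_DEFLATE:
        # A reader that stops at the magic number would never notice this case.
        return "gzip container, unsupported method %d" % method
    return "gzip/deflate"


if __name__ == "__main__":
    blob = gzip.compress(b"hello")
    print(check_gzip_header(blob))
    # Simulate a file that reuses the gzip container with another CM value.
    forged = blob[:2] + bytes([9]) + blob[3:]
    print(check_gzip_header(forged))
```

A program that tests only the first two bytes would hand the second buffer to
its deflate decoder and fail later, with a much less helpful error.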

	-hpa

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02  3:35 Smaller compressed kernel source tarballs? Drew Scott Daniels
@ 2006-10-02  3:32 ` Bernd Eckenfels
  2006-10-02  3:35 ` Willy Tarreau
  1 sibling, 0 replies; 43+ messages in thread
From: Bernd Eckenfels @ 2006-10-02  3:32 UTC (permalink / raw)
  To: linux-kernel

In article <20061002033511.GB12695@zimmer> you wrote:
> The pace of compression algorithm development is high enough that I'd
> suggest that the bar be placed quite high before switching to a new
> compression format that's not reverse compatible.
> 
> For those interested, I'm working on publishing a proof of concept that 
> can make most tarballs compress better. About 2-3% better in my tests 
> with bzip2/gzip on the Linux kernel source code.

3% is not a high bar.

Gruss
Bernd

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
@ 2006-10-02  3:35 Drew Scott Daniels
  2006-10-02  3:32 ` Bernd Eckenfels
  2006-10-02  3:35 ` Willy Tarreau
  0 siblings, 2 replies; 43+ messages in thread
From: Drew Scott Daniels @ 2006-10-02  3:35 UTC (permalink / raw)
  To: linux-kernel

ppmd, also in Debian, had better compression than lzma. PAQ8i has even
better compression, but isn't in Debian. See the maximumcompression web
site or other archive comparison tests.

The pace of compression algorithm development is high enough that I'd
suggest that the bar be placed quite high before switching to a new
compression format that's not reverse compatible.

For those interested, I'm working on publishing a proof of concept that 
can make most tarballs compress better. About 2-3% better in my tests 
with bzip2/gzip on the Linux kernel source code.

     Drew Daniels
Resume: http://www.boxheap.net/ddaniels/resume.html


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02  3:35 Smaller compressed kernel source tarballs? Drew Scott Daniels
  2006-10-02  3:32 ` Bernd Eckenfels
@ 2006-10-02  3:35 ` Willy Tarreau
       [not found]   ` <Pine.LNX.4.63.0610012205280.28534@qynat.qvtvafvgr.pbz>
                     ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Willy Tarreau @ 2006-10-02  3:35 UTC (permalink / raw)
  To: Drew Scott Daniels; +Cc: linux-kernel

On Sun, Oct 01, 2006 at 10:35:11PM -0500, Drew Scott Daniels wrote:
> ppmd, also in Debian, had better compression than lzma. PAQ8i has even
> better compression, but isn't in Debian. See the maximumcompression web
> site or other archive comparison tests.

Interesting. But I suspect that you have not checked the compression time.
PAQ8I, for instance, is between 100 and 300 times SLOWER than bzip2 and achieves
output about 30% smaller! Given that the kernel already takes a very long time to
compress with bzip2, it would take several hours to compress it with such
tools. While they're very interesting proofs of concept for compression
research, they're not suited to any real-world usage!

> The pace of compression algorithm development is high enough that I'd
> suggest that the bar be placed quite high before switching to a new
> compression format that's not reverse compatible.

At least, ppmd takes the same time as bzip2 to achieve about 12% better
compression. But I don't think it justifies a switch.
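
The size-versus-time trade-off is easy to measure. The following is a rough
Python sketch, not the procedure used for the numbers above: it covers only
the codecs in the Python standard library (zlib as a stand-in for gzip, bz2,
lzma), not ppmd or PAQ, and the `measure` helper is a hypothetical name.

```python
import bz2
import lzma
import time
import zlib


def measure(data: bytes) -> dict:
    """Return {codec name: (compressed size in bytes, seconds taken)}."""
    codecs = {
        "zlib": lambda d: zlib.compress(d, 9),
        "bzip2": lambda d: bz2.compress(d, 9),
        "lzma": lzma.compress,
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(data)
        results[name] = (len(compressed), time.perf_counter() - start)
    return results


if __name__ == "__main__":
    # Synthetic input; a real test would read the kernel tarball instead.
    sample = b"static int foo(struct bar *b) { return b->baz; }\n" * 20000
    for name, (size, seconds) in measure(sample).items():
        print("%-6s %10d bytes %8.3f s" % (name, size, seconds))
```

Timings depend heavily on the machine and on the input, so results from a
synthetic sample say little about the kernel tree itself.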

> For those interested, I'm working on publishing a proof of concept that 
> can make most tarballs compress better. About 2-3% better in my tests 
> with bzip2/gzip on the Linux kernel source code.

A lot of improvement can be made in tar to better compress archives with a
large number of small files, such as the kernel. You just have to see the
difference in archive size depending on the base directory name. If you
come up with something really interesting which alters neither the output
format nor the compression time, it might get a place in the git-tar-tree
command. But IMHO, it would be more interesting to further reduce patch
size than tarball size, since patches might be downloaded far more often.

Regards,
Willy


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02  3:35 ` Willy Tarreau
       [not found]   ` <Pine.LNX.4.63.0610012205280.28534@qynat.qvtvafvgr.pbz>
@ 2006-10-02  5:11   ` David Lang
  2006-10-02  5:49     ` Willy Tarreau
  2006-10-02 15:16     ` Phillip Susi
  2006-10-03 10:28   ` Jan Engelhardt
  2 siblings, 2 replies; 43+ messages in thread
From: David Lang @ 2006-10-02  5:11 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Drew Scott Daniels, linux-kernel

On Mon, 2 Oct 2006, Willy Tarreau wrote:

> A lot of improvement can be made in tar to better compress archives with a
> large number of small files, such as the kernel. You just have to see the
> difference in archive size depending on the base directory name. If you
> come up with something really interesting which alters neither the output
> format nor the compression time, it might get a place in the git-tar-tree
> command. But IMHO, it would be more interesting to further reduce patch
> size than tarball size, since patches might be downloaded far more often.

I just had what's probably a silly thought.

as an alternative to using tar, what about using a git pack?

create a git archive with no history, just the current files, and then pack it 
with aggressive delta options.

since git uses compression on the result anyway it's unlikely to be much worse 
than a tarball, and since it can use deltas across files it may even be better 
(potentially enough better to cover the cost of downloading the git binaries)

this would be especially effective once git adds a 'shallow clone' capability to 
then take the snapshot pack and extend it (either forward or backward as 
requested by the user), but may be worth doing even without this.

thoughts?

David Lang

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02  5:11   ` David Lang
@ 2006-10-02  5:49     ` Willy Tarreau
  2006-10-02 15:16     ` Phillip Susi
  1 sibling, 0 replies; 43+ messages in thread
From: Willy Tarreau @ 2006-10-02  5:49 UTC (permalink / raw)
  To: David Lang; +Cc: Drew Scott Daniels, linux-kernel

On Sun, Oct 01, 2006 at 10:11:49PM -0700, David Lang wrote:
> On Mon, 2 Oct 2006, Willy Tarreau wrote:
> 
> >A lot of improvement can be made in tar to better compress archives with a
> >large number of small files, such as the kernel. You just have to see the
> >difference in archive size depending on the base directory name. If you
> >come up with something really interesting which alters neither the output
> >format nor the compression time, it might get a place in the git-tar-tree
> >command. But IMHO, it would be more interesting to further reduce patch
> >size than tarball size, since patches might be downloaded far more often.
> 
> I just had what's probably a silly thought.
> 
> as an alternative to using tar, what about using a git pack?

Nice idea, but I tried on 2.4 : 43 MB for git-pack vs 38 for tar.gz and
31 for tar.bz2. However, it is blazingly fast. 4 seconds vs 30 for tar.gz
(hot cache).

When speed is important, it's a clear winner. When size matters, it's not
the best solution.

Regards,
Willy


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02  5:11   ` David Lang
  2006-10-02  5:49     ` Willy Tarreau
@ 2006-10-02 15:16     ` Phillip Susi
  2006-10-02 15:48       ` David Lang
  1 sibling, 1 reply; 43+ messages in thread
From: Phillip Susi @ 2006-10-02 15:16 UTC (permalink / raw)
  To: David Lang; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel

David Lang wrote:
> I just had what's probably a silly thought.
> 
> as an alternative to using tar, what about using a git pack?
> 
> create a git archive with no history, just the current files, and then 
> pack it with aggressive delta options.
> 

Isn't that what a patch.gz is?  Diff generates the deltas and then they 
are compressed.  Can't get much simpler or better than that.

> since git uses compression on the result anyway it's unlikely to be much 
> worse than a tarball, and since it can use deltas across files it may 
> even be better (potentially enough better to cover the cost of 
> downloading the git binaries)
> 
> this would be especially effective once git adds a 'shallow clone' 
> capability to then take the snapshot pack and extend it (either forward 
> or backward as requested by the user), but may be worth doing even 
> without this.
> 
> thoughts?
> 
> David Lang


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02 15:16     ` Phillip Susi
@ 2006-10-02 15:48       ` David Lang
  2006-10-02 20:20         ` Phillip Susi
  0 siblings, 1 reply; 43+ messages in thread
From: David Lang @ 2006-10-02 15:48 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel

On Mon, 2 Oct 2006, Phillip Susi wrote:

> David Lang wrote:
>> I just had what's probably a silly thought.
>> 
>> as an alternative to using tar, what about using a git pack?
>> 
>> create a git archive with no history, just the current files, and then pack 
>> it with aggressive delta options.
>> 
>
> Isn't that what a patch.gz is?  Diff generates the deltas and then they are 
> compressed.  Can't get much simpler or better than that.

not quite, a git pack includes everything you need to get the full source, a 
patch.gz requires that you have the prior version of the source to start with.

David Lang

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02 20:20         ` Phillip Susi
@ 2006-10-02 20:12           ` David Lang
  2006-10-02 20:35             ` Willy Tarreau
       [not found]             ` <20061002203527.GA585@1wt.eu>
  0 siblings, 2 replies; 43+ messages in thread
From: David Lang @ 2006-10-02 20:12 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel

no, I was suggesting a pack file that contained _only_ the head version.

within the pack file it would delta against other files in the pack (how many 
copies of the GPLv2 text exist across all files for example)

however Willy did a test and found that the resulting pack was significantly 
larger than a .tgz. I don't know what options he used, so while there's some 
chance that being more aggressive in looking for deltas would result in an 
improvement, the difference to make up is fairly significant.

David Lang

On Mon, 2 Oct 2006, Phillip Susi wrote:

> Date: Mon, 02 Oct 2006 16:20:40 -0400
> From: Phillip Susi <psusi@cfl.rr.com>
> To: David Lang <dlang@digitalinsight.com>
> Cc: Willy Tarreau <w@1wt.eu>, Drew Scott Daniels <ddaniels@UMAlumni.mb.ca>,
>     linux-kernel@vger.kernel.org
> Subject: Re: Smaller compressed kernel source tarballs?
> 
> It sounded like you were talking about a modified pack file that did NOT 
> contain everything you need to get the current source.  You said it would 
> have no history and use aggressive delta compression to achieve a smaller 
> size than a full tarball.  If the pack contains the full previous version and 
> the delta to the head version, then it will be larger than the tar, not 
> smaller.
>
> David Lang wrote:
>> On Mon, 2 Oct 2006, Phillip Susi wrote:
>> 
>>> David Lang wrote:
>>>> I just had what's probably a silly thought.
>>>> 
>>>> as an alternative to using tar, what about using a git pack?
>>>> 
>>>> create a git archive with no history, just the current files, and then 
>>>> pack it with aggressive delta options.
>>>> 
>>> 
>>> Isn't that what a patch.gz is?  Diff generates the deltas and then they 
>>> are compressed.  Can't get much simpler or better than that.
>> 
>> not quite, a git pack includes everything you need to get the full source, 
>> a patch.gz requires that you have the prior version of the source to start 
>> with.
>> 
>> David Lang
>
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02 15:48       ` David Lang
@ 2006-10-02 20:20         ` Phillip Susi
  2006-10-02 20:12           ` David Lang
  0 siblings, 1 reply; 43+ messages in thread
From: Phillip Susi @ 2006-10-02 20:20 UTC (permalink / raw)
  To: David Lang; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel

It sounded like you were talking about a modified pack file that did NOT 
contain everything you need to get the current source.  You said it 
would have no history and use aggressive delta compression to achieve a 
smaller size than a full tarball.  If the pack contains the full 
previous version and the delta to the head version, then it will be 
larger than the tar, not smaller.

David Lang wrote:
> On Mon, 2 Oct 2006, Phillip Susi wrote:
> 
>> David Lang wrote:
>>> I just had what's probably a silly thought.
>>>
>>> as an alternative to using tar, what about using a git pack?
>>>
>>> create a git archive with no history, just the current files, and 
>>> then pack it with aggressive delta options.
>>>
>>
>> Isn't that what a patch.gz is?  Diff generates the deltas and then 
>> they are compressed.  Can't get much simpler or better than that.
> 
> not quite, a git pack includes everything you need to get the full 
> source, a patch.gz requires that you have the prior version of the 
> source to start with.
> 
> David Lang


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02 20:12           ` David Lang
@ 2006-10-02 20:35             ` Willy Tarreau
       [not found]             ` <20061002203527.GA585@1wt.eu>
  1 sibling, 0 replies; 43+ messages in thread
From: Willy Tarreau @ 2006-10-02 20:35 UTC (permalink / raw)
  To: David Lang; +Cc: Phillip Susi, Drew Scott Daniels, linux-kernel

On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
> no, I was suggesting a pack file that contained _only_ the head version.
> 
> within the pack file it would delta against other files in the pack (how 
> many copies of the GPLv2 text exist across all files for example)
> 
> however Willy did a test and found that the resulting pack was 
> significantly larger than a .tgz. I don't know what options he used, so 
> while there's some chance that being more aggressive in looking for deltas 
> would result in an improvement, the difference to make up is fairly 
> significant.

no options at all, so there may be room for improvement. Also, on my
notebook, I have hardlinked all my linux directories so that each
content only appears once. I don't have the numbers right here, but
I remember that it was really useful to merge lots of different versions,
but that the net gain within one given tree was really minor, as there
are not that many identical files in one tree.
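
How many byte-identical files a single tree contains can be checked with a
short script. This is a minimal Python sketch, not the tool used for the
hardlinking described above; `duplicate_groups` is a hypothetical helper. Note
that it only finds whole-file duplicates, whereas pack deltas could also
exploit files that are merely similar.

```python
import hashlib
import os
from collections import defaultdict


def duplicate_groups(root: str) -> list:
    """Return sorted lists of paths under root whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks, only regular file contents matter here
            with open(path, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            by_digest[digest].append(path)
    # Paths that are already hardlinked to each other are reported too,
    # since they share the same content.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


if __name__ == "__main__":
    import sys

    tree = sys.argv[1] if len(sys.argv) > 1 else "."
    for group in duplicate_groups(tree):
        print(len(group), "copies:", ", ".join(group))
```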

Regards,
Willy


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
       [not found]                 ` <20061002174938.bb82027d.seanlkml@sympatico.ca>
@ 2006-10-02 21:42                   ` David Lang
  2006-10-03  2:48                   ` Willy Tarreau
  1 sibling, 0 replies; 43+ messages in thread
From: David Lang @ 2006-10-02 21:42 UTC (permalink / raw)
  To: Sean; +Cc: Willy Tarreau, Phillip Susi, Drew Scott Daniels, linux-kernel

On Mon, 2 Oct 2006, Sean wrote:

> On Mon, 2 Oct 2006 22:35:27 +0200
> Willy Tarreau <w@1wt.eu> wrote:
>
>> On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
>>> no, I was suggesting a pack file that contained _only_ the head version.
>>>
>>> within the pack file it would delta against other files in the pack (how
>>> many copies of the GPLv2 text exist across all files for example)
>>>
>>> however Willy did a test and found that the resulting pack was
>>> significantly larger than a .tgz. I don't know what options he used, so
>>> while there's some chance that being more aggressive in looking for deltas
>>> would result in an improvement, the difference to make up is fairly
>>> significant.
>>
>> no options at all, so there may be room for improvement. Also, on my
>> notebook, I have hardlinked all my linux directories so that each
>> content only appears once. I don't have the numbers right here, but
>> I remember that it was really useful to merge lots of different versions,
>> but that the net gain within one given tree was really minor, as there
>> are not that many identical files in one tree.
>
> Hey Willy,
>
> I don't really understand the objective here, but you may want to double
> check your procedure, the entire 2.4 history only takes a single 41M pack
> in Git for me.

the idea was to use a git pack instead of a .tgz or .tar.bz2 as a distribution 
format from kernel.org

for example, the pack would only include the 2.6.18 kernel, no history.

once git supports shallow clones then the distributed blob could be a clone seed 
that a person could download and then track changes from there forward. but 
that's a future enhancement.

David Lang

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
       [not found]                 ` <20061002174938.bb82027d.seanlkml@sympatico.ca>
@ 2006-10-02 21:49               ` Sean
       [not found]                 ` <20061002174938.bb82027d.seanlkml@sympatico.ca>
  2006-10-03  2:48                   ` Willy Tarreau
  1 sibling, 1 reply; 43+ messages in thread
From: Sean @ 2006-10-02 21:49 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Lang, Phillip Susi, Drew Scott Daniels, linux-kernel

On Mon, 2 Oct 2006 22:35:27 +0200
Willy Tarreau <w@1wt.eu> wrote:

> On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
> > no, I was suggesting a pack file that contained _only_ the head version.
> > 
> > within the pack file it would delta against other files in the pack (how 
> > many copies of the GPLv2 text exist across all files for example)
> > 
> > however Willy did a test and found that the resulting pack was 
> > significantly larger than a .tgz. I don't know what options he used, so 
> > while there's some chance that being more aggressive in looking for deltas 
> > would result in an improvement, the difference to make up is fairly 
> > significant.
> 
> no options at all, so there may be room for improvement. Also, on my
> notebook, I have hardlinked all my linux directories so that each
> content only appears once. I don't have the numbers right here, but
> I remember that it was really useful to merge lots of different versions,
> but that the net gain within one given tree was really minor, as there
> are not that many identical files in one tree.

Hey Willy,

I don't really understand the objective here, but you may want to double
check your procedure, the entire 2.4 history only takes a single 41M pack
in Git for me.

Sean

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
       [not found]                 ` <20061002174938.bb82027d.seanlkml@sympatico.ca>
  2006-10-02 21:42                   ` David Lang
@ 2006-10-03  2:48                   ` Willy Tarreau
  1 sibling, 0 replies; 43+ messages in thread
From: Willy Tarreau @ 2006-10-03  2:48 UTC (permalink / raw)
  To: Sean; +Cc: David Lang, Phillip Susi, Drew Scott Daniels, linux-kernel

On Mon, Oct 02, 2006 at 05:49:38PM -0400, Sean wrote:
> On Mon, 2 Oct 2006 22:35:27 +0200
> Willy Tarreau <w@1wt.eu> wrote:
> 
> > On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
> > > no, I was suggesting a pack file that contained _only_ the head version.
> > > 
> > > within the pack file it would delta against other files in the pack (how 
> > > many copies of the GPLv2 text exist across all files for example)
> > > 
> > > however Willy did a test and found that the resulting pack was 
> > > significantly larger than a .tgz. I don't know what options he used, so 
> > > while there's some chance that being more aggressive in looking for deltas 
> > > would result in an improvement, the difference to make up is fairly 
> > > significant.
> > 
> > no options at all, so there may be room for improvement. Also, on my
> > notebook, I have hardlinked all my linux directories so that each
> > content only appears once. I don't have the numbers right here, but
> > I remember that it was really useful to merge lots of different versions,
> > but that the net gain within one given tree was really minor, as there
> > are not that many identical files in one tree.
> 
> Hey Willy,
> 
> I don't really understand the objective here, but you may want to double
> check your procedure, the entire 2.4 history only takes a single 41M pack
> in Git for me.

I'm not really surprised, as GIT history begins at 2.4.32 and recent
2.4 patches are very small. So basically, the size is about the same
for the latest 2.4 and all 2.4 history.

Willy


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-02  3:35 ` Willy Tarreau
       [not found]   ` <Pine.LNX.4.63.0610012205280.28534@qynat.qvtvafvgr.pbz>
  2006-10-02  5:11   ` David Lang
@ 2006-10-03 10:28   ` Jan Engelhardt
  2006-10-03 18:24     ` Phillip Susi
  2 siblings, 1 reply; 43+ messages in thread
From: Jan Engelhardt @ 2006-10-03 10:28 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Drew Scott Daniels, linux-kernel


>> ppmd, also in Debian, had better compression than lzma. PAQ8i has even
>> better compression, but isn't in Debian. See the maximumcompression web
>> site or other archive comparison tests.
>
>Interesting. But I suspect that you have not checked the compression time.
>PAQ8I, for instance, is between 100 and 300 times SLOWER than bzip2 and achieves
>output about 30% smaller! Given that the kernel already takes a very long time to
>compress with bzip2, it would take several hours to compress it with such
>tools. While they're very interesting proofs of concept for compression
>research, they're not suited to any real-world usage!

There are lots of obscure compression formats that achieve somewhat 
better compression at the cost of MUCH more time (neglecting they are 
not too open), such as MS CAB and ACE.


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Smaller compressed kernel source tarballs?
  2006-10-03 10:28   ` Jan Engelhardt
@ 2006-10-03 18:24     ` Phillip Susi
  2006-10-04 15:57       ` Compressing pages [was: Re: Smaller compressed kernel source tarballs?] Jörn Engel
  0 siblings, 1 reply; 43+ messages in thread
From: Phillip Susi @ 2006-10-03 18:24 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel

Jan Engelhardt wrote:
> There are lots of obscure compression formats that achieve somewhat 
> better compression at the cost of MUCH more time (neglecting they are 
> not too open), such as MS CAB and ACE.

CAB is an archive container format, not a compression algorithm.  Last 
time I worked on some code to handle it, they used the standard DEFLATE 
algorithm implemented by gzip (but had the ability to support others in 
the future) and could only compress 32 KB blocks.  The small block size 
led to poor compression.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Compressing pages [was: Re: Smaller compressed kernel source tarballs?]
  2006-10-03 18:24     ` Phillip Susi
@ 2006-10-04 15:57       ` Jörn Engel
  0 siblings, 0 replies; 43+ messages in thread
From: Jörn Engel @ 2006-10-04 15:57 UTC (permalink / raw)
  To: Phillip Susi
  Cc: Jan Engelhardt, Willy Tarreau, Drew Scott Daniels, linux-kernel

On Tue, 3 October 2006 14:24:01 -0400, Phillip Susi wrote:
> Jan Engelhardt wrote:
> >There are lots of obscure compression formats that achieve somewhat 
> >better compression at the cost of MUCH more time (neglecting they are 
> >not too open), such as MS CAB and ACE.
> 
> CAB is an archive container format, not a compression algorithm.  Last 
> time I worked on some code to handle it, they used the standard DEFLATE 
> algorithm implemented by gzip (but had the ability to support others in 
> the future) and could only compress 32 KB blocks.  The small block size 
> led to poor compression.

Actually, compression in 4KiB blocks is a _very_ interesting
benchmark.  Jffs2 works with that size for compression and other
compressed filesystems likely do the same, although possibly with
something larger like 64KiB.

And the results are completely different in that benchmark.  Gzip
actually beats bzip2 hands-down on compression ratio, for example.

I used to have a script, but cannot find it anymore.  Basically
something like:

while (read next 4KiB from input file) {
	compress chunk
	add compressed_size to total
}
print total
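
The loop above could be sketched in Python roughly as follows. This is not the
lost script; `chunked_total` and `compare` are hypothetical names, and zlib is
used as a stand-in for gzip (same deflate algorithm, slightly smaller
per-chunk header).

```python
import bz2
import zlib


def chunked_total(data: bytes, compress, chunk_size: int = 4096) -> int:
    """Compress data in independent chunk_size blocks and return the summed output size."""
    total = 0
    for offset in range(0, len(data), chunk_size):
        total += len(compress(data[offset:offset + chunk_size]))
    return total


def compare(data: bytes, chunk_size: int = 4096) -> dict:
    """Return the per-chunk compressed totals for zlib and bzip2."""
    return {
        "zlib": chunked_total(data, zlib.compress, chunk_size),
        "bzip2": chunked_total(data, bz2.compress, chunk_size),
    }
```

Calling `compare(open(path, "rb").read())` on an input file gives the two
totals side by side; each chunk is compressed on its own, so no codec can
reuse history from earlier blocks.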

Jörn

-- 
Unless something dramatically changes, by 2015 we'll be largely
wondering what all the fuss surrounding Linux was really about.
-- Rob Enderle

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2006-10-04 15:58 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-02  3:35 Smaller compressed kernel source tarballs? Drew Scott Daniels
2006-10-02  3:32 ` Bernd Eckenfels
2006-10-02  3:35 ` Willy Tarreau
     [not found]   ` <Pine.LNX.4.63.0610012205280.28534@qynat.qvtvafvgr.pbz>
2006-10-02  5:11   ` David Lang
2006-10-02  5:49     ` Willy Tarreau
2006-10-02 15:16     ` Phillip Susi
2006-10-02 15:48       ` David Lang
2006-10-02 20:20         ` Phillip Susi
2006-10-02 20:12           ` David Lang
2006-10-02 20:35             ` Willy Tarreau
     [not found]             ` <20061002203527.GA585@1wt.eu>
2006-10-02 21:49               ` Sean
     [not found]                 ` <20061002174938.bb82027d.seanlkml@sympatico.ca>
2006-10-02 21:42                   ` David Lang
2006-10-03  2:48                   ` Willy Tarreau
2006-10-03 10:28   ` Jan Engelhardt
2006-10-03 18:24     ` Phillip Susi
2006-10-04 15:57       ` Compressing pages [was: Re: Smaller compressed kernel source tarballs?] Jörn Engel
  -- strict thread matches above, loose matches on Subject: below --
2006-09-21 20:32 Smaller compressed kernel source tarballs? Dax Kelson
     [not found] ` <20060921204250.GN13641@csclub.uwaterloo.ca>
2006-09-21 20:42 ` Lennart Sorensen
2006-09-21 21:40   ` Dax Kelson
2006-09-22 14:00     ` Lennart Sorensen
     [not found]   ` <20060921171747.9ae2b42e.seanlkml@sympatico.ca>
2006-09-21 21:17     ` Sean
2006-09-21 21:41     ` Dax Kelson
2006-09-21 21:50       ` Bob Copeland
     [not found]       ` <20060921175717.272c58ee.seanlkml@sympatico.ca>
2006-09-21 21:57         ` Sean
2006-09-21 22:00         ` David Lang
2006-09-21 22:24           ` Dave Jones
2006-09-21 22:16             ` David Lang
2006-09-21 22:40               ` Dave Jones
2006-09-21 22:34                 ` David Lang
     [not found]                   ` <20060921193823.ec49d446.seanlkml@sympatico.ca>
2006-09-21 23:38                     ` Sean
     [not found]         ` <Pine.LNX.4.63.0609211455570.17238@qynat.qvtvafvgr.pbz>
2006-09-21 22:25           ` Sean
     [not found]             ` <20060921182554.23044ca3.seanlkml@sympatico.ca>
2006-09-21 22:20               ` David Lang
2006-09-21 21:43   ` H. Peter Anvin
2006-09-22 14:00     ` Lennart Sorensen
2006-09-22 16:13       ` H. Peter Anvin
2006-09-22 16:13       ` Jan Engelhardt
2006-09-22 16:33         ` H. Peter Anvin
2006-09-22 17:41           ` Johannes Stezenbach
2006-09-22 18:09             ` H. Peter Anvin
2006-09-22 18:19               ` Michael Tokarev
2006-09-22 18:26                 ` H. Peter Anvin
2006-09-25 11:51                   ` Paulo Marques
2006-09-25 15:47                     ` H. Peter Anvin
