* Re: Smaller compressed kernel source tarballs?
@ 2006-10-02 3:35 Drew Scott Daniels
2006-10-02 3:32 ` Bernd Eckenfels
2006-10-02 3:35 ` Willy Tarreau
0 siblings, 2 replies; 43+ messages in thread
From: Drew Scott Daniels @ 2006-10-02 3:35 UTC
To: linux-kernel
ppmd, also in Debian, had better compression than lzma. PAQ8i has even
better compression, but isn't in Debian. See the maximumcompression web
site or other archive comparison tests.
The pace of compression algorithm development is high enough that I'd
suggest that the bar be placed quite high before switching to a new
compression format that's not reverse compatible.
For those interested, I'm working on publishing a proof of concept that
can make most tarballs compress better. About 2-3% better in my tests
with bzip2/gzip on the Linux kernel source code.
Drew Daniels
Resume: http://www.boxheap.net/ddaniels/resume.html
* Re: Smaller compressed kernel source tarballs?
From: Bernd Eckenfels @ 2006-10-02 3:32 UTC
To: linux-kernel
In article <20061002033511.GB12695@zimmer> you wrote:
> The pace of compression algorithm development is high enough that I'd
> suggest that the bar be placed quite high before switching to a new
> compression format that's not reverse compatible.
>
> For those interested, I'm working on publishing a proof of concept that
> can make most tarballs compress better. About 2-3% better in my tests
> with bzip2/gzip on the Linux kernel source code.
3% is not a high bar.
Gruss
Bernd
* Re: Smaller compressed kernel source tarballs?
From: Willy Tarreau @ 2006-10-02 3:35 UTC
To: Drew Scott Daniels; +Cc: linux-kernel
On Sun, Oct 01, 2006 at 10:35:11PM -0500, Drew Scott Daniels wrote:
> ppmd, also in Debian, had better compression than lzma. PAQ8i has even
> better compression, but isn't in Debian. See the maximumcompression web
> site or other archive comparison tests.
Interesting. But I suspect that you have not checked the compression time.
PAQ8I, for instance, is between 100 and 300 times SLOWER than bzip2, for
about 30% smaller output! Given that the kernel already takes a very long
time to compress with bzip2, it would take several hours to compress it with
such tools. While they're very interesting proofs of concept for compression
research, they're not suited to any real-world usage!
> The pace of compression algorithm development is high enough that I'd
> suggest that the bar be placed quite high before switching to a new
> compression format that's not reverse compatible.
At least ppmd takes the same time as bzip2 to achieve about 12% better
compression. But I don't think it justifies a switch.
> For those interested, I'm working on publishing a proof of concept that
> can make most tarballs compress better. About 2-3% better in my tests
> with bzip2/gzip on the Linux kernel source code.
A lot of improvement can be made in tar to better compress archives with
a large number of small files, such as the kernel. You just have to see the
difference in archive size depending on the base directory name. If you
come up with something really interesting which alters neither the output
format nor the compression time, it might get a place in the git-tar-tree
command. But IMHO, it would be more interesting to further reduce the size
of patches than that of tarballs, since patches might be downloaded far
more often.
Regards,
Willy
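Willy's remark that archive size depends on the base directory name can be measured directly. The following is a minimal sketch, not a tool from this thread: it assumes a source tree on local disk, and the helper names `load_tree`, `tar_bytes` and `sizes_for_base` are hypothetical. It rebuilds the same tarball in memory under different top-level directory names, with all other metadata fixed, and reports the raw and gzip-compressed sizes. It only lets you observe the difference; it does not claim to explain it.

```python
import gzip
import io
import os
import tarfile


def load_tree(tree_root):
    """Read every regular file under tree_root into {relative path: bytes}."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, tree_root).replace(os.sep, "/")
                with open(full, "rb") as f:
                    files[rel] = f.read()
    return files


def tar_bytes(files, base_dir):
    """Build an uncompressed GNU-format tar in memory, every member under base_dir.

    mtime, mode and owner use fixed values, so two archives built from the
    same files differ only in their path names.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for rel_path in sorted(files):
            data = files[rel_path]
            info = tarfile.TarInfo(name=f"{base_dir}/{rel_path}")
            info.size = len(data)
            info.mtime = 0
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def sizes_for_base(files, base_dir):
    """Return (raw tar bytes, gzip -9 bytes) for one choice of base directory name."""
    raw = tar_bytes(files, base_dir)
    return len(raw), len(gzip.compress(raw, compresslevel=9, mtime=0))
```

Usage: load the tree once with `files = load_tree("linux-2.6.18")`, then compare `sizes_for_base(files, "linux-2.6.18")` with `sizes_for_base(files, "l")`. The whole tree is held in memory, so expect a few hundred MB of RAM for a full kernel.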
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-10-02 5:11 UTC
To: Willy Tarreau; +Cc: Drew Scott Daniels, linux-kernel
On Mon, 2 Oct 2006, Willy Tarreau wrote:
> A lot of improvement can be made in tar to better compress archives with
> a large number of small files, such as the kernel. You just have to see the
> difference in archive size depending on the base directory name. If you
> come up with something really interesting which alters neither the output
> format nor the compression time, it might get a place in the git-tar-tree
> command. But IMHO, it would be more interesting to further reduce the size
> of patches than that of tarballs, since patches might be downloaded far
> more often.
I just had what's probably a silly thought.
as an alternative to using tar, what about using a git pack?
create a git archive with no history, just the current files, and then pack
it with aggressive delta options.
since git uses compression on the result anyway, it's unlikely to be much
worse than a tarball, and since it can use deltas across files it may even
be better (potentially enough better to cover the cost of downloading the
git binaries).
this would be especially effective once git adds a 'shallow clone'
capability to then take the snapshot pack and extend it (either forward or
backward as requested by the user), but may be worth doing even without
this.
thoughts?
David Lang
* Re: Smaller compressed kernel source tarballs?
From: Willy Tarreau @ 2006-10-02 5:49 UTC
To: David Lang; +Cc: Drew Scott Daniels, linux-kernel
On Sun, Oct 01, 2006 at 10:11:49PM -0700, David Lang wrote:
> On Mon, 2 Oct 2006, Willy Tarreau wrote:
>
> >A lot of improvement can be made in tar to better compress archives with
> >a large number of small files, such as the kernel. You just have to see the
> >difference in archive size depending on the base directory name. If you
> >come up with something really interesting which alters neither the output
> >format nor the compression time, it might get a place in the git-tar-tree
> >command. But IMHO, it would be more interesting to further reduce the size
> >of patches than that of tarballs, since patches might be downloaded far
> >more often.
>
> I just had what's probably a silly thought.
>
> as an alternative to using tar, what about using a git pack?
Nice idea, but I tried it on 2.4: 43 MB for the git pack vs 38 MB for tar.gz
and 31 MB for tar.bz2. However, it is blazingly fast: 4 seconds vs 30 for
tar.gz (hot cache). When speed is important, it's a clear winner. When size
matters, it's not the best solution.
Regards,
Willy
* Re: Smaller compressed kernel source tarballs?
From: Phillip Susi @ 2006-10-02 15:16 UTC
To: David Lang; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel
David Lang wrote:
> I just had what's probably a silly thought.
>
> as an alternative to using tar, what about using a git pack?
>
> create a git archive with no history, just the current files, and then
> pack it with aggressive delta options.
>
Isn't that what a patch.gz is? Diff generates the deltas and then they are
compressed. Can't get much simpler or better than that.
> since git uses compression on the result anyway, it's unlikely to be much
> worse than a tarball, and since it can use deltas across files it may
> even be better (potentially enough better to cover the cost of
> downloading the git binaries).
>
> this would be especially effective once git adds a 'shallow clone'
> capability to then take the snapshot pack and extend it (either forward
> or backward as requested by the user), but may be worth doing even
> without this.
>
> thoughts?
>
> David Lang
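Phillip's description of a patch.gz (diff produces the deltas, then the result is compressed) can be sketched with Python's standard library. This is an illustration under assumptions, not how kernel.org builds its patches: `difflib.unified_diff` stands in for diff(1), `gzip` for the compressor, and the function names are hypothetical. It also illustrates the objection raised in the replies: the patch is only small because the recipient already has the old version.

```python
import difflib
import gzip


def gzipped_patch_size(old_text, new_text, name="file.c"):
    """Size in bytes of a gzip-compressed unified diff from old_text to new_text."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    patch = "".join(diff).encode()
    return len(gzip.compress(patch, compresslevel=9, mtime=0))


def gzipped_full_size(text):
    """Size in bytes of the complete new version compressed on its own."""
    return len(gzip.compress(text.encode(), compresslevel=9, mtime=0))
```

For a large file with a one-line change, `gzipped_patch_size` comes out far smaller than `gzipped_full_size`, but applying the patch still requires the old file.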
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-10-02 15:48 UTC
To: Phillip Susi; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel
On Mon, 2 Oct 2006, Phillip Susi wrote:
> David Lang wrote:
>> I just had what's probably a silly thought.
>>
>> as an alternative to using tar, what about using a git pack?
>>
>> create a git archive with no history, just the current files, and then pack
>> it with aggressive delta options.
>>
>
> Isn't that what a patch.gz is? Diff generates the deltas and then they are
> compressed. Can't get much simpler or better than that.
not quite: a git pack includes everything you need to get the full source,
while a patch.gz requires that you have the prior version of the source to
start with.
David Lang
* Re: Smaller compressed kernel source tarballs?
From: Phillip Susi @ 2006-10-02 20:20 UTC
To: David Lang; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel
It sounded like you were talking about a modified pack file that did NOT
contain everything you need to get the current source. You said it would
have no history and use aggressive delta compression to achieve a smaller
size than a full tarball. If the pack contains the full previous version and
the delta to the head version, then it will be larger than the tar, not
smaller.
David Lang wrote:
> On Mon, 2 Oct 2006, Phillip Susi wrote:
>
>> David Lang wrote:
>>> I just had what's probably a silly thought.
>>>
>>> as an alternative to using tar, what about using a git pack?
>>>
>>> create a git archive with no history, just the current files, and
>>> then pack it with aggressive delta options.
>>>
>>
>> Isn't that what a patch.gz is? Diff generates the deltas and then
>> they are compressed. Can't get much simpler or better than that.
>
> not quite: a git pack includes everything you need to get the full
> source, while a patch.gz requires that you have the prior version of the
> source to start with.
>
> David Lang
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-10-02 20:12 UTC
To: Phillip Susi; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel
no, I was suggesting a pack file that contained _only_ the head version.
within the pack file it would delta against other files in the pack (how
many copies of the GPLv2 text exist across all files, for example?)
however Willy did a test and found that the resulting pack was
significantly larger than a .tgz. I don't know what options he used, so
while there's some chance that being more aggressive in looking for deltas
would result in an improvement, the difference to make up is fairly
significant.
David Lang
On Mon, 2 Oct 2006, Phillip Susi wrote:
> Date: Mon, 02 Oct 2006 16:20:40 -0400
> From: Phillip Susi <psusi@cfl.rr.com>
> To: David Lang <dlang@digitalinsight.com>
> Cc: Willy Tarreau <w@1wt.eu>, Drew Scott Daniels <ddaniels@UMAlumni.mb.ca>,
> linux-kernel@vger.kernel.org
> Subject: Re: Smaller compressed kernel source tarballs?
>
> It sounded like you were talking about a modified pack file that did NOT
> contain everything you need to get the current source. You said it would
> have no history and use aggressive delta compression to achieve a smaller
> size than a full tarball. If the pack contains the full previous version and
> the delta to the head version, then it will be larger than the tar, not
> smaller.
>
> David Lang wrote:
>> On Mon, 2 Oct 2006, Phillip Susi wrote:
>>
>>> David Lang wrote:
>>>> I just had what's probably a silly thought.
>>>>
>>>> as an alternative to using tar, what about using a git pack?
>>>>
>>>> create a git archive with no history, just the current files, and then
>>>> pack it with aggressive delta options.
>>>>
>>>
>>> Isn't that what a patch.gz is? Diff generates the deltas and then they
>>> are compressed. Can't get much simpler or better than that.
>>
>> not quite: a git pack includes everything you need to get the full source,
>> while a patch.gz requires that you have the prior version of the source to
>> start with.
>>
>> David Lang
* Re: Smaller compressed kernel source tarballs?
From: Willy Tarreau @ 2006-10-02 20:35 UTC
To: David Lang; +Cc: Phillip Susi, Drew Scott Daniels, linux-kernel
On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
> no, I was suggesting a pack file that contained _only_ the head version.
>
> within the pack file it would delta against other files in the pack (how
> many copies of the GPLv2 text exist across all files, for example?)
>
> however Willy did a test and found that the resulting pack was
> significantly larger than a .tgz. I don't know what options he used, so
> while there's some chance that being more aggressive in looking for deltas
> would result in an improvement, the difference to make up is fairly
> significant.
no options at all, so there may be room for improvement. Also, on my
notebook, I have hardlinked all my linux directories so that each
distinct file only appears once. I don't have the numbers right here, but
I remember that it was really useful to merge lots of different versions,
but that the net gain within one given tree was really minor, as there
are not that many identical files in one tree.
Regards,
Willy
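Willy's hardlinking experiment amounts to counting files with identical contents. The following is a minimal sketch, assuming local directory trees; the function name `duplicate_savings` is hypothetical. It hashes every regular file and reports how many files and bytes hardlinking duplicates would save, without modifying anything on disk. Running it on one tree and then on a directory holding several kernel versions would show the contrast he describes.

```python
import hashlib
import os
from collections import defaultdict


def duplicate_savings(root):
    """Return (duplicate count, bytes saved) if identical files under root were hardlinked.

    Files that already share an inode (existing hardlinks) are counted once.
    Nothing on disk is modified.
    """
    seen_inodes = set()
    groups = defaultdict(list)  # (content digest, size) -> paths with that content
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            inode = (st.st_dev, st.st_ino)
            if inode in seen_inodes:
                continue
            seen_inodes.add(inode)
            digest = hashlib.sha1()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(65536), b""):
                    digest.update(block)
            groups[(digest.hexdigest(), st.st_size)].append(path)
    # Every copy after the first in each group could become a hardlink.
    duplicates = sum(len(paths) - 1 for paths in groups.values())
    saved = sum(size * (len(paths) - 1) for (_digest, size), paths in groups.items())
    return duplicates, saved
```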
* Re: Smaller compressed kernel source tarballs?
From: Sean @ 2006-10-02 21:49 UTC
To: Willy Tarreau; +Cc: David Lang, Phillip Susi, Drew Scott Daniels, linux-kernel
On Mon, 2 Oct 2006 22:35:27 +0200
Willy Tarreau <w@1wt.eu> wrote:
> On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
> > no, I was suggesting a pack file that contained _only_ the head version.
> >
> > within the pack file it would delta against other files in the pack (how
> > many copies of the GPLv2 text exist across all files, for example?)
> >
> > however Willy did a test and found that the resulting pack was
> > significantly larger than a .tgz. I don't know what options he used, so
> > while there's some chance that being more aggressive in looking for deltas
> > would result in an improvement, the difference to make up is fairly
> > significant.
>
> no options at all, so there may be room for improvement. Also, on my
> notebook, I have hardlinked all my linux directories so that each
> distinct file only appears once. I don't have the numbers right here, but
> I remember that it was really useful to merge lots of different versions,
> but that the net gain within one given tree was really minor, as there
> are not that many identical files in one tree.
Hey Willy,
I don't really understand the objective here, but you may want to double
check your procedure; the entire 2.4 history only takes a single 41M pack
in Git for me.
Sean
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-10-02 21:42 UTC
To: Sean; +Cc: Willy Tarreau, Phillip Susi, Drew Scott Daniels, linux-kernel
On Mon, 2 Oct 2006, Sean wrote:
> On Mon, 2 Oct 2006 22:35:27 +0200
> Willy Tarreau <w@1wt.eu> wrote:
>
>> On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
>>> no, I was suggesting a pack file that contained _only_ the head version.
>>>
>>> within the pack file it would delta against other files in the pack (how
>>> many copies of the GPLv2 text exist across all files, for example?)
>>>
>>> however Willy did a test and found that the resulting pack was
>>> significantly larger than a .tgz. I don't know what options he used, so
>>> while there's some chance that being more aggressive in looking for deltas
>>> would result in an improvement, the difference to make up is fairly
>>> significant.
>>
>> no options at all, so there may be room for improvement. Also, on my
>> notebook, I have hardlinked all my linux directories so that each
>> distinct file only appears once. I don't have the numbers right here, but
>> I remember that it was really useful to merge lots of different versions,
>> but that the net gain within one given tree was really minor, as there
>> are not that many identical files in one tree.
>
> Hey Willy,
>
> I don't really understand the objective here, but you may want to double
> check your procedure; the entire 2.4 history only takes a single 41M pack
> in Git for me.
the idea was to use a git pack instead of a .tgz or .tar.bz2 as a
distribution format from kernel.org. for example, the pack would only
include the 2.6.18 kernel, no history.
once git supports shallow clones, the distributed blob could be a clone
seed that a person could download and then track changes from there
forward. but that's a future enhancement.
David Lang
* Re: Smaller compressed kernel source tarballs?
From: Willy Tarreau @ 2006-10-03 2:48 UTC
To: Sean; +Cc: David Lang, Phillip Susi, Drew Scott Daniels, linux-kernel
On Mon, Oct 02, 2006 at 05:49:38PM -0400, Sean wrote:
> On Mon, 2 Oct 2006 22:35:27 +0200
> Willy Tarreau <w@1wt.eu> wrote:
>
> > On Mon, Oct 02, 2006 at 01:12:55PM -0700, David Lang wrote:
> > > no, I was suggesting a pack file that contained _only_ the head version.
> > >
> > > within the pack file it would delta against other files in the pack (how
> > > many copies of the GPLv2 text exist across all files, for example?)
> > >
> > > however Willy did a test and found that the resulting pack was
> > > significantly larger than a .tgz. I don't know what options he used, so
> > > while there's some chance that being more aggressive in looking for deltas
> > > would result in an improvement, the difference to make up is fairly
> > > significant.
> >
> > no options at all, so there may be room for improvement. Also, on my
> > notebook, I have hardlinked all my linux directories so that each
> > distinct file only appears once. I don't have the numbers right here, but
> > I remember that it was really useful to merge lots of different versions,
> > but that the net gain within one given tree was really minor, as there
> > are not that many identical files in one tree.
>
> Hey Willy,
>
> I don't really understand the objective here, but you may want to double
> check your procedure; the entire 2.4 history only takes a single 41M pack
> in Git for me.
I'm not really surprised, as the GIT history begins at 2.4.32 and recent
2.4 patches are very small. So basically, the size is about the same for
the latest 2.4 and for all of the 2.4 history.
Willy
* Re: Smaller compressed kernel source tarballs?
From: Jan Engelhardt @ 2006-10-03 10:28 UTC
To: Willy Tarreau; +Cc: Drew Scott Daniels, linux-kernel
>> ppmd, also in Debian, had better compression than lzma. PAQ8i has even
>> better compression, but isn't in Debian. See the maximumcompression web
>> site or other archive comparison tests.
>
>Interesting. But I suspect that you have not checked the compression time.
>PAQ8I, for instance, is between 100 and 300 times SLOWER than bzip2, for
>about 30% smaller output! Given that the kernel already takes a very long
>time to compress with bzip2, it would take several hours to compress it with
>such tools. While they're very interesting proofs of concept for compression
>research, they're not suited to any real-world usage!
There are lots of obscure compression formats that achieve somewhat
better compression at the cost of MUCH more time (setting aside that they
are not very open), such as MS CAB and ACE.
Jan Engelhardt
--
* Re: Smaller compressed kernel source tarballs?
From: Phillip Susi @ 2006-10-03 18:24 UTC
To: Jan Engelhardt; +Cc: Willy Tarreau, Drew Scott Daniels, linux-kernel
Jan Engelhardt wrote:
> There are lots of obscure compression formats that achieve somewhat
> better compression at the cost of MUCH more time (setting aside that they
> are not very open), such as MS CAB and ACE.
CAB is an archive container format, not a compression algorithm. Last
time I worked on some code to handle it, they used the standard deflate
algorithm implemented by gzip (but had the ability to support others in
the future) and could only compress 32 KiB blocks. The small block size
led to poor compression.
* Compressing pages [was: Re: Smaller compressed kernel source tarballs?]
From: Jörn Engel @ 2006-10-04 15:57 UTC
To: Phillip Susi
Cc: Jan Engelhardt, Willy Tarreau, Drew Scott Daniels, linux-kernel
On Tue, 3 October 2006 14:24:01 -0400, Phillip Susi wrote:
> Jan Engelhardt wrote:
> >There are lots of obscure compression formats that achieve somewhat
> >better compression at the cost of MUCH more time (setting aside that they
> >are not very open), such as MS CAB and ACE.
>
> CAB is an archive container format, not a compression algorithm. Last
> time I worked on some code to handle it, they used the standard deflate
> algorithm implemented by gzip (but had the ability to support others in
> the future) and could only compress 32 KiB blocks. The small block size
> led to poor compression.
Actually, compression in 4KiB blocks is a _very_ interesting benchmark.
Jffs2 works with that size for compression, and other compressed
filesystems likely do the same, although possibly with something larger
like 64KiB. And the results are completely different in that benchmark.
Gzip actually beats bzip2 hands-down on compression ratio, for example.
I used to have a script, but cannot find it anymore. Basically something
like:
    while (read next 4KiB from input file) {
        compress chunk
        add compressed_size to total
    }
    print total
Jörn
--
Unless something dramatically changes, by 2015 we'll be largely wondering
what all the fuss surrounding Linux was really about.
-- Rob Enderle
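Jörn's lost script is easy to reconstruct in spirit. The following is a minimal Python sketch of that loop, not his original: it compresses each fixed-size block independently and sums the output sizes, using the standard `zlib`, `bz2` and `lzma` modules at their strongest settings. The names `chunked_compressed_size`, `COMPRESSORS` and `benchmark` are hypothetical. Note that `zlib.compress` wraps the deflate stream in the small zlib container rather than the gzip file format, so its totals approximate per-block gzip rather than match it exactly.

```python
import bz2
import lzma
import zlib


def chunked_compressed_size(data, compress, chunk_size=4096):
    """Compress each chunk_size block independently and sum the output sizes."""
    total = 0
    for offset in range(0, len(data), chunk_size):
        total += len(compress(data[offset:offset + chunk_size]))
    return total


# Standard-library compressors, each at its strongest setting.
COMPRESSORS = {
    "deflate (zlib -9)": lambda block: zlib.compress(block, 9),
    "bzip2 -9": lambda block: bz2.compress(block, 9),
    "xz/lzma -9": lambda block: lzma.compress(block, preset=9),
}


def benchmark(path, chunk_size=4096):
    """Return {compressor name: total compressed bytes} for the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    return {name: chunked_compressed_size(data, fn, chunk_size)
            for name, fn in COMPRESSORS.items()}
```

For example, `benchmark("linux-2.6.18.tar")` gives 4 KiB totals, and `benchmark("linux-2.6.18.tar", 65536)` gives the 64 KiB variant he mentions; the relative ranking is what to compare with whole-file results.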
* Smaller compressed kernel source tarballs?
@ 2006-09-21 20:32 Dax Kelson
2006-09-21 20:42 ` Lennart Sorensen
0 siblings, 2 replies; 43+ messages in thread
From: Dax Kelson @ 2006-09-21 20:32 UTC
To: Linux kernel; +Cc: Linus Torvalds
Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
thought it would be nice if it could be made smaller.
The 7zip program/algorithm is free software (LGPL) and can be obtained
from http://www.7-zip.org/ and it is distributed with several
distributions (it is in Fedora Core 6 extras for example).
Here are the numbers:
ls -al
-rw-r--r-- 1 root root 240138240 Sep 21 13:55 linux-2.6.18.tar
-rw-r--r-- 1 root root 34180796 Sep 21 13:42 linux-2.6.18.tar.7z
-rw-r--r-- 1 root root 41863580 Sep 21 13:45 linux-2.6.18.tar.bz2
-rw-r--r-- 1 root root 52467357 Sep 21 13:13 linux-2.6.18.tar.gz
ls -alh
-rw-r--r-- 1 root root 230M Sep 21 13:55 linux-2.6.18.tar
-rw-r--r-- 1 root root 33M Sep 21 13:42 linux-2.6.18.tar.7z
-rw-r--r-- 1 root root 40M Sep 21 13:45 linux-2.6.18.tar.bz2
-rw-r--r-- 1 root root 51M Sep 21 13:13 linux-2.6.18.tar.gz
Smaller the better, especially with the international audience.
Dax Kelson
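For reference, the listing above works out as follows. This small Python snippet only does arithmetic on the byte counts Dax posted; the dictionary keys and helper names are just labels chosen here.

```python
# Byte counts copied from the `ls -al` listing above.
sizes = {
    "tar": 240138240,
    "tar.7z": 34180796,
    "tar.bz2": 41863580,
    "tar.gz": 52467357,
}


def ratio(name):
    """Uncompressed size divided by compressed size."""
    return sizes["tar"] / sizes[name]


def savings_vs(name, baseline):
    """Fraction of bytes saved by `name` relative to `baseline`."""
    return 1 - sizes[name] / sizes[baseline]


for fmt in ("tar.gz", "tar.bz2", "tar.7z"):
    print(f"{fmt:8s} ratio {ratio(fmt):.2f}:1")  # about 4.58, 5.74 and 7.03
print(f"7z saves {savings_vs('tar.7z', 'tar.bz2'):.1%} relative to bzip2")  # about 18.4%
```

In other words, the 7z tarball would be roughly 18% smaller than the bz2 one and roughly 35% smaller than the gz one.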
* Re: Smaller compressed kernel source tarballs?
From: Lennart Sorensen @ 2006-09-21 20:42 UTC
To: Dax Kelson; +Cc: Linux kernel, Linus Torvalds
On Thu, Sep 21, 2006 at 02:32:57PM -0600, Dax Kelson wrote:
> Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
> thought it would be nice if it could be made smaller.
>
> The 7zip program/algorithm is free software (LGPL) and can be obtained
> from http://www.7-zip.org/ and it is distributed with several
> distributions (it is in Fedora Core 6 extras for example).
>
> Here are the numbers:
>
> ls -al
> -rw-r--r-- 1 root root 240138240 Sep 21 13:55 linux-2.6.18.tar
> -rw-r--r-- 1 root root 34180796 Sep 21 13:42 linux-2.6.18.tar.7z
> -rw-r--r-- 1 root root 41863580 Sep 21 13:45 linux-2.6.18.tar.bz2
> -rw-r--r-- 1 root root 52467357 Sep 21 13:13 linux-2.6.18.tar.gz
>
> ls -alh
> -rw-r--r-- 1 root root 230M Sep 21 13:55 linux-2.6.18.tar
> -rw-r--r-- 1 root root 33M Sep 21 13:42 linux-2.6.18.tar.7z
> -rw-r--r-- 1 root root 40M Sep 21 13:45 linux-2.6.18.tar.bz2
> -rw-r--r-- 1 root root 51M Sep 21 13:13 linux-2.6.18.tar.gz
>
> Smaller the better, especially with the international audience.
But after you download it once, you can just get the diff next time.
How is the decompression time on 7zip versus bzip2 and gzip?
--
Len Sorensen
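Lennart's question about decompression time is easy to measure locally. The following is a rough sketch that uses Python's standard `gzip`, `bz2` and `lzma` modules as stand-ins for the command-line tools. `lzma.compress` produces .xz data using LZMA2, a close relative of the LZMA method 7-Zip used, not the .7z container. The names `CODECS` and `time_codecs` are hypothetical, and single-run wall-clock timings are only indicative.

```python
import bz2
import gzip
import lzma
import time

# name -> (compress function, decompress function), strongest settings.
CODECS = {
    "gzip -9": (lambda d: gzip.compress(d, compresslevel=9), gzip.decompress),
    "bzip2 -9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
    "xz/lzma -9": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
}


def time_codecs(data):
    """Return {name: (compressed size, compress seconds, decompress seconds)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        mid = time.perf_counter()
        unpacked = decompress(packed)
        end = time.perf_counter()
        # Round-trip check so a timing is never reported for a broken codec.
        if unpacked != data:
            raise ValueError(f"{name}: round trip failed")
        results[name] = (len(packed), mid - start, end - mid)
    return results
```

Feeding it the bytes of `linux-2.6.18.tar` answers the question for that machine; at level 9 on a 230 MB input, expect the compression side in particular to take a while.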
* Re: Smaller compressed kernel source tarballs?
From: Sean @ 2006-09-21 21:17 UTC
To: Lennart Sorensen; +Cc: Dax Kelson, Linux kernel, Linus Torvalds
On Thu, 21 Sep 2006 16:42:50 -0400
Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
> On Thu, Sep 21, 2006 at 02:32:57PM -0600, Dax Kelson wrote:
> > Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
> > thought it would be nice if it could be made smaller.
[...]
> But after you download it once, you can just get the diff next time.
> How is the decompression time on 7zip versus bzip2 and gzip?
Not to mention that by using Git it will take care of all that for you,
downloading only the updates with no need for you to manually apply diffs,
etc.
Sean
* Re: Smaller compressed kernel source tarballs?
From: Dax Kelson @ 2006-09-21 21:41 UTC
To: Sean; +Cc: Lennart Sorensen, Linux kernel, Linus Torvalds
On Thu, 2006-09-21 at 17:17 -0400, Sean wrote:
> Not to mention that by using Git it will take care of all that for you,
> downloading only the updates with no need for you to manually apply diffs,
> etc.
>
> Sean
Git users and tarball users are different audiences.
Dax Kelson
* Re: Smaller compressed kernel source tarballs?
From: Bob Copeland @ 2006-09-21 21:50 UTC
To: Dax Kelson; +Cc: Sean, Lennart Sorensen, Linux kernel, Linus Torvalds
On 9/21/06, Dax Kelson <dax@gurulabs.com> wrote:
> Git users and tarball users are different audiences.
Try ketchup then.
http://www.selenic.com/ketchup/wiki/
* Re: Smaller compressed kernel source tarballs?
From: Sean @ 2006-09-21 21:57 UTC
To: Dax Kelson; +Cc: Lennart Sorensen, Linux kernel, Linus Torvalds
On Thu, 21 Sep 2006 15:41:15 -0600
Dax Kelson <dax@gurulabs.com> wrote:
>
> Git users and tarball users are different audiences.
>
Don't see why that needs to be the case. Git can even produce the
tarballs once you've synced up with kernel.org (see git-tar-tree).
People interested in conserving bandwidth should really consider
the use of Git.
Sean
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-09-21 22:00 UTC
To: Sean; +Cc: Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds
On Thu, 21 Sep 2006, Sean wrote:
> On Thu, 21 Sep 2006 15:41:15 -0600
> Dax Kelson <dax@gurulabs.com> wrote:
>
>>
>> Git users and tarball users are different audiences.
>>
>
> Don't see why that needs to be the case. Git can even produce the
> tarballs once you've synced up with kernel.org (see git-tar-tree).
> People interested in conserving bandwidth should really consider
> the use of Git.
yes, however git users are people who plan on following every kernel
version for a while; tarball users are people who grab a copy of the
kernel once in a while (probably not every version).
for the tarball users, they would have to grab
multiple patches to get from the last thing that they have to whatever is
current. and frankly they may not (and probably should not) trust the last
thing that they have, as in many cases it's a distro-patched kernel that
may not be compatible with the vanilla kernel.
people who start downloading every revision should start using git or
patches, but not everyone needs it.
also, people could be behind a firewall that prevents git from working properly;
for them, tarballs and patches are the right way of doing things.
David Lang
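To make David's point concrete: under kernel.org's usual 2.6 naming, where `patch-2.6.N` takes a 2.6.(N-1) tree to 2.6.N, moving a vanilla tree forward several releases means fetching one patch per release. The following is a minimal sketch under that assumption. The helper names are hypothetical, the sketch accepts only plain `2.6.N` versions (it ignores -stable, -rc and distro patches), and it downloads nothing.

```python
def sublevel(version):
    """Extract N from a plain vanilla '2.6.N' version string (no -rc or .stable suffix)."""
    major, minor, sub = version.split(".")
    if (major, minor) != ("2", "6"):
        raise ValueError(f"not a 2.6 kernel: {version}")
    return int(sub)


def incremental_patches(have, want):
    """List the incremental patch files needed to go from `have` to `want`."""
    start, end = sublevel(have), sublevel(want)
    if end < start:
        raise ValueError("this sketch only moves forward")
    return [f"patch-2.6.{n}.bz2" for n in range(start + 1, end + 1)]


print(incremental_patches("2.6.15", "2.6.18"))
# ['patch-2.6.16.bz2', 'patch-2.6.17.bz2', 'patch-2.6.18.bz2']
```

Tools like ketchup automate this bookkeeping, but the chain still assumes the starting tree really is the vanilla version it claims to be.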
* Re: Smaller compressed kernel source tarballs?
From: Dave Jones @ 2006-09-21 22:24 UTC
To: David Lang; +Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:

> for the tarball users they would have to grab
> multiple patches to get from the last thing that they have to whatever is
> current.

ketchup solves that problem.  One command brings any tree up to current.

> also people could be behind a firewall that prevents git from working properly,
> for them tarballs and patches are the right way of doing things.

If they can't git through a firewall, they won't be able to wget a tarball through
it either.

    Dave
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-09-21 22:16 UTC
To: Dave Jones; +Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006, Dave Jones wrote:

> On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:
>
> > for the tarball users they would have to grab
> > multiple patches to get from the last thing that they have to whatever is
> > current.
>
> ketchup solves that problem.  One command brings any tree up to current.

so are you saying that ketchup should be used for _all_ access to the vanilla
tree that isn't done via git?

if not then tarballs still have a place.

and how does ketchup deal with patched trees to start with?

> > also people could be behind a firewall that prevents git from working properly,
> > for them tarballs and patches are the right way of doing things.
>
> If they can't git through a firewall, they won't be able to wget a tarball through
> it either.

to work properly git should talk its own protocol, http/ftp can be allowed (and
authenticated) through firewalls that don't allow the git protocol.

David Lang

> Dave
* Re: Smaller compressed kernel source tarballs?
From: Dave Jones @ 2006-09-21 22:40 UTC
To: David Lang; +Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 03:16:57PM -0700, David Lang wrote:
> On Thu, 21 Sep 2006, Dave Jones wrote:
>
> > On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:
> >
> > > for the tarball users they would have to grab
> > > multiple patches to get from the last thing that they have to whatever is
> > > current.
> >
> > ketchup solves that problem.  One command brings any tree up to current.
>
> so are you saying that ketchup should be used for _all_ access to the vanilla
> tree that isn't done via git?
> if not then tarballs still have a place.

I think you have a misunderstanding over what ketchup is/does.
It cannot usurp tarballs by its very nature. It retrieves tarballs (if necessary)
and whatever patches are necessary to get to the tree you want.
http://www.selenic.com/ketchup/

> and how does ketchup deal with patched trees to start with?

By unpatching if necessary.

> > > also people could be behind a firewall that prevents git from working properly,
> > > for them tarballs and patches are the right way of doing things.
> >
> > If they can't git through a firewall, they won't be able to wget a tarball through
> > it either.
>
> to work properly git should talk its own protocol, http/ftp can be allowed (and
> authenticated) through firewalls that don't allow the git protocol.

'properly' is the wrong word here. optimally, yes, but the firewall argument
alone isn't sufficient to claim git can't be used to clone a tree.
A tree cloned over http: vs one over git: has exactly the same information in
it. All the history, all the changes. Everything.

    Dave
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-09-21 22:34 UTC
To: Dave Jones; +Cc: Sean, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006, Dave Jones wrote:

>
> On Thu, Sep 21, 2006 at 03:16:57PM -0700, David Lang wrote:
> > On Thu, 21 Sep 2006, Dave Jones wrote:
> >
> > > On Thu, Sep 21, 2006 at 03:00:48PM -0700, David Lang wrote:
> > >
> > > > for the tarball users they would have to grab
> > > > multiple patches to get from the last thing that they have to whatever is
> > > > current.
> > >
> > > ketchup solves that problem.  One command brings any tree up to current.
> >
> > so are you saying that ketchup should be used for _all_ access to the vanilla
> > tree that isn't done via git?
> > if not then tarballs still have a place.
>
> I think you have a misunderstanding over what ketchup is/does.
> It cannot usurp tarballs by its very nature. It retrieves tarballs (if necessary)
> and whatever patches are necessary to get to the tree you want.
> http://www.selenic.com/ketchup/

in that case the compression of the tarballs is still worth dealing with

> > and how does ketchup deal with patched trees to start with?
>
> By unpatching if necessary.

assuming that it knows where to get the patches from, I was referring to things
like the debian or redhat tree with their patches.

> > > > also people could be behind a firewall that prevents git from working properly,
> > > > for them tarballs and patches are the right way of doing things.
> > >
> > > If they can't git through a firewall, they won't be able to wget a tarball through
> > > it either.
> >
> > to work properly git should talk its own protocol, http/ftp can be allowed (and
> > authenticated) through firewalls that don't allow the git protocol.
>
> 'properly' is the wrong word here. optimally, yes, but the firewall argument
> alone isn't sufficient to claim git can't be used to clone a tree.
> A tree cloned over http: vs one over git: has exactly the same information in
> it. All the history, all the changes. Everything.

in most cases, but there are cases where the dumb transports can make mistakes
(there have been several threads on the git list covering these), git is good
enough to notice most of them, but there is still room for problems.

Also, installing and configuring git should not be a prerequisite to getting the
kernel.

the point being git and ketchup do not eliminate the need to transfer tarballs,
and therefore do not eliminate the attractiveness of a compression that saves a
significant amount of bandwidth.

I was responding to the (apparent) argument that with git and ketchup people
should not ever be downloading tarballs, so something that cuts the size of a
tarball in half doesn't make any difference.

David Lang
* Re: Smaller compressed kernel source tarballs?
From: Sean @ 2006-09-21 23:38 UTC
To: David Lang; +Cc: Dave Jones, Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006 15:34:53 -0700 (PDT)
David Lang <dlang@digitalinsight.com> wrote:

> I was responding to the (apparent) argument that with git and ketchup people
> should not ever be downloading tarballs, so something that cuts the size of a
> tarball in half doesn't make any difference.

Sure there are some cases where tarballs are more appropriate, but with
git and maybe some of the other tools it should really be the minority
situation.  I wonder how many people just use tarballs out of inertia.

All said though saving a few bytes of bandwidth by making the tarballs
smaller can't hurt.

Sean
* Re: Smaller compressed kernel source tarballs?
From: Sean @ 2006-09-21 22:25 UTC
To: David Lang; +Cc: Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006 15:00:48 -0700 (PDT)
David Lang <dlang@digitalinsight.com> wrote:

> yes,
> however git users are people who plan on following every kernel version for a
> while, tarball users are people who grab a copy of the kernel once in a while
> (probably not every version). for the tarball users they would have to grab
> multiple patches to get from the last thing that they have to whatever is
> current. and frankly they may not (and probably should not) trust the last thing
> that they have, as in many cases it's a distro patched kernel that may not be
> compatible with the vanilla kernel.
>
> people who start downloading every revision should start using git or patches,
> but not everyone needs it.

Agreed, but for those people there isn't going to be much need (if any)
to worry about whether the tarball is in .gzip or .bzip2 or whatever either.
And that was the case that inspired the suggestion.

> also people could be behind a firewall that prevents git from working properly,
> for them tarballs and patches are the right way of doing things.

I use git from behind a firewall every day without a problem.  If you've seen
such a problem yourself, a bug report would hopefully lead to a solution.

Thanks,
Sean
* Re: Smaller compressed kernel source tarballs?
From: David Lang @ 2006-09-21 22:20 UTC
To: Sean; +Cc: Dax Kelson, Lennart Sorensen, Linux kernel, Linus Torvalds

On Thu, 21 Sep 2006, Sean wrote:

>> also people could be behind a firewall that prevents git from working properly,
>> for them tarballs and patches are the right way of doing things.
>
> I use git from behind a firewall every day without a problem.  If you've seen
> such a problem yourself, a bug report would hopefully lead to a solution.

it's not a bug, it's simply the fact that git (properly) uses its own port for
its own protocol, and not all firewalls allow access to that port. in some
cases even where a person would have the ability to get the firewall changed
they may not want to for other (political) reasons.

even if git tunneled over HTTP there would be firewalls that would require
authentication that git wouldn't be able to do and would therefore block the
access.

David Lang
* Re: Smaller compressed kernel source tarballs?
From: Dax Kelson @ 2006-09-21 21:40 UTC
To: Lennart Sorensen; +Cc: Linux kernel, Linus Torvalds

On Thu, 2006-09-21 at 16:42 -0400, Lennart Sorensen wrote:
> But after you download it once, you can just get the diff next time.
> How is the decompression time on 7zip versus bzip2 and gzip?

Decompression times on 2.6.18 are as follows:

gzip:  0m3.509s
7zip:  0m10.012s
bzip2: 0m22.703s

Dax Kelson
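One rough way to reproduce this kind of comparison is shown in the sketch below. It assumes the same tree is available compressed with each tool, GNU `date` (for `%N`), and p7zip's `7z` for the .7z case. The helper name `time_decompress` is hypothetical. Absolute numbers depend heavily on the machine, so only the relative ordering means much.

```shell
# Hypothetical helper: decompress FILE to /dev/null with the matching tool and
# print the elapsed wall-clock time in milliseconds.
# Needs GNU date (%N = nanoseconds); the .7z case assumes p7zip's 7z.
time_decompress() {
    file=$1
    case $file in
        *.gz)  tool="gzip -dc" ;;
        *.bz2) tool="bzip2 -dc" ;;
        *.7z)  tool="7z x -so" ;;
        *)     echo "unsupported: $file" >&2; return 1 ;;
    esac
    start=$(date +%s%N)
    $tool "$file" > /dev/null || return 1   # $tool left unquoted on purpose: "cmd args"
    end=$(date +%s%N)
    echo "$file: $(( (end - start) / 1000000 )) ms"
}

# Example (assumes the files are present):
#   for f in linux-2.6.18.tar.gz linux-2.6.18.tar.bz2; do time_decompress "$f"; done
```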
* Re: Smaller compressed kernel source tarballs?
From: Lennart Sorensen @ 2006-09-22 14:00 UTC
To: Dax Kelson; +Cc: Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 03:40:09PM -0600, Dax Kelson wrote:
> Decompression times on 2.6.18 are as follows:
>
> gzip:  0m3.509s
> 7zip:  0m10.012s
> bzip2: 0m22.703s

Hmm, not bad.

--
Len Sorensen
* Re: Smaller compressed kernel source tarballs?
From: H. Peter Anvin @ 2006-09-21 21:43 UTC
To: Lennart Sorensen; +Cc: Dax Kelson, Linux kernel, Linus Torvalds

Lennart Sorensen wrote:
> On Thu, Sep 21, 2006 at 02:32:57PM -0600, Dax Kelson wrote:
>> Today as I was watching the linux-2.6.18.tar.bz2 slowly download I
>> thought it would be nice if it could be made smaller.
>>
>> The 7zip program/algorithm is free software (LGPL) and can be obtained
>> from http://www.7-zip.org/ and it is distributed with several
>> distributions (it is in Fedora Core 6 extras for example).
>>
>
> But after you download it once, you can just get the diff next time.
> How is the decompression time on 7zip versus bzip2 and gzip?
>

7zip (LZMA) decompresses quickly, and the decompressor text is actually
smaller than the equivalent for gzip.  Quite nice.

What is not nice is the code for the compressor, which is a total mess.
I have been holding out on implementing LZMA on kernel.org, because
just as zip (deflate) didn't become common in the Unix world until an
encapsulation format that handles things expected in the Unix world,
e.g. streaming, was created (gzip), I don't think LZMA is going to be
widely used until there is an "lzip" which does the same thing.  I
actually started the work of adding LZMA support to gzip, but then
realized it would be better if a new encapsulation format with proper
64-bit support everywhere was created.

    -hpa
* Re: Smaller compressed kernel source tarballs?
From: Lennart Sorensen @ 2006-09-22 14:00 UTC
To: H. Peter Anvin; +Cc: Dax Kelson, Linux kernel, Linus Torvalds

On Thu, Sep 21, 2006 at 02:43:46PM -0700, H. Peter Anvin wrote:
> 7zip (LZMA) decompresses quickly, and the decompressor text is actually
> smaller than the equivalent for gzip.  Quite nice.
>
> What is not nice is the code for the compressor, which is a total mess.
> I have been holding out on implementing LZMA on kernel.org, because
> just as zip (deflate) didn't become common in the Unix world until an
> encapsulation format that handles things expected in the Unix world,
> e.g. streaming, was created (gzip), I don't think LZMA is going to be
> widely used until there is an "lzip" which does the same thing.  I
> actually started the work of adding LZMA support to gzip, but then
> realized it would be better if a new encapsulation format with proper
> 64-bit support everywhere was created.

It doesn't handle streaming?

So you can't do: tar c dirname | 7zip dirname.tar.7z ?

--
Len Sorensen
RuggedCom
* Re: Smaller compressed kernel source tarballs?
From: H. Peter Anvin @ 2006-09-22 16:13 UTC
To: Lennart Sorensen; +Cc: Dax Kelson, Linux kernel, Linus Torvalds

Lennart Sorensen wrote:
> On Thu, Sep 21, 2006 at 02:43:46PM -0700, H. Peter Anvin wrote:
>> 7zip (LZMA) decompresses quickly, and the decompressor text is actually
>> smaller than the equivalent for gzip.  Quite nice.
>>
>> What is not nice is the code for the compressor, which is a total mess.
>> I have been holding out on implementing LZMA on kernel.org, because
>> just as zip (deflate) didn't become common in the Unix world until an
>> encapsulation format that handles things expected in the Unix world,
>> e.g. streaming, was created (gzip), I don't think LZMA is going to be
>> widely used until there is an "lzip" which does the same thing.  I
>> actually started the work of adding LZMA support to gzip, but then
>> realized it would be better if a new encapsulation format with proper
>> 64-bit support everywhere was created.
>
> It doesn't handle streaming?
>
> So you can't do: tar c dirname | 7zip dirname.tar.7z ?
>

Nope, and in particular you can't do:

    tar cf - dirname | 7zip | ssh ...

This is because 7zip is an archiving format in its own right, much like
zip.  What we want is something that is to 7zip what gzip is to zip.

    -hpa
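The streaming property can be illustrated with an existing filter-style compressor. gzip and bzip2 read stdin and write stdout without ever seeking, so they can sit in the middle of a pipe whose far end is not seekable. Below is a minimal sketch using bzip2. The function name is hypothetical, and `user@host` in the example is a placeholder.

```shell
# Hypothetical helper: stream directory DIR through tar and bzip2 to stdout.
# Every stage reads a pipe and writes a pipe, so no stage needs to seek and
# no temporary file is created.
stream_archive() {
    dir=$1
    tar cf - "$dir" | bzip2 -9
}

# Example: ship the archive over ssh (user@host is a placeholder).
#   stream_archive linux-2.6.18 | ssh user@host 'cat > linux-2.6.18.tar.bz2'
```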
* Re: Smaller compressed kernel source tarballs?
From: Jan Engelhardt @ 2006-09-22 16:13 UTC
To: Lennart Sorensen; +Cc: H. Peter Anvin, Dax Kelson, Linux kernel, Linus Torvalds

>> widely used until there is an "lzip" which does the same thing.  I
>> actually started the work of adding LZMA support to gzip, but then
>> realized it would be better if a new encapsulation format with proper
>> 64-bit support everywhere was created.
>
>It doesn't handle streaming?
>
>So you can't do: tar c dirname | 7zip dirname.tar.7z ?

man 7z [slightly changed for reasonability]:

       -si
              Read data from StdIn (eg: tar -c directory | 7z a -si directory.tar.7z)

Jan Engelhardt
--
* Re: Smaller compressed kernel source tarballs?
From: H. Peter Anvin @ 2006-09-22 16:33 UTC
To: Jan Engelhardt; +Cc: Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

Jan Engelhardt wrote:
>>> widely used until there is an "lzip" which does the same thing.  I
>>> actually started the work of adding LZMA support to gzip, but then
>>> realized it would be better if a new encapsulation format with proper
>>> 64-bit support everywhere was created.
>> It doesn't handle streaming?
>>
>> So you can't do: tar c dirname | 7zip dirname.tar.7z ?
>
> man 7z [slightly changed for reasonability]:
>
>        -si
>               Read data from StdIn (eg: tar -c directory | 7z a -si directory.tar.7z)
>

Yes, but you can't make it write to an unseekable stdout.

    -hpa
* Re: Smaller compressed kernel source tarballs?
From: Johannes Stezenbach @ 2006-09-22 17:41 UTC
To: H. Peter Anvin; +Cc: Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

On Fri, Sep 22, 2006 at 09:33:01AM -0700, H. Peter Anvin wrote:
> Jan Engelhardt wrote:
> >>>widely used until there is an "lzip" which does the same thing.  I
> >>>actually started the work of adding LZMA support to gzip, but then
> >>>realized it would be better if a new encapsulation format with proper
> >>>64-bit support everywhere was created.
> >>It doesn't handle streaming?
> >>
> >>So you can't do: tar c dirname | 7zip dirname.tar.7z ?
> >
> >man 7z [slightly changed for reasonability]:
> >
> >       -si
> >              Read data from StdIn (eg: tar -c directory | 7z a -si
> >              directory.tar.7z)
> >
>
> Yes, but you can't make it write to an unseekable stdout.

It seems the "lzma" program from LZMA Utils can:

http://tukaani.org/lzma/
"Very similar command line interface than what gzip and bzip2 have."

(Debian sid has this in the "lzma" package.)

Johannes
* Re: Smaller compressed kernel source tarballs?
From: H. Peter Anvin @ 2006-09-22 18:09 UTC
To: Johannes Stezenbach; +Cc: Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

Johannes Stezenbach wrote:
>
> It seems the "lzma" program from LZMA Utils can:
>
> http://tukaani.org/lzma/
> "Very similar command line interface than what gzip and bzip2 have."
>
> (Debian sid has this in the "lzma" package.)
>

Yes, it can.  If that's the way things go then I don't mind it, however,
my biggest problem with lzma utils is that the command line parsing is
done in a shell script wrapper.

Maybe I'll start using it anyway...

    -hpa
* Re: Smaller compressed kernel source tarballs?
From: Michael Tokarev @ 2006-09-22 18:19 UTC
To: H. Peter Anvin; +Cc: Johannes Stezenbach, Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

H. Peter Anvin wrote:
> Johannes Stezenbach wrote:
>>
>> It seems the "lzma" program from LZMA Utils can:
>>
>> http://tukaani.org/lzma/
>> "Very similar command line interface than what gzip and bzip2 have."
>>
>> (Debian sid has this in the "lzma" package.)
>>
>
> Yes, it can.  If that's the way things go then I don't mind it, however,
> my biggest problem with lzma utils is that the command line parsing is
> done in a shell script wrapper.

Well, I don't see any shell code here, in /usr/bin/lzma as installed from
debian version 4.43-2.

But note that this lzma utility does not have any 'magic number' and does
no crc checks.  On the site it's said lzma(sdk) is under rewrite to support
new format with magic number and crc checks...

After reading this thread I wanted to teach GNU tar to automatically
recognize .tar.lzma archives - and failed, exactly because of the lack of
magic number at the start of a file...

/mjt
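The missing magic number matters because format auto-detection works by comparing the first few bytes of a file against known signatures. Below is a sketch of that check. The function name `sniff_format` is hypothetical. The signatures are the well-known ones for gzip (1f 8b), bzip2 ("BZh") and the 7z container (37 7a bc af 27 1c). A headerless lzma stream of the kind described above has nothing comparable to match, so it can only fall through to "unknown".

```shell
# Hypothetical helper: guess the compression format of FILE from its leading bytes.
sniff_format() {
    # First six bytes as one continuous lowercase hex string.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)        echo gzip ;;
        425a68*)      echo bzip2 ;;    # "BZh"
        377abcaf271c) echo 7z ;;
        *)            echo unknown ;;  # a stream without a signature ends up here
    esac
}
```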
* Re: Smaller compressed kernel source tarballs?
From: H. Peter Anvin @ 2006-09-22 18:26 UTC
To: Michael Tokarev; +Cc: Johannes Stezenbach, Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

Michael Tokarev wrote:
>
> Well, I don't see any shell code here, in /usr/bin/lzma as installed from
> debian version 4.43-2.
>
> But note that this lzma utility does not have any 'magic number' and does
> no crc checks.

Ah, right, that's a total killer.

> On the site it's said lzma(sdk) is under rewrite to support
> new format with magic number and crc checks...

That is an absolute must, IMO.  I would use the gzip format as a base.

    -hpa
* Re: Smaller compressed kernel source tarballs?
From: Paulo Marques @ 2006-09-25 11:51 UTC
To: H. Peter Anvin; +Cc: Michael Tokarev, Johannes Stezenbach, Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

H. Peter Anvin wrote:
> Michael Tokarev wrote:
>> [...]
>> On the site it's said lzma(sdk) is under rewrite to support
>> new format with magic number and crc checks...
>
> That is an absolute must, IMO.  I would use the gzip format as a base.

If you're suggesting a gzip like format (but with different magic,
etc.), that's ok.

However, it has been suggested on similar threads to use the CM field of
the gzip format to introduce different compression methods.

While this is the purpose of this field, I find this to be a very bad
idea. The worst part of it is that, after "lzma gzip" files start to
proliferate, you never know if you can decompress a .gz with your
version of gunzip, which is something that you currently take for granted.

If more formats start being supported inside gzip, this only gets worse...

--
Paulo Marques - www.grupopie.com

"The face of a child can say it all, especially the
mouth part of the face."
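Some background on the CM field: in the gzip member header defined by RFC 1952, the bytes are ID1 = 0x1f, ID2 = 0x8b, then CM. CM = 8 denotes deflate, and values 0-7 are reserved. The sketch below reports the CM byte. A tool that checks only the first two bytes would accept a file with any CM value as ordinary gzip, which is the compatibility trap described here. The helper name `gzip_method` is hypothetical.

```shell
# Hypothetical helper: print the CM (compression method) byte of a gzip file,
# or fail if the two-byte gzip magic (1f 8b) is absent.
gzip_method() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = 1f8b ] || { echo "not gzip: $1" >&2; return 1; }
    od -An -tu1 -j2 -N1 "$1" | tr -d ' \n'   # byte at offset 2, in decimal
    echo
}
```

For a file written by gzip itself this prints 8. A crafted file carrying the same magic but a different CM passes a magic-only check even though a deflate decoder cannot read it.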
* Re: Smaller compressed kernel source tarballs?
From: H. Peter Anvin @ 2006-09-25 15:47 UTC
To: Paulo Marques; +Cc: Michael Tokarev, Johannes Stezenbach, Jan Engelhardt, Lennart Sorensen, Dax Kelson, Linux kernel, Linus Torvalds

Paulo Marques wrote:
> H. Peter Anvin wrote:
>> Michael Tokarev wrote:
>>> [...]
>>> On the site it's said lzma(sdk) is under rewrite to support
>>> new format with magic number and crc checks...
>>
>> That is an absolute must, IMO.  I would use the gzip format as a base.
>
> If you're suggesting a gzip like format (but with different magic,
> etc.), that's ok.
>
> However, it has been suggested on similar threads to use the CM field of
> the gzip format to introduce different compression methods.
>
> While this is the purpose of this field, I find this to be a very bad
> idea. The worst part of it is that, after "lzma gzip" files start to
> proliferate, you never know if you can decompress a .gz with your
> version of gunzip, which is something that you currently take for granted.
>
> If more formats start being supported inside gzip, this only gets worse...
>

Doesn't mean that one should name the files .gz.

A more significant reason to not do this is that I think there are a lot
of programs out there which only check the magic number and not the
compression format.

    -hpa
end of thread, other threads:[~2006-10-04 15:58 UTC | newest]
Thread overview: 43+ messages
2006-10-02 3:35 Smaller compressed kernel source tarballs? Drew Scott Daniels
2006-10-02 3:32 ` Bernd Eckenfels
2006-10-02 3:35 ` Willy Tarreau
[not found] ` <Pine.LNX.4.63.0610012205280.28534@qynat.qvtvafvgr.pbz>
2006-10-02 5:11 ` David Lang
2006-10-02 5:49 ` Willy Tarreau
2006-10-02 15:16 ` Phillip Susi
2006-10-02 15:48 ` David Lang
2006-10-02 20:20 ` Phillip Susi
2006-10-02 20:12 ` David Lang
2006-10-02 20:35 ` Willy Tarreau
[not found] ` <20061002203527.GA585@1wt.eu>
2006-10-02 21:49 ` Sean
[not found] ` <20061002174938.bb82027d.seanlkml@sympatico.ca>
2006-10-02 21:42 ` David Lang
2006-10-03 2:48 ` Willy Tarreau
2006-10-03 10:28 ` Jan Engelhardt
2006-10-03 18:24 ` Phillip Susi
2006-10-04 15:57 ` Compressing pages [was: Re: Smaller compressed kernel source tarballs?] Jörn Engel
-- strict thread matches above, loose matches on Subject: below --
2006-09-21 20:32 Smaller compressed kernel source tarballs? Dax Kelson
[not found] ` <20060921204250.GN13641@csclub.uwaterloo.ca>
2006-09-21 20:42 ` Lennart Sorensen
[not found] ` <20060921171747.9ae2b42e.seanlkml@sympatico.ca>
2006-09-21 21:17 ` Sean
2006-09-21 21:41 ` Dax Kelson
2006-09-21 21:50 ` Bob Copeland
[not found] ` <20060921175717.272c58ee.seanlkml@sympatico.ca>
2006-09-21 21:57 ` Sean
2006-09-21 22:00 ` David Lang
2006-09-21 22:24 ` Dave Jones
2006-09-21 22:16 ` David Lang
2006-09-21 22:40 ` Dave Jones
2006-09-21 22:34 ` David Lang
[not found] ` <20060921193823.ec49d446.seanlkml@sympatico.ca>
2006-09-21 23:38 ` Sean
[not found] ` <Pine.LNX.4.63.0609211455570.17238@qynat.qvtvafvgr.pbz>
2006-09-21 22:25 ` Sean
[not found] ` <20060921182554.23044ca3.seanlkml@sympatico.ca>
2006-09-21 22:20 ` David Lang
2006-09-21 21:40 ` Dax Kelson
2006-09-22 14:00 ` Lennart Sorensen
2006-09-21 21:43 ` H. Peter Anvin
2006-09-22 14:00 ` Lennart Sorensen
2006-09-22 16:13 ` H. Peter Anvin
2006-09-22 16:13 ` Jan Engelhardt
2006-09-22 16:33 ` H. Peter Anvin
2006-09-22 17:41 ` Johannes Stezenbach
2006-09-22 18:09 ` H. Peter Anvin
2006-09-22 18:19 ` Michael Tokarev
2006-09-22 18:26 ` H. Peter Anvin
2006-09-25 11:51 ` Paulo Marques
2006-09-25 15:47 ` H. Peter Anvin