git.vger.kernel.org archive mirror
* [PATCHv4] Add Gitweb support for XZ compressed snapshots
@ 2009-07-31 13:32 Mark A Rada
  2009-07-31 18:07 ` J.H.
  2009-07-31 18:48 ` Jakub Narebski
  0 siblings, 2 replies; 4+ messages in thread
From: Mark A Rada @ 2009-07-31 13:32 UTC
  To: git

Ok, so I got a good night's sleep now, and reviewed the results of my
benchmarks to make sure they were consistent (turns out I had the
archive sizes in the wrong order for the XZ repository tests).

I also reworded a number of things and added a conclusion to the
benchmarks.

Let me know what you think.


--
Mark A Rada (ferrous26)
marada@uwaterloo.ca


------->8--------------
From: Mark Rada <marada@uwaterloo.ca>
Date: Thu, 30 Jul 2009 08:56:42 -0400
Subject: [PATCH] Add Gitweb support for XZ compressed snapshots

The XZ compression format uses the LZMA2 compression algorithm, which
often yields higher compression ratios than both GZip and BZip2 at the
cost of more CPU time and RAM. While XZ is the slowest at compression,
it decompresses much faster than BZip2, roughly within a factor of two
of GZip (see benchmarks below).

You can enable XZ compressed snapshots by adding 'txz' to the list of
default snapshot formats ($feature{'snapshot'}{'default'}) in your
$GITWEB_CONFIG.
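A minimal $GITWEB_CONFIG fragment for this might look like the sketch
below. It assumes the %feature layout used by gitweb.perl in this period;
the override line is optional and only needed if individual repositories
should pick their own formats through their gitweb.snapshot config
variable.

```perl
# $GITWEB_CONFIG fragment (sketch): offer .tar.xz next to the default .tar.gz
$feature{'snapshot'}{'default'} = ['tgz', 'txz'];

# Optional: allow per-repository format lists, e.g.
#   git config gitweb.snapshot "tgz,txz"
$feature{'snapshot'}{'override'} = 1;
```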

I did some simple benchmarks, starting with an already tarballed
archive of each of the repositories listed below. Memory usage seemed to
be consistent for any given algorithm at its default compression level.
Timings were gathered using the `time' command.

CPU: AMD Sempron 3400+ (1 core @ 1.8GHz with 256K L2 cache)

Virtual memory usage:
    GZip: 4152K    BZip2: 13352K    XZ: 102M

Times are as reported by `time' ("total" is wall-clock time); "size" is
the size of the compressed archive.

Linux 2.6 series (f5886c7f96f2542382d3a983c5f13e03d7fc5259), tarball 349M
command       user   system   cpu     total   size
gzip        23.70s    0.47s   99%    24.227   76M
gunzip       3.74s    0.74s   94%     4.741
bzip2      130.96s    0.53s   99%   2:11.97   59M
bunzip2     31.05s    1.02s   99%    32.355
xz         448.78s    0.91s   99%   7:31.28   51M
unxz         7.67s    0.80s   98%     8.607

Git (0a53e9ddeaddad63ad106860237bbf53411d11a7), tarball 11M
command       user   system   cpu     total   size
gzip         0.77s    0.03s   99%     0.792   2.5M
gunzip       0.12s    0.02s   98%     0.142
bzip2        3.42s    0.02s   99%     3.454   2.1M
bunzip2      0.95s    0.03s   99%     0.984
xz          12.88s    0.14s   98%    13.239   1.9M
unxz         0.27s    0.03s   99%     0.298

XZ (669413bb2db954bbfde3c4542fddbbab53891eb4), tarball 1.8M
command       user   system   cpu     total   size
gzip         0.12s    0.00s   95%     0.132   442K
gunzip       0.02s    0.00s   97%     0.027
bzip2        1.28s    0.01s   99%     1.298   363K
bunzip2      0.15s    0.01s  100%     0.157
xz           1.62s    0.03s   99%     1.652   347K
unxz         0.05s    0.00s   99%     0.058
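To read the Linux 2.6 numbers in relative terms, a small Perl sketch like
the one below (figures copied by hand from the table; nothing here is
part of the patch) computes each archive's size as a fraction of the
349M tarball and each decompressor's wall-clock time relative to gunzip.

```perl
use strict;
use warnings;

# Linux 2.6 figures copied from the table: sizes as printed (M) and
# wall-clock "total" decompression times in seconds.
my $tar_size = 349;
my %size     = (gzip => 76,    bzip2 => 59,     xz => 51);
my %unpack   = (gzip => 4.741, bzip2 => 32.355, xz => 8.607);

# Fraction of the tarball left after compression, and how many times
# slower each decompressor is than gunzip.
my (%ratio, %vs_gzip);
for my $c (keys %size) {
    $ratio{$c}   = $size{$c} / $tar_size;
    $vs_gzip{$c} = $unpack{$c} / $unpack{gzip};
}

for my $c (qw(gzip bzip2 xz)) {
    printf "%-6s size %4.1f%%  decompress %.2fx gunzip\n",
        $c, 100 * $ratio{$c}, $vs_gzip{$c};
}
```

On these numbers XZ ends up at about 14.6% of the tarball versus 21.8%
for GZip, while unxz takes roughly 1.8 times as long as gunzip and
bunzip2 roughly 6.8 times as long.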

Purely from a time and memory perspective, nothing compares to GZip in
any of the three tests. However, with an average upload speed of 20KB/s,
the BZip2 compressed kernel snapshot (59M) would take ~400 seconds longer
to transfer than the XZ compressed one (51M), and the difference is even
greater compared to the GZip compressed snapshot (76M). The savings
follow a similar pattern for the other test cases, but are much less
dramatic for the smaller repositories.
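The transfer time estimate is simple arithmetic. The sketch below
assumes the kernel snapshot sizes from the table are MiB (1M = 1024K)
and that the link sustains a flat 20KB/s with 1KB = 1024 bytes; both
are assumptions, not measurements.

```perl
use strict;
use warnings;

# Assumed: kernel snapshot sizes in MiB, and a constant 20 KiB/s uplink.
my $rate_kib_per_s = 20;
my %size_mib = (gzip => 76, bzip2 => 59, xz => 51);

# Seconds needed to push one snapshot of the given format over the link.
sub transfer_seconds {
    my ($fmt) = @_;
    return $size_mib{$fmt} * 1024 / $rate_kib_per_s;
}

my $bzip2_extra = transfer_seconds('bzip2') - transfer_seconds('xz');
my $gzip_extra  = transfer_seconds('gzip')  - transfer_seconds('xz');
printf "bzip2 vs xz: %.0f s longer\n", $bzip2_extra;
printf "gzip  vs xz: %.0f s longer\n", $gzip_extra;
```

That gives about 410 seconds for BZip2 versus XZ, consistent with the
~400 second figure, and about 1280 seconds (a little over 21 minutes)
for GZip versus XZ.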

The obvious downside of XZ compressed snapshots is the large CPU and
memory load placed on the server to actually generate the snapshot.
However, XZ will eventually have good threading support, and I suspect
the wall clock time for making an XZ compressed snapshot would then go
down considerably if the server had a beefy multi-core CPU.

I have not enabled XZ compression by default because the current default
is GZip, and XZ is only really competitive with BZip2. Also, the XZ
format is still fairly new (it was declared stable about 6 months ago),
and there have been no "stable" releases of the utils yet.

Signed-off-by: Mark Rada <marada@uwaterloo.ca>
---
  gitweb/gitweb.perl |    8 ++++++++
  1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7fbd5ff..3398163 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -176,6 +176,13 @@ our %known_snapshot_formats = (
  		'format' => 'tar',
  		'compressor' => ['bzip2']},

+	'txz' => {
+		'display' => 'tar.xz',
+		'type' => 'application/x-xz',
+		'suffix' => '.tar.xz',
+		'format' => 'tar',
+		'compressor' => ['xz']},
+
  	'zip' => {
  		'display' => 'zip',
  		'type' => 'application/x-zip',
@@ -188,6 +195,7 @@ our %known_snapshot_formats = (
  our %known_snapshot_format_aliases = (
  	'gzip'  => 'tgz',
  	'bzip2' => 'tbz2',
+	'xz'    => 'txz',

  	# backward compatibility: legacy gitweb config support
  	'x-gzip' => undef, 'gz' => undef,
-- 
1.6.4
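To make the new table entry concrete, here is a Perl sketch of what the
'txz' fields imply: the archive pipeline, the MIME type, and the download
filename. The helper snapshot_plan is hypothetical and does not
reproduce gitweb's actual code path (which also deals with quoting and
HTTP headers); it only assumes that 'compressor' is an optional command
that the tar stream is piped through.

```perl
use strict;
use warnings;

# The 'txz' entry as added by the patch.
my %known_snapshot_formats = (
    'txz' => {
        'display'    => 'tar.xz',
        'type'       => 'application/x-xz',
        'suffix'     => '.tar.xz',
        'format'     => 'tar',
        'compressor' => ['xz'],
    },
);

# Hypothetical helper (not gitweb's code): derive the pipeline, MIME type
# and filename from a format entry. Returns an empty list for unknown keys.
sub snapshot_plan {
    my ($key, $name, $rev) = @_;
    my $fmt = $known_snapshot_formats{$key} or return;
    my $cmd = "git archive --format=$fmt->{format} --prefix=$name/ $rev";
    $cmd .= ' | ' . join(' ', @{ $fmt->{compressor} }) if $fmt->{compressor};
    return ($cmd, $fmt->{type}, "$name$fmt->{suffix}");
}

my ($cmd, $type, $file) = snapshot_plan('txz', 'git', 'HEAD');
print "$cmd\n$type\n$file\n";
```

A bare `xz' with no file arguments compresses standard input to
standard output, which is why the entry needs no extra arguments.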


* Re: [PATCHv4] Add Gitweb support for XZ compressed snapshots
  2009-07-31 13:32 [PATCHv4] Add Gitweb support for XZ compressed snapshots Mark A Rada
@ 2009-07-31 18:07 ` J.H.
  2009-07-31 18:27   ` Jakub Narebski
  2009-07-31 18:48 ` Jakub Narebski
  1 sibling, 1 reply; 4+ messages in thread
From: J.H. @ 2009-07-31 18:07 UTC
  To: Mark A Rada; +Cc: git

Mark A Rada wrote:
> [...]
> CPU: AMD Sempron 3400+ (1 core @ 1.8GHz with 256K L2 cache)
> Virtual Memory Usage
>     GZip: 4152K    BZip2: 13352K    XZ: 102M
> [...]
> I have not enabled XZ compression by default because the current default
> is GZip, and XZ is only really competitive with BZip2. Also, the XZ format
> is still fairly new (the format was declared stable about 6 months ago),
> and there have been no "stable" releases of the utils yet.

One thing that would concern me greatly is not so much the CPU time
(though that's a *huge* change in comparison to gz) but the memory
usage.  Where gzip and bzip2 are chewing 4M and 13M respectively, xz
chews 102M.  From a 'beefy' server perspective, chewing up that much
memory per snapshot for that long could be bad.  This is likely
something that needs to have some sort of enable/disable switch if it's
going to be included.
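As a back-of-the-envelope illustration of this concern, the sketch
below assumes every concurrent snapshot request runs its own compressor
and that the benchmark's virtual memory figures apply unchanged (102M is
read as 102 MiB; virtual size usually overstates what is actually
resident, so treat the result as a rough upper estimate).

```perl
use strict;
use warnings;

# Per-process virtual memory from the benchmark, in KiB (assumed units).
my %vm_kib = (gzip => 4152, bzip2 => 13352, xz => 102 * 1024);

# Rough estimate: each concurrent snapshot spawns one compressor process.
sub peak_mib {
    my ($fmt, $concurrent) = @_;
    return $vm_kib{$fmt} * $concurrent / 1024;
}

for my $fmt (qw(gzip bzip2 xz)) {
    printf "%-5s x 10 concurrent: %6.1f MiB\n", $fmt, peak_mib($fmt, 10);
}
```

Under those assumptions, ten simultaneous XZ snapshots would need about
1020 MiB, against roughly 40.5 MiB for GZip.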

Good analysis btw, though I'll admit it makes me leery of something
dynamically generating xz compressed files.

- John 'Warthog9' Hawley


* Re: [PATCHv4] Add Gitweb support for XZ compressed snapshots
  2009-07-31 18:07 ` J.H.
@ 2009-07-31 18:27   ` Jakub Narebski
  0 siblings, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2009-07-31 18:27 UTC
  To: J.H.; +Cc: Mark A Rada, git

"J.H." <warthog19@eaglescrag.net> writes:
> Mark A Rada wrote:

> > I have not enabled XZ compression by default because the current
> > default is GZip, and XZ is only really competitive with
> > BZip2. Also, the XZ format is still fairly new (the format was
> > declared stable about 6 months ago), and there have been no
> > "stable" releases of the utils yet.
> 
> One thing that would concern me greatly, is not so much the CPU time
> (though that's a *huge* change in comparison to gz) but the memory
> usage.  Where gzip and bzip2 are chewing 4M and 13M respectively, xz
> chews 102M.  From a 'beefy' server perspective chewing up that much
> memory per snapshot for that long could be bad.  This is likely
> something that needs to have some sort of enable/disable switch if
> it's going to be included.

First, by default the project-specific override for the 'snapshot'
feature is disabled, and only 'tgz' ('.tar.gz') snapshot links are
generated.

Second, if you want to enable the project-specific override for the
'snapshot' feature, you can still control which snapshot formats (among
'tgz', 'tbz2', 'zip' and now 'txz') are available from $GITWEB_CONFIG,
simply by deleting the formats you don't want used from
%known_snapshot_formats.

For example:

  delete $known_snapshot_formats{'txz'};
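The effect of that deletion can be sketched in Perl. The tables below
are stand-ins (only their keys matter), and filter_formats is a
hypothetical helper written in the spirit of gitweb's filtering: it
resolves aliases first, then keeps only formats that still exist in
%known_snapshot_formats, so a repository asking for 'xz' no longer gets
it.

```perl
use strict;
use warnings;

# Stand-ins for gitweb's tables; only the keys matter for this sketch.
my %known_snapshot_formats = (tgz => {}, tbz2 => {}, txz => {}, zip => {});
my %known_snapshot_format_aliases = (
    gzip => 'tgz', bzip2 => 'tbz2', xz => 'txz',
    gz   => undef,    # legacy name: resolves to nothing
);

# What the $GITWEB_CONFIG line does: take XZ off the table entirely.
delete $known_snapshot_formats{'txz'};

# Hypothetical filter: map aliases to canonical names, then drop anything
# undefined or no longer present in %known_snapshot_formats.
sub filter_formats {
    my @fmts = map {
        exists $known_snapshot_format_aliases{$_}
            ? $known_snapshot_format_aliases{$_}
            : $_
    } @_;
    return grep { defined $_ && exists $known_snapshot_formats{$_} } @fmts;
}

# A repository configured with gitweb.snapshot = "tgz,xz" now gets only tgz.
print join(',', filter_formats(split /,/, 'tgz,xz')), "\n";
```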

> 
> Good analysis btw, though I'll admit it makes me leery of something
> dynamically generating xz compressed files.

-- 
Jakub Narebski
Poland
ShadeHawk on #git


* Re: [PATCHv4] Add Gitweb support for XZ compressed snapshots
  2009-07-31 13:32 [PATCHv4] Add Gitweb support for XZ compressed snapshots Mark A Rada
  2009-07-31 18:07 ` J.H.
@ 2009-07-31 18:48 ` Jakub Narebski
  1 sibling, 0 replies; 4+ messages in thread
From: Jakub Narebski @ 2009-07-31 18:48 UTC
  To: Mark A Rada; +Cc: git

Mark A Rada <marada@uwaterloo.ca> writes:

> Ok, so I got a good nights sleep now, and reviewed the results of my
> benchmarks to make sure they were consistent (turns out I had the
> archive sizes in the wrong order for the XZ repository tests).
> 
> I also reworded a number of things and added a conclusion to the
> benchmarks.
> 
> Let me know what you think.

Well separated change.  Very detailed commit message; that's good!

[...]
> Linux 2.6 series (f5886c7f96f2542382d3a983c5f13e03d7fc5259) 	349M
> gzip 	23.70s user    	0.47s system  	99% cpu  24.227 total 	76M
> gunzip 	3.74s user     	0.74s system  	94% cpu  4.741 total
> bzip2 	130.96s user   	0.53s system  	99% cpu  2:11.97 total 	59M
> bunzip2 31.05s user    	1.02s system  	99% cpu  32.355 total
> xz 	448.78s user 	0.91s system  	99% cpu  7:31.28 total 	51M
> unxz 	7.67s user     	0.80s system  	98% cpu  8.607 total
> 
> Git (0a53e9ddeaddad63ad106860237bbf53411d11a7) 			11M
> gzip 	0.77s user 	0.03s system 	99% cpu  0.792 total 	2.5M
> gunzip 	0.12s user 	0.02s system 	98% cpu  0.142 total
> bzip2 	3.42s user 	0.02s system 	99% cpu  3.454 total 	2.1M
> bunzip2 0.95s user 	0.03s system 	99% cpu  0.984 total
> xz 	12.88s user 	0.14s system 	98% cpu  13.239 total 	1.9M
> unxz 	0.27s user 	0.03s system 	99% cpu  0.298 total
> 
> XZ (669413bb2db954bbfde3c4542fddbbab53891eb4) 			1.8M
> gzip  	0.12s user 	0.00s system 	95% cpu  0.132 total 	442K
> gunzip 	0.02s user 	0.00s system 	97% cpu  0.027 total
> bzip2  	1.28s user 	0.01s system 	99% cpu  1.298 total 	363K
> bunzip2 0.15s user 	0.01s system 	100% cpu 0.157 total
> xz 	1.62s user 	0.03s system 	99% cpu  1.652 total 	347K
> unxz   	0.05s user 	0.00s system 	99% cpu  0.058 total

Note that for me the results above are not aligned as a table.
This is a cosmetic issue.

> Purely from a time and memory perspective, nothing compares to GZip in
> each of the three tests. Though, if you have an average upload speed of
> 20KB/s, it would take ~400 seconds longer to transfer the kernel
> snapshot
> that was BZip2 compressed than it would the XZ compressed snapshot, the
> transfer time difference is even greater when compared to the GZip
> compressed snapshot. The wall clock time savings are relatively the same
> for all test cases, but less dramatic for the smaller repositories.
> 
> The obvious downside for XZ compressed snapshots is the large CPU and
> memory load put on the server to actualy generate the snapshot. Though
> XZ
> will eventually have good threading support, and I suspect then that the
> wall clock time for making an XZ compressed snapshot would go down
> considerably if the server had a beefy multi-core CPU.
> 
> I have not enabled XZ compression by default because the current default
> is GZip, and XZ is only really competitive with BZip2. Also, the XZ
> format
> is still fairly new (the format was declared stable about 6 months ago),
> and there have been no "stable" releases of the utils yet.

The three paragraphs above are strangely wrapped, with a single word
on a line of its own ('snapshot', 'XZ', 'format').  This is a cosmetic
issue.

> 
> Signed-off-by: Mark Rada <marada@uwaterloo.ca>

-- 
Jakub Narebski
Git User's Survey 2009: http://tinyurl.com/GitSurvey2009
