git.vger.kernel.org archive mirror
From: "J.H." <warthog19@eaglescrag.net>
To: Mark A Rada <marada@uwaterloo.ca>
Cc: git@vger.kernel.org
Subject: Re: [PATCHv4] Add Gitweb support for XZ compressed snapshots
Date: Fri, 31 Jul 2009 11:07:47 -0700
Message-ID: <4A7332F3.50908@eaglescrag.net>
In-Reply-To: <828BD9FC-1238-4B2E-858D-248977F04D31@uwaterloo.ca>

Mark A Rada wrote:
> Ok, so I got a good night's sleep now, and reviewed the results of my
> benchmarks to make sure they were consistent (turns out I had the
> archive sizes in the wrong order for the XZ repository tests).
> 
> I also reworded a number of things and added a conclusion to the
> benchmarks.
> 
> Let me know what you think.
> 
> 
> -- 
> Mark A Rada (ferrous26)
> marada@uwaterloo.ca
> 
> 
> ------->8--------------
> From: Mark Rada <marada@uwaterloo.ca>
> Date: Thu, 30 Jul 2009 08:56:42 -0400
> Subject: [PATCH] Add Gitweb support for XZ compressed snapshots
> 
> The XZ compression format uses the LZMA2 compression algorithm, which
> often yields higher compression ratios than both GZip and BZip2 at the
> cost of using more CPU time and RAM. Though, while XZ is the slowest
> for compression, it is much faster than BZip2 for decompression, almost
> comparable to GZip (see benchmarks below).
> 
> You can enable XZ compressed snapshots by adding 'txz' to the list of
> default options for snapshots in your $GITWEB_CONFIG.
> 
> I did some simple benchmarks, starting with an already tarballed
> archive of the repos listed below. Memory usage seemed to be consistent
> for any given algorithm at their default compression level. Timings were
> gathered using the `time' command.
> 
> CPU: AMD Sempron 3400+ (1 core @ 1.8GHz with 256K L2 cache)
> Virtual Memory Usage
>     GZip: 4152K    BZip2: 13352K    XZ: 102M
> 
> Linux 2.6 series (f5886c7f96f2542382d3a983c5f13e03d7fc5259)     349M
> gzip      23.70s user   0.47s system   99% cpu  24.227 total    76M
> gunzip     3.74s user   0.74s system   94% cpu   4.741 total
> bzip2    130.96s user   0.53s system   99% cpu  2:11.97 total   59M
> bunzip2   31.05s user   1.02s system   99% cpu  32.355 total
> xz       448.78s user   0.91s system   99% cpu  7:31.28 total   51M
> unxz       7.67s user   0.80s system   98% cpu   8.607 total
> 
> Git (0a53e9ddeaddad63ad106860237bbf53411d11a7)             11M
> gzip       0.77s user   0.03s system   99% cpu   0.792 total   2.5M
> gunzip     0.12s user   0.02s system   98% cpu   0.142 total
> bzip2      3.42s user   0.02s system   99% cpu   3.454 total   2.1M
> bunzip2    0.95s user   0.03s system   99% cpu   0.984 total
> xz        12.88s user   0.14s system   98% cpu  13.239 total   1.9M
> unxz       0.27s user   0.03s system   99% cpu   0.298 total
> 
> XZ (669413bb2db954bbfde3c4542fddbbab53891eb4)             1.8M
> gzip       0.12s user   0.00s system   95% cpu   0.132 total   442K
> gunzip     0.02s user   0.00s system   97% cpu   0.027 total
> bzip2      1.28s user   0.01s system   99% cpu   1.298 total   363K
> bunzip2    0.15s user   0.01s system  100% cpu   0.157 total
> xz         1.62s user   0.03s system   99% cpu   1.652 total   347K
> unxz       0.05s user   0.00s system   99% cpu   0.058 total
> 
> Purely from a time and memory perspective, nothing compares to GZip in
> each of the three tests. However, at an average upload speed of 20KB/s,
> the BZip2 compressed kernel snapshot would take ~400 seconds longer to
> transfer than the XZ compressed one, and the difference is even greater
> for the GZip compressed snapshot. The relative wall clock time savings
> are about the same for all test cases, but less dramatic in absolute
> terms for the smaller repositories.
> 
> The obvious downside for XZ compressed snapshots is the large CPU and
> memory load put on the server to actually generate the snapshot. Though XZ
> will eventually have good threading support, and I suspect then that the
> wall clock time for making an XZ compressed snapshot would go down
> considerably if the server had a beefy multi-core CPU.
> 
> I have not enabled XZ compression by default because the current default
> is GZip, and XZ is only really competitive with BZip2. Also, the XZ format
> is still fairly new (the format was declared stable about 6 months ago),
> and there have been no "stable" releases of the utils yet.
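
The ~400 second figure in the quoted conclusion can be checked with a
few lines of arithmetic. This is only a sketch and not part of the
patch: it uses the kernel archive sizes and wall clock times from the
table above, assumes "M" means MiB and "KB" means KiB, and ignores
protocol overhead. The function names are made up for illustration.

```python
# Kernel snapshot sizes from the quoted benchmark (assumed to be MiB).
SIZES_MIB = {"gzip": 76, "bzip2": 59, "xz": 51}

# Compression wall clock times from the same table, in seconds
# (2:11.97 -> 131.97, 7:31.28 -> 451.28).
COMPRESS_S = {"gzip": 24.227, "bzip2": 131.97, "xz": 451.28}

# Upload speed used in the quoted message (assumed to be KiB/s).
UPLOAD_KIB_PER_S = 20


def transfer_seconds(size_mib, rate_kib_s=UPLOAD_KIB_PER_S):
    """Time to push an archive of size_mib over a link of rate_kib_s."""
    return size_mib * 1024 / rate_kib_s


def total_seconds(fmt):
    """Compression time plus transfer time for one snapshot format."""
    return COMPRESS_S[fmt] + transfer_seconds(SIZES_MIB[fmt])


if __name__ == "__main__":
    extra = transfer_seconds(SIZES_MIB["bzip2"]) - transfer_seconds(SIZES_MIB["xz"])
    print(f"bzip2 transfer takes {extra:.1f}s longer than xz")
    for fmt in SIZES_MIB:
        print(f"{fmt:6s} compress + transfer: {total_seconds(fmt):8.1f}s")
```

Under these assumptions the 8M size difference between the BZip2 and XZ
archives works out to roughly 410 seconds of extra transfer time, which
matches the estimate above. At faster upload speeds the compression
time dominates and the balance shifts back toward GZip.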

One thing that would concern me greatly is not so much the CPU time 
(though that's a *huge* change in comparison to gz) but the memory 
usage.  Where gzip and bzip2 are chewing 4M and 13M respectively, xz 
chews 102M.  From a 'beefy' server perspective, chewing up that much 
memory per snapshot for that long could be bad.  This is likely 
something that needs to have some sort of enable/disable switch if it's 
going to be included.
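
To put rough numbers on the memory concern, here is a back-of-envelope
sketch. It assumes one compressor process per snapshot request, no
caching of generated snapshots, and the virtual memory figures from the
quoted benchmark; the request count of 10 is a hypothetical load, not a
measurement, and the function name is made up for illustration.

```python
# Virtual memory per compressor process from the quoted benchmark, in KiB
# (XZ was reported as 102M, taken here as 102 MiB).
MEM_KIB = {"gzip": 4152, "bzip2": 13352, "xz": 102 * 1024}


def peak_memory_mib(fmt, concurrent_requests):
    """Estimated memory if every request runs its own compressor process."""
    return MEM_KIB[fmt] * concurrent_requests / 1024


if __name__ == "__main__":
    requests = 10  # hypothetical number of simultaneous snapshot requests
    for fmt in MEM_KIB:
        print(f"{fmt:6s} x{requests}: {peak_memory_mib(fmt, requests):7.1f} MiB")
```

With ten simultaneous requests this model puts xz at about 1 GiB,
against roughly 40 MiB for gzip and 130 MiB for bzip2, and the xz
processes would also hold that memory for much longer per snapshot.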

Good analysis btw, though I'll admit it makes me leery of something 
dynamically generating xz compressed files.

- John 'Warthog9' Hawley


Thread overview: 4+ messages
2009-07-31 13:32 [PATCHv4] Add Gitweb support for XZ compressed snapshots Mark A Rada
2009-07-31 18:07 ` J.H. [this message]
2009-07-31 18:27   ` Jakub Narebski
2009-07-31 18:48 ` Jakub Narebski
