public inbox for linux-btrfs@vger.kernel.org
From: Chris Mason <chris.mason@oracle.com>
To: Josef Bacik <josef@redhat.com>
Cc: Jim Faulkner <jfaulkne@ccs.neu.edu>, linux-btrfs@vger.kernel.org
Subject: Re: worse than expected compression ratios with -o compress
Date: Mon, 18 Jan 2010 16:29:51 -0500
Message-ID: <20100118212951.GC4065@think>
In-Reply-To: <20100118141240.GA10710@localhost.localdomain>

On Mon, Jan 18, 2010 at 09:12:40AM -0500, Josef Bacik wrote:
> On Sat, Jan 16, 2010 at 11:16:50AM -0500, Jim Faulkner wrote:
> >
> > I have a mysql database which consists of hundreds of millions, if not
> > billions, of Usenet newsgroup headers.  This data should be highly
> > compressible, so I put the mysql data directory on a btrfs filesystem
> > mounted with the compress option:
> > /dev/sdi on /var/news/mysql type btrfs (rw,noatime,compress,noacl)
> >
> > However, I'm not seeing the kind of compression ratios that I would 
> > expect with this type of data.  FYI, all my tests are using Linux 
> > 2.6.32.3. Here's my current disk usage:
> > Filesystem            Size  Used Avail Use% Mounted on
> > /dev/sdi              302G  122G  181G  41% /var/news/mysql
> >
> > and here's the actual size of all files:
> > delta-9 mysql # pwd
> > /var/news/mysql
> > delta-9 mysql # du -h --max-depth=1
> > 747K    ./mysql
> > 0       ./test
> > 125G    ./urd
> > 125G    .
> > delta-9 mysql #
> >
> > As you can see, I am only shaving off 3 gigs out of 125 gigs worth of
> > what should be very compressible data.  The compressed data ends up being
> > around 98% of the size of the original data.
> >
> > To contrast, rzip can compress a database dump of this data to around 7%  
> > of its original size.  This is an older database dump, which is why it is 
> > smaller.  Before:
> > -rw------- 1 root root  69G 2010-01-15 14:55 mysqlurdbackup.2010-01-15
> > and after:
> > -rw------- 1 root root 5.2G 2010-01-16 05:34 mysqlurdbackup.2010-01-15.rz
> >
> > Of course it took 15 hours to compress the data, and btrfs wouldn't be  
> > able to use rzip for compression anyway.
> >
> > However, I still would expect to see better compression ratios than 98% 
> > on such data.  Are there plans to implement a better compression 
> > algorithm? Alternatively, is there a way to tune btrfs compression to 
> > achieve better ratios?
> >
> 
> Currently the only compression algorithm we support is gzip, so try gzipping
> your database to get a better comparison.  The plan is to eventually support
> other compression algorithms, but currently we do not.  Thanks,
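
One rough way to make that gzip comparison closer to what a filesystem can
do is to deflate the data in independent fixed-size chunks instead of as one
stream.  The sketch below is only an illustration: the 128 KiB chunk size,
the zlib level, and the function name are assumptions made for the example,
not values taken from the btrfs code.

```python
import zlib


def chunked_zlib_ratio(data: bytes, chunk_size: int = 128 * 1024,
                       level: int = 6) -> float:
    """Return compressed size / original size for per-chunk deflate.

    chunk_size and level are assumed values for illustration only.
    """
    if not data:
        return 1.0
    compressed = 0
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        # Count a chunk at its raw size when deflate does not shrink it.
        compressed += min(len(zlib.compress(chunk, level)), len(chunk))
    return compressed / len(data)
```

Feeding it a sample of one of the table files (read in binary mode) gives a
number that can be set next to the 98% seen on disk and the 7% from rzip.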

The compression code backs off pretty quickly if parts of the file do
not compress well.  In other words, it favors CPU time over the best
possible compression.  If gzip ends up better than what you're getting
from btrfs, I can give you a patch to force compression all the time.
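
To make the trade-off concrete, here is a minimal sketch of a back-off
policy of this kind.  It is not the btrfs implementation: the chunk size,
the rule of giving up after the first chunk that fails to shrink, and the
names store_with_backoff and force are all assumptions made up for the
example.  The force flag stands in for the idea of always compressing.

```python
import zlib

CHUNK = 128 * 1024  # assumed chunk size, for illustration only


def store_with_backoff(data: bytes, chunk_size: int = CHUNK,
                       force: bool = False):
    """Return (stored_bytes, compressed_chunks) for a toy back-off policy.

    Hypothetical policy: once one chunk fails to shrink, stop trying to
    compress the rest of the data unless force is set.
    """
    stored = 0
    compressed_chunks = 0
    gave_up = False
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        if gave_up and not force:
            # Backed off: store the chunk raw and spend no CPU on it.
            stored += len(chunk)
            continue
        packed = zlib.compress(chunk)
        if len(packed) < len(chunk):
            stored += len(packed)
            compressed_chunks += 1
        else:
            # Compression did not help: keep the raw chunk and back off.
            stored += len(chunk)
            gave_up = True
    return stored, compressed_chunks
```

With a policy like this, a single incompressible region early in a file
leaves everything after it uncompressed, which would give ratios close to
100% even when most of the data is repetitive.  Setting force trades that
saved CPU time for a better ratio.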

-chris


